Classification Data
Send one or more URLs (up to 1 million per request) and choose which classification signals you want back. Classify analyzes each page and returns the requested data alongside the URL, TLD, and any error flags.
Classification is asynchronous. You submit a job and receive an ID immediately, then poll for results. Small batches typically complete in seconds; large batches may take longer.
The classification object
{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [...]
}
| Field | Type | Description |
|---|---|---|
| id | integer | Unique identifier for this classification job |
| status | string | Job state: pending → processing → complete, or failed |
| url_count | integer | Number of URLs submitted |
| fields | array[string] | The classification signals requested |
| iab_version | integer or null | IAB Content Taxonomy version (1, 2, or 3). Present when iab_categories was requested. |
| created_date | string (ISO 8601) | When the job was submitted |
| processed_date | string (ISO 8601) or null | When results were ready. null until the job is complete. |
| results | array[object] or null | Per-URL classification data. null until status is complete. Paginated for large batches. |
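As an illustration of how the timestamps and nullable fields above might be handled on the client side, the sketch below computes how long a job took. The helpers `parse_timestamp` and `processing_seconds` are hypothetical names, not part of the API, and the code assumes timestamps always use the `Z`-suffixed format shown in the example object.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-02-20T09:15:00Z"."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(job: dict) -> Optional[float]:
    """Return seconds between submission and completion, or None if not done."""
    if job.get("processed_date") is None:
        return None
    started = parse_timestamp(job["created_date"])
    finished = parse_timestamp(job["processed_date"])
    return (finished - started).total_seconds()


job = {
    "id": 501,
    "status": "complete",
    "created_date": "2026-02-20T09:15:00Z",
    "processed_date": "2026-02-20T09:15:28Z",
}
print(processing_seconds(job))  # 28.0
```

Checking `processed_date` for null before parsing keeps the helper safe to call on pending or processing jobs.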
The result object
Every URL in the response includes three default fields regardless of what you request:
| Field | Type | Always returned | Description |
|---|---|---|---|
| url | string | Yes | The URL that was classified |
| tld | string | Yes | Registered domain extracted from the URL (for example, nytimes.com) |
| errors | array[string] | Yes | Error flags for this URL (empty array if none). See Error tags. |
All other fields appear only if you included them in the fields array.
{
  "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
  "tld": "nytimes.com",
  "errors": [],
  "iab_categories": [
    {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
  ],
  "language": "en",
  "entities": [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81}
  ],
  "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
  "google_product_taxonomy": [
    {"id": "222", "name": "Electronics > Computers > Computer Components", "confidence": 0.74}
  ],
  "sentiment": {"label": "positive", "score": 0.68},
  "stance": [
    {"subject": "AI investment", "stance": "positive", "confidence": 0.85},
    {"subject": "chip export controls", "stance": "negative", "confidence": 0.72}
  ]
}
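Because only `url`, `tld`, and `errors` are guaranteed, client code may want to read every other field defensively. A minimal sketch of that idea follows; `to_row` is a hypothetical helper, not part of the API, and the choice of output columns is arbitrary.

```python
def to_row(result: dict) -> dict:
    """Flatten one result into a simple row, tolerating unrequested fields."""
    categories = result.get("iab_categories") or []
    sentiment = result.get("sentiment") or {}
    return {
        # Default fields: documented as always present.
        "url": result["url"],
        "tld": result["tld"],
        "has_errors": bool(result["errors"]),
        # Optional fields: only present if they were requested.
        "language": result.get("language"),
        "iab_ids": [category["id"] for category in categories],
        "sentiment": sentiment.get("label"),
    }


row = to_row({"url": "https://example.com/", "tld": "example.com", "errors": []})
print(row["language"], row["iab_ids"])  # None []
```

Using `.get()` for optional fields means the same function works regardless of which signals a given job requested.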
Create a classification job
POST https://api.clsfy.me/v1/clsfy/classifications
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | array[string] | Required | URLs to classify. Maximum 1,000,000 per request. |
| fields | array[string] | Required | Classification signals to return. See Available fields. |
| iab_version | integer | Conditional | IAB Content Taxonomy version: 1, 2, or 3. Required when fields includes iab_categories. |
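The parameter rules above can be checked on the client before a request is sent. The following is a sketch under stated assumptions: `build_payload` is a hypothetical helper, and treating an empty `fields` list as invalid and omitting `iab_version` when `iab_categories` is not requested are assumptions rather than documented behavior.

```python
MAX_URLS = 1_000_000      # documented per-request maximum
IAB_VERSIONS = {1, 2, 3}  # documented taxonomy versions


def build_payload(urls, fields, iab_version=None):
    """Build the JSON body for a classification job, or raise ValueError."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs are allowed per request")
    if not fields:
        # Assumption: an empty fields array is rejected by the API.
        raise ValueError("fields must contain at least one signal")
    payload = {"urls": list(urls), "fields": list(fields)}
    if "iab_categories" in fields:
        if iab_version not in IAB_VERSIONS:
            raise ValueError("iab_version must be 1, 2, or 3 with iab_categories")
        payload["iab_version"] = iab_version
    return payload


payload = build_payload(["https://example.com/"], ["iab_categories", "language"], 2)
print(payload["iab_version"])  # 2
```

Failing early on the client avoids a round trip that would otherwise end in a 400 or 422 response.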
Request
- curl
curl -X POST "https://api.clsfy.me/v1/clsfy/classifications" \
  -H "X-API-Key: <your_api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "https://www.bbc.com/sport/football/premier-league",
      "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/"
    ],
    "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
    "iab_version": 2
  }'
- Python
import requests

response = requests.post(
    "https://api.clsfy.me/v1/clsfy/classifications",
    headers={
        "X-API-Key": "<your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "urls": [
            "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
            "https://www.bbc.com/sport/football/premier-league",
            "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/",
        ],
        "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
        "iab_version": 2,
    },
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as a job
job = response.json()
print(job["id"])  # e.g. 501
Response
Returns the classification object with status: "pending".
{
  "id": 501,
  "status": "pending",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": null,
  "results": null
}
Get classification results
Retrieves a classification job by ID. Use this to poll for completion and retrieve results.
GET https://api.clsfy.me/v1/clsfy/classifications/{id}
| Parameter | Type | Description |
|---|---|---|
| id | integer (path) | The classification job ID returned at creation |
| limit | integer (query) | Number of results to return per page. Default 1000, max 10000. |
| offset | integer (query) | Number of results to skip. Default 0. |
- curl
curl "https://api.clsfy.me/v1/clsfy/classifications/501" \
  -H "X-API-Key: <your_api_key>"
- Python
import requests

response = requests.get(
    "https://api.clsfy.me/v1/clsfy/classifications/501",
    headers={"X-API-Key": "<your_api_key>"},
)
response.raise_for_status()  # raise on 4xx/5xx responses
job = response.json()
# results is null until the job is complete, so check status first.
if job["status"] == "complete":
    for result in job["results"]:
        print(result["url"], result.get("iab_categories"))
Completed response
{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [
    {
      "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "tld": "nytimes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
      ],
      "language": "en",
      "entities": [
        {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
        {"name": "Jensen Huang", "type": "person", "confidence": 0.88}
      ],
      "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
      "sentiment": {"label": "positive", "score": 0.68}
    },
    {
      "url": "https://www.bbc.com/sport/football/premier-league",
      "tld": "bbc.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB17-44", "name": "Sports", "confidence": 0.97}
      ],
      "language": "en",
      "entities": [
        {"name": "Premier League", "type": "thing", "confidence": 0.95},
        {"name": "Arsenal", "type": "brand", "confidence": 0.82}
      ],
      "keywords": ["football", "Premier League", "match results"],
      "sentiment": {"label": "neutral", "score": 0.52}
    },
    {
      "url": "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/",
      "tld": "allrecipes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB8-5", "name": "Food & Drink", "confidence": 0.96}
      ],
      "language": "en",
      "entities": [],
      "keywords": ["almond cookies", "crescent cookies", "baking", "holiday recipes"],
      "sentiment": {"label": "positive", "score": 0.71}
    }
  ]
}
Paginating large results
For jobs with many URLs, use limit and offset to page through results:
- Python
import requests


def get_all_results(job_id: int, api_key: str, page_size: int = 5000):
    """Retrieve all classification results, paginating automatically."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}
    all_results = []
    offset = 0
    while True:
        response = requests.get(
            url,
            headers=headers,
            params={"limit": page_size, "offset": offset},
        )
        response.raise_for_status()  # raise on 4xx/5xx responses
        job = response.json()
        # results is null (None) until the job completes, so fall back to [].
        results = job.get("results") or []
        if not results:
            break
        all_results.extend(results)
        offset += len(results)
        # A short page means this was the last one.
        if len(results) < page_size:
            break
    return all_results
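When the total is known up front, the pages can also be planned ahead of time, for example to fetch them concurrently. The sketch below assumes one result per submitted URL, so that `url_count` equals the total number of results; that is an assumption, not something this reference states. `page_params` is a hypothetical helper name.

```python
from typing import List, Tuple

MAX_LIMIT = 10_000  # documented maximum page size


def page_params(url_count: int, limit: int = 1000) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs that together cover url_count results."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return [(offset, limit) for offset in range(0, url_count, limit)]


print(page_params(25_000, 10_000))
# [(0, 10000), (10000, 10000), (20000, 10000)]
```

Each pair maps directly onto the `offset` and `limit` query parameters; the last page simply returns fewer results than `limit`.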
Polling for completion
Poll GET /v1/clsfy/classifications/{id} until status is "complete" or "failed". Small batches finish in seconds; processing time for larger batches grows with the URL count.
- Python
import time

import requests


def wait_for_classification(job_id: int, api_key: str, poll_interval: int = 10):
    """Poll until classification results are ready."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}
    while True:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # raise on 4xx/5xx responses
        job = response.json()
        if job["status"] == "complete":
            print(f"Done: {job['url_count']} URLs classified")
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Classification job {job_id} failed.")
        print(f"Status: {job['status']}, retrying in {poll_interval}s")
        time.sleep(poll_interval)
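A fixed interval polls small jobs too slowly and large jobs too often. One possible variation, sketched here as an assumption rather than a recommended client, doubles the wait between polls and gives up after a timeout. `wait_with_backoff` and its parameters are hypothetical; `fetch_job` stands in for any function that performs the GET request and returns the parsed job.

```python
import time
from typing import Callable


def wait_with_backoff(
    fetch_job: Callable[[], dict],
    timeout: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_job() until the job completes, fails, or timeout elapses."""
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["status"] == "complete":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Classification job {job.get('id')} failed.")
        if time.monotonic() + delay > deadline:
            raise TimeoutError("Gave up waiting for the classification job.")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, up to max_delay


# Demo with canned responses instead of real HTTP calls.
responses = iter([
    {"id": 501, "status": "pending"},
    {"id": 501, "status": "processing"},
    {"id": 501, "status": "complete"},
])
waits = []
job = wait_with_backoff(lambda: next(responses), sleep=waits.append)
print(job["status"], waits)  # complete [1.0, 2.0]
```

Passing the fetch and sleep functions in as arguments keeps the loop testable without network access or real delays.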
Available fields
Request these values in the fields array to control what classification data is returned for each URL.
| Field value | Description | Returns |
|---|---|---|
| iab_categories | IAB Content Taxonomy categories. Requires iab_version. | Array of {id, name, confidence} |
| language | Detected language of the page content | ISO 639-1 code (e.g. "en", "es", "de") |
| entities | Named entities: people, places, things, products, brands | Array of {name, type, confidence} |
| keywords | Extracted topic keywords | Array of strings |
| google_product_taxonomy | Google Product Taxonomy categories | Array of {id, name, confidence} |
| sentiment | Overall sentiment of the page | {label, score} where label is positive, negative, or neutral |
| stance | Stance toward key subjects mentioned on the page | Array of {subject, stance, confidence} where stance is positive, negative, or neutral |
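Several of these fields carry a `confidence` value, which suggests filtering on it. The sketch below shows one way to do that; `top_category` and `confident_entities` are hypothetical helpers, and the 0.85 threshold is an arbitrary value chosen for illustration.

```python
def top_category(result: dict):
    """Return the highest-confidence IAB category, or None if there is none."""
    categories = result.get("iab_categories") or []
    return max(categories, key=lambda category: category["confidence"], default=None)


def confident_entities(result: dict, threshold: float = 0.85) -> list:
    """Return names of entities whose confidence is at least threshold."""
    return [
        entity["name"]
        for entity in result.get("entities") or []
        if entity["confidence"] >= threshold
    ]


result = {
    "iab_categories": [
        {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
    ],
    "entities": [
        {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
        {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
        {"name": "Taiwan", "type": "place", "confidence": 0.81},
    ],
}
print(top_category(result)["id"])   # IAB19-6
print(confident_entities(result))   # ['NVIDIA', 'Jensen Huang']
```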
IAB versions
When requesting iab_categories, you must set iab_version to one of:
| Version | Description |
|---|---|
| 1 | IAB Tech Lab Content Taxonomy 1.0 |
| 2 | IAB Tech Lab Content Taxonomy 2.0 |
| 3 | IAB Tech Lab Content Taxonomy 3.0 |
Entity types
The entities field returns objects with a type value from the following set:
| Type | Examples |
|---|---|
| person | Individuals, public figures |
| place | Cities, countries, landmarks |
| thing | Concepts, events, organizations |
| product | Specific products or product lines |
| brand | Companies, brands |
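Since `type` takes one of a small set of values, entities are easy to bucket. A minimal sketch, with `entities_by_type` as a hypothetical helper name:

```python
ENTITY_TYPES = ("person", "place", "thing", "product", "brand")


def entities_by_type(entities: list) -> dict:
    """Group entity names by their type value."""
    grouped = {entity_type: [] for entity_type in ENTITY_TYPES}
    for entity in entities:
        # setdefault keeps the sketch working if an undocumented type appears.
        grouped.setdefault(entity["type"], []).append(entity["name"])
    return grouped


entities = [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81},
]
grouped = entities_by_type(entities)
print(grouped["brand"], grouped["product"])  # ['NVIDIA'] []
```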
Stance vs. sentiment
Sentiment is the overall tone of the page: is the content positive, negative, or neutral?
Stance is more granular: for each key subject mentioned, what position does the content take? A single page can have positive stance toward one subject and negative stance toward another.
"sentiment": {"label": "positive", "score": 0.68},
"stance": [
  {"subject": "renewable energy", "stance": "positive", "confidence": 0.91},
  {"subject": "coal mining", "stance": "negative", "confidence": 0.84}
]
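To make the distinction concrete, the sketch below lists the subjects whose stance differs from the page-level sentiment. `diverging_stances` is a hypothetical helper, and comparing the two labels directly is an illustrative choice rather than a documented use of the fields.

```python
def diverging_stances(result: dict) -> list:
    """Return subjects whose stance differs from the page-level sentiment."""
    overall = (result.get("sentiment") or {}).get("label")
    if overall is None:
        return []  # sentiment was not requested, nothing to compare against
    return [
        item["subject"]
        for item in result.get("stance") or []
        if item["stance"] != overall
    ]


result = {
    "sentiment": {"label": "positive", "score": 0.68},
    "stance": [
        {"subject": "renewable energy", "stance": "positive", "confidence": 0.91},
        {"subject": "coal mining", "stance": "negative", "confidence": 0.84},
    ],
}
print(diverging_stances(result))  # ['coal mining']
```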
Error responses
When a request fails, the API returns a JSON object with an error code and a human-readable message:
{
  "error": "not_found",
  "message": "Classification job with ID 999 not found"
}
HTTP status codes
| Status | Meaning |
|---|---|
| 200 OK | Success |
| 201 Created | Classification job created |
| 400 Bad Request | Invalid or missing parameters |
| 401 Unauthorized | Missing or invalid API key |
| 404 Not Found | Job not found |
| 422 Unprocessable Content | Validation error (e.g. invalid field names) |
| 429 Too Many Requests | Rate limit exceeded |
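A client can turn the status code and error body into a single exception type. This is a sketch under assumptions: `ClassifyAPIError` and `check_response` are hypothetical names, and treating only 429 as retryable is a design choice made here, not a rule stated by the API.

```python
RETRYABLE_STATUSES = {429}  # assumption: only rate limiting is worth retrying


class ClassifyAPIError(Exception):
    """Carries the HTTP status plus the error code and message from the body."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.retryable = status in RETRYABLE_STATUSES


def check_response(status: int, body: dict) -> dict:
    """Return body for 200/201 responses, otherwise raise ClassifyAPIError."""
    if status in (200, 201):
        return body
    raise ClassifyAPIError(status, body.get("error", "unknown"), body.get("message", ""))


try:
    check_response(404, {"error": "not_found",
                         "message": "Classification job with ID 999 not found"})
except ClassifyAPIError as exc:
    print(exc.code, exc.retryable)  # not_found False
```

With `requests`, the two arguments would be `response.status_code` and `response.json()`.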
Error tags
The errors array on each result URL indicates issues encountered during classification. An empty array means the URL was classified successfully.
| Error tag | Description |
|---|---|
| fetch_failed | The URL could not be fetched (unreachable, timeout, or blocked) |
| parse_failed | The page was fetched but its content could not be parsed |
| empty_content | The page returned no meaningful text content |
| unsupported_format | The URL points to a non-HTML resource (PDF, image, etc.) |
| rate_limited | The origin server rate-limited the fetch request |
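The error tags can be used to decide which URLs are worth resubmitting. In the sketch below, `split_results` is a hypothetical helper, and treating `fetch_failed` and `rate_limited` as transient is an assumption; the API does not say which tags are retryable.

```python
RETRYABLE_TAGS = {"fetch_failed", "rate_limited"}  # assumption: transient failures


def split_results(results: list):
    """Split result URLs into (ok, retry, failed) lists based on error tags."""
    ok, retry, failed = [], [], []
    for result in results:
        tags = set(result["errors"])
        if not tags:
            ok.append(result["url"])
        elif tags <= RETRYABLE_TAGS:
            retry.append(result["url"])  # every tag looks transient
        else:
            failed.append(result["url"])
    return ok, retry, failed


results = [
    {"url": "https://example.com/a", "errors": []},
    {"url": "https://example.com/b", "errors": ["rate_limited"]},
    {"url": "https://example.com/c.pdf", "errors": ["unsupported_format"]},
]
print(split_results(results))
# (['https://example.com/a'], ['https://example.com/b'], ['https://example.com/c.pdf'])
```

The `retry` list can be submitted as the `urls` of a new classification job.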