Classification Data

Send one or more URLs (up to 1 million per request) and choose which classification signals you want back. Classify analyzes each page and returns the requested data alongside the URL, TLD, and any error flags.

Classification is asynchronous. You submit a job and receive an ID immediately, then poll for results. Small batches typically complete in seconds; large batches may take longer.


The classification object

{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [...]
}

| Field | Type | Description |
| --- | --- | --- |
| id | integer | Unique identifier for this classification job |
| status | string | One of pending, processing, complete, or failed |
| url_count | integer | Number of URLs submitted |
| fields | array[string] | The classification signals requested |
| iab_version | integer or null | IAB Content Taxonomy version (1, 2, or 3). Present when iab_categories was requested. |
| created_date | string (ISO 8601) | When the job was submitted |
| processed_date | string (ISO 8601) or null | When results were ready. null until status is complete. |
| results | array[object] or null | Per-URL classification data. null until status is complete. Paginated for large batches. |

The result object

Every URL in the response includes three default fields regardless of what you request:

| Field | Type | Always returned | Description |
| --- | --- | --- | --- |
| url | string | Yes | The URL that was classified |
| tld | string | Yes | Domain extracted from the URL (e.g. nytimes.com) |
| errors | array[string] | Yes | Error flags for this URL (empty array if none). See Error tags. |

All other fields appear only if you included them in the fields array.

{
  "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
  "tld": "nytimes.com",
  "errors": [],
  "iab_categories": [
    {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
  ],
  "language": "en",
  "entities": [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81}
  ],
  "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
  "google_product_taxonomy": [
    {"id": "222", "name": "Electronics > Computers > Computer Components", "confidence": 0.74}
  ],
  "sentiment": {"label": "positive", "score": 0.68},
  "stance": [
    {"subject": "AI investment", "stance": "positive", "confidence": 0.85},
    {"subject": "chip export controls", "stance": "negative", "confidence": 0.72}
  ]
}
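
As a rough illustration of reading a result object, the sketch below picks the highest-confidence IAB category and filters entities by confidence. The helper names `top_category` and `confident_entities` and the 0.85 threshold are assumptions made for this example; they are not part of the API.

```python
from typing import Optional

# Trimmed copy of the example result object shown above.
result = {
    "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
    "errors": [],
    "iab_categories": [
        {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
    ],
    "entities": [
        {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
        {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
        {"name": "Taiwan", "type": "place", "confidence": 0.81},
    ],
}


def top_category(result: dict) -> Optional[dict]:
    """Return the highest-confidence IAB category, or None if absent."""
    # Optional fields are missing unless they were requested, hence .get().
    categories = result.get("iab_categories") or []
    return max(categories, key=lambda c: c["confidence"], default=None)


def confident_entities(result: dict, threshold: float = 0.85) -> list:
    """Return names of entities at or above the (hypothetical) threshold."""
    return [
        e["name"]
        for e in result.get("entities") or []
        if e["confidence"] >= threshold
    ]


print(top_category(result)["name"])  # Technology & Computing
print(confident_entities(result))    # ['NVIDIA', 'Jensen Huang']
```

Using `.get()` keeps the helpers safe for results where `iab_categories` or `entities` was not included in the `fields` array.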

Create a classification job

POST https://api.clsfy.me/v1/clsfy/classifications

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array[string] | Required | URLs to classify. Maximum 1,000,000 per request. |
| fields | array[string] | Required | Classification signals to return. See Available fields. |
| iab_version | integer | Conditional | IAB Content Taxonomy version: 1, 2, or 3. Required when fields includes iab_categories. |

Request

curl -X POST "https://api.clsfy.me/v1/clsfy/classifications" \
  -H "X-API-Key: <your_api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "https://www.bbc.com/sport/football/premier-league",
      "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/"
    ],
    "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
    "iab_version": 2
  }'

Response

Returns the classification object with status: "pending".

{
  "id": 501,
  "status": "pending",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": null,
  "results": null
}
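
The parameter rules for creating a job can be checked client-side before a request is sent. The following is a minimal sketch under that assumption: `build_classification_request` is a hypothetical helper, and the checks mirror only what this page documents (the 1,000,000 URL limit, the listed field names, and the `iab_version` requirement).

```python
VALID_FIELDS = {
    "iab_categories", "language", "entities", "keywords",
    "google_product_taxonomy", "sentiment", "stance",
}
MAX_URLS = 1_000_000


def build_classification_request(urls, fields, iab_version=None):
    """Validate inputs and return the JSON request body as a dict."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request")
    if not fields:
        raise ValueError("fields must contain at least one value")
    unknown = set(fields) - VALID_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    body = {"urls": list(urls), "fields": list(fields)}
    # iab_version is only sent when iab_categories is requested.
    if "iab_categories" in fields:
        if iab_version not in (1, 2, 3):
            raise ValueError("iab_version must be 1, 2, or 3 with iab_categories")
        body["iab_version"] = iab_version
    return body


body = build_classification_request(
    ["https://www.bbc.com/sport/football/premier-league"],
    ["iab_categories", "language"],
    iab_version=2,
)
print(body["iab_version"])  # 2
```

The returned dict could then be posted with, for example, `requests.post(url, headers={"X-API-Key": api_key}, json=body)`.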

Get classification results

Retrieves a classification job by ID. Use this to poll for completion and retrieve results.

GET https://api.clsfy.me/v1/clsfy/classifications/{id}

| Parameter | Type | Description |
| --- | --- | --- |
| id | integer (path) | The classification job ID returned at creation |
| limit | integer (query) | Number of results to return per page. Default 1000, max 10000. |
| offset | integer (query) | Number of results to skip. Default 0. |

curl "https://api.clsfy.me/v1/clsfy/classifications/501" \
  -H "X-API-Key: <your_api_key>"

Completed response

{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [
    {
      "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "tld": "nytimes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
      ],
      "language": "en",
      "entities": [
        {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
        {"name": "Jensen Huang", "type": "person", "confidence": 0.88}
      ],
      "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
      "sentiment": {"label": "positive", "score": 0.68}
    },
    {
      "url": "https://www.bbc.com/sport/football/premier-league",
      "tld": "bbc.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB17-44", "name": "Sports", "confidence": 0.97}
      ],
      "language": "en",
      "entities": [
        {"name": "Premier League", "type": "thing", "confidence": 0.95},
        {"name": "Arsenal", "type": "brand", "confidence": 0.82}
      ],
      "keywords": ["football", "Premier League", "match results"],
      "sentiment": {"label": "neutral", "score": 0.52}
    },
    {
      "url": "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/",
      "tld": "allrecipes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB8-5", "name": "Food & Drink", "confidence": 0.96}
      ],
      "language": "en",
      "entities": [],
      "keywords": ["almond cookies", "crescent cookies", "baking", "holiday recipes"],
      "sentiment": {"label": "positive", "score": 0.71}
    }
  ]
}

Paginating large results

For jobs with many URLs, use limit and offset to page through results:

import requests

def get_all_results(job_id: int, api_key: str, page_size: int = 5000):
    """Retrieve all classification results, paginating automatically."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}
    all_results = []
    offset = 0

    while True:
        response = requests.get(
            url,
            headers=headers,
            params={"limit": page_size, "offset": offset},
            timeout=30,
        )
        # Surface 4xx/5xx responses instead of parsing an error body as a job
        response.raise_for_status()
        job = response.json()
        # "results" is null until the job completes, so fall back to an empty list
        results = job.get("results") or []

        if not results:
            break

        all_results.extend(results)
        offset += len(results)

        # A short page means this was the last one
        if len(results) < page_size:
            break

    return all_results

Polling for completion

Poll GET /v1/clsfy/classifications/{id} until status is "complete". Small batches finish in seconds; larger batches scale with URL count.

import requests
import time

def wait_for_classification(job_id: int, api_key: str, poll_interval: int = 10):
    """Poll until classification results are ready."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}

    while True:
        response = requests.get(url, headers=headers, timeout=30)
        # Stop on 4xx/5xx rather than reading "status" from an error body
        response.raise_for_status()
        job = response.json()

        if job["status"] == "complete":
            print(f"Done: {job['url_count']} URLs classified")
            return job
        elif job["status"] == "failed":
            raise RuntimeError(f"Classification job {job_id} failed.")

        # Still pending or processing: wait before the next poll
        print(f"Status: {job['status']}, retrying in {poll_interval}s")
        time.sleep(poll_interval)
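
Because completion time grows with URL count, a fixed poll interval can be either too chatty for large jobs or too slow for small ones. One possible alternative, sketched here as an assumption rather than a documented recommendation, is to grow the interval geometrically and give up after a total wait budget. The function name `backoff_delays` and all default values are hypothetical.

```python
from typing import Iterator


def backoff_delays(initial: float = 2.0, factor: float = 2.0,
                   cap: float = 30.0, max_wait: float = 60.0) -> Iterator[float]:
    """Yield sleep intervals that grow geometrically up to `cap`,
    stopping once their running total would exceed `max_wait`."""
    elapsed = 0.0
    delay = initial
    while elapsed + delay <= max_wait:
        yield delay
        elapsed += delay
        delay = min(delay * factor, cap)


print(list(backoff_delays()))  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

In a polling loop, each yielded value would be passed to `time.sleep()` before the next status check, and an exhausted generator would be treated as a timeout.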

Available fields

Request these values in the fields array to control what classification data is returned for each URL.

| Field value | Description | Returns |
| --- | --- | --- |
| iab_categories | IAB Content Taxonomy categories. Requires iab_version. | Array of {id, name, confidence} |
| language | Detected language of the page content | ISO 639-1 code (e.g. "en", "es", "de") |
| entities | Named entities: people, places, things, products, brands | Array of {name, type, confidence} |
| keywords | Extracted topic keywords | Array of strings |
| google_product_taxonomy | Google Product Taxonomy categories | Array of {id, name, confidence} |
| sentiment | Overall sentiment of the page | {label, score} where label is positive, negative, or neutral |
| stance | Stance toward key subjects mentioned on the page | Array of {subject, stance, confidence} where stance is positive, negative, or neutral |

IAB versions

When requesting iab_categories, you must set iab_version to one of:

| Version | Description |
| --- | --- |
| 1 | IAB Tech Lab Content Taxonomy 1.0 |
| 2 | IAB Tech Lab Content Taxonomy 2.0 |
| 3 | IAB Tech Lab Content Taxonomy 3.0 |

Entity types

The entities field returns objects with a type value from the following set:

| Type | Examples |
| --- | --- |
| person | Individuals, public figures |
| place | Cities, countries, landmarks |
| thing | Concepts, events, organizations |
| product | Specific products or product lines |
| brand | Companies, brands |
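
Since every entity carries one of the types above, a common first step is to bucket entity names by type. The sketch below does that for sample data taken from this page; `group_entities_by_type` is a hypothetical helper name.

```python
from collections import defaultdict

# Sample entities copied from the result object example on this page.
entities = [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81},
]


def group_entities_by_type(entities: list) -> dict:
    """Map each entity type to the list of entity names of that type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["name"])
    return dict(grouped)


print(group_entities_by_type(entities))
# {'brand': ['NVIDIA'], 'person': ['Jensen Huang'], 'place': ['Taiwan']}
```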

Stance vs. sentiment

Sentiment is the overall tone of the page: is the content positive, negative, or neutral?

Stance is more granular: for each key subject mentioned, what position does the content take? A single page can have positive stance toward one subject and negative stance toward another.

"sentiment": {"label": "positive", "score": 0.68},
"stance": [
  {"subject": "renewable energy", "stance": "positive", "confidence": 0.91},
  {"subject": "coal mining", "stance": "negative", "confidence": 0.84}
]
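
To make the distinction concrete, the sketch below lists subjects whose stance differs from the page's overall sentiment label, using the fragment above as input. `diverging_subjects` is a hypothetical helper, and comparing the two labels directly is an assumption that relies on both using the same positive, negative, and neutral values.

```python
page = {
    "sentiment": {"label": "positive", "score": 0.68},
    "stance": [
        {"subject": "renewable energy", "stance": "positive", "confidence": 0.91},
        {"subject": "coal mining", "stance": "negative", "confidence": 0.84},
    ],
}


def diverging_subjects(result: dict) -> list:
    """Return subjects whose stance differs from the overall sentiment label."""
    overall = result["sentiment"]["label"]
    return [
        s["subject"]
        for s in result.get("stance") or []
        if s["stance"] != overall
    ]


print(diverging_subjects(page))  # ['coal mining']
```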

Error responses

When a request fails, the API returns a JSON object with an error code and a human-readable message:

{
  "error": "not_found",
  "message": "Classification job with ID 999 not found"
}

HTTP status codes

| Status | Meaning |
| --- | --- |
| 200 OK | Success |
| 201 Created | Classification job created |
| 400 Bad Request | Invalid or missing parameters |
| 401 Unauthorized | Missing or invalid API key |
| 404 Not Found | Job not found |
| 422 Unprocessable Content | Validation error (e.g. invalid field names) |
| 429 Too Many Requests | Rate limit exceeded |
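
One way a client might turn these status codes and the error body into exceptions is sketched below. `ClassifyAPIError` and `check_response` are hypothetical names, and treating only 429 as retryable is an assumption; the page does not say which failures are safe to retry.

```python
RETRYABLE_STATUSES = {429}  # assumption: only rate limiting is worth retrying


class ClassifyAPIError(Exception):
    """Hypothetical exception carrying the API's error code and message."""

    def __init__(self, status_code: int, error: str, message: str):
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.retryable = status_code in RETRYABLE_STATUSES


def check_response(status_code: int, body: dict) -> dict:
    """Return the body on 200/201, otherwise raise ClassifyAPIError."""
    if status_code in (200, 201):
        return body
    raise ClassifyAPIError(
        status_code,
        body.get("error", "unknown"),
        body.get("message", ""),
    )


try:
    check_response(404, {"error": "not_found",
                         "message": "Classification job with ID 999 not found"})
except ClassifyAPIError as exc:
    print(exc)            # 404 not_found: Classification job with ID 999 not found
    print(exc.retryable)  # False
```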

Error tags

The errors array on each result URL indicates issues encountered during classification. An empty array means the URL was classified successfully.

| Error tag | Description |
| --- | --- |
| fetch_failed | The URL could not be fetched (unreachable, timeout, or blocked) |
| parse_failed | The page was fetched but its content could not be parsed |
| empty_content | The page returned no meaningful text content |
| unsupported_format | The URL points to a non-HTML resource (PDF, image, etc.) |
| rate_limited | The origin server rate-limited the fetch request |
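
After a job completes, the errors array can be used to separate clean results from URLs that need attention. The sketch below is one possible approach: `partition_results` is a hypothetical helper, the example.com URLs are made up, and treating fetch_failed and rate_limited as transient (worth resubmitting) is an assumption rather than documented behavior.

```python
# Assumption: these tags describe transient conditions worth resubmitting.
TRANSIENT_TAGS = {"fetch_failed", "rate_limited"}


def partition_results(results: list) -> tuple:
    """Split results into (ok, retry_urls, failed_urls) using the errors array."""
    ok, retry_urls, failed_urls = [], [], []
    for result in results:
        tags = set(result["errors"])
        if not tags:
            ok.append(result)                  # classified successfully
        elif tags <= TRANSIENT_TAGS:
            retry_urls.append(result["url"])   # only transient tags present
        else:
            failed_urls.append(result["url"])  # at least one permanent tag
    return ok, retry_urls, failed_urls


sample = [
    {"url": "https://example.com/a", "tld": "example.com", "errors": []},
    {"url": "https://example.com/b", "tld": "example.com", "errors": ["rate_limited"]},
    {"url": "https://example.com/c.pdf", "tld": "example.com",
     "errors": ["unsupported_format"]},
]
ok, retry_urls, failed_urls = partition_results(sample)
print(len(ok), retry_urls, failed_urls)
# 1 ['https://example.com/b'] ['https://example.com/c.pdf']
```

The URLs in `retry_urls` could be submitted again in a new classification job, while `failed_urls` are unlikely to succeed without changes.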