Classification Data

Send one or more URLs (up to 1 million per request) and choose which classification signals you want back. Classify analyzes each page and returns the requested data alongside the URL, TLD, and any error flags.

Classification is asynchronous. You submit a job and receive an ID immediately, then poll for results. Small batches typically complete in seconds; large batches may take longer.

The classification object

{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [...]
}

Field	Type	Description
`id`	integer	Unique identifier for this classification job
`status`	string	`pending` → `processing` → `complete` or `failed`
`url_count`	integer	Number of URLs submitted
`fields`	array[string]	The classification signals requested
`iab_version`	integer \| null	IAB Content Taxonomy version (`1`, `2`, or `3`). Present when `iab_categories` was requested.
`created_date`	string (ISO 8601)	When the job was submitted
`processed_date`	string (ISO 8601) \| null	When results were ready. `null` until complete.
`results`	array[object] \| null	Per-URL classification data. `null` until `status` is `complete`. For large batches, page through it with `limit` and `offset`.
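
The status values and timestamps above are enough to tell whether a job has finished and how long it took. The following is a minimal sketch, assuming the job has already been decoded into a Python dict; `is_terminal`, `parse_timestamp`, and `processing_time` are hypothetical helper names, not part of the API.

```python
from datetime import datetime, timedelta
from typing import Any, Optional

# Per the status lifecycle, a job stops changing once it is complete or failed.
TERMINAL_STATUSES = {"complete", "failed"}


def is_terminal(job: dict[str, Any]) -> bool:
    """Return True when polling can stop for this job."""
    return job["status"] in TERMINAL_STATUSES


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-02-20T09:15:00Z."""
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_time(job: dict[str, Any]) -> Optional[timedelta]:
    """Return processed_date minus created_date, or None if not yet processed."""
    if job.get("processed_date") is None:
        return None
    return parse_timestamp(job["processed_date"]) - parse_timestamp(job["created_date"])
```

For the example object above, `processing_time` yields a 28-second `timedelta`.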

The result object

Every result object includes three default fields, regardless of which fields you request:

Field	Type	Always returned	Description
`url`	string	Yes	The URL that was classified
`tld`	string	Yes	Registered domain extracted from the URL (e.g. `nytimes.com`)
`errors`	array[string]	Yes	Error flags for this URL (empty array if none). See Error tags.

All other fields appear only if you included them in the fields array.

{
  "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
  "tld": "nytimes.com",
  "errors": [],
  "iab_categories": [
    {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
  ],
  "language": "en",
  "entities": [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81}
  ],
  "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
  "google_product_taxonomy": [
    {"id": "222", "name": "Electronics > Computers > Computer Components", "confidence": 0.74}
  ],
  "sentiment": {"label": "positive", "score": 0.68},
  "stance": [
    {"subject": "AI investment", "stance": "positive", "confidence": 0.85},
    {"subject": "chip export controls", "stance": "negative", "confidence": 0.72}
  ]
}
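
Because only url, tld, and errors are guaranteed, client code should treat every other key as optional. The sketch below shows one way to read a result defensively, assuming it has been decoded into a Python dict; `top_category` and `summarize_result` are hypothetical helper names, not part of the API.

```python
from typing import Any, Optional


def top_category(result: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the highest-confidence IAB category, or None if none is present."""
    categories = result.get("iab_categories") or []
    if not categories:
        return None
    return max(categories, key=lambda category: category["confidence"])


def summarize_result(result: dict[str, Any]) -> dict[str, Any]:
    """Flatten one result into a small summary, tolerating unrequested fields."""
    category = top_category(result)
    sentiment = result.get("sentiment") or {}
    return {
        "url": result["url"],          # always returned
        "tld": result["tld"],          # always returned
        "ok": not result["errors"],    # an empty errors array means success
        "category": category["name"] if category else None,
        "language": result.get("language"),
        "sentiment": sentiment.get("label"),
    }
```

Fields that were not requested simply come back as `None` in the summary instead of raising `KeyError`.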

Create a classification job

POST https://api.clsfy.me/v1/clsfy/classifications

Parameters

Parameter	Type	Required	Description
`urls`	array[string]	Required	URLs to classify. Maximum 1,000,000 per request.
`fields`	array[string]	Required	Classification signals to return. See Available fields.
`iab_version`	integer	Conditional	IAB Content Taxonomy version: `1`, `2`, or `3`. Required when `fields` includes `iab_categories`.
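
The constraints in this table can be checked client-side before a request is sent. The following is a sketch only: `build_payload` is a hypothetical helper, the set of field names is copied from the Available fields table, and two behaviors are assumptions rather than documented rules (rejecting an empty `fields` list, and omitting `iab_version` when `iab_categories` is not requested).

```python
from typing import Any, Optional

MAX_URLS_PER_REQUEST = 1_000_000   # documented per-request limit
VALID_IAB_VERSIONS = {1, 2, 3}     # documented taxonomy versions
KNOWN_FIELDS = {
    "iab_categories", "language", "entities", "keywords",
    "google_product_taxonomy", "sentiment", "stance",
}


def build_payload(urls: list[str], fields: list[str],
                  iab_version: Optional[int] = None) -> dict[str, Any]:
    """Validate the inputs and return the JSON body for the POST request."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_URLS_PER_REQUEST} URLs per request")
    if not fields:
        raise ValueError("fields must contain at least one field name")
    unknown = sorted(set(fields) - KNOWN_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")

    payload: dict[str, Any] = {"urls": list(urls), "fields": list(fields)}
    if "iab_categories" in fields:
        # iab_version is required only when iab_categories is requested.
        if iab_version not in VALID_IAB_VERSIONS:
            raise ValueError("iab_version must be 1, 2, or 3 with iab_categories")
        payload["iab_version"] = iab_version
    return payload
```

The returned dict can be passed as the `json=` argument of the Python request shown below the parameters.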

Request

curl

curl -X POST "https://api.clsfy.me/v1/clsfy/classifications" \
  -H "X-API-Key: <your_api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "https://www.bbc.com/sport/football/premier-league",
      "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/"
    ],
    "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
    "iab_version": 2
  }'

Python

import requests

response = requests.post(
    "https://api.clsfy.me/v1/clsfy/classifications",
    headers={
        "X-API-Key": "<your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "urls": [
            "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
            "https://www.bbc.com/sport/football/premier-league",
            "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/",
        ],
        "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
        "iab_version": 2,
    },
)

job = response.json()
print(job["id"])  # e.g. 501

Response

Returns 201 Created with the classification object. status is "pending", and processed_date and results are null until the job completes.

{
  "id": 501,
  "status": "pending",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": null,
  "results": null
}

Get classification results

Retrieves a classification job by ID. Use this to poll for completion and retrieve results.

GET https://api.clsfy.me/v1/clsfy/classifications/{id}

Parameter	Type	Description
`id`	integer (path)	The classification job ID returned at creation
`limit`	integer (query)	Number of results to return per page. Default `1000`, max `10000`.
`offset`	integer (query)	Number of results to skip. Default `0`.

curl

curl "https://api.clsfy.me/v1/clsfy/classifications/501" \
  -H "X-API-Key: <your_api_key>"

Python

import requests

response = requests.get(
    "https://api.clsfy.me/v1/clsfy/classifications/501",
    headers={"X-API-Key": "<your_api_key>"},
)

job = response.json()
if job["status"] == "complete":
    for result in job["results"]:
        print(result["url"], result.get("iab_categories"))

Completed response

{
  "id": 501,
  "status": "complete",
  "url_count": 3,
  "fields": ["iab_categories", "language", "entities", "keywords", "sentiment"],
  "iab_version": 2,
  "created_date": "2026-02-20T09:15:00Z",
  "processed_date": "2026-02-20T09:15:28Z",
  "results": [
    {
      "url": "https://www.nytimes.com/2026/01/15/technology/ai-chips.html",
      "tld": "nytimes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB19-6", "name": "Technology & Computing", "confidence": 0.95}
      ],
      "language": "en",
      "entities": [
        {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
        {"name": "Jensen Huang", "type": "person", "confidence": 0.88}
      ],
      "keywords": ["AI chips", "semiconductor", "GPU", "data center"],
      "sentiment": {"label": "positive", "score": 0.68}
    },
    {
      "url": "https://www.bbc.com/sport/football/premier-league",
      "tld": "bbc.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB17-44", "name": "Sports", "confidence": 0.97}
      ],
      "language": "en",
      "entities": [
        {"name": "Premier League", "type": "thing", "confidence": 0.95},
        {"name": "Arsenal", "type": "brand", "confidence": 0.82}
      ],
      "keywords": ["football", "Premier League", "match results"],
      "sentiment": {"label": "neutral", "score": 0.52}
    },
    {
      "url": "https://www.allrecipes.com/recipe/24074/almond-crescent-cookies/",
      "tld": "allrecipes.com",
      "errors": [],
      "iab_categories": [
        {"id": "IAB8-5", "name": "Food & Drink", "confidence": 0.96}
      ],
      "language": "en",
      "entities": [],
      "keywords": ["almond cookies", "crescent cookies", "baking", "holiday recipes"],
      "sentiment": {"label": "positive", "score": 0.71}
    }
  ]
}

Paginating large results

For jobs with many URLs, use limit and offset to page through results:

Python

import requests

def get_all_results(job_id: int, api_key: str, page_size: int = 5000):
    """Retrieve all classification results, paginating automatically."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}
    all_results = []
    offset = 0

    while True:
        response = requests.get(
            url,
            headers=headers,
            params={"limit": page_size, "offset": offset},
        )
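        response.raise_for_status()  # surface HTTP errors instead of reading an error body as an empty page
        job = response.json()
        results = job.get("results") or []  # results is null until the job is complete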

        if not results:
            break

        all_results.extend(results)
        offset += len(results)

        if len(results) < page_size:
            break

    return all_results
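
Since the job object reports url_count, the full set of pages can also be computed up front, for example to fetch pages concurrently or to resume after a failure. This sketch assumes that results holds one entry per submitted URL, which the reference does not state explicitly; `page_offsets` is a hypothetical helper.

```python
from typing import Iterator

MAX_LIMIT = 10_000  # documented maximum for the limit query parameter


def page_offsets(url_count: int, page_size: int = 1000) -> Iterator[tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover url_count results."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_LIMIT}")
    for offset in range(0, url_count, page_size):
        # The last page may be shorter than page_size.
        yield offset, min(page_size, url_count - offset)


print(list(page_offsets(2500, 1000)))
# [(0, 1000), (1000, 1000), (2000, 500)]
```

Each pair maps directly onto the offset and limit query parameters of the GET endpoint.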

Polling for completion

Poll GET /v1/clsfy/classifications/{id} until status is "complete" or "failed". Small batches typically finish in seconds; processing time for larger batches grows with URL count.

Python

import requests
import time

def wait_for_classification(job_id: int, api_key: str, poll_interval: int = 10):
    """Poll until classification results are ready."""
    url = f"https://api.clsfy.me/v1/clsfy/classifications/{job_id}"
    headers = {"X-API-Key": api_key}

    while True:
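        response = requests.get(url, headers=headers)
        response.raise_for_status()  # fail fast on HTTP errors rather than on a missing "status" key
        job = response.json()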

        if job["status"] == "complete":
            print(f"Done — {job['url_count']} URLs classified")
            return job
        elif job["status"] == "failed":
            raise RuntimeError(f"Classification job {job_id} failed.")

        print(f"Status: {job['status']} — retrying in {poll_interval}s")
        time.sleep(poll_interval)
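
For large batches, a fixed interval either polls too often or reacts too slowly. One common alternative is a capped exponential backoff; the sketch below only generates the delay schedule, and the default values are illustrative assumptions rather than recommendations from the API.

```python
import itertools
from typing import Iterator


def backoff_delays(initial: float = 2.0, factor: float = 2.0,
                   maximum: float = 60.0) -> Iterator[float]:
    """Yield an endless sequence of exponentially growing delays, capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


print(list(itertools.islice(backoff_delays(), 7)))
# [2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

In a polling loop, iterate over `backoff_delays()` and pass each value to `time.sleep` in place of a constant `poll_interval`.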

Available fields

Request these values in the fields array to control what classification data is returned for each URL.

Field value	Description	Returns
`iab_categories`	IAB Content Taxonomy categories. Requires `iab_version`.	Array of `{id, name, confidence}`
`language`	Detected language of the page content	ISO 639-1 code (e.g. `"en"`, `"es"`, `"de"`)
`entities`	Named entities: people, places, things, products, brands	Array of `{name, type, confidence}`
`keywords`	Extracted topic keywords	Array of strings
`google_product_taxonomy`	Google Product Taxonomy categories	Array of `{id, name, confidence}`
`sentiment`	Overall sentiment of the page	`{label, score}` where label is `positive`, `negative`, or `neutral`
`stance`	Stance toward key subjects mentioned on the page	Array of `{subject, stance, confidence}` where stance is `positive`, `negative`, or `neutral`

IAB versions

When requesting iab_categories, you must set iab_version to one of:

Version	Description
`1`	IAB Tech Lab Content Taxonomy 1.0
`2`	IAB Tech Lab Content Taxonomy 2.0
`3`	IAB Tech Lab Content Taxonomy 3.0

Entity types

The entities field returns objects with a type value from the following set:

Type	Examples
`person`	Individuals, public figures
`place`	Cities, countries, landmarks
`thing`	Concepts, events, organizations
`product`	Specific products or product lines
`brand`	Companies, brands
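
A typical use of the type value is to bucket entities and discard low-confidence matches. The following is a small sketch; `entities_by_type` and the confidence threshold are hypothetical and not part of the API.

```python
from collections import defaultdict
from typing import Any


def entities_by_type(entities: list[dict[str, Any]],
                     min_confidence: float = 0.0) -> dict[str, list[str]]:
    """Group entity names by type, keeping only those at or above min_confidence."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            grouped[entity["type"]].append(entity["name"])
    return dict(grouped)


sample = [
    {"name": "NVIDIA", "type": "brand", "confidence": 0.92},
    {"name": "Jensen Huang", "type": "person", "confidence": 0.88},
    {"name": "Taiwan", "type": "place", "confidence": 0.81},
]
print(entities_by_type(sample, min_confidence=0.85))
# {'brand': ['NVIDIA'], 'person': ['Jensen Huang']}
```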

Stance vs. sentiment

Sentiment is the overall tone of the page: is the content positive, negative, or neutral?

Stance is more granular: for each key subject mentioned, what position does the content take? A single page can take a positive stance toward one subject and a negative stance toward another.

"sentiment": {"label": "positive", "score": 0.68},
"stance": [
  {"subject": "renewable energy", "stance": "positive", "confidence": 0.91},
  {"subject": "coal mining", "stance": "negative", "confidence": 0.84}
]
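
To make the distinction concrete, the sketch below groups stance subjects by label and flags pages whose stances diverge even though the overall sentiment is a single value. `subjects_by_stance` and `has_mixed_stance` are hypothetical helpers operating on a decoded result dict.

```python
from typing import Any


def subjects_by_stance(stances: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group stance subjects under positive, negative, and neutral."""
    grouped: dict[str, list[str]] = {"positive": [], "negative": [], "neutral": []}
    for item in stances:
        grouped[item["stance"]].append(item["subject"])
    return grouped


def has_mixed_stance(result: dict[str, Any]) -> bool:
    """True when the page is positive toward one subject and negative toward another."""
    grouped = subjects_by_stance(result.get("stance") or [])
    return bool(grouped["positive"]) and bool(grouped["negative"])
```

Applied to the fragment above, `has_mixed_stance` returns `True` even though the page-level sentiment label is positive.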

Error responses

When a request fails, the API returns a JSON object with an error code and a human-readable message:

{
  "error": "not_found",
  "message": "Classification job with ID 999 not found"
}

HTTP status codes

Status	Meaning
`200 OK`	Success
`201 Created`	Classification job created
`400 Bad Request`	Invalid or missing parameters
`401 Unauthorized`	Missing or invalid API key
`404 Not Found`	Job not found
`422 Unprocessable Content`	Validation error (e.g. invalid field names)
`429 Too Many Requests`	Rate limit exceeded
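
Client code can turn these status codes and the error body into a single exception type. The sketch below is one possible shape: `ClassifyAPIError` and `check_response` are hypothetical names, and treating only 429 as retryable is an assumption, since the reference does not describe retry behavior.

```python
from typing import Any

RETRYABLE_STATUSES = {429}  # assumption: only rate-limit errors are worth retrying


class ClassifyAPIError(Exception):
    """Hypothetical exception carrying the documented error code and message."""

    def __init__(self, status_code: int, error: str, message: str) -> None:
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.message = message
        self.retryable = status_code in RETRYABLE_STATUSES


def check_response(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Return the body on 200 or 201; otherwise raise ClassifyAPIError."""
    if status_code in (200, 201):
        return body
    raise ClassifyAPIError(
        status_code,
        body.get("error", "unknown_error"),
        body.get("message", ""),
    )
```

With the requests library, this would be called as `check_response(response.status_code, response.json())`.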

Error tags

The errors array on each result object lists the issues encountered while classifying that URL. An empty array means the URL was classified successfully.

Error tag	Description
`fetch_failed`	The URL could not be fetched (unreachable, timeout, or blocked)
`parse_failed`	The page was fetched but its content could not be parsed
`empty_content`	The page returned no meaningful text content
`unsupported_format`	The URL points to a non-HTML resource (PDF, image, etc.)
`rate_limited`	The origin server rate-limited the fetch request
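
After a job completes, results can be split by their error tags so that problem URLs are handled separately. In the sketch below, treating `fetch_failed` and `rate_limited` as transient (and therefore worth resubmitting) is an assumption, not documented behavior; `partition_results` is a hypothetical helper.

```python
from typing import Any

# Assumption: these tags describe conditions that may clear up on a later attempt.
RETRYABLE_TAGS = {"fetch_failed", "rate_limited"}


def partition_results(
    results: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[str], list[dict[str, Any]]]:
    """Split results into (succeeded, urls_to_retry, failed)."""
    succeeded: list[dict[str, Any]] = []
    urls_to_retry: list[str] = []
    failed: list[dict[str, Any]] = []
    for result in results:
        tags = set(result["errors"])
        if not tags:
            succeeded.append(result)
        elif tags <= RETRYABLE_TAGS:
            # Every tag on this URL is considered transient.
            urls_to_retry.append(result["url"])
        else:
            failed.append(result)
    return succeeded, urls_to_retry, failed
```

The `urls_to_retry` list can be submitted as the urls of a new classification job.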
