Scraping

Classify maintains a continuously updated corpus of scraped web content. When you submit a URL, Classify fetches and indexes its content — extracting the full page text, title, language, metadata, and supply chain information. This data powers both classification and direct content retrieval.

What scraping gives you

Data	Description
Full page text	The complete readable text of the page, stripped of HTML
Title	The page title as published
Language	Detected content language (e.g. `en`, `fr`)
Ads.txt supply paths	Authorized seller data for the domain
Header metadata	HTTP and HTML metadata associated with the page
Published / updated timestamps	When the content was originally published and last modified

How it works

Scraping is asynchronous. If a URL hasn't been indexed yet, you submit it for scraping and Classify crawls it in the background. Once indexed, the full artifact is available for retrieval.

A typical flow:

1. Check whether the URL has already been scraped (POST /v1/scraping/search)
2. Request scraping if it hasn't (POST /v1/scraping/jobs)
3. Retrieve the artifact once processing is complete
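
The flow can be sketched in Python as a search, submit, then poll loop. This is a minimal illustration under stated assumptions, not the documented client: only the two endpoint paths come from this page. The base URL, the bearer-token header, the `{"url": ...}` request body, the `results` response field, and the choice to poll the search endpoint until the artifact appears are all hypothetical placeholders; check the Scraping API reference for the real request and response shapes.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# Assumption: placeholder host; replace with the real Classify API base URL.
BASE_URL = "https://api.example.com"

JsonDict = Dict[str, Any]
PostFn = Callable[[str, JsonDict], JsonDict]


def http_post(path: str, payload: JsonDict, api_key: str = "YOUR_API_KEY") -> JsonDict:
    """POST a JSON body and decode the JSON response.

    Assumption: bearer-token authentication; the real scheme may differ.
    """
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def find_artifact(url: str, post: PostFn) -> Optional[JsonDict]:
    """Step 1: search the index; return the first match, or None if not indexed."""
    # Assumption: the body is {"url": ...} and matches are listed under "results".
    results = post("/v1/scraping/search", {"url": url}).get("results", [])
    return results[0] if results else None


def scrape(
    url: str,
    post: PostFn = http_post,
    poll_interval: float = 5.0,
    max_polls: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> JsonDict:
    """Return the artifact for `url`, requesting a scrape first if needed."""
    artifact = find_artifact(url, post)
    if artifact is not None:
        return artifact  # already indexed, nothing to submit

    # Step 2: submit the URL; Classify crawls it in the background.
    post("/v1/scraping/jobs", {"url": url})

    # Step 3: scraping is asynchronous, so wait and re-check the index.
    # Assumption: re-running the search is how completion is detected.
    for _ in range(max_polls):
        sleep(poll_interval)
        artifact = find_artifact(url, post)
        if artifact is not None:
            return artifact
    raise TimeoutError(f"{url} was not indexed after {max_polls} checks")
```

Calling `scrape("https://example.com/article")` would run all three steps against the placeholder host. The `post` and `sleep` parameters are injectable so the flow can be exercised with a stub instead of live HTTP calls, and `max_polls` bounds how long the caller waits on a background crawl.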

See the Scraping API reference for full endpoint documentation.

Use cases

Content research — retrieve full page text from any URL for analysis or enrichment
Brand safety — inspect page content before allowing ad placement
Audience building — provide URLs as seeds when creating contextual segments
Supply chain transparency — access ads.txt data to verify authorized sellers
