Index Website

Start crawling and indexing a website. Returns a job_id to track the crawling progress.

Authentication

Authorization: Bearer

Bearer authentication of the form Bearer <token>, where token is your auth token.
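
As a minimal sketch, the header described above could be built like this in Python. The helper name `auth_headers` and the placeholder token are illustrative only and are not part of the API.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header in the 'Bearer <token>' form."""
    # Reject empty tokens and tokens with stray whitespace, which would
    # produce a malformed header value.
    if not token or token.strip() != token:
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {token}"}


headers = auth_headers("YOUR_AUTH_TOKEN")  # placeholder, not a real token
print(headers)
```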

Path Parameters

domain string Required

Request

This endpoint expects an object.
base_url string Required
The base URL to start indexing from (e.g., 'https://docs.example.com')
domain_filter string or null Optional

Domain to restrict crawling to (e.g., 'docs.example.com'). Defaults to the domain of base_url.

path_filter string or null Optional

Path prefix to restrict crawling (e.g., '/docs'). Only URLs starting with this prefix are crawled.

url_pattern string or null Optional

Regex pattern to filter URLs (e.g., 'https://example\.com/(docs|api)/.*').

chunk_size integer or null Optional Defaults to 1000
Size of text chunks for splitting documents
chunk_overlap integer or null Optional Defaults to 200
Overlap between consecutive chunks
min_content_length integer or null Optional Defaults to 100
Minimum content length to index a page
max_pages integer or null Optional
Maximum number of pages to crawl. A null value means unlimited.
delay double or null Optional Defaults to 1
Delay in seconds between requests
version string or null Optional
Version to tag all indexed pages with
product string or null Optional
Product to tag all indexed pages with
authed boolean or null Optional

Whether indexed pages should be auth-gated
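
The request fields above could be assembled and sent roughly as follows. This is a sketch under stated assumptions: this page names a `domain` path parameter but does not give the HTTP method or route, so the `POST` method, the `API_ROOT` host, and the `/index/{domain}` path are hypothetical, as are the helper names. It also assumes that omitting an optional field is equivalent to sending null.

```python
import json
import urllib.request

# Hypothetical host: replace with the real API root from your account docs.
API_ROOT = "https://api.example.com"

# Optional fields listed in the Request section of this page.
OPTIONAL_FIELDS = {
    "domain_filter", "path_filter", "url_pattern", "chunk_size",
    "chunk_overlap", "min_content_length", "max_pages", "delay",
    "version", "product", "authed",
}


def build_index_request(base_url: str, **options) -> dict:
    """Return a request body, leaving out options that were not set."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {"base_url": base_url}
    # Assumption: an omitted field falls back to the server-side default.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


def start_index_job(domain: str, token: str, body: dict) -> dict:
    """Send the body and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{API_ROOT}/index/{domain}",  # hypothetical route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed method
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_index_request(
    "https://docs.example.com",
    path_filter="/docs",
    max_pages=500,
    delay=1.0,
    version=None,  # unset, so it is omitted from the body
)
print(json.dumps(body, indent=2))
```

Calling `start_index_job("docs.example.com", token, body)` would then submit the job. Validating field names on the client side catches typos such as `chunksize` before a request is made.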

Response

Successful Response
job_id string
ID to track the indexing job status
base_url string
The base URL being indexed
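
A client might decode the successful response into a small typed record so that the `job_id` can be passed to later status checks. The sketch below assumes the response body is a JSON object with exactly the two fields listed above; the `IndexJob` class, the `parse_index_response` helper, and the sample payload are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexJob:
    job_id: str    # ID used to track the indexing job status
    base_url: str  # base URL being indexed


def parse_index_response(raw: str) -> IndexJob:
    """Decode a JSON response body and check that both fields are present."""
    data = json.loads(raw)
    missing = [key for key in ("job_id", "base_url") if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return IndexJob(job_id=str(data["job_id"]), base_url=str(data["base_url"]))


# Made-up payload for illustration only.
sample = '{"job_id": "job-123", "base_url": "https://docs.example.com"}'
job = parse_index_response(sample)
print(job.job_id)
```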

Errors