Index Website
Start crawling and indexing a website. Returns a job_id to track the crawling progress.
Authentication
AuthorizationBearer
Bearer authentication of the form Bearer <token>, where token is your auth token.
Path Parameters
domain
Request
This endpoint expects an object.
base_url
The base URL to start indexing from (e.g., 'https://docs.example.com')
domain_filter
Domain to filter crawling (e.g., ‘docs.example.com’). Defaults to base_url domain.
path_filter
Path prefix to restrict crawling (e.g., ‘/docs’). Only URLs starting with this will be crawled.
url_pattern
Regex pattern to filter URLs (e.g., ‘https://example\.com/(docs|api)/.*’).
chunk_size
Size of text chunks for splitting documents
chunk_overlap
Overlap between consecutive chunks
min_content_length
Minimum content length to index a page
max_pages
Maximum number of pages to crawl. None means unlimited.
delay
Delay in seconds between requests
version
Version to tag all indexed pages with
product
Product to tag all indexed pages with
authed
Whether indexed pages should be auth-gated
Response
Successful Response
job_id
ID to track the indexing job status
base_url
The base URL being indexed