Index Website


Start crawling and indexing a website. Returns a job_id to track the crawling progress.

Authentication

Authorization: Bearer

Bearer authentication of the form Bearer <token>, where token is your authentication token.

Path parameters

domain (string, required)

Request

This endpoint expects an object.
base_url (string, required)

The base URL to start indexing from (e.g., ‘https://docs.example.com’)

domain_filter (string or null, optional)

Domain to filter crawling (e.g., ‘docs.example.com’). Defaults to base_url domain.

path_filter (string or null, optional)

Path prefix to restrict crawling (e.g., ‘/docs’). Only URLs starting with this will be crawled.

url_pattern (string or null, optional)

Regex pattern to filter URLs (e.g., https://example\.com/(docs|api)/.*).

chunk_size (integer or null, optional, defaults to 1000)
Size of text chunks for splitting documents
chunk_overlap (integer or null, optional, defaults to 200)
Overlap between consecutive chunks
min_content_length (integer or null, optional, defaults to 100)
Minimum content length to index a page
max_pages (integer or null, optional)
Maximum number of pages to crawl. null means unlimited.
delay (double or null, optional, defaults to 1)
Delay in seconds between requests
version (string or null, optional)
Version to tag all indexed pages with
product (string or null, optional)
Product to tag all indexed pages with
authed (boolean or null, optional)

Whether indexed pages should be auth-gated
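
The request object described above could be assembled along these lines. This is only a sketch: the endpoint URL is a placeholder (this page does not give the route, only that it includes a domain path parameter), the POST method is an assumption, and build_index_request is a hypothetical helper name. The request is built but not sent.

```python
import json
import urllib.request


def build_index_request(endpoint_url, token, base_url, **options):
    """Build (but do not send) an index-website request.

    endpoint_url is a placeholder supplied by the caller; the real route
    is not stated on this page.
    """
    # Leave out unset options so the server can apply its documented defaults.
    payload = {"base_url": base_url}
    payload.update({k: v for k, v in options.items() if v is not None})

    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumption: the HTTP method is not given on this page
    )


request = build_index_request(
    "https://api.example.invalid/REPLACE-WITH-REAL-ROUTE",  # hypothetical URL
    token="YOUR_TOKEN",
    base_url="https://docs.example.com",
    path_filter="/docs",
    chunk_size=1000,
    chunk_overlap=200,
    max_pages=None,  # None is dropped, so the field is simply omitted
)
print(request.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would send it once the placeholder URL and token are replaced with real values.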

Response

Successful Response
job_idstring
ID to track the indexing job status
base_urlstring
The base URL being indexed
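
A minimal sketch of reading the two response fields, assuming the body is a JSON object with exactly the keys listed above. IndexJob and parse_index_response are hypothetical names, and the sample job_id is a made-up value for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass
class IndexJob:
    job_id: str    # ID to track the indexing job status
    base_url: str  # the base URL being indexed


def parse_index_response(body: str) -> IndexJob:
    """Parse a successful response body into an IndexJob."""
    data = json.loads(body)
    # A KeyError here means the body did not have the documented fields.
    return IndexJob(job_id=data["job_id"], base_url=data["base_url"])


# Made-up sample body, not real server output.
sample = '{"job_id": "example-job-id", "base_url": "https://docs.example.com"}'
job = parse_index_response(sample)
print(job.job_id)
```

Keeping the job_id is what allows the crawl to be tracked later, so it is worth storing as soon as the response arrives.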

Errors

422
Unprocessable Entity Error
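
Since this page does not describe the shape of the 422 error body, a client may want to surface whatever the server returned rather than assume specific fields. The following is a sketch under that assumption; summarize_error is a hypothetical helper.

```python
import json


def summarize_error(status: int, body: str) -> str:
    """Turn an error status and raw body into a one-line message."""
    if status == 422:
        prefix = "422 Unprocessable Entity"
    else:
        prefix = f"unexpected HTTP status {status}"

    # The error body format is not documented here, so show it as-is:
    # re-serialized if it is valid JSON, otherwise the raw text.
    try:
        detail = json.dumps(json.loads(body), sort_keys=True)
    except json.JSONDecodeError:
        detail = body.strip()

    return f"{prefix}: {detail}" if detail else prefix


print(summarize_error(422, '{"b": 1, "a": 2}'))
```

A 422 generally means the server understood the request but could not process its contents, so checking the request object (for example, that base_url is present) is a reasonable first step.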