Reindex Website


Re-crawl a website by starting a new crawl job. The job deletes the old pages before indexing. Any option not supplied in the request falls back to the configuration from the original index request.

Authentication

Authorization: Bearer

Bearer authentication in the form Bearer <token>, where token is your authentication token.
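
The header format described above can be sketched as a small helper. This is a minimal illustration, not the service's client library: the function name `bearer_headers` and the token value in the usage note are hypothetical.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Build the Authorization header in the documented 'Bearer <token>' form."""
    token = token.strip()
    if not token:
        # An empty token would produce a malformed header, so fail early.
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}
```

For example, `bearer_headers("my-token")` returns `{"Authorization": "Bearer my-token"}`, which can be merged into the headers of whichever HTTP client you use.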

Path parameters

domain (string, required)

Request

This endpoint expects an object.

base_url (string, required)

The base URL to re-crawl (will delete old pages and re-index)

domain_filter (string or null, optional)

Domain to filter crawling (e.g., 'docs.example.com'). If not provided, uses previous config.

path_filter (string or null, optional)

Path prefix to restrict crawling (e.g., '/docs'). If not provided, uses previous config.

url_pattern (string or null, optional)

Regex pattern to filter URLs (e.g., https://example\.com/(docs|api)/.*). If not provided, uses previous config.

chunk_size (integer or null, optional)

Size of text chunks for splitting documents. If not provided, uses previous config.

chunk_overlap (integer or null, optional)

Overlap between consecutive chunks. If not provided, uses previous config.

min_content_length (integer or null, optional)

Minimum content length to index a page. If not provided, uses previous config.

max_pages (integer or null, optional)

Maximum number of pages to crawl. If not provided, uses previous config.

delay (double or null, optional)

Delay in seconds between requests. If not provided, uses previous config.

version (string or null, optional)

Version to tag all indexed pages with. If not provided, uses previous config.

product (string or null, optional)

Product to tag all indexed pages with. If not provided, uses previous config.

authed (boolean or null, optional)

Whether indexed pages should be auth-gated. If not provided, uses previous config.
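
The request fields above can be assembled into a JSON body along the following lines. This is a sketch under stated assumptions: the helper name `build_reindex_body` is hypothetical, and it assumes that leaving a field out of the body is how to request the "uses previous config" fallback (the source does not say whether an explicit null behaves the same way). The route and HTTP method are not given in this section, so the sketch stops at building the body.

```python
import json
from typing import Any

# Optional request fields, as listed in the request section.
OPTIONAL_FIELDS = (
    "domain_filter", "path_filter", "url_pattern", "chunk_size",
    "chunk_overlap", "min_content_length", "max_pages", "delay",
    "version", "product", "authed",
)


def build_reindex_body(base_url: str, **overrides: Any) -> dict[str, Any]:
    """Return a request body containing base_url plus any explicit overrides."""
    unknown = sorted(set(overrides) - set(OPTIONAL_FIELDS))
    if unknown:
        # Catch typos such as 'max_page' before the request is sent.
        raise TypeError(f"unknown request fields: {unknown}")
    body: dict[str, Any] = {"base_url": base_url}
    # Assumption: omitted fields fall back to the previous config,
    # so None values are dropped rather than sent as null.
    body.update({k: v for k, v in overrides.items() if v is not None})
    return body


def to_json(body: dict[str, Any]) -> str:
    """Serialize the body for sending as the request payload."""
    return json.dumps(body)
```

Calling `build_reindex_body("https://docs.example.com", path_filter="/docs", max_pages=None)` yields a body with only `base_url` and `path_filter`, leaving every other setting to the previous configuration.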

Response

Successful Response

job_id (string)

ID to track the re-crawling job status

base_url (string)

The base URL being re-crawled
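
A successful response carries the two fields above, so it could be parsed roughly as follows. The names `ReindexJob` and `parse_reindex_response` are hypothetical, and the sketch assumes the response body is a JSON object with `job_id` and `base_url` at the top level.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReindexJob:
    job_id: str    # ID to track the re-crawling job status
    base_url: str  # The base URL being re-crawled


def parse_reindex_response(raw: str) -> ReindexJob:
    """Parse the JSON text of a successful response into a ReindexJob."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("job_id", "base_url") if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return ReindexJob(job_id=str(data["job_id"]), base_url=str(data["base_url"]))
```

Keeping the returned `job_id` around is useful because it is the handle for tracking the status of the re-crawl.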

Errors

422 Unprocessable Entity Error
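
A caller might surface the 422 case as a dedicated exception, as in the sketch below. The exception class and function name are hypothetical. The source lists only 422 and does not document the error body, so the body is passed through as opaque text, and treating other 4xx/5xx codes as generic failures is an assumption.

```python
class UnprocessableEntityError(Exception):
    """Raised when the service answers with HTTP 422."""


def raise_for_reindex_status(status: int, body: str = "") -> None:
    """Raise an exception for error statuses; return None otherwise."""
    if status == 422:
        # The request was understood but failed validation.
        raise UnprocessableEntityError(body or "request failed validation")
    if status >= 400:
        # Assumption: any other error code is reported generically.
        raise RuntimeError(f"unexpected HTTP status {status}: {body}")
```

A 422 usually points at the request object itself (for example a missing `base_url` or a wrongly typed field), so it is worth checking the body against the field list before retrying.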