Deep Research
Crawl and analyze websites with AI-powered deep research capabilities. The Crawl API uses Server-Sent Events (SSE) to stream results in real time as pages are discovered and analyzed.
Overview
The crawl endpoint enables you to:
- Research websites comprehensively with AI guidance
- Get real-time streaming updates as content is discovered
- Control crawl depth, breadth, and focus
- Receive AI thinking process and analysis
- Gather structured insights from multiple pages
Endpoint
POST /search/crawl
Authentication
Headers:
X-Client-ID: your_client_id
X-Client-Secret: your_client_secret
Content-Type: application/json
Accept: text/event-stream
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| instructions | string | Yes | Natural language crawl instructions (must include URL) |
| options | object | No | Crawl configuration options |
options Object
| Option | Type | Description |
|---|---|---|
| type | string | Crawl type: 'deep', 'shallow', 'focused' (default: 'deep') |
| thinking | boolean | Enable AI thinking process (default: true) |
| allow_thinking_callback | boolean | Stream thinking events (default: true) |
| stream_text | boolean | Stream text results (default: true) |
| maxDepth | number | Maximum link depth to follow |
| maxPages | number | Maximum pages to crawl |
| includeExternal | boolean | Include external links |
| timeout | number | Request timeout in milliseconds |
| format | string | Output format: 'markdown', 'html', 'text' |
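Options can be combined in a single request body alongside instructions. For example (the values here are purely illustrative):
{
  "instructions": "Research https://example.com/docs and summarize all API endpoints",
  "options": {
    "type": "focused",
    "maxPages": 25,
    "maxDepth": 2,
    "format": "markdown",
    "thinking": true
  }
}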
Response
Server-Sent Events stream with the following event types:
| Event Type | Description |
|---|---|
| page_crawled | A page was successfully crawled |
| content | Content chunk extracted |
| thinking | AI thinking/analysis process |
| error | Error occurred during crawl |
| crawl_end | Crawl completed (final event) |
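Each event arrives on its own SSE data: line as a JSON object. Note that a network chunk can end in the middle of a line, so robust readers should buffer partial lines before parsing; the fetch examples below keep the parsing minimal for brevity. A sketch of a buffered reader (the readSSE helper name is our own):
// Read an SSE body from fetch(), buffering partial lines across chunks.
async function readSSE(response, onEvent) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep a trailing partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith('data: ')) onEvent(JSON.parse(line.slice(6)));
    }
  }
  // Flush a final complete event left in the buffer, if any.
  if (buffer.startsWith('data: ')) onEvent(JSON.parse(buffer.slice(6)));
}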
Final Response (crawl_end):
{
type: 'crawl_end',
data: {
success: boolean,
time_took: number
}
}
Examples
Basic Research
import { SearchClient } from 'search-agent';
const client = new SearchClient({
clientId: process.env.OBLIEN_CLIENT_ID,
clientSecret: process.env.OBLIEN_CLIENT_SECRET
});
const result = await client.crawl(
'Research https://example.com/docs and summarize all API endpoints'
);
console.log(`Completed in ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Research https://example.com/docs and summarize all API endpoints'
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
// Process SSE events
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"instructions": "Research https://example.com/docs and summarize all API endpoints"
}'
With Real-Time Events
let pageCount = 0;
let contentChunks = [];
const result = await client.crawl(
'Crawl https://example.com/blog and extract all article titles and summaries',
(event) => {
if (event.type === 'page_crawled') {
pageCount++;
console.log(`Crawled page ${pageCount}: ${event.url}`);
} else if (event.type === 'content') {
contentChunks.push(event.text);
console.log(`Content: ${event.text}`);
} else if (event.type === 'thinking') {
console.log(`AI: ${event.thought}`);
}
},
{
type: 'deep',
maxPages: 20
}
);
console.log(`\nCompleted:`);
console.log(`- Pages crawled: ${pageCount}`);
console.log(`- Content chunks: ${contentChunks.length}`);
console.log(`- Time: ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Crawl https://example.com/blog and extract all article titles and summaries',
options: {
type: 'deep',
maxPages: 20
}
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let pageCount = 0;
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'page_crawled') {
pageCount++;
console.log(`Crawled page ${pageCount}`);
} else if (event.type === 'crawl_end') {
console.log('Crawl completed:', event.data);
}
}
}
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"instructions": "Crawl https://example.com/blog and extract all article titles and summaries",
"options": {
"type": "deep",
"maxPages": 20
}
}' | while read line; do
echo "$line"
done
Competitive Intelligence
const insights = {
pages: [],
features: [],
pricing: []
};
const result = await client.crawl(
`Research https://competitor.com and create a comprehensive report including:
- Product offerings and features
- Pricing plans and tiers
- Target audience
- Key differentiators
- Customer testimonials`,
(event) => {
if (event.type === 'page_crawled') {
insights.pages.push(event.url);
} else if (event.type === 'content') {
if (event.text.includes('price') || event.text.includes('$')) {
insights.pricing.push(event.text);
} else if (event.text.includes('feature')) {
insights.features.push(event.text);
}
}
},
{
type: 'deep',
maxDepth: 3,
maxPages: 30,
thinking: true
}
);
console.log('Research Complete:');
console.log(`Pages analyzed: ${insights.pages.length}`);
console.log(`Features found: ${insights.features.length}`);
console.log(`Pricing info: ${insights.pricing.length}`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: `Research https://competitor.com and create a comprehensive report including:
- Product offerings and features
- Pricing plans and tiers
- Target audience
- Key differentiators
- Customer testimonials`,
options: {
type: 'deep',
maxDepth: 3,
maxPages: 30,
thinking: true
}
})
});
// Process SSE stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
const insights = { pages: [], content: [] };
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'page_crawled') {
insights.pages.push(event.url);
} else if (event.type === 'content') {
insights.content.push(event.text);
}
}
}
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"instructions": "Research https://competitor.com and create a comprehensive report",
"options": {
"type": "deep",
"maxDepth": 3,
"maxPages": 30,
"thinking": true
}
}'
Targeted Research
const result = await client.crawl(
'Find all articles about artificial intelligence on https://news.example.com from the last month',
(event) => {
if (event.type === 'content') {
console.log('Found:', event.text);
}
},
{
type: 'focused',
maxPages: 50,
thinking: true,
format: 'markdown'
}
);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Find all articles about artificial intelligence on https://news.example.com from the last month',
options: {
type: 'focused',
maxPages: 50,
thinking: true,
format: 'markdown'
}
})
});

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"instructions": "Find all articles about artificial intelligence on https://news.example.com from the last month",
"options": {
"type": "focused",
"maxPages": 50,
"thinking": true,
"format": "markdown"
}
}'
Crawl Types
| Type | Behavior | Use Case | Speed |
|---|---|---|---|
| deep | Thorough exploration, follows all links | Comprehensive research | Slowest |
| shallow | Quick scan of main pages only | Quick overview | Fast |
| focused | Targeted crawl based on instructions | Specific information | Medium |
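For instance, a quick overview of a site can use a shallow crawl; a sketch reusing the SDK client from the examples above:
const overview = await client.crawl(
  'Give me a quick overview of https://example.com',
  (event) => {
    if (event.type === 'content') console.log(event.text);
  },
  { type: 'shallow', maxPages: 5 }
);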
SSE Event Types
page_crawled
{
"type": "page_crawled",
"url": "https://example.com/page",
"title": "Page Title"
}
content
{
"type": "content",
"text": "Extracted content text",
"url": "https://example.com/page"
}
thinking
{
"type": "thinking",
"thought": "AI analysis of the current page..."
}
error
{
"type": "error",
"error": "Error message",
"url": "https://example.com/failed-page"
}
crawl_end
{
"type": "crawl_end",
"data": {
"success": true,
"time_took": 45231
}
}
Error Handling
try {
const result = await client.crawl(
'Research https://example.com',
(event) => {
if (event.type === 'error') {
console.error('Crawl error:', event.error);
}
}
);
console.log('Completed successfully');
} catch (error) {
console.error('Fatal error:', error.message);
}

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Research https://example.com'
})
});
if (!response.ok) {
console.error('Request failed:', response.status);
return;
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'error') {
console.error('Crawl error:', event.error);
}
}
}
}
} catch (error) {
console.error('Stream error:', error);
}

Using cURL:

# Missing instructions
curl -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{}'
# Response:
# {"success": false, "error": "Instructions are required"}Best Practices
- Provide clear instructions - Include the URL and specific research goals
  - Good: "Crawl https://example.com/docs and extract all API method signatures with descriptions"
  - Bad: "Get API docs"
- Set appropriate limits - Use maxPages and maxDepth to prevent runaway crawls:
  options: {
    maxPages: 50, // Limit total pages
    maxDepth: 3   // Limit link depth
  }
- Process events in real-time - Handle streaming events as they arrive for better UX
- Choose the right crawl type
  - Use 'focused' for specific information
  - Use 'deep' for comprehensive analysis
  - Use 'shallow' for a quick overview
- Handle errors gracefully - Monitor error events and implement retry logic, as sketched below
- Monitor progress - Track page_crawled events to show progress
- Store results incrementally - Save content chunks as they arrive for large crawls
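The error-handling practice above calls for retry logic. A minimal sketch of a retry wrapper with exponential backoff, reusing the SDK client from the examples above (the attempt count and delays are illustrative, and the crawlWithRetry helper is our own):
// Retry a whole crawl when it throws; pages already delivered to the
// event callback before a failure may be delivered again on retry.
async function crawlWithRetry(instructions, onEvent, options, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.crawl(instructions, onEvent, options);
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.warn(`Attempt ${attempt} failed (${error.message}), retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}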
Common Use Cases
Documentation Gathering
await client.crawl(
'Crawl https://docs.example.com and create a comprehensive guide of all API endpoints with parameters and examples',
(event) => saveToDatabase(event),
{ type: 'deep', maxDepth: 4 }
);
Market Research
await client.crawl(
'Research https://industry-leader.com and analyze their product strategy, pricing model, and target market',
(event) => collectInsights(event),
{ type: 'deep', maxPages: 30 }
);
Content Aggregation
await client.crawl(
'Find all blog posts about machine learning on https://blog.example.com and extract titles, dates, and summaries',
(event) => aggregateContent(event),
{ type: 'focused', maxPages: 100 }
);
Compliance Monitoring
await client.crawl(
'Scan https://our-website.com and identify all instances of personal data collection, cookie usage, and third-party integrations',
(event) => auditCompliance(event),
{ type: 'deep', maxDepth: 5 }
);
Rate Limits
- 10 concurrent crawls per account
- Maximum 1000 pages per crawl
- Maximum depth of 10 levels
- 60-minute timeout per crawl
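When you have more research jobs than the 10-crawl concurrency cap allows, a small client-side queue keeps you under the limit. A minimal sketch (the runCrawls helper is our own and reuses the SDK client from the examples above):
// Run crawl jobs with at most `limit` in flight at once.
// Results are collected in completion order, not submission order.
async function runCrawls(jobs, limit = 10) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const job = jobs[next++]; // safe: each worker reads and increments synchronously
      results.push(await client.crawl(job.instructions, job.onEvent, job.options));
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, worker);
  await Promise.all(workers);
  return results;
}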
Next Steps
- Learn about Web Search for quick information lookup
- Explore Content Extraction for targeted data gathering