Deep Research
Crawl and analyze websites with AI-powered deep research capabilities. The Crawl API uses Server-Sent Events (SSE) to stream results in real time as pages are discovered and analyzed.
Overview
The crawl endpoint enables you to:
- Research websites comprehensively with AI guidance
- Get real-time streaming updates as content is discovered
- Control crawl depth, breadth, and focus
- Receive AI thinking process and analysis
- Gather structured insights from multiple pages
Endpoint
POST /search/crawl
Authentication
Headers:
X-Client-ID: your_client_id
X-Client-Secret: your_client_secret
Content-Type: application/json
Accept: text/event-stream
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| instructions | string | Yes | Natural language crawl instructions (must include URL) |
| options | object | No | Crawl configuration options |
options Object
| Option | Type | Description |
|---|---|---|
| type | string | Crawl type: 'deep', 'shallow', 'focused' (default: 'deep') |
| thinking | boolean | Enable AI thinking process (default: true) |
| allow_thinking_callback | boolean | Stream thinking events (default: true) |
| stream_text | boolean | Stream text results (default: true) |
| maxDepth | number | Maximum link depth to follow |
| maxPages | number | Maximum pages to crawl |
| includeExternal | boolean | Include external links |
| timeout | number | Request timeout in milliseconds |
| format | string | Output format: 'markdown', 'html', 'text' |
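Options can be combined in a single request body alongside instructions. For example (the values here are purely illustrative):
{
  "instructions": "Research https://example.com/docs and summarize all API endpoints",
  "options": {
    "type": "focused",
    "maxPages": 25,
    "maxDepth": 2,
    "format": "markdown",
    "thinking": true
  }
}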
Response
Server-Sent Events stream with the following event types:
| Event Type | Description |
|---|---|
| page_crawled | A page was successfully crawled |
| content | Content chunk extracted |
| thinking | AI thinking/analysis process |
| error | Error occurred during crawl |
| crawl_end | Crawl completed (final event) |
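Each event arrives on its own SSE data: line as a JSON object. Note that a network chunk can end in the middle of a line, so robust readers should buffer partial lines before parsing; the fetch examples below keep the parsing minimal for brevity. A sketch of a buffered reader (the readSSE helper name is our own):
// Read an SSE body from fetch(), buffering partial lines across chunks.
async function readSSE(response, onEvent) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep a trailing partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith('data: ')) onEvent(JSON.parse(line.slice(6)));
    }
  }
  // Flush a final complete event left in the buffer, if any.
  if (buffer.startsWith('data: ')) onEvent(JSON.parse(buffer.slice(6)));
}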
Final Response (crawl_end):
{
type: 'crawl_end',
data: {
success: boolean,
time_took: number
}
}
Examples
Basic Research
import { SearchClient } from 'search-agent';
const client = new SearchClient({
clientId: process.env.OBLIEN_CLIENT_ID,
clientSecret: process.env.OBLIEN_CLIENT_SECRET
});
const result = await client.crawl(
'Research https://example.com/docs and summarize all API endpoints'
);
console.log(`Completed in ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Research https://example.com/docs and summarize all API endpoints'
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
// Process SSE events
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"instructions": "Research https://example.com/docs and summarize all API endpoints"
}'
With Real-Time Events
let pageCount = 0;
let contentChunks = [];
const result = await client.crawl(
'Crawl https://example.com/blog and extract all article titles and summaries',
(event) => {
if (event.type === 'page_crawled') {
pageCount++;
console.log(`Crawled page ${pageCount}: ${event.url}`);
} else if (event.type === 'content') {
contentChunks.push(event.text);
console.log(`Content: ${event.text}`);
} else if (event.type === 'thinking') {
console.log(`AI: ${event.thought}`);
}
},
{
type: 'deep',
maxPages: 20
}
);
console.log(`\nCompleted:`);
console.log(`- Pages crawled: ${pageCount}`);
console.log(`- Content chunks: ${contentChunks.length}`);
console.log(`- Time: ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Crawl https://example.com/blog and extract all article titles and summaries',
options: {
type: 'deep',
maxPages: 20
}
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let pageCount = 0;
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'page_crawled') {
pageCount++;
console.log(`Crawled page ${pageCount}`);
} else if (event.type === 'crawl_end') {
console.log('Crawl completed:', event.data);
}
}
}
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"instructions": "Crawl https://example.com/blog and extract all article titles and summaries",
"options": {
"type": "deep",
"maxPages": 20
}
}' | while read line; do
echo "$line"
done
Competitive Intelligence
const insights = {
pages: [],
features: [],
pricing: []
};
const result = await client.crawl(
`Research https://competitor.com and create a comprehensive report including:
- Product offerings and features
- Pricing plans and tiers
- Target audience
- Key differentiators
- Customer testimonials`,
(event) => {
if (event.type === 'page_crawled') {
insights.pages.push(event.url);
} else if (event.type === 'content') {
if (event.text.includes('price') || event.text.includes('$')) {
insights.pricing.push(event.text);
} else if (event.text.includes('feature')) {
insights.features.push(event.text);
}
}
},
{
type: 'deep',
maxDepth: 3,
maxPages: 30,
thinking: true
}
);
console.log('Research Complete:');
console.log(`Pages analyzed: ${insights.pages.length}`);
console.log(`Features found: ${insights.features.length}`);
console.log(`Pricing info: ${insights.pricing.length}`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: `Research https://competitor.com and create a comprehensive report including:
- Product offerings and features
- Pricing plans and tiers
- Target audience
- Key differentiators
- Customer testimonials`,
options: {
type: 'deep',
maxDepth: 3,
maxPages: 30,
thinking: true
}
})
});
// Process SSE stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
const insights = { pages: [], content: [] };
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'page_crawled') {
insights.pages.push(event.url);
} else if (event.type === 'content') {
insights.content.push(event.text);
}
}
}
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"instructions": "Research https://competitor.com and create a comprehensive report",
"options": {
"type": "deep",
"maxDepth": 3,
"maxPages": 30,
"thinking": true
}
}'
Targeted Research
const result = await client.crawl(
'Find all articles about artificial intelligence on https://news.example.com from the last month',
(event) => {
if (event.type === 'content') {
console.log('Found:', event.text);
}
},
{
type: 'focused',
maxPages: 50,
thinking: true,
format: 'markdown'
}
);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Find all articles about artificial intelligence on https://news.example.com from the last month',
options: {
type: 'focused',
maxPages: 50,
thinking: true,
format: 'markdown'
}
})
});

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"instructions": "Find all articles about artificial intelligence on https://news.example.com from the last month",
"options": {
"type": "focused",
"maxPages": 50,
"thinking": true,
"format": "markdown"
}
}'
Crawl Types
| Type | Behavior | Use Case | Speed |
|---|---|---|---|
| deep | Thorough exploration, follows all links | Comprehensive research | Slowest |
| shallow | Quick scan of main pages only | Quick overview | Fast |
| focused | Targeted crawl based on instructions | Specific information | Medium |
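For instance, a quick overview of a site can use a shallow crawl; a sketch reusing the SDK client from the examples above:
const overview = await client.crawl(
  'Give me a quick overview of https://example.com',
  (event) => {
    if (event.type === 'content') console.log(event.text);
  },
  { type: 'shallow', maxPages: 5 }
);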
SSE Event Types
page_crawled
{
"type": "page_crawled",
"url": "https://example.com/page",
"title": "Page Title"
}
content
{
"type": "content",
"text": "Extracted content text",
"url": "https://example.com/page"
}
thinking
{
"type": "thinking",
"thought": "AI analysis of the current page..."
}
error
{
"type": "error",
"error": "Error message",
"url": "https://example.com/failed-page"
}
crawl_end
{
"type": "crawl_end",
"data": {
"success": true,
"time_took": 45231
}
}
Error Handling
try {
const result = await client.crawl(
'Research https://example.com',
(event) => {
if (event.type === 'error') {
console.error('Crawl error:', event.error);
}
}
);
console.log('Completed successfully');
} catch (error) {
console.error('Fatal error:', error.message);
}

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
method: 'POST',
headers: {
'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
'Content-Type': 'application/json'
},
body: JSON.stringify({
instructions: 'Research https://example.com'
})
});
if (!response.ok) {
console.error('Request failed:', response.status);
return;
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const event = JSON.parse(line.slice(6));
if (event.type === 'error') {
console.error('Crawl error:', event.error);
}
}
}
}
} catch (error) {
console.error('Stream error:', error);
}

Using cURL:

# Missing instructions
curl -X POST https://api.oblien.com/search/crawl \
-H "X-Client-ID: $OBLIEN_CLIENT_ID" \
-H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{}'
# Response:
# {"success": false, "error": "Instructions are required"}Best Practices
- Provide clear instructions - Include the URL and specific research goals
  - Good: "Crawl https://example.com/docs and extract all API method signatures with descriptions"
  - Bad: "Get API docs"
- Set appropriate limits - Use maxPages and maxDepth to prevent runaway crawls:
  options: {
    maxPages: 50, // Limit total pages
    maxDepth: 3   // Limit link depth
  }
- Process events in real-time - Handle streaming events as they arrive for better UX
- Choose the right crawl type
  - Use 'focused' for specific information
  - Use 'deep' for comprehensive analysis
  - Use 'shallow' for a quick overview
- Handle errors gracefully - Monitor error events and implement retry logic, as sketched below
- Monitor progress - Track page_crawled events to show progress
- Store results incrementally - Save content chunks as they arrive for large crawls
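The error-handling practice above calls for retry logic. A minimal sketch of a retry wrapper with exponential backoff, reusing the SDK client from the examples above (the attempt count and delays are illustrative, and the crawlWithRetry helper is our own):
// Retry a whole crawl when it throws; pages already delivered to the
// event callback before a failure may be delivered again on retry.
async function crawlWithRetry(instructions, onEvent, options, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.crawl(instructions, onEvent, options);
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.warn(`Attempt ${attempt} failed (${error.message}), retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}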
Common Use Cases
Documentation Gathering
await client.crawl(
'Crawl https://docs.example.com and create a comprehensive guide of all API endpoints with parameters and examples',
(event) => saveToDatabase(event),
{ type: 'deep', maxDepth: 4 }
);
Market Research
await client.crawl(
'Research https://industry-leader.com and analyze their product strategy, pricing model, and target market',
(event) => collectInsights(event),
{ type: 'deep', maxPages: 30 }
);
Content Aggregation
await client.crawl(
'Find all blog posts about machine learning on https://blog.example.com and extract titles, dates, and summaries',
(event) => aggregateContent(event),
{ type: 'focused', maxPages: 100 }
);
Compliance Monitoring
await client.crawl(
'Scan https://our-website.com and identify all instances of personal data collection, cookie usage, and third-party integrations',
(event) => auditCompliance(event),
{ type: 'deep', maxDepth: 5 }
);
Rate Limits
- 10 concurrent crawls per account
- Maximum 1000 pages per crawl
- Maximum depth of 10 levels
- 60-minute timeout per crawl
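When you have more research jobs than the 10-crawl concurrency cap allows, a small client-side queue keeps you under the limit. A minimal sketch (the runCrawls helper is our own and reuses the SDK client from the examples above):
// Run crawl jobs with at most `limit` in flight at once.
// Results are collected in completion order, not submission order.
async function runCrawls(jobs, limit = 10) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const job = jobs[next++]; // safe: each worker reads and increments synchronously
      results.push(await client.crawl(job.instructions, job.onEvent, job.options));
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, worker);
  await Promise.all(workers);
  return results;
}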
Next Steps
- Learn about Web Search for quick information lookup
- Explore Content Extraction for targeted data gathering