
Deep Research

Crawl and analyze websites with AI-powered deep research capabilities. The Crawl API uses Server-Sent Events (SSE) to stream results in real time as pages are discovered and analyzed.

Overview

The crawl endpoint enables you to:

  • Research websites comprehensively with AI guidance
  • Get real-time streaming updates as content is discovered
  • Control crawl depth, breadth, and focus
  • Receive AI thinking process and analysis
  • Gather structured insights from multiple pages

Endpoint

POST /search/crawl

Authentication

Headers:
  X-Client-ID: your_client_id
  X-Client-Secret: your_client_secret
  Content-Type: application/json
  Accept: text/event-stream

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| instructions | string | Yes | Natural language crawl instructions (must include URL) |
| options | object | No | Crawl configuration options |

options Object

| Option | Type | Description |
| --- | --- | --- |
| type | string | Crawl type: 'deep', 'shallow', 'focused' (default: 'deep') |
| thinking | boolean | Enable AI thinking process (default: true) |
| allow_thinking_callback | boolean | Stream thinking events (default: true) |
| stream_text | boolean | Stream text results (default: true) |
| maxDepth | number | Maximum link depth to follow |
| maxPages | number | Maximum pages to crawl |
| includeExternal | boolean | Include external links |
| timeout | number | Request timeout in milliseconds |
| format | string | Output format: 'markdown', 'html', 'text' |
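
For example, a request body combining several of these options (the values here are illustrative):

{
  "instructions": "Research https://example.com/docs and summarize all API endpoints",
  "options": {
    "type": "focused",
    "maxDepth": 2,
    "maxPages": 25,
    "timeout": 120000,
    "format": "markdown"
  }
}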

Response

Server-Sent Events stream with the following event types:

| Event Type | Description |
| --- | --- |
| page_crawled | A page was successfully crawled |
| content | Content chunk extracted |
| thinking | AI thinking/analysis process |
| error | Error occurred during crawl |
| crawl_end | Crawl completed (final event) |

Final Response (crawl_end):

{
  "type": "crawl_end",
  "data": {
    "success": boolean,
    "time_took": number
  }
}

Examples

Basic Research

import { SearchClient } from 'search-agent';

const client = new SearchClient({
  clientId: process.env.OBLIEN_CLIENT_ID,
  clientSecret: process.env.OBLIEN_CLIENT_SECRET
});

const result = await client.crawl(
  'Research https://example.com/docs and summarize all API endpoints'
);

console.log(`Completed in ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
  method: 'POST',
  headers: {
    'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
    'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    instructions: 'Research https://example.com/docs and summarize all API endpoints'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  const chunk = decoder.decode(value);
  // Process SSE events
}
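
Note that a decoded chunk is not guaranteed to contain whole events: an SSE event can be split across chunk boundaries, so naive per-chunk parsing may hand JSON.parse a partial line. A minimal line-buffering sketch (same `data: ` framing as the examples in this guide; the buffering is an implementation detail, not part of the API):

let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Accumulate decoded text and process only complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any trailing partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      console.log(event.type);
    }
  }
}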

Using cURL:

curl -X POST https://api.oblien.com/search/crawl \
  -H "X-Client-ID: $OBLIEN_CLIENT_ID" \
  -H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "instructions": "Research https://example.com/docs and summarize all API endpoints"
  }'

With Real-Time Events

let pageCount = 0;
let contentChunks = [];

const result = await client.crawl(
  'Crawl https://example.com/blog and extract all article titles and summaries',
  (event) => {
    if (event.type === 'page_crawled') {
      pageCount++;
      console.log(`Crawled page ${pageCount}: ${event.url}`);
    } else if (event.type === 'content') {
      contentChunks.push(event.text);
      console.log(`Content: ${event.text}`);
    } else if (event.type === 'thinking') {
      console.log(`AI: ${event.thought}`);
    }
  },
  {
    type: 'deep',
    maxPages: 20
  }
);

console.log(`\nCompleted:`);
console.log(`- Pages crawled: ${pageCount}`);
console.log(`- Content chunks: ${contentChunks.length}`);
console.log(`- Time: ${result.time_took}ms`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
  method: 'POST',
  headers: {
    'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
    'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    instructions: 'Crawl https://example.com/blog and extract all article titles and summaries',
    options: {
      type: 'deep',
      maxPages: 20
    }
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let pageCount = 0;

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');
  
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      
      if (event.type === 'page_crawled') {
        pageCount++;
        console.log(`Crawled page ${pageCount}`);
      } else if (event.type === 'crawl_end') {
        console.log('Crawl completed:', event.data);
      }
    }
  }
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
  -H "X-Client-ID: $OBLIEN_CLIENT_ID" \
  -H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "instructions": "Crawl https://example.com/blog and extract all article titles and summaries",
    "options": {
      "type": "deep",
      "maxPages": 20
    }
  }' | while IFS= read -r line; do
  # print each SSE line; in practice, filter for "data: " lines and parse the JSON
  echo "$line"
done

Competitive Intelligence

const insights = {
  pages: [],
  features: [],
  pricing: []
};

const result = await client.crawl(
  `Research https://competitor.com and create a comprehensive report including:
   - Product offerings and features
   - Pricing plans and tiers
   - Target audience
   - Key differentiators
   - Customer testimonials`,
  (event) => {
    if (event.type === 'page_crawled') {
      insights.pages.push(event.url);
    } else if (event.type === 'content') {
      if (event.text.includes('price') || event.text.includes('$')) {
        insights.pricing.push(event.text);
      } else if (event.text.includes('feature')) {
        insights.features.push(event.text);
      }
    }
  },
  {
    type: 'deep',
    maxDepth: 3,
    maxPages: 30,
    thinking: true
  }
);

console.log('Research Complete:');
console.log(`Pages analyzed: ${insights.pages.length}`);
console.log(`Features found: ${insights.features.length}`);
console.log(`Pricing info: ${insights.pricing.length}`);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
  method: 'POST',
  headers: {
    'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
    'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    instructions: `Research https://competitor.com and create a comprehensive report including:
     - Product offerings and features
     - Pricing plans and tiers
     - Target audience
     - Key differentiators
     - Customer testimonials`,
    options: {
      type: 'deep',
      maxDepth: 3,
      maxPages: 30,
      thinking: true
    }
  })
});

// Process SSE stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
const insights = { pages: [], content: [] };

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');
  
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      
      if (event.type === 'page_crawled') {
        insights.pages.push(event.url);
      } else if (event.type === 'content') {
        insights.content.push(event.text);
      }
    }
  }
}

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
  -H "X-Client-ID: $OBLIEN_CLIENT_ID" \
  -H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "instructions": "Research https://competitor.com and create a comprehensive report",
    "options": {
      "type": "deep",
      "maxDepth": 3,
      "maxPages": 30,
      "thinking": true
    }
  }'

Targeted Research

const result = await client.crawl(
  'Find all articles about artificial intelligence on https://news.example.com from the last month',
  (event) => {
    if (event.type === 'content') {
      console.log('Found:', event.text);
    }
  },
  {
    type: 'focused',
    maxPages: 50,
    thinking: true,
    format: 'markdown'
  }
);

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
  method: 'POST',
  headers: {
    'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
    'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    instructions: 'Find all articles about artificial intelligence on https://news.example.com from the last month',
    options: {
      type: 'focused',
      maxPages: 50,
      thinking: true,
      format: 'markdown'
    }
  })
});

// Then read response.body and parse the SSE events as in the earlier examples

Using cURL:

curl -N -X POST https://api.oblien.com/search/crawl \
  -H "X-Client-ID: $OBLIEN_CLIENT_ID" \
  -H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "instructions": "Find all articles about artificial intelligence on https://news.example.com from the last month",
    "options": {
      "type": "focused",
      "maxPages": 50,
      "thinking": true,
      "format": "markdown"
    }
  }'

Crawl Types

| Type | Behavior | Use Case | Speed |
| --- | --- | --- | --- |
| deep | Thorough exploration, follows all links | Comprehensive research | Slowest |
| shallow | Quick scan of main pages only | Quick overview | Fast |
| focused | Targeted crawl based on instructions | Specific information | Medium |
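
For instance, a quick-overview pass with the SDK might look like this (a sketch; it assumes a no-op callback is acceptable when you only need the final result):

const overview = await client.crawl(
  'Give a quick overview of https://example.com',
  () => {},                        // no per-event handling needed for a quick scan
  { type: 'shallow', maxPages: 5 }
);

console.log(`Finished in ${overview.time_took}ms`);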

SSE Event Types

page_crawled

{
  "type": "page_crawled",
  "url": "https://example.com/page",
  "title": "Page Title"
}

content

{
  "type": "content",
  "text": "Extracted content text",
  "url": "https://example.com/page"
}

thinking

{
  "type": "thinking",
  "thought": "AI analysis of the current page..."
}

error

{
  "type": "error",
  "error": "Error message",
  "url": "https://example.com/failed-page"
}

crawl_end

{
  "type": "crawl_end",
  "data": {
    "success": true,
    "time_took": 45231
  }
}
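
A single callback that dispatches on all five event types could look like this (a sketch; the field names match the payloads documented above):

function handleEvent(event) {
  switch (event.type) {
    case 'page_crawled':
      console.log(`Crawled: ${event.title} (${event.url})`);
      break;
    case 'content':
      process.stdout.write(event.text);
      break;
    case 'thinking':
      console.log(`[thinking] ${event.thought}`);
      break;
    case 'error':
      console.warn(`[error] ${event.error} (${event.url})`);
      break;
    case 'crawl_end':
      console.log(`Done in ${event.data.time_took}ms, success: ${event.data.success}`);
      break;
  }
}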

Error Handling

try {
  const result = await client.crawl(
    'Research https://example.com',
    (event) => {
      if (event.type === 'error') {
        console.error('Crawl error:', event.error);
      }
    }
  );
  
  console.log('Completed successfully');
} catch (error) {
  console.error('Fatal error:', error.message);
}

Using fetch directly:

const response = await fetch('https://api.oblien.com/search/crawl', {
  method: 'POST',
  headers: {
    'X-Client-ID': process.env.OBLIEN_CLIENT_ID,
    'X-Client-Secret': process.env.OBLIEN_CLIENT_SECRET,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    instructions: 'Research https://example.com'
  })
});

if (!response.ok) {
  throw new Error(`Request failed: ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();

try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    
    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');
    
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const event = JSON.parse(line.slice(6));
        
        if (event.type === 'error') {
          console.error('Crawl error:', event.error);
        }
      }
    }
  }
} catch (error) {
  console.error('Stream error:', error);
}

Using cURL:

# Missing instructions
curl -X POST https://api.oblien.com/search/crawl \
  -H "X-Client-ID: $OBLIEN_CLIENT_ID" \
  -H "X-Client-Secret: $OBLIEN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{}'

# Response:
# {"success": false, "error": "Instructions are required"}

Best Practices

  1. Provide clear instructions - Include URL and specific research goals

  2. Set appropriate limits - Use maxPages and maxDepth to prevent runaway crawls

    options: {
      maxPages: 50,    // Limit total pages
      maxDepth: 3      // Limit link depth
    }
  3. Process events in real-time - Handle streaming events as they arrive for better UX

  4. Choose the right crawl type

    • Use 'focused' for specific information
    • Use 'deep' for comprehensive analysis
    • Use 'shallow' for quick overview
  5. Handle errors gracefully - Monitor error events and implement retry logic (see the sketch after this list)

  6. Monitor progress - Track page_crawled events to show progress

  7. Store results incrementally - Save content chunks as they arrive for large crawls
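
A sketch of the retry pattern from item 5, using simple exponential backoff (the retry count and delays are illustrative, not API requirements):

async function crawlWithRetry(instructions, onEvent, options, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      // client.crawl as in the examples above
      return await client.crawl(instructions, onEvent, options);
    } catch (error) {
      if (attempt === maxRetries) throw error;
      const delayMs = 1000 * 2 ** attempt; // 2s, 4s, 8s...
      console.warn(`Attempt ${attempt} failed (${error.message}); retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}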

Common Use Cases

Documentation Gathering

await client.crawl(
  'Crawl https://docs.example.com and create a comprehensive guide of all API endpoints with parameters and examples',
  (event) => saveToDatabase(event),
  { type: 'deep', maxDepth: 4 }
);

Market Research

await client.crawl(
  'Research https://industry-leader.com and analyze their product strategy, pricing model, and target market',
  (event) => collectInsights(event),
  { type: 'deep', maxPages: 30 }
);

Content Aggregation

await client.crawl(
  'Find all blog posts about machine learning on https://blog.example.com and extract titles, dates, and summaries',
  (event) => aggregateContent(event),
  { type: 'focused', maxPages: 100 }
);

Compliance Monitoring

await client.crawl(
  'Scan https://our-website.com and identify all instances of personal data collection, cookie usage, and third-party integrations',
  (event) => auditCompliance(event),
  { type: 'deep', maxDepth: 5 }
);

Rate Limits

  • 10 concurrent crawls per account
  • Maximum 1000 pages per crawl
  • Maximum depth of 10 levels
  • 60-minute timeout per crawl
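
When batching many crawls, you can gate concurrency client-side to stay under the 10-crawl cap. A minimal worker-pool sketch (the task shape is hypothetical; adapt it to your own queue):

async function runCrawls(tasks, limit = 10) {
  const results = [];
  let next = 0;

  // Spawn `limit` workers; each pulls the next task until none remain.
  // JavaScript's single-threaded event loop makes `next++` safe here.
  const workers = Array.from({ length: limit }, async () => {
    while (next < tasks.length) {
      const task = tasks[next++];
      results.push(await client.crawl(task.instructions, task.onEvent, task.options));
    }
  });

  await Promise.all(workers);
  return results;
}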

Next Steps