API Documentation

CrawlRocket is a web scraping and real-time data API. Scrape any URL, look up people, or tap into live feeds for news, sports, markets, and alerts — all through a single API.

Base URL: https://api.crawlrocket.com

# Quick Start

Three steps to your first request: get an API key from the dashboard, submit a job, then poll for results.

1. Submit a person lookup

```shell
curl -X POST https://api.crawlrocket.com/api/lookup \
  -H "Authorization: Bearer sk_pro_your_key" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Smith", "sources": ["linkedin", "github"]}'
```
2. Get the job ID back

Response — 202 Accepted

```json
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "queued",
  "poll_url": "/api/jobs/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```
3. Poll for results (wait 5-20 seconds)

```shell
curl https://api.crawlrocket.com/api/jobs/a1b2c3d4-... \
  -H "Authorization: Bearer sk_pro_your_key"
```

Response — 200 OK

```json
{
  "id": "a1b2c3d4-...",
  "type": "person",
  "status": "completed",
  "result": {
    "name": "Jane Smith",
    "headline": "Staff Engineer at Stripe",
    "photo": "https://...",
    "sources": {
      "linkedin": { "url": "linkedin.com/in/janesmith", ... },
      "github": { "url": "github.com/jsmith", ... }
    },
    "emails": ["jane@example.com"],
    "phones": []
  }
}
```

# Authentication

All requests require a Bearer token. Get your API key from the dashboard.

Header:

```
Authorization: Bearer sk_pro_your_api_key_here
```

Keys are prefixed by tier: `sk_free_`, `sk_pro_`, `sk_enterprise_`. Missing or invalid keys return 401.

# Person Lookup

POST /api/lookup

Search for a person across LinkedIn, GitHub, and X. Results from all sources are merged into a single profile with contact info, photos, and headlines.

Parameters

- `name` (string, required): The person's full name to search for.
- `sources` (string[], optional): Which platforms to search. Options: `linkedin`, `github`, `twitter`. Defaults to `["linkedin", "github"]`.
```shell
curl -X POST https://api.crawlrocket.com/api/lookup \
  -H "Authorization: Bearer sk_pro_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "Amer Sarhan", "sources": ["linkedin", "github"]}'
```

# URL Scrape

POST /api/scrape

Scrape a single URL using a headless browser. Returns page title, meta, headings, body text, links, and extracted contact info.

Parameters

- `url` (string, required): The URL to scrape. Must be a valid HTTP/HTTPS URL.
```shell
curl -X POST https://api.crawlrocket.com/api/scrape \
  -H "Authorization: Bearer sk_pro_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
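Because the endpoint rejects anything that is not a valid HTTP/HTTPS URL, a quick client-side check can save a wasted request. A sketch using only the standard library (the `is_scrapable_url` helper is ours and only approximates the server's validation):

```python
from urllib.parse import urlparse

def is_scrapable_url(url: str) -> bool:
    """Rough client-side check mirroring the API's HTTP/HTTPS requirement."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```
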

# Live Data Feeds

Real-time data feeds for news, sports, markets, and alerts. Data is fetched on-demand from 12 sources and cached for 1-3 minutes. Requires Pro or Enterprise plan.

Available feeds

| Feed | Sources | Cache |
| --- | --- | --- |
| `/api/feeds/news` | Al Jazeera, CNN, Sky News, Khaleej Times, Fox News | 2 min |
| `/api/feeds/sports` | Goal.com (22+ leagues) | 1 min |
| `/api/feeds/markets` | CNBC, CoinGecko, Exchange Rates | 3 min |
| `/api/feeds/alerts` | USGS Earthquakes, Red Alert Israel | 1 min |
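Since responses are cached per category, polling a feed faster than its TTL just returns the same cached payload. A small sketch that encodes the cache column above in seconds (the helper itself is ours, not part of the API):

```python
# Cache TTLs per feed category, in seconds, per the table above.
FEED_TTL_SECONDS = {
    "news": 120,
    "sports": 60,
    "markets": 180,
    "alerts": 60,
}

def min_poll_interval(category: str) -> int:
    """Shortest useful polling interval for a feed; anything faster hits the cache."""
    return FEED_TTL_SECONDS[category]
```
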
GET /api/feeds/:category

Fetch a feed by category. Returns items sorted by most recent first.

Query parameters

- `limit` (number): Max items to return, 1-100. Default: 20.
- `source` (string): Filter by source ID, e.g. `aljazeera`, `coingecko`.
```shell
curl "https://api.crawlrocket.com/api/feeds/news?limit=5" \
  -H "Authorization: Bearer sk_pro_..."
```
Response

```json
{
  "items": [
    {
      "id": "aje-breaking-4434655-0",
      "source": "aljazeera",
      "category": "breaking",
      "title": "Breaking headline from Al Jazeera",
      "summary": "Article excerpt...",
      "url": "https://www.aljazeera.com/news/...",
      "image": "https://www.aljazeera.com/wp-content/uploads/...",
      "author": "Reporter Name",
      "publishedAt": "2026-03-26T15:49:26Z",
      "tags": ["breaking", "middle-east"]
    }
  ],
  "sources": [
    { "id": "aljazeera", "name": "Al Jazeera", "count": 5, "cached": false }
  ],
  "fromCache": false,
  "fetchedAt": "2026-03-26T17:22:57.607Z"
}
```
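Items arrive sorted newest first, and the `sources` array summarizes how many items each source contributed. A sketch that recomputes those counts from the items themselves, using the field names in the response above (the helper is ours):

```python
from collections import Counter

def count_by_source(feed_response: dict) -> dict[str, int]:
    """Tally feed items per source id, e.g. to compare against the `sources` summary."""
    return dict(Counter(item["source"] for item in feed_response["items"]))
```
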
GET /api/feeds/source/:id

Fetch from a single source by ID. Source IDs: `aljazeera`, `cnn`, `sky-news`, `sky-news-arabia`, `khaleej-times`, `fox-news`, `goal-scores`, `cnbc`, `coingecko`, `exchange-rates`, `usgs`, `tzeva-adom`.

```shell
curl "https://api.crawlrocket.com/api/feeds/source/coingecko?limit=5" \
  -H "Authorization: Bearer sk_pro_..."
```
GET /api/feeds

List all available feeds with sources and cache TTLs. Public — no API key needed. Use this to discover available feeds.

# Job Polling

GET /api/jobs/:id

Lookup and scrape requests return a job ID. Poll this endpoint to get the results. Jobs typically complete in 5-20 seconds.

Job statuses

- `queued`: Job is waiting to be processed
- `running`: Job is being processed
- `completed`: Results available in the `result` field
- `failed`: Error occurred — check the `error` field

You can also list all your jobs with `GET /api/jobs`.
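The poll loop is identical for every job type, so it is worth factoring out once. A sketch with the HTTP call injected as a function, which keeps it testable; the interval and timeout defaults are our choices, not API guarantees:

```python
import time

def wait_for_job(fetch_job, job_id: str, interval: float = 3.0, timeout: float = 60.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires.

    `fetch_job(job_id)` should return the parsed JSON from GET /api/jobs/:id.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```
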

# Usage Stats

GET /api/usage

Returns your current plan, rate limits, and request counts.

Response

```json
{
  "tier": "pro",
  "limits": { "rate_per_minute": 60, "monthly": 2000 },
  "usage": {
    "monthly": 142,
    "today": 23,
    "byEndpoint": [
      { "endpoint": "/api/lookup", "count": 89 },
      { "endpoint": "/api/search", "count": 41 },
      { "endpoint": "/api/scrape", "count": 12 }
    ]
  }
}
```
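One practical use of this payload is checking remaining monthly quota before submitting a batch of jobs. A sketch against the exact field names shown above (the helper is ours):

```python
def monthly_remaining(usage_response: dict) -> int:
    """Requests left this month, per the /api/usage response shape shown above."""
    return usage_response["limits"]["monthly"] - usage_response["usage"]["monthly"]
```
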

# Errors

Errors return a JSON body with an error field.

| Code | Meaning |
| --- | --- |
| 400 | Bad request — missing or invalid parameters |
| 401 | Unauthorized — missing or invalid API key |
| 404 | Not found — job ID doesn't exist or isn't yours |
| 429 | Rate limit exceeded — slow down or upgrade |
| 500 | Server error — try again or contact support |
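In a client library it helps to map these codes to distinct exception types so callers can retry a 429 but not a 400. A minimal sketch (the exception names and the `raise_for_status` helper are ours):

```python
class CrawlRocketError(Exception):
    """Base error carrying the HTTP status and the API's `error` message."""
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status

class RateLimitError(CrawlRocketError):
    """Raised on 429 so callers can back off and retry."""

def raise_for_status(status: int, body: dict) -> None:
    """Map the status codes in the table above to exceptions; no-op on success."""
    if status == 429:
        raise RateLimitError(status, body.get("error", "rate limit exceeded"))
    if status >= 400:
        raise CrawlRocketError(status, body.get("error", "request failed"))
```
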

# Rate Limits

| Tier | Per Minute | Per Month | Price |
| --- | --- | --- | --- |
| Free | 5 | 5 | $0 |
| Pro | 60 | 2,000 | $29/mo |
| Enterprise | 200 | 50,000 | $199/mo |

When you exceed a limit, you'll get a 429 with a retry_after field in seconds.
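A client can honor `retry_after` automatically by wrapping the request in a small retry loop. A sketch with the request and the sleep both injected as functions so the backoff is testable (the retry count is our choice, not part of the API):

```python
import time

def with_rate_limit_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call `send()` and, on a 429, wait `retry_after` seconds before retrying.

    `send()` should return (status_code, parsed_json_body).
    """
    for _ in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        sleep(body.get("retry_after", 1))
    return status, body  # still rate limited after max_retries
```
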

# Caching

Results are cached for 1 hour. If you look up the same person or scrape the same URL within that window, you get the cached result instantly — no additional request counted against your quota.

Cached results include `"_cached": true` in the response so you can tell them apart from fresh ones.
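A client can inspect that flag to log cache hits or decide whether a result is fresh enough. A tiny sketch (the helper is ours):

```python
def is_cached(response_body: dict) -> bool:
    """True if the result came from the 1-hour cache, per the `_cached` flag."""
    return response_body.get("_cached", False) is True
```
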

# SDKs & Libraries

CrawlRocket is a REST API — use it from any language. Here are quick examples:

JavaScript / Node.js

```javascript
const res = await fetch("https://api.crawlrocket.com/api/lookup", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk_pro_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Jane Smith",
    sources: ["linkedin", "github"],
  }),
});

const { job_id } = await res.json();

// Poll for the result (repeat until status is "completed" or "failed")
const result = await fetch(
  `https://api.crawlrocket.com/api/jobs/${job_id}`,
  { headers: { "Authorization": "Bearer sk_pro_..." } }
).then(r => r.json());
```
Python

```python
import requests, time

headers = {
    "Authorization": "Bearer sk_pro_...",
    "Content-Type": "application/json",
}

# Submit the lookup job
r = requests.post("https://api.crawlrocket.com/api/lookup",
    json={"name": "Jane Smith", "sources": ["linkedin", "github"]},
    headers=headers)

job_id = r.json()["job_id"]

# Poll until the job reaches a terminal status
while True:
    r = requests.get(f"https://api.crawlrocket.com/api/jobs/{job_id}",
        headers=headers)
    data = r.json()
    if data["status"] in ("completed", "failed"):
        break
    time.sleep(3)

print(data["result"])
```