API Documentation

CrawlRocket is a web scraping and real-time data API. Scrape any URL, look up people, or tap into live feeds for news, sports, markets, and alerts — all through a single API.

Base URL: https://api.crawlrocket.com

# Quick Start

Three steps to your first request: get an API key from the dashboard, submit a job, then poll for results.

1. Submit a person lookup

```shell
curl -X POST https://api.crawlrocket.com/api/lookup \
  -H "Authorization: Bearer sk_pro_your_key" \
  -H "Content-Type: application/json" \
  -d '{"name": "Jane Smith", "sources": ["linkedin", "github"]}'
```
2. Get the job ID back

Response — 202 Accepted

```json
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "queued",
  "poll_url": "/api/jobs/a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```
3. Poll for results (wait 5-20 seconds)

```shell
curl https://api.crawlrocket.com/api/jobs/a1b2c3d4-... \
  -H "Authorization: Bearer sk_pro_your_key"
```

Response — 200 OK

```json
{
  "id": "a1b2c3d4-...",
  "type": "person",
  "status": "completed",
  "result": {
    "name": "Jane Smith",
    "headline": "Staff Engineer at Stripe",
    "photo": "https://...",
    "sources": {
      "linkedin": { "url": "linkedin.com/in/janesmith", ... },
      "github": { "url": "github.com/jsmith", ... }
    },
    "emails": ["jane@example.com"],
    "phones": []
  }
}
```

# Authentication

All requests require a Bearer token. Get your API key from the dashboard.

Header:

```
Authorization: Bearer sk_pro_your_api_key_here
```

Keys are prefixed by tier: `sk_free_`, `sk_pro_`, `sk_enterprise_`. Missing or invalid keys return 401.

# Person Lookup

POST /api/lookup

Search for a person across LinkedIn, GitHub, and X. Results from all sources are merged into a single profile with contact info, photos, and headlines.

Parameters

- `name` (string, required): The person's full name to search for.
- `sources` (string[], optional): Which platforms to search. Options: `linkedin`, `github`, `twitter`. Defaults to `["linkedin", "github"]`.
```shell
curl -X POST https://api.crawlrocket.com/api/lookup \
  -H "Authorization: Bearer sk_pro_..." \
  -H "Content-Type: application/json" \
  -d '{"name": "Amer Sarhan", "sources": ["linkedin", "github"]}'
```

# URL Scrape

POST /api/scrape

Scrape a single URL using a headless browser. Returns page title, meta, headings, body text, links, and extracted contact info.

Parameters

- `url` (string, required): The URL to scrape. Must be a valid HTTP/HTTPS URL.
```shell
curl -X POST https://api.crawlrocket.com/api/scrape \
  -H "Authorization: Bearer sk_pro_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```
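Because the endpoint rejects anything that is not a valid HTTP/HTTPS URL, a quick client-side check can save a wasted request. A sketch using only the standard library (the `is_scrapable_url` helper is ours and only approximates the server's validation):

```python
from urllib.parse import urlparse

def is_scrapable_url(url: str) -> bool:
    """Rough client-side check mirroring the API's HTTP/HTTPS requirement."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```
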

# Live Data Feeds

Real-time data feeds for news, sports, markets, and alerts. Data is fetched on-demand from 12 sources and cached for 1-3 minutes. Requires Pro or Enterprise plan.

Available feeds

| Feed | Sources | Cache |
| --- | --- | --- |
| `/api/feeds/news` | Al Jazeera, CNN, Sky News, Khaleej Times, Fox News | 2 min |
| `/api/feeds/sports` | Goal.com (22+ leagues) | 1 min |
| `/api/feeds/markets` | CNBC, CoinGecko, Exchange Rates | 3 min |
| `/api/feeds/alerts` | USGS Earthquakes, Red Alert Israel | 1 min |
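Since responses are cached per category, polling a feed faster than its TTL just returns the same cached payload. A small sketch that encodes the cache column above in seconds (the helper itself is ours, not part of the API):

```python
# Cache TTLs per feed category, in seconds, per the table above.
FEED_TTL_SECONDS = {
    "news": 120,
    "sports": 60,
    "markets": 180,
    "alerts": 60,
}

def min_poll_interval(category: str) -> int:
    """Shortest useful polling interval for a feed; anything faster hits the cache."""
    return FEED_TTL_SECONDS[category]
```
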
GET /api/feeds/:category

Fetch a feed by category. Returns items sorted by most recent first.

Query parameters

- `limit` (number): Max items to return, 1-100. Default: 20.
- `source` (string): Filter by source ID, e.g. `aljazeera`, `coingecko`.
```shell
curl "https://api.crawlrocket.com/api/feeds/news?limit=5" \
  -H "Authorization: Bearer sk_pro_..."
```
Response

```json
{
  "items": [
    {
      "id": "aje-breaking-4434655-0",
      "source": "aljazeera",
      "category": "breaking",
      "title": "Breaking headline from Al Jazeera",
      "summary": "Article excerpt...",
      "url": "https://www.aljazeera.com/news/...",
      "image": "https://www.aljazeera.com/wp-content/uploads/...",
      "author": "Reporter Name",
      "publishedAt": "2026-03-26T15:49:26Z",
      "tags": ["breaking", "middle-east"]
    }
  ],
  "sources": [
    { "id": "aljazeera", "name": "Al Jazeera", "count": 5, "cached": false }
  ],
  "fromCache": false,
  "fetchedAt": "2026-03-26T17:22:57.607Z"
}
```
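Items arrive sorted newest first, and the `sources` array summarizes how many items each source contributed. A sketch that recomputes those counts from the items themselves, using the field names in the response above (the helper is ours):

```python
from collections import Counter

def count_by_source(feed_response: dict) -> dict[str, int]:
    """Tally feed items per source id, e.g. to compare against the `sources` summary."""
    return dict(Counter(item["source"] for item in feed_response["items"]))
```
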
GET /api/feeds/source/:id

Fetch from a single source by ID. Source IDs: `aljazeera`, `cnn`, `sky-news`, `sky-news-arabia`, `khaleej-times`, `fox-news`, `goal-scores`, `cnbc`, `coingecko`, `exchange-rates`, `usgs`, `tzeva-adom`.

```shell
curl "https://api.crawlrocket.com/api/feeds/source/coingecko?limit=5" \
  -H "Authorization: Bearer sk_pro_..."
```
GET /api/feeds

List all available feeds with sources and cache TTLs. Public — no API key needed. Use this to discover available feeds.

# Job Polling

GET /api/jobs/:id

Lookup and scrape requests return a job ID. Poll this endpoint to get the results. Jobs typically complete in 5-20 seconds.

Job statuses

- `queued`: Job is waiting to be processed
- `running`: Job is being processed
- `completed`: Results available in the `result` field
- `failed`: Error occurred — check the `error` field

You can also list all your jobs with `GET /api/jobs`.
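The poll loop is identical for every job type, so it is worth factoring out once. A sketch with the HTTP call injected as a function, which keeps it testable; the interval and timeout defaults are our choices, not API guarantees:

```python
import time

def wait_for_job(fetch_job, job_id: str, interval: float = 3.0, timeout: float = 60.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires.

    `fetch_job(job_id)` should return the parsed JSON from GET /api/jobs/:id.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```
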

# Usage Stats

GET /api/usage

Returns your current plan, rate limits, and request counts.

Response

```json
{
  "tier": "pro",
  "limits": { "rate_per_minute": 60, "monthly": 2000 },
  "usage": {
    "monthly": 142,
    "today": 23,
    "byEndpoint": [
      { "endpoint": "/api/lookup", "count": 89 },
      { "endpoint": "/api/search", "count": 41 },
      { "endpoint": "/api/scrape", "count": 12 }
    ]
  }
}
```
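One practical use of this payload is checking remaining monthly quota before submitting a batch of jobs. A sketch against the exact field names shown above (the helper is ours):

```python
def monthly_remaining(usage_response: dict) -> int:
    """Requests left this month, per the /api/usage response shape shown above."""
    return usage_response["limits"]["monthly"] - usage_response["usage"]["monthly"]
```
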

# Errors

Errors return a JSON body with an error field.

| Code | Meaning |
| --- | --- |
| 400 | Bad request — missing or invalid parameters |
| 401 | Unauthorized — missing or invalid API key |
| 404 | Not found — job ID doesn't exist or isn't yours |
| 429 | Rate limit exceeded — slow down or upgrade |
| 500 | Server error — try again or contact support |
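In a client library it helps to map these codes to distinct exception types so callers can retry a 429 but not a 400. A minimal sketch (the exception names and the `raise_for_status` helper are ours):

```python
class CrawlRocketError(Exception):
    """Base error carrying the HTTP status and the API's `error` message."""
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status

class RateLimitError(CrawlRocketError):
    """Raised on 429 so callers can back off and retry."""

def raise_for_status(status: int, body: dict) -> None:
    """Map the status codes in the table above to exceptions; no-op on success."""
    if status == 429:
        raise RateLimitError(status, body.get("error", "rate limit exceeded"))
    if status >= 400:
        raise CrawlRocketError(status, body.get("error", "request failed"))
```
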

# Rate Limits

| Tier | Per Minute | Per Month | Price |
| --- | --- | --- | --- |
| Free | 5 | 5 | $0 |
| Pro | 60 | 2,000 | $29/mo |
| Enterprise | 200 | 50,000 | $199/mo |

When you exceed a limit, you'll get a 429 with a retry_after field in seconds.
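A client can honor `retry_after` automatically by wrapping the request in a small retry loop. A sketch with the request and the sleep both injected as functions so the backoff is testable (the retry count is our choice, not part of the API):

```python
import time

def with_rate_limit_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call `send()` and, on a 429, wait `retry_after` seconds before retrying.

    `send()` should return (status_code, parsed_json_body).
    """
    for _ in range(max_retries):
        status, body = send()
        if status != 429:
            return status, body
        sleep(body.get("retry_after", 1))
    return status, body  # still rate limited after max_retries
```
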

# Caching

Results are cached for 1 hour. If you look up the same person or scrape the same URL within that window, you get the cached result instantly — no additional request counted against your quota.

Cached results include `"_cached": true` in the response so you can tell them apart from fresh ones.
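A client can inspect that flag to log cache hits or decide whether a result is fresh enough. A tiny sketch (the helper is ours):

```python
def is_cached(response_body: dict) -> bool:
    """True if the result came from the 1-hour cache, per the `_cached` flag."""
    return response_body.get("_cached", False) is True
```
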

# SDKs & Libraries

CrawlRocket is a REST API — use it from any language. Here are quick examples:

JavaScript / Node.js

```javascript
const res = await fetch("https://api.crawlrocket.com/api/lookup", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk_pro_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Jane Smith",
    sources: ["linkedin", "github"],
  }),
});

const { job_id } = await res.json();

// Poll for the result (repeat until status is "completed" or "failed")
const result = await fetch(
  `https://api.crawlrocket.com/api/jobs/${job_id}`,
  { headers: { "Authorization": "Bearer sk_pro_..." } }
).then(r => r.json());
```
Python

```python
import requests, time

headers = {
    "Authorization": "Bearer sk_pro_...",
    "Content-Type": "application/json",
}

# Submit the lookup job
r = requests.post("https://api.crawlrocket.com/api/lookup",
    json={"name": "Jane Smith", "sources": ["linkedin", "github"]},
    headers=headers)

job_id = r.json()["job_id"]

# Poll until the job reaches a terminal status
while True:
    r = requests.get(f"https://api.crawlrocket.com/api/jobs/{job_id}",
        headers=headers)
    data = r.json()
    if data["status"] in ("completed", "failed"):
        break
    time.sleep(3)

print(data["result"])
```