Tutorial · 8 min

Building an Autonomous Web Scraping Agent with Firecrawl

Step-by-step: build an agent that crawls websites, extracts structured data, and pays per page — all with x402 and Mithril.

Web scraping is one of the most common tasks for AI agents. This tutorial shows how to build a scraping agent that pays per-page via x402 — no Firecrawl API key required.

What We're Building

An autonomous agent that:

  • Takes a list of URLs or a search query
  • Scrapes each page using Firecrawl (paid per-page via x402)
  • Extracts structured data using an LLM
  • Returns clean, structured results

Setup

    import Mithril from "@mithril/sdk"
    
    const m = new Mithril({
      apiKey: process.env.MITHRIL_API_KEY,
    })

Scraping a Single Page

    async function scrapePage(url: string) {
      const result = await m.pay({
        url: "https://api.firecrawl.dev/v1/scrape",
        method: "POST",
        body: JSON.stringify({
          url,
          formats: ["markdown", "html"],
          onlyMainContent: true,
        }),
      })
      return result.data
    }

Cost: ~$0.01 per page. The x402 payment is handled automatically by the Mithril SDK.
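For reference, the shape of `result.data` that the rest of this tutorial relies on (e.g. `page.markdown` below) can be typed roughly as follows. The field names follow Firecrawl's v1 scrape response, but treat them as an assumption and check the current API docs:

```typescript
// Hedged sketch of the scrape result shape — field names are assumptions
// based on Firecrawl's v1 scrape response, not guarantees.
interface ScrapedPage {
  markdown?: string
  html?: string
  metadata?: {
    title?: string
    sourceURL?: string
  }
}

// Usage: const page = (await scrapePage(url)) as ScrapedPage
```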

Scraping Multiple Pages

    async function scrapeMany(urls: string[]) {
      const results = await Promise.all(
        urls.map(url => scrapePage(url))
      )
      return results
    }

For 100 pages, total cost: ~$1.00. No subscription needed.
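One caution: `Promise.all` launches every request at once, which at 100 URLs can trip the target site's or Firecrawl's rate limits. A bounded-concurrency map is a safer default. This generic helper is a sketch (the limit of 5 in the usage line is arbitrary), with `scrapePage` from above plugged in as the worker:

```typescript
// Run `fn` over `items` with at most `limit` calls in flight at once.
// Generic sketch — not part of the Mithril or Firecrawl SDKs.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  // Each worker repeatedly claims the next unprocessed index.
  // Safe without locks: JS is single-threaded between awaits.
  const worker = async () => {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i])
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  )
  return results
}

// Usage: scrape 100 URLs with at most 5 requests in flight
// const results = await mapWithConcurrency(urls, 5, scrapePage)
```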

Extracting Structured Data

Combine Firecrawl scraping with LLM extraction:

    async function extractProducts(url: string) {
      const page = await scrapePage(url)
    
      const extraction = await m.pay({
        url: "https://openrouter.ai/api/v1/chat/completions",
        method: "POST",
        body: JSON.stringify({
          model: "anthropic/claude-sonnet-4-6",
          messages: [{
            role: "user",
            content: `Extract all products from this page as JSON:\n${page.markdown}`,
          }],
        }),
      })
    
      return JSON.parse(extraction.data.choices[0].message.content)
    }
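The bare `JSON.parse` above will throw whenever the model wraps its answer in a markdown code fence or adds commentary, which models frequently do. A small tolerant parser (an illustrative helper, not part of any SDK) makes the extraction step more robust:

```typescript
// Tolerant JSON extraction from an LLM reply: grab the outermost
// {...} or [...] span before parsing, ignoring surrounding fences
// or prose. Illustrative helper — not an SDK API.
function parseModelJson(text: string): unknown {
  const starts = ["{", "["]
    .map(c => text.indexOf(c))
    .filter(i => i !== -1)
  const start = starts.length > 0 ? Math.min(...starts) : -1
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"))
  if (start === -1 || end <= start) {
    throw new Error("no JSON found in model output")
  }
  return JSON.parse(text.slice(start, end + 1))
}
```

In `extractProducts`, the final line would become `return parseModelJson(extraction.data.choices[0].message.content)`.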

Crawling an Entire Site

Firecrawl also supports site-wide crawling:

    async function crawlSite(url: string, maxPages: number = 50) {
      const result = await m.pay({
        url: "https://api.firecrawl.dev/v1/crawl",
        method: "POST",
        body: JSON.stringify({
          url,
          limit: maxPages,
          scrapeOptions: {
            formats: ["markdown"],
            onlyMainContent: true,
          },
        }),
      })
      return result.data
    }
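Note that crawling a whole site takes time: Firecrawl's crawl endpoint typically responds with a job id that you poll for completion, rather than returning pages inline (verify against the current docs). The polling loop below is generic — the status check is injected as a callback so the x402 payment details stay with `m.pay`, and the `status`/`data` fields of the response are assumptions to confirm:

```typescript
// Assumed shape of a crawl-status response — check Firecrawl's docs.
interface CrawlStatus {
  status: "scraping" | "completed" | "failed"
  data?: unknown[]
}

// Poll `checkStatus` until the crawl job completes, fails, or times out.
async function waitForCrawl(
  checkStatus: () => Promise<CrawlStatus>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<unknown[]> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const s = await checkStatus()
    if (s.status === "completed") return s.data ?? []
    if (s.status === "failed") throw new Error("crawl failed")
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error("crawl did not finish in time")
}
```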

Use Cases

  • Price monitoring: Scrape competitor product pages daily
  • Lead generation: Extract company information from websites
  • Content aggregation: Collect and summarize industry news
  • Market research: Analyze competitor messaging and positioning
  • Data enrichment: Supplement CRM data with web-sourced information

Cost Control

Set daily limits on the scraping agent's wallet:

  • Testing: $2/day (200 pages)
  • Production: $10/day (1,000 pages)
  • Heavy use: $50/day (5,000 pages)

Monitor spending via the Mithril dashboard to ensure costs align with expectations.
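Wallet limits are enforced on Mithril's side; as an extra local safety net, the agent can also track its own spend in-process and stop before hitting the cap. A minimal sketch (the class is illustrative and independent of the SDK; the $0.01-per-page figure comes from the estimate above):

```typescript
// Local budget guard — illustrative, not part of the Mithril SDK.
// Throws before a charge would push the day's spend past the cap.
class DailyBudget {
  private spentUsd = 0

  constructor(private readonly capUsd: number) {}

  charge(amountUsd: number): void {
    if (this.spentUsd + amountUsd > this.capUsd) {
      throw new Error(`daily budget of $${this.capUsd} exhausted`)
    }
    this.spentUsd += amountUsd
  }

  get remainingUsd(): number {
    return this.capUsd - this.spentUsd
  }
}

// Usage: call budget.charge(0.01) before each scrapePage() call
// const budget = new DailyBudget(2) // testing tier: $2/day ≈ 200 pages
```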