Version: Old ArrowSquid docs

Substrate Processor

This section applies to squid processors indexing Substrate-based chains, including:

  • Polkadot
  • Kusama
  • Acala

See Supported networks for a full list.

If you are building on one of the networks implementing EVM on Substrate, such as

  • Astar
  • Moonbeam
  • Moonriver

and only require EVM data, consider using the EVM processor.

Overview and the data model

A squid processor is a Node.js process that fetches historical on-chain data from an Archive and/or a chain node RPC endpoint, performs arbitrary transformations and saves the result. SubstrateBatchProcessor is the central class that handles Substrate data extraction, transformation and persistence. By convention, the processor entry point is src/main.ts; it is started by calling SubstrateBatchProcessor.run() there. A single batch handler function supplied to that method is responsible for transforming data from multiple blocks in a single in-memory batch.

A batch provides iterables to access all items requested in processor configuration, which may include

  • Events, corresponding to matching Substrate runtime events.
  • Calls, corresponding to matching calls executed by the Substrate runtime.

See the batch context and block data pages for details.
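To make the batch model concrete, here is a minimal sketch of a handler-style function that walks every block in a batch and counts matching events. The interfaces below (BatchContext, BlockData, EventItem, CallItem) are simplified, hypothetical stand-ins defined locally for illustration; the real batch context and block data carry more fields, so consult the batch context and block data pages for the actual shapes.

```typescript
// Hypothetical, simplified shapes defined locally for illustration only.
interface EventItem {
  name: string; // e.g. "Balances.Transfer"
  args: unknown;
}

interface CallItem {
  name: string; // e.g. "Balances.transfer"
  success: boolean;
}

interface BlockData {
  header: { height: number };
  events: EventItem[];
  calls: CallItem[];
}

interface BatchContext {
  blocks: BlockData[];
}

// Counts events with the given name per block height across the whole batch.
// Blocks without a matching event are omitted from the result.
function countEvents(ctx: BatchContext, eventName: string): Map<number, number> {
  const counts = new Map<number, number>();
  for (const block of ctx.blocks) {
    for (const event of block.events) {
      if (event.name !== eventName) continue;
      const height = block.header.height;
      counts.set(height, (counts.get(height) ?? 0) + 1);
    }
  }
  return counts;
}

// A made-up batch of three blocks, standing in for data fetched by the processor.
const sampleBatch: BatchContext = {
  blocks: [
    {
      header: { height: 100 },
      events: [
        { name: "Balances.Transfer", args: {} },
        { name: "System.ExtrinsicSuccess", args: {} },
        { name: "Balances.Transfer", args: {} },
      ],
      calls: [{ name: "Balances.transfer", success: true }],
    },
    {
      header: { height: 101 },
      events: [{ name: "Balances.Transfer", args: {} }],
      calls: [],
    },
    { header: { height: 102 }, events: [], calls: [] },
  ],
};

const transfersPerBlock = countEvents(sampleBatch, "Balances.Transfer");
```

Because all blocks of a batch are available in memory at once, a handler can aggregate across them like this and write the result in a single database operation rather than once per block.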

Additional support is available for log items produced by the Frontier EVM pallet (see EVM support), the Contracts pallet (see ink! support) and the Gear Messages pallet. Further, the processor can extract additional data by querying the historical runtime state and indeed any external API.

Results of the ETL process can be stored in any Postgres-compatible database or in filesystem-based datasets in CSV and Parquet formats.

RPC ingestion

Starting with the ArrowSquid release, the processor can ingest data either from an Archive or directly from an RPC endpoint. If both an Archive and an RPC endpoint are provided, the processor will use the Archive until it reaches the highest block available there, then index the remaining blocks using the RPC endpoint. This allows squids to combine low sync times with near real-time chain data access. It is, however, possible to use just the RPC endpoint.
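The handoff described above can be sketched as a small planning function. This is only an illustration of the rule "Archive first, RPC for the rest", not the processor's actual implementation; planIngestion, RangePlan and the block heights used are hypothetical names and values introduced for this example.

```typescript
type Source = "archive" | "rpc";

interface RangePlan {
  source: Source;
  from: number; // first block of the range, inclusive
  to: number;   // last block of the range, inclusive
}

// Decides which source serves which blocks.
// `archiveHead` is the highest block available in the Archive;
// leave it undefined to model an RPC-only setup.
function planIngestion(from: number, chainHead: number, archiveHead?: number): RangePlan[] {
  const plan: RangePlan[] = [];
  if (from > chainHead) return plan; // nothing to index yet

  let next = from;
  if (archiveHead !== undefined && archiveHead >= next) {
    // Use the Archive up to its highest available block.
    const to = Math.min(archiveHead, chainHead);
    plan.push({ source: "archive", from: next, to });
    next = to + 1;
  }
  if (next <= chainHead) {
    // Index the remaining blocks from the RPC endpoint.
    plan.push({ source: "rpc", from: next, to: chainHead });
  }
  return plan;
}

const combined = planIngestion(0, 1000, 950); // Archive and RPC both configured
const rpcOnly = planIngestion(0, 1000);       // only an RPC endpoint configured
```

In the combined case the bulk of the history comes from the Archive and only the last few blocks are fetched over RPC, which is where the short sync times come from.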

RPC ingestion can create a heavy load on node endpoints. With Archives, the load is typically short-lived and the total number of requests is low, but the request frequency may still be high enough to trigger HTTP 429 responses. Use private endpoints and rate-limit your requests with the rateLimit chain source option.
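To illustrate why a rate limit helps against HTTP 429 responses, the sketch below spaces out a burst of requests so that no more than a given number is dispatched per second. It assumes the limit is expressed in requests per second and that arrival times are sorted; scheduleRequests is a hypothetical helper written for this example and says nothing about how the rateLimit option is implemented internally.

```typescript
// Given sorted request arrival times (in ms) and a limit in requests per second,
// returns the time at which each request may be dispatched.
function scheduleRequests(arrivalsMs: number[], rateLimit: number): number[] {
  if (rateLimit <= 0) throw new Error("rateLimit must be positive");
  const minGapMs = 1000 / rateLimit; // minimum spacing between two dispatches
  const dispatchTimes: number[] = [];
  let earliest = -Infinity; // earliest moment the next request may go out

  for (const arrival of arrivalsMs) {
    // A request goes out when it arrives, unless it must wait for the gap.
    const at = Math.max(arrival, earliest);
    dispatchTimes.push(at);
    earliest = at + minGapMs;
  }
  return dispatchTimes;
}

// Three requests arrive at once, a fourth arrives half a second later.
const dispatchTimes = scheduleRequests([0, 0, 0, 500], 10);
```

With a limit of 10 requests per second, the burst of three simultaneous requests is spread over 200 ms, while the later request is not delayed at all.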

What's next?