Version: Current

Development flow

This page is a definitive end-to-end guide into practical squid development. It uses templates and sqd scripts to simplify the process. Check out Squid from scratch for a more educational barebones approach.

Prepare the environment

Node v16.x or newer
Git
Squid CLI
Docker (if your squid will store its data to PostgreSQL)

Understand your technical requirements

Consider your business requirements and find out

How the data should be delivered. Options:
- PostgreSQL with an optional GraphQL API - can be real-time
- file-based dataset - local or on S3
- Google BigQuery
What data should be delivered
What are the technologies powering the blockchain(s) in question. Supported options:
- Ethereum Virtual Machine (EVM) chains like Ethereum - supported networks
- Substrate-powered chains like Polkadot and Kusama - supported networks
Note that you can use Subsquid via RPC ingestion even if your network is not listed.
What exact data should be retrieved from blockchain(s)
Whether you need to mix in any off-chain data

Example requirements

DEX analytics on Polygon

Suppose you want to train a prototype ML model on all trades done on Uniswap Polygon since the v3 upgrade.

A delay of a few hours typically won't matter for training, so you may want to deliver the data as files for easier handling.
The output could be a simple list of swaps, listing pair, direction and token amounts for each.
Polygon is an EVM chain.
All the required data is contained within Swap events emitted by the pair pool contracts. Uniswap deploys these dynamically, so you will also have to capture PoolCreated events from the factory contract to know which Swap events are coming from Uniswap and map them to pairs.
No off-chain data will be necessary for this task.

NFT ownership on Ethereum

Suppose you want to make a website that shows the image and ownership history for ERC721 NFTs from a certain Polygon contract.

For this application it makes sense to deliver a GraphQL API.
Output data might have Token, Owner and Transfer entities, with e.g. Token supplying all the fields necessary to show ownership history and the image.
Ethereum is an EVM chain.
Data on token mints and ownership history can be derived from Transfer(address,address,uint256) EVM event logs emitted by the contract. To render images, you will also need token metadata URLs that are only available by querying the contract state with the tokenURI(uint256) function.
You'll need to retrieve the off-chain token metadata (usually from IPFS).

Kusama transfers BigQuery dataset

Suppose you want to create a BigQuery dataset with Kusama native tokens transfers.

The delivery format is BigQuery.
A single table with from, to and amount columns may suffice.
Kusama is a Substrate chain.
The required data is available from Transfer events emitted by the Balances pallet. Take a look at our Substrate data sourcing miniguide for more info on how to figure out which pallets, events and calls are necessary for your task.
No off-chain data will be necessary for this task.

Start from a template

Although it is possible to compose a squid from individual packages, in practice it is usually easier to start from a template.

EVM
Substrate

Templates for the PostgreSQL+GraphQL data destination

A minimal template intended for developing EVM squids. Indexes ETH burns.
```
sqd init my-squid-name -t evm
```

A starter squid for indexing ERC20 transfers.

sqd init my-squid-name -t https://github.com/subsquid-labs/squid-erc20-template

Classic example Subgraph after a migration to Subsquid.
```
sqd init my-squid-name -t gravatar
```
A template showing how to combine data from multiple chains. Indexes USDC transfers on Ethereum and Binance.
```
sqd init my-squid-name -t multichain
```

Templates for storing data in files

USDC transfers -> local CSV

sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-csv-example

USDC transfers -> local Parquet

sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-parquet-example

USDC transfers -> CSV on S3

sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-s3-example

Templates for the Google BigQuery data destination

USDC transfers -> BigQuery dataset

sqd init my-squid-name -t https://github.com/subsquid-labs/squid-bigquery-example

Templates for the PostgreSQL+GraphQL data destination

Native events emitted by Substrate-based chains
```
sqd init my-squid-name -t substrate
```
ink! smart contracts
```
sqd init my-squid-name -t ink
```
Frontier EVM contracts on Astar and Moonbeam
```
sqd init my-squid-name -t frontier-evm
```

After retrieving the template of choice install its dependencies:

cd my-squid-name
npm ci

Test the template locally. The procedure varies depending on the data sink:

PostgreSQL+GraphQL
filesystem dataset
BigQuery

Launch a PostgreSQL container with sqd up

Start the squid processor with sqd process. You should see output that contains lines like these ones:

11:24 INFO  sqd:processor processing blocks from 6000000
11:24 INFO  sqd:processor using archive data source
11:24 INFO  sqd:processor prometheus metrics are served at port 45829
11:27 INFO  sqd:processor 6051219 / 18079056, rate: 16781 blocks/sec, mapping: 770 blocks/sec, 544 items/sec, eta: 12m

Start the GraphQL server by running sqd serve in a separate terminal, then visit the GraphiQL console to verify that the GraphQL API is up.

When done, shut down and erase your database with sqd down.

(for the S3 template only) Set the credentials and prepare a bucket for your data as described in the template README.

Start the squid processor with sqd process. You should see output that contains lines like these ones:

11:24 INFO  sqd:processor processing blocks from 6000000
11:24 INFO  sqd:processor using archive data source
11:24 INFO  sqd:processor prometheus metrics are served at port 45829
11:27 INFO  sqd:processor 6051219 / 18079056, rate: 16781 blocks/sec, mapping: 770 blocks/sec, 544 items/sec, eta: 12m

You should see a ./data folder populated with indexer data appear in a bit. A local folder looks like this:

$ tree ./data/
./data/
├── 0000000000-0007242369
│   └── transfers.tsv
├── 0007242370-0007638609
│   └── transfers.tsv
...
└── status.txt

info

To make local runs more convenient squid templates define additional sqd commands at commands.json. All of sqd commands used here are such extras. Take a look at the contents of this file to learn more about how your template works under the hood.

The bottom-up development cycle

The advantage of this approach is that the code remains buildable at all times, making it easier to catch issues early.

I. Regenerate the task-specific utilities

EVM
Substrate

Retrieve JSON ABIs for all contracts of interest (e.g. from Etherscan), taking care to get implementation ABIs for proxies where appropriate. Assuming that you saved the ABI files to ./abi, you can then regenerate the utilities with

sqd typegen

Or if you would like the tool to retrieve the ABI from Etherscan in your stead, you can run e.g.

npx squid-evm-typegen \
  src/abi \
  0xdAC17F958D2ee523a2206206994597C13D831ec7#usdt

The utility classes will become available at src/abi.

II. Configure the data requests

Data requests are customarily defined at src/processor.ts. The details depend on the network type:

EVM
Substrate

Edit the definition of const processor to

Use a data source appropriate for your chain and task.
- It is possible to use RPC as the only data source, but adding a Subsquid Network data source will make your squid sync much faster.
- RPC is a hard requirement if you're building a real-time API.
- If you're using RPC as one of your data sources, make sure to set the number of finality confirmations so that hot blocks ingestion works properly.
Request all event logs, transactions, execution traces and state diffs that your task requires, with any necessary related data (e.g. parent transactions for event logs).
Select all data fields necessary for your task (e.g. gasUsed for transactions).

See reference documentation for more info and processor configuration showcase for a representative set of examples.

Edit the definition of const processor to

Use a data source appropriate for your chain and task
- Use a Subsquid Network dataset whenever it is available. RPC is still required in this case.
- For networks without a dataset use just the RPC.
Request all events and calls that your task requires, with any necessary related data (e.g. parent extrinsics).
If your squid indexes any of the following:
- an ink! contract
- an EVM contract running on the Frontier EVM pallet
- Gear messages
then you can use some of the specialized data requesting methods to retrieve data more selectively.
Select all data fields necessary for your task (e.g. fee for extrinsics).

See reference documentation for more info. Processor config examples can be found in the tutorials:

III. Decode and normalize the data

Next, change the batch handler to decode and normalize your data.

In templates, the batch handler is defined at the processor.run() call in src/main.ts as an inline function. Its sole argument ctx contains:

at ctx.blocks: all the requested data for a batch of blocks
at ctx.store: the means to save the processed data
at ctx.log: a Logger
at ctx.isHead: a boolean indicating whether the batch is at the current chain head
at ctx._chain: the means to access RPC for state calls

This structure (reference) is common for all processors; the structure of ctx.blocks items varies.

EVM
Substrate

Each item in ctx.blocks contains the data for the requested logs, transactions, traces and state diffs for a particular block, plus some info on the block itself. See EVM batch context reference.

Use the .decode methods from the contract ABI utilities to decode events and transactions, e.g.

import * as erc20abi from './abi/erc20'

processor.run(db, async ctx => {
  for (let block of ctx.blocks) {
    for (let log of block.logs) {
      if (log.topics[0]===erc20abi.events.Transfer.topic) {
        let {from, to, value} = erc20.events.Transfer.decode(log)
      }
    }
  }
})

(Optional) IV. Mix in external data and chain state calls output

If you need external (i.e. non-blockchain) data in your transformation, take a look at the External APIs and IPFS page.

If any of the on-chain data you need is unavalable from the processor or incovenient to retrieve with it, you have an option to get it via direct chain queries.

V. Prepare the store

At src/main.ts, change the Database object definition to accept your output data. The methods for saving data will be exposed by ctx.store within the batch handler.

PostgreSQL+GraphQL
filesystem dataset
BigQuery

Define the schema of the database (and the core schema of the GraphQL API if it is used) at schema.graphql.
Regenerate the TypeORM model classes with
```
sqd codegen
```
The classes will become available at src/model.
Compile the models code with
```
sqd build
```
Ensure that the squid has access to a blank database. The easiest way to do so is to start PostgreSQL in a Docker container with
```
sqd up
```
If the container is running, stop it and erase the database with
```
sqd down
```
before issuing an sqd up.

The alternative is to connect to an external database. See this section to learn how to specify the connection parameters.
Generate a migration with
```
sqd migration:generate
```
The migration will be automatically applied when you start the processor with sqd process.

You can now use the async functions ctx.store.upsert() and ctx.store.insert(), as well as various TypeORM lookup methods to access the database.

See the typeorm-store guide and reference for more info.

Filesystem dataset writing, as performed by the @subsquid/file-store package and its extensions, stores the data into one or more flat tables. The exact table definition format depends on the output file format.

Decide on the file format you're going to use:
- Parquet
- CSV
- JSON/JSONL
If your template does not have any of the necessary packages, install them.

Define any tables you need at the tables field of the Database constructor argument:

import { Database } from '@subsquid/file-store'

const dbOptions = {
  tables: {
    FirstTable: new Table(/* ... */),
    SecondTable: new Table(/* ... */),
    // ...
  },
  // ...
}

processor.run(new Database(dbOptions), async ctx => { // ...

Define the destination filesystem via the dest field of the Database constructor argument. Options:
- local folder - use LocalDest from @subsquid/file-store
- S3-compatible file storage service - install @subsquid/file-store-s3 and use S3Dest

Once you're done you'll be able to enqueue data rows for saving using the write() and writeMany() methods of the context store-provided table objects:

ctx.store.FirstTable.writeMany(/* ... */)
ctx.store.SecondTable.write(/* ... */)

The store will write the files automatically as soon as the buffer reaches the size set by the chunkSizeMb field of the Database constructor argument, or at the end of the batch if a call to setForceFlush() was made anywhere in the batch handler.

See the file-store guide and the reference pages of its extensions.

VI. Persist the transformed data to your data sink

Once your data is decoded, optionally enriched with external data and transformed the way you need it to be, it is time to save it.

PostgreSQL+GraphQL
filesystem dataset
BigQuery

For each batch, create all the instances of all TypeORM model classes at once, then save them with the minimal number of calls to upsert() or insert(), e.g.:

import { EntityA, EntityB } from './model'

processor.run(new TypeormDatabase(), async ctx => {
  const aEntities: Map<string, EntityA> = new Map() // id -> entity instance
  const bEntities: EntityB = []

  for (let block of ctx.blocks) {
    // fill the containets aEntities and bEntities
  }

  await ctx.store.upsert([...aEntities.values()])
  await ctx.store.insert(bEntities)
})

It will often make sense to keep the entity instances in maps rather than arrays to make it easier to reuse them when defining instances of other entities with relations to the previous ones. The process is described in more detail in the step 2 of the BAYC tutorial.

If you perform any database lookups, try to do so in batches and make sure that the entity fields that you're searching over are indexed.

See also the patterns and anti-pattens sections of the Batch processing guide.

You can enqueue the transformed data for writing whenever convenient without any sizeable impact on performance.

At low output data rates (e.g. if your entire dataset is in tens of Mbytes or under) take care to call ctx.store.setForceFlush() when appropriate to make sure your data actually gets written.

The top-down development cycle

The bottom-up development cycle described above is convenient for inital squid development and for trying out new things, but it has the disadvantage of not having the means of saving the data ready at hand when initially writing the data decoding/transformation code. That makes it necessary to come back to that code later, which is somewhat inconvenient e.g. when adding new squid features incrementally.

The alternative is to do the same steps in a different order:

Update the store
If necessary, regenerate the utility classes
Update the processor configuration
Decode and normalize the added data
Retrieve any external data if necessary
Add the persistence code for the transformed data

Scaling up

If you're developing a large squid, make sure to use batch processing throughout your code.

A common mistake is to make handlers for individual event logs or transactions; for updates that require data retrieval that results in lots of small database lookups and ultimately in poor syncing performance. Collect all the relevant data and process it at once. A simple architecture of that type is discussed in the BAYC tutorial.

You should also check the Cloud best practices page even if you're not planning to deploy to Subsquid Cloud - it contains valuable performance-related tips.

Many issues commonly arising when developing larger squids are addressed by the third party @belopash/typeorm-store package. Consider using it.

For complete examples of complex squids take a look at the Giant Squid Explorer and Thena Squid repos.

Next steps

Deploy your squid on own infrastructure or to Subsquid Cloud
If your squid serves a GraphQL API, consult the Core GraphQL API reference while writing your frontend

Prepare the environment​

Understand your technical requirements​

Example requirements​

Start from a template​

The bottom-up development cycle​

I. Regenerate the task-specific utilities​

II. Configure the data requests​

III. Decode and normalize the data​

(Optional) IV. Mix in external data and chain state calls output​

V. Prepare the store​

VI. Persist the transformed data to your data sink​

The top-down development cycle​

Scaling up​

Next steps​