cryo

cryo is a rust library for bulk extraction of data from EVM nodes

cryo can be used:

  • as a rust library
  • as a CLI tool
  • as a python package

cryo can collect many datatypes:

  • blocks
  • txs
  • logs
  • traces
  • storage_diffs
  • nonce_diffs
  • balance_diffs
  • code_diffs
  • vm_trace

cryo provides many options for filtering and formatting data. For example:

  • can select which columns of data to collect or save
  • can store binary data natively or encode as hex
  • can choose which column(s) to sort by
  • can output to many standard formats

For a complete list of options, either run cryo -h in a terminal or see this file.

cryo python package

The cryo python package has 4 functions:

  • cryo.collect(): gather data and assemble into a python object
  • cryo.freeze(): gather data and save as files on disk
  • cryo.async_collect(): async version of collect()
  • cryo.async_freeze(): async version of freeze()

These functions are thin wrappers around the cryo rust library.

cryo python benchmarks

This is a comparison between cryo and a "native" python rpc client (ctc).

The native client is highly-optimized and uses msgspec under the hood.

Results are copied from the end of this notebook.


    fetch time: time it takes to fetch data from server     
dataframe time: time it takes to package data into dataframe
    total time: fetch time + dataframe time                 


                    │          │  native  │  native  │    cryo  │             
                    │       #  │   fetch  │   total  │   total  │       cryo  
              test  │  blocks  │  (secs)  │  (secs)  │  (secs)  │      speed  
────────────────────┼──────────┼──────────┼──────────┼──────────┼─────────────
        get_blocks  │  10,000  │    9.69  │   10.48  │    0.98  │  1,066.86%  
  get_transactions  │   1,000  │    6.50  │    7.86  │    0.62  │  1,266.92%  
          get_logs  │   1,000  │    2.22  │    4.85  │    1.16  │    416.93%  
        get_traces  │     300  │    9.28  │   12.04  │    0.74  │  1,616.32%  

The cryo speed column is native total time divided by cryo total time. In these tests, cryo is roughly 4x to 16x faster than the native python client.
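The speedup range can be approximately reproduced from the rounded timings in the table. This is a minimal sketch using only the printed numbers; the percentages in the table differ slightly, presumably because they were computed from unrounded timings.

```python
# timings copied from the table above: (native total secs, cryo total secs)
timings = {
    'get_blocks': (10.48, 0.98),
    'get_transactions': (7.86, 0.62),
    'get_logs': (4.85, 1.16),
    'get_traces': (12.04, 0.74),
}

# speedup = native total time / cryo total time
speedups = {
    test: native_total / cryo_total
    for test, (native_total, cryo_total) in timings.items()
}

for test, speedup in speedups.items():
    print(f'{test:>18}  {speedup:5.1f}x')
```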

In [1]:
import asyncio
import time

import ctc.rpc
import cryo
import polars as pl
import toolstr

Demo

Basic usage

In [2]:
datatype = 'balance_diffs'
start_block = 17_000_000
end_block = 17_001_000

df = cryo.collect(
    datatype,
    start_block=start_block,
    end_block=end_block,
)

df
Out[2]:
shape: (471_920, 6)

block_number │ transaction_index │ transaction_hash │ address       │ from_value       │ to_value
u32          │ u32               │ binary           │ binary        │ str              │ str
─────────────┼───────────────────┼──────────────────┼───────────────┼──────────────────┼─────────────────
17000000     │ 0                 │ [binary data]    │ [binary data] │ "140134547161"   │ "140290700632"
17000000     │ 0                 │ [binary data]    │ [binary data] │ "0"              │ "0"
17000000     │ 0                 │ [binary data]    │ [binary data] │ "73760561678587… │ "73736963772689…
17000000     │ 0                 │ [binary data]    │ [binary data] │ "0"              │ "0"
17000000     │ 0                 │ [binary data]    │ [binary data] │ "0"              │ "0"
17000000     │ 1                 │ [binary data]    │ [binary data] │ "22686637522436… │ "28564712158454…
17000000     │ 1                 │ [binary data]    │ [binary data] │ "20867217338993… │ "20870360818993…
17000000     │ 1                 │ [binary data]    │ [binary data] │ "0"              │ "0"
17000000     │ 1                 │ [binary data]    │ [binary data] │ "37187346606855… │ "37187348554378…
17000000     │ 1                 │ [binary data]    │ [binary data] │ "0"              │ "0"
17000000     │ 2                 │ [binary data]    │ [binary data] │ "20870360818993… │ "21044998157486…
17000000     │ 2                 │ [binary data]    │ [binary data] │ "140290700632"   │ "140451972424"
…            │ …                 │ …                │ …             │ …                │ …
17000999     │ 133               │ [binary data]    │ [binary data] │ "10738692192853… │ "10739087770618…
17000999     │ 133               │ [binary data]    │ [binary data] │ "23105450702970… │ "13763222901384…
17000999     │ 133               │ [binary data]    │ [binary data] │ "0"              │ "0"
17000999     │ 134               │ [binary data]    │ [binary data] │ "38944504444300… │ "15178621259941…
17000999     │ 134               │ [binary data]    │ [binary data] │ "10739087770618… │ "10739262193258…
17000999     │ 134               │ [binary data]    │ [binary data] │ "22784851438257… │ "22769636834650…
17000999     │ 135               │ [binary data]    │ [binary data] │ "71815157240477… │ "59012426359069…
17000999     │ 135               │ [binary data]    │ [binary data] │ "10739262193258… │ "10739787097429…
17000999     │ 135               │ [binary data]    │ [binary data] │ "0"              │ "0"
17000999     │ 136               │ [binary data]    │ [binary data] │ "10739787097429… │ "10740169971735…
17000999     │ 136               │ [binary data]    │ [binary data] │ "0"              │ "0"
17000999     │ 136               │ [binary data]    │ [binary data] │ "13412033774471… │ "13411099920830…
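The from_value and to_value columns are strings rather than integers, presumably because balances can exceed the range of a 64-bit integer (an assumption; the notebook does not state the reason). A minimal sketch of turning one row into a balance change, using the first row of the output above:

```python
# first row of the balance_diffs output above (binary columns omitted)
row = {
    'block_number': 17000000,
    'transaction_index': 0,
    'from_value': '140134547161',
    'to_value': '140290700632',
}

# values are decimal strings; python ints have arbitrary precision
from_value = int(row['from_value'])
to_value = int(row['to_value'])

# positive means the balance increased
balance_change = to_value - from_value
print(balance_change)
```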

Different output formats

In [3]:
# polars is default

blocks = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
)

blocks
Out[3]:
shape: (10_000, 7)

hash          │ author        │ number   │ gas_used │ extra_data    │ timestamp  │ base_fee_per_gas
binary        │ binary        │ u32      │ u32      │ binary        │ u32        │ u64
──────────────┼───────────────┼──────────┼──────────┼───────────────┼────────────┼─────────────────
[binary data] │ [binary data] │ 17000000 │ 9160778  │ [binary data] │ 1680911891 │ 20582738913
[binary data] │ [binary data] │ 17000001 │ 9389175  │ [binary data] │ 1680911903 │ 19581179064
[binary data] │ [binary data] │ 17000002 │ 29993802 │ [binary data] │ 1680911915 │ 18665624323
[binary data] │ [binary data] │ 17000003 │ 11343154 │ [binary data] │ 1680911927 │ 20997863283
[binary data] │ [binary data] │ 17000004 │ 7688030  │ [binary data] │ 1680911939 │ 20357980347
[binary data] │ [binary data] │ 17000005 │ 7191844  │ [binary data] │ 1680911951 │ 19117505835
[binary data] │ [binary data] │ 17000006 │ 29983575 │ [binary data] │ 1680911963 │ 17873568603
[binary data] │ [binary data] │ 17000007 │ 29988259 │ [binary data] │ 1680911975 │ 20105318233
[binary data] │ [binary data] │ 17000008 │ 22330289 │ [binary data] │ 1680911987 │ 22616515874
[binary data] │ [binary data] │ 17000009 │ 11057578 │ [binary data] │ 1680911999 │ 23998062520
[binary data] │ [binary data] │ 17000010 │ 14843889 │ [binary data] │ 1680912011 │ 23209641774
[binary data] │ [binary data] │ 17000011 │ 12450500 │ [binary data] │ 1680912023 │ 23179447771
…             │ …             │ …        │ …        │ …             │ …          │ …
[binary data] │ [binary data] │ 17009988 │ 12428779 │ [binary data] │ 1681034051 │ 21987087603
[binary data] │ [binary data] │ 17009989 │ 10985717 │ [binary data] │ 1681034063 │ 21515973759
[binary data] │ [binary data] │ 17009990 │ 10205129 │ [binary data] │ 1681034075 │ 20796213695
[binary data] │ [binary data] │ 17009991 │ 6369046  │ [binary data] │ 1681034087 │ 19965254013
[binary data] │ [binary data] │ 17009992 │ 29994683 │ [binary data] │ 1681034099 │ 18529260772
[binary data] │ [binary data] │ 17009993 │ 15733305 │ [binary data] │ 1681034111 │ 20844597367
[binary data] │ [binary data] │ 17009994 │ 9897435  │ [binary data] │ 1681034123 │ 20971976095
[binary data] │ [binary data] │ 17009995 │ 1770796  │ [binary data] │ 1681034135 │ 20080218835
[binary data] │ [binary data] │ 17009996 │ 29990959 │ [binary data] │ 1681034147 │ 17866507908
[binary data] │ [binary data] │ 17009997 │ 29994044 │ [binary data] │ 1681034171 │ 20098475304
[binary data] │ [binary data] │ 17009998 │ 21943476 │ [binary data] │ 1681034195 │ 22609787162
[binary data] │ [binary data] │ 17009999 │ 9631723  │ [binary data] │ 1681034207 │ 23918041449
In [4]:
# pandas

blocks_pandas = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
    output_format='pandas',
)

blocks_pandas
Out[4]:
hash author number gas_used extra_data timestamp base_fee_per_gas
0 b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c... b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-... 17000000 9160778 b'by @builder0x69' 1680911891 20582738913
1 b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\... b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\... 17000001 9389175 b'Made on the moon by Blocknative' 1680911903 19581179064
2 b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@... b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\... 17000002 29993802 b'Titan' 1680911915 18665624323
3 b'\x90C\x83}\x9dv\r\x8d\xe1\xc2\xab\x13\xd4\xa... b'`+,r\xa8\x9a\x1c\x80\xdb\xa2\x19\x8c\x9d\xc4... 17000003 11343154 b'by builder0x69' 1680911927 20997863283
4 b'\x1e\xa4 \xde\t[\xe1\x06\xfa\xc4\xd7~\xa6!\\... b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... 17000004 7688030 b'rsync-builder.xyz' 1680911939 20357980347
... ... ... ... ... ... ... ...
9995 b')\xe5\x11%\x04z,~\x15SR\xa1q\ti\xf3O\x88\xfe... b'\xd0\x11\x1c\xf5\xbf#\x082\xf4"\xda\x1cl\x1d... 17009995 1770796 b'' 1681034135 20080218835
9996 b'61\xc1\xd9\xbf\xaa\xb0\x80M\x98\xb0:=\x9c\r7... b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... 17009996 29990959 b'rsync-builder.xyz' 1681034147 17866507908
9997 b'\x8f\xf4Q\x8f\x89\x07{NK\xaa]\xb1c\xa2{\t\x8... b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... 17009997 29994044 b'rsync-builder.xyz' 1681034171 20098475304
9998 b'o>S\xaau\xee_D\xfd\xff3\xc8_\xe1\x03\x80qRjf... b'8\x8c\x81\x8c\xa8\xb9%\x1b911\xc0\x8asjg\xcc... 17009998 21943476 b'\xd8\x83\x01\x0b\x05\x84geth\x88go1.20.2\x85... 1681034195 22609787162
9999 b'C\x89\xed\x1aL\xea\x92\xf4\x8djN\xbf\xa3q~\x... b'\x95""\x90\xddrx\xaa=\xdd8\x9c\xc1\xe1\xd1e\... 17009999 9631723 b'beaverbuild.org' 1681034207 23918041449

10000 rows × 7 columns

In [5]:
# list of dicts

blocks_list = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_000_003,
    output_format='list',
)

blocks_list
Out[5]:
[{'hash': b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c\xfb\xf4\x8f\x17\xe8\xb4\x17\x98\xd19Dt\xe6~\xc8\xa9~\x9f',
  'author': b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
  'number': 17000000,
  'gas_used': 9160778,
  'extra_data': b'by @builder0x69',
  'timestamp': 1680911891,
  'base_fee_per_gas': 20582738913},
 {'hash': b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\xf1,!\xd3\xc1\xd0\x97b\xbe\xba\x02\xe2w\xb5\xb4\x12\xcd\x8f\xef',
  'author': b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\xd6l^\x19',
  'number': 17000001,
  'gas_used': 9389175,
  'extra_data': b'Made on the moon by Blocknative',
  'timestamp': 1680911903,
  'base_fee_per_gas': 19581179064},
 {'hash': b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@\xcc\xc59\xe7\x8b(Y\x88\xfd\xf5\xcb\xca',
  'author': b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\x0b\xad_\x97',
  'number': 17000002,
  'gas_used': 29993802,
  'extra_data': b'Titan',
  'timestamp': 1680911915,
  'base_fee_per_gas': 18665624323}]
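Binary fields come back as raw python bytes, and timestamps as unix seconds. A minimal sketch of making one record human-readable with the standard library only; the record is copied from the first block of the output above, with the hash left out for brevity:

```python
import datetime

# first block from the list-of-dicts output above (hash omitted)
block = {
    'author': b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
    'number': 17000000,
    'extra_data': b'by @builder0x69',
    'timestamp': 1680911891,
}

# raw bytes -> 0x-prefixed hex string
author_hex = '0x' + block['author'].hex()

# extra_data is arbitrary bytes; replace anything that is not valid utf-8
extra_data_text = block['extra_data'].decode('utf-8', errors='replace')

# unix timestamp (seconds) -> timezone-aware UTC datetime
block_time = datetime.datetime.fromtimestamp(
    block['timestamp'], tz=datetime.timezone.utc
)

print(author_hex)
print(extra_data_text)
print(block_time.isoformat())
```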
In [6]:
# dict of lists

blocks_dict = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_000_003,
    output_format='dict',
)

blocks_dict
Out[6]:
{'hash': [b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c\xfb\xf4\x8f\x17\xe8\xb4\x17\x98\xd19Dt\xe6~\xc8\xa9~\x9f',
  b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\xf1,!\xd3\xc1\xd0\x97b\xbe\xba\x02\xe2w\xb5\xb4\x12\xcd\x8f\xef',
  b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@\xcc\xc59\xe7\x8b(Y\x88\xfd\xf5\xcb\xca'],
 'author': [b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
  b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\xd6l^\x19',
  b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\x0b\xad_\x97'],
 'number': [17000000, 17000001, 17000002],
 'gas_used': [9160778, 9389175, 29993802],
 'extra_data': [b'by @builder0x69',
  b'Made on the moon by Blocknative',
  b'Titan'],
 'timestamp': [1680911891, 1680911903, 1680911915],
 'base_fee_per_gas': [20582738913, 19581179064, 18665624323]}
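The 'list' and 'dict' formats carry the same data in row-oriented and column-oriented layouts. A minimal sketch of converting between the two in plain python, using a subset of the columns shown above:

```python
# column-oriented data, shaped like the output_format='dict' result (subset of columns)
columns = {
    'number': [17000000, 17000001, 17000002],
    'gas_used': [9160778, 9389175, 29993802],
    'timestamp': [1680911891, 1680911903, 1680911915],
}

# dict of lists -> list of dicts (one dict per block)
rows = [dict(zip(columns.keys(), values)) for values in zip(*columns.values())]

# list of dicts -> dict of lists (one list per column)
columns_again = {key: [row[key] for row in rows] for key in rows[0]}

print(rows[0])
```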
In [7]:
# parquet files

result = cryo.freeze(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
    verbose=False,
)

result['paths']['blocks']
Out[7]:
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.parquet',
 '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.parquet']
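The 10,000-block request above was written as ten files of 1,000 blocks each, and each file name records the block range it covers. A minimal sketch of parsing those names; the pattern is inferred from the paths shown here rather than taken from cryo's documentation, and parse_chunk_path is a hypothetical helper, not part of cryo:

```python
import re

# file name pattern inferred from the paths above (an assumption):
# {network}__{datatype}__{first_block}_to_{last_block}.{extension}
pattern = re.compile(
    r'(?P<network>[^_/]+)__(?P<datatype>.+)__'
    r'(?P<first_block>\d+)_to_(?P<last_block>\d+)\.(?P<extension>\w+)$'
)

def parse_chunk_path(path):
    """Extract network, datatype, and block range from a chunk file path."""
    match = pattern.search(path)
    if match is None:
        raise ValueError('unrecognized file name: ' + path)
    info = match.groupdict()
    info['first_block'] = int(info['first_block'])
    info['last_block'] = int(info['last_block'])
    return info

path = '/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet'
info = parse_chunk_path(path)

# the range in the file name appears to be inclusive on both ends
n_blocks_in_chunk = info['last_block'] - info['first_block'] + 1
print(info, n_blocks_in_chunk)
```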
In [8]:
# csv files

result = cryo.freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    csv=True,
    verbose=False,
)

result['paths']['blocks']
Out[8]:
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.csv',
 '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.csv']
In [9]:
# json files

result = cryo.freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    json=True,
    verbose=False,
)

result['paths']['blocks']
Out[9]:
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.json',
 '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.json']

Run as async

In [10]:
blocks = await cryo.async_collect('blocks', start_block=17_000_000, end_block=17_000_003)

blocks
Out[10]:
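shape: (3, 7)

hash          │ author        │ number   │ gas_used │ extra_data    │ timestamp  │ base_fee_per_gas
binary        │ binary        │ u32      │ u32      │ binary        │ u32        │ u64
──────────────┼───────────────┼──────────┼──────────┼───────────────┼────────────┼─────────────────
[binary data] │ [binary data] │ 17000000 │ 9160778  │ [binary data] │ 1680911891 │ 20582738913
[binary data] │ [binary data] │ 17000001 │ 9389175  │ [binary data] │ 1680911903 │ 19581179064
[binary data] │ [binary data] │ 17000002 │ 29993802 │ [binary data] │ 1680911915 │ 18665624323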
In [11]:
result = await cryo.async_freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    verbose=False,
)

result['paths']
Out[11]:
{'blocks': ['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.parquet',
  '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.parquet']}
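The cells above use top-level await, which works in Jupyter and IPython. In a plain python script, coroutines have to be run inside an event loop, for example with asyncio.run. A minimal sketch of that pattern; fetch_blocks is a hypothetical stand-in for a call such as cryo.async_collect, so that the example runs without a node:

```python
import asyncio

async def fetch_blocks(start_block, end_block):
    # hypothetical stand-in for: await cryo.async_collect('blocks', ...)
    await asyncio.sleep(0)
    return list(range(start_block, end_block))

async def main():
    # run two collections concurrently and wait for both
    first, second = await asyncio.gather(
        fetch_blocks(17_000_000, 17_000_003),
        fetch_blocks(17_000_003, 17_000_006),
    )
    return first + second

# outside a notebook there is no running event loop, so start one explicitly
block_numbers = asyncio.run(main())
print(block_numbers)
```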

Benchmarks

Below is a comparison of cryo against the highly-optimized rpc client of ctc.

In [12]:
start_block = 17_000_000
In [13]:
benchmarks = {
    'get_blocks': {},
    'get_transactions': {},
    'get_logs': {},
    'get_traces': {},
}
n_blocks = {}

blocks

In [14]:
n_blocks['get_blocks'] = 10_000
In [15]:
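# native client: one block request per block number, issued concurrently
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_block_by_number(number)
    for number in range(start_block, start_block + n_blocks['get_blocks'])
]
blocks = await asyncio.gather(*coroutines)
# fetch time: all responses received
benchmarks['get_blocks']['native_fetch'] = time.time() - t_start
# build (and discard) a dataframe so that total time includes packaging
pl.DataFrame(blocks)
benchmarks['get_blocks']['native_total'] = time.time() - t_start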
In [16]:
t_start = time.time()
df = await cryo.async_collect(
    datatype='blocks',
    start_block=start_block,
    end_block=start_block + n_blocks['get_blocks'],
)
benchmarks['get_blocks']['cryo_total'] = time.time() - t_start
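Each benchmark in this section repeats the same pattern: record a start time, run the work, and store the elapsed seconds in a nested dict. A minimal sketch of that pattern as a reusable context manager; timed and example_benchmarks are hypothetical names, and time.perf_counter is swapped in for the notebook's time.time because it is a monotonic clock intended for measuring intervals:

```python
import contextlib
import time

# same shape as the notebook's benchmarks dict: {test name: {measurement: seconds}}
example_benchmarks = {'get_blocks': {}}

@contextlib.contextmanager
def timed(results, key):
    """Store the elapsed wall-clock seconds of the with-block in results[key]."""
    t_start = time.perf_counter()
    try:
        yield
    finally:
        results[key] = time.perf_counter() - t_start

# stand-in workload; in the notebook this would be the fetch or collect call
with timed(example_benchmarks['get_blocks'], 'example_total'):
    sum(range(100_000))

print(example_benchmarks)
```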

transactions

In [17]:
n_blocks['get_transactions'] = 1000
In [18]:
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_block_by_number(number, include_full_transactions=True)
    for number in range(start_block, start_block + n_blocks["get_transactions"])
]
results = await asyncio.gather(*coroutines)
txs = [tx for result in results for tx in result["transactions"]]
benchmarks["get_transactions"]["native_fetch"] = time.time() - t_start
pl.DataFrame(
    txs,
    infer_schema_length=1000,
)
benchmarks["get_transactions"]["native_total"] = time.time() - t_start
In [19]:
len(txs)
Out[19]:
126359
In [20]:
t_start = time.time()
df = await cryo.async_collect(
    'transactions',
    start_block=start_block,
    end_block=start_block + n_blocks['get_transactions'],
    chunk_size=10000,
)
benchmarks['get_transactions']['cryo_total'] = time.time() - t_start

logs

In [21]:
n_blocks['get_logs'] = 1000
In [22]:
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_logs(start_block=number, end_block=number)
    for number in range(start_block, start_block + n_blocks['get_logs'])
]
results = await asyncio.gather(*coroutines)
logs = [log for block_logs in results for log in block_logs]
benchmarks['get_logs']['native_fetch'] = time.time() - t_start
pl.DataFrame(logs)
benchmarks['get_logs']['native_total'] = time.time() - t_start

len(logs)
Out[22]:
296866
In [23]:
t_start = time.time()
df = await cryo.async_collect(
    datatype='logs',
    blocks=[str(start_block) + ':' + str(start_block + n_blocks['get_logs'])],
    chunk_size=10000,
)
benchmarks['get_logs']['cryo_total'] = time.time() - t_start

traces

In [24]:
n_blocks['get_traces'] = 300
In [25]:
t_start = time.time()
coroutines = [
    ctc.rpc.async_trace_block(number)
    for number in range(start_block, start_block + n_blocks['get_traces'])
]
block_traces = await asyncio.gather(*coroutines)
benchmarks['get_traces']['native_fetch'] = time.time() - t_start
pl.DataFrame([trace for block_trace in block_traces for trace in block_trace])
benchmarks['get_traces']['native_total'] = time.time() - t_start
In [26]:
t_start = time.time()
df = await cryo.async_collect(
    'traces',
    blocks=[str(start_block) + ':' + str(start_block + n_blocks['get_traces'])],
)
benchmarks['get_traces']['cryo_total'] = time.time() - t_start

Benchmark Summary

In [27]:
df = pl.DataFrame(list(benchmarks.values()))
df = df.insert_at_idx(0, pl.Series('#_blocks', n_blocks.values()))
df = df.insert_at_idx(0, pl.Series('test', benchmarks.keys()))
df = df.with_columns(
    (pl.col('native_total') / pl.col('cryo_total')).alias('cryo_speed')
)
df = df.rename({column: column.replace('_', '\n') for column in df.columns})
df = df.rename(
    {
        column: column + '\n(secs)'
        for column in df.columns
        if column.endswith('total') or column.endswith('fetch')
    }
)

column_formats = {column: {'decimals': 2} for column in df.columns}
column_formats['#\nblocks']['decimals'] = 0
column_formats['cryo\nspeed']['percentage'] = True

toolstr.print_text_box('cryo benchmarks')
print('this is a comparison between cryo and a "native" python rpc client')
print('the native client is highly-optimized and uses msgspec under the hood')
rows = [
    ('fetch time:', 'time it takes to fetch data from server'),
    ('dataframe time:', 'time it takes to package data into dataframe'),
    ('total time:', 'fetch time + dataframe time'),
]
print()
toolstr.print_table(rows, column_justify=['right', 'left'], compact=True)
print()
toolstr.print_dataframe_as_table(df, column_formats=column_formats)
┌─────────────────┐
│ cryo benchmarks │
└─────────────────┘
this is a comparison between cryo and a "native" python rpc client
the native client is highly-optimized and uses msgspec under the hood

    fetch time: time it takes to fetch data from server     
dataframe time: time it takes to package data into dataframe
    total time: fetch time + dataframe time                 

                    │          │  native  │  native  │    cryo  │             
                    │       #  │   fetch  │   total  │   total  │       cryo  
              test  │  blocks  │  (secs)  │  (secs)  │  (secs)  │      speed  
────────────────────┼──────────┼──────────┼──────────┼──────────┼─────────────
        get_blocks  │  10,000  │    9.69  │   10.48  │    0.98  │  1,066.86%  
  get_transactions  │   1,000  │    6.50  │    7.86  │    0.62  │  1,266.92%  
          get_logs  │   1,000  │    2.22  │    4.85  │    1.16  │    416.93%  
        get_traces  │     300  │    9.28  │   12.04  │    0.74  │  1,616.32%