cryo is a Rust library for bulk extraction of data from EVM nodes.
cryo can be used:
cryo can collect many datatypes:
cryo provides many options for filtering and formatting data. For example:
For a complete list of options, either run cryo -h in a terminal or see this file.
The cryo python package has 4 functions:
- cryo.collect(): gather data and assemble into a python object
- cryo.freeze(): gather data and save as files on disk
- cryo.async_collect(): async version of collect()
- cryo.async_freeze(): async version of freeze()

These functions are simple wrappers around the cryo rust library.
This is a comparison between cryo and a "native" python rpc client (ctc).
The native client is highly-optimized and uses msgspec under the hood.
Results are copied from the end of this notebook.
fetch time: time it takes to fetch data from server
dataframe time: time it takes to package data into dataframe
total time: fetch time + dataframe time
| test | # blocks | native fetch (secs) | native total (secs) | cryo total (secs) | cryo speed |
|---|---|---|---|---|---|
| get_blocks | 10,000 | 9.69 | 10.48 | 0.98 | 1,066.86% |
| get_transactions | 1,000 | 6.50 | 7.86 | 0.62 | 1,266.92% |
| get_logs | 1,000 | 2.22 | 4.85 | 1.16 | 416.93% |
| get_traces | 300 | 9.28 | 12.04 | 0.74 | 1,616.32% |
In these tests, cryo is roughly 4x to 16x faster than the native python client, measured as native total time divided by cryo total time.
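As a quick check of the speed column, the ratio of native total time to cryo total time can be recomputed from the table. This is only a sketch using the rounded timings printed above, so the results differ slightly from the table's percentages, which were presumably computed from unrounded measurements.

```python
# Timings in seconds, copied from the benchmark table (rounded to 2 decimals).
timings = {
    "get_blocks": {"native_total": 10.48, "cryo_total": 0.98},
    "get_transactions": {"native_total": 7.86, "cryo_total": 0.62},
    "get_logs": {"native_total": 4.85, "cryo_total": 1.16},
    "get_traces": {"native_total": 12.04, "cryo_total": 0.74},
}


def speedup(native_total: float, cryo_total: float) -> float:
    """Return how many times faster cryo was than the native client."""
    return native_total / cryo_total


speedups = {test: speedup(**t) for test, t in timings.items()}

for test, ratio in speedups.items():
    # Show the ratio both as a multiple and as a percentage, like the table.
    print(f"{test:<18} {ratio:6.2f}x  ({ratio:.0%})")
```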
import asyncio
import time
import ctc.rpc
import cryo
import polars as pl
import toolstr
datatype = 'balance_diffs'
start_block = 17_000_000
end_block = 17_001_000
df = cryo.collect(
    datatype,
    start_block=start_block,
    end_block=end_block,
)
df
| block_number | transaction_index | transaction_hash | address | from_value | to_value |
|---|---|---|---|---|---|
| u32 | u32 | binary | binary | str | str |
| 17000000 | 0 | [binary data] | [binary data] | "140134547161" | "140290700632" |
| 17000000 | 0 | [binary data] | [binary data] | "0" | "0" |
| 17000000 | 0 | [binary data] | [binary data] | "73760561678587… | "73736963772689… |
| 17000000 | 0 | [binary data] | [binary data] | "0" | "0" |
| 17000000 | 0 | [binary data] | [binary data] | "0" | "0" |
| 17000000 | 1 | [binary data] | [binary data] | "22686637522436… | "28564712158454… |
| 17000000 | 1 | [binary data] | [binary data] | "20867217338993… | "20870360818993… |
| 17000000 | 1 | [binary data] | [binary data] | "0" | "0" |
| 17000000 | 1 | [binary data] | [binary data] | "37187346606855… | "37187348554378… |
| 17000000 | 1 | [binary data] | [binary data] | "0" | "0" |
| 17000000 | 2 | [binary data] | [binary data] | "20870360818993… | "21044998157486… |
| 17000000 | 2 | [binary data] | [binary data] | "140290700632" | "140451972424" |
| … | … | … | … | … | … |
| 17000999 | 133 | [binary data] | [binary data] | "10738692192853… | "10739087770618… |
| 17000999 | 133 | [binary data] | [binary data] | "23105450702970… | "13763222901384… |
| 17000999 | 133 | [binary data] | [binary data] | "0" | "0" |
| 17000999 | 134 | [binary data] | [binary data] | "38944504444300… | "15178621259941… |
| 17000999 | 134 | [binary data] | [binary data] | "10739087770618… | "10739262193258… |
| 17000999 | 134 | [binary data] | [binary data] | "22784851438257… | "22769636834650… |
| 17000999 | 135 | [binary data] | [binary data] | "71815157240477… | "59012426359069… |
| 17000999 | 135 | [binary data] | [binary data] | "10739262193258… | "10739787097429… |
| 17000999 | 135 | [binary data] | [binary data] | "0" | "0" |
| 17000999 | 136 | [binary data] | [binary data] | "10739787097429… | "10740169971735… |
| 17000999 | 136 | [binary data] | [binary data] | "0" | "0" |
| 17000999 | 136 | [binary data] | [binary data] | "13412033774471… | "13411099920830… |
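The from_value and to_value columns above are strings rather than integers. A plausible reason (an assumption here, not something stated in this document) is that balances can exceed what a 64-bit integer column can hold. The sketch below assumes the strings are base-10 integers and uses Python's arbitrary-precision int to compute the change for two rows shown in the table; the 100 ether figure is a hypothetical value chosen only to illustrate the overflow.

```python
U64_MAX = 2**64 - 1


def balance_change(from_value: str, to_value: str) -> int:
    """Return to_value - from_value, assuming both are base-10 integer strings."""
    return int(to_value) - int(from_value)


# (from_value, to_value) pairs copied from untruncated rows of the table above.
rows = [
    ("140134547161", "140290700632"),
    ("140290700632", "140451972424"),
    ("0", "0"),
]

changes = [balance_change(from_value, to_value) for from_value, to_value in rows]
print(changes)

# Hypothetical example: 100 ether expressed in wei does not fit in a u64.
hundred_ether_in_wei = 100 * 10**18
print(hundred_ether_in_wei > U64_MAX)
```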
# polars is default
blocks = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
)
blocks
| hash | author | number | gas_used | extra_data | timestamp | base_fee_per_gas |
|---|---|---|---|---|---|---|
| binary | binary | u32 | u32 | binary | u32 | u64 |
| [binary data] | [binary data] | 17000000 | 9160778 | [binary data] | 1680911891 | 20582738913 |
| [binary data] | [binary data] | 17000001 | 9389175 | [binary data] | 1680911903 | 19581179064 |
| [binary data] | [binary data] | 17000002 | 29993802 | [binary data] | 1680911915 | 18665624323 |
| [binary data] | [binary data] | 17000003 | 11343154 | [binary data] | 1680911927 | 20997863283 |
| [binary data] | [binary data] | 17000004 | 7688030 | [binary data] | 1680911939 | 20357980347 |
| [binary data] | [binary data] | 17000005 | 7191844 | [binary data] | 1680911951 | 19117505835 |
| [binary data] | [binary data] | 17000006 | 29983575 | [binary data] | 1680911963 | 17873568603 |
| [binary data] | [binary data] | 17000007 | 29988259 | [binary data] | 1680911975 | 20105318233 |
| [binary data] | [binary data] | 17000008 | 22330289 | [binary data] | 1680911987 | 22616515874 |
| [binary data] | [binary data] | 17000009 | 11057578 | [binary data] | 1680911999 | 23998062520 |
| [binary data] | [binary data] | 17000010 | 14843889 | [binary data] | 1680912011 | 23209641774 |
| [binary data] | [binary data] | 17000011 | 12450500 | [binary data] | 1680912023 | 23179447771 |
| … | … | … | … | … | … | … |
| [binary data] | [binary data] | 17009988 | 12428779 | [binary data] | 1681034051 | 21987087603 |
| [binary data] | [binary data] | 17009989 | 10985717 | [binary data] | 1681034063 | 21515973759 |
| [binary data] | [binary data] | 17009990 | 10205129 | [binary data] | 1681034075 | 20796213695 |
| [binary data] | [binary data] | 17009991 | 6369046 | [binary data] | 1681034087 | 19965254013 |
| [binary data] | [binary data] | 17009992 | 29994683 | [binary data] | 1681034099 | 18529260772 |
| [binary data] | [binary data] | 17009993 | 15733305 | [binary data] | 1681034111 | 20844597367 |
| [binary data] | [binary data] | 17009994 | 9897435 | [binary data] | 1681034123 | 20971976095 |
| [binary data] | [binary data] | 17009995 | 1770796 | [binary data] | 1681034135 | 20080218835 |
| [binary data] | [binary data] | 17009996 | 29990959 | [binary data] | 1681034147 | 17866507908 |
| [binary data] | [binary data] | 17009997 | 29994044 | [binary data] | 1681034171 | 20098475304 |
| [binary data] | [binary data] | 17009998 | 21943476 | [binary data] | 1681034195 | 22609787162 |
| [binary data] | [binary data] | 17009999 | 9631723 | [binary data] | 1681034207 | 23918041449 |
# pandas
blocks_pandas = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
    output_format='pandas',
)
blocks_pandas
| | hash | author | number | gas_used | extra_data | timestamp | base_fee_per_gas |
|---|---|---|---|---|---|---|---|
| 0 | b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c... | b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-... | 17000000 | 9160778 | b'by @builder0x69' | 1680911891 | 20582738913 |
| 1 | b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\... | b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\... | 17000001 | 9389175 | b'Made on the moon by Blocknative' | 1680911903 | 19581179064 |
| 2 | b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@... | b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\... | 17000002 | 29993802 | b'Titan' | 1680911915 | 18665624323 |
| 3 | b'\x90C\x83}\x9dv\r\x8d\xe1\xc2\xab\x13\xd4\xa... | b'`+,r\xa8\x9a\x1c\x80\xdb\xa2\x19\x8c\x9d\xc4... | 17000003 | 11343154 | b'by builder0x69' | 1680911927 | 20997863283 |
| 4 | b'\x1e\xa4 \xde\t[\xe1\x06\xfa\xc4\xd7~\xa6!\\... | b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... | 17000004 | 7688030 | b'rsync-builder.xyz' | 1680911939 | 20357980347 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | b')\xe5\x11%\x04z,~\x15SR\xa1q\ti\xf3O\x88\xfe... | b'\xd0\x11\x1c\xf5\xbf#\x082\xf4"\xda\x1cl\x1d... | 17009995 | 1770796 | b'' | 1681034135 | 20080218835 |
| 9996 | b'61\xc1\xd9\xbf\xaa\xb0\x80M\x98\xb0:=\x9c\r7... | b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... | 17009996 | 29990959 | b'rsync-builder.xyz' | 1681034147 | 17866507908 |
| 9997 | b'\x8f\xf4Q\x8f\x89\x07{NK\xaa]\xb1c\xa2{\t\x8... | b'\x1f\x90\x90\xaa\xe2\x8b\x8a=\xce\xad\xf2\x8... | 17009997 | 29994044 | b'rsync-builder.xyz' | 1681034171 | 20098475304 |
| 9998 | b'o>S\xaau\xee_D\xfd\xff3\xc8_\xe1\x03\x80qRjf... | b'8\x8c\x81\x8c\xa8\xb9%\x1b911\xc0\x8asjg\xcc... | 17009998 | 21943476 | b'\xd8\x83\x01\x0b\x05\x84geth\x88go1.20.2\x85... | 1681034195 | 22609787162 |
| 9999 | b'C\x89\xed\x1aL\xea\x92\xf4\x8djN\xbf\xa3q~\x... | b'\x95""\x90\xddrx\xaa=\xdd8\x9c\xc1\xe1\xd1e\... | 17009999 | 9631723 | b'beaverbuild.org' | 1681034207 | 23918041449 |
10000 rows × 7 columns
# list of dicts (one dict per block)
blocks_list = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_000_003,
    output_format='list',
)
blocks_list
[{'hash': b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c\xfb\xf4\x8f\x17\xe8\xb4\x17\x98\xd19Dt\xe6~\xc8\xa9~\x9f',
'author': b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
'number': 17000000,
'gas_used': 9160778,
'extra_data': b'by @builder0x69',
'timestamp': 1680911891,
'base_fee_per_gas': 20582738913},
{'hash': b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\xf1,!\xd3\xc1\xd0\x97b\xbe\xba\x02\xe2w\xb5\xb4\x12\xcd\x8f\xef',
'author': b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\xd6l^\x19',
'number': 17000001,
'gas_used': 9389175,
'extra_data': b'Made on the moon by Blocknative',
'timestamp': 1680911903,
'base_fee_per_gas': 19581179064},
{'hash': b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@\xcc\xc59\xe7\x8b(Y\x88\xfd\xf5\xcb\xca',
'author': b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\x0b\xad_\x97',
'number': 17000002,
'gas_used': 29993802,
'extra_data': b'Titan',
'timestamp': 1680911915,
'base_fee_per_gas': 18665624323}]
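The binary fields in the output above are raw bytes, which are hard to read directly. The following is a small sketch, using only the standard library, of how a row could be made human-readable. The field values are copied from the first block in the output; the helper name to_hex is just an illustration and not part of cryo.

```python
from datetime import datetime, timezone

# Subset of the first row from the list-of-dicts output above.
block = {
    'author': b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
    'number': 17000000,
    'extra_data': b'by @builder0x69',
    'timestamp': 1680911891,
}


def to_hex(raw: bytes) -> str:
    """Render raw bytes as a 0x-prefixed lowercase hex string."""
    return '0x' + raw.hex()


readable = {
    'number': block['number'],
    # 20-byte address rendered as hex
    'author': to_hex(block['author']),
    # extra_data is arbitrary bytes, so replace anything that is not valid UTF-8
    'extra_data': block['extra_data'].decode('utf-8', errors='replace'),
    # unix seconds converted to an ISO 8601 timestamp in UTC
    'timestamp': datetime.fromtimestamp(block['timestamp'], tz=timezone.utc).isoformat(),
}
print(readable)
```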
# dict of lists
blocks_dict = cryo.collect(
    'blocks',
    start_block=17_000_000,
    end_block=17_000_003,
    output_format='dict',
)
blocks_dict
{'hash': [b'\x96\xcf\xa0\xfb^P\xb0\xa3\xf6\xccv\xf3)\x9c\xfb\xf4\x8f\x17\xe8\xb4\x17\x98\xd19Dt\xe6~\xc8\xa9~\x9f',
b'\x81\xa2\x8cv\xb0\xe3\xc8g \xe8\x99\xe3\x88\xf1,!\xd3\xc1\xd0\x97b\xbe\xba\x02\xe2w\xb5\xb4\x12\xcd\x8f\xef',
b'\xc8EvA\xf7\xbet\x8cv\xe6\xa0\x8ax{~\xdaL_u@\xcc\xc59\xe7\x8b(Y\x88\xfd\xf5\xcb\xca'],
'author': [b'i\x0b\x9a\x9e\x9a\xa1\xc9\xdb\x99\x1cw!\xa9-5\x1d\xb4\xfa\xc9\x90',
b'\xba\xf6\xdc.dz\xeboQ\x0f\x9e1\x88V\xa1\xbc\xd6l^\x19',
b'H8\xb1\x06\xfc\xe9d{\xdf\x1exw\xbfs\xce\x8b\x0b\xad_\x97'],
'number': [17000000, 17000001, 17000002],
'gas_used': [9160778, 9389175, 29993802],
'extra_data': [b'by @builder0x69',
b'Made on the moon by Blocknative',
b'Titan'],
'timestamp': [1680911891, 1680911903, 1680911915],
'base_fee_per_gas': [20582738913, 19581179064, 18665624323]}
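The 'list' and 'dict' formats hold the same data in row-oriented and column-oriented form. As an illustration of how the two relate, here is a pure-python sketch that converts between them. The helper names are hypothetical and this is not how cryo produces either format.

```python
def rows_to_columns(rows: list) -> dict:
    """Convert a list of dicts (row-oriented) to a dict of lists (column-oriented)."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def columns_to_rows(columns: dict) -> list:
    """Convert a dict of lists (column-oriented) to a list of dicts (row-oriented)."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Two columns copied from the outputs above.
rows = [
    {'number': 17000000, 'gas_used': 9160778},
    {'number': 17000001, 'gas_used': 9389175},
    {'number': 17000002, 'gas_used': 29993802},
]

columns = rows_to_columns(rows)
print(columns)
print(columns_to_rows(columns) == rows)
```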
# parquet files
result = cryo.freeze(
    'blocks',
    start_block=17_000_000,
    end_block=17_010_000,
    verbose=False,
)
result['paths']['blocks']
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.parquet', '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.parquet']
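The paths above suggest that the 10,000 requested blocks were split into chunks of 1,000 blocks, with end_block excluded from the range (the last file ends at 17009999). The sketch below reproduces the file names under those assumptions. The naming template and the default chunk size are inferred from this output only, and chunk_filenames is a hypothetical helper, not a cryo function.

```python
def chunk_filenames(
    start_block: int,
    end_block: int,
    chunk_size: int = 1000,    # inferred from the output above
    network: str = 'ethereum',
    datatype: str = 'blocks',
    ext: str = 'parquet',
) -> list:
    """Return one file name per chunk; end_block is treated as exclusive."""
    names = []
    for chunk_start in range(start_block, end_block, chunk_size):
        # the last block in a chunk is inclusive in the file name
        chunk_end = min(chunk_start + chunk_size, end_block) - 1
        names.append(f'{network}__{datatype}__{chunk_start}_to_{chunk_end}.{ext}')
    return names


names = chunk_filenames(17_000_000, 17_010_000)
print(len(names))
print(names[0])
print(names[-1])
```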
# csv files
result = cryo.freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    csv=True,
    verbose=False,
)
result['paths']['blocks']
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.csv', '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.csv']
# json files
result = cryo.freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    json=True,
    verbose=False,
)
result['paths']['blocks']
['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.json', '/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.json']
blocks = await cryo.async_collect('blocks', start_block=17_000_000, end_block=17_000_003)
blocks
| hash | author | number | gas_used | extra_data | timestamp | base_fee_per_gas |
|---|---|---|---|---|---|---|
| binary | binary | u32 | u32 | binary | u32 | u64 |
| [binary data] | [binary data] | 17000000 | 9160778 | [binary data] | 1680911891 | 20582738913 |
| [binary data] | [binary data] | 17000001 | 9389175 | [binary data] | 1680911903 | 19581179064 |
| [binary data] | [binary data] | 17000002 | 29993802 | [binary data] | 1680911915 | 18665624323 |
result = await cryo.async_freeze(
    "blocks",
    start_block=17_000_000,
    end_block=17_010_000,
    verbose=False,
)
result['paths']
{'blocks': ['/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17003000_to_17003999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17004000_to_17004999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17005000_to_17005999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17006000_to_17006999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17007000_to_17007999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17008000_to_17008999.parquet',
'/home/storm/notebooks/cryo/ethereum__blocks__17009000_to_17009999.parquet']}
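Because the file names encode the block range, the returned paths can be checked for gaps without opening any files. This is a sketch that assumes the naming pattern seen in the output above (network, datatype, start, end, extension); the regular expression is inferred from these examples and may not cover every name cryo can produce.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from names like ethereum__blocks__17000000_to_17000999.parquet
FILENAME_PATTERN = re.compile(
    r'^(?P<network>[^_]+)__(?P<datatype>.+)__(?P<start>\d+)_to_(?P<end>\d+)\.(?P<ext>\w+)$'
)


def parse_path(path: str) -> dict:
    """Extract network, datatype, block range, and extension from a file path."""
    match = FILENAME_PATTERN.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f'unrecognized file name: {path}')
    info = match.groupdict()
    info['start'] = int(info['start'])
    info['end'] = int(info['end'])
    return info


def is_contiguous(paths: list) -> bool:
    """Return True if each file starts right after the previous one ends."""
    ranges = sorted((parse_path(p)['start'], parse_path(p)['end']) for p in paths)
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(ranges, ranges[1:]))


paths = [
    '/home/storm/notebooks/cryo/ethereum__blocks__17000000_to_17000999.parquet',
    '/home/storm/notebooks/cryo/ethereum__blocks__17001000_to_17001999.parquet',
    '/home/storm/notebooks/cryo/ethereum__blocks__17002000_to_17002999.parquet',
]
print(parse_path(paths[0]))
print(is_contiguous(paths))
```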
Below is a comparison between cryo and the highly-optimized rpc client of ctc.
start_block = 17_000_000
benchmarks = {
    'get_blocks': {},
    'get_transactions': {},
    'get_logs': {},
    'get_traces': {},
}
n_blocks = {}
n_blocks['get_blocks'] = 10_000
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_block_by_number(number)
    for number in range(start_block, start_block + n_blocks['get_blocks'])
]
blocks = await asyncio.gather(*coroutines)
benchmarks['get_blocks']['native_fetch'] = time.time() - t_start
pl.DataFrame(blocks)
benchmarks['get_blocks']['native_total'] = time.time() - t_start
t_start = time.time()
df = await cryo.async_collect(
    datatype='blocks',
    start_block=start_block,
    end_block=start_block + n_blocks['get_blocks'],
)
benchmarks['get_blocks']['cryo_total'] = time.time() - t_start
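The benchmark cells repeat the same pattern of recording a start time and subtracting it later. One possible way to package that pattern is a small context manager, sketched below. It uses time.perf_counter, which is a monotonic clock, instead of the time.time calls in the notebook; the name timed is hypothetical and the summed range is a stand-in workload.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results: dict, key: str):
    """Store the elapsed wall-clock seconds of the with-block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # recorded even if the block raises
        results[key] = time.perf_counter() - start


benchmark = {}
with timed(benchmark, 'demo_total'):
    # stand-in workload; in the notebook this would be the fetch being measured
    sum(range(100_000))

print(benchmark)
```

The same with-block also works around an await expression inside a notebook cell or coroutine.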
n_blocks['get_transactions'] = 1000
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_block_by_number(number, include_full_transactions=True)
    for number in range(start_block, start_block + n_blocks["get_transactions"])
]
results = await asyncio.gather(*coroutines)
txs = [tx for result in results for tx in result["transactions"]]
benchmarks["get_transactions"]["native_fetch"] = time.time() - t_start
pl.DataFrame(
    txs,
    infer_schema_length=1000,
)
benchmarks["get_transactions"]["native_total"] = time.time() - t_start
len(txs)
126359
t_start = time.time()
df = await cryo.async_collect(
    'transactions',
    start_block=start_block,
    end_block=start_block + n_blocks['get_transactions'],
    chunk_size=10000,
)
benchmarks['get_transactions']['cryo_total'] = time.time() - t_start
n_blocks['get_logs'] = 1000
t_start = time.time()
coroutines = [
    ctc.rpc.async_eth_get_logs(start_block=number, end_block=number)
    for number in range(start_block, start_block + n_blocks['get_logs'])
]
results = await asyncio.gather(*coroutines)
logs = [log for block_logs in results for log in block_logs]
benchmarks['get_logs']['native_fetch'] = time.time() - t_start
pl.DataFrame(logs)
benchmarks['get_logs']['native_total'] = time.time() - t_start
len(logs)
296866
t_start = time.time()
df = await cryo.async_collect(
    datatype='logs',
    blocks=[str(start_block) + ':' + str(start_block + n_blocks['get_logs'])],
    chunk_size=10000,
)
benchmarks['get_logs']['cryo_total'] = time.time() - t_start
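The blocks argument in the logs benchmark is built by string concatenation into a 'start:end' range. A small helper makes the intent easier to read. This sketch only reproduces the string that the expression above builds; block_range is a hypothetical name, not a cryo function.

```python
def block_range(start_block: int, n_blocks: int) -> str:
    """Build the 'start:end' string used for the blocks argument above."""
    return f'{start_block}:{start_block + n_blocks}'


start_block = 17_000_000
blocks_arg = [block_range(start_block, 1000)]
print(blocks_arg)
```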
n_blocks['get_traces'] = 300
t_start = time.time()
coroutines = [
    ctc.rpc.async_trace_block(number)
    for number in range(start_block, start_block + n_blocks['get_traces'])
]
block_traces = await asyncio.gather(*coroutines)
benchmarks['get_traces']['native_fetch'] = time.time() - t_start
pl.DataFrame([trace for block_trace in block_traces for trace in block_trace])
benchmarks['get_traces']['native_total'] = time.time() - t_start
t_start = time.time()
df = await cryo.async_collect(
    'traces',
    blocks=[str(start_block) + ':' + str(start_block + n_blocks['get_traces'])],
)
benchmarks['get_traces']['cryo_total'] = time.time() - t_start
df = pl.DataFrame(list(benchmarks.values()))
df = df.insert_at_idx(0, pl.Series('#_blocks', n_blocks.values()))
df = df.insert_at_idx(0, pl.Series('test', benchmarks.keys()))
df = df.with_columns(
    (pl.col('native_total') / pl.col('cryo_total')).alias('cryo_speed')
)
df = df.rename({column: column.replace('_', '\n') for column in df.columns})
df = df.rename(
    {
        column: column + '\n(secs)'
        for column in df.columns
        if column.endswith('total') or column.endswith('fetch')
    }
)
column_formats = {column: {'decimals': 2} for column in df.columns}
column_formats['#\nblocks']['decimals'] = 0
column_formats['cryo\nspeed']['percentage'] = True
toolstr.print_text_box('cryo benchmarks')
print('this is a comparison between cryo and a "native" python rpc client')
print('the native client is highly-optimized and uses msgspec under the hood')
rows = [
    ('fetch time:', 'time it takes to fetch data from server'),
    ('dataframe time:', 'time it takes to package data into dataframe'),
    ('total time:', 'fetch time + dataframe time'),
]
print()
toolstr.print_table(rows, column_justify=['right', 'left'], compact=True)
print()
toolstr.print_dataframe_as_table(df, column_formats=column_formats)
┌─────────────────┐
│ cryo benchmarks │
└─────────────────┘
this is a comparison between cryo and a "native" python rpc client
the native client is highly-optimized and uses msgspec under the hood
fetch time: time it takes to fetch data from server
dataframe time: time it takes to package data into dataframe
total time: fetch time + dataframe time
| test | # blocks | native fetch (secs) | native total (secs) | cryo total (secs) | cryo speed |
|---|---|---|---|---|---|
| get_blocks | 10,000 | 9.69 | 10.48 | 0.98 | 1,066.86% |
| get_transactions | 1,000 | 6.50 | 7.86 | 0.62 | 1,266.92% |
| get_logs | 1,000 | 2.22 | 4.85 | 1.16 | 416.93% |
| get_traces | 300 | 9.28 | 12.04 | 0.74 | 1,616.32% |
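The multi-line column headers in this report come from the two rename steps in the reporting code: underscores become newlines, then timing columns get a '(secs)' suffix. The sketch below traces those two steps on plain strings, without polars, assuming the column order produced by the two inserts at index 0.

```python
# Column names as they exist before the rename steps in the reporting code.
columns = ['test', '#_blocks', 'native_fetch', 'native_total', 'cryo_total', 'cryo_speed']

# Step 1: replace underscores with newlines so headers stack vertically.
stacked = [column.replace('_', '\n') for column in columns]

# Step 2: append a units line to the timing columns only.
final = [
    column + '\n(secs)' if column.endswith(('total', 'fetch')) else column
    for column in stacked
]

for name in final:
    print(repr(name))
```

This also shows why the format overrides in the reporting code use the keys '#\nblocks' and 'cryo\nspeed': those are the column names after renaming.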