| sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|
| f64 | f64 | f64 | f64 | str |
| 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
Polars separates Expressions (the what) from Contexts (the where). This architecture allows the engine to optimize queries globally before execution.
1. Loading the Data
2. Expressions: The Building Blocks
Expressions are the atomic units of logic. They are lazily evaluated and can be chained indefinitely.
3. Power Selectors (polars.selectors)
Selectors target columns based on their properties, types, or names.
| sepal_width | petal_length | petal_width |
|---|---|---|
| f64 | f64 | f64 |
| 350.0 | 140.0 | 20.0 |
| 300.0 | 140.0 | 20.0 |
| 320.0 | 130.0 | 20.0 |
4. Context: Selection (select)
In the select context, every expression must return either a result of the same length or a scalar (which is broadcast to match).
| species | mean_petal | sorted_sepal |
|---|---|---|
| str | f64 | f64 |
| "setosa" | 3.758 | 4.9 |
| "setosa" | 3.758 | 4.8 |
| "setosa" | 3.758 | 4.3 |
5. Context: Modification (with_columns)
Used to append new columns or overwrite existing ones while keeping the rest of the DataFrame.
| sepal_length | sepal_width | petal_length | petal_width | species | is_large | max_dim | sepal_sum |
|---|---|---|---|---|---|---|---|
| f64 | f64 | f64 | f64 | str | bool | f64 | f64 |
| 5.1 | 3.5 | 1.4 | 0.2 | "setosa" | false | 5.1 | 8.6 |
| 4.9 | 3.0 | 1.4 | 0.2 | "setosa" | false | 4.9 | 7.9 |
| 4.7 | 3.2 | 1.3 | 0.2 | "setosa" | false | 4.7 | 7.9 |
6. Context: Aggregation (group_by)
Expressions in agg are evaluated independently for each group.
```python
import polars.selectors as cs

df.group_by("species").agg(
    # Count rows
    pl.len().alias("count"),
    # Multiple aggregations on multiple columns via selectors
    cs.numeric().mean().name.prefix("avg_"),
    # Aggregation into a list (very powerful)
    petal_lengths = pl.col("petal_length").implode(),
    # Complex: mean of the top 3 values per group
    top_3_mean = pl.col("sepal_length").sort(descending=True).head(3).mean(),
)
```

| species | count | avg_sepal_length | avg_sepal_width | avg_petal_length | avg_petal_width | petal_lengths | top_3_mean |
|---|---|---|---|---|---|---|---|
| str | u32 | f64 | f64 | f64 | f64 | list[f64] | f64 |
| "versicolor" | 50 | 5.936 | 2.77 | 4.26 | 1.326 | [4.7, 4.5, … 4.1] | 6.9 |
| "setosa" | 50 | 5.006 | 3.428 | 1.462 | 0.246 | [1.4, 1.4, … 1.4] | 5.733333 |
| "virginica" | 50 | 6.588 | 2.974 | 5.552 | 2.026 | [6.0, 5.1, … 5.1] | 7.766667 |
7. Window Functions (over)
Window functions allow you to perform group-level calculations without collapsing the rows.
```python
df.with_columns(
    # Rank within species based on petal length
    rank_in_species = pl.col("petal_length").rank("dense", descending=True).over("species"),
    # Difference from the group mean
    diff_from_group = pl.col("sepal_length") - pl.col("sepal_length").mean().over("species"),
    # Cumulative sum within species
    cum_sum = pl.col("sepal_length").cum_sum().over("species"),
).head(3)
```

| sepal_length | sepal_width | petal_length | petal_width | species | rank_in_species | diff_from_group | cum_sum |
|---|---|---|---|---|---|---|---|
| f64 | f64 | f64 | f64 | str | u32 | f64 | f64 |
| 5.1 | 3.5 | 1.4 | 0.2 | "setosa" | 5 | 0.094 | 5.1 |
| 4.9 | 3.0 | 1.4 | 0.2 | "setosa" | 5 | -0.106 | 10.0 |
| 4.7 | 3.2 | 1.3 | 0.2 | "setosa" | 6 | -0.306 | 14.7 |
8. Complex Types: Lists & Structs
Polars has first-class support for nested data.
```python
# Create a struct (a row-wise grouping of columns)
df_struct = df.with_columns(
    dims = pl.struct("sepal_length", "sepal_width")
)

# Extract a field from the struct
df_struct.select(
    pl.col("dims").struct.field("sepal_length")
).head(3)
```
```python
# List mastery: list.eval() runs a full expression against each list
df.group_by("species").agg(
    pl.col("petal_length").implode()
).with_columns(
    # For every list: sort descending, take the top 2, and average them
    top_2_avg = pl.col("petal_length").list.eval(
        pl.element().sort(descending=True).head(2).mean()
    )
)
```

| species | petal_length | top_2_avg |
|---|---|---|
| str | list[f64] | list[f64] |
| "setosa" | [1.4, 1.4, … 1.4] | [1.9] |
| "virginica" | [6.0, 5.1, … 5.1] | [6.8] |
| "versicolor" | [4.7, 4.5, … 4.1] | [5.05] |
9. Time Series Mastery: asof & Rolling
Polars is extremely fast at time-series operations.
```python
from datetime import datetime, timedelta

# Set up time data: one timestamp per row, one day apart
ts_df = df.with_columns(
    time = pl.datetime_range(
        datetime(2023, 1, 1),
        datetime(2023, 1, 1) + timedelta(days=149),
        "1d",
        eager=True,
    )
)

# Rolling window: 3-day mean of petal length
ts_df.with_columns(
    rolling_mean = pl.col("petal_length").rolling_mean_by(window_size="3d", by="time")
).head(5)
```
```python
# join_asof: join on the closest match (strategy: backward / forward / nearest)
quotes = pl.DataFrame({
    "time": [datetime(2023, 1, 1, 10, 0), datetime(2023, 1, 1, 10, 5)],
    "price": [100.0, 101.0],
})
trades = pl.DataFrame({"time": [datetime(2023, 1, 1, 10, 2)]})

trades.join_asof(quotes, on="time", strategy="backward")
```

| time | price |
|---|---|
| datetime[μs] | f64 |
| 2023-01-01 10:02:00 | 100.0 |
10. Reshaping: pivot and unpivot
Convert data between wide and long formats.
| species | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 |
| "setosa" | 5.006 | 3.428 | 1.462 | 0.246 |
| "versicolor" | 5.936 | 2.77 | 4.26 | 1.326 |
| "virginica" | 6.588 | 2.974 | 5.552 | 2.026 |
11. Set Operations: semi and anti joins
Filter one DataFrame by another without duplicating rows or merging columns.
| sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|
| f64 | f64 | f64 | f64 | str |
| 7.0 | 3.2 | 4.7 | 1.4 | "versicolor" |
| 6.4 | 3.2 | 4.5 | 1.5 | "versicolor" |
| 6.9 | 3.1 | 4.9 | 1.5 | "versicolor" |
12. Optimization: Categorical vs Enum
Low-cardinality string columns should almost always be encoded. Enum requires the full category set up front and is faster; Categorical discovers its categories dynamically as data arrives.
| sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|
| f64 | f64 | f64 | f64 | enum |
| 5.1 | 3.5 | 1.4 | 0.2 | "setosa" |
| 4.9 | 3.0 | 1.4 | 0.2 | "setosa" |
| 4.7 | 3.2 | 1.3 | 0.2 | "setosa" |
13. Missing Data: Interpolate & Coalesce
```python
# coalesce: take the first non-null value across several columns
messy = pl.DataFrame({"a": [1, None, None], "b": [None, 2, None], "c": [None, None, 3]})
messy.select(filled = pl.coalesce("a", "b", "c"))

# interpolate: linear fill for missing numeric data
df_missing = pl.DataFrame({"val": [1.0, None, 3.0]})
df_missing.select(pl.col("val").interpolate())
```

| val |
|---|
| f64 |
| 1.0 |
| 2.0 |
| 3.0 |
14. Concatenation: pl.concat
Combine DataFrames vertically, horizontally, or diagonally.
| species | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 |
| "setosa" | 6.2 | 3.4 | 5.4 | 2.3 |
| "setosa" | 5.9 | 3.0 | 5.1 | 1.8 |
15. Out-of-Core & Partitioning
```python
# Streaming flow for massive data: scan lazily, sink without materializing
(
    pl.scan_csv("huge.csv")
    .filter(pl.col("val") > 0)
    .sink_csv("output.csv")
)

# Partitioned writing (data-lake pattern)
# Writes files into folders like: output_lake/species=setosa/data.parquet
df.write_parquet("output_lake", use_pyarrow=True,
                 pyarrow_options={"partition_cols": ["species"]})
```

Download the companion script here.