Polars for Finance: High-Performance Expressions & Contexts

An in-depth guide to Polars’ expression engine, focusing on financial metrics, risk aggregations, and high-performance time-series transformations for quantitative research.
Data Science
Finance
Programming
Published

April 5, 2026

In quantitative finance, the ability to process massive tick-level or daily datasets quickly is a competitive advantage. While Python has long relied on Pandas, its largely single-threaded execution and memory overhead often become bottlenecks during heavy backtesting or risk simulations. Polars has emerged as a high-performance alternative, written in Rust and built on the Apache Arrow memory model.

The core of Polars’ efficiency lies in its Expression Engine. Unlike Pandas, where operations are typically executed eagerly, one step at a time, Polars separates the logic of what to calculate (Expressions) from where and how to calculate it (Contexts). In lazy mode, this lets the query optimizer inspect your entire chain of operations, prune unnecessary columns, and push filters down before anything runs. In both modes, the computation is parallelized in Rust across all available CPU cores, without being hindered by Python’s Global Interpreter Lock (GIL).

1. Loading Financial Data

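We start by loading OHLCV (Open, High, Low, Close, Volume) data for S&P 500 constituents. The Arrow IPC (Feather) format suits this kind of data well: it stores Arrow’s columnar layout directly, so data types and structure are preserved without the text parsing required by CSV or the decoding step required by Parquet. Uncompressed IPC files can also be memory-mapped for near-instantaneous loading; compressed files, like the one used here, are read normally instead (see the warning in the output below).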

import polars as pl
import polars.selectors as cs
from pathlib import Path

# Set root path to project directory (two levels above the working directory).
root = Path.cwd().parent.parent

# Load ticker data from an Arrow IPC file; read_ipc loads it straight into
# Polars' Arrow-based columns. This file is compressed, so Polars cannot
# memory-map it and falls back to a normal read (see the warning below).
df = pl.read_ipc(root / "assets" / "data" / "ticker_data.arrow")

# Sorting is critical for time-series operations like shifts and rolling windows.
df = df.sort(["ticker", "date"])

df.head(5)
Could not memory_map compressed IPC file, defaulting to normal read. Toggle off 'memory_map' to silence this warning.
shape: (5, 7)
date        ticker  open       high       low        close      volume
date        str     f32        f32        f32        f32        f32
2006-01-03  "A"     19.918699  20.025999  19.5728    19.9783    5.307088e6
2006-01-04  "A"     20.008101  20.1751    19.900801  20.032     4.195817e6
2006-01-05  "A"     19.9485    20.556801  19.9485    20.556801  4.835402e6
2006-01-06  "A"     20.574699  20.747601  20.3302    20.664101  6.146307e6
2006-01-09  "A"     20.664101  20.753599  20.527     20.6045    4.082859e6

2. Expressions: The Atomic Units of Logic

Expressions are the building blocks of Polars logic. They are lazily evaluated and represent a tree of operations. In a financial context, we can define modular expressions for standard metrics like log returns or volatility and reuse them across different DataFrames or pipelines.

One major advantage here is composability. Wrapping an expression in a function that takes a column name, as min_max_scale does below, lets you define the logic once and apply it to any column of a suitable data type.

# Log returns: ln(P_t / P_{t-1}). .over("ticker") ensures returns are
# calculated within each instrument, never across two tickers.
log_return = (pl.col("close") / pl.col("close").shift(1)).log().over("ticker")

# Realized volatility: annualized standard deviation of log returns.
# window_size=21 represents a typical trading month.
realized_vol = (log_return.rolling_std(window_size=21) * (252**0.5)).over(
    "ticker"
)


# Min-max scaling: a common normalization for machine learning features.
def min_max_scale(name: str) -> pl.Expr:
    col = pl.col(name)
    return (col - col.min()) / (col.max() - col.min())

3. Context: Selection & Feature Engineering

Contexts are the environments where expressions are executed. The two most common are select (creating a new DataFrame from expressions) and with_columns (adding or updating columns in an existing DataFrame).

For financial feature engineering, with_columns is the workhorse: all expressions passed to a single with_columns call are evaluated in parallel, so several technical indicators can be added in one step.

df = df.with_columns(
    # Append log returns and realized volatility to the DataFrame.
    ret=log_return,
    vol=realized_vol,
    # Typical price is a common input for indicators like Money Flow Index.
    typical_price=(pl.col("high") + pl.col("low") + pl.col("close")) / 3,
)

df.tail(5)
shape: (5, 10)
date        ticker  open        high        low         close       volume    ret        vol       typical_price
date        str     f32         f32         f32         f32         f32       f32        f32       f32
2026-04-22  "ZTS"   118.949997  119.910004  116.599998  117.519997  3.5146e6  -0.0056    0.272725  118.010002
2026-04-23  "ZTS"   117.190002  117.599998  114.949997  116.059998  4.5328e6  -0.012501  0.276069  116.203331
2026-04-24  "ZTS"   116.010002  117.050003  115.410004  116.870003  4.1851e6  0.006955   0.276144  116.443344
2026-04-27  "ZTS"   116.620003  119.68      116.599998  117.870003  3.1603e6  0.00852    0.277579  118.050003
2026-04-28  "ZTS"   117.230003  118.290001  116.080002  116.650002  2.9725e6  -0.010404  0.260076  117.006668

4. Context: Aggregation (group_by)

The group_by context performs split-apply-combine operations. Polars evaluates the aggregation expressions across multiple threads using vectorized kernels over Arrow’s columnar data, and in lazy mode the optimizer can also prune unused columns and push filters down before the grouping runs.

This is where we calculate portfolio-level or ticker-level risk metrics like Maximum Drawdown or the Sharpe Ratio.
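For reference, the aggregations in the code below correspond to these definitions, restated from the code: P_t is the closing price, r_t the daily log return in the ret column, T the number of non-null returns for a ticker, 252 the number of trading days per year, and the risk-free rate is zero. Polars' std() uses the sample (T-1) denominator by default, and annual_return is the annualized mean log return.

```latex
$$
\begin{aligned}
\bar r &= \frac{1}{T}\sum_{t=1}^{T} r_t,
\qquad
s_r = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(r_t - \bar r\right)^2} \\
\text{annual\_return} &= 252\,\bar r,
\qquad
\text{annual\_vol} = \sqrt{252}\, s_r \\
\text{max\_drawdown} &= \min_{t}\left(\frac{P_t}{\max_{\tau \le t} P_\tau} - 1\right) \\
\text{sharpe\_ratio} &= \sqrt{252}\,\frac{\bar r}{s_r}
\end{aligned}
$$
```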

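risk_summary = df.group_by("ticker").agg(
    # Annualized mean log return and annualized volatility (252 trading days).
    annual_return=(pl.col("ret").mean() * 252),
    annual_vol=(pl.col("ret").std() * (252**0.5)),
    # Max drawdown: worst decline from the running peak (cumulative maximum);
    # relies on rows being in date order within each ticker.
    max_drawdown=((pl.col("close") / pl.col("close").cum_max() - 1).min()),
    # Risk-adjusted return, assuming a 0% risk-free rate for simplicity.
    sharpe_ratio=(pl.col("ret").mean() / pl.col("ret").std()) * (252**0.5),
)

# Each ticker's metrics cover only its own history, so recently listed names
# with short samples can rank at the top.
risk_summary.sort("sharpe_ratio", descending=True).head(5)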
shape: (5, 5)
ticker  annual_return  annual_vol  max_drawdown  sharpe_ratio
str     f32            f32         f32           f32
"SNDK"  2.785047       0.98521     -0.475009     2.826855
"GEV"   1.023233       0.533674    -0.382856     1.917337
"Q"     0.743194       0.545531    -0.271231     1.362331
"CEG"   0.475289       0.490132    -0.507023     0.969717
"AVGO"  0.351236       0.378535    -0.483        0.927882

5. Power Selectors (polars.selectors)

Selectors (the polars.selectors module) let you target columns by their properties, such as name patterns or data type, instead of hardcoding every name. This is handy for stress testing or “what-if” analysis, where you might want to apply a shock to all price-related columns at once.

# Scenario analysis: apply a 1% upward shock to all OHLC columns.
price_shock = cs.starts_with("open", "high", "low", "close") * 1.01

df.select(
    "date",
    "ticker",
    # Apply the shock expression and append a suffix to the column names.
    price_shock.name.suffix("_shock"),
).head(3)
shape: (3, 6)
date        ticker  open_shock  high_shock  low_shock   close_shock
date        str     f32         f32         f32         f32
2006-01-03  "A"     20.117886   20.226259   19.768528   20.178083
2006-01-04  "A"     20.208181   20.376852   20.099808   20.232319
2006-01-05  "A"     20.147984   20.762369   20.147984   20.762369

6. Window Functions (over)

Window functions (over) are perhaps the most powerful tool for cross-sectional analysis. They allow you to compute group-level statistics (like a sector average or a market-cap weight) and project them back onto the original rows without performing a merge/join.

In quantitative research, this is used for z-scoring features across the universe or calculating relative strength.

df.with_columns(
    # Z-score of volume relative to the ticker's full history. Full-sample
    # statistics include future data: fine for exploration, but a source of
    # look-ahead bias in a backtest.
    vol_z=(pl.col("volume") - pl.col("volume").mean().over("ticker"))
    / pl.col("volume").std().over("ticker"),
    # Cross-sectional relative strength: price relative to the average price of
    # all tickers on that day (raw price levels differ in scale across tickers).
    rel_strength=pl.col("close") / pl.col("close").mean().over("date"),
).head(5)
shape: (5, 12)
date        ticker  open       high       low        close      volume      ret        vol   typical_price  vol_z     rel_strength
date        str     f32        f32        f32        f32        f32         f32        f32   f32            f32       f32
2006-01-03  "A"     19.918699  20.025999  19.5728    19.9783    5.307088e6  null       null  19.859035      1.077447  0.749611
2006-01-04  "A"     20.008101  20.1751    19.900801  20.032     4.195817e6  0.002684   null  20.035969      0.542916  0.748959
2006-01-05  "A"     19.9485    20.556801  19.9485    20.556801  4.835402e6  0.025861   null  20.354034      0.850562  0.766769
2006-01-06  "A"     20.574699  20.747601  20.3302    20.664101  6.146307e6  0.005206   null  20.580635      1.481119  0.764833
2006-01-09  "A"     20.664101  20.753599  20.527     20.6045    4.082859e6  -0.002888  null  20.628368      0.488583  0.754984

7. Time Series: Rolling Metrics & Asof Joins

Polars treats time series as a first-class use case. Its rolling expressions are optimized for windowed calculations, and the join_asof method solves the “as-of” join problem (matching a trade to the most recent quote) efficiently, often considerably faster than Pandas’ merge_asof.

df.with_columns(
    # Volume-Weighted Average Price (VWAP) over a rolling 20-day window of closes;
    # .over("ticker") stops windows from spanning two tickers.
    vwap=(
        (pl.col("close") * pl.col("volume")).rolling_sum(20)
        / pl.col("volume").rolling_sum(20)
    ).over("ticker")
).head(5)
shape: (5, 11)
date        ticker  open       high       low        close      volume      ret        vol   typical_price  vwap
date        str     f32        f32        f32        f32        f32         f32        f32   f32            f32
2006-01-03  "A"     19.918699  20.025999  19.5728    19.9783    5.307088e6  null       null  19.859035      null
2006-01-04  "A"     20.008101  20.1751    19.900801  20.032     4.195817e6  0.002684   null  20.035969      null
2006-01-05  "A"     19.9485    20.556801  19.9485    20.556801  4.835402e6  0.025861   null  20.354034      null
2006-01-06  "A"     20.574699  20.747601  20.3302    20.664101  6.146307e6  0.005206   null  20.580635      null
2006-01-09  "A"     20.664101  20.753599  20.527     20.6045    4.082859e6  -0.002888  null  20.628368      null

8. Reshaping for Correlation Analysis

Quant analysts often need to compute correlation matrices or covariance matrices for portfolio optimization. This requires pivoting “long” ticker data into a “wide” format where each column represents a different asset’s returns.

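# Transform data from long (tidy) format to wide format for multivariate
# analysis: one row per date, one column of returns per ticker.
wide_returns = df.pivot(
    index="date",
    on="ticker",
    values="ret",
).drop_nulls()
# Caveat: drop_nulls() keeps only dates on which every ticker has a return,
# so recently listed tickers shrink the sample to their short history.

# Correlation matrix of all return columns. DataFrame.corr delegates to
# NumPy's corrcoef; row i of the result corresponds to column i.
corr_matrix = wide_returns.select(cs.numeric()).corr()
corr_matrix.head(5)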
shape: (5, 501)
A         AAPL      ABBV      ABNB      ABT       …  ZBRA       ZTS
f64       f64       f64       f64       f64       …  f64        f64
1.0       0.252005  0.159886  0.387158  0.291319  …  0.274606   0.362176
0.252005  1.0       0.032607  0.353655  0.207605  …  0.188564   0.252188
0.159886  0.032607  1.0       0.025042  0.17823   …  -0.03683   -0.018306
0.387158  0.353655  0.025042  1.0       0.179758  …  0.384437   0.45244
0.291319  0.207605  0.17823   0.179758  1.0       …  -0.065155  0.275692

9. Performance Optimization: Categoricals & Streaming

When working with millions of rows, string columns (like tickers) can consume large amounts of memory. Polars provides a Categorical type (similar to Pandas’ category dtype) and an Enum type (for a fixed, known set of values); both store each distinct string once and represent rows as integer codes.

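For datasets that exceed RAM, Polars supports LazyFrames and a streaming engine. Instead of loading the whole file, Polars builds a query plan and executes it in batches, either collecting the result or writing it straight to disk with a sink method such as sink_parquet.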

  1. Memory Optimization: Tickers should almost always be Categorical or Enum.
  2. Trigger execution in a memory-efficient, batched manner using streaming.

Summary

Polars is not just a faster Pandas; it is a fundamentally different execution model. By combining Rust’s safety and speed, Apache Arrow’s memory efficiency, and a query optimizer, Polars lets you run many financial engineering workloads on a single laptop that might otherwise call for a distributed cluster.

  • Speed: Can be 10x-50x faster than Pandas for grouping and joins, depending on data size, workload, and hardware.
  • Memory: Typically a much lower footprint, thanks to Arrow’s columnar layout and zero-copy operations.
  • Syntax: A consistent, functional DSL that reduces the “copy-paste” bugs common in Pandas.
