Introduction
In the quant world, we’re always neck-deep in data. It’s not just about how fast you can process it, but whether your code is going to crash your server by eating up all the memory.
Python has some incredible tools for handling data streams efficiently. This post is all about iterators and generators—the “lazy” way to handle data that can save you a ton of memory and a lot of headaches.
Lists vs. Tuples: The Eager Iterables
Lists and tuples are the most common sequence types in Python. They are eager in the sense that all their elements are stored in memory at once. When you create a list of a million numbers, Python allocates memory for all of them.
This is fine for small to medium-sized datasets, but very large ones can lead to a MemoryError.
Key Differences
The primary difference between lists and tuples is mutability.
Lists are mutable, meaning you can change their content. You can add, remove, or change elements. This flexibility comes at a cost: lists typically require more memory to store the same number of elements compared to tuples. This is because Python allocates extra memory to accommodate future additions.
Tuples are immutable. Once a tuple is created, you cannot change its content. This immutability makes them slightly more memory-efficient than lists, and they can be marginally faster to create and iterate.
From a quant’s perspective, you can think of a tuple as a “frozen” or “read-only” list. Use tuples for data that should not change, like coordinates, configuration settings, or records from a database. If you need a collection that you will modify, a list is the way to go.
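To see the difference concretely, here’s a quick sketch using sys.getsizeof (exact byte counts vary across Python versions and platforms):

import sys

# Build a list incrementally, the way real code often does; CPython
# over-allocates on append so that repeated appends stay cheap.
grown = []
for i in range(1_000):
    grown.append(i)

frozen = tuple(grown)

print(sys.getsizeof(grown))   # list: element pointers plus spare capacity
print(sys.getsizeof(frozen))  # tuple: exactly the element pointers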
Iterators: The Lazy Approach
An iterator is an object that represents a stream of data. It produces one item at a time, only when requested. This “lazy” evaluation is incredibly memory-efficient. The iterator protocol in Python consists of two methods:
__iter__(): Returns the iterator object itself.
__next__(): Returns the next item from the stream. When there are no more items, it raises a StopIteration exception.
You can get an iterator from any iterable (like a list) using the iter() function.
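For example, you can drive the protocol by hand:

prices = [101.5, 102.0, 101.8]
it = iter(prices)  # ask the iterable for an iterator

print(next(it))  # 101.5
print(next(it))  # 102.0
print(next(it))  # 101.8
# One more next(it) would raise StopIteration -- catching that
# is exactly what a for loop does for you under the hood.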
Generators: Simplified Iterators
Writing a class with __iter__ and __next__ can be cumbersome. Generators provide a much simpler way to create iterators. A generator is a function that uses the yield keyword to return an item. When a generator function is called, it returns a generator object, which is a type of iterator.
Here’s a simple generator that produces a sequence of numbers:
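def number_stream(n):
    """Yield the integers 0 to n-1, one at a time."""
    for i in range(n):
        yield i

gen = number_stream(3)
print(next(gen))  # 0
print(next(gen))  # 1 -- execution resumed right after the last yield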
The state of the generator is saved between yield calls. This allows it to resume where it left off.
Example: Flattening a List of Lists
A common task is to flatten a list of lists into a single list. A generator is a perfect tool for this, especially when dealing with large datasets, as it avoids creating a new, large list in memory.
Imagine you have a list of trades, where each trade has a list of associated cashflows. You want to process all cashflows from all trades.
trades_cashflows = [
[10, 20, 30], # Cashflows for Trade 1
[15, 25], # Cashflows for Trade 2
[100, -10, 5], # Cashflows for Trade 3
110, # Simple payment doesn't need to be in a list!
]
def flatten(list_of_lists):
for item in list_of_lists:
if isinstance(item, list):
for subitem in item:
yield subitem
else:
yield item
# The generator does not hold all cashflows in memory
all_cashflows_generator = flatten(trades_cashflows)
for cf in all_cashflows_generator:
print(cf, end=" ")
# Output: 10 20 30 15 25 100 -10 5 110
This flatten generator is memory-efficient. It only needs to store one cashflow at a time, regardless of the total number of cashflows.
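As an aside, since Python 3.3 the inner loop can be delegated with yield from; this variant behaves identically:

def flatten(list_of_lists):
    for item in list_of_lists:
        if isinstance(item, list):
            yield from item  # delegate to the sub-list's own iterator
        else:
            yield item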
Memory Efficiency in Action
Let’s demonstrate the memory difference using the standard library’s tracemalloc module, comparing the memory usage of creating a list versus creating a generator. To keep the profiling boilerplate (starting tracemalloc, reading the traced memory, stopping it) out of the functions themselves, we’ll wrap it in a decorator, a Pythonic way to add common behavior around function calls.
import tracemalloc
from functools import wraps
def profile_memory(func):
"""A decorator to profile the memory usage of a function."""
@wraps(func)
def wrapper(*args, **kwargs):
tracemalloc.start()
result = func(*args, **kwargs)
current, peak = tracemalloc.get_traced_memory()
print(f"Function: {func.__name__}")
print(
f"Current memory usage is {current / 10**6:.6f}MB; Peak was {peak / 10**6:.6f}MB"
)
tracemalloc.stop()
return result
return wrapper
@profile_memory
def create_list(n):
"""This function creates a list of n numbers."""
return [i for i in range(n)]
@profile_memory
def create_generator(n):
"""This function creates a generator of n numbers."""
return (i for i in range(n))
n = 1_000_000
print("Profiling memory for list creation...")
my_list = create_list(n)
print("Profiling memory for generator creation...")
my_generator = create_generator(n)
# The generator itself is small. Let's profile consuming it.
@profile_memory
def consume_generator(gen):
return list(gen)
print("Profiling memory for generator consumption...")
consumed_list = consume_generator(my_generator)

Profiling memory for list creation...
Function: create_list
Current memory usage is 40.441798MB; Peak was 40.441838MB
Profiling memory for generator creation...
Function: create_generator
Current memory usage is 0.000392MB; Peak was 0.000392MB
Profiling memory for generator consumption...
Function: consume_generator
Current memory usage is 40.442983MB; Peak was 40.442983MB
When you run this script, the decorator will handle the memory profiling for each function call. You will see that creating the list consumes a significant amount of memory, while creating the generator uses a negligible amount. The final step shows that consuming the generator into a list uses a similar amount of memory as creating the list in the first place, proving that the memory is only used when the values are actually needed.
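To drive the point home, consuming the stream with an aggregate like sum() instead of materializing a list keeps the peak tiny, because only one value is alive at a time. A quick sketch reusing the decorator from above:

@profile_memory
def sum_generator(n):
    # No intermediate list is built; values are produced and discarded one by one
    return sum(i for i in range(n))

print("Profiling memory for lazy aggregation...")
total = sum_generator(n)  # peak should stay in the kilobyte range, not ~40MB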
Essential Iteration Tools: zip, sorted, enumerate
Python’s standard library offers several built-in functions that are indispensable for iteration tasks in finance.
enumerate(iterable, start=0): When processing a series of cashflows, you might need to know the period number for each payment. enumerate is perfect for this.

zip(*iterables): This function is very useful for combining different streams of data. For example, you can pair trade dates with their corresponding notionals, producing output like this (a sketch of both tools follows the sorted example below):

On 2025-11-05, we traded a notional of 1,000,000
On 2025-11-06, we traded a notional of 2,500,000
On 2025-11-07, we traded a notional of 500,000

sorted(iterable, key=None, reverse=False): Sorting is a common task. You might want to sort a list of trades by maturity date or notional.

from dataclasses import dataclass
from datetime import date

@dataclass
class Trade:
    trade_id: str
    maturity: date
    notional: float

trades = [
    Trade('T1', date(2026, 12, 31), 10_000_000),
    Trade('T2', date(2025, 12, 31), 5_000_000),
    Trade('T3', date(2027, 12, 31), 15_000_000),
]

# Sort trades by maturity date
sorted_by_maturity = sorted(trades, key=lambda t: t.maturity)
for trade in sorted_by_maturity:
    print(trade)

Trade(trade_id='T2', maturity=datetime.date(2025, 12, 31), notional=5000000)
Trade(trade_id='T1', maturity=datetime.date(2026, 12, 31), notional=10000000)
Trade(trade_id='T3', maturity=datetime.date(2027, 12, 31), notional=15000000)
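Here is a minimal sketch of the first two tools; the cashflow figures are illustrative, and the trade dates and notionals are chosen to match the zip output shown above:

from datetime import date

# enumerate: number each cashflow period, starting at 1
cashflows = [50, 50, 50, 1_050]
for period, cf in enumerate(cashflows, start=1):
    print(f"Period {period}: cashflow of {cf}")

# zip: pair trade dates with their corresponding notionals
trade_dates = [date(2025, 11, 5), date(2025, 11, 6), date(2025, 11, 7)]
notionals = [1_000_000, 2_500_000, 500_000]
for trade_date, notional in zip(trade_dates, notionals):
    print(f"On {trade_date}, we traded a notional of {notional:,}")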
The itertools Module: A Treasure Trove for Financial Iteration
The itertools module is a gem in the Python standard library, providing a collection of fast, memory-efficient tools for working with iterators. For financial applications, where we often deal with time series, cashflow streams, and simulations, these tools are particularly powerful.
itertools.chain(*iterables) and itertools.chain.from_iterable(iterable): Often you need to process items from several sequences in a row. chain lets you treat them as a single, continuous stream without merging them in memory. A great example is pricing a fixed-to-float swap leg, where the initial coupon payments are fixed and the later ones are floating. chain.from_iterable is a useful variant that takes a single iterable of iterables.

import itertools
import random

# First 2 years have fixed coupons
fixed_leg = [50, 50, 50, 50]
# Next 3 years have floating coupons (randomized for demonstration)
floating_leg = [50 + random.uniform(-5, 5) for _ in range(6)]

# We have a list of cashflow legs
bond_legs = [fixed_leg, floating_leg]

# We can use chain.from_iterable to chain them together
full_swap_leg = itertools.chain.from_iterable(bond_legs)

print("Full cashflow stream for the leg:")
for cf in full_swap_leg:
    print(f"{cf:.2f}", end=' ')

Full cashflow stream for the leg:
50.00 50.00 50.00 50.00 54.75 46.22 49.54 48.24 46.38 50.59

itertools.accumulate(iterable[, func]): This is ideal for calculating cumulative sums or running products. A common use case is the cumulative P&L of a trading strategy, producing a running total like this (a minimal sketch follows):

[150, -50, 0, 300, 200]
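A minimal sketch that reproduces the cumulative P&L above (the daily P&L figures are illustrative):

import itertools

daily_pnl = [150, -200, 50, 300, -100]  # illustrative daily P&L
cumulative_pnl = list(itertools.accumulate(daily_pnl))
print(cumulative_pnl)  # [150, -50, 0, 300, 200]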
accumulate can also model more complex recurrence relations, like the amortization of a loan. Let’s calculate the outstanding balance of a fixed-payment annuity until it’s paid off.

import itertools

def outstanding_balance(balance, payment, rate):
    return balance * (1 + rate) - payment

initial_notional = 1_000
interest_rate = 0.01  # Monthly rate (around 12% annually)
monthly_payment = 50

# Create an infinite stream of payments
payments = itertools.repeat(monthly_payment)

# The first argument to the lambda is the accumulated value (the balance);
# the second is the next item from the iterable (the payment)
balances = itertools.accumulate(
    payments,
    lambda balance, pmt: outstanding_balance(balance, pmt, interest_rate),
    initial=initial_notional,
)

# takewhile cuts off the infinite stream as soon as the balance drops to zero or below
amortization_schedule = itertools.takewhile(lambda balance: balance > 0, balances)

for i, balance in enumerate(amortization_schedule):
    print(f"Month {i+1}: {balance:,.2f}")

# Note: A proper amortization schedule would have a final smaller payment.
# This example is to demonstrate the power of itertools.

Month 1: 1,000.00
Month 2: 960.00
Month 3: 919.60
Month 4: 878.80
Month 5: 837.58
Month 6: 795.96
Month 7: 753.92
Month 8: 711.46
Month 9: 668.57
Month 10: 625.26
Month 11: 581.51
Month 12: 537.33
Month 13: 492.70
Month 14: 447.63
Month 15: 402.10
Month 16: 356.12
Month 17: 309.69
Month 18: 262.78
Month 19: 215.41
Month 20: 167.56
Month 21: 119.24
Month 22: 70.43
Month 23: 21.14

itertools.pairwise(iterable): As we’ve seen, this is ideal for working with consecutive items in a sequence. Calculating year fractions for a swap leg is a prime example.

import itertools
from datetime import date

payment_dates = [date(2025, 1, 15), date(2025, 7, 15), date(2026, 1, 15)]

def year_fraction(start, end):
    return (end - start).days / 365.25

for start, end in itertools.pairwise(payment_dates):
    yf = year_fraction(start, end)
    print(f"Period: {start} to {end}, Year Fraction: {yf:.4f}")

Period: 2025-01-15 to 2025-07-15, Year Fraction: 0.4956
Period: 2025-07-15 to 2026-01-15, Year Fraction: 0.5038

itertools.cycle(iterable): Repeats a sequence indefinitely. This can be used to cycle through a set of risk scenarios or apply a repeating pattern of market data shocks, as in the sketch below.
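A minimal sketch (the shock pattern and days are made up for illustration):

import itertools

# Apply a repeating three-day pattern of rate shocks (in basis points)
shocks_bp = itertools.cycle([+10, 0, -10])
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# zip stops at the shorter input, so the infinite cycle is safe here
for day, shock in zip(days, shocks_bp):
    print(f"{day}: shock of {shock:+d}bp")
# Output: +10, +0, -10, then the pattern restarts: +10, +0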
Conclusion
Understanding the difference between being “eager” (grabbing everything at once) and “lazy” (grabbing only what you need) is one of those leveling-up moments in Python. By leaning on iterators, generators, and the itertools library, you can write code that’s not just more efficient, but also much cleaner and easier to reason about.
The complete Python script with all the examples from this post is available for download and experimentation. You can get it here: iteration.py.