Introduction
In the world of quantitative finance, we are constantly dealing with vast amounts of data. Processing this data efficiently is not just a matter of speed, but also of memory management. Python, with its elegant and intuitive syntax, provides powerful tools for handling data streams. This post, part of the Python Academy series, delves into the art of iteration, exploring the fundamental concepts of iterators and generators, and how they can help us write more memory-efficient code.
For a mathematician, an iteration is a process of repeating a set of instructions. In programming, this concept is embodied in loops and iterables. While a for loop over a list might seem straightforward, there’s a lot more happening under the hood. Understanding these mechanics is key to mastering Python for data-intensive tasks.
Lists vs. Tuples: The Eager Iterables
Lists and tuples are the most common sequence types in Python. They are eager in the sense that all their elements are stored in memory at once. When you create a list of a million numbers, Python allocates memory for all of them.
This is fine for small to medium-sized datasets, but very large ones can exhaust available memory and raise a MemoryError.
Key Differences
The primary difference between lists and tuples is mutability.
Lists are mutable, meaning you can change their content. You can add, remove, or change elements. This flexibility comes at a cost: lists typically require more memory to store the same number of elements compared to tuples. This is because Python allocates extra memory to accommodate future additions.
Tuples are immutable. Once a tuple is created, you cannot change its content. This immutability makes them slightly more memory-efficient and faster to access than lists.
From a quant’s perspective, you can think of a tuple as a “frozen” or “read-only” list. Use tuples for data that should not change, like coordinates, configuration settings, or records from a database. If you need a collection that you will modify, a list is the way to go.
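A quick sanity check of these differences, as an illustrative sketch (exact byte counts vary by Python version and platform):

```python
import sys

prices_list = [100.5, 101.2, 99.8, 100.1]
prices_tuple = (100.5, 101.2, 99.8, 100.1)

# The tuple carries less per-object overhead than the list
print(sys.getsizeof(prices_list))   # e.g. 88 bytes on 64-bit CPython (varies)
print(sys.getsizeof(prices_tuple))  # e.g. 72 bytes (varies)

# Tuples are immutable: attempting to assign raises a TypeError
try:
    prices_tuple[0] = 100.0
except TypeError as e:
    print("Immutable:", e)
```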
Iterators: The Lazy Approach
An iterator is an object that represents a stream of data. It produces one item at a time, only when requested. This “lazy” evaluation is incredibly memory-efficient. The iterator protocol in Python consists of two methods:
- __iter__(): Returns the iterator object itself.
- __next__(): Returns the next item from the stream. When there are no more items, it raises a StopIteration exception.
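A hand-written iterator implementing this protocol might look like the following (an illustrative sketch; the class name is our own):

```python
class Countdown:
    """An iterator that counts down from n to 1."""

    def __init__(self, n):
        self.current = n

    def __iter__(self):
        # An iterator is its own iterator
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
```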
You can get an iterator from any iterable (like a list) using the iter() function.
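For example (a minimal sketch):

```python
prices = [101.2, 99.8, 100.5]

it = iter(prices)  # obtain an iterator from the list
print(next(it))    # 101.2
print(next(it))    # 99.8
print(next(it))    # 100.5
# One more next(it) would raise StopIteration
```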
Generators: Simplified Iterators
Writing a class with __iter__ and __next__ can be cumbersome. Generators provide a much simpler way to create iterators. A generator is a function that uses the yield keyword to return an item. When a generator function is called, it returns a generator object, which is a type of iterator.
Here’s a simple generator that produces a sequence of numbers:
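A minimal sketch of such a generator (the original snippet did not survive in the text, so this reconstruction is illustrative):

```python
def count_up_to(n):
    """Yield the integers from 1 up to and including n."""
    i = 1
    while i <= n:
        yield i  # execution pauses here until the next item is requested
        i += 1

counter = count_up_to(3)
print(next(counter))  # 1
print(next(counter))  # 2
```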
The state of the generator is saved between yield calls. This allows it to resume where it left off.
Example: Flattening a List of Lists
A common task is to flatten a list of lists into a single list. A generator is a perfect tool for this, especially when dealing with large datasets, as it avoids creating a new, large list in memory.
Imagine you have a list of trades, where each trade has a list of associated cashflows. You want to process all cashflows from all trades.
```python
trades_cashflows = [
    [10, 20, 30],   # Cashflows for Trade 1
    [15, 25],       # Cashflows for Trade 2
    [100, -10, 5],  # Cashflows for Trade 3
    110,            # Simple payment doesn't need to be in a list!
]

def flatten(list_of_lists):
    for item in list_of_lists:
        if isinstance(item, list):
            for subitem in item:
                yield subitem
        else:
            yield item

# The generator does not hold all cashflows in memory
all_cashflows_generator = flatten(trades_cashflows)
for cf in all_cashflows_generator:
    print(cf, end=" ")

# Output: 10 20 30 15 25 100 -10 5 110
```
This flatten generator is memory-efficient. It only needs to store one cashflow at a time, regardless of the total number of cashflows.
Memory Efficiency in Action
Let's demonstrate the memory difference using the standard library's tracemalloc module, comparing the memory usage of creating a list versus creating a generator. To keep the analysis clean, we wrap the profiling boilerplate (starting and stopping tracemalloc) in a decorator, a Pythonic way to share common setup and teardown code across functions.
```python
import tracemalloc
from functools import wraps

def profile_memory(func):
    """A decorator to profile the memory usage of a function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        result = func(*args, **kwargs)
        current, peak = tracemalloc.get_traced_memory()
        print(f"Function: {func.__name__}")
        print(
            f"Current memory usage is {current / 10**6:.6f}MB; Peak was {peak / 10**6:.6f}MB"
        )
        tracemalloc.stop()
        return result
    return wrapper

@profile_memory
def create_list(n):
    """This function creates a list of n numbers."""
    return [i for i in range(n)]

@profile_memory
def create_generator(n):
    """This function creates a generator of n numbers."""
    return (i for i in range(n))
```
```python
n = 1_000_000

print("Profiling memory for list creation...")
my_list = create_list(n)

print("Profiling memory for generator creation...")
my_generator = create_generator(n)

# The generator itself is small. Let's profile consuming it.
@profile_memory
def consume_generator(gen):
    return list(gen)

print("Profiling memory for generator consumption...")
consumed_list = consume_generator(my_generator)
```

Output:

```
Profiling memory for list creation...
Function: create_list
Current memory usage is 40.442349MB; Peak was 40.442389MB
Profiling memory for generator creation...
Function: create_generator
Current memory usage is 0.000392MB; Peak was 0.000392MB
Profiling memory for generator consumption...
Function: consume_generator
Current memory usage is 40.443183MB; Peak was 40.443183MB
```
When you run this script, the decorator will handle the memory profiling for each function call. You will see that creating the list consumes a significant amount of memory, while creating the generator uses a negligible amount. The final step shows that consuming the generator into a list uses a similar amount of memory as creating the list in the first place, proving that the memory is only used when the values are actually needed.
Essential Iteration Tools: zip, sorted, enumerate
Python’s standard library offers several built-in functions that are indispensable for iteration tasks in finance.
enumerate(iterable, start=0): When processing a series of cashflows, you might need to know the period number for each payment. enumerate is perfect for this.

zip(*iterables): This function is incredibly useful for combining different streams of data. For example, you can pair trade dates with their corresponding notionals:

```
On 2025-11-05, we traded a notional of 1,000,000
On 2025-11-06, we traded a notional of 2,500,000
On 2025-11-07, we traded a notional of 500,000
```

sorted(iterable, key=None, reverse=False): Sorting is a common task. You might want to sort a list of trades by maturity date or notional.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Trade:
    trade_id: str
    maturity: date
    notional: float

trades = [
    Trade('T1', date(2026, 12, 31), 10_000_000),
    Trade('T2', date(2025, 12, 31), 5_000_000),
    Trade('T3', date(2027, 12, 31), 15_000_000),
]

# Sort trades by maturity date
sorted_by_maturity = sorted(trades, key=lambda t: t.maturity)
for trade in sorted_by_maturity:
    print(trade)
```

Output:

```
Trade(trade_id='T2', maturity=datetime.date(2025, 12, 31), notional=5000000)
Trade(trade_id='T1', maturity=datetime.date(2026, 12, 31), notional=10000000)
Trade(trade_id='T3', maturity=datetime.date(2027, 12, 31), notional=15000000)
```
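The enumerate and zip snippets themselves did not survive in the text; the following is a hedged reconstruction consistent with the printed output above (variable names and the enumerate print format are our own):

```python
from datetime import date

# enumerate: number each cashflow period, starting at 1
cashflows = [50, 50, 1050]
for period, cf in enumerate(cashflows, start=1):
    print(f"Period {period}: cashflow of {cf}")

# zip: pair trade dates with their corresponding notionals
trade_dates = [date(2025, 11, 5), date(2025, 11, 6), date(2025, 11, 7)]
notionals = [1_000_000, 2_500_000, 500_000]
for d, notional in zip(trade_dates, notionals):
    print(f"On {d}, we traded a notional of {notional:,}")
```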
The itertools Module: A Treasure Trove for Financial Iteration
The itertools module is a gem in the Python standard library, providing a collection of fast, memory-efficient tools for working with iterators. For financial applications, where we often deal with time series, cashflow streams, and simulations, these tools are particularly powerful.
itertools.chain(*iterables) and itertools.chain.from_iterable(iterable): Often you need to process items from several sequences in a row. chain lets you treat them as a single, continuous stream without merging them in memory. A great example is pricing a fixed-to-float swap leg, where initial coupon payments are fixed, and later ones are floating. chain.from_iterable is a useful variant that takes a single iterable of iterables.

```python
import itertools
import random

# First 2 years have fixed coupons
fixed_leg = [50, 50, 50, 50]
# Next 3 years have floating coupons (calculated for demonstration)
floating_leg = [50 + random.uniform(-5, 5) for _ in range(6)]

# We have a list of cashflow legs
bond_legs = [fixed_leg, floating_leg]

# We can use chain.from_iterable to chain them together
full_swap_leg = itertools.chain.from_iterable(bond_legs)

print("Full cashflow stream for the leg:")
for cf in full_swap_leg:
    print(f"{cf:.2f}", end=' ')
```

Output (the floating coupons are random, so your values will differ):

```
Full cashflow stream for the leg:
50.00 50.00 50.00 50.00 52.93 47.98 54.73 54.32 46.73 49.92
```

itertools.accumulate(iterable[, func]): This is perfect for calculating cumulative sums or running products. A common use case is calculating the cumulative P&L of a trading strategy; accumulating a stream of daily P&Ls might produce, for example:

```
[150, -50, 0, 300, 200]
```

It can also model more complex recurrence relations, like the amortization of a loan. Let's calculate the outstanding balance of a fixed-payment annuity until it's paid off.
```python
import itertools

def outstanding_balance(balance, payment, rate):
    return balance * (1 + rate) - payment

initial_notional = 1_000
interest_rate = 0.01  # Monthly rate (around 12% annually)
monthly_payment = 50

# Create an infinite stream of payments
payments = itertools.repeat(monthly_payment)

# The first argument to the lambda is the accumulated value (the balance)
# The second argument is the next item from the iterable (the payment)
balances = itertools.accumulate(
    payments,
    lambda balance, pmt: outstanding_balance(balance, pmt, interest_rate),
    initial=initial_notional,
)

# takewhile iterates until the loan is paid off: it stops at the first
# balance <= 0, which is excluded from the output
amortization_schedule = itertools.takewhile(lambda balance: balance > 0, balances)

for i, balance in enumerate(amortization_schedule):
    print(f"Month {i+1}: {balance:,.2f}")

# Note: A proper amortization schedule would have a final smaller payment.
# This example is to demonstrate the power of itertools.
```

Output:

```
Month 1: 1,000.00
Month 2: 960.00
Month 3: 919.60
Month 4: 878.80
Month 5: 837.58
Month 6: 795.96
Month 7: 753.92
Month 8: 711.46
Month 9: 668.57
Month 10: 625.26
Month 11: 581.51
Month 12: 537.33
Month 13: 492.70
Month 14: 447.63
Month 15: 402.10
Month 16: 356.12
Month 17: 309.69
Month 18: 262.78
Month 19: 215.41
Month 20: 167.56
Month 21: 119.24
Month 22: 70.43
Month 23: 21.14
```

itertools.pairwise(iterable): As we've seen, this is ideal for working with consecutive items in a sequence. Calculating year fractions for a swap leg is a prime example.

```python
import itertools
from datetime import date

payment_dates = [date(2025, 1, 15), date(2025, 7, 15), date(2026, 1, 15)]

def year_fraction(start, end):
    return (end - start).days / 365.25

for start, end in itertools.pairwise(payment_dates):
    yf = year_fraction(start, end)
    print(f"Period: {start} to {end}, Year Fraction: {yf:.4f}")
```

Output:

```
Period: 2025-01-15 to 2025-07-15, Year Fraction: 0.4956
Period: 2025-07-15 to 2026-01-15, Year Fraction: 0.5038
```

itertools.cycle(iterable): Repeats a sequence indefinitely. This can be used to cycle through a set of risk scenarios or apply a repeating pattern of market data shocks.
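A minimal sketch of cycle in that spirit (the scenario names are our own, and islice bounds the infinite stream):

```python
import itertools

# A hypothetical repeating pattern of stress scenarios
scenarios = ["rates +100bp", "rates -100bp", "FX +10%"]

# cycle repeats the scenarios forever; islice takes only the first 7
for day, scenario in enumerate(itertools.islice(itertools.cycle(scenarios), 7), start=1):
    print(f"Day {day}: apply scenario '{scenario}'")
```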
Conclusion
Understanding the difference between eager and lazy evaluation is crucial for writing efficient Python code, especially in quantitative finance where large datasets are the norm. By mastering iterators, generators, and the powerful tools in itertools, you can write code that is not only more memory-efficient but also more expressive and elegant. The built-in functions like enumerate, zip, and sorted further enhance your ability to work with iterable data effectively.
The complete Python script with all the examples from this post is available for download and experimentation. You can get it here: iteration.py.