The Python Data Model

Introduction

If you have ever felt that Python’s built-in types possess a certain intuitive behavior that your custom classes lack, you are not alone. You expect len(obj) to get the size, obj['ticker'] to access an item, and for asset in obj to iterate through elements.

This consistency is a core principle of the Python Data Model. By implementing specific “special methods” (often called dunder methods, for double underscore), your custom classes can behave just like the built-ins. This greatly enhances expressiveness, readability, and compatibility.

In this post, we will build a Portfolio class that leverages the Python Data Model. We will focus on structuring a robust and well-behaved object.

The Goal: A Pythonic Portfolio

We want a Portfolio class that stores a list of financial positions. But we don’t want a clunky interface like portfolio.get_position_by_ticker("AAPL"). We want the market-standard syntax:

len(portfolio) to size up our exposure.
portfolio["AAPL"] to pull up a position instantly.
for position in portfolio to audit our holdings.

Let’s start by defining a simple Position container. To keep our balance sheet clean, we’ll track the total USD value rather than quantity and price.

from __future__ import annotations

from dataclasses import dataclass
from collections.abc import Iterator


@dataclass
class Position:
    symbol: str
    value_usd: float
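
Position is itself a small demonstration of the data model: the @dataclass decorator generates __init__, __repr__, and __eq__ from the field annotations. A minimal sketch of what that provides (the variable names a and b are only for illustration):

```python
from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    value_usd: float


a = Position("AAPL", 15000.0)
b = Position("AAPL", 15000.0)

# __repr__ is generated from the field names and values
print(repr(a))  # Position(symbol='AAPL', value_usd=15000.0)

# __eq__ compares field by field, so equal data means equal objects
print(a == b)   # True
print(a is b)   # False: still two distinct objects
```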

The Initial `Portfolio` Class

We start with a standard class definition. To index our strategy efficiently, we’ll store the positions in a dictionary keyed by their symbol. We name it _contents to signal it’s an internal implementation detail—a trade secret, if you will—encouraging users to access data through the public interface.

class Portfolio:
    def __init__(self, name: str, managers: tuple[str, ...], contents: list[Position] | None = None):
        self.name = name
        self.managers = managers
        # Store positions in a dictionary keyed by symbol for fast lookup
        self._contents = {pos.symbol: pos for pos in contents} if contents else {}

    def __repr__(self):
        return f"Portfolio(name={self.name!r}, managers={self.managers!r}, contents={list(self._contents.values())!r})"

# Create some data
pos1 = Position("AAPL", 15000.0)
pos2 = Position("GOOG", 140000.0)
p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])
print(p)

Portfolio(name='Tech Fund', managers=('Alice', 'Bob'), contents=[Position(symbol='AAPL', value_usd=15000.0), Position(symbol='GOOG', value_usd=140000.0)])

We included __repr__ right away. This dunder method provides the “official” string representation, useful for debugging the object’s state.
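
Because this __repr__ uses the same keyword names as __init__, its output is a valid constructor call. The sketch below illustrates that round trip with a trimmed-down copy of the class; eval() appears here purely as a demonstration and should not be used on untrusted input.

```python
from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    value_usd: float


class Portfolio:
    def __init__(self, name, managers, contents=None):
        self.name = name
        self.managers = managers
        self._contents = {pos.symbol: pos for pos in contents} if contents else {}

    def __repr__(self):
        return (
            f"Portfolio(name={self.name!r}, managers={self.managers!r}, "
            f"contents={list(self._contents.values())!r})"
        )


p = Portfolio("Tech Fund", ("Alice", "Bob"), [Position("AAPL", 15000.0)])

# Rebuild an equivalent object from the repr text, giving eval() only
# the two class names it needs
clone = eval(repr(p), {"Portfolio": Portfolio, "Position": Position})

print(clone is p)              # False: a new object
print(repr(clone) == repr(p))  # True: same state
```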

Emulating a Collection

Right now, if we try to gauge the size of our portfolio, it raises a TypeError.

try:
    len(p)
except TypeError as e:
    print(f"Error: {e}")

Error: object of type 'Portfolio' has no len()

To enable the len() function, we implement __len__.

`__len__`

class Portfolio(Portfolio): # Inheriting to add methods incrementally for this post
    def __len__(self):
        return len(self._contents)

p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])
print(f"Portfolio size: {len(p)}")

Portfolio size: 2

Now len(p) delegates to the underlying self._contents dictionary. It’s a classic delegation strategy.
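
Delegating to a dictionary also guarantees a valid result. The built-in len() checks what __len__ returns and rejects anything that is not a non-negative integer. A small sketch with two hypothetical classes (NegativeSize and TextSize are not part of the Portfolio code):

```python
class NegativeSize:
    """Hypothetical class whose __len__ returns a negative number."""

    def __len__(self):
        return -1


class TextSize:
    """Hypothetical class whose __len__ returns a non-integer."""

    def __len__(self):
        return "two"


errors = {}
for cls in (NegativeSize, TextSize):
    try:
        len(cls())
    except (ValueError, TypeError) as e:
        # Record which exception type len() raised for each class
        errors[cls.__name__] = type(e).__name__

print(errors)  # {'NegativeSize': 'ValueError', 'TextSize': 'TypeError'}
```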

`__getitem__`

We don’t want to scan through a list to find a stock; that’s an O(N) search, and in performance-critical applications, lookup speed matters. The __getitem__ method lets us use square-bracket notation [] to access items directly, backed here by the dictionary’s average O(1) lookup.

class Portfolio(Portfolio):
    def __getitem__(self, symbol: str) -> Position | None:
        return self._contents.get(symbol)

p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])
print(f"AAPL position: {p['AAPL']}")
print(f"MSFT position: {p['MSFT']}") # Returns None

AAPL position: Position(symbol='AAPL', value_usd=15000.0)
MSFT position: None

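With this, our Portfolio supports mapping-style lookup, an efficient way to access our assets. Note that, unlike dict, it returns None for a missing symbol instead of raising KeyError, so callers need to check the result.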
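
If raising KeyError for a missing symbol, as dict does, is preferable to returning None, __getitem__ can index the dictionary directly and leave the lenient lookup to a separate method. A minimal sketch, where StrictPortfolio and its get method are hypothetical names that are not part of the class built in this post:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    value_usd: float


class StrictPortfolio:
    """Hypothetical variant: missing symbols raise KeyError, as dict does."""

    def __init__(self, contents: list[Position] | None = None):
        self._contents = {pos.symbol: pos for pos in contents} if contents else {}

    def __getitem__(self, symbol: str) -> Position:
        # Plain dict indexing raises KeyError for an unknown symbol
        return self._contents[symbol]

    def get(self, symbol: str, default: Position | None = None) -> Position | None:
        # Lenient lookup for callers that prefer a default over an exception
        return self._contents.get(symbol, default)


sp = StrictPortfolio([Position("AAPL", 15000.0)])
print(sp["AAPL"])      # Position(symbol='AAPL', value_usd=15000.0)
print(sp.get("MSFT"))  # None

try:
    sp["MSFT"]
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'MSFT'
```

The trade-off is that a typo in a symbol fails loudly at the lookup rather than surfacing later as an unexpected None.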

Enabling Iteration: `__iter__`

Iterating through a collection is fundamental. Our class has no __iter__ yet, and iterating over the underlying dictionary directly would just give us the keys (symbols). But when we loop through a portfolio, we usually want the assets themselves. We can dictate this behavior by implementing __iter__.

class Portfolio(Portfolio):
    def __iter__(self) -> Iterator[Position]:
        return iter(self._contents.values())


p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])

print("Iterating through portfolio:")
for pos in p:
    print(f" - {pos.symbol}: ${pos.value_usd:,.2f}")

Iterating through portfolio:
 - AAPL: $15,000.00
 - GOOG: $140,000.00

Now, for pos in p yields the positions directly. We have full control over the traversal.
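
The payoff goes beyond the for loop: anything that consumes an iterable now accepts a portfolio. The sketch below uses a trimmed-down stand-in for the class (only __init__ and __iter__) to show sum(), max(), a list comprehension, and unpacking.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    value_usd: float


class Portfolio:
    """Trimmed-down stand-in: just enough to demonstrate iteration."""

    def __init__(self, contents: list[Position]):
        self._contents = {pos.symbol: pos for pos in contents}

    def __iter__(self) -> Iterator[Position]:
        return iter(self._contents.values())


p = Portfolio([Position("AAPL", 15000.0), Position("GOOG", 140000.0)])

total = sum(pos.value_usd for pos in p)          # generator expressions
largest = max(p, key=lambda pos: pos.value_usd)  # built-ins taking iterables
symbols = [pos.symbol for pos in p]              # comprehensions
first, second = p                                # unpacking

print(total)           # 155000.0
print(largest.symbol)  # GOOG
print(symbols)         # ['AAPL', 'GOOG']
print(first.symbol)    # AAPL
```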

Membership and Truthiness: `__contains__` and `__bool__`

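We can already check whether a Position is in the portfolio because we implemented __iter__: without __contains__, the in operator falls back to iterating and comparing each element. But that’s like manually auditing every file to find one document; it’s O(N), and it cannot match a bare symbol string against Position objects. Since we have a hash map (dictionary) underneath, we can do an average O(1) check using __contains__.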

We also want to know if our portfolio is empty. By default, Python falls back to __len__ and treats a length of zero as false, but implementing __bool__ allows us to be explicit about what constitutes a “truthy” portfolio.

class Portfolio(Portfolio):
    def __contains__(self, item: str | Position) -> bool:
        if isinstance(item, str):
            return item in self._contents
        if isinstance(item, Position):
            return item.symbol in self._contents
        return False

    def __bool__(self) -> bool:
        return bool(self._contents)


p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])

print(f"Is 'AAPL' in p? {'AAPL' in p}")
print(f"Is 'MSFT' in p? {'MSFT' in p}")
print(f"Is p truthy? {bool(p)}")

empty_p = Portfolio("Empty", ("Nobody",))
print(f"Is empty_p truthy? {bool(empty_p)}")

Is 'AAPL' in p? True
Is 'MSFT' in p? False
Is p truthy? True
Is empty_p truthy? False
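
The fallback rules mentioned above can be seen in isolation. In this sketch, OnlyIter, OnlyLen, and Bare are hypothetical classes that each define at most one special method, so the result shows which method Python falls back to.

```python
class OnlyIter:
    """Hypothetical class defining __iter__ but not __contains__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


class OnlyLen:
    """Hypothetical class defining __len__ but not __bool__."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Bare:
    """Hypothetical class defining neither __bool__ nor __len__."""


# `in` falls back to a linear scan over __iter__
found = "AAPL" in OnlyIter(["AAPL", "GOOG"])

# bool() falls back to __len__: zero length is falsy
empty_truthy = bool(OnlyLen([]))
full_truthy = bool(OnlyLen(["AAPL"]))

# With neither method defined, instances are always truthy
bare_truthy = bool(Bare())

print(found, empty_truthy, full_truthy, bare_truthy)  # True False True True
```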

Operator Overloading: `__add__` (Combining Portfolios)

In Python, objects can be combined using operators. If we have two portfolios, it makes intuitive sense to “add” them together using +.

We implement this via __add__. This isn’t just concatenation; it’s a consolidation. We need to combine managers and sum up the value of overlapping positions.

class Portfolio(Portfolio):
    def __add__(self, other: Portfolio) -> Portfolio:
        if not isinstance(other, Portfolio):
            return NotImplemented

        new_name = f"{self.name} + {other.name}"
        # Combine managers, removing duplicates
        new_managers = tuple(sorted(set(self.managers + other.managers)))
        # Merge contents
        all_positions = {}

        # Add positions from self
        for pos in self:
            all_positions[pos.symbol] = Position(pos.symbol, pos.value_usd)

        # Add positions from other
        for pos in other:
            if pos.symbol in all_positions:
                existing = all_positions[pos.symbol]
                existing.value_usd += pos.value_usd
            else:
                all_positions[pos.symbol] = Position(pos.symbol, pos.value_usd)
        return Portfolio(new_name, new_managers, list(all_positions.values()))


# Example Usage
p1 = Portfolio("Tech", ("Alice",), [Position("AAPL", 15000.0)])
p2 = Portfolio(
    "Growth", ("Bob",), [Position("AAPL", 8000.0), Position("MSFT", 30000.0)]
)

p3 = p1 + p2
print(f"Merged Portfolio: {p3}")
print(f"Merged AAPL: {p3['AAPL']}")

Merged Portfolio: Portfolio(name='Tech + Growth', managers=('Alice', 'Bob'), contents=[Position(symbol='AAPL', value_usd=23000.0), Position(symbol='MSFT', value_usd=30000.0)])
Merged AAPL: Position(symbol='AAPL', value_usd=23000.0)
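
Two consequences of this design are worth seeing in action. Returning NotImplemented lets Python raise a TypeError for unsupported operands, and the built-in sum() will not work unless the class also handles 0 + portfolio. The sketch below uses a trimmed-down stand-in for the class (no managers); the __radd__ method is an addition for illustration and is not part of the original code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Position:
    symbol: str
    value_usd: float


class Portfolio:
    """Trimmed-down stand-in for the post's class (name and positions only)."""

    def __init__(self, name: str, contents: list[Position] | None = None):
        self.name = name
        self._contents = {pos.symbol: pos for pos in contents} if contents else {}

    def __iter__(self):
        return iter(self._contents.values())

    def __add__(self, other: Portfolio) -> Portfolio:
        if not isinstance(other, Portfolio):
            return NotImplemented
        merged = {}
        for pos in (*self, *other):
            merged[pos.symbol] = merged.get(pos.symbol, 0.0) + pos.value_usd
        return Portfolio(
            f"{self.name} + {other.name}",
            [Position(sym, val) for sym, val in merged.items()],
        )

    def __radd__(self, other):
        # Assumption for this sketch: sum() starts from the int 0, so
        # 0 + portfolio is treated as the portfolio itself
        if other == 0:
            return self
        return NotImplemented


p1 = Portfolio("Tech", [Position("AAPL", 15000.0)])
p2 = Portfolio("Growth", [Position("AAPL", 8000.0), Position("MSFT", 30000.0)])

message = ""
try:
    p1 + 42
except TypeError as e:
    message = str(e)
print(f"TypeError: {message}")

total = sum([p1, p2])
print(total.name)   # Tech + Growth
print(list(total))
```

Note that this __radd__ returns the same object rather than a copy, which is fine for a demonstration but may not suit code that mutates the result.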

String Representation: `__repr__` vs `__str__`

Finally, we need to decide how our portfolio presents itself to the world.

__repr__ (The Private Ledger): Unambiguous, developer-focused. It should ideally allow you to recreate the object.
__str__ (The Public Face): Readable, user-focused. “What does this object represent to a human?”

If __str__ is not defined, Python falls back to __repr__. But we want a nice summary.

class Portfolio(Portfolio):
    def __str__(self):
        return f"Portfolio '{self.name}' with {len(self)} positions managed by {', '.join(self.managers)}"


p = Portfolio("Tech Fund", ("Alice", "Bob"), [pos1, pos2])
print(f"Str: {str(p)}")
print(f"Repr: {repr(p)}")

Str: Portfolio 'Tech Fund' with 2 positions managed by Alice, Bob
Repr: Portfolio(name='Tech Fund', managers=('Alice', 'Bob'), contents=[Position(symbol='AAPL', value_usd=15000.0), Position(symbol='GOOG', value_usd=140000.0)])
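
It helps to know which contexts pick which method. print() and plain f-string fields use __str__, while the !r conversion and containers use __repr__. A minimal sketch with a hypothetical Ticker class, kept tiny so the two outputs are easy to tell apart:

```python
class Ticker:
    """Hypothetical minimal class defining both representations."""

    def __init__(self, symbol):
        self.symbol = symbol

    def __repr__(self):
        return f"Ticker({self.symbol!r})"

    def __str__(self):
        return self.symbol


t = Ticker("AAPL")

print(t)         # AAPL              (print uses __str__)
print(f"{t}")    # AAPL              (f-strings use __str__ by default)
print(f"{t!r}")  # Ticker('AAPL')    (!r forces __repr__)
print([t])       # [Ticker('AAPL')]  (containers show __repr__ of items)
```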

Why This Matters

By implementing these methods, we’ve turned a basic container into a class that works with len(), square brackets, for loops, the in operator, truth tests, and +.

Lookup by Symbol: Instant access with p['AAPL'].
Intuitive Iteration: Seamless traversal with for pos in p.
Arithmetic: Combining objects made easy with p1 + p2.
Expressiveness: The code reads clearly and naturally.

Summary

The Python Data Model allows your custom objects to integrate seamlessly with built-in types and Python’s core features.

__init__: Initialization.
__repr__: Detailed developer-focused representation.
__str__: User-friendly summary.
__len__: Getting the size.
__getitem__: Direct item access.
__iter__: Enabling iteration.
__contains__: Fast membership checking.
__bool__: Truthiness check.
__add__: Combining objects.

By embracing these patterns, you align with Python’s design philosophy and create clean, Pythonic code.

Note

The complete code for this example is available here: [portfolio.py](portfolio.py).
