API Reference¶

This section provides detailed documentation for all public classes and functions in Edsger.

edsger.path Module¶

Dijkstra Class¶

The main class for performing Dijkstra’s shortest path algorithm on directed graphs.

Constructor Parameters¶

edges: pandas.DataFrame containing the directed graph edges
tail: Column name for edge source nodes (default: ‘tail’)
head: Column name for edge destination nodes (default: ‘head’)
weight: Column name for edge weights (default: ‘weight’)
orientation: Either ‘out’ (single-source) or ‘in’ (single-target) (default: ‘out’)
check_edges: Whether to validate edge data (default: False)
permute: Whether to optimize node indexing (default: False)
verbose: Whether to print messages about parallel edge removal (default: False)

Methods¶

run¶

def run(self, vertex_idx, path_tracking=False, return_inf=True, 
        return_series=False, heap_length_ratio=1.0)

Runs the shortest path algorithm from/to the specified vertex.

Parameters:

vertex_idx: Source/target vertex index
path_tracking: Whether to track paths for reconstruction (default: False)
return_inf: Return infinity for unreachable vertices (default: True)
return_series: Return results as pandas Series (default: False)
heap_length_ratio: Heap size as fraction of vertices (default: 1.0)

Returns:

Array or Series of shortest path lengths

get_path¶

def get_path(self, vertex_idx)

Reconstructs the shortest path to/from a vertex (requires path_tracking=True).

Parameters:

vertex_idx: Destination/source vertex index

Returns:

Array of vertex indices forming the path

get_vertices¶

def get_vertices(self)

Returns all vertices in the graph.

Returns:

Array of vertex indices

Examples¶

Basic Usage¶

from edsger.path import Dijkstra
import pandas as pd

# Create a directed graph
edges = pd.DataFrame({
    'tail': [0, 0, 1],
    'head': [1, 2, 2],
    'weight': [1, 4, 2]
})

# Initialize Dijkstra
dijkstra = Dijkstra(edges)

# Find shortest paths from vertex 0
paths = dijkstra.run(vertex_idx=0)

With Path Tracking¶

# Enable path tracking
paths = dijkstra.run(vertex_idx=0, path_tracking=True)

# Get the actual path to vertex 2
path = dijkstra.get_path(vertex_idx=2)

Custom Parameters¶

# Create with custom settings
dijkstra = Dijkstra(
    edges,
    orientation='in',
    check_edges=True,
    permute=True
)

# Run with custom parameters
paths = dijkstra.run(
    vertex_idx=2,
    return_series=True,
    heap_length_ratio=0.5
)

BellmanFord Class¶

The class for performing Bellman-Ford shortest path algorithm on directed graphs, which handles directed graphs with negative edge weights and detects negative cycles.

Constructor Parameters¶

edges: pandas.DataFrame containing the directed graph edges
tail: Column name for edge source nodes (default: ‘tail’)
head: Column name for edge destination nodes (default: ‘head’)
weight: Column name for edge weights (default: ‘weight’)
orientation: Either ‘out’ (single-source) or ‘in’ (single-target) (default: ‘out’)
check_edges: Whether to validate edge data (default: False)
permute: Whether to optimize node indexing (default: False)
verbose: Whether to print messages about parallel edge removal (default: False)

Methods¶

run¶

def run(self, vertex_idx, path_tracking=False, return_inf=True, 
        return_series=False, detect_negative_cycles=True)

Runs the Bellman-Ford algorithm from/to the specified vertex.

Parameters:

vertex_idx: Source/target vertex index
path_tracking: Whether to track paths for reconstruction (default: False)
return_inf: Return infinity for unreachable vertices (default: True)
return_series: Return results as pandas Series (default: False)
detect_negative_cycles: Whether to detect negative cycles (default: True)

Returns:

Array or Series of shortest path lengths

Raises:

ValueError: If a negative cycle is detected and detect_negative_cycles=True

get_path¶

def get_path(self, vertex_idx)

Reconstructs the shortest path to/from a vertex (requires path_tracking=True).

Parameters:

vertex_idx: Destination/source vertex index

Returns:

Array of vertex indices forming the path

get_vertices¶

def get_vertices(self)

Returns all vertices in the graph.

Returns:

Array of vertex indices

Examples¶

Basic Usage with Negative Weights¶

from edsger.path import BellmanFord
import pandas as pd

# Create a directed graph with negative weights
edges = pd.DataFrame({
    'tail': [0, 0, 1, 1, 2, 3],
    'head': [1, 2, 2, 3, 3, 4],
    'weight': [1, 4, -2, 5, 1, 3]  # Note the negative weight
})

# Initialize Bellman-Ford
bf = BellmanFord(edges)

# Find shortest paths from vertex 0
paths = bf.run(vertex_idx=0)
print(paths)  # [ 0.  1. -1.  0.  3.]

Negative Cycle Detection¶

# Create a directed graph with a negative cycle
edges_cycle = pd.DataFrame({
    'tail': [0, 1, 2],
    'head': [1, 2, 0],
    'weight': [1, -2, -1]  # Cycle 0→1→2→0 has total weight -2
})

bf_cycle = BellmanFord(edges_cycle)
try:
    bf_cycle.run(vertex_idx=0)
except ValueError as e:
    print(f"Error: {e}")  # Error: Negative cycle detected in the graph

Path Tracking with Negative Weights¶

# Enable path tracking
paths = bf.run(vertex_idx=0, path_tracking=True)

# Get the actual path to vertex 2 (using negative weight path)
path = bf.get_path(vertex_idx=2)
print(path)  # Path using the negative weight edge

Performance: Disabling Cycle Detection¶

# For performance when you know there are no negative cycles
paths = bf.run(vertex_idx=0, detect_negative_cycles=False)

BFS Class¶

The class for performing Breadth-First Search on directed graphs, finding shortest paths based on minimum number of edges rather than weighted distances.

Constructor Parameters¶

edges: pandas.DataFrame containing the directed graph edges
tail: Column name for edge source nodes (default: ‘tail’)
head: Column name for edge destination nodes (default: ‘head’)
orientation: Either ‘out’ (single-source) or ‘in’ (single-target) (default: ‘out’)
check_edges: Whether to validate edge data (default: False)
permute: Whether to optimize node indexing (default: False)
verbose: Whether to print messages about parallel edge removal (default: False)
sentinel: Sentinel value for unreachable nodes (default: -9999). Must be a negative integer that fits in int32 range.

Note: Weight column is not used by BFS. If present in the DataFrame, it will be ignored.

Methods¶

run¶

def run(self, vertex_idx, path_tracking=False, return_series=False)

Runs BFS from/to the specified vertex.

Parameters:

vertex_idx: Source/target vertex index
path_tracking: Whether to track paths for reconstruction (default: False)
return_series: Return results as pandas Series (default: False)

Returns:

Array or Series of predecessor indices
- The sentinel value (default -9999) indicates unreachable vertex or start vertex
- Other values indicate the predecessor in the shortest path

get_path¶

def get_path(self, vertex_idx)

Reconstructs the shortest path to/from a vertex (requires path_tracking=True).

Parameters:

vertex_idx: Destination/source vertex index

Returns:

Array of vertex indices forming the path (from target to source)

get_vertices¶

def get_vertices(self)

Returns all vertices in the directed graph.

Returns:

Array of vertex indices

Examples¶

Basic BFS¶

from edsger.path import BFS
import pandas as pd

# Create a directed graph
edges = pd.DataFrame({
    'tail': [0, 0, 1, 2, 2, 3],
    'head': [1, 2, 3, 3, 4, 4]
})

# Initialize BFS
bfs = BFS(edges)

# Find shortest paths (by hop count) from vertex 0
predecessors = bfs.run(vertex_idx=0)
print(predecessors)  # [-9999     0     0     1     2]

Custom Sentinel Value¶

# Create BFS with custom sentinel for unreachable nodes
bfs = BFS(edges, sentinel=-1)
predecessors = bfs.run(vertex_idx=0)

# Check the sentinel value
print(bfs.UNREACHABLE)  # -1
# Unreachable nodes will have value -1 instead of -9999

With Path Tracking¶

# Enable path tracking
predecessors = bfs.run(vertex_idx=0, path_tracking=True)

# Get the actual path to vertex 4
path = bfs.get_path(vertex_idx=4)
print(path)  # [4 2 0] - path from target to source

With Non-Contiguous Vertex IDs¶

# Directed graph with non-contiguous vertex IDs
edges_sparse = pd.DataFrame({
    'tail': [0, 0, 10, 20],
    'head': [10, 20, 20, 30]
})

# Use permute=True for efficiency
bfs = BFS(edges_sparse, permute=True)
predecessors = bfs.run(vertex_idx=0, path_tracking=True)
path = bfs.get_path(vertex_idx=30)
print(path)  # [30 20  0]

Backward Search (orientation=’in’)¶

# Find vertices that can reach a target
bfs_in = BFS(edges, orientation='in')
predecessors = bfs_in.run(vertex_idx=4)
# Shows successors in backward traversal

Return as Pandas Series¶

# Get results as labeled Series
predecessors = bfs.run(vertex_idx=0, return_series=True)
print(predecessors)
# vertex_idx
# 0       -9999
# 1           0
# 2           0
# 3           1
# 4           2
# Name: predecessor, dtype: int32

Performance Characteristics¶

Time Complexity: O(V + E) - linear in vertices and edges
Space Complexity: O(V) - queue and predecessor arrays

HyperpathGenerating Class¶

The class for computing hyperpath-based routing using the Spiess-Florian algorithm, designed for transit network assignment and optimal strategy computation.

Reference: Spiess, H. and Florian, M. (1989). Optimal strategies: A new assignment model for transit networks. Transportation Research Part B 23(2), 83-102.

Overview¶

The Spiess-Florian algorithm computes optimal travel strategies in transit networks where passengers can choose between multiple routes with different frequencies and travel times. Unlike traditional shortest path algorithms, it models the probabilistic nature of transit where passengers may take the first arriving vehicle among a set of attractive lines.

The algorithm outputs:

Edge volumes: How passenger demand is distributed across transit lines
Travel times (u_i_vec): Expected travel time from each vertex to the destination, including waiting time

Constructor Parameters¶

edges: pandas.DataFrame containing transit network edges
tail: Column name for edge source nodes (default: ‘tail’)
head: Column name for edge destination nodes (default: ‘head’)
trav_time: Column name for travel times (default: ‘trav_time’)
freq: Column name for service frequencies (default: ‘freq’)
check_edges: Whether to validate edge data (default: False)
orientation: Currently only ‘in’ is supported (default: ‘in’)

Attributes¶

edge_count: Number of edges in the graph
vertex_count: Number of vertices in the graph
u_i_vec: Array of expected travel times from each vertex to the destination (available after run())
_edges: DataFrame containing edges with computed ‘volume’ column (available after run())

Methods¶

run¶

def run(self, origin, destination, volume, return_inf=False)

Computes the hyperpath and distributes demand volume across edges.

Parameters:

origin: Origin vertex index (int) or list of origin indices
destination: Destination vertex index (int)
volume: Demand volume (float) or list of volumes corresponding to each origin
return_inf: Whether to keep infinity for unreachable vertices (default: False)

Returns:

None (results are stored in instance attributes u_i_vec and _edges['volume'])

Examples¶

Basic Transit Network¶

from edsger.path import HyperpathGenerating
import pandas as pd

# Create a simple transit network
# Two parallel lines from stop 0 to stop 2 via different intermediate stops
edges = pd.DataFrame({
    'tail': [0, 0, 1, 3],
    'head': [1, 3, 2, 2],
    'trav_time': [5.0, 4.0, 3.0, 4.0],  # Travel times in minutes
    'freq': [0.1, 0.15, 0.2, 0.2]  # Frequencies (vehicles per minute)
})

# Initialize the algorithm
hp = HyperpathGenerating(edges)

# Compute hyperpath from origin 0 to destination 2 with 100 passengers
hp.run(origin=0, destination=2, volume=100.0)

# Access expected travel times to destination
print("Travel times to destination:", hp.u_i_vec)

# Access edge volumes (how passengers are distributed)
print(hp._edges[['tail', 'head', 'volume']])

Multiple Origins¶

# Compute hyperpath from multiple origins to a single destination
hp = HyperpathGenerating(edges)
hp.run(
    origin=[0, 1],           # Two origins
    destination=2,           # Single destination
    volume=[80.0, 20.0]      # Different demand volumes
)

# Total edge volumes reflect combined demand from all origins
print("Edge volumes:", hp._edges['volume'].values)

Custom Column Names¶

# Transit data with custom column names
transit_edges = pd.DataFrame({
    'from_stop': [0, 0, 1, 2],
    'to_stop': [1, 2, 2, 3],
    'travel_time': [2.0, 5.0, 1.5, 3.0],
    'frequency': [0.1, 0.05, 0.2, 0.15]
})

hp = HyperpathGenerating(
    transit_edges,
    tail='from_stop',
    head='to_stop',
    trav_time='travel_time',
    freq='frequency'
)
hp.run(origin=0, destination=3, volume=1.0)

With Edge Validation¶

# Enable input validation
hp = HyperpathGenerating(edges, check_edges=True)

# This will raise an error if:
# - Required columns are missing
# - Data types are incorrect
# - Travel times or frequencies are negative
# - Missing values exist

Understanding the Output¶

After calling run(), two key outputs are available:

u_i_vec: Expected travel time from each vertex to the destination
- Includes both in-vehicle travel time and expected waiting time
- Infinite values indicate unreachable vertices
_edges['volume']: Passenger volumes on each edge
- Shows how total demand is distributed across the transit network
- Higher volumes indicate more heavily used transit lines

Use Cases¶

Transit assignment: Distribute passenger demand across a transit network
Line planning: Identify heavily used routes that may need more capacity
Service evaluation: Assess expected travel times for different origin-destination pairs
Network design: Evaluate the impact of adding or removing transit lines

Parallel Edges Handling¶

The Dijkstra, BellmanFord, and BFS classes automatically handle parallel edges (multiple directed edges between the same pair of vertices) during initialization. When parallel edges are detected:

Automatic Processing: The _preprocess_edges() method is called internally during initialization
Minimum Weight Selection: For each pair of vertices with multiple edges, only the edge with the minimum weight is kept
Verbose Output: If verbose=True is set in the constructor, a message will be printed indicating how many parallel edges were removed

Example with Parallel Edges¶

import pandas as pd
from edsger.path import Dijkstra

# Directed graph with parallel edges between vertices 0->1
edges = pd.DataFrame({
    'tail': [0, 0, 1],     # Two directed edges from 0->1
    'head': [1, 1, 2],     # with weights 5.0 and 3.0
    'weight': [5.0, 3.0, 2.0]
})

# Initialize with verbose=True to see parallel edge removal
dijkstra = Dijkstra(edges, verbose=True)
# Output: "Automatically removed 1 parallel edge(s). For each pair of vertices, kept the edge with minimum weight."

# The directed graph now has only the edge 0->1 with weight 3.0 (minimum)
print(dijkstra.n_edges)  # Will show 2 instead of 3

This automatic handling ensures:

Consistent directed graph representation
Optimal shortest paths (using minimum weight edges)
No duplicate edges in the internal directed graph structure

edsger.graph_importer Module¶

GraphImporter Architecture¶

The GraphImporter system provides automatic DataFrame backend detection and optimization for all graph algorithms in Edsger.

Key Features¶

Automatic Detection: Identifies DataFrame type (pandas NumPy, pandas Arrow, or Polars)
Memory Optimization: Automatically uses uint32 for vertex indices when possible
Contiguous Memory: Ensures C-contiguous arrays for optimal Cython performance
Factory Pattern: Simple interface with automatic backend selection

standardize_graph_dataframe Function¶

from edsger.graph_importer import standardize_graph_dataframe

def standardize_graph_dataframe(edges, tail='tail', head='head', weight='weight')

Standardizes any supported DataFrame format to NumPy-backed pandas DataFrame.

Parameters:

edges: Input DataFrame (pandas, pandas with Arrow, or Polars)
tail: Column name for source vertices
head: Column name for destination vertices
weight: Column name for edge weights

Returns:

pandas DataFrame with NumPy arrays, optimized dtypes, and contiguous memory

Example Usage¶

import pandas as pd
import polars as pl
from edsger.graph_importer import standardize_graph_dataframe

# Works with any DataFrame type
edges_polars = pl.DataFrame({
    'source': [0, 1, 2],
    'target': [1, 2, 3],
    'cost': [1.0, 2.0, 3.0]
})

# Automatic conversion and optimization
edges_numpy = standardize_graph_dataframe(
    edges_polars,
    tail='source',
    head='target',
    weight='cost'
)

print(edges_numpy.dtypes)
# source    uint32  (optimized from int64)
# target    uint32  (optimized from int64)
# cost      float64

GraphImporter Classes¶

GraphImporter (Base Class)¶

Abstract base class providing the factory method for automatic DataFrame type detection.

Class Method: from_dataframe¶

@staticmethod
def from_dataframe(edges, tail='tail', head='head', weight='weight')

Factory method that automatically selects the appropriate importer based on DataFrame type.

Detection Priority:

Polars DataFrame
pandas with Arrow backend
pandas with NumPy backend (default)

PandasNumpyImporter¶

Handles standard pandas DataFrames with NumPy backend.

Minimal overhead for already NumPy-backed DataFrames
Ensures contiguous memory layout
Maintains existing dtypes

PandasArrowImporter¶

Handles pandas DataFrames with PyArrow backend.

Converts Arrow columns to NumPy arrays
Optimizes integer dtypes (uint32 when possible)
Handles nullable types gracefully

PolarsImporter¶

Handles Polars DataFrames.

Efficient conversion using Polars’ to_pandas() method
Automatic dtype optimization
Preserves performance benefits of columnar storage during conversion

Performance Considerations¶

The GraphImporter system optimizes for:

Memory Efficiency: Uses smallest possible dtype for vertex indices
Cache Locality: Ensures C-contiguous memory layout
Conversion Speed: Single conversion at initialization
Zero-Copy When Possible: Minimizes data copying during conversion

Integration with Algorithms¶

All path algorithms (Dijkstra, BellmanFord, BFS, HyperpathGenerating) automatically use the GraphImporter system:

from edsger.path import Dijkstra
import polars as pl

# GraphImporter is used internally
edges = pl.DataFrame({...})
dijkstra = Dijkstra(edges)  # Automatic detection and conversion

No manual conversion is needed - the algorithms handle everything automatically!