# API Reference This section provides detailed documentation for all public classes and functions in Edsger. ## edsger.path Module ### Dijkstra Class The main class for performing Dijkstra's shortest path algorithm on directed graphs. #### Constructor Parameters - `edges`: pandas.DataFrame containing the directed graph edges - `tail`: Column name for edge source nodes (default: 'tail') - `head`: Column name for edge destination nodes (default: 'head') - `weight`: Column name for edge weights (default: 'weight') - `orientation`: Either 'out' (single-source) or 'in' (single-target) (default: 'out') - `check_edges`: Whether to validate edge data (default: False) - `permute`: Whether to optimize node indexing (default: False) - `verbose`: Whether to print messages about parallel edge removal (default: False) #### Methods ##### run ```python def run(self, vertex_idx, path_tracking=False, return_inf=True, return_series=False, heap_length_ratio=1.0) ``` Runs the shortest path algorithm from/to the specified vertex. **Parameters:** - `vertex_idx`: Source/target vertex index - `path_tracking`: Whether to track paths for reconstruction (default: False) - `return_inf`: Return infinity for unreachable vertices (default: True) - `return_series`: Return results as pandas Series (default: False) - `heap_length_ratio`: Heap size as fraction of vertices (default: 1.0) **Returns:** - Array or Series of shortest path lengths ##### get_path ```python def get_path(self, vertex_idx) ``` Reconstructs the shortest path to/from a vertex (requires `path_tracking=True`). **Parameters:** - `vertex_idx`: Destination/source vertex index **Returns:** - Array of vertex indices forming the path ##### get_vertices ```python def get_vertices(self) ``` Returns all vertices in the graph. **Returns:** - Array of vertex indices ### Examples #### Basic Usage ```python from edsger.path import Dijkstra import pandas as pd # Create a directed graph edges = pd.DataFrame({ 'tail': [0, 0, 1], 'head': [1, 2, 2], 'weight': [1, 4, 2] }) # Initialize Dijkstra dijkstra = Dijkstra(edges) # Find shortest paths from vertex 0 paths = dijkstra.run(vertex_idx=0) ``` #### With Path Tracking ```python # Enable path tracking paths = dijkstra.run(vertex_idx=0, path_tracking=True) # Get the actual path to vertex 2 path = dijkstra.get_path(vertex_idx=2) ``` #### Custom Parameters ```python # Create with custom settings dijkstra = Dijkstra( edges, orientation='in', check_edges=True, permute=True ) # Run with custom parameters paths = dijkstra.run( vertex_idx=2, return_series=True, heap_length_ratio=0.5 ) ``` ### BellmanFord Class The class for performing Bellman-Ford shortest path algorithm on directed graphs, which handles directed graphs with negative edge weights and detects negative cycles. #### Constructor Parameters - `edges`: pandas.DataFrame containing the directed graph edges - `tail`: Column name for edge source nodes (default: 'tail') - `head`: Column name for edge destination nodes (default: 'head') - `weight`: Column name for edge weights (default: 'weight') - `orientation`: Either 'out' (single-source) or 'in' (single-target) (default: 'out') - `check_edges`: Whether to validate edge data (default: False) - `permute`: Whether to optimize node indexing (default: False) - `verbose`: Whether to print messages about parallel edge removal (default: False) #### Methods ##### run ```python def run(self, vertex_idx, path_tracking=False, return_inf=True, return_series=False, detect_negative_cycles=True) ``` Runs the Bellman-Ford algorithm from/to the specified vertex. **Parameters:** - `vertex_idx`: Source/target vertex index - `path_tracking`: Whether to track paths for reconstruction (default: False) - `return_inf`: Return infinity for unreachable vertices (default: True) - `return_series`: Return results as pandas Series (default: False) - `detect_negative_cycles`: Whether to detect negative cycles (default: True) **Returns:** - Array or Series of shortest path lengths **Raises:** - `ValueError`: If a negative cycle is detected and `detect_negative_cycles=True` ##### get_path ```python def get_path(self, vertex_idx) ``` Reconstructs the shortest path to/from a vertex (requires `path_tracking=True`). **Parameters:** - `vertex_idx`: Destination/source vertex index **Returns:** - Array of vertex indices forming the path ##### get_vertices ```python def get_vertices(self) ``` Returns all vertices in the graph. **Returns:** - Array of vertex indices #### Examples ##### Basic Usage with Negative Weights ```python from edsger.path import BellmanFord import pandas as pd # Create a directed graph with negative weights edges = pd.DataFrame({ 'tail': [0, 0, 1, 1, 2, 3], 'head': [1, 2, 2, 3, 3, 4], 'weight': [1, 4, -2, 5, 1, 3] # Note the negative weight }) # Initialize Bellman-Ford bf = BellmanFord(edges) # Find shortest paths from vertex 0 paths = bf.run(vertex_idx=0) print(paths) # [ 0. 1. -1. 0. 3.] ``` ##### Negative Cycle Detection ```python # Create a directed graph with a negative cycle edges_cycle = pd.DataFrame({ 'tail': [0, 1, 2], 'head': [1, 2, 0], 'weight': [1, -2, -1] # Cycle 0→1→2→0 has total weight -2 }) bf_cycle = BellmanFord(edges_cycle) try: bf_cycle.run(vertex_idx=0) except ValueError as e: print(f"Error: {e}") # Error: Negative cycle detected in the graph ``` ##### Path Tracking with Negative Weights ```python # Enable path tracking paths = bf.run(vertex_idx=0, path_tracking=True) # Get the actual path to vertex 2 (using negative weight path) path = bf.get_path(vertex_idx=2) print(path) # Path using the negative weight edge ``` ##### Performance: Disabling Cycle Detection ```python # For performance when you know there are no negative cycles paths = bf.run(vertex_idx=0, detect_negative_cycles=False) ``` ### BFS Class The class for performing Breadth-First Search on directed graphs, finding shortest paths based on minimum number of edges rather than weighted distances. #### Constructor Parameters - `edges`: pandas.DataFrame containing the directed graph edges - `tail`: Column name for edge source nodes (default: 'tail') - `head`: Column name for edge destination nodes (default: 'head') - `orientation`: Either 'out' (single-source) or 'in' (single-target) (default: 'out') - `check_edges`: Whether to validate edge data (default: False) - `permute`: Whether to optimize node indexing (default: False) - `verbose`: Whether to print messages about parallel edge removal (default: False) - `sentinel`: Sentinel value for unreachable nodes (default: -9999). Must be a negative integer that fits in int32 range. **Note**: Weight column is not used by BFS. If present in the DataFrame, it will be ignored. #### Methods ##### run ```python def run(self, vertex_idx, path_tracking=False, return_series=False) ``` Runs BFS from/to the specified vertex. **Parameters:** - `vertex_idx`: Source/target vertex index - `path_tracking`: Whether to track paths for reconstruction (default: False) - `return_series`: Return results as pandas Series (default: False) **Returns:** - Array or Series of predecessor indices - The sentinel value (default `-9999`) indicates unreachable vertex or start vertex - Other values indicate the predecessor in the shortest path ##### get_path ```python def get_path(self, vertex_idx) ``` Reconstructs the shortest path to/from a vertex (requires `path_tracking=True`). **Parameters:** - `vertex_idx`: Destination/source vertex index **Returns:** - Array of vertex indices forming the path (from target to source) ##### get_vertices ```python def get_vertices(self) ``` Returns all vertices in the directed graph. **Returns:** - Array of vertex indices #### Examples ##### Basic BFS ```python from edsger.path import BFS import pandas as pd # Create a directed graph edges = pd.DataFrame({ 'tail': [0, 0, 1, 2, 2, 3], 'head': [1, 2, 3, 3, 4, 4] }) # Initialize BFS bfs = BFS(edges) # Find shortest paths (by hop count) from vertex 0 predecessors = bfs.run(vertex_idx=0) print(predecessors) # [-9999 0 0 1 2] ``` ##### Custom Sentinel Value ```python # Create BFS with custom sentinel for unreachable nodes bfs = BFS(edges, sentinel=-1) predecessors = bfs.run(vertex_idx=0) # Check the sentinel value print(bfs.UNREACHABLE) # -1 # Unreachable nodes will have value -1 instead of -9999 ``` ##### With Path Tracking ```python # Enable path tracking predecessors = bfs.run(vertex_idx=0, path_tracking=True) # Get the actual path to vertex 4 path = bfs.get_path(vertex_idx=4) print(path) # [4 2 0] - path from target to source ``` ##### With Non-Contiguous Vertex IDs ```python # Directed graph with non-contiguous vertex IDs edges_sparse = pd.DataFrame({ 'tail': [0, 0, 10, 20], 'head': [10, 20, 20, 30] }) # Use permute=True for efficiency bfs = BFS(edges_sparse, permute=True) predecessors = bfs.run(vertex_idx=0, path_tracking=True) path = bfs.get_path(vertex_idx=30) print(path) # [30 20 0] ``` ##### Backward Search (orientation='in') ```python # Find vertices that can reach a target bfs_in = BFS(edges, orientation='in') predecessors = bfs_in.run(vertex_idx=4) # Shows successors in backward traversal ``` ##### Return as Pandas Series ```python # Get results as labeled Series predecessors = bfs.run(vertex_idx=0, return_series=True) print(predecessors) # vertex_idx # 0 -9999 # 1 0 # 2 0 # 3 1 # 4 2 # Name: predecessor, dtype: int32 ``` #### Performance Characteristics - **Time Complexity**: O(V + E) - linear in vertices and edges - **Space Complexity**: O(V) - queue and predecessor arrays ### HyperpathGenerating Class The class for computing hyperpath-based routing using the Spiess-Florian algorithm, designed for transit network assignment and optimal strategy computation. **Reference**: Spiess, H. and Florian, M. (1989). *Optimal strategies: A new assignment model for transit networks*. Transportation Research Part B 23(2), 83-102. #### Overview The Spiess-Florian algorithm computes optimal travel strategies in transit networks where passengers can choose between multiple routes with different frequencies and travel times. Unlike traditional shortest path algorithms, it models the probabilistic nature of transit where passengers may take the first arriving vehicle among a set of attractive lines. The algorithm outputs: - **Edge volumes**: How passenger demand is distributed across transit lines - **Travel times (u_i_vec)**: Expected travel time from each vertex to the destination, including waiting time #### Constructor Parameters - `edges`: pandas.DataFrame containing transit network edges - `tail`: Column name for edge source nodes (default: 'tail') - `head`: Column name for edge destination nodes (default: 'head') - `trav_time`: Column name for travel times (default: 'trav_time') - `freq`: Column name for service frequencies (default: 'freq') - `check_edges`: Whether to validate edge data (default: False) - `orientation`: Currently only 'in' is supported (default: 'in') #### Attributes - `edge_count`: Number of edges in the graph - `vertex_count`: Number of vertices in the graph - `u_i_vec`: Array of expected travel times from each vertex to the destination (available after `run()`) - `_edges`: DataFrame containing edges with computed 'volume' column (available after `run()`) #### Methods ##### run ```python def run(self, origin, destination, volume, return_inf=False) ``` Computes the hyperpath and distributes demand volume across edges. **Parameters:** - `origin`: Origin vertex index (int) or list of origin indices - `destination`: Destination vertex index (int) - `volume`: Demand volume (float) or list of volumes corresponding to each origin - `return_inf`: Whether to keep infinity for unreachable vertices (default: False) **Returns:** - None (results are stored in instance attributes `u_i_vec` and `_edges['volume']`) #### Examples ##### Basic Transit Network ```python from edsger.path import HyperpathGenerating import pandas as pd # Create a simple transit network # Two parallel lines from stop 0 to stop 2 via different intermediate stops edges = pd.DataFrame({ 'tail': [0, 0, 1, 3], 'head': [1, 3, 2, 2], 'trav_time': [5.0, 4.0, 3.0, 4.0], # Travel times in minutes 'freq': [0.1, 0.15, 0.2, 0.2] # Frequencies (vehicles per minute) }) # Initialize the algorithm hp = HyperpathGenerating(edges) # Compute hyperpath from origin 0 to destination 2 with 100 passengers hp.run(origin=0, destination=2, volume=100.0) # Access expected travel times to destination print("Travel times to destination:", hp.u_i_vec) # Access edge volumes (how passengers are distributed) print(hp._edges[['tail', 'head', 'volume']]) ``` ##### Multiple Origins ```python # Compute hyperpath from multiple origins to a single destination hp = HyperpathGenerating(edges) hp.run( origin=[0, 1], # Two origins destination=2, # Single destination volume=[80.0, 20.0] # Different demand volumes ) # Total edge volumes reflect combined demand from all origins print("Edge volumes:", hp._edges['volume'].values) ``` ##### Custom Column Names ```python # Transit data with custom column names transit_edges = pd.DataFrame({ 'from_stop': [0, 0, 1, 2], 'to_stop': [1, 2, 2, 3], 'travel_time': [2.0, 5.0, 1.5, 3.0], 'frequency': [0.1, 0.05, 0.2, 0.15] }) hp = HyperpathGenerating( transit_edges, tail='from_stop', head='to_stop', trav_time='travel_time', freq='frequency' ) hp.run(origin=0, destination=3, volume=1.0) ``` ##### With Edge Validation ```python # Enable input validation hp = HyperpathGenerating(edges, check_edges=True) # This will raise an error if: # - Required columns are missing # - Data types are incorrect # - Travel times or frequencies are negative # - Missing values exist ``` #### Understanding the Output After calling `run()`, two key outputs are available: 1. **`u_i_vec`**: Expected travel time from each vertex to the destination - Includes both in-vehicle travel time and expected waiting time - Infinite values indicate unreachable vertices 2. **`_edges['volume']`**: Passenger volumes on each edge - Shows how total demand is distributed across the transit network - Higher volumes indicate more heavily used transit lines #### Use Cases - **Transit assignment**: Distribute passenger demand across a transit network - **Line planning**: Identify heavily used routes that may need more capacity - **Service evaluation**: Assess expected travel times for different origin-destination pairs - **Network design**: Evaluate the impact of adding or removing transit lines ## Parallel Edges Handling The `Dijkstra`, `BellmanFord`, and `BFS` classes automatically handle parallel edges (multiple directed edges between the same pair of vertices) during initialization. When parallel edges are detected: 1. **Automatic Processing**: The `_preprocess_edges()` method is called internally during initialization 2. **Minimum Weight Selection**: For each pair of vertices with multiple edges, only the edge with the minimum weight is kept 3. **Verbose Output**: If `verbose=True` is set in the constructor, a message will be printed indicating how many parallel edges were removed ### Example with Parallel Edges ```python import pandas as pd from edsger.path import Dijkstra # Directed graph with parallel edges between vertices 0->1 edges = pd.DataFrame({ 'tail': [0, 0, 1], # Two directed edges from 0->1 'head': [1, 1, 2], # with weights 5.0 and 3.0 'weight': [5.0, 3.0, 2.0] }) # Initialize with verbose=True to see parallel edge removal dijkstra = Dijkstra(edges, verbose=True) # Output: "Automatically removed 1 parallel edge(s). For each pair of vertices, kept the edge with minimum weight." # The directed graph now has only the edge 0->1 with weight 3.0 (minimum) print(dijkstra.n_edges) # Will show 2 instead of 3 ``` This automatic handling ensures: - Consistent directed graph representation - Optimal shortest paths (using minimum weight edges) - No duplicate edges in the internal directed graph structure ## edsger.graph_importer Module ### GraphImporter Architecture The GraphImporter system provides automatic DataFrame backend detection and optimization for all graph algorithms in Edsger. #### Key Features - **Automatic Detection**: Identifies DataFrame type (pandas NumPy, pandas Arrow, or Polars) - **Memory Optimization**: Automatically uses uint32 for vertex indices when possible - **Contiguous Memory**: Ensures C-contiguous arrays for optimal Cython performance - **Factory Pattern**: Simple interface with automatic backend selection ### standardize_graph_dataframe Function ```python from edsger.graph_importer import standardize_graph_dataframe def standardize_graph_dataframe(edges, tail='tail', head='head', weight='weight') ``` Standardizes any supported DataFrame format to NumPy-backed pandas DataFrame. **Parameters:** - `edges`: Input DataFrame (pandas, pandas with Arrow, or Polars) - `tail`: Column name for source vertices - `head`: Column name for destination vertices - `weight`: Column name for edge weights **Returns:** - pandas DataFrame with NumPy arrays, optimized dtypes, and contiguous memory #### Example Usage ```python import pandas as pd import polars as pl from edsger.graph_importer import standardize_graph_dataframe # Works with any DataFrame type edges_polars = pl.DataFrame({ 'source': [0, 1, 2], 'target': [1, 2, 3], 'cost': [1.0, 2.0, 3.0] }) # Automatic conversion and optimization edges_numpy = standardize_graph_dataframe( edges_polars, tail='source', head='target', weight='cost' ) print(edges_numpy.dtypes) # source uint32 (optimized from int64) # target uint32 (optimized from int64) # cost float64 ``` ### GraphImporter Classes #### GraphImporter (Base Class) Abstract base class providing the factory method for automatic DataFrame type detection. ##### Class Method: from_dataframe ```python @staticmethod def from_dataframe(edges, tail='tail', head='head', weight='weight') ``` Factory method that automatically selects the appropriate importer based on DataFrame type. **Detection Priority:** 1. Polars DataFrame 2. pandas with Arrow backend 3. pandas with NumPy backend (default) #### PandasNumpyImporter Handles standard pandas DataFrames with NumPy backend. - Minimal overhead for already NumPy-backed DataFrames - Ensures contiguous memory layout - Maintains existing dtypes #### PandasArrowImporter Handles pandas DataFrames with PyArrow backend. - Converts Arrow columns to NumPy arrays - Optimizes integer dtypes (uint32 when possible) - Handles nullable types gracefully #### PolarsImporter Handles Polars DataFrames. - Efficient conversion using Polars' to_pandas() method - Automatic dtype optimization - Preserves performance benefits of columnar storage during conversion ### Performance Considerations The GraphImporter system optimizes for: 1. **Memory Efficiency**: Uses smallest possible dtype for vertex indices 2. **Cache Locality**: Ensures C-contiguous memory layout 3. **Conversion Speed**: Single conversion at initialization 4. **Zero-Copy When Possible**: Minimizes data copying during conversion ### Integration with Algorithms All path algorithms (Dijkstra, BellmanFord, BFS, HyperpathGenerating) automatically use the GraphImporter system: ```python from edsger.path import Dijkstra import polars as pl # GraphImporter is used internally edges = pl.DataFrame({...}) dijkstra = Dijkstra(edges) # Automatic detection and conversion ``` No manual conversion is needed - the algorithms handle everything automatically!