Quick Start Guide

Welcome to our Python library for graph algorithms. The library includes BFS, Dijkstra’s and Bellman-Ford’s algorithms, with plans to add more algorithms later. It is also open-source and easy to integrate with other Python libraries.

Graph Data Format

Edsger accepts directed graph data in multiple DataFrame formats, all with the same basic structure:

Column

Type

Description

tail

int

Source vertex ID

head

int

Destination vertex ID

weight

float

Edge weight (non-negative for Dijkstra, can be negative for Bellman-Ford)

Supported DataFrame Backends

pandas with NumPy backend (default)

import pandas as pd

edges = pd.DataFrame({
    'tail': [0, 0, 1, 2],
    'head': [1, 2, 2, 3],
    'weight': [1.0, 4.0, 2.0, 1.0]
})

pandas with Arrow backend

import pandas as pd
import pyarrow as pa

# Create with Arrow dtypes
edges_arrow = pd.DataFrame({
    'tail': pd.array([0, 0, 1, 2], dtype=pd.ArrowDtype(pa.int64())),
    'head': pd.array([1, 2, 2, 3], dtype=pd.ArrowDtype(pa.int64())),
    'weight': pd.array([1.0, 4.0, 2.0, 1.0], dtype=pd.ArrowDtype(pa.float64()))
})

# Or convert existing DataFrame
edges_arrow = edges.astype({
    'tail': pd.ArrowDtype(pa.int64()),
    'head': pd.ArrowDtype(pa.int64()),
    'weight': pd.ArrowDtype(pa.float64())
})

Polars DataFrame

import polars as pl

edges_polars = pl.DataFrame({
    'tail': [0, 0, 1, 2],
    'head': [1, 2, 2, 3],
    'weight': [1.0, 4.0, 2.0, 1.0]
})

All these formats work seamlessly with Edsger’s algorithms - the library automatically detects and handles the DataFrame type.

Note that it is also possible to use a graph with different column names for the tail, head and weight values, but we need then to specify the name mapping, as described in the following.

Dijkstra’s Algorithm

To use Dijkstra’s algorithm, you can import the Dijkstra class from the path module. The function takes a directed graph and a source node as input, and returns the shortest path from the source node to all other nodes in the directed graph. If your directed graph contains parallel edges (multiple edges between the same pair of vertices), Dijkstra automatically keeps only the edge with minimum weight for each vertex pair.

from edsger.path import Dijkstra

# Create a DataFrame with the edges of the graph
edges = pd.DataFrame({
    'tail': [0, 0, 1, 2, 2, 3],
    'head': [1, 2, 2, 3, 4, 4],
    'weight': [1, 4, 2, 1.5, 3, 1]
})
edges

tail

head

weight

0

0

1

1.0

1

0

2

4.0

2

1

2

2.0

3

2

3

1.5

4

2

4

3.0

5

3

4

1.0

# Initialize the Dijkstra object
dijkstra = Dijkstra(edges)

# Run the algorithm from a source vertex
shortest_paths = dijkstra.run(vertex_idx=0)
print("Shortest paths:", shortest_paths)
Shortest paths: [0.  1.  3.  4.5 5.5]

We get the shortest paths from the source node 0 to all other nodes in the directed graph. The output is an array with the shortest path length to each node. A path length is the sum of the weights of the edges in the path.

The column names can be specified using the tail, head and weight arguments:

other_edges = pd.DataFrame({
    'from': [0, 0, 1, 2, 2, 3],
    'to': [1, 2, 2, 3, 4, 4],
    'travel_time': [1, 4, 2, 1.5, 3, 1]
})
other_dijkstra = Dijkstra(other_edges, tail='from', head='to', weight='travel_time')

Orientation

The orientation argument (a string with a default value of 'out') specifies the orientation of the algorithm. It can be either 'out' for single source shortest paths or 'in' for single target shortest path.

dijkstra = Dijkstra(edges, orientation='in')

# Run the algorithm to a target vertex
shortest_paths = dijkstra.run(vertex_idx=0)
print("Shortest paths:", shortest_paths)
Shortest paths: [ 0. inf inf inf inf]

Run Multiple Times

Once the Dijkstra is instantiated with a given graph and orientation, the run method can be called multiple times with different source vertices.

# Run the algorithm to another target vertex
shortest_paths = dijkstra.run(vertex_idx=4)
print("Shortest paths:", shortest_paths)
Shortest paths: [5.5 4.5 2.5 1.  0. ]

Check Edges

The check_edges argument (a boolean with a default value of False) validates the given graph. When set to True, it ensures the DataFrame is well-formed by:

  • Checking for the presence of required columns for edge tail, head and weight values

  • Verifying that the data types are correct (integer for tail and head, integer or float for weight)

  • Ensuring there are no missing or invalid values (e.g. negative weights)

If any of these checks fail, an appropriate error is raised.

invalid_edges = pd.DataFrame({
    'tail': [0, 0, 1, 2, 2, 3],
    'head': [1, 2, 2, 3, 4, 4],
    'weight': [1, 4, 2, -1, 3, 1]
})
dijkstra = Dijkstra(invalid_edges, check_edges=True)
ValueError: edges['weight'] should be nonnegative

Permute

Finally, the permute argument (boolean with a default value of False) allows to permute the IDs of the nodes. If set to True, the node IDs will be reindexed to start from 0 and be contiguous for the inner computations, and the output will be reindexed to the original IDs, loading the same result as if the IDs were not permuted. The permutation may save memory and computation time for large graphs, for example if a significant ratio of the nodes are not actually used in the graph.

SHIFT = 1000

shifted_edges = pd.DataFrame({
    'tail': [0, 0, 1, 2, 2, 3],
    'head': [1, 2, 2, 3, 4, 4],
    'weight': [1, 4, 2, 1.5, 3, 1]
})
shifted_edges["tail"] += SHIFT
shifted_edges["head"] += SHIFT
shifted_edges.head(3)

tail

head

weight

0

1000

1001

1.0

1

1000

1002

4.0

2

1001

1002

2.0

dijkstra = Dijkstra(shifted_edges, permute=True)
shortest_paths = dijkstra.run(vertex_idx=0 + SHIFT)
print("Shortest paths:", shortest_paths)
Shortest paths: [inf inf inf ... 3.  4.5 5.5]
shortest_paths[-5:]
array([0. , 1. , 3. , 4.5, 5.5])

Early Termination

Early termination is a performance optimization feature that allows Dijkstra’s algorithm to stop computing once specific target nodes (termination nodes) have been reached. This can significantly reduce computation time when you only need shortest paths to a subset of vertices in the graph.

When using early termination, the algorithm will:

  1. Stop as soon as all specified termination nodes have been visited

  2. Return only the path lengths to the termination nodes (not all vertices)

  3. Return results in the same order as the termination nodes were specified

Basic Early Termination Example

import pandas as pd
from edsger.path import Dijkstra

# Create a sample graph
edges = pd.DataFrame({
    "tail": [0, 0, 0, 1, 1, 2, 2, 3, 3, 4],
    "head": [1, 2, 3, 2, 4, 3, 5, 4, 5, 5],
    "weight": [1.0, 4.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0],
})

Without early termination (computes paths to all vertices):

dijkstra = Dijkstra(edges, orientation="out")
distances = dijkstra.run(vertex_idx=0)
print("All distances:", distances)
All distances: [0. 1. 2. 2. 3. 3.]

With early termination (computes paths only to specified nodes):

# Only compute paths to nodes 3 and 5
termination_nodes = [3, 5]
distances = dijkstra.run(vertex_idx=0, termination_nodes=termination_nodes)
print("Distances to termination nodes:", distances)
print("Shape of result:", distances.shape)
Distances to termination nodes: [2. 3.]
Shape of result: (2,)

Notice that:

  • The result array has length 2 (same as number of termination nodes)

  • distances[0] = 2.0 is the shortest path length from vertex 0 to vertex 3

  • distances[1] = 3.0 is the shortest path length from vertex 0 to vertex 5

Early Termination with Path Tracking

Early termination also works with path tracking enabled:

dijkstra = Dijkstra(edges, orientation="out")
distances = dijkstra.run(vertex_idx=0, termination_nodes=[3, 5], path_tracking=True)
print("Distances:", distances)

# Get paths to termination nodes
path_to_3 = dijkstra.get_path(vertex_idx=3)
path_to_5 = dijkstra.get_path(vertex_idx=5)
print("Path to vertex 3:", path_to_3)
print("Path to vertex 5:", path_to_5)
Distances: [2. 3.]
Path to vertex 3: [3 2 1 0]
Path to vertex 5: [5 3 2 1 0]

Important Notes

  1. Return Array Size: With early termination, the returned array size equals the number of termination nodes, not the total number of vertices in the graph.

  2. Order Preservation: Results are returned in the same order as the termination nodes are specified:

    # Termination nodes [3, 5] → results [distance_to_3, distance_to_5]
    # Termination nodes [5, 3] → results [distance_to_5, distance_to_3]
    
  3. Orientation Support: Early termination works with both orientations:

    # Single-source shortest paths (from source to termination nodes)
    dijkstra = Dijkstra(edges, orientation="out")
    distances = dijkstra.run(vertex_idx=0, termination_nodes=[3, 5])
    
    # Single-target shortest paths (from termination nodes to target)
    dijkstra = Dijkstra(edges, orientation="in") 
    distances = dijkstra.run(vertex_idx=5, termination_nodes=[0, 2])
    
  4. Unreachable Nodes: If a termination node is unreachable, its distance will be infinity:

    # If node 10 is unreachable from node 0
    distances = dijkstra.run(vertex_idx=0, termination_nodes=[3, 10])
    # Result: [2.0, inf]
    

Run Method Options

The run method can take the following arguments besides the source/target vertex index:

  • termination_nodes : list or array-like, optional (default=None)

A list or array of vertex indices where the algorithm should stop early. When specified, the algorithm will terminate as soon as all termination nodes have been reached, and will return only the path lengths to these nodes in the same order they were specified. This can provide significant performance improvements when you only need paths to a subset of vertices.

dijkstra = Dijkstra(edges)
# Get distances only to nodes 2 and 4
distances = dijkstra.run(vertex_idx=0, termination_nodes=[2, 4])
print("Distances to nodes 2 and 4:", distances)
Distances to nodes 2 and 4: [1.5 3.5]
  • path_tracking : bool, optional (default=False)

Whether to track the shortest path(s) from/to the source/target vertex to all other vertices in the graph.

dijkstra = Dijkstra(edges)
shortest_paths = dijkstra.run(vertex_idx=0, path_tracking=True)
dijkstra.get_path(vertex_idx=4)
array([4, 3, 2, 1, 0], dtype=uint32)
dijkstra.get_path(vertex_idx=0)
array([0], dtype=uint32)

The path is returned as an array of vertex indices. This is an ordered list of vertices from the source to the target vertex if orientation is 'in', and from the target to the source vertex if orientation is 'out'. Both the source and target vertices are included in the path.

Note: When using termination_nodes with path_tracking=True, you can still retrieve paths to any vertex that was reached during the computation using get_path(), even if it wasn’t in the termination nodes list.

  • return_inf : bool, optional (default=True)

Whether to return path lengths as infinity (np.inf) when no path exists.

dijkstra = Dijkstra(edges, orientation='in')
shortest_paths = dijkstra.run(vertex_idx=0, return_inf=False)
shortest_paths
array([0.00000000e+000, 1.79769313e+308, 1.79769313e+308, 1.79769313e+308,
   1.79769313e+308])

The value 1.79769313e+308 actually used in the code is the largest number that can be represented in the floating point format (np.float64).

  • return_series : bool, optional (default=False)

Instead of returning a NumPy array, the run method may return a Pandas Series object with the path lengths as values and the vertex indices as the index.

shortest_paths = dijkstra.run(vertex_idx=4, return_series=True)
shortest_paths

vertex_idx

path_length

0

5.5

1

4.5

2

2.5

3

1.0

4

0.0

  • heap_length_ratio : float, optional (default=1.0)

This is an experimental parameter that controls the size of the heap used in the algorithm. The heap is a static array that is used to store the vertices that may be visited next. A value of 1.0 means that the heap is the same size as the number of vertices, so there is no risk of overflow. Be aware that there is no guarantee that the algorithm will work with a heap length ratio smaller that 1. The lowest ratio that works for a given graph depends on the graph structure and the source vertex. For a rather sparse graph, a small ratio may work, but for a dense graph, a ratio of 1.0 is required.

Bellman-Ford Algorithm

The Bellman-Ford algorithm is designed for directed graphs that may contain negative edge weights and can detect negative cycles. Unlike Dijkstra’s algorithm, Bellman-Ford can handle a broader class of problems but with a higher computational cost.

Basic Usage

from edsger.path import BellmanFord

# Create a graph with negative weights
edges_negative = pd.DataFrame({
    'tail': [0, 0, 1, 1, 2, 3],
    'head': [1, 2, 2, 3, 3, 4],
    'weight': [1, 4, -2, 5, 1, 3]  # Note the negative weight
})

# Initialize the Bellman-Ford object
bf = BellmanFord(edges_negative)

# Run the algorithm from a source vertex
shortest_paths = bf.run(vertex_idx=0)
print("Shortest paths:", shortest_paths)
Shortest paths: [ 0.  1. -1.  0.  3.]

Negative Weights and Optimal Paths

The power of Bellman-Ford becomes evident when dealing with negative weights. In the example above, the shortest path from vertex 0 to vertex 2 has length -1 (going 0→1→2 with weights 1 + (-2) = -1), which is shorter than the direct path 0→2 with weight 4.

Negative Cycle Detection

One of Bellman-Ford’s key features is detecting negative cycles, which make shortest path problems ill-defined:

# Create a graph with a negative cycle
edges_cycle = pd.DataFrame({
    'tail': [0, 1, 2],
    'head': [1, 2, 0],
    'weight': [1, -2, -1]  # Cycle 0→1→2→0 has total weight -2
})

bf_cycle = BellmanFord(edges_cycle)
try:
    shortest_paths = bf_cycle.run(vertex_idx=0)
except ValueError as e:
    print("Error:", e)
Error: Negative cycle detected in the graph

Disabling Negative Cycle Detection

For performance reasons, you can disable negative cycle detection if you’re confident your graph doesn’t contain negative cycles:

bf = BellmanFord(edges_negative)
# Skip negative cycle detection for better performance
shortest_paths = bf.run(vertex_idx=0, detect_negative_cycles=False)

Orientation

Like Dijkstra, Bellman-Ford supports both orientations:

# Single-source shortest paths (from source to all vertices)
bf_out = BellmanFord(edges_negative, orientation='out')
distances_out = bf_out.run(vertex_idx=0)

# Single-target shortest paths (from all vertices to target)
bf_in = BellmanFord(edges_negative, orientation='in')
distances_in = bf_in.run(vertex_idx=4)

Path Tracking

Bellman-Ford also supports path reconstruction:

bf = BellmanFord(edges_negative)
shortest_paths = bf.run(vertex_idx=0, path_tracking=True)

# Get the actual path to vertex 3
path = bf.get_path(vertex_idx=3)
print("Path to vertex 3:", path)
Path to vertex 3: [3 2 1 0]

When to Use Bellman-Ford vs Dijkstra

Use Bellman-Ford when:

  • Your directed graph contains negative edge weights

  • You need to detect negative cycles in directed graphs

  • Working with financial networks, arbitrage detection, or differential constraints

  • Performance is less critical than correctness

Use Dijkstra when:

  • All edge weights are non-negative (e.g., distances, travel times, costs)

  • You need the fastest possible performance (O((V+E)logV) vs O(VE))

  • Working with directed road networks, shortest distance problems in directed graphs

Performance Considerations

Bellman-Ford has O(VE) time complexity compared to Dijkstra’s O((V+E)logV). For large directed graphs with only positive weights, Dijkstra is significantly faster. However, the performance difference may be acceptable for smaller directed graphs or when negative weights are essential to your problem.

Breadth-First Search: Unweighted Directed Graphs

BFS (Breadth-First Search) finds shortest paths in directed graphs where edge weights are ignored (or treated as equal). It’s optimal for finding minimum-hop paths in directed graphs.

Basic Usage

from edsger.path import BFS

# Create a directed graph (weights will be ignored if present)
edges_bfs = pd.DataFrame({
    'tail': [0, 0, 1, 2, 2, 3],
    'head': [1, 2, 3, 3, 4, 4]
})

# Initialize BFS
bfs = BFS(edges_bfs)

# Run BFS from vertex 0
predecessors = bfs.run(vertex_idx=0)
print("Predecessors:", predecessors)
Predecessors: [-9999     0     0     1     2]

Understanding BFS Output

Unlike Dijkstra and Bellman-Ford which return distances, BFS returns predecessors:

  • Value -9999 (default sentinel): Unreachable vertex or start vertex

  • Other values: The predecessor vertex in the shortest path

You can customize the sentinel value if needed:

# Use a different sentinel value for unreachable nodes
bfs = BFS(edges_bfs, sentinel=-1)
predecessors = bfs.run(vertex_idx=0)
# Unreachable nodes will now be marked with -1 instead of -9999

Path Tracking

BFS supports path reconstruction just like Dijkstra:

# Run with path tracking enabled
predecessors = bfs.run(vertex_idx=0, path_tracking=True)

# Get the actual path to vertex 4
path = bfs.get_path(vertex_idx=4)
print("Path from 0 to 4:", path)
Path from 0 to 4: [4 2 0]

The path is returned as an array from target to source.

Orientation

BFS supports both orientations like other algorithms:

# Single-source (forward search from source)
bfs_out = BFS(edges_bfs, orientation='out')
pred_out = bfs_out.run(vertex_idx=0)

# Single-target (backward search to target)
bfs_in = BFS(edges_bfs, orientation='in')
pred_in = bfs_in.run(vertex_idx=4)

When to Use BFS

Use BFS when:

  • All edges should be treated equally (unweighted directed graphs)

  • Finding minimum number of hops/edges in directed paths

  • Maximum performance for unweighted directed graphs (O(V+E) time)

  • Social network analysis (degrees of separation in directed networks)

  • Network routing with hop counts in directed topologies

Use Dijkstra/Bellman-Ford when:

  • Edge weights matter (distances, costs, times)

  • You need weighted shortest paths in directed graphs

Spiess-Florian Hyperpath Algorithm

The Spiess-Florian algorithm (also known as HyperpathGenerating) is designed for transit network assignment. Unlike traditional shortest path algorithms, it models how passengers distribute across multiple transit lines based on service frequencies and travel times.

Transit Network Data Format

For hyperpath computation, edges require travel time and frequency columns instead of a single weight:

Column

Type

Description

tail

int

Source stop/vertex ID

head

int

Destination stop/vertex ID

trav_time

float

Travel time on this edge (e.g., minutes)

freq

float

Service frequency (e.g., vehicles per minute)

Basic Usage

from edsger.path import HyperpathGenerating
import pandas as pd

# Create a transit network
# Multiple lines connecting stops with different frequencies
edges = pd.DataFrame({
    'tail': [0, 0, 1, 1, 2, 3],
    'head': [1, 2, 2, 3, 3, 4],
    'trav_time': [2.0, 5.0, 1.5, 3.0, 2.5, 1.0],  # Travel times
    'freq': [0.1, 0.05, 0.2, 0.15, 0.1, 0.3]      # Frequencies
})

# Initialize the algorithm
hp = HyperpathGenerating(edges)

# Run from origin 0 to destination 4 with 100 passengers
hp.run(origin=0, destination=4, volume=100.0)

Understanding the Results

After running the algorithm, you can access:

# Expected travel times from each vertex to destination
# Includes waiting time based on line frequencies
print("Travel times to destination:")
print(hp.u_i_vec)

# Edge volumes showing passenger distribution
print("\nPassenger volumes on each edge:")
print(hp._edges[['tail', 'head', 'trav_time', 'freq', 'volume']])

The volume column shows how passengers are distributed across the network. Higher frequencies attract more passengers, as they reduce expected waiting time.

Multiple Origins

You can compute demand from multiple origins simultaneously:

hp = HyperpathGenerating(edges)
hp.run(
    origin=[0, 1, 2],        # Multiple origins
    destination=4,            # Single destination
    volume=[50.0, 30.0, 20.0] # Demand from each origin
)

When to Use Spiess-Florian

Use HyperpathGenerating when:

  • Modeling public transit networks with multiple competing lines

  • Computing transit assignment (distributing passenger demand)

  • Evaluating expected travel times including waiting

  • Analyzing line utilization and capacity requirements

Use Dijkstra/Bellman-Ford when:

  • Finding deterministic shortest paths

  • Working with networks where “frequency” doesn’t apply (roads, etc.)

  • You only need path distances, not demand distribution

DataFrame Backend Selection

Edsger automatically detects and optimizes for your DataFrame backend. Here’s when to use each:

pandas with NumPy backend

  • Best for: Small to medium graphs, existing pandas workflows

  • Memory: Standard NumPy arrays

  • Example:

edges = pd.DataFrame({...})  # Default pandas
dijkstra = Dijkstra(edges)

pandas with Arrow backend

  • Best for: Large graphs, memory-constrained environments

  • Memory: Columnar format, often more efficient

  • Automatic optimization: Integer columns use uint32 when possible

  • Example:

edges = pd.DataFrame({...}).astype({
    'tail': pd.ArrowDtype(pa.int64()),
    'head': pd.ArrowDtype(pa.int64()),
    'weight': pd.ArrowDtype(pa.float64())
})
dijkstra = Dijkstra(edges)

Polars DataFrame

  • Best for: High-performance workflows, large-scale data processing

  • Memory: Efficient columnar storage

  • Automatic optimization: Integer columns use uint32 when possible

  • Example:

edges = pl.DataFrame({...})
dijkstra = Dijkstra(edges)

All backends are automatically converted to optimal NumPy arrays internally for maximum Cython performance, while preserving the convenience of your preferred DataFrame library.

Next Steps