`onyx.graph.graphtools` – Basic tools for building and using graphs¶

The classes in this module support common operations using graphs.

They are used to construct graphs, serialize and unserialize graphs, iterate through the nodes or arcs of a graph, sort nodes and arcs topologically, search for paths, etc.

A graph consists of a set of nodes and a set of directed arcs. Each arc connects two nodes in the graph. Each node and each arc is identified in the graph by a non-negative integer ID. Each node and each arc has an associated label of user-supplied data. Many of the supported graph operations depend only upon the structure presented by the set of nodes and arcs, and do not depend upon the labels. For instance, whether a directed graph is acyclic or not depends only upon the structure of the graph; acyclicity does not depend on the labels associated with the nodes and arcs.

The factoring here focusses on structural properties of graphs and performing operations that depend on these properties. Subclassing of these objects is not recommended. Rather, they should be used as members of user-specific classes. That is, these classes support “has a” usage rather than “is a” usage.

To support user-semantics in a graph, each node and each arc has a user-supplied label. These labels permit the user to attach arbitrary data to each node or arc. Although it is not required, best practices suggest that each label should be an immutable object such as an integer, a string, or a tuple of integers and strings. For greatest portability of graph-based objects, e.g. in a text representation, the use of non-negative integers for labels is recommended.

The use of a single label to capture all user-specific semantics of a node or an arc is a simplifying factorization of the general graph representation problem. It means that the graph tools do not have to specify or verify semantic constraints or subtleties on nodes and arcs; the graph tools focus on the semantics associated with the graph structure. For the most part graph methods do not apply any semantics to the labels. The iteration constructs typically yield the label associated with each node or arc they visit. The few methods that make semantic use of the labels typically do so by calling a user-supplied callback to generate a numerical metric based on the label.

Furthermore, this use of a single label on each node or arc encourages the implementor of a specific graph usage to focus on the semantics of what a node is and what an arc is. That is, the implementor makes node and arc objects with whatever properties they need, and the implementor also makes an object to manage the collection of nodes and arcs. This management object would also be responsible for maintaining the graph structure using the graphtools objects, and it would use the labels in the graph to identifiy the instances of the node and arc objects in the graph. E.g. each node label could be a unique unsigned integer that is an index into the table of data that the management object maintains for the nodes.

The tools implement several types of graph object. For instance, one type of a graph is used to build the graph structure. Due to its mutability (new nodes and new arcs are getting added) the builder graph does not support many graph semantics. Once the graph is done being built, an immutable (frozen) graph is generated. Because it is immutable, this form of the graph can collect global structural information at construction time and then readily supports read-only queries and traversals without having to solve any of the difficult semantic issues that arise while a graph is mutable.

There are of course cases when a graph need to be mutable and to maintain non-local semantics. To help simplify the semantic issues in such cases, support for a streaming model of graph usage has been adopted. The underlying constraint imposed by a streaming model is that it should be possible to specify a graph building and using algorithm that can exist in a bounded amount of memory while processing an unbounded amount of data. This implies an algorithm that will remove nodes and arcs from the graph at the same average rate that they are being added to the graph. Such an algorithm thus imposes constraints on the lifetimes of nodes and arcs. Adding and removing nodes and arcs according to a topologically ordered constraint is one simplification that can help to achieve bounded memory for an unbounded stream.

onyx.graph.graphtools.A_tutorial()¶

This is a tutorial on using graphtools. It’s also a doctest-based test of the graphtools module.

This tutorial demonstrates use of the graphtools module. It shows how to build and access a graph using GraphBuilder. It shows how to create a FrozenGraph and the operations and invariants associatated with a FrozenGraph.

Play around with points on the compass and connections between them.

Let’s build this graph:

       --------------------      /                      V
West -------> North ------> South --------> East
  \              \                          ^ ^
   \              \------------------------/  /
    \                                        /
     \--------------------------------------/

First some points:

>>> builder = GraphBuilder()
>>> north = builder.new_node("North")
>>> east =  builder.new_node("East")
>>> south =  builder.new_node("South")
>>> west =  builder.new_node("West")
>>> builder.num_nodes
4

Now make some directed connections:

>>> for start_node, end_node in (
...     (west, east),
...     (north, south),
...     (west, south),
...     (south, east),
...     (north, east),
...     (west, north)):
...   _ = builder.new_arc(start_node, end_node, builder.get_node_label(start_node) + 'To' + builder.get_node_label(end_node))
>>> builder.num_arcs
6

Get an immutable version of the graph and print it out:

>>> frozen = FrozenGraph(builder)
>>> for text in frozen.text_iter():
...   print text,
num_nodes 4
0   North
1   East
2   South
3   West
num_arcs 6
0   3  1   WestToEast
1   0  2   NorthToSouth
2   3  2   WestToSouth
3   2  1   SouthToEast
4   0  1   NorthToEast
5   3  0   WestToNorth

It’s a directed acyclic graph (DAG).

>>> frozen.has_cycle
False

See which nodes are global start nodes (have no arcs that end on them) and global end nodes (have no arcs that start on them):

>>> frozen.terminals
((3,), (1,))

Note:

>>> frozen.terminals == ((west,), (east,))
True

Iteration by node label with secondary iteration of in- and out-arcs, done in topological order since that’s well defined for this graph:

>>> for (node_label, in_arc_iter, out_arc_iter) in frozen.iter_nodes():
...    print('For node with label %s:' % (node_label,))
...    print('In-arcs are:')
...    for arc_label, other_node_label in in_arc_iter:
...        print('%s in from node %s' % (arc_label, other_node_label))
...    print('Out-arcs are:')
...    for arc_label, other_node_label in out_arc_iter:
...        print('%s out to node %s' % (arc_label, other_node_label))
For node with label West:
In-arcs are:
Out-arcs are:
WestToNorth out to node North
WestToEast out to node East
WestToSouth out to node South
For node with label North:
In-arcs are:
WestToNorth in from node West
Out-arcs are:
NorthToEast out to node East
NorthToSouth out to node South
For node with label South:
In-arcs are:
NorthToSouth in from node North
WestToSouth in from node West
Out-arcs are:
SouthToEast out to node East
For node with label East:
In-arcs are:
NorthToEast in from node North
SouthToEast in from node South
WestToEast in from node West
Out-arcs are:

And the same thing, but in reverse topological order

>>> for (node_label, in_arc_iter, out_arc_iter) in frozen.iter_nodes(reverse_order=True):
...    print('For node with label %s:' % (node_label,))
...    print('In-arcs are:')
...    for arc_label, other_node_label in in_arc_iter:
...        print('%s in from node %s' % (arc_label, other_node_label))
...    print('Out-arcs are:')
...    for arc_label, other_node_label in out_arc_iter:
...        print('%s out to node %s' % (arc_label, other_node_label))
For node with label East:
In-arcs are:
NorthToEast in from node North
SouthToEast in from node South
WestToEast in from node West
Out-arcs are:
For node with label South:
In-arcs are:
NorthToSouth in from node North
WestToSouth in from node West
Out-arcs are:
SouthToEast out to node East
For node with label North:
In-arcs are:
WestToNorth in from node West
Out-arcs are:
NorthToEast out to node East
NorthToSouth out to node South
For node with label West:
In-arcs are:
Out-arcs are:
WestToNorth out to node North
WestToEast out to node East
WestToSouth out to node South

>>> for (node_id, in_arc_iter, out_arc_iter) in frozen.iter_nodes_by_id():
...    print('For node with id %s:' % (node_id,))
...    print('In-arcs are:')
...    for arc_id, other_node_id in in_arc_iter:
...        print('%s in from node %s' % (arc_id, other_node_id))
...    print('Out-arcs are:')
...    for arc_id, other_node_id in out_arc_iter:
...        print('%s out to node %s' % (arc_id, other_node_id))
For node with id 3:
In-arcs are:
Out-arcs are:
5 out to node 0
0 out to node 1
2 out to node 2
For node with id 0:
In-arcs are:
5 in from node 3
Out-arcs are:
4 out to node 1
1 out to node 2
For node with id 2:
In-arcs are:
1 in from node 0
2 in from node 3
Out-arcs are:
3 out to node 1
For node with id 1:
In-arcs are:
4 in from node 0
3 in from node 2
0 in from node 3
Out-arcs are:

A helper for looking at paths between nodes, ranked by cost.

>>> def printpaths(pathiter, verbose=False):
...   print "score",  "((arc_id, (arc_start_node, arc_end_node, arc_label)), ...)" if verbose else "(arc_label, ...)"
...   for path_score, path, graph in pathiter:
...     print path_score, tuple( ((arcid, graph.get_arc(arcid),) if verbose else graph.get_arc_label(arcid)) for arcid in path)

There are two paths from north to east. Note that by using len as the arc label costing function, the score is just the sum of the lengths of the arcs’ label strings.

>>> printpaths(frozen.iterpaths(len, north, east))
score (arc_label, ...)
11 ('NorthToEast',)
23 ('NorthToSouth', 'SouthToEast')

Given that it’s a DAG and there are paths from north to east, we should expect no path from east to north.

>>> printpaths(frozen.iterpaths(len, east, north))
score (arc_label, ...)

Use the two terminals to get all paths through the graph. Verbose setting shows details of arc and node ids on each path.

>>> printpaths(frozen.iterpaths(len, west, east), True)
score ((arc_id, (arc_start_node, arc_end_node, arc_label)), ...)
10 ((0, (3, 1, 'WestToEast')),)
22 ((2, (3, 2, 'WestToSouth')), (3, (2, 1, 'SouthToEast')))
22 ((5, (3, 0, 'WestToNorth')), (4, (0, 1, 'NorthToEast')))
34 ((5, (3, 0, 'WestToNorth')), (1, (0, 2, 'NorthToSouth')), (3, (2, 1, 'SouthToEast')))

Get a permutation of node ids and arc ids maps node and arc ids into a topological order.

>>> maps = frozen.get_topological_reorderings()
>>> maps
((3, 0, 2, 1), (0, 2, 5, 1, 4, 3))
>>> (frozen.toponodes, frozen.topoarcs)
((3, 0, 2, 1), (5, 1, 2, 0, 3, 4))

E.g. node 3 (West) would be the first node in this topological ordering and node 1 (East) would be the last node in this topological ordering. Similarly, arc 0 (WestToEast) would remain the first arc in this topological ordering and arc 3 (SouthToEast) would be the last arc in this topological ordering.

Let’s make such a graph.

>>> topo = FrozenGraph(frozen.get_mapped_tables(maps))
>>> for text in topo.text_iter():
...   print text,
num_nodes 4
0   West
1   North
2   South
3   East
num_arcs 6
0   0  3   WestToEast
1   0  2   WestToSouth
2   0  1   WestToNorth
3   1  2   NorthToSouth
4   1  3   NorthToEast
5   2  3   SouthToEast

In this graph the topological permutations are identities.

>>> topo.get_topological_reorderings()
((0, 1, 2, 3), (0, 1, 2, 3, 4, 5))
>>> topo.toponodes, topo.topoarcs
((0, 1, 2, 3), (2, 1, 3, 0, 4, 5))

So the two terminal nodes have smallest and largest node ids.

>>> (start_node,), (end_node,) = terminals = topo.terminals
>>> terminals
((0,), (3,))

Despite the reordering of ids, the scores of paths and the sequence of arc labels are the same as for the non-topologically sorted graph.

>>> printpaths(topo.iterpaths(len, start_node, end_node))
score (arc_label, ...)
10 ('WestToEast',)
22 ('WestToSouth', 'SouthToEast')
22 ('WestToNorth', 'NorthToEast')
34 ('WestToNorth', 'NorthToSouth', 'SouthToEast')

A more lexically-involved cost metric can be used to break length ties:

>>> def lexlen(label):
...   accum = 0
...   for item in label:
...     accum = accum * 0x100 + ord(item)
...   return accum

No ties:

>>> printpaths(topo.iterpaths(lexlen, start_node, end_node))
score (arc_label, ...)
412717324528352340570996 ('WestToEast',)
200478143005669390070048732 ('WestToNorth', 'NorthToEast')
206522827443974778547267548 ('WestToSouth', 'SouthToEast')
24481084856605229727991028804 ('WestToNorth', 'NorthToSouth', 'SouthToEast')

Show that randomly shuffling the arcs and nodes doesn’t change the paths’ scores or sequences of arc-labels on each path.

>>> random.seed(123)
>>> maps = tuple(map(list, topo.get_topological_reorderings()))

Fiddle the elements from the original frozen graph:

>>> _ = map(random.shuffle, maps)
>>> maps
([1, 2, 3, 0], [2, 3, 1, 5, 4, 0])
>>> rand1 = FrozenGraph(frozen.get_mapped_tables(maps))
>>> (start_node1,), (end_node1,) = rand1.terminals
>>> printpaths(rand1.iterpaths(lexlen, start_node1, end_node1))
score (arc_label, ...)
412717324528352340570996 ('WestToEast',)
200478143005669390070048732 ('WestToNorth', 'NorthToEast')
206522827443974778547267548 ('WestToSouth', 'SouthToEast')
24481084856605229727991028804 ('WestToNorth', 'NorthToSouth', 'SouthToEast')

Now differently fiddle the elements from the topo graph:

>>> _ = map(random.shuffle, maps)
>>> maps
([2, 3, 1, 0], [0, 5, 4, 2, 3, 1])
>>> rand2 = FrozenGraph(topo.get_mapped_tables(maps))
>>> (start_node2,), (end_node2,) = rand2.terminals
>>> printpaths(rand2.iterpaths(lexlen, start_node2, end_node2))
score (arc_label, ...)
412717324528352340570996 ('WestToEast',)
200478143005669390070048732 ('WestToNorth', 'NorthToEast')
206522827443974778547267548 ('WestToSouth', 'SouthToEast')
24481084856605229727991028804 ('WestToNorth', 'NorthToSouth', 'SouthToEast')

But the underlying node and arc ids are different:

>>> printpaths(rand1.iterpaths(lexlen, start_node1, end_node1), True)
score ((arc_id, (arc_start_node, arc_end_node, arc_label)), ...)
412717324528352340570996 ((5, (2, 0, 'WestToEast')),)
200478143005669390070048732 ((3, (2, 3, 'WestToNorth')), (4, (3, 0, 'NorthToEast')))
206522827443974778547267548 ((0, (2, 1, 'WestToSouth')), (1, (1, 0, 'SouthToEast')))
24481084856605229727991028804 ((3, (2, 3, 'WestToNorth')), (2, (3, 1, 'NorthToSouth')), (1, (1, 0, 'SouthToEast')))

>>> printpaths(rand2.iterpaths(lexlen, start_node2, end_node2), True)
score ((arc_id, (arc_start_node, arc_end_node, arc_label)), ...)
412717324528352340570996 ((0, (3, 1, 'WestToEast')),)
200478143005669390070048732 ((3, (3, 2, 'WestToNorth')), (2, (2, 1, 'NorthToEast')))
206522827443974778547267548 ((5, (3, 0, 'WestToSouth')), (1, (0, 1, 'SouthToEast')))
24481084856605229727991028804 ((3, (3, 2, 'WestToNorth')), (4, (2, 0, 'NorthToSouth')), (1, (0, 1, 'SouthToEast')))

class onyx.graph.graphtools.FrozenGraph(source)¶

Bases: onyx.graph.graphtools.GraphBase

An immutable graph object suitable for graph operations that depend on global structural properties of the graph.

This graph contains two disconnected subgraphs, both of which are cyclic, one with a self loop.

>>> a = FrozenGraph(GraphTables(((0, 1, 2, 3), (0, 1, 2), (1, 0, 2), (3, None, 5))))
>>> a.has_cycle, a.is_connected, a.has_self_loop, a.has_multiarcs, a.is_lattice, a.node_labels_are_node_ids, a.arc_labels_are_arc_ids
(True, False, True, False, False, True, False)

Reverse each of the conditions

>>> b = FrozenGraph(GraphTables(((-1, -2), (0, 0), (1, 1), (0, 1))))
>>> b.has_cycle, b.is_connected, b.has_self_loop, b.has_multiarcs, b.is_lattice, b.node_labels_are_node_ids, b.arc_labels_are_arc_ids
(False, True, False, True, True, False, True)

XXX [Fix this documentation] Case of a cyclic graph that is not bakis_cyclic, i.e. the cyclicity is only due to self loops

>>> c = FrozenGraph(GraphTables((('a', 'b'), (0, 1), (1, 1), ('c', 'd'))))
>>> c.has_cycle, c.has_self_loop, c.strict_is_cyclic
(False, True, True)

>>> b = SetGraphBuilder()
>>> _ = b.add_arc(0, 1)
>>> _ = b.add_arc(0, 2)
>>> d = FrozenGraph(b)
>>> d.has_join, d.is_tree, d.is_tree_strict
(False, True, True)

>>> _ = b.add_arc(2, 2)
>>> e = FrozenGraph(b)
>>> e.has_join, e.is_tree, e.is_tree_strict
(False, True, False)

>>> _ = b.add_arc(3, 2)
>>> f = FrozenGraph(b)
>>> f.has_join, f.is_tree, f.is_tree_strict
(True, False, False)

Creates a frozen graph which is a snapshot of the current state of source. source must be an instance of GraphTables or GraphBase.

arc_labels_are_arc_ids¶: Return True if each arc label is the index of the arc.

clone_graph(node_xform=<function <lambda> at 0x1d06fb18>, arc_xform=<function <lambda> at 0x1d06fb90>)¶

Returns a new FrozenGraph based on the contents of self. The new graph will be isomorphic to the original, with node labels and/or arc labels transformed by the callables node_xform and arc_xform, respectively. node_xform should be a callable taking one argument, which will be the label of the node being transformed. arc_xform should be a callable of 3 arguments, which will be the label of the arc being transformed, the label of the start node for the arc, and the label of the end node for the arc, respectively. By default, each transform function is the identity function on the label of the arc or node. Thus, if no arguments are provided, an exact copy is returned.

>>> g0 = FrozenGraph(GraphTables(((1, 2, 3), (0, 1, 0), (1, 2, 2), ('123', '234', '345'))))
>>> g1 = g0.clone_graph()
>>> g1
FrozenGraph(GraphTables(((1, 2, 3), (0, 1, 0), (1, 2, 2), ('123', '234', '345'))))

>>> g0 == g1
True

>>> g2 = g0.clone_graph(str, lambda l, s, e: int(l))
>>> g2
FrozenGraph(GraphTables((('1', '2', '3'), (0, 1, 0), (1, 2, 2), (123, 234, 345))))

>>> g3 = g0.clone_graph(str, lambda l, s, e: int(l) * s * e)
>>> g3
FrozenGraph(GraphTables((('1', '2', '3'), (0, 1, 0), (1, 2, 2), (246, 1404, 1035))))

depth_first_callback(pre_callback=None, post_callback=None, join_callback=None, cycle_callback=None)¶

Performs a depth-first traversal of the graph, making callbacks for all structural events involving the nodes in the graph. The sequence of callbacks represent a structure-revealing serialization of events from which the graph could be constructed.

See also depth_first_iter(), an advanced, lower-level method that returns a generator that yields items from the callbacks.

The optional pre_callback and post_callback arguments are functions of two arguments, (graph, node_id), where graph is the FrozenGraph instance making the callback and node_id is the identifier of the node in question. These two callbacks are each called exactly once with the node identifier of each node in the graph. The interleaving of these calls reveals the topological, tree, and spanning-tree structure in the graph.

The pre_callback function is called each time a new node_id is first encountered in the depth traversal. It is guaranteed to be called with a given value for node_id prior to being called for any node that is reachable from the given value of node_id.

The post_callback function is called after all nodes reachable from node_id have been encountered and processed by pre_callback and post_callback. For DAGs, the reverse of the sequence of node_id in the calls to post_callback is a topological ordering of the nodes. For example, toponodes is constructed this way. Also, for any subtree in the graph, the node_id values appearing in the pre_callback and post_callback calls are nested.

The cycle_callback and join_callback arguments are functions of three arguments, (graph, start_node_id, end_node_id), where graph is the FrozenGraph instance making the callback and start_node_id and end_node_id are two nodes that are connected by an arc. These two callbacks reveal the joins and cycles in the graph.

The join_callback is called whenever a join is encountered that does not complete a cycle. That is, end_node_id will already have been used in calls to both pre_callback and post_callback. In each call to join_callback, it is guaranteed that start_node_id and end_node_id are distinct.

The cycle_callback is called whenever a join is encountered that does complete a cycle. In this case, end_node_id will already have been used in a pre_callback, but it will not have been used in a post_callback. If start_node_id and end_node_id are the same, the cycle is a self loop. Otherwise, start_node_id and end_node_id are distinct and the cycle includes multiple nodes. The set of arcs appearing in the calls to cycle_callback is a set of arcs which, if removed or reversed, would convert the graph a DAG. Put another way, if cycle_callback is not ever called, the graph is a DAG.

depth_first_iter(pre_callback=None, post_callback=None, join_callback=None, cycle_callback=None)¶

Performs a depth-first traversal of the graph, making callbacks to iterator-returning functions for all structural events involving the nodes in the graph. The sequence of callbacks represent a structure-revealing serialization of events from which the graph could be constructed. The iterator yields the items from each of the iterables/generators that each iterator-returning callback returns.

See also depth_first_callback(), a simpler, high-level method that just calls the callbacks.

The optional pre_callback and post_callback arguments are iterator-returning functions of two arguments, (graph, node_id), where graph is the FrozenGraph instance making the callback and node_id is the identifier of the node in question. These two callbacks are each called exactly once with the node identifier of each node in the graph. The interleaving of these calls reveals the topological, tree, and spanning-tree structure in the graph.

The pre_callback iterator-returning function is called each time a new node_id is first encountered in the depth traversal. It is guaranteed to be called with a given value for node_id prior to being called for any node that is reachable from the given value of node_id.

The post_callback iterator-returning function is called after all nodes reachable from node_id have been encountered and processed by pre_callback and post_callback. For DAGs, the reverse of the sequence of node_id in the calls to post_callback is a topological ordering of the nodes. For example, toponodes is constructed this way. Also, for any subtree in the graph, the node_id values appearing in the pre_callback and post_callback calls are nested.

The cycle_callback and join_callback arguments are iterator-returning functions of three arguments, (graph, start_node_id, end_node_id), where graph is the FrozenGraph instance making the callback and start_node_id and end_node_id are two nodes that are connected by an arc. These two callbacks reveal the joins and cycles in the graph.

The join_callback is called whenever a join is encountered that does not complete a cycle. That is, end_node_id will already have been used in calls to both pre_callback and post_callback. In each call to join_callback, it is guaranteed that start_node_id and end_node_id are distinct.

The cycle_callback is called whenever a join is encountered that does complete a cycle. In this case, end_node_id will already have been used in a pre_callback, but it will not have been used in a post_callback. If start_node_id and end_node_id are the same, the cycle is a self loop. Otherwise, start_node_id and end_node_id are distinct and the cycle includes multiple nodes. The set of arcs appearing in the calls to cycle_callback is a set of arcs which, if removed or reversed, would convert the graph a DAG. Put another way, if cycle_callback is not ever called, the graph is a DAG.

dot_display(temp_file_prefix='graphtools_', display_tool_format='open -a /Applications/Graphviz.app %s', **kwargs)¶

Display a dot-generated representation of the graph. Returns the name of the temporary file that the display tool is working from. The caller is responsible for removing this file. See also dot_iter().

Optional temp_file_prefix is the prefix used for the temporary filename.

Optional display_tool_format is formatting string, with %s where the filename goes, used to generate the command that will display the file. By default it assumes you’re on a Mac and have Graphviz.app installed in the /Applications directory.

Remaining keyword arguments are handed to the dot_iter function.

dot_iter(graph_label='', graph_type='digraph', globals=(), node_label_callback=<function <lambda> at 0x1d06a5f0>, node_attributes_callback=None, arc_label_callback=<function <lambda> at 0x1d06a5f0>, arc_attributes_callback=None, arc_ports_callback=None)¶

Returns a generator that yields lines of text, including newlines. The text represents the graph in the DOT language for which there are many displayers. See also dot_display().

Optional argument graph_label is a string label for the graph. Optional argument graph_type defaults to ‘digraph’, can also be ‘graph’.

Optional globals is an iterable that yields strings to put in the globals section of the DOT file.

Optional node_label_callback is a function that will be called with three arguments for each node: (node_label, is_start, is_end), where node_label is the node’s label, is_start is True if the node is a start node (has no incoming arcs), and is_end is True if the node is an end node (has no outgoing arcs). The callback function should return a string which will be used as the text label for the node in the figure, or None or the empty string for no label. If this argument is not given, each node with a non-None label will be labelled with the str value of its label. If this argument is None, nodes will not be labelled.

Optional arc_label_callback is a function that will be called with the arc label for each arc. The callback function should return a string which will be used as the text label for the arc in the figure, or None or the empty string for no label. If this argument is not given, each arc with a non-None label will be labelled with the str value of its label. If this argument is None, arcs will not be labelled.

Optional node_attributes_callback is a function that returns an iterable that will yield a string for each attribute assignment for the node. It will be called with three arguments for each node: (node_label, is_start, is_end), where node_label is the node’s label, is_start is True if the node is a start node (has no incoming arcs), and is_end is True if the node is an end node (has no outgoing arcs).

Optional arc_attributes_callback is a function that returns an iterable that will yield a string for each attribute assignment for the arc. It will be called with the arc label for each arc in the graph.

Optional arc_ports_callback is a function of two arguments that is called for each arc in the graph. The arguments are the labels of the start and end nodes of the arc. The function returns a pair of strings that name the respective ports of the nodes to which the arc should attach. The strings should be empty if no port is implied.

>>> graph = FrozenGraph(GraphTables(((1, 2, 'a'), (0, 1, 2), (1, 0, 2), (3, None, 5))))
>>> print ''.join(graph.dot_iter())
digraph  { 
  n0  [label="1"];
  n1  [label="2"];
  n2  [label="a"];
  n0 -> n1  [label="3"];
  n1 -> n0;
  n2 -> n2  [label="5"];
}
<BLANKLINE>

>>> print ''.join(graph.dot_iter(graph_label='Foo',
...                              globals=['size="5,6.125";', 'ordering=out;'],
...                              node_label_callback=lambda label, is_start, is_end: ('<' if is_start else '') + str(label) + ('>' if is_end else ''),
...                              arc_label_callback=str,
...                              node_attributes_callback=lambda label, is_start, is_end: ['shape=%s' % ('octagon' if is_start or is_end else 'ellipse',), 'style=%s' % ('bold' if isinstance(label, int) else 'normal',)],
...                              arc_attributes_callback=lambda label: ['style=%s' % ('bold' if label is None else 'normal',)]))
digraph Foo { 
  label="Foo";
  size="5,6.125";
  ordering=out;
  n0  [label="1", shape=ellipse, style=bold];
  n1  [label="2", shape=ellipse, style=bold];
  n2  [label="<a>", shape=octagon, style=normal];
  n0 -> n1  [label="3", style=normal];
  n1 -> n0  [label="None", style=bold];
  n2 -> n2  [label="5", style=normal];
}
<BLANKLINE>

endnodes¶: A tuple of the indices of the ending-node ids. These are nodes with no outgoing arcs.

get_arc(arc_id)¶: Return a tuple of the (start_node, end_node, arc_label) for the arc associated with the arc_id.

get_arc_label(arc_id)¶: Return the arc label associated with the arc_id.

get_backward_transitive_closures(seed_node_id_iter, force_reflexive=False, max_depth=None)¶

Find the part of the graph that is reachable going backward from the set of nodes in seed_node_id_iter. I.e., find the part of the graph that can reach the nodes in seed_node_id_iter when going forward.

Return a tuple of two sets: the ids of the nodes and the ids of the arcs in the backward-reachable part of the graph. If optional force_reflexive is True then include all the nodes from seed_node_id_iter in the set of returned node ids regardless of whether the graph has cycles that include them. If optional max_depth is not None, it should be a non-negative integer and the subgraph returned will be limited to those nodes that are at most max_depth from any node in seed_node_id_iter.

>>> g0 = FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 1), (0, 2, 0,), (-20, -21, -22))))
>>> g0.get_backward_transitive_closures((2,))
(set([1]), set([1]))
>>> g0.get_backward_transitive_closures((2,), force_reflexive=True)
(set([1, 2]), set([1]))
>>> g0.get_backward_transitive_closures((2,), max_depth=0)
(set([]), set([]))
>>> g0.get_backward_transitive_closures((2,), force_reflexive=True, max_depth=0)
(set([2]), set([]))
>>> g0.get_backward_transitive_closures((2,), max_depth=1)
(set([1]), set([1]))
>>> g0.get_backward_transitive_closures((0,), max_depth=1)
(set([1, 2]), set([0, 2]))
>>> g0.get_backward_transitive_closures((0,), max_depth=2)
(set([1, 2]), set([0, 1, 2]))
>>> FrozenGraph(GraphTables(((10, 20), (0, 0, 1, 1), (0, 1, 0, 1), ('00', '01', '10', '11')))).get_backward_transitive_closures((0,))
(set([0, 1]), set([0, 1, 2, 3]))

get_canonical_DAG()¶

Returns a new FrozenGraph based on the contents of self. In the new graph the nodes and arcs are rearranged to be indexed in a topological order.

Raises a ValueError if has_cycle() is True.

>>> FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 1), (0, 2, 0,), (-20, -21, -22)))).get_canonical_DAG()
FrozenGraph(GraphTables(((-11, -12, -10), (0, 1, 0), (1, 2, 2), (-21, -20, -22))))

get_forward_transitive_closures(seed_node_id_iter, force_reflexive=False, max_depth=None)¶

Find the part of the graph that is reachable going forward from the set of nodes in seed_node_id_iter.

Return a tuple of two sets: the ids of the nodes and the ids of the arcs in the forward-reachable part of the graph. If optional force_reflexive is True then include all the nodes from seed_node_id_iter in the set of returned node ids regardless of whether the graph has cycles that include them. If optional max_depth is not None, it should be a non-negative integer and the subgraph returned will be limited to those nodes that are at most max_depth from any node in seed_node_id_iter.

>>> g0 = FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 1), (0, 2, 0,), (-20, -21, -22))))
>>> g0.get_forward_transitive_closures((1,))
(set([0, 2]), set([0, 1, 2]))
>>> g0.get_forward_transitive_closures((1,), force_reflexive=True)
(set([0, 1, 2]), set([0, 1, 2]))
>>> g0.get_forward_transitive_closures((1,), max_depth=0)
(set([]), set([]))
>>> g0.get_forward_transitive_closures((1,), force_reflexive=True, max_depth=0)
(set([1]), set([]))
>>> g0.get_forward_transitive_closures((1,), max_depth=1)
(set([0, 2]), set([1, 2]))
>>> g0.get_forward_transitive_closures((1,), max_depth=2)
(set([0, 2]), set([0, 1, 2]))

>>> FrozenGraph(GraphTables(((10, 20), (0, 0, 1, 1), (0, 1, 0, 1), ('00', '01', '10', '11')))).get_forward_transitive_closures((0,))
(set([0, 1]), set([0, 1, 2, 3]))

get_induced_subgraph(start_node_iter, end_node_iter)¶

Returns a FrozenGraph of the induced subgraph between nodes in start_node_iter and nodes in end_node_iter, where each of the iterables yields node ids. If either iterator is empty, an empty graph is returned.

A DAG example

>>> g = FrozenGraph(GraphTables((xrange(12),
...                             (0, 0, 2, 2, 3, 3, 5, 6, 8, 8, 9, 10, 11),
...                             (2, 3, 4, 5, 6, 7, 8, 9, 1, 1, 10, 1, 2),
...                             xrange(13))))
>>> #g.dot_display()
>>> i = g.get_induced_subgraph([0],[1])
>>> i
FrozenGraph(GraphTables(((0, 1, 2, 3, 5, 6, 8, 9, 10), (0, 0, 2, 3, 4, 5, 6, 6, 7, 8), (2, 3, 4, 5, 6, 7, 1, 1, 8, 1), (0, 1, 3, 4, 6, 7, 8, 9, 10, 11))))
>>> #i.dot_display()

Examples involving cycles

>>> j = FrozenGraph(GraphTables((xrange(4),
...                              (0, 0, 1, 0, 3),
...                              (0, 1, 0, 2, 1),
...                              xrange(5))))
>>> #j.dot_display()
>>> j.get_induced_subgraph([3],[2]) == j
True
>>> j.get_induced_subgraph([2],[3])
FrozenGraph(GraphTables(((), (), (), ())))
>>> j.get_induced_subgraph([0],[1])
FrozenGraph(GraphTables(((0, 1), (0, 0, 1), (0, 1, 0), (0, 1, 2))))

If either iterator is empty, an empty graph is returned.

>>> j.get_induced_subgraph([],[1])
FrozenGraph(GraphTables(((), (), (), ())))

>>> j.get_induced_subgraph([0],[])
FrozenGraph(GraphTables(((), (), (), ())))

get_line_graph(node_labeler=None, arc_labeler=None)¶

Returns a new FrozenGraph based on the contents of self. The new graph will be the line-graph of the original, also called the interchange or edge-dual graph.

The line-graph (V*, E*) of a directed graph (V,E) has nodes V* which correspond to the arcs E of the original graph. Two nodes v1 and v2 in V* have an arc between them in E* if and only if the arcs e1 and e2 in E that they correspond to form a path of length two in V, that is, if and only if there is a node v in the original graph such that e1 is an in-arc of v and e2 is an out-arc of v.

By default, nodes in the resulting line-graph will have labels which are triples (pred_node_label, arc_label, succ_node_label) where these labels come from the original graph; arcs in the resulting graph will have labels which are triples (in_arc_label, node_label, out_arc_label), again with labels from the original graph. Callers may provide their own labeling functions for either nodes or arcs, each will be called with three arguments which are exactly the respective triples shown above.

>>> g0 = FrozenGraph(GraphTables(((0, 1, 2, 3), (0, 1, 2), (1, 2, 3), ('A', 'B', 'C'))))

>>> lg0 = g0.get_line_graph()
>>> lg0
FrozenGraph(GraphTables((((0, 'A', 1), (1, 'B', 2), (2, 'C', 3)), (0, 1), (1, 2), (('A', 1, 'B'), ('B', 2, 'C')))))

Here’s an example with a simpler labeler, showing that a diamond-shaped lattice is converted to a disconnected graph. Repeated applications of the line-graph operator result in the empty graph.

>>> diamond = FrozenGraph(GraphTables((('w', 'x', 'y', 'z'), (0, 0, 1, 2), (1, 2, 3, 3), ('A', 'B', 'C', 'D'))))
>>> labeler = lambda left, middle, right: middle
>>> lg1 = diamond.get_line_graph(labeler, labeler)
>>> lg1
FrozenGraph(GraphTables((('A', 'B', 'C', 'D'), (0, 1), (2, 3), ('x', 'y'))))

>>> lg2 = lg1.get_line_graph(labeler, labeler)
>>> lg2
FrozenGraph(GraphTables((('x', 'y'), (), (), ())))

>>> lg3 = lg2.get_line_graph(labeler, labeler)
>>> lg3
FrozenGraph(GraphTables(((), (), (), ())))

Cycle graphs are idempotent under two applications of the line-graph operator.

>>> cycle = FrozenGraph(GraphTables((('a', 'b', 'c', 'd'), (0, 1, 2, 3), (1, 2, 3, 0), ('A', 'B', 'C', 'D'))))
>>> cycle_lg1 = cycle.get_line_graph(labeler, labeler)

>>> cycle_lg1
FrozenGraph(GraphTables((('A', 'B', 'C', 'D'), (3, 0, 1, 2), (0, 1, 2, 3), ('a', 'b', 'c', 'd'))))

>>> cycle_lg2 = cycle_lg1.get_line_graph(labeler, labeler)
>>> cycle_lg2 == cycle
True

Here’s an example with two unconnected cycles of circumferences 3 and 4

>>> two_cycles = FrozenGraph(GraphTables((('a', 'b', 'c', 'd', 'x', 'y', 'z'),
...                                       (0, 1, 2, 3, 4, 5, 6),
...                                       (1, 2, 3, 0, 5, 6, 4),
...                                       ('A', 'B', 'C', 'D', 'X', 'Y', 'Z'))))
>>> two_cycles_lg1 = two_cycles.get_line_graph(labeler, labeler)
>>> two_cycles_lg2 = two_cycles_lg1.get_line_graph(labeler, labeler)
>>> two_cycles_lg2 == two_cycles
True

>>> debruijn_labeler = lambda left, middle, right: str(left) + str(right)[-1]
>>> empty_labeler = lambda a,b,c: None
>>> debruijn0 = FrozenGraph(GraphTables((('0', '1'), (0, 1, 0, 1), (0, 0, 1, 1), (None, None, None, None))))
>>> debruijn1 = debruijn0.get_line_graph(debruijn_labeler, empty_labeler)
>>> debruijn2 = debruijn1.get_line_graph(debruijn_labeler, empty_labeler)
>>> debruijn3 = debruijn2.get_line_graph(debruijn_labeler, empty_labeler)
>>> debruijn3.num_nodes
16

get_mapped_tables(maps)¶

Using the supplied id maps, return a GraphTables object with mapped graph tables.

XXX the funcgtionality of this method and get_topological_reorderings() are largely obsoleted by get_canonical_DAG()

get_nocycles()¶

Returns a new FrozenGraph based on the contents of self. The set of nodes is unchanged. Cycles appearing in the original graph are broken in the new graph by removing cycle-completing arcs. The set of cycle-completing arcs is not canonical; that is, a different ordering of nodes and arcs in the original graph can result in different set of arcs being removed.

XXX note: an alternative algorithm would just reverse each of the cycle-completing arcs

>>> g = FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 0, 1), (0, 2, 2, 0), (-20, -21, -22, None))))
>>> g.get_nocycles()
FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 1), (0, 2, 0), (-20, -21, None))))

get_node_in_arcs(node_id)¶

Returns a list of the arc_ids of the incident arcs on node_id.

Query a graph with a single node with two self loops

>>> FrozenGraph(GraphTables(((-10,), (0, 0), (0, 0,), (-20, -21)))).get_node_in_arcs(0)
[0, 1]

get_node_label(node_id)¶: Return the node label associated with the node_id.

get_node_out_arcs(node_id)¶

Returns a list of the arc_ids of the incident arcs from node_id.

Query a graph with a single node with two self loops

>>> FrozenGraph(GraphTables(((-10,), (0, 0), (0, 0,), (-20, -21)))).get_node_out_arcs(0)
[0, 1]

get_one_best(arc_cost_function, random_tie_breaks=False)¶

Using arc_cost_function to generate the cost for each arc, return a pair, (cost, arc_ids), giving the total cost of the cheapest path through the lattice and the sequence of arc_ids making up that path.

If optional random_tie_breaks is not False, use the value to seed randomness that’s used to break scoring ties. Use a value other than None for reproducible randomness. Clients using the random_tie_breaks option are strongly encouraged to use an arc_cost_function that returns integer values.

See also iterpaths() for a more general path-finding interface.

Both properties is_lattice and is_strict_dag must be True.

Here’s a latice, arc labels are unique non-negative ints

>>> lat = make_random_lattice(50, 3, 10, 400, node_labeler=2, seed=1)
>>> lat.num_nodes, lat.num_arcs
(21, 23)
>>> 

Make the cost just be a modulus plus one, get the one-best

>>> def cost_func(label): return label % 4 + 1
>>> cost, arc_ids = lat.get_one_best(cost_func)
>>> print cost, arc_ids
13 [3, 6, 14, 18, 21, 22]

To visualize, uncomment the dot_display

>>> arc_labels = tuple(lat.get_arc_label(arc_id) for arc_id in arc_ids)
>>> assert len(set(arc_labels)) == len(arc_ids)
>>> def alc(label, *_): return (label, label % 4 + 1)
>>> def aac(label):
...   if label in arc_labels: yield 'style=bold'
>>> #_ = lat.dot_display(globals=['rankdir=LR', 'label="Calvary"', 'labelloc=top', 'fontsize=24'], arc_label_callback=alc, arc_attributes_callback=aac)

A big one

>>> lat2 = make_random_lattice(500, 3, 10, 700, node_labeler=4, seed=1)
>>> lat2.num_nodes, lat2.num_arcs
(1401, 3258)
>>> cost_one_lat2, arc_ids = lat2.get_one_best(cost_func)        
>>> cost_one_lat2, len(arc_ids)
(62, 40)

Verify same score for top choice from iterpaths

>>> start_node, = lat2.startnodes
>>> end_node, = lat2.endnodes
>>> iterpaths = lat2.iterpaths(cost_func, start_node, end_node)
>>> cost_iter_lat2, path, graph = iterpaths.next()
>>> cost_iter_lat2, len(path)
(62, 40)

Even bigger

>>> lat3 = make_random_lattice(500, 4, 7, 700, node_labeler=8, seed=1)
>>> lat3.num_nodes, lat3.num_arcs
(3314, 10249)
>>> cost_one_lat3, arc_ids3 = lat3.get_one_best(cost_func)        
>>> cost_one_lat3, len(arc_ids3)
(68, 55)

Verify same score for top choice from iterpaths, even though the best paths chosen by the two algorithms are different!

>>> start_node, = lat3.startnodes
>>> end_node, = lat3.endnodes
>>> iterpaths = lat3.iterpaths(cost_func, start_node, end_node)
>>> cost_iter_lat3, path, graph = iterpaths.next()
>>> cost_iter_lat3, len(path)
(68, 54)

>>> cost_iter_lat3 == cost_one_lat3
True

>>> tuple(iterpaths.next()[0] for _ in xrange(20))
(68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69)

Verify random selection

>>> cost_one_lat3_rand, arc_ids_rand3 = lat3.get_one_best(cost_func, random_tie_breaks=0)
>>> arc_ids_rand3 == arc_ids3
False

Lattice with more identical top choices; show that random one_best finds them all

>>> def cost_func2(label): return label % 2 + 1
>>> lat4 = make_random_lattice(500, 8, 8, 800, node_labeler=6, seed=1)
>>> lat4.num_nodes, lat4.num_arcs
(2887, 19093)

The non-random one-best

>>> cost_one_lat4, arc_ids4 = lat4.get_one_best(cost_func2)
>>> cost_one_lat4
37

Count number of tied top choices

>>> start4, = lat4.startnodes
>>> end4, = lat4.endnodes
>>> iterpaths4 = lat4.iterpaths(cost_func2, start4, end4)
>>> for num_same in itertools.count():
...   cost, path, graph = iterpaths4.next()
...   assert cost >= cost_one_lat4
...   if cost > cost_one_lat4: break
>>> num_same
8

See how many times it takes get_one_best with random tie breaks to enumerate all the tied top choices

>>> paths = set()
>>> paths.add(tuple(arc_ids4))
>>> for num_tries in itertools.count():
...   cost_one_lat4_rand, arc_ids_rand4 = lat4.get_one_best(cost_func2, random_tie_breaks=num_tries)
...   assert cost_one_lat4_rand == cost_one_lat4
...   paths.add(tuple(arc_ids_rand4))
...   assert len(paths) <= num_same
...   if len(paths) == num_same: break
>>> num_tries
13

See how iterpaths_rea does

>>> iterpaths_rea4 = lat4.iterpaths_rea(cost_func2)
>>> for num_same2 in itertools.count():
...   cost2, path, graph = iterpaths_rea4.next()
...   assert cost2 >= cost_one_lat4
...   if cost2 > cost_one_lat4: break
>>> num_same2, cost2
(8, 38)

>>> for num_same3 in itertools.count():
...   cost3, path, graph = iterpaths_rea4.next()
...   assert cost3 >= cost2
...   if cost3 > cost2: break
>>> num_same3, cost3
(13261, 39)

Test the errors

>>> make_random_DAG(1, 15, 2, 4, 2, 600, seed=0).get_one_best(lambda x: 1)
Traceback (most recent call last):
  ...
ValueError: expected both is_lattice and is_strict_dag to be True, got False True

>>> FrozenGraph(GraphTables(([None], [0], [0], [None]))).get_one_best(lambda x: 1)
Traceback (most recent call last):
  ...
ValueError: expected both is_lattice and is_strict_dag to be True, got True False

get_reversed()¶

Returns a new FrozenGraph based on the contents of self. In the new graph the directions of all the arcs are reversed.

>>> g = FrozenGraph(GraphTables(((-10, -11, -12), (2, 1, 1), (0, 2, 0,), (-20, -21, -22))))
>>> g.get_reversed()
FrozenGraph(GraphTables(((-10, -11, -12), (0, 2, 0), (2, 1, 1), (-20, -21, -22))))
>>> g.get_reversed().get_reversed() == g
True

get_topological_reorderings()¶

Find a topological ordering for the nodes and arcs in an acyclic graph.

Return two sequences, each of which maps old ids to new ids. The first is the map of node ids, the second is the map of arc ids.

XXX the funcgtionality of this method and get_mapped_tables() are largely obsoleted by get_canonical_DAG()

has_cycle¶: Return True if there are is any multi-node cycle in the graph, False otherwise.

has_join¶: True if there are any joins in the graph, False otherwise. A join occurs when two or more arcs end on the same node.

has_multiarcs¶: Return True if there are any two distinct nodes with parallel arcs between them, False if there is either zero or one arc between each pair of distinct nodes. This measure is of more interest for undirected graphs than directed graphs.

has_self_loop¶: Return True if there is any node with a self loop, False otherwise.

is_confusion_network¶

Returns True if the graph has the structure of a confusion network (also called a consensus network, or a sausage, or a linear grammar).

A confusion network is a strict DAG (no cycles or self loops) with a single start node and a single end node. The essential structural property is that for each node in the graph, u, all the arcs that start from u end on a single node, v, that is distinct from u.

>>> FrozenGraph(GraphTables((tuple(xrange(3)), (0,0,1,1,1,1), (1,1,2,2,2,2), tuple(xrange(6))))).is_confusion_network
True
>>> print FrozenGraph(GraphTables((tuple(xrange(3)), (0,0,1,1,1,1), (1,2,2,2,2,2), tuple(xrange(6))))).is_confusion_network
False

is_connected¶: Return True if the nodes of the graph are weakly connected, False if there are disconnected subsets of nodes.

is_lattice¶: Return True if the graph has no cycles and has a single start node and a single end node.

is_strict_dag¶: Return True if the graph has no self loops and no cycles.

is_tree¶: True if the graph is a tree, False otherwise. There are many ways to decide if a graph is a tree. For our purposes, a graph is a tree if it has a single startnode and no joins.

is_tree_strict¶: True if the graph is strictly a tree, False otherwise. There are many ways to define a strict a tree. For our purposes, a graph is a strict tree if it has a single startnode, no joins, and no self-loops.

iter_arc_ngrams(ngram_order)¶

Returns a generator that yields information about the arc-ngram-sequences of length ngram_order in the graph.

Each iteration yields a pair of tuples for an ngram in the graph. The first tuple in the pair is the sequence of ngram_order + 1 node labels for the nodes that define the ngram. The second tuple in the pair is the sequence of ngram_order arc labels for the arcs in the ngram.

Raises ValueError if self.is_strict_dag() is not True.

An example showing all ngrams of order 2 or more For readability purposes, we convert the node labels into node indicies.

>>> g = make_random_DAG(1, 15, 2, 4, 2, 600, seed=0)
>>> nodelabels, arcstartnodes, arcendnodes, arclabels = g.graphseqs
>>> g = FrozenGraph(GraphTables((tuple(xrange(len(nodelabels))), arcstartnodes, arcendnodes, arclabels)))

>>> for ngram in g.iter_arc_ngrams(2): print ngram
((0, 1, 6), (0, 6))
((0, 1, 6), (1, 6))
((0, 3, 4), (3, 4))
((1, 6, 9), (6, 10))
((3, 4, 5), (4, 5))
((3, 4, 7), (4, 7))
((3, 4, 8), (4, 8))
((4, 5, 9), (5, 9))
>>> for ngram in g.iter_arc_ngrams(3): print ngram
((0, 1, 6, 9), (0, 6, 10))
((0, 1, 6, 9), (1, 6, 10))
((0, 3, 4, 5), (3, 4, 5))
((0, 3, 4, 7), (3, 4, 7))
((0, 3, 4, 8), (3, 4, 8))
((3, 4, 5, 9), (4, 5, 9))
>>> for ngram in g.iter_arc_ngrams(4): print ngram
((0, 3, 4, 5, 9), (3, 4, 5, 9))
>>> for ngram in g.iter_arc_ngrams(5): print ngram

iter_nodes(reverse_order=False)¶: Returns a generator over the nodes in the graph. The generator yields a three-item tuple: the first item is the label of the node, the second item is a generator over the in-arcs of the node, and the third item is a generator over the out-arcs of the node. Each of these arc generators yields a pair (arc_label, node_label) where the node label is the label of the node at the other end of the arc relative to the node currently being yielded. When a topological ordering of nodes is possible, the nodes will be iterated in forward topological order by default, and in reverse topological order if reverse_order is True. If there is no topological ordering because the graph has cycles, then the order of iteration is undefined.

iter_nodes_by_id(reverse_order=False)¶

Returns a generator over the nodes in the graph. The generator yields a three-item tuple: the first item is the id of the node, the second item is a generator over the in-arcs of the node, and the third item is a generator over the out-arcs of the node. Each of these arc generators yields a pair (arc_id, node_id) where the node id is the id of the node at the other end of the arc relative to the node currently being yielded. When a topological ordering of nodes is possible, the nodes will be iterated in forward topological order by default, and in reverse topological order if reverse_order is True. If there is no topological ordering because the graph has cycles, then the order of iteration is undefined.

>>> fg0 = FrozenGraph(GraphTables(((10, 11, 12, 13, 14), (0, 0, 1, 2, 3), (1, 2, 3, 3, 4), ('A', 'B', 'C', 'D', 'E'))))
>>> for node_id, in_arc_iter, out_arc_iter in fg0.iter_nodes_by_id():
...     print('Node with id %d and label %s' % (node_id, fg0.get_node_label(node_id)))
...     for arc_id, pred_node_id in in_arc_iter:
...         print('In-arc from node %d with id %d and label %s' % (pred_node_id, arc_id, fg0.get_arc_label(arc_id)))
...     for arc_id, succ_node_id in out_arc_iter:
...         print('Out-arc to node %d with id %d and label %s' % (succ_node_id, arc_id, fg0.get_arc_label(arc_id)))
Node with id 0 and label 10
Out-arc to node 1 with id 0 and label A
Out-arc to node 2 with id 1 and label B
Node with id 1 and label 11
In-arc from node 0 with id 0 and label A
Out-arc to node 3 with id 2 and label C
Node with id 2 and label 12
In-arc from node 0 with id 1 and label B
Out-arc to node 3 with id 3 and label D
Node with id 3 and label 13
In-arc from node 1 with id 2 and label C
In-arc from node 2 with id 3 and label D
Out-arc to node 4 with id 4 and label E
Node with id 4 and label 14
In-arc from node 3 with id 4 and label E

iterpaths(cost_func, start_node, end_node, _optimize=True)¶

Returns a generator that yields paths between start_node and end_node based on the summed cost of the arcs on the path. The cost_func will be called with a single argument, an arc label, and it must return the cost for the arc.

See also get_one_best() for use when is_lattice is True and you just need a single best-path through the lattice.

The generator yields a three-item tuple, (cost, arc_ids, self). The first item is the cost for the path, the second item is a tuple of arc-ids for the path, the third item is the graph object itself. The costs are non-decreasing for successive yields. See get_arc() for getting the information associated with an arc_id.

It is an error to call this if the graph contains cycles. See strict_is_cyclic().

Because this DAG is not a lattice, this graph exercises one internal optimization

>>> dag = make_random_DAG(1, 15, 2, 4, 2, 600, seed=0)
>>> for cost, arcs, graph in dag.iterpaths(lambda x: 1, 0, 6): print cost, arcs
2 (0, 6)
2 (1, 6)
>>> dag._last_iterpaths_count
4
>>> for cost, arcs, graph in dag.iterpaths(lambda x: 1, 0, 6, _optimize=False): print cost, arcs
2 (0, 6)
2 (1, 6)
>>> dag._last_iterpaths_count
11

iterpaths_rea(arc_cost_func)¶

Returns a generator that yields paths between the start nodes and end nodes of a DAG. The arc_cost_func will be called with a single argument, an arc label, and it must return the cost of the arc.

The generator yields a three-item tuple, (cost, arc_ids, self). The first item is the cost for the path, the second item is a tuple of arc-ids for the path, the third item is the graph object itself. The costs are non-decreasing for successive yields. See get_arc() for getting the information associated with an arc_id.

It is an error to call this if the graph contains non-positve cycles.

>>> def cost_func(label): return label
>>> dag = make_random_DAG(15, 15, 3, 3, 5, 300, seed=0)
>>> dag.startnodes, dag.endnodes
((0, 1, 2, 3, 4), (4, 5, 8, 12, 13, 14, 19, 20, 22, 29, 30, 31, 32, 33, 34))
>>> [x for x in dag.iterpaths_rea(cost_func)]
Traceback (most recent call last):
  ...
ValueError: FrozenGraph must not have nodes that have no input or output arcs. Encountered empty nodes set([4])

>>> dag = make_random_DAG(5, 15, 5, 3, 5, 400, seed=0)
>>> dag.startnodes
(0, 1)
>>> dag.endnodes
(7, 11, 16, 18, 19, 22, 25, 42, 50, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74)
>>> for num_choice, (cost, arc_seq, _) in enumerate(dag.iterpaths_rea(cost_func)):
...     print 'Choice %d  Cost %d  Arcs %s' %(num_choice, cost, arc_seq)
...     if num_choice > 10: break
Choice 0  Cost 5  Arcs (5,)
Choice 1  Cost 15  Arcs (4, 11)
Choice 2  Cost 20  Arcs (20,)
Choice 3  Cost 25  Arcs (6, 19)
Choice 4  Cost 25  Arcs (4, 21)
Choice 5  Cost 26  Arcs (7, 19)
Choice 6  Cost 28  Arcs (12, 16)
Choice 7  Cost 30  Arcs (4, 26)
Choice 8  Cost 31  Arcs (4, 8, 19)
Choice 9  Cost 33  Arcs (3, 30)
Choice 10  Cost 46  Arcs (15, 31)
Choice 11  Cost 55  Arcs (1, 22, 32)

>>> #dag.dot_display(globals=['rankdir=LR'])
>>> ssdag = make_random_dag(50, 3, 10, 400, node_labeler=2, seed=1)
>>> ssdag.num_nodes, ssdag.num_arcs
(47, 54)
>>> ssdag.endnodes
(7, 10, 11, 12, 19, 21, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)

>>> #ssdag.dot_display(globals=['rankdir=LR'])

Make the cost just be a modulus plus one, get the one-best

>>> for num_choice, (cost, arc_seq, _) in enumerate(ssdag.iterpaths_rea(cost_func)):
...     print 'Choice %d  Cost %d  Arcs %s' %(num_choice, cost, arc_seq)
...     label_seq = tuple(ssdag.get_arc_label(arc_id) for arc_id in arc_seq)
...     # uncomment to get lattice and path displayed.
...     #def alc(label, *_): return (label, label)
...     #def aac(label):
...     #    yield 'style=bold' if label in label_seq else 'style=dotted'
...     #_ = ssdag.dot_display(globals=['rankdir=LR', 'label="Choice %d"' %(num_choice), 'labelloc=top', 'fontsize=24'], arc_label_callback=alc, arc_attributes_callback=aac)
Choice 0  Cost 7  Arcs (0, 1, 6)
Choice 1  Cost 9  Arcs (0, 2, 7)
Choice 2  Cost 11  Arcs (0, 11)
Choice 3  Cost 17  Arcs (0, 5, 12)
Choice 4  Cost 23  Arcs (0, 2, 3, 8, 10)
Choice 5  Cost 32  Arcs (0, 2, 3, 8, 19)
Choice 6  Cost 37  Arcs (4, 13, 20)
Choice 7  Cost 67  Arcs (0, 2, 3, 8, 14, 18, 22)
Choice 8  Cost 81  Arcs (4, 13, 23, 41)
Choice 9  Cost 89  Arcs (0, 2, 3, 17, 25, 42)
Choice 10  Cost 90  Arcs (0, 2, 3, 17, 25, 43)
Choice 11  Cost 99  Arcs (0, 5, 9, 15, 31, 39)
Choice 12  Cost 101  Arcs (0, 2, 3, 17, 30, 49)
Choice 13  Cost 106  Arcs (0, 5, 9, 15, 31, 46)
Choice 14  Cost 110  Arcs (0, 5, 9, 15, 16, 27, 38)
Choice 15  Cost 111  Arcs (0, 5, 9, 15, 31, 51)
Choice 16  Cost 114  Arcs (4, 13, 23, 34, 40)
Choice 17  Cost 117  Arcs (0, 2, 3, 17, 25, 26, 44)
Choice 18  Cost 121  Arcs (4, 13, 23, 34, 47)
Choice 19  Cost 122  Arcs (4, 13, 23, 34, 48)
Choice 20  Cost 130  Arcs (4, 13, 28, 35, 50)
Choice 21  Cost 132  Arcs (4, 13, 28, 35, 52)
Choice 22  Cost 133  Arcs (4, 13, 28, 35, 53)
Choice 23  Cost 138  Arcs (0, 2, 3, 17, 30, 36, 50)
Choice 24  Cost 140  Arcs (0, 2, 3, 17, 30, 36, 52)
Choice 25  Cost 141  Arcs (0, 2, 3, 17, 30, 36, 53)
Choice 26  Cost 143  Arcs (0, 5, 9, 15, 16, 27, 32, 39)
Choice 27  Cost 150  Arcs (0, 5, 9, 15, 16, 27, 32, 46)
Choice 28  Cost 151  Arcs (0, 2, 3, 17, 25, 26, 33, 45)
Choice 29  Cost 155  Arcs (0, 5, 9, 15, 16, 27, 32, 51)
Choice 30  Cost 193  Arcs (0, 2, 3, 17, 25, 26, 33, 37, 50)
Choice 31  Cost 195  Arcs (0, 2, 3, 17, 25, 26, 33, 37, 52)
Choice 32  Cost 196  Arcs (0, 2, 3, 17, 25, 26, 33, 37, 53)
Choice 33  Cost 204  Arcs (0, 5, 9, 15, 16, 21, 24, 29, 35, 50)
Choice 34  Cost 206  Arcs (0, 5, 9, 15, 16, 21, 24, 29, 35, 52)
Choice 35  Cost 207  Arcs (0, 5, 9, 15, 16, 21, 24, 29, 35, 53)

>>> def cost_func1(label): return label % 4 + 1
>>> def countem(lat):
...   #return lat.num_nodes, lat.num_arcs
...   min_cost, max_cost = sys.maxint, 0
...   min_len, max_len = sys.maxint, 0
...   prev_cost = 0
...   for count, (cost, arcs, _) in enumerate(lat.iterpaths_rea(cost_func1)):
...     assert cost >= prev_cost
...     prev_cost = cost
...     if cost < min_cost: min_cost = cost
...     if cost > max_cost: max_cost = cost
...     if len(arcs) < min_len: min_len = len(arcs)
...     if len(arcs) > max_len: max_len = len(arcs)
...   return min_cost, max_cost, min_len, max_len, count

>>> lat = make_random_lattice(55, 2, 3, 750, node_labeler=1, seed=13)
>>> #lat.dot_display(globals=['rankdir=LR'])
>>> countem(lat)
(32, 57, 14, 22, 1535)
>>> countem(make_random_lattice(150, 2, 10, 900, node_labeler=2, seed=0))
(31, 72, 13, 24, 3926)
>>> countem(make_random_lattice(100, 2, 4, 600, node_labeler=2, seed=0))
(43, 85, 18, 32, 8779)
>>> countem(make_random_lattice(100, 3, 5, 500, node_labeler=2, seed=0))
(33, 77, 15, 28, 47151)
>>> countem(make_random_lattice(40, 7, 10, 800, node_labeler=4, seed=0))
(10, 45, 5, 17, 2994)
>>> countem(make_random_lattice(50, 2, 15, 1000, node_labeler=2, seed=1))
(14, 21, 6, 9, 3)
>>> countem(make_random_lattice(50, 2, 15, 1000, seed=1))
(13, 23, 7, 10, 7)
>>> countem(make_random_lattice(50, 3, 10, 400, node_labeler=2, seed=1))
(13, 24, 6, 11, 3)
>>> countem(make_random_lattice(60, 2, 3, 700, node_labeler=2, seed=0))
(33, 65, 16, 24, 4057)
>>> countem(make_random_lattice(60, 2, 3, 700, node_labeler=2, seed=0))
(33, 65, 16, 24, 4057)

>>> countem(make_random_lattice(500, 3, 10, 700, node_labeler=4, seed=1)) 
>>> countem(make_random_lattice(250, 6, 25, 600, node_labeler=50, seed=0)) 
>>> countem(make_random_lattice(500, 4, 7, 700, node_labeler=8, seed=1)) 
>>> countem(make_random_lattice(500, 8, 8, 800, node_labeler=6, seed=1)) 

>>> lat = FrozenGraph(GraphTables(((16, 17, 10, 11, 12, 14, 22, 4, 5, 6, 0, 23, 3),
...                                (0, 2, 4, 0, 7, 3, 9, 10, 3, 8, 1, 3, 10, 6, 10, 12, 5),
...                                (1, 3, 5, 6, 8, 4, 2, 8, 6, 9, 6, 5, 7, 11, 12, 7, 0),
...                                (100, 100, 25, 100, 100, 100, 10, 10, 10, 100, 100, 10, 14, 30, 14, 100, 100))))
>>> #lat.dot_display(globals=['rankdir=LR'])
>>> for num_choice, (cost, arc_seq, _) in enumerate(lat.iterpaths_rea(cost_func1)):
...     print 'Choice %d  Cost %d  Arcs %s' %(num_choice, cost, arc_seq)
...     label_seq = tuple(lat.get_arc_label(arc_id) for arc_id in arc_seq)
Choice 0  Cost 14  Arcs (7, 9, 6, 1, 8, 13)
Choice 1  Cost 15  Arcs (12, 4, 9, 6, 1, 8, 13)
Choice 2  Cost 16  Arcs (7, 9, 6, 1, 11, 16, 3, 13)
Choice 3  Cost 16  Arcs (7, 9, 6, 1, 5, 2, 16, 3, 13)
Choice 4  Cost 16  Arcs (14, 15, 4, 9, 6, 1, 8, 13)
Choice 5  Cost 17  Arcs (12, 4, 9, 6, 1, 5, 2, 16, 3, 13)
Choice 6  Cost 17  Arcs (12, 4, 9, 6, 1, 11, 16, 3, 13)
Choice 7  Cost 17  Arcs (7, 9, 6, 1, 11, 16, 0, 10, 13)
Choice 8  Cost 17  Arcs (7, 9, 6, 1, 5, 2, 16, 0, 10, 13)
Choice 9  Cost 18  Arcs (14, 15, 4, 9, 6, 1, 5, 2, 16, 3, 13)
Choice 10  Cost 18  Arcs (14, 15, 4, 9, 6, 1, 11, 16, 3, 13)
Choice 11  Cost 18  Arcs (12, 4, 9, 6, 1, 5, 2, 16, 0, 10, 13)
Choice 12  Cost 18  Arcs (12, 4, 9, 6, 1, 11, 16, 0, 10, 13)
Choice 13  Cost 19  Arcs (14, 15, 4, 9, 6, 1, 5, 2, 16, 0, 10, 13)
Choice 14  Cost 19  Arcs (14, 15, 4, 9, 6, 1, 11, 16, 0, 10, 13)

node_labels_are_node_ids¶: Return True if each node label is the index of the node.

num_arcs¶: The number of arcs in the graph.

num_endnodes¶: Number of end nodes in the graph.

num_nodes¶: The number of nodes in the graph.

num_startnodes¶: Number of start nodes in the graph.

startnodes¶: A tuple of the indices of the starting-node ids. These are nodes with no incoming arcs.

strict_is_cyclic¶: Return True if the graph contains one or more cycle of any kind, False if the graph is strictly acyclic (strictly a DAG). For the purposes of this function, a multinode cycle or a self loop constitutes cyclicity. See also has_self_loop() and has_cycle().

terminals¶: A pair of tuples, the startnodes and the endnodes.

text_iter()¶: Returns a generator that yields lines of text, including newlines. The text represents the graph in a human-readable, serializable form.

verify()¶

class onyx.graph.graphtools.FrozenGraphArcNodeStructure(num_nodes=None, arc_data=None)¶

Bases: onyx.graph.graphtools.GraphArcNodeStructure

static create(num_nodes, arc_data)¶

end_nodes¶: A sorted tuple of the of end nodes in the graph. A end node is a node that has no arc leading from it to another node.

has_self_loop¶: True if there is any node in the graph with an arc to itself, False if there is no node in the graph with an arc to itself.

iter_arc_ids¶: An iterator that enumerates valid arc identifiers in the graph. It’s like xrange(graph.num_arcs), except that for mutable subclasses, the iterator correctly iterates over identifiers for arcs that are added while the iteration is active.

iter_arcs¶: An iterator that enumerates the arcs in the graph in their arc identifier order. It yields a pair, (start_node_id, end_node_id), for each arc in the graph. For mutable subclasses, the iterator correctly iterates over arcs that are added while the iteration is active.

iter_node_ids¶: An iterator that enumerates valid node identifiers in the graph. It’s like xrange(graph.num_nodes), except that for mutable subclasses, the iterator correctly iterates over identifiers for nodes that are added while the iteration is active.

num_arcs¶: The number of arcs in the graph.

num_end_nodes¶: Number of end nodes in the graph. A end node is a node that has no arc leading from it to another node.

num_nodes¶: The number of nodes in the graph.

num_start_nodes¶: Number of start nodes in the graph. A start node is a node that has no arc from another node ending on it.

self_loops¶: A sorted tuple of the nodes in the graph, each of which has one or more arcs to itself.

start_nodes¶: A sorted tuple of the start nodes in the graph. A start node is a node that has no arc from another node ending on it.

verify()¶

class onyx.graph.graphtools.FrozenSubgraphs(subgraphs)¶

Bases: onyx.graph.graphtools._SubgraphsGetter

An immutable collection of subgraph specifiers.

>>> sb = SubgraphsBuilder()
>>> sb.add_subgraph(xrange(10), xrange(2), xrange(3, 5))
0
>>> sb.add_subgraph((47, 49, 51, 55), (47,), ())
1
>>> FrozenSubgraphs(sb).get_subgraph(0)
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1), (3, 4))

get_subgraph(index)¶

Returns a triple of tuples: (nodes, starts, ends) representing the subgraph at index.

Raises ValueError if index is not a valid indentifier for a subgraph.

class onyx.graph.graphtools.GraphArcNodeStructure¶

Bases: onyx.graph.graphtools.GraphArcStructure

A baseclass for graphical objects. In addition to being a GraphArcStructure it provides property access to derived information about distinguished nodes in the graph.

Subclasses must initialize self._node_data with three sets: a set of the identifiers for nodes that have self loops, a set of the identifiers for the starting nodes, and a set of the identifiers for the ending nodes of the graph. If the subclass is mutable, it must maintain these three sets.

static create(num_nodes, arc_data)¶

end_nodes¶: A sorted tuple of the of end nodes in the graph. A end node is a node that has no arc leading from it to another node.

has_self_loop¶: True if there is any node in the graph with an arc to itself, False if there is no node in the graph with an arc to itself.

iter_arc_ids¶: An iterator that enumerates valid arc identifiers in the graph. It’s like xrange(graph.num_arcs), except that for mutable subclasses, the iterator correctly iterates over identifiers for arcs that are added while the iteration is active.

iter_arcs¶: An iterator that enumerates the arcs in the graph in their arc identifier order. It yields a pair, (start_node_id, end_node_id), for each arc in the graph. For mutable subclasses, the iterator correctly iterates over arcs that are added while the iteration is active.

iter_node_ids¶: An iterator that enumerates valid node identifiers in the graph. It’s like xrange(graph.num_nodes), except that for mutable subclasses, the iterator correctly iterates over identifiers for nodes that are added while the iteration is active.

num_arcs¶: The number of arcs in the graph.

num_end_nodes¶: Number of end nodes in the graph. A end node is a node that has no arc leading from it to another node.

num_nodes¶: The number of nodes in the graph.

num_start_nodes¶: Number of start nodes in the graph. A start node is a node that has no arc from another node ending on it.

self_loops¶: A sorted tuple of the nodes in the graph, each of which has one or more arcs to itself.

start_nodes¶: A sorted tuple of the start nodes in the graph. A start node is a node that has no arc from another node ending on it.

verify()¶: Verify the object’s invariants.

class onyx.graph.graphtools.GraphArcStructure¶

Bases: object

A baseclass for graphical objects. It provides property access to the lowest-level structural information about the graph.

Subclasses must initialize self._num_nodes with the number of nodes, and self._arc_data with a pair of sequences representing the starting and ending nodes of the arcs in the graph. This baseclass is agnostic as to the types of the sequences. However, if the subclass is mutable, e.g. the sequences are lists, it must maintain the invariants involving self._num_nodes and self._arc_data that the private _verify method successfully checks the invariants.

static create(num_nodes=None, arc_data=None)¶

iter_arc_ids¶: An iterator that enumerates valid arc identifiers in the graph. It’s like xrange(graph.num_arcs), except that for mutable subclasses, the iterator correctly iterates over identifiers for arcs that are added while the iteration is active.

iter_arcs¶: An iterator that enumerates the arcs in the graph in their arc identifier order. It yields a pair, (start_node_id, end_node_id), for each arc in the graph. For mutable subclasses, the iterator correctly iterates over arcs that are added while the iteration is active.

iter_node_ids¶: An iterator that enumerates valid node identifiers in the graph. It’s like xrange(graph.num_nodes), except that for mutable subclasses, the iterator correctly iterates over identifiers for nodes that are added while the iteration is active.

num_arcs¶: The number of arcs in the graph.

num_nodes¶: The number of nodes in the graph.

verify()¶: Verify the object’s invariants.

class onyx.graph.graphtools.GraphBase(seqs)¶

Bases: object

Base class for graph work, contains utility and verification mixins.