Onyx logo

Previous topic

onyx.graph.decodergraph – A graph builder that can decode while it’s building.

Next topic

onyx.graph.graphsandbox – Sandbox where new graph stuff is developed.

This Page

onyx.graph.dynamicgraph – Tools for working with dynamic, or online, graphs.

These graphs have well-defined semantics that they maintain as edges are added and removed from the graph.

At present, UndirectedOnlineInvariantsGraph is the only such dynamic graph.

class onyx.graph.dynamicgraph.PointProcessWindow(length, add, remove)

Bases: object

Obsolete: see from onyx.util import pointprocess, pointprocess.PointProcessSamplingWindow

FIFO window based on timestamps.

>>> secs_per_minute = 60
>>> secs_per_hour = secs_per_minute * 60
>>> secs_per_day = secs_per_hour * 24
>>> secs_per_week = secs_per_day * 7
>>> secs_per_year = secs_per_day * 365.2425
>>> g = UndirectedOnlineInvariantsGraph()
>>> w = PointProcessWindow(secs_per_week, g.add_edge, g.remove_edge)
>>> w.length
604800

Get the invariants at which each of the features is first maximal

>>> maxen = collections.defaultdict(lambda:collections.defaultdict(int))
>>> fields = 'employeeIndexDict[From]', 'employeeIndexDict[To]', 'Epoch', 'Topic'
>>> with open(os.path.join(_module_dir, 'enron_subset.csv'), 'rb') as infile:
...   for msg_index, (msg_from, msg_to, msg_epoch, msg_topic) in enumerate(iterutils.csv_itemgetter(infile, fields)):
...     w.add((msg_from, msg_to), msg_epoch)
...     features = g.invariants
...     for key, value in features.iteritems():
...       if value > maxen[key][key]:
...          assert features['scan1'] == g._scan1_brute
...          features2 = dict(features)
...          features2['msg_index'] = msg_index
...          features2['msg_day'] = int((msg_epoch - w.stats.min_timestamp) / secs_per_day)
...          maxen[key] = features2
>>> g.invariants
{'mad_lower_k': 5958, 'max_degree': 24, 'num_triangles': 157, 'order': 114, 'scan1': 50, 'size': 247}
>>> stats = w.stats
>>> duration_sec = int(stats.max_timestamp - stats.min_timestamp)
>>> duration_day = duration_sec // secs_per_day
>>> duration_year = int(duration_sec / secs_per_year)
>>> print 'duration_year', duration_year, ' duration_day', duration_day, ' duration_sec', duration_sec
duration_year 0  duration_day 9  duration_sec 850974
>>> print 'num_added', stats.num_added, ' num_removed', stats.num_removed
num_added 2500  num_removed 748
>>> print 'length', stats.length, ' max_queued', stats.max_queued
length 604800  max_queued 2330
>>> for key in sorted(maxen): print key, maxen[key][key], ' ', maxen[key]
mad_lower_k 6690   {'num_triangles': 177, 'msg_day': 6, 'scan1': 91, 'mad_lower_k': 6690, 'size': 283, 'order': 123, 'msg_index': 1941, 'max_degree': 52}
max_degree 52   {'num_triangles': 103, 'msg_day': 4, 'scan1': 81, 'mad_lower_k': 5765, 'size': 213, 'order': 116, 'msg_index': 1329, 'max_degree': 52}
num_triangles 177   {'num_triangles': 177, 'msg_day': 6, 'scan1': 91, 'mad_lower_k': 6690, 'size': 283, 'order': 123, 'msg_index': 1941, 'max_degree': 52}
order 123   {'num_triangles': 162, 'msg_day': 6, 'scan1': 91, 'mad_lower_k': 6160, 'size': 272, 'order': 123, 'msg_index': 1930, 'max_degree': 52}
scan1 91   {'num_triangles': 155, 'msg_day': 6, 'scan1': 91, 'mad_lower_k': 6000, 'size': 260, 'order': 122, 'msg_index': 1907, 'max_degree': 52}
size 283   {'num_triangles': 177, 'msg_day': 6, 'scan1': 91, 'mad_lower_k': 6690, 'size': 283, 'order': 123, 'msg_index': 1941, 'max_degree': 52}
add(obj, timestamp)
length
stats
class onyx.graph.dynamicgraph.UndirectedOnlineInvariantsGraph(source=None)

Bases: object

An undirected graph that maintains a set of invariant features as edges are added and removed.

Create a graph and look at its initial invariant feature set

>>> g = UndirectedOnlineInvariantsGraph()
>>> g.invariants
{'mad_lower_k': 0, 'max_degree': 0, 'num_triangles': 0, 'order': 0, 'scan1': 0, 'size': 0}

Add some edges

>>> for edge in ((0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4), (1, 'x')):
...   _ = g.add_edge(edge)

Look at the features

>>> g.invariants
{'mad_lower_k': 2400, 'max_degree': 3, 'num_triangles': 1, 'order': 6, 'scan1': 4, 'size': 7}

Add more edges, looking at what gets returned: the normalized edge, and the multigraph count

>>> g.add_edge((0, 3))
((0, 3), 1)

The returned count is one, meaning the edge was a new edge, so the set of features has changed. We see that we’ve changed most of the features; we haven’t changed the order because we didn’t add a new node.

>>> g.invariants
{'mad_lower_k': 2800, 'max_degree': 4, 'num_triangles': 3, 'order': 6, 'scan1': 7, 'size': 8}

Add the edge again. Since the return count is greater than one, the edge was not new to the graph, and so none of the features has changed.

>>> g.add_edge((3, 0))
((0, 3), 2)
>>> g.invariants
{'mad_lower_k': 2800, 'max_degree': 4, 'num_triangles': 3, 'order': 6, 'scan1': 7, 'size': 8}

The incidence matrix

>>> incidence, nodes = g.incidence_matrix
>>> incidence
array([[0, 1, 1, 1, 0, 0],
       [1, 0, 0, 1, 0, 1],
       [1, 0, 0, 1, 1, 0],
       [1, 1, 1, 0, 1, 0],
       [0, 0, 1, 1, 0, 0],
       [0, 1, 0, 0, 0, 0]], dtype=int8)
>>> nodes
[0, 1, 2, 3, 4, 'x']
>>> import numpy as np
>>> eigenvalues = np.linalg.eigvalsh(incidence)
>>> eigenvalues[eigenvalues.argmax()] 
2.98093658810182...
>>> eigenvalues.sort()
>>> ref = np.array([ -1.77960786e+00,  -1.53708297e+00,  -7.06200511e-01, -4.93288006e-16,   1.04195475e+00,   2.98093659e+00])
>>> ref.sort()
>>> np.allclose(eigenvalues, ref)
True

Edge containment

>>> (1, 3) in g
True
>>> (3, 1) in g
True
>>> (3, 8) in g
False

Self-loop edge is OK for checking containment, even though it isn’t a valid edge

>>> trial_edge = 3, 3
>>> trial_edge in g
False
>>> g.add_edge(trial_edge)
Traceback (most recent call last):
  ...
ValueError: invalid attempt to use a self loop on node 3

Remove copies of an edge, and see that the features only change when the edge count drops to zero

>>> e = 0, 3
>>> g.add_edge(e); g.invariants
((0, 3), 3)
{'mad_lower_k': 2800, 'max_degree': 4, 'num_triangles': 3, 'order': 6, 'scan1': 7, 'size': 8}
>>> while e in g: g.remove_edge(e); g.invariants
((0, 3), 2)
{'mad_lower_k': 2800, 'max_degree': 4, 'num_triangles': 3, 'order': 6, 'scan1': 7, 'size': 8}
((0, 3), 1)
{'mad_lower_k': 2800, 'max_degree': 4, 'num_triangles': 3, 'order': 6, 'scan1': 7, 'size': 8}
((0, 3), 0)
{'mad_lower_k': 2400, 'max_degree': 3, 'num_triangles': 1, 'order': 6, 'scan1': 4, 'size': 7}
>>> g.remove_edge(e)
Traceback (most recent call last):
  ...
ValueError: invalid attempt to remove a nonexistent edge (0, 3)

Create a copy of the graph and then structurally change the copy and see that its invariant features have changed

>>> g2 = UndirectedOnlineInvariantsGraph(g)
>>> g2.invariants == g.invariants
True
>>> g2.add_edge(('a', 'b'))
(('a', 'b'), 1)
>>> g2.invariants == g.invariants
False
>>> g3 = UndirectedOnlineInvariantsGraph('bogus initializer')
Traceback (most recent call last):
  ...
TypeError: expected an instance of UndirectedOnlineInvariantsGraph, got a str

An edge can be any pair of immutables that support a full ordering. Note that the full-ordering requirement means you can’t use a frozenset as a node.

>>> g.add_edge(((20, 30), 'alphabet'))
(('alphabet', (20, 30)), 1)

Examples of things an edge cannot be

>>> g.add_edge(1)
Traceback (most recent call last):
  ...
TypeError: expected edge to be a container with exactly two nodes, got an instance of int
>>> g.add_edge([1,2,3])
Traceback (most recent call last):
  ...
ValueError: expected edge to have exactly 2 nodes, got 3
>>> g.add_edge(([1,2], set()))
Traceback (most recent call last):
  ...
TypeError: expected both nodes in the edge to be immutable types, got a list and a set
>>> g.add_edge(((1,2), frozenset('abc')))
Traceback (most recent call last):
  ...
TypeError: expected both nodes in the edge to be immutable, ordered types, got a tuple and a frozenset

Look at some internals

>>> g.invariants['scan1'] == g._scan1_online == g._scan1_brute == 4
True
>>> g.invariants['mad_lower_k'] == g._mad_lower_k == g._mad_lower_k_brute == 2400
True
>>> edges, adjacencies, k1_scans = g._containers
>>> for key in sorted(edges.keys()): print key, edges[key]
(0, 1) 1
(0, 2) 1
(1, 3) 1
(1, 'x') 1
(2, 3) 1
(2, 4) 1
(3, 4) 1
('alphabet', (20, 30)) 1
>>> for key in adjacencies.keys(): print repr(key), sorted(adjacencies[key])
0 [1, 2]
1 [0, 3, 'x']
2 [0, 3, 4]
3 [1, 2, 4]
4 [2, 3]
'alphabet' [(20, 30)]
(20, 30) ['alphabet']
'x' [1]

Remove all the edges

>>> edges = (0, 1), (0, 2), (1, 3), (1, 'x'), (2, 3), (2, 4), (3, 4), ((20, 30), 'alphabet')
>>> for edge in edges:
...   _ = g.remove_edge(edge)
>>> g.invariants == UndirectedOnlineInvariantsGraph().invariants == {'mad_lower_k': 0, 'max_degree': 0, 'num_triangles': 0, 'order': 0, 'scan1': 0, 'size': 0}
True

Make a fully connected graph

>>> len(edges)
8
>>> for e in ((e1, e2) for e1 in edges for e2 in edges if e1 != e2):
...   _ = g.add_edge(e), g.invariants
>>> g.invariants
{'mad_lower_k': 7000, 'max_degree': 7, 'num_triangles': 56, 'order': 8, 'scan1': 28, 'size': 28}
add_edge(edge)

Add the edge to the graph, where edge is a pair of immutables that satisfy a full-ordering relation.

Returns a pair, the normlized edge and the updated multigraph count for the edge. The count will be one if the edge is new to the graph. In this case, the stats for the invariant features will have been updated. If the count is greater than one, then the edge was previously in the graph and the add_edge() call has not altered the invariant features of the graph.

get_invariants()

Method to get the invariants. See invariants property.

incidence_matrix

Property returns a two-item tuple. The first item is a two-dimensional Numpy array of zeros and ones, an incidence matrix for the graph. The second item is a sorted sequence of the nodes of the graph, where the index of each node corresponds to the row and column for that node in the incidence matrix.

invariants

This property returns a dict containing the current values for the observable invariant features of the graph.

remove_edge(edge)

Decrement by one the multigraph count for the edge in the graph. Raises ValueError if the edge is not in the graph.

Returns a pair, the normlized edge and the updated multigraph count for the edge. The count will be zero if the edge itself has been removed from the graph. In this case, the features will have been updated. Otherwise, the count will be positive and the remove_edge() call will not have altered the invariants of the graph.

onyx.graph.dynamicgraph.make_UndirectedOnlineInvariantsGraph(iterable)

Create an UndirectedOnlineInvariantsGraph and initialize it with items from iterable. Each item is an edge, represented as a pair of distinct, ordered invariants specifying the nodes upon which the edge is incident.

Here’s a simple chain of nodes

>>> g1 = make_UndirectedOnlineInvariantsGraph(itertools.izip(xrange(0, 5), xrange(1, 6)))
>>> g1.invariants
{'mad_lower_k': 1667, 'max_degree': 2, 'num_triangles': 0, 'order': 6, 'scan1': 2, 'size': 5}
>>> incidence, map = g1.incidence_matrix
>>> incidence
array([[0, 1, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 0]], dtype=int8)

Here’s an Erdos-Renyi random graph

>>> g2 = make_UndirectedOnlineInvariantsGraph(randomgraph.make_ErdosRenyi(7, 0.25, seed=1).T)
>>> g2.invariants
{'mad_lower_k': 2400, 'max_degree': 3, 'num_triangles': 1, 'order': 7, 'scan1': 4, 'size': 8}
>>> incidence2, _ = g2.incidence_matrix
>>> incidence2
array([[0, 0, 0, 1, 0, 1, 1],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 1],
       [1, 0, 0, 0, 1, 0, 0],
       [1, 0, 1, 0, 1, 0, 0]], dtype=int8)