onyx.util.iterutils – The objects in this module provide operations on iterable objects.

class onyx.util.iterutils.SaveLast(iterable)

Bases: object

Given iterable, construct a new iterable that yields the same items as iterable. The new iterable has a single attribute, last, which is the most recent item yielded by the new iterable.

>>> saver = SaveLast(xrange(10))
>>> tuple(saver)
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> saver.last
9
>>> saver = SaveLast(xrange(20))
>>> all(saver.last == x for x in saver)
True
last
next()
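The behaviour described above can be approximated in a few lines. The following is a minimal Python 3 sketch, not the Onyx implementation; the class name SaveLastSketch is hypothetical, and the initial value of last (None here) is an assumption, since the documentation does not say what last holds before the first item is yielded.

```python
class SaveLastSketch:
    """Hypothetical stand-in for SaveLast: iterate and remember the latest item."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        # Assumption: 'last' is None until the first item has been yielded.
        self.last = None

    def __iter__(self):
        return self

    def __next__(self):
        # Store the item before handing it out, so 'last' is already
        # up to date when the caller receives the item.
        self.last = next(self._it)
        return self.last


saver = SaveLastSketch(range(10))
items = tuple(saver)
final = saver.last
```

Because last is assigned before the item is returned, a loop body that reads saver.last always sees the item it is currently processing.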
onyx.util.iterutils.csv_header_check(csv_iterable, items, search=False)

Given csv_iterable, an iterable over lines of CSV-formatted data, ensure that the fields in the header line are a superset of the fields in items.

Return a generator that yields the header line and then the remaining lines from csv_iterable. This generator is suitable for use as the csvfile iterable to the CSV-file-reading callables in Python’s csv module.

Optional search defaults to False, which means that the first line in csv_iterable must be the header line that names all the fields of the CSV file. The set of all fields from the header line must be a superset of items. ValueError will be raised if the set of header fields is not a superset of items.

If optional search is True, then lines from csv_iterable are searched until a line is found whose set of header fields is a superset of items. ValueError will be raised if no such line is found.

>>> csv_file_str = '''
... # non-header line in csv file
... another such line, just for fun, followed by blank
...
... here comes the desired header
... name,date,amount
... foo,yesterday,+20
... bar,today,30
... baz,tomorrow,-50
... bat,tomorrow etc.,-50.0
... 0,0,0
... '''
>>> items = 'amount', 'name'
>>> for line in csv_header_check(cStringIO.StringIO(csv_file_str), items, search=True): print line,
name,date,amount
foo,yesterday,+20
bar,today,30
baz,tomorrow,-50
bat,tomorrow etc.,-50.0
0,0,0

Error on missing field

>>> items_not = 'amount', 'name', 'address'
>>> csv_header_check(cStringIO.StringIO(csv_file_str), items_not)
Traceback (most recent call last):
  ...
ValueError: no CSV-header-like line found, expecting something like: "amount","name","address"

Error on empty csv_iterable

>>> csv_header_check(EMPTY_ITER, items)
Traceback (most recent call last):
  ...
ValueError: no CSV-header-like line found, expecting something like: "amount","name"
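The header check and the hand-off to the csv module can be sketched as follows. This is a minimal Python 3 sketch under stated assumptions, not the Onyx source: the name header_check_sketch and the error messages are hypothetical. Like the documented function, it validates the header when it is called and only then returns a generator.

```python
import csv
import io


def header_check_sketch(lines, items, search=False):
    """Hypothetical sketch: validate a CSV header, then replay header and data lines."""
    wanted = set(items)
    lines = iter(lines)
    for line in lines:
        # Parse this one candidate line with the csv module to get its fields.
        fields = next(csv.reader([line]), [])
        if wanted <= set(fields):
            break
        if not search:
            # Without search, only the first line may be the header.
            raise ValueError("first line is not a suitable header")
    else:
        # Input was empty, or the search ran out of lines.
        raise ValueError("no header line found")

    def replay():
        yield line        # the header line that passed the check
        yield from lines  # everything after it, untouched

    return replay()


text = "# comment\n\nname,date,amount\nfoo,yesterday,+20\n"
checked = header_check_sketch(io.StringIO(text), ("amount", "name"), search=True)
rows = list(csv.DictReader(checked))
```

The returned generator starts at the header line, so it can be passed directly to csv.reader or csv.DictReader.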
onyx.util.iterutils.csv_itemgetter(csv_iterable, items, search=False)

Return a generator that yields a tuple of the selected items from each record of csv_iterable, which must yield strings in CSV format. As the examples below show, values that parse as numbers are returned as ints or floats.

Optional search defaults to False which means that the first line in csv_iterable must be a header line that names all the fields of the CSV file, and the set of fields must be a superset of items. If search is True, then lines from csv_iterable are searched until such a line is found.

ValueError will be raised if the header fields are not a superset of items.

Typical CSV file with first line having header fields

>>> csv_file1 = '''name,date,amount
... foo,yesterday,+20
... bar,today,30
... baz,tomorrow,-50
... bat,tomorrow etc.,-50.0
... 0,0,0
... '''

The same CSV file, but with non-table lines preceding the actual header line

>>> csv_file2 = '''
... # blank lines, comment lines, and other non-header lines, preceding the header and data
...
... a near miss header:
... amount,address,date
... here comes the actual header:
... name,date,amount
... foo,yesterday,+20
... bar,today,30
... baz,tomorrow,-50
... bat,tomorrow etc.,-50.0
... 0,0,0
... '''

The subset of headers that matter to us

>>> items = 'amount', 'name'

Usage on a typical file

>>> tuple(csv_itemgetter(cStringIO.StringIO(csv_file1), items))
((20, 'foo'), (30, 'bar'), (-50, 'baz'), (-50.0, 'bat'), (0, 0))

Use the file with non-table lines.

ValueError because the first line isn’t a header

>>> tuple(csv_itemgetter(cStringIO.StringIO(csv_file2), items))
Traceback (most recent call last):
  ...
ValueError: no CSV-header-like line found, expecting something like: "amount","name"

Allow a search for the header line

>>> tuple(csv_itemgetter(cStringIO.StringIO(csv_file2), items, search=True))
((20, 'foo'), (30, 'bar'), (-50, 'baz'), (-50.0, 'bat'), (0, 0))
class onyx.util.iterutils.iappend(iterable)

Bases: object

Returns an iterable object that yields items from iterable. The iterator will have methods append() and extend() which can be used to add items one at a time or as a group to the end of the iteration. Note that it is easy to use this to create unbounded iterations.

>>> x = iappend(xrange(1, 4))
>>> for i in x:
...   print i
...   if 1 <= i < 4:
...     x.append(10*i)
...     x.extend([None]*i)
...   else:
...      for i in x: print i
1
2
3
10
None
20
None
None
30
None
None
None
append(item)
extend(iterable)
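One way to get this behaviour is to drain the original iterable first and then serve items from a queue of appended ones. The following is a minimal Python 3 sketch, not the Onyx implementation; the class name AppendableIter is hypothetical, and the ordering (original items before appended items) is an assumption inferred from the example output above.

```python
from collections import deque


class AppendableIter:
    """Hypothetical stand-in for iappend: an iterator whose tail can grow."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._extra = deque()  # items added through append() and extend()

    def __iter__(self):
        return self

    def __next__(self):
        # Assumption: original items come first, appended items afterwards.
        try:
            return next(self._source)
        except StopIteration:
            pass
        if self._extra:
            return self._extra.popleft()
        raise StopIteration

    def append(self, item):
        self._extra.append(item)

    def extend(self, iterable):
        self._extra.extend(iterable)


x = AppendableIter([1, 2, 3])
seen = []
for i in x:
    seen.append(i)
    if i < 4:
        x.append(10 * i)  # only the original items trigger an append
```

The guard on i is what keeps this loop finite; appending on every item would never terminate.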
onyx.util.iterutils.idelay(iterable, taps=2)

Given iterable, return a new iterable that yields tuples, each containing taps successive items from iterable. The tuples overlap, such that the first such tuple starts with the first item from iterable, the second tuple starts with the second item from iterable, etc. Non-negative integer taps defaults to 2. For an iterable that yields N items, idelay() will yield N - taps + 1 tuples of taps items each.

This function is useful when you need to do (typically pairwise) work on successive members of a sequence.

Raises TypeError if iterable is not an iterable object, or ValueError if taps is not a non-negative integer.

>>> for x in idelay(xrange(5)): print x
(0, 1)
(1, 2)
(2, 3)
(3, 4)
>>> for x, y, z in idelay(xrange(6), 3): print x, y, z
0 1 2
1 2 3
2 3 4
3 4 5

Edge cases

>>> for x in idelay(xrange(3), 1.): print x
(0,)
(1,)
(2,)

Using taps=0 gives N + 1 empty tuples

>>> for x in idelay(xrange(3), 0): print x
()
()
()
()

Errors

>>> idelay(None)
Traceback (most recent call last):
  ...
TypeError: 'NoneType' object is not iterable
>>> idelay(xrange(3), -1)
Traceback (most recent call last):
  ...
ValueError: expected taps to be a non-negative number, got -1
>>> idelay(xrange(3), 2.125)
Traceback (most recent call last):
  ...
ValueError: expected taps to be an integral value, got 2.125
>>> idelay(xrange(3), 'foo')
Traceback (most recent call last):
  ...
ValueError: expected taps to be an integral type, got 'str'
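The overlapping-tuple behaviour is a sliding window, which can be sketched with a bounded deque. This is a minimal Python 3 sketch for integer taps >= 1 only, not the Onyx implementation; the name sliding_tuples is hypothetical. It does not reproduce the taps=0 case or the acceptance of integral floats shown above, and because it is a generator function, argument errors surface on first use rather than at call time.

```python
from collections import deque


def sliding_tuples(iterable, taps=2):
    """Hypothetical sketch of idelay for taps >= 1: overlapping windows of 'taps' items."""
    if taps < 1:
        raise ValueError("this sketch only handles taps >= 1")
    window = deque(maxlen=taps)  # the oldest item drops off automatically
    for item in iterable:
        window.append(item)
        if len(window) == taps:
            yield tuple(window)


pairs = list(sliding_tuples(range(5)))
triples = list(sliding_tuples(range(6), 3))
```

For N = 6 and taps = 3 this yields 6 - 3 + 1 = 4 tuples, in line with the count given above.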
onyx.util.iterutils.imerge(*iters)

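Given iters, zero or more iterable arguments, cycle through the iterables in order, yielding one item from each. Iteration stops at the first attempt to take an item from an exhausted iterable, so, as the examples show, the result depends on the order of the arguments.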

>>> tuple(imerge('abc', 'wxyz'))
('a', 'w', 'b', 'x', 'c', 'y')
>>> tuple(imerge('wxyz', 'abc'))
('w', 'a', 'x', 'b', 'y', 'c', 'z')

Edge case

>>> tuple(imerge())
()
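The stopping rule is easiest to see in code. The following is a minimal Python 3 sketch that reproduces the outputs shown above; it is not the Onyx source, and the name round_robin_until_short is hypothetical.

```python
def round_robin_until_short(*iters):
    """Hypothetical sketch of imerge: interleave items, stop at the first exhausted input."""
    iterators = [iter(it) for it in iters]
    while iterators:  # with no inputs at all, yield nothing
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                # The first iterator found empty ends the whole merge.
                return


short_first = tuple(round_robin_until_short("abc", "wxyz"))
long_first = tuple(round_robin_until_short("wxyz", "abc"))
```

With the longer input first, its fourth item is yielded before the shorter input is found to be empty, which is why the second result is one item longer than the first.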
onyx.util.iterutils.iter_itemgetter(iterable, items)

Return a generator that constructs and yields a tuple of indexed items from each element of the iterable.

The iterable must yield an item that is indexable by each of the elements of items (a finite, iterable sequence). The returned generator will yield a tuple of the indexed elements.

This function is typically used to select a subset or a permutation of the items in each item from iterable.

Example, showing that the indexing is general enough to use on an iterable stream of dicts, and that an indexing item can be repeated

>>> stream = (dict((('a', x), ('b', 3 * x), ('c', 4 * x + 3))) for x in xrange(5))
>>> tuple(iter_itemgetter(stream, ('c', 'a', 'a')))
((3, 0, 0), (7, 1, 1), (11, 2, 2), (15, 3, 3), (19, 4, 4))
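The selection step amounts to indexing each element once per entry in items. Here is a minimal Python 3 sketch, not the Onyx implementation; the name pick_items is hypothetical.

```python
def pick_items(iterable, items):
    """Hypothetical sketch of iter_itemgetter: index each element by every entry of 'items'."""
    items = tuple(items)  # freeze the indices so they can be reused for every element
    for element in iterable:
        yield tuple(element[index] for index in items)


# Works with any indexable elements: dict keys here, list positions below.
stream = ({"a": x, "b": 3 * x, "c": 4 * x + 3} for x in range(3))
by_key = tuple(pick_items(stream, ("c", "a", "a")))
swapped = tuple(pick_items([[1, 2], [3, 4]], (1, 0)))
```

The second call shows the permutation use: the index tuple (1, 0) swaps the two fields of each row.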
onyx.util.iterutils.iter_numerify(iterable)

Returns a generator that, when possible, converts sequences of strings from iterable into sequences of ints or floats.

Each item from iterable must be a sequence. Each string item in the sequence is converted to an int if possible, or if that fails, a float, or if that fails, is not converted. A tuple of the results is yielded.

>>> seq = ['20', 'foo'], ['+30.', ()], ['-50x', None, True, False, 'foobar']
>>> tuple(iter_numerify(seq))
((20, 'foo'), (30.0, ()), ('-50x', None, True, False, 'foobar'))
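The conversion rule (int first, then float, otherwise leave the item alone) can be sketched as below. This is a minimal Python 3 sketch, not the Onyx source; the names numerify_sketch and iter_numerify_sketch are hypothetical. Note that Python's float() also accepts strings such as 'nan' and 'inf', so this sketch would convert those too; whether Onyx does the same is not stated in this page.

```python
def numerify_sketch(item):
    """Hypothetical sketch: convert a string to int or float when possible."""
    if not isinstance(item, str):
        return item  # non-strings pass through untouched
    for convert in (int, float):
        try:
            return convert(item)
        except ValueError:
            continue
    return item  # neither conversion worked, keep the string


def iter_numerify_sketch(iterable):
    """Hypothetical sketch: apply numerify_sketch to every item of every sequence."""
    for seq in iterable:
        yield tuple(numerify_sketch(item) for item in seq)


seq = ["20", "foo"], ["+30.", ()], ["-50x", None, True, False, "foobar"]
converted = tuple(iter_numerify_sketch(seq))
```

Trying int before float is what keeps '20' an int rather than turning it into 20.0.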
onyx.util.iterutils.iunique(iterable)

Returns a generator that yields each unique item from iterable. The items from iterable must be immutable (hashable).

>>> ''.join(iunique('welcome to the machine'))
'welcom thain'
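The hashability requirement follows from the usual way of doing this, which is to remember seen items in a set. The following is a minimal Python 3 sketch, not the Onyx implementation; the name unique_in_order is hypothetical.

```python
def unique_in_order(iterable):
    """Hypothetical sketch of iunique: yield each item the first time it appears."""
    seen = set()  # membership tests need hashable items
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


letters = "".join(unique_in_order("welcome to the machine"))
```

Because items are yielded as they are first encountered, the original order is preserved and the function also works on unbounded streams, at the cost of memory that grows with the number of distinct items.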
onyx.util.iterutils.lookahead_iter(iterable, lookahead=1)

Returns a new iterable that yields the same items as iterable.

Before returning the new iterable, this function calls next() on iterable immediately, lookahead times, the intention being to run the iterable to trigger gross errors early. Optional lookahead defaults to 1.

>>> tuple(lookahead_iter(xrange(5)))
(0, 1, 2, 3, 4)
>>> tuple(lookahead_iter(xrange(5), 3))
(0, 1, 2, 3, 4)

Motivation

>>> def line_iter(filename):
...   with open(filename, 'rb') as infile:
...     for line in infile: yield line

The problem of the late error

>>> bogus_filename = '__no_such_file__'
>>> stream = line_iter(bogus_filename)

Intervening logic... then the late error

>>> for line in stream: do_something(line)
Traceback (most recent call last):
  ...
IOError: [Errno 2] No such file or directory: '__no_such_file__'

Use lookahead_iter() and get the error right when stream is created

>>> stream = lookahead_iter(line_iter(bogus_filename))
Traceback (most recent call last):
  ...
IOError: [Errno 2] No such file or directory: '__no_such_file__'

Show what’s going on

>>> def printing_xrange(x):
...   for i in xrange(x):
...     print i
...     yield i
>>> i0 = printing_xrange(5)
>>> i1 = lookahead_iter(printing_xrange(5))
0
>>> i2 = lookahead_iter(printing_xrange(5), 2)
0
1
>>> t0 = tuple(i0)
0
1
2
3
4
>>> t0
(0, 1, 2, 3, 4)
>>> tuple(i2) == tuple(i1) == t0
2
3
4
1
2
3
4
True

Test edge cases

>>> tuple(lookahead_iter(()))
()
>>> tuple(lookahead_iter(EMPTY_ITER))
()
>>> tuple(lookahead_iter(xrange(5), 0))
(0, 1, 2, 3, 4)
>>> tuple(lookahead_iter(xrange(5), 10))
(0, 1, 2, 3, 4)
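The eager-consumption idea can be sketched with itertools: pull the first few items immediately, then chain them back in front of the rest. This is a minimal Python 3 sketch, not the Onyx implementation; the names lookahead_sketch and noisy are hypothetical.

```python
from itertools import chain, islice


def lookahead_sketch(iterable, lookahead=1):
    """Hypothetical sketch of lookahead_iter: consume a few items now, replay them later."""
    iterator = iter(iterable)
    # Pull up to 'lookahead' items right away. Any exception raised while
    # producing them propagates to the caller here, not at first use.
    head = list(islice(iterator, lookahead))
    return chain(head, iterator)


log = []


def noisy(n):
    """Record each item as it is produced, to show when the work happens."""
    for i in range(n):
        log.append(i)
        yield i


stream = lookahead_sketch(noisy(5), 2)
# At this point log holds the two items that were produced eagerly.
```

A lookahead larger than the number of available items is harmless in this sketch, since islice simply stops when the source runs out.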
onyx.util.iterutils.numerify(item)

If item is a string, return its int interpretation if possible, otherwise its float interpretation if possible, otherwise the string itself. If item is not a string, return it unchanged.

>>> numerify('+30')
30
>>> numerify('-30.')
-30.0
>>> numerify(30.5)
30.5
>>> numerify(True)
True
>>> numerify(())
()