onyx.textdata.yamldata – Tools for decoding and encoding Onyx text-based data into YAML documents.

A text-based Onyx object is a YAML sequence of a header followed by a sequence of lines. The header is a map that describes the type and version of the data. The sequence of lines is the content of the data; each line is a separate record of whitespace-separated tokens. The reader, YamldataReader, parses the stream to get the header and the data. It is intended to be used in an iteration context, where it yields a list of the tokens on each line.
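The structure described above can be sketched without the library. The following is a minimal illustration, assuming the YAML has already been decoded into a two-element list of header map and body lines; the helper name iter_records and the empty body line are hypothetical and are not part of onyx.

```python
# Hypothetical decoded form of one document: [header map, list of body lines].
document = [
    {
        "__onyx_yaml__meta_version": "1",
        "__onyx_yaml__stream_type": "IndexedObjectSet",
        "__onyx_yaml__stream_version": "0",
    },
    [
        "onyx.signalprocessing.spectrum PreEmphasis  3dB=1500*hz",
        "",  # assumed for illustration: an empty line carries no record
        "onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec",
    ],
]


def iter_records(document):
    """Yield one token list per non-empty body line (hypothetical helper)."""
    header, body = document
    for line in body:
        tokens = line.split()  # split on any run of whitespace
        if tokens:
            yield tokens


records = list(iter_records(document))
print(records)
```

Splitting on runs of whitespace is why the double spaces between arguments in a record do not produce empty tokens.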

An example

>>> doc = '''
... ---
... - __onyx_yaml__meta_version : '1'
...   __onyx_yaml__stream_type : IndexedObjectSet
...   __onyx_yaml__stream_version : '0'
...   __onyx_yaml__stream_options : implicit_index=True
...   
... -
...   # format for IndexedObjectSet, where presence of index field depends on value of implicit_index
...   # [index] module factory args-string
...   - onyx.signalprocessing.spectrum PreEmphasis  3dB=1500*hz
...   - onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec
... ...
...   '''

Examples of loading a single document from a string or a stream

Create a reader for this data

>>> import cStringIO
>>> from onyx.textdata.yamldata import YamldataReader, YamldataGenerator
>>> reader = YamldataReader(doc, stream_type='IndexedObjectSet', stream_version='0')
>>> x = list(reader)
>>> x
[['onyx.signalprocessing.spectrum', 'PreEmphasis', '3dB=1500*hz'], ['onyx.signalprocessing.window', 'Sliding', 'length=25000*usec', 'shift=10000*usec']]
>>> x == list(YamldataReader(cStringIO.StringIO(doc), stream_type='IndexedObjectSet', stream_version='0'))
True
>>> sorted(reader.keys())
['current_line_contents', 'current_line_number', 'meta_version', 'stream_options', 'stream_type', 'stream_version']
>>> reader.current_line_number
7
>>> reader.current_line_contents
'onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec'
>>> reader
attrdict({'current_line_contents': 'onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec', 'stream_options': 'implicit_index=True', 'stream_version': '0', 'stream_type': 'IndexedObjectSet', 'current_line_number': 7, 'meta_version': '1'})
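Note that the header keys in the document carry an __onyx_yaml__ prefix, while the reader exposes them without it. A minimal sketch of that mapping follows; strip_header_prefix is a hypothetical helper written for illustration and is not the library's prefix_header_name.

```python
PREFIX = "__onyx_yaml__"  # prefix carried by every header key in the example


def strip_header_prefix(header):
    """Return a copy of header with the prefix removed from each key.

    Hypothetical helper for illustration; not part of onyx.
    """
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in header.items()
    }


raw_header = {
    "__onyx_yaml__meta_version": "1",
    "__onyx_yaml__stream_type": "IndexedObjectSet",
    "__onyx_yaml__stream_version": "0",
    "__onyx_yaml__stream_options": "implicit_index=True",
}
fields = strip_header_prefix(raw_header)
print(sorted(fields))
```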

Example of a string or stream with two documents

>>> doc2 = doc + '''
...
... ---
... - __onyx_yaml__meta_version : '1'
...   __onyx_yaml__stream_type : IndexedObjectSet
...   __onyx_yaml__stream_version : '0'
...   __onyx_yaml__stream_options : implicit_index=True
...   
... -
...   # format for IndexedObjectSet, where presence of index field depends on value of implicit_index
...   # [index] module factory args-string
...   - onyx.signalprocessing.spectrum PreEmphasis  3dB=2500*hz
...   - onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec
... ...
...   '''

A plain YamldataReader returns only one document: the first in the string or stream

>>> list(YamldataReader(doc2, stream_type='IndexedObjectSet', stream_version='0'))
[['onyx.signalprocessing.spectrum', 'PreEmphasis', '3dB=1500*hz'], ['onyx.signalprocessing.window', 'Sliding', 'length=25000*usec', 'shift=10000*usec']]

To read multiple documents from the string or stream, make a YamldataGenerator

>>> docgen = YamldataGenerator(doc2)

Each YamldataReader constructed from the generator then consumes the next document. Get the first document

>>> x = YamldataReader(docgen, stream_type='IndexedObjectSet', stream_version='0')
>>> list(x)
[['onyx.signalprocessing.spectrum', 'PreEmphasis', '3dB=1500*hz'], ['onyx.signalprocessing.window', 'Sliding', 'length=25000*usec', 'shift=10000*usec']]

Get the next document

>>> y = YamldataReader(docgen, stream_type='IndexedObjectSet', stream_version='0')
>>> list(y)
[['onyx.signalprocessing.spectrum', 'PreEmphasis', '3dB=2500*hz'], ['onyx.signalprocessing.window', 'Sliding', 'length=25000*usec', 'shift=10000*usec']]
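The pattern above, in which several readers share one generator and each takes the next document, can be sketched with plain Python iterators. The names document_source and OneDocReader are hypothetical stand-ins, and the documents are assumed to be already split into records; this is not how the library is implemented.

```python
def document_source(documents):
    """Yield already-decoded documents one at a time (hypothetical stand-in)."""
    for document in documents:
        yield document


class OneDocReader(object):
    """Consume exactly one document from a shared source when constructed."""

    def __init__(self, source):
        # next() advances the shared source, so a later reader sees the
        # following document rather than this one.
        self.records = next(source)

    def __iter__(self):
        return iter(self.records)


source = document_source([
    [["PreEmphasis", "3dB=1500*hz"]],
    [["PreEmphasis", "3dB=2500*hz"]],
])
first = list(OneDocReader(source))
second = list(OneDocReader(source))
print(first, second)
```

In this sketch, constructing a third reader would raise StopIteration because the shared source is exhausted.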

For documentation on YamldataReader errors and what causes them, see help(yamldata_reader_errors).

class onyx.textdata.yamldata.YamldataBase

Bases: onyx.builtin.attrdict

static build_header_dict(stream_type, stream_version, stream_options=None)
clear

D.clear() -> None. Remove all items from D.

copy()
static fromkeys()

dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v. v defaults to None.

get

D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

has_key

D.has_key(k) -> True if D has a key k, else False

items

D.items() -> list of D’s (key, value) pairs, as 2-tuples

iteritems

D.iteritems() -> an iterator over the (key, value) items of D

iterkeys

D.iterkeys() -> an iterator over the keys of D

itervalues

D.itervalues() -> an iterator over the values of D

keys

D.keys() -> list of D’s keys

pop

D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised

popitem

D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

static prefix_header_name(header)
static prefix_header_names(headers)
setdefault

D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D

update(E, **kwargs)

D.update(E, **F) -> None. Update D from dict/iterable E and F. If E has a .keys() method, does: for k in E: D[k] = E[k] If E lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values

D.values() -> list of D’s values

viewitems

D.viewitems() -> a set-like object providing a view on D’s items

viewkeys

D.viewkeys() -> a set-like object providing a view on D’s keys

viewvalues

D.viewvalues() -> an object providing a view on D’s values

class onyx.textdata.yamldata.YamldataGenerator(instream)

Bases: object

next()

class onyx.textdata.yamldata.YamldataReader(instream, stream_type=None, stream_version=None, no_stream_options=False, header_only=False)

Bases: onyx.textdata.yamldata.YamldataBase

This object is an attrdict that gives access to the header fields of the Yamldata. It is also a one-shot iterator over the contents of the Yamldata, if any; each iteration yields a list of the whitespace-separated tokens on the next non-empty item in the Yamldata.
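The combination of attribute-style header access and one-shot iteration can be sketched as follows. AttrDict and OneShotReader are hypothetical classes written for illustration in Python 3 syntax; they only mimic the behaviour described above and are not the onyx implementation.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes (hypothetical)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class OneShotReader(AttrDict):
    """Header fields as dict items, body records through one-shot iteration."""

    def __init__(self, header, lines):
        super().__init__(header)
        self._lines = iter(lines)  # consumed once; never rewound

    def __iter__(self):
        return self

    def __next__(self):
        for line in self._lines:
            tokens = line.split()
            if tokens:  # skip empty items
                return tokens
        raise StopIteration


reader = OneShotReader({"stream_type": "MyType"}, ["var0 0", "", "var1 1"])
first_pass = list(reader)
second_pass = list(reader)  # already exhausted, so this is empty
print(reader.stream_type, first_pass, second_pass)
```

Because the iterator is one-shot, a caller that needs the records more than once should keep the list from the first pass.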

static build_header_dict(stream_type, stream_version, stream_options=None)
clear

D.clear() -> None. Remove all items from D.

copy()
static fromkeys()

dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v. v defaults to None.

get

D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

has_key

D.has_key(k) -> True if D has a key k, else False

items

D.items() -> list of D’s (key, value) pairs, as 2-tuples

iteritems

D.iteritems() -> an iterator over the (key, value) items of D

iterkeys

D.iterkeys() -> an iterator over the keys of D

itervalues

D.itervalues() -> an iterator over the values of D

keys

D.keys() -> list of D’s keys

next()
pop

D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised

popitem

D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

static prefix_header_name(header)
static prefix_header_names(headers)
setdefault

D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D

update(E, **kwargs)

D.update(E, **F) -> None. Update D from dict/iterable E and F. If E has a .keys() method, does: for k in E: D[k] = E[k] If E lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values

D.values() -> list of D’s values

viewitems

D.viewitems() -> a set-like object providing a view on D’s items

viewkeys

D.viewkeys() -> a set-like object providing a view on D’s keys

viewvalues

D.viewvalues() -> an object providing a view on D’s values

class onyx.textdata.yamldata.YamldataWriter(outstream, stream_type, stream_version, stream_options=None)

Bases: onyx.textdata.yamldata.YamldataBase

>>> import cStringIO
>>> from onyx.textdata.yamldata import YamldataWriter, YamldataReader
>>> stream0 = cStringIO.StringIO()
>>> yw0 = YamldataWriter(stream0, 'MyType', '0')
>>> src = (('var0', 0), ('var1', 1))
>>> yw0.write_document(src)
>>> print stream0.getvalue()
---
- __onyx_yaml__stream_version: "0"
  __onyx_yaml__meta_version: "1"
  __onyx_yaml__stream_type: "MyType"
- - var0 0
  - var1 1
<BLANKLINE>
>>> stream0.seek(0)
>>> yr = YamldataReader(stream0)
>>> body = list(yr)
>>> stream1 = cStringIO.StringIO()
>>> yw1 = YamldataWriter(stream1, 'MyType', '0')
>>> yw1.write_document(body)
>>> stream1.getvalue() == stream0.getvalue()
True

Add a few more documents to this stream

>>> yw1.write_document(body)
>>> yw1.write_document(body)
>>> print stream1.getvalue()
---
- __onyx_yaml__stream_version: "0"
  __onyx_yaml__meta_version: "1"
  __onyx_yaml__stream_type: "MyType"
- - var0 0
  - var1 1
---
- __onyx_yaml__stream_version: "0"
  __onyx_yaml__meta_version: "1"
  __onyx_yaml__stream_type: "MyType"
- - var0 0
  - var1 1
---
- __onyx_yaml__stream_version: "0"
  __onyx_yaml__meta_version: "1"
  __onyx_yaml__stream_type: "MyType"
- - var0 0
  - var1 1
<BLANKLINE>

static build_header_dict(stream_type, stream_version, stream_options=None)
clear

D.clear() -> None. Remove all items from D.

copy()
static fromkeys()

dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v. v defaults to None.

get

D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

has_key

D.has_key(k) -> True if D has a key k, else False

items

D.items() -> list of D’s (key, value) pairs, as 2-tuples

iteritems

D.iteritems() -> an iterator over the (key, value) items of D

iterkeys

D.iterkeys() -> an iterator over the keys of D

itervalues

D.itervalues() -> an iterator over the values of D

keys

D.keys() -> list of D’s keys

output_as_yaml_doc(body)
pop

D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised

popitem

D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

static prefix_header_name(header)
static prefix_header_names(headers)
setdefault

D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D

update(E, **kwargs)

D.update(E, **F) -> None. Update D from dict/iterable E and F. If E has a .keys() method, does: for k in E: D[k] = E[k] If E lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values

D.values() -> list of D’s values

viewitems

D.viewitems() -> a set-like object providing a view on D’s items

viewkeys

D.viewkeys() -> a set-like object providing a view on D’s keys

viewvalues

D.viewvalues() -> an object providing a view on D’s values

write_document(iterable)

Write a single Yamldata document to the outstream. The iterable should produce sequences, each of which is converted to one line of space-separated tokens. This function may be called multiple times to produce a stream with multiple documents.
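As a rough illustration of the conversion described above, the sketch below formats one document in the layout shown in the YamldataWriter example output. The function format_document is hypothetical and assumes the header is passed in as an ordered mapping; it is not the library's write_document.

```python
def format_document(header, records):
    """Return one document as text: a '---' marker, header map, then body.

    Hypothetical formatter for illustration; not part of onyx.
    """
    lines = ["---"]
    for i, (key, value) in enumerate(header.items()):
        lead = "- " if i == 0 else "  "  # first key opens the header item
        lines.append('%s%s: "%s"' % (lead, key, value))
    for i, record in enumerate(records):
        lead = "- - " if i == 0 else "  - "  # first record opens the body item
        # Each sequence becomes one line of space-separated tokens.
        lines.append(lead + " ".join(str(token) for token in record))
    return "\n".join(lines) + "\n"


header = {
    "__onyx_yaml__stream_version": "0",
    "__onyx_yaml__meta_version": "1",
    "__onyx_yaml__stream_type": "MyType",
}
text = format_document(header, (("var0", 0), ("var1", 1)))
print(text)
```

Calling the function repeatedly and concatenating the results gives a multi-document stream, since every document starts with its own '---' marker.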

onyx.textdata.yamldata.yamldata_reader_errors()

>>> YamldataReader("")
Traceback (most recent call last):
...
StopIteration
>>> YamldataReader("1")
Traceback (most recent call last):
...
ValueError: no header found in Yamldata stream; are you sure this is a Yamldata source?
>>> YamldataReader("a")
Traceback (most recent call last):
...
ValueError: bad header structure in Yamldata stream: [a] - are you sure this is a Yamldata source?
>>> YamldataReader("- __onyx_yaml__meta_version : '1' \n- - foo\n- - bar ")
Traceback (most recent call last):
...
ValueError: bad document structure in Yamldata stream, expected 1 or 2 sub-parts, got 3
>>> YamldataReader("- __onyx_yaml__meta_version : '1' \n- - foo", header_only=True)
Traceback (most recent call last):
...
ValueError: bad document structure in Yamldata stream, expected only a header
>>> YamldataReader("- __onyx_yaml__meta_version : '1'", header_only=False)
Traceback (most recent call last):
...
ValueError: bad document structure in Yamldata stream, expected both a header and a body

Some error cases where the well-formatted data doesn’t match the code’s expectations:

>>> doc = '''
... ---
... - __onyx_yaml__meta_version : '1'
...   __onyx_yaml__stream_type : IndexedObjectSet
...   __onyx_yaml__stream_version : '0'
...   __onyx_yaml__stream_options : implicit_index=True
...   
... -
...   # format for IndexedObjectSet, where presence of index field depends on value of implicit_index
...   # [index] module factory args-string
...   - onyx.signalprocessing.spectrum PreEmphasis  3dB=1500*hz
...   - onyx.signalprocessing.window Sliding  length=25000*usec  shift=10000*usec
... ...
...   '''
>>> reader = YamldataReader(doc, stream_type='BigObject', stream_version='0')
Traceback (most recent call last):
   ...
ValueError: unexpected stream_type in Yamldata stream: expected 'BigObject', got 'IndexedObjectSet'
>>> reader = YamldataReader(doc, stream_type='IndexedObjectSet', stream_version='0.1')
Traceback (most recent call last):
   ...
ValueError: unexpected stream_version in Yamldata stream: expected '0.1', got '0'
>>> reader = YamldataReader(doc, stream_type='IndexedObjectSet', stream_version='0', no_stream_options=True)
Traceback (most recent call last):
   ...
ValueError: unexpected presence of stream_options in header: 'implicit_index=True'
>>> YamldataReader("- __onyx_yaml__meta_version : '0' \n  __onyx_yaml__stream_type : MyType  \n  __onyx_yaml__stream_version : '0'  \n- - foo \n  - bar")
Traceback (most recent call last):
   ...
ValueError: unexpected meta_version in Yamldata stream: expected 1, got 0

Some structural error cases:

>>> YamldataReader("- __onyx_yaml__meta_version : '1' \n- - foo ")
Traceback (most recent call last):
   ...
ValueError: missing the following required headers in Yamldata stream: '__onyx_yaml__stream_type' '__onyx_yaml__stream_version'
>>> YamldataReader("- __onyx_yaml__meta_version : '1' \n  __onyx_yaml__stream_type : MyType  \n  __onyx_yaml__stream_version : '0'  \n  __onyx_yaml__bogus_header : bogus header value  \n- - foo \n  - bar")
Traceback (most recent call last):
   ...
ValueError: unexpected headers in Yamldata stream: '__onyx_yaml__bogus_header'
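The two structural checks above can be sketched as a small validation function. The split into required and optional header names is an assumption inferred from the examples on this page, and check_header is a hypothetical helper; the library's actual checks may differ.

```python
PREFIX = "__onyx_yaml__"
# Assumed from the examples on this page, not taken from the onyx source.
REQUIRED = ("meta_version", "stream_type", "stream_version")
OPTIONAL = ("stream_options",)


def check_header(header):
    """Raise ValueError if required headers are missing or unknown ones appear."""
    names = set(header)
    missing = [PREFIX + n for n in REQUIRED if PREFIX + n not in names]
    if missing:
        raise ValueError(
            "missing the following required headers in Yamldata stream: "
            + " ".join(repr(name) for name in missing)
        )
    allowed = set(PREFIX + n for n in REQUIRED + OPTIONAL)
    unexpected = sorted(names - allowed)
    if unexpected:
        raise ValueError(
            "unexpected headers in Yamldata stream: "
            + " ".join(repr(name) for name in unexpected)
        )


good = {
    "__onyx_yaml__meta_version": "1",
    "__onyx_yaml__stream_type": "MyType",
    "__onyx_yaml__stream_version": "0",
}
check_header(good)  # passes silently

try:
    check_header(dict(good, __onyx_yaml__bogus_header="bogus header value"))
except ValueError as error:
    message = str(error)
    print(message)
```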

This concludes testing for this module