onyx.textdata.onyxtext – A text format for line-oriented serialization of memory structures

class onyx.textdata.onyxtext.OnyxTextBase

Bases: object

class onyx.textdata.onyxtext.OnyxTextReader(iterable, data_type=None, data_version=None)

Bases: onyx.textdata.onyxtext.OnyxTextBase, onyx.textdata.tdutil.TextParserErrorMixin

A reader for the OnyxText meta-format

This class supports parsing from a stream which produces sequences of string tokens. The intent is to support a particular style of line-oriented text file or stream which is tokenized by some lower level. Five types of representations are supported: simple scalars, singletons, lists, arrays, and indexed collections. Each representation type uses a different syntax, as described in the various read_XXX functions. See help on read_scalar, read_singleton, read_list, read_array, and read_indexed_collection for the details on each representation syntax. Also see help on OnyxTextWriter for the details on how to generate these representations.

The iterable argument to the constructor should be a source of sequences to be parsed, the first of which must be a valid OnyxText header sequence. If data_type and/or data_version are not None, they will be checked against the corresponding values in the header sequence.
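
The header check described above can be illustrated with a small standalone sketch. It does not import Onyx; it only assumes the eight-token header layout shown in the example on this page. The names check_header and HeaderError are hypothetical, and the real reader may validate the header differently.

```python
class HeaderError(ValueError):
    """Hypothetical error type; Onyx raises its own parse-failure exception."""


# Assumed key order, taken from the header tuple shown in the examples.
HEADER_KEYS = ('stream_type', 'stream_version', 'data_type', 'data_version')


def check_header(tokens, data_type=None, data_version=None):
    """Validate a header token sequence; return (data_type, data_version)."""
    tokens = tuple(tokens)
    # The header alternates keys and values: key0, value0, key1, value1, ...
    if tokens[0::2] != HEADER_KEYS or len(tokens) != 2 * len(HEADER_KEYS):
        raise HeaderError('malformed header: %r' % (tokens,))
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    if fields['stream_type'] != 'OnyxText':
        raise HeaderError('unexpected stream_type %r' % fields['stream_type'])
    # Each optional argument is only checked when the caller supplies it.
    for key, expected in (('data_type', data_type),
                          ('data_version', data_version)):
        if expected is not None and fields[key] != expected:
            raise HeaderError('expected %s %r, but read %r'
                              % (key, expected, fields[key]))
    return fields['data_type'], fields['data_version']


header = ('stream_type', 'OnyxText', 'stream_version', '0',
          'data_type', 'test', 'data_version', '0')
result = check_header(header, data_type='test', data_version='0')
```

Passing None for data_type or data_version skips that check, which mirrors the optional verification the constructor is described as performing.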

In general, the read_XXX functions on the stream object take arguments that are verified against the stream and/or used in parsing. For example, specifying name='foo' means the name of the parsed entity will be verified to be 'foo' and not any other name. Each function returns a pair: the object name and the object.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('name0', '4'), ('name1', '3.14159'))
>>> src[0]
('stream_type', 'OnyxText', 'stream_version', '0', 'data_type', 'test', 'data_version', '0')
>>> ctr0 = OnyxTextReader(src, data_type='test', data_version='0')
>>> ctr0.data_type
'test'
>>> ctr0.data_version
'0'
>>> ctr0.read_scalar(name='name0', rtype=int)
('name0', 4)
>>> ctr0.read_scalar(name='name0', rtype=int)
Traceback (most recent call last):
...
TextdataParseFailure: Expected to read name name0, but read name1
convert_or_raise(value, new_type)
data_type
data_version
next_line(eof_ok=False)

Get the next line from the stream. Errors on EOF unless eof_ok is True, in which case returns None.

raise_parsing_error(err_string)

Clients and subclasses may call this function to raise a consistent error. If the stream has any of the attributes: current_line_number, current_filename, or current_line_contents, additional information will be added to the error string.
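
One way to picture the optional context described above is the following standalone sketch. It is not the Onyx implementation: format_parsing_error and FakeStream are hypothetical names, and the exact wording and layout of the real error string are not specified here.

```python
# Attribute names are the ones listed in the description above.
CONTEXT_ATTRS = ('current_filename', 'current_line_number',
                 'current_line_contents')


def format_parsing_error(stream, err_string):
    """Append whatever context attributes the stream happens to provide."""
    details = []
    for attr in CONTEXT_ATTRS:
        value = getattr(stream, attr, None)
        if value is not None:
            details.append('%s=%r' % (attr, value))
    if details:
        return '%s (%s)' % (err_string, ', '.join(details))
    return err_string


class FakeStream(object):
    """Hypothetical stream that exposes only two of the three attributes."""
    current_filename = 'example.txt'
    current_line_number = 7


message = format_parsing_error(
    FakeStream(), 'Expected to read name name0, but read name1')
```

A stream with none of these attributes simply yields the original error string unchanged.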

read_array(name=None, rtype=<type 'str'>, dim=None, shape=None)

An array is represented as a specification on one line and the array contents on subsequent lines. The specification consists of the name of the array, the token 'Array', the number of dimensions of the array, and the size of each dimension. Each subsequent line must contain as many elements as the size of the last dimension, and there must be enough lines to complete the array. Note: this layout is consistent with how NumPy prints arrays. rtype is used to construct an array of the correct type. If name is not None, it will be checked against the name of the array. If dim is not None, it will be checked against the specified dimension of the array. If shape is not None, it must be a tuple of ints and will be checked against the specified shape of the array.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('name0', 'Array', 2, 3, 4),(0,1,4,9),(1,2,3,4),(10,20,30,40))
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_array('name0', int, 2, (3,4))
('name0', array([[ 0,  1,  4,  9],
       [ 1,  2,  3,  4],
       [10, 20, 30, 40]]))
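
The line layout for arrays can be sketched without Onyx or NumPy. The following is a minimal illustration, assuming at least one dimension: parse_array is a hypothetical helper that returns the rows as plain lists instead of building an array, and it does not perform the optional name, dim, and shape checks.

```python
from functools import reduce
from operator import mul


def parse_array(lines, rtype=str):
    """Parse one array from an iterator of token sequences.

    Returns (name, shape, rows); rows holds one converted list per line.
    """
    spec = tuple(next(lines))
    if len(spec) < 4 or spec[1] != 'Array':
        raise ValueError('not an array specification: %r' % (spec,))
    name, dim = spec[0], int(spec[2])
    shape = tuple(int(token) for token in spec[3:])
    if len(shape) != dim:
        raise ValueError('expected %d sizes, got %d' % (dim, len(shape)))
    # Every line holds one run along the last dimension, so the number of
    # content lines is the product of all the other dimension sizes.
    num_lines = reduce(mul, shape[:-1], 1)
    rows = []
    for _ in range(num_lines):
        row = tuple(next(lines))
        if len(row) != shape[-1]:
            raise ValueError('expected %d elements, got %d'
                             % (shape[-1], len(row)))
        rows.append([rtype(token) for token in row])
    return name, shape, rows


src = iter([
    ('name0', 'Array', '2', '3', '4'),
    ('0', '1', '4', '9'),
    ('1', '2', '3', '4'),
    ('10', '20', '30', '40'),
])
name, shape, rows = parse_array(src, int)
```

For a three-dimensional array of shape (2, 2, 2) the same rule gives four content lines of two elements each.
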
read_indexed_collection(read_fn, user_data, name=None, header_token_count=None)

An indexed collection is represented as a specification on one line and the indexed items on subsequent lines. The specification consists of the name of the collection, the token 'IndexedCollection', the name of the items, and the number of items. read_fn is a callable which takes three arguments (the stream, the user_data, and the sequence of header tokens) and returns the object read. Each item is represented as one header line consisting of the object name, the index, and optional additional fields, followed by as many lines as necessary to represent the object (which may be 0). read_fn will be called once for each object with the stream, the user_data, and the tokens on that object's header line as arguments. If name is not None, it will be checked against the name of the collection. If header_token_count is not None, it will be checked against the number of tokens in each header.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('collection0', 'IndexedCollection', 'item', '2'),
...        ('item', '0', 'hi'), ('X', '4'), ('Y', '4'), ('MyList', 'List', '4'), ('a', 'b', 'c', 'd'),
...        ('item', '1', 'there'), ('X', '14'), ('Y', '44'), ('MyList', 'List', '4'), ('foo', 'bar', 'baz', 'foobar'))
>>> def reader(s, not_used, tokens):
...    v,x = s.read_scalar('X', int)
...    v,y = s.read_scalar('Y', int)
...    v,l = s.read_list('MyList')
...    return (int(tokens[1]), tokens[2], x, y, l)
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_indexed_collection(reader, None, name='collection0', header_token_count=3)
('collection0', ((0, 'hi', 4, 4, ['a', 'b', 'c', 'd']), (1, 'there', 14, 44, ['foo', 'bar', 'baz', 'foobar'])))
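
The control flow of an indexed collection read can be sketched in plain Python. This is only an illustration of the layout described above, not the Onyx code: parse_scalar, parse_indexed_collection, and read_point are hypothetical helpers, and the check that indices run 0, 1, 2, ... in order is an assumption.

```python
def parse_scalar(lines, name, rtype=str):
    """Read one two-token scalar line and verify its name."""
    tokens = tuple(next(lines))
    if len(tokens) != 2 or tokens[0] != name:
        raise ValueError('expected scalar %s, but read %r' % (name, tokens))
    return rtype(tokens[1])


def parse_indexed_collection(lines, read_fn, user_data, name=None,
                             header_token_count=None):
    """Read a collection; read_fn consumes the lines of each item."""
    spec = tuple(next(lines))
    if len(spec) != 4 or spec[1] != 'IndexedCollection':
        raise ValueError('not a collection specification: %r' % (spec,))
    coll_name, _, item_name, count = spec
    if name is not None and coll_name != name:
        raise ValueError('expected to read name %s, but read %s'
                         % (name, coll_name))
    items = []
    for index in range(int(count)):
        header = tuple(next(lines))
        if header_token_count is not None and len(header) != header_token_count:
            raise ValueError('expected %d header tokens, got %d'
                             % (header_token_count, len(header)))
        # Assumption: items appear in index order starting at 0.
        if header[0] != item_name or int(header[1]) != index:
            raise ValueError('unexpected item header: %r' % (header,))
        items.append(read_fn(lines, user_data, header))
    return coll_name, tuple(items)


def read_point(lines, user_data, header):
    """Item reader: two scalars follow each header line."""
    x = parse_scalar(lines, 'X', int)
    y = parse_scalar(lines, 'Y', int)
    return (header[2], x, y)


src = iter([
    ('points', 'IndexedCollection', 'point', '2'),
    ('point', '0', 'a'), ('X', '1'), ('Y', '2'),
    ('point', '1', 'b'), ('X', '3'), ('Y', '4'),
])
result = parse_indexed_collection(src, read_point, None, name='points',
                                  header_token_count=3)
```

Because read_fn pulls its own lines from the shared iterator, each item may span any number of lines, including none.
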
read_list(name=None, rtype=<type 'str'>, count=None)

A list is represented as a specification on one line and the list items on the next line. The specification consists of the name of the list, the token 'List', and the number of items. If name is not None, it will be checked against the name of the list. The tokens on the second line will be converted to rtype before return.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('name0', 'List', '4'),(0,1,4,9))
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_list('name0', int)
('name0', [0, 1, 4, 9])
>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('name1', 'List', '1'),("149",))
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_list('name1', int, count=1)
('name1', [149])
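
The two-line list layout can be sketched as a standalone function. parse_list is a hypothetical helper rather than the Onyx method, and treating count as a check on the declared number of items is an assumption based on the example above.

```python
def parse_list(lines, name=None, rtype=str, count=None):
    """Parse one list from an iterator of token sequences."""
    spec = tuple(next(lines))
    if len(spec) != 3 or spec[1] != 'List':
        raise ValueError('not a list specification: %r' % (spec,))
    if name is not None and spec[0] != name:
        raise ValueError('expected to read name %s, but read %s'
                         % (name, spec[0]))
    num_items = int(spec[2])
    # Assumption: count, when given, must equal the declared item count.
    if count is not None and num_items != count:
        raise ValueError('expected %d items, but list declares %d'
                         % (count, num_items))
    items = tuple(next(lines))
    if len(items) != num_items:
        raise ValueError('list declares %d items, but line has %d'
                         % (num_items, len(items)))
    return spec[0], [rtype(token) for token in items]


src = iter([('name0', 'List', '4'), ('0', '1', '4', '9')])
parsed = parse_list(src, 'name0', int)
```
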
read_scalar(name=None, rtype=<type 'str'>)

A scalar is represented on one line with two tokens, the name of the scalar and the value. If name is not None, it will be checked against the name of the scalar. The value will be converted to rtype before return.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('name0', '4'), ('name1', '3.14159'))
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_scalar(rtype=int)
('name0', 4)
>>> n,v = ctr0.read_scalar(rtype=float)
>>> n, float_to_readable_string(v)
('name1', '+(+0001)0x921f9f01b866e')
>>> ctr0 = OnyxTextReader(src)
>>> ctr0.read_scalar('name0', int)
('name0', 4)
>>> ctr0.read_scalar('name1')
('name1', '3.14159')
read_singleton(name=None)

A singleton is represented on one line with three tokens: the name of the variable, the fixed token 'Singleton', and the name of the singleton. If name is not None, it will be checked against the name of the variable. The return value will be an instance of onyx.util.singleton.Singleton.

>>> src = (OnyxTextReader._make_header_tokens('test', '0'), ('test', 'Singleton', 'onyx.textdata.onyxtext.test'))
>>> ctr0 = OnyxTextReader(iter(src))
>>> n, s = ctr0.read_singleton('test')
>>> s
onyx.util.singleton.Singleton('onyx.textdata.onyxtext.test')
verify_thing(expected, found, what)
verify_token_count(expected, tokens, what)
verify_token_count_min(expected, tokens, what)
class onyx.textdata.onyxtext.OnyxTextWriter

Bases: onyx.textdata.onyxtext.OnyxTextBase

A writer for the OnyxText meta-format

This class provides generation of sequences of string tokens. The intent is to support a particular style of line-oriented text file or stream which is written out by some lower level. Five types of representations are supported: simple scalars, singletons, lists, arrays, and indexed collections. Each representation type uses a different syntax, as described in the various gen_XXX functions. See help on gen_scalar, gen_singleton, gen_list, gen_array, and gen_indexed_collection for the details on each representation syntax. Also see help on OnyxTextReader for the details on how to parse these representations.

In general, the gen_XXX functions on the writer object take data to be serialized. Each returns a generator which will produce the appropriate sequence or sequences for that data.

>>> writer = OnyxTextWriter()
>>> header_gen = writer.gen_header('test', '0')
>>> v1_gen = writer.gen_scalar('var1', 3.14159)
>>> all_out_gen = chain(header_gen, v1_gen)
>>> output = tuple(all_out_gen)
>>> output
(('stream_type', 'OnyxText', 'stream_version', '0', 'data_type', 'test', 'data_version', '0'), ('var1', '3.14159'))
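
The generator-chaining pattern used above can be imitated in plain Python to show how token sequences might become lines of text. The gen_* functions below are simplified stand-ins for the writer methods, and joining tokens with single spaces is an assumption: the actual lower-level text format may quote or escape tokens.

```python
from itertools import chain


def gen_header(data_type, data_version):
    """Yield the single header sequence."""
    yield ('stream_type', 'OnyxText', 'stream_version', '0',
           'data_type', data_type, 'data_version', data_version)


def gen_scalar(name, value):
    """Yield one two-token sequence: name and stringified value."""
    yield (name, str(value))


def gen_list(name, items):
    """Yield the list specification, then the items on one line."""
    items = tuple(items)
    yield (name, 'List', str(len(items)))
    yield tuple(str(item) for item in items)


# Generators compose lazily; nothing is produced until the chain is consumed.
records = tuple(chain(gen_header('test', '0'),
                      gen_scalar('var1', 3.14159),
                      gen_list('list0', [1.2, 2.3, 3.4])))

# Assumed rendering: one line per sequence, tokens separated by spaces.
text = '\n'.join(' '.join(tokens) for tokens in records)
```

Returning generators rather than finished tuples lets a caller interleave several objects into one stream without building intermediate copies.
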
gen_array(name, array)
>>> arr = np.array(range(8), dtype = float).reshape(2,4)
>>> arr
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.]])
>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_array("array0", arr)
>>> tuple(g)
(('array0', 'Array', '2', '2', '4'), ('+(-1023)0x0000000000000', '+(+0000)0x0000000000000', '+(+0001)0x0000000000000', '+(+0001)0x8000000000000'), ('+(+0002)0x0000000000000', '+(+0002)0x4000000000000', '+(+0002)0x8000000000000', '+(+0002)0xc000000000000'))
>>> arr = np.array(range(8), dtype = float).reshape(2,2,2)
>>> arr
array([[[ 0.,  1.],
        [ 2.,  3.]],
<BLANKLINE>
       [[ 4.,  5.],
        [ 6.,  7.]]])
>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_array("array1", arr)
>>> tuple(g)
(('array1', 'Array', '3', '2', '2', '2'), ('+(-1023)0x0000000000000', '+(+0000)0x0000000000000'), ('+(+0001)0x0000000000000', '+(+0001)0x8000000000000'), ('+(+0002)0x0000000000000', '+(+0002)0x4000000000000'), ('+(+0002)0x8000000000000', '+(+0002)0xc000000000000'))
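
The element strings in the output above appear to encode each float exactly as a sign, an unbiased binary exponent, and the 52-bit IEEE 754 mantissa in hexadecimal. The following sketch reproduces that layout as inferred from the examples on this page; float_to_readable and readable_to_float are hypothetical names and are not the Onyx helper float_to_readable_string, whose handling of special values is not documented here.

```python
import struct


def float_to_readable(value):
    """Render a float as sign, unbiased exponent, and 13 hex mantissa digits."""
    # Reinterpret the 8-byte IEEE 754 double as an unsigned 64-bit integer.
    bits = struct.unpack('>Q', struct.pack('>d', value))[0]
    sign = '-' if bits >> 63 else '+'
    exponent = ((bits >> 52) & 0x7ff) - 1023   # remove the exponent bias
    mantissa = bits & ((1 << 52) - 1)          # low 52 bits
    return '%s(%+05d)0x%013x' % (sign, exponent, mantissa)


def readable_to_float(text):
    """Invert float_to_readable, recovering the identical double."""
    sign = 1 if text[0] == '-' else 0
    exponent = int(text[2:7]) + 1023
    mantissa = int(text[10:], 16)
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack('>d', struct.pack('>Q', bits))[0]


encoded = [float_to_readable(x) for x in (0.0, 1.0, 3.0, 7.0)]
round_trip = readable_to_float(float_to_readable(3.14159))
```

Unlike a decimal rendering, this form round-trips without any loss, which is presumably why array contents use it.
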
gen_header(data_type, data_version)

Generate an OnyxText header tuple with the given data_type and data_version.

>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_header('test', '0')
>>> tuple(g)
(('stream_type', 'OnyxText', 'stream_version', '0', 'data_type', 'test', 'data_version', '0'),)
gen_indexed_collection(name, obj_name, obj_seq, obj_gen)
>>> def obj_gen(stream, obj):
...    return chain((('info', str(len(obj[2]))),),
...                 stream.gen_scalar("X", obj[0]),
...                 stream.gen_scalar("Y", obj[1]),
...                 stream.gen_list("MyList", obj[2]))
>>> objs = ((3,4,[1,2,3]), (1.2, 2.3, ['a', 'b']))
>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_indexed_collection("collection0", "object", objs, obj_gen)
>>> tuple(g)
(('collection0', 'IndexedCollection', 'object', 2), ('object', '0', 'info', '3'), ('X', '3'), ('Y', '4'), ('MyList', 'List', '3'), ('1', '2', '3'), ('object', '1', 'info', '2'), ('X', '1.2'), ('Y', '2.3'), ('MyList', 'List', '2'), ('a', 'b'))
gen_list(name, the_list)
>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_list("list0", [1.2, 2.3, 3.4])
>>> tuple(g)
(('list0', 'List', '3'), ('1.2', '2.3', '3.4'))
gen_scalar(name, value)
>>> ctw0 = OnyxTextWriter()
>>> g = ctw0.gen_scalar("scalar0", 3.14159)
>>> tuple(g)
(('scalar0', '3.14159'),)
gen_singleton(name, value)
>>> ctw0 = OnyxTextWriter()
>>> s = Singleton('onyx.textdata.onyxtext.test0')
>>> g = ctw0.gen_singleton("test0", s)
>>> tuple(g)
(('test0', 'Singleton', 'onyx.textdata.onyxtext.test0'),)