Onyx logo

Previous topic

Dataflow

Next topic

onyx.dataflow.combine_processor – Creating network of processors and useful processor wrappers.

This Page

onyx.dataflow.cfg – Work with CFG grammars.

class onyx.dataflow.cfg.ExplicitCfgBuilder

Bases: object

A builder for CFG grammars. This supports an explicit model of specifying a CFG by naming each terminal and non-terminal before it is used in any productions, and by, optionally, specifying the starting non-terminal(s).

>>> builder = ExplicitCfgBuilder()
>>> builder.has_terminal(None)
False
>>> builder.has_non_terminal(None)
False

Put in some terminal items with checking that they’re not already there

>>> builder.add_new_terminal(-1)
>>> builder.add_new_terminal(-2)

Can’t use add_new_terminal for an existing item

>>> builder.add_new_terminal(-2)
Traceback (most recent call last):
  ...
ValueError: item is already a terminal: -2

But you can use the less careful add_terminal

>>> builder.add_terminal(-2)

Similarly, put in some non_terminal items with checking that they’re not already there

>>> builder.add_new_non_terminal(1)
>>> builder.add_new_non_terminal(2)

Can’t use add_new_non_terminal for an existing item

>>> builder.add_new_non_terminal(1)
Traceback (most recent call last):
  ...
ValueError: item is already a non-terminal: 1

But you can use the less careful add_non_terminal

>>> builder.add_non_terminal(2)

Careful or not, the methods keep the terminal and non-terminal sets disjoint

>>> builder.add_terminal(2)
Traceback (most recent call last):
  ...
ValueError: item is already a non-terminal: 2
>>> builder.add_non_terminal(-2)
Traceback (most recent call last):
  ...
ValueError: item is already a terminal: -2
add_new_non_terminal(non_terminal)

Introduce a non_terminal to the CFG. A ValueError is raised if the non_terminal token is already in the CFG as either a terminal or a non-terminal.

add_new_terminal(terminal)

Introduce a terminal to the CFG. A ValueError is raised if the terminal token is already in the CFG as either a terminal or a non-terminal.

add_non_terminal(non_terminal)

Ensure that the non_terminal is in the CFG. A ValueError is raised if the non_terminal is already in the CFG as a terminal.

add_terminal(terminal)

Ensure that the terminal is in the CFG. A ValueError is raised if the terminal is already in the CFG as a non-terminal.

has_non_terminal(non_terminal)
has_terminal(terminal)
class onyx.dataflow.cfg.Grammar(grammar_sets, debug=False)

Bases: object

A CFG grammar object, initialized from a GrammarTables instance. Can be used to generate random sentences from the language.

A simple grammar

>>> g0 = Grammar(parse_grammar(cStringIO.StringIO(_tiny_cfg)), debug=False)
>>> for line in g0.str_iter: print line
__terminals__
A
B
__productions__
abab A B A B
__starts__
abab
>>> result = ' '.join(item for item in g0.gen(-1, debug=False))
>>> print result
A B A B

A binary number grammar

>>> g1 = Grammar(parse_grammar(cStringIO.StringIO(_binary_cfg)))

Canonical text-based representation

>>> for line in g1.str_iter: print line
__terminals__
0
1
__productions__
binary_digit 1
binary_digit 0
binary_number binary_digit
binary_number binary_number binary_digit
__starts__
binary_number

A BNF-style representation

>>> for line in g1.bnf_iter: print line
binary_digit ::= "1"
               | "0"
binary_number ::= binary_digit
                | binary_number binary_digit

Run the simple guy until five sentential forms have been seen

>>> result = ' '.join(item for item in g1.gen(5, seed=1))
>>> print result
1 0 0 1 1

Run a more complicate grammar until 50 sentential forms have been seen

>>> g2 = Grammar(parse_grammar(cStringIO.StringIO(_hello_world_cfg)))
>>> result = ' '.join(item for item in g2.gen(25, seed=81))
>>> print result
H J F M 4 U J 7 D 1 P N P V Y G P E + ( 7 5 5 * R 4 R J E * 8 + N N 2 D 8 X Y J O Q 4 0 S W T T 8 L ) + G V 3 M R E

Ambigous grammar

>>> g3 = Grammar(parse_grammar(cStringIO.StringIO(_ambiguous_cfg)))
>>> #g3 = Grammar(parse_grammar(cStringIO.StringIO(_ambiguous_cfg)), debug=True)
>>> # for line in g3.bnf_iter: print line
>>> result = ' '.join(item for item in g3.gen(2, seed=4))
>>> #result = ' '.join(item for item in g3.gen(2, seed=4, debug=True))
>>> print result
IF FOO THEN IF FOO THEN EXPR ELSE EXPR

Copy construction of the simple guy

>>> Grammar(g1).grammar_sets == g1.grammar_sets
True

Augment the productions so as to have a shared lhs and a shared rhs, also use an epsilon, and have multiple starts, and turn on debugging spew

>>> terminals, non_terminals, productions, starts = g1.grammar_sets
>>> g3 = Grammar(GrammarTables(terminals, non_terminals + ('zero', 'zero1', 'list_of_zero'), productions + (('zero', ('0',)), ('zero1', ('0',)), ('list_of_zero', ()), ('list_of_zero', ('list_of_zero', 'zero'))), starts + ('list_of_zero',)), debug=True)    
productions: ((0, ('binary_digit', ('0',))), (1, ('binary_digit', ('1',))), (2, ('binary_number', ('binary_digit',))), (3, ('binary_number', ('binary_number', 'binary_digit'))), (4, ('list_of_zero', ())), (5, ('list_of_zero', ('list_of_zero', 'zero'))), (6, ('zero', ('0',))), (7, ('zero1', ('0',))))
starts: ('binary_number', 'list_of_zero')
terminal_by_id: ((0, '0'), (1, '1'))
terminal by integer: ((-2, '0'), (-3, '1'))
non_terminal_by_id: ((0, 'binary_digit'), (1, 'binary_number'), (2, 'list_of_zero'), (3, 'zero'), (4, 'zero1'))
prod1: ((0, (-3, -1)), (0, (-2, -1)), (1, (0, -1)), (1, (1, 0, -1)), (2, (-1,)), (2, (2, 3, -1)), (3, (-2, -1)), (4, (-2, -1)))
rhs_by_id: ((0, (-3, -1)), (1, (-2, -1)), (2, (-1,)), (3, (0, -1)), (4, (1, 0, -1)), (5, (2, 3, -1)))
rhs uniqueness: 6 of 8 : 0.75
prod2: ((0, 0), (0, 1), (1, 3), (1, 4), (2, 2), (2, 5), (3, 1), (4, 1))
ruleset_by_id: ((0, (0, 1)), (1, (1,)), (2, (2, 5)), (3, (3, 4)))
ruleset uniqueness: 4 of 5 : 0.8
self.ruleset_id_by_lhs_id: ((0, 0), (1, 3), (2, 2), (3, 1), (4, 1))
starts: (1, 2)

Watch the internals of this new guy

>>> result = ' '.join(item for item in g3.gen(4, seed=1, debug=True))
start_states: ('binary_number', 'list_of_zero')
start_ids: (1, 2)
start_rhs_ids: (6, 7)
rhs_by_id: ((0, (-3, -1)), (1, (-2, -1)), (2, (-1,)), (3, (0, -1)), (4, (1, 0, -1)), (5, (2, 3, -1)), (6, (1, -1)), (7, (2, -1)))
start_ruleset_id: 4
ruleset_by_id: ((0, (0, 1)), (1, (1,)), (2, (2, 5)), (3, (3, 4)), (4, (6, 7)))
start_lhs_id: 5
ruleset_id_by_lhs_id: ((0, 0), (1, 3), (2, 2), (3, 1), (4, 1), (5, 4))
non_terminal_by_id: ((0, 'binary_digit'), (1, 'binary_number'), (2, 'list_of_zero'), (3, 'zero'), (4, 'zero1'), (5, '<start_state>'))
<BLANKLINE>
predicting:   <start_state>
prediction:     (5, 6, 0, 0)  <start_state> ( . binary_number ) 0
prediction:     (5, 7, 0, 0)  <start_state> ( . list_of_zero ) 0
<BLANKLINE>
k: 0
predicting:   list_of_zero
prediction:     (2, 2, 0, 0)  list_of_zero ( . ) 0    from: (5, 7, 0, 0)  <start_state> ( . list_of_zero ) 0
prediction:     (2, 5, 0, 0)  list_of_zero ( . list_of_zero zero ) 0    from: (5, 7, 0, 0)  <start_state> ( . list_of_zero ) 0
predicting:   list_of_zero
prediction: +   (2, 2, 0, 0)  list_of_zero ( . ) 0    from: (2, 5, 0, 0)  list_of_zero ( . list_of_zero zero ) 0
prediction: +   (2, 5, 0, 0)  list_of_zero ( . list_of_zero zero ) 0    from: (2, 5, 0, 0)  list_of_zero ( . list_of_zero zero ) 0
predicting:   binary_number
prediction:     (1, 3, 0, 0)  binary_number ( . binary_digit ) 0    from: (5, 6, 0, 0)  <start_state> ( . binary_number ) 0
prediction:     (1, 4, 0, 0)  binary_number ( . binary_number binary_digit ) 0    from: (5, 6, 0, 0)  <start_state> ( . binary_number ) 0
predicting:   binary_number
prediction: +   (1, 3, 0, 0)  binary_number ( . binary_digit ) 0    from: (1, 4, 0, 0)  binary_number ( . binary_number binary_digit ) 0
prediction: +   (1, 4, 0, 0)  binary_number ( . binary_number binary_digit ) 0    from: (1, 4, 0, 0)  binary_number ( . binary_number binary_digit ) 0
predicting:   binary_digit
prediction:     (0, 0, 0, 0)  binary_digit ( . "1" ) 0    from: (1, 3, 0, 0)  binary_number ( . binary_digit ) 0
prediction:     (0, 1, 0, 0)  binary_digit ( . "0" ) 0    from: (1, 3, 0, 0)  binary_number ( . binary_digit ) 0
completing:   list_of_zero  (2, 2, 0, 0)  list_of_zero ( . ) 0
completion:     (2, 5, 1, 0)  list_of_zero ( list_of_zero . zero ) 0    from: (2, 2, 0, 0)  list_of_zero ( . ) 0
completion:     (5, 7, 1, 0)  <start_state> ( list_of_zero . ) 0    from: (2, 2, 0, 0)  list_of_zero ( . ) 0
completing: * <start_state>  (5, 7, 1, 0)  <start_state> ( list_of_zero . ) 0
predicting:   zero
prediction:     (3, 1, 0, 0)  zero ( . "0" ) 0    from: (2, 5, 1, 0)  list_of_zero ( list_of_zero . zero ) 0
scanset: 0 1
yielding: 1
<BLANKLINE>
k: 1
scanning:    "1"
scanned:        (0, 0, 1, 0)  binary_digit ( "1" . ) 0
completing:   binary_digit  (0, 0, 1, 0)  binary_digit ( "1" . ) 0
completion:     (1, 3, 1, 0)  binary_number ( binary_digit . ) 0    from: (0, 0, 1, 0)  binary_digit ( "1" . ) 0
completing:   binary_number  (1, 3, 1, 0)  binary_number ( binary_digit . ) 0
completion:     (5, 6, 1, 0)  <start_state> ( binary_number . ) 0    from: (1, 3, 1, 0)  binary_number ( binary_digit . ) 0
completion:     (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0    from: (1, 3, 1, 0)  binary_number ( binary_digit . ) 0
predicting:   binary_digit
prediction:     (0, 0, 0, 1)  binary_digit ( . "1" ) 1    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
prediction:     (0, 1, 0, 1)  binary_digit ( . "0" ) 1    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
completing: * <start_state>  (5, 6, 1, 0)  <start_state> ( binary_number . ) 0
scanset: 0 1
yielding: 0
<BLANKLINE>
k: 2
scanning:    "0"
scanned:        (0, 1, 1, 1)  binary_digit ( "0" . ) 1
completing:   binary_digit  (0, 1, 1, 1)  binary_digit ( "0" . ) 1
completion:     (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0    from: (0, 1, 1, 1)  binary_digit ( "0" . ) 1
completing:   binary_number  (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completion:     (5, 6, 1, 0)  <start_state> ( binary_number . ) 0    from: (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completion:     (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0    from: (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completing: * <start_state>  (5, 6, 1, 0)  <start_state> ( binary_number . ) 0
predicting:   binary_digit
prediction:     (0, 0, 0, 2)  binary_digit ( . "1" ) 2    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
prediction:     (0, 1, 0, 2)  binary_digit ( . "0" ) 2    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
scanset: 0 1
yielding: 0
<BLANKLINE>
k: 3
scanning:    "0"
scanned:        (0, 1, 1, 2)  binary_digit ( "0" . ) 2
completing:   binary_digit  (0, 1, 1, 2)  binary_digit ( "0" . ) 2
completion:     (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0    from: (0, 1, 1, 2)  binary_digit ( "0" . ) 2
completing:   binary_number  (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completion:     (5, 6, 1, 0)  <start_state> ( binary_number . ) 0    from: (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completion:     (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0    from: (1, 4, 2, 0)  binary_number ( binary_number binary_digit . ) 0
completing: * <start_state>  (5, 6, 1, 0)  <start_state> ( binary_number . ) 0
predicting:   binary_digit
prediction:     (0, 0, 0, 3)  binary_digit ( . "1" ) 3    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
prediction:     (0, 1, 0, 3)  binary_digit ( . "0" ) 3    from: (1, 4, 1, 0)  binary_number ( binary_number . binary_digit ) 0
<BLANKLINE>
completed: countdown
>>> print result
1 0 0
bnf_iter
gen(countdown, seed=None, debug=False)

Returns a generator yielding terminal tokens giving a random sentence in the language.

If countdown is positive the iteration will stop after the generated sentence contains countdown sub-sentences.

countdown completions have occured (where a completion is an opaque internal occurence in the parser that indicates a sentential prefix has been observed). If countdown is negative the iterator will stop when the end of the grammar is reached, which can easily be never for non-trivial grammars.

If seed is given and is not None it is used to seed the randomness, giving reproducible results.

If debug is not False, the verbose activity of the parser will be printed.

grammar_sets
integer_to_symbol(id, terminal_quote='', completion_symbol='')
str_iter
symbol_to_integer(symbol)
static symetrical_coder(id)
class onyx.dataflow.cfg.GrammarTables(terminals, non_terminals, productions, starts=())

Bases: object

Minimal transport and checking for a grammar. Four components of a grammar, each as a sorted tuple of a set of immutable items.

Constructor takes three or four iterable arguments: the set of terminals, the set of non_terminals, the set of productions, and the set of start states. The terminal and non_terminal sets must be disjoint. Each production is a two-item sequence (lhs, rhs) where lhs is a non_terminal and rhs is a sequence of terminals and non_terminals. The start_states, if any, must be non_terminals. All items must be immutable (hashable) objects.

>>> g = GrammarTables(('a', 'b'), ('AB',), (('AB', ('a', 'b')), ('AB', ('a', 'AB', 'b'))))
>>> tuple(len(x) for x in g)
(2, 1, 2, 0)
>>> g = GrammarTables(('a', 'b'), ('AB',), (('AB', ('a', 'b')), ('AB', ('a', 'AB', 'b'))), ('AB',))
>>> tuple(len(x) for x in g)
(2, 1, 2, 1)
data
class onyx.dataflow.cfg.ImplicitCfgBuilder(arg=())

Bases: object

A builder for CFG grammars. This supports the simple model of specifying a CFG by building it up one production at a time.

In this simple model terminals and epsilons are implicit, and no starting non-terminals are specified. A terminal is a right-hand-side (rhs) items which does not appear on the left-hand-side (lhs) of any production. An epsilon is a production with an empty rhs.

XXX consider implementing that the first production to be added be the start state... this would imply some sort of ordering semantics....

>>> builder = ImplicitCfgBuilder()

Names for terminals

>>> builder.add_production('zero', [0])
>>> builder.add_production('one', [1])

Give a name to an epsilon production

>>> builder.add_production('epsilon')

Components of binary numbers

>>> builder.add_production('binary_digit', ['zero'])
>>> builder.add_production('binary_digit', ['one'])

Binary number

>>> builder.add_production('binary_number', ['binary_digit'])
>>> builder.add_production('binary_number', ['binary_number','binary_digit'])

Possibly empty sequence of binary numbers

>>> builder.add_production('binary_numbers_list', ['epsilon'])
>>> builder.add_production('binary_numbers_list', ['binary_numbers_list', 'binary_number'])

Look inside

>>> tuple(builder)
(('binary_digit', ('one',)), ('binary_digit', ('zero',)), ('binary_number', ('binary_digit',)), ('binary_number', ('binary_number', 'binary_digit')), ('binary_numbers_list', ('binary_numbers_list', 'binary_number')), ('binary_numbers_list', ('epsilon',)), ('epsilon', ()), ('one', (1,)), ('zero', (0,)))

Constructor accepts the iteration to create a copy of the builder

>>> builder2 = ImplicitCfgBuilder(builder)

This duplicate rhs won’t change the object)

>>> builder2.add_production('binary_numbers_list', ['binary_numbers_list', 'binary_number'])

This new rhs will

>>> builder2.add_production('binary_number_pair', ['binary_number', 'binary_number'])
>>> tuple(builder2)
(('binary_digit', ('one',)), ('binary_digit', ('zero',)), ('binary_number', ('binary_digit',)), ('binary_number', ('binary_number', 'binary_digit')), ('binary_number_pair', ('binary_number', 'binary_number')), ('binary_numbers_list', ('binary_numbers_list', 'binary_number')), ('binary_numbers_list', ('epsilon',)), ('epsilon', ()), ('one', (1,)), ('zero', (0,)))

The grammar_sets() method returns the transportable, frozen view of the grammar

>>> tuple(builder2.grammar_sets())
((0, 1), ('binary_digit', 'binary_number', 'binary_number_pair', 'binary_numbers_list', 'epsilon', 'one', 'zero'), (('binary_digit', ('one',)), ('binary_digit', ('zero',)), ('binary_number', ('binary_digit',)), ('binary_number', ('binary_number', 'binary_digit')), ('binary_number_pair', ('binary_number', 'binary_number')), ('binary_numbers_list', ('binary_numbers_list', 'binary_number')), ('binary_numbers_list', ('epsilon',)), ('epsilon', ()), ('one', (1,)), ('zero', (0,))), ())

Here’s the TypeError if the rhs sequence has a mutable item in it

>>> builder2.add_production('binary_number_triple', ['binary_number', 'binary_number', range(3)]) 
Traceback (most recent call last):
  ...
TypeError: ...unhashable...
add_production(lhs, rhs=())

Add a production to the CFG, where lhs is the non-terminal for the production and rhs, if given, is an iterable of non-terminals and terminals making up the production. If rhs is not given or is an empty iterable then the production is an epsilon production.

All non-terminals and terminals must be immutable objects.

grammar_sets()

Return a tuple of the three sets comprising the grammar: (terminals, non-terminals, productions) where each production is a two-item tuple consisting of the non-terminal lhs and the sequence of non-terminals and terminals that is the rhs.

The return values are suitable for use as the constructor arguments for the GrammarTables object.

onyx.dataflow.cfg.parse_grammar(stream)

Simple line-based parser for grammars, returns a GrammarTables instance.

>>> g = parse_grammar(cStringIO.StringIO(_binary_cfg))

Counts of terminals, non-terminals, productions,starts

>>> tuple(len(x) for x in g)
(2, 2, 4, 1)
>>> g = parse_grammar(cStringIO.StringIO(_hello_world_cfg))
>>> tuple(len(x) for x in g)
(41, 19, 64, 1)
onyx.dataflow.cfg.until(stream, value, flatten=False)