`onyx.dataflow.simplecfg` – Tools for working with CFG grammars.¶

class onyx.dataflow.simplecfg.Cfg¶

Bases: object

The is obsolete but still contains some useful ideas looking towards CFG-based lattice transduction. And, it can handle nullable non-terminals!

add_production(lhs, rhs)¶

fsa(starts)¶

gen()¶

class onyx.dataflow.simplecfg.CfgBuilder(prod_iter=None)¶

Bases: object

A simple builder for CFG grammars.

The CfgBuilder provides two set-style methods for adding productions: add_production() adds a single production for a non-terminal to the grammar; update_production() adds a sequence of productions for a non-terminal to the grammar.

CfgBuilder supports containment queries via Python’s in syntax, returning True if the given symbol is either a non-terminal or a terminal that appears in the grammar. This allows a client to ensure that non-terminals or terminals that are planned to be added are either new to the grammar or already exist in the grammar.

If prod_iter is not None, it must be an iterable of pairs (lhs, rhs) which will be used to initialize the grammar. See add_production().

>>> pairs0 = (('A', ('x', 'y', 'z')), ('B', ()), ('B', ('b',)), ('B', ('B', 'b')), ('C', ('A',)), ('C', ('B',)))
>>> builder = CfgBuilder(pairs0)
>>> builder.size
12

Because FrozenCfg can act as an iterable source of pairs, it can be used to initialize a CfgBuilder.

>>> cfg0 = FrozenCfg(builder)
>>> builder1 = CfgBuilder(cfg0)
>>> cfg1 = FrozenCfg(builder1)
>>> sorted(list(cfg0))
[('A', ('x', 'y', 'z')), ('B', ()), ('B', ('B', 'b')), ('B', ('b',)), ('C', ('A',)), ('C', ('B',))]

>>> sorted(list(cfg0)) == sorted(list(cfg1))
True

Build a grammar which uses the terminal symbols of the previous grammar as its nonterminals

>>> pairs1 = (('x', (1, 2, 3)), ('y', ()), ('b', (5,)))
>>> builder2 = CfgBuilder(pairs1)
>>> cfg2 = FrozenCfg(builder2)

Two grammars can be ‘composed’ by adding the productions of each to a single builder. This relies on one grammar using the terminals of of the other as its nonterminals.

>>> builder3 = CfgBuilder(chain(cfg1, cfg2))
>>> builder3.size
20

>>> cfg3 = FrozenCfg(builder3)
>>> sorted(list(cfg3))
[('A', ('x', 'y', 'z')), ('B', ()), ('B', ('B', 'b')), ('B', ('b',)), ('C', ('A',)), ('C', ('B',)), ('b', (5,)), ('x', (1, 2, 3)), ('y', ())]

>>>

add_production(lhs, rhs)¶

Add a production to the CFG. The lhs argument is an immutable object that is the left-hand-side (non-terminal) for the production. The rhs argument is a possibly-empty iterable of immutable objects (symbols) that are the sequence of non-terminals and terminals that make up the right-hand-side of the production. An empty rhs is used to add an epsilon production. The productions for a given non-terminal are treated as a set; this means that duplicate right-hand sides are ignored. See also update_production().

>>> builder = CfgBuilder()
>>> builder.size
0
>>> builder.add_production('A', ('x', 'y', 'zoo'))
>>> builder.size
4
>>> builder.add_production('B', ())
>>> builder.size
6
>>> builder.add_production('B', ('b',))
>>> builder.size
7
>>> builder.add_production('B', ('B', 'b'))
>>> builder.add_production('Cows', ('A',))
>>> builder.add_production('Cows', ('B',))
>>> builder.size
12
>>> builder.add_production('Cows', ('B',))
>>> builder.size
12
>>> 'A' in builder
True
>>> 'zoo' in builder
True
>>> 'Moo' in builder
False

size¶: The size of the grammar. We follow Robert Moore in calculating the size of the grammar: the size is the number of non-terminal symbols plus the sum of the lengths of the right-hand-side sequences over all the productions in the grammar. By counting each non-terminal just once, instead of once for each of its productions, this size statistic more closely tracks storage requirements of actual implementations of grammar structures. An empty right-hand-side sequence (epsilon) is counted has having length one.

update_production(lhs, rhs_set)¶

Add a set of productions to the CFG. The lhs argument is an immutable object that is the left-hand-side (non-terminal) for each of the productions. The rhs_set argument is a possibly-empty iterable of rhs sequences. Each rhs sequence is an iterable of immutable objects (symbols) that are the sequence of non-terminals and terminals that make up the right-hand-side of the given production. An empty rhs is used to add an epsilon production. The productions for a given non-terminal are treated as a set; this means that duplicate right-hand sides are ignored. See also add_production().

>>> builder = CfgBuilder()
>>> builder.add_production('A', ('x', 'y', 'zoo'))
>>> builder.update_production('B', ((), ('b',), ('B', 'b')))
>>> builder.size
9
>>> builder.update_production('Cows', (('A',), ('B',)))
>>> builder.size
12
>>> builder.add_production('Cows', ('A',))
>>> builder.size
12
>>> builder.update_production('Cows', (('B',), ('A',), ))
>>> builder.size
12
>>> 'A' in builder, 'zoo' in builder, 'Moo' in builder
(True, True, False)

class onyx.dataflow.simplecfg.CfgDecoderBase(arg)¶

Bases: object

A baseclass for objects that use a Cfg. This baseclass analyzes the productions and creates the static structures necessary to decode against the Cfg represented by productions. It is intended to be subclassed for particular use cases.

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('B', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('B',))

>>> x = CfgDecoderBase(b)
>>> x.dump()
productions:
  A :  x  y  z
  B :  
  B :  B  b
  B :  b
  C :  A
  C :  B
terminals:   b x y z
has_epsilon: True
nullable:   B C
fsa_set:
     A ( . x y z )
     A ( x . y z )
     A ( x y . z )
  -  B ( . )
     B ( . B b )
     B ( B . b )
     B ( . b )
     C ( . A )
     C ( . B )
  +  b ( . b )
  +  x ( . x )
  +  y ( . y )
  +  z ( . z )
initial_items_by_symbol:
  A
    A ( . x y z )
  B
    B ( . )
    B ( . B b )
    B ( . b )
  C
    C ( . A )
    C ( . B )
  b
    b ( . b )
  x
    x ( . x )
  y
    y ( . y )
  z
    z ( . z )
prediction_all_items_by_symbol:
  A
    A ( . x y z )
    x ( . x )
  B
    B ( . B b )
    B ( . b )
    b ( . b )
  b
    b ( . b )
  x
    x ( . x )
  y
    y ( . y )
  z
    z ( . z )
prediction_terminals_by_symbol:
  A
    x
  B
    b
  b
    b
  x
    x
  y
    y
  z
    z
CfgDecoderBase.completers:
    A
      - C ( . A )
    B
        B ( . B b )
      - C ( . B )
  + b
        B ( . B b )
      - B ( B . b )
      - B ( . b )
      - C ( . B )
  + x
        A ( . x y z )
  + y
        A ( x . y z )
  + z
      - A ( x y . z )
      - C ( . A )
CfgDecoderBase.completers2:
    A
      - C ( . A )
    B
        B ( . B b )
      - C ( . B )
  + b
      - B ( B . b )
      - B ( . b )
        B ( . B b )
      - C ( . B )
  + x
        A ( . x y z )
  + y
        A ( x . y z )
  + z
      - A ( x y . z )
      - C ( . A )

dump()¶

make_completer(completer, completer2, ids_by_item, label='')¶

print_completeable(label, completeables, parents=None, sorter=<built-in function sorted>)¶

print_extenders(label, extenders, sorter=<built-in function sorted>)¶

static print_item_dict(label, item_dict)¶

class onyx.dataflow.simplecfg.CfgParser(arg)¶

Bases: onyx.dataflow.simplecfg.CfgDecoderBase

A subclass of CfgDecoderBase that can be used to parse sequences of terminals against languages defined by sets of non-terminals of a Cfg grammar.

This has replaced CfgRecognizer.

dump()¶

make_completer(completer, completer2, ids_by_item, label='')¶

parser(start_non_terminals, force_tree=False)¶

Given a set of starting non-terminals, start_non_terminals, in the CFG productions from which CfgParser was constructed, returns a (generator) function that can be used to recognize whether a sequence of terminals forms a sentence in the language defined by the starting non_terminals.

Optional force_tree defaults to False, in which case the parser optimizes its internal structures to merge decodes that have identical futures (potentially resulting in space-saving DAGs or even cyclic graphs in these data structures). If force_tree is True, this optimization is disabled and the parser will generate a strict tree of internal states (this can use exponentially more space than the False case).

The returned callable, func, is a implemented as a generator. From each call to func, a four-tuple is returned: (state_id, is_sentential, legal_terminals, exception). In this tuple: state_id is an opaque identifier for the state of the decoder; is_sentential is True if the sequence of terminal items up to this state is a sentential form; legal_terminals is the frozenset of legal terminals with which the decoder can be advanced from state_id; and exception is an Exception instance indicating that the call to func was somehow invalid. If exception is not None, the values appearing in the rest of the tuple are unspecified.

Note

The callable from the obsolete recognizer() method returns the four-tuple: (is_sentential, state_id, legal_terminals, exception), so you must switch the order of the first two items when migrating code to use parser().

As with all generators, the first call to func must be func(None). From this initial call, is_sentential is True if the empty string is a sentence in the language, and legal_terminals is the set of legal initial terminals in non-empty sentences of the language.

Subsequently, you can call func with a pair (state_id, terminal), where state_id is any previously returned state_id, and terminal is any item from the corresponding set of legal_terminals that were returned with that state_id. In this way, you can examine the set of legal terminals with which to advance the decode(s) and choose how to advance the decode.

If the next putative terminal in the token sequence the client is decoding is not in the set of legal_terminals then the recognition of the client’s sequence has failed. It is up to the client to notice this.

It is an error to call func with a (state_id, terminal) pair for which either state_id is not a state_id returned from a prior call to func, or terminal is not an item in the set of items in the legal_terminals associated with state_id. In either of these cases, the exception item in the returned tuple will be an appropriately initialized instance of ValueError.

If the legal_terminals item in the returned tuple is empty, it means that the sequence of terminal tokens given to func is a sentence in the language (so is_sentential will be True), and it means that this sentence is not a prefix of any other sentence in the language.

See the implementation for cfg_iter() for a simple use of this interface that enumerates all sentences in a language.

Implementation note:

This parser is implemented with an FSA-item-centric version of Earley’s algorithm. As such it can handle any CFG. It makes extensive use of mappings and sets. It is prelimary work for a lattice-centric CFG that will transduce lattices into lattices such that any path through the resulting lattice is a sentence in the language.

print_completeable(label, completeables, parents=None, sorter=<built-in function sorted>)¶

print_extenders(label, extenders, sorter=<built-in function sorted>)¶

static print_item_dict(label, item_dict)¶

recognizer(*args, **kwargs)¶

Obsolete, replaced by CfgParser.parser().

You should switch to CfgParser.parser() which has the same functionality, but note, the callable returned by CfgParser.parser() has a different order of its return arguments.

class onyx.dataflow.simplecfg.CfgRecognizer(arg)¶

Bases: onyx.dataflow.simplecfg.CfgParser

Obsolete, replaced by CfgParser

You should switch to using CfgParser, specifically its CfgParser.parser() which has the same functionality as CfgRecognizer.recognizer(), but note, has a different order of return arguments.

dump()¶

make_completer(completer, completer2, ids_by_item, label='')¶

parser(start_non_terminals, force_tree=False)¶

Given a set of starting non-terminals, start_non_terminals, in the CFG productions from which CfgParser was constructed, returns a (generator) function that can be used to recognize whether a sequence of terminals forms a sentence in the language defined by the starting non_terminals.

Optional force_tree defaults to False, in which case the parser optimizes its internal structures to merge decodes that have identical futures (potentially resulting in space-saving DAGs or even cyclic graphs in these data structures). If force_tree is True, this optimization is disabled and the parser will generate a strict tree of internal states (this can use exponentially more space than the False case).

The returned callable, func, is a implemented as a generator. From each call to func, a four-tuple is returned: (state_id, is_sentential, legal_terminals, exception). In this tuple: state_id is an opaque identifier for the state of the decoder; is_sentential is True if the sequence of terminal items up to this state is a sentential form; legal_terminals is the frozenset of legal terminals with which the decoder can be advanced from state_id; and exception is an Exception instance indicating that the call to func was somehow invalid. If exception is not None, the values appearing in the rest of the tuple are unspecified.

Note

The callable from the obsolete recognizer() method returns the four-tuple: (is_sentential, state_id, legal_terminals, exception), so you must switch the order of the first two items when migrating code to use parser().

As with all generators, the first call to func must be func(None). From this initial call, is_sentential is True if the empty string is a sentence in the language, and legal_terminals is the set of legal initial terminals in non-empty sentences of the language.

Subsequently, you can call func with a pair (state_id, terminal), where state_id is any previously returned state_id, and terminal is any item from the corresponding set of legal_terminals that were returned with that state_id. In this way, you can examine the set of legal terminals with which to advance the decode(s) and choose how to advance the decode.

If the next putative terminal in the token sequence the client is decoding is not in the set of legal_terminals then the recognition of the client’s sequence has failed. It is up to the client to notice this.

It is an error to call func with a (state_id, terminal) pair for which either state_id is not a state_id returned from a prior call to func, or terminal is not an item in the set of items in the legal_terminals associated with state_id. In either of these cases, the exception item in the returned tuple will be an appropriately initialized instance of ValueError.

If the legal_terminals item in the returned tuple is empty, it means that the sequence of terminal tokens given to func is a sentence in the language (so is_sentential will be True), and it means that this sentence is not a prefix of any other sentence in the language.

See the implementation for cfg_iter() for a simple use of this interface that enumerates all sentences in a language.

Implementation note:

This parser is implemented with an FSA-item-centric version of Earley’s algorithm. As such it can handle any CFG. It makes extensive use of mappings and sets. It is prelimary work for a lattice-centric CFG that will transduce lattices into lattices such that any path through the resulting lattice is a sentence in the language.

print_completeable(label, completeables, parents=None, sorter=<built-in function sorted>)¶

print_extenders(label, extenders, sorter=<built-in function sorted>)¶

static print_item_dict(label, item_dict)¶

recognizer(*args, **kwargs)¶

Obsolete, replaced by CfgParser.parser().

You should switch to CfgParser.parser() which has the same functionality, but note, CfgParser.parser() has a different order of return arguments.

class onyx.dataflow.simplecfg.FrozenCfg(builder, start=None)¶

Bases: object

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('B', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('B',))
>>> cfg = FrozenCfg(b)
>>> cfg = FrozenCfg(b, 'x')
Traceback (most recent call last):
    ...
ValueError: expected (optional) start to be in the set of non_terminals, but got 'x'
>>> cfg = FrozenCfg(b, 'B')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B' : 
'B' : 'B' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'B'

>>> cfg = FrozenCfg(CfgBuilder())
>>> cfg.size
0

make_left_factored_cfg()¶

Return a FrozenCfg that is equivalent to self but for which the productions for a non_terminal are all left factored. This means that there is no prefix sharing across the productions for a given non-terminal. Another way to state this is that there will be only one production for each direct left corner of each non-terminal.

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('A', ('x', 'y', 'q'))
>>> b.add_production('A', ('l', 'n', 'z'))
>>> b.add_production('A', ('l', 'n', 'q'))
>>> b.add_production('A', ('l',))
>>> b.add_production('A', ('x', 'z', 'z'))
>>> b.add_production('A', ('x', 'q', 'q'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C', 'x'))
>>> b.add_production('S', ('B', 'C', 'x'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' : 
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg1 = cfg.make_left_factored_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l' 'A_lf3'
'A' : 'x' 'A_lf2'
'A_lf0' : 'q'
'A_lf0' : 'z'
'A_lf1' : 'q'
'A_lf1' : 'z'
'A_lf2' : 'q' 'q'
'A_lf2' : 'y' 'A_lf0'
'A_lf2' : 'z' 'z'
'A_lf3' : 
'A_lf3' : 'n' 'A_lf1'
'B' : 
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg2 = cfg1.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg2: print str(lhs), ': ', '  '.join(str(rhs_token) for rhs_token in rhs)
A :  l
A :  l  A_lf3_er2
A :  x  A_lf2
A_lf0 :  q
A_lf0 :  z
A_lf1 :  q
A_lf1 :  z
A_lf2 :  q  q
A_lf2 :  y  A_lf0
A_lf2 :  z  z
A_lf3_er2 :  n  A_lf1
B_er1 :  C_er0  b
B_er1 :  b
C_er0 :  A
C_er0 :  D_er3
C_er0 :  E  e
D_er3 :  B_er1
E :  B_er1  f
E :  f
S :  A  C_er0  x
S :  A  x
S :  B_er1  C_er0  x
S :  B_er1  x
S :  C_er0  x
S :  x
>>> cfg3 = cfg2.make_left_factored_cfg()
>>> for lhs, rhs in cfg3: print str(lhs), ': ', '  '.join(str(rhs_token) for rhs_token in rhs)
A :  l  A_lf0_lf0
A :  x  A_lf2
A_lf0 :  q
A_lf0 :  z
A_lf0_lf0 :  
A_lf0_lf0 :  A_lf3_er2
A_lf1 :  q
A_lf1 :  z
A_lf2 :  q  q
A_lf2 :  y  A_lf0
A_lf2 :  z  z
A_lf3_er2 :  n  A_lf1
B_er1 :  C_er0  b
B_er1 :  b
C_er0 :  A
C_er0 :  D_er3
C_er0 :  E  e
D_er3 :  B_er1
E :  B_er1  f
E :  f
S :  A  S_lf0
S :  B_er1  S_lf1
S :  C_er0  x
S :  x
S_lf0 :  C_er0  x
S_lf0 :  x
S_lf1 :  C_er0  x
S_lf1 :  x
>>> cfg4 = cfg3.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg4: print str(lhs), ': ', '  '.join(str(rhs_token) for rhs_token in rhs)
A :  l
A :  l  A_lf0_lf0_er0
A :  x  A_lf2
A_lf0 :  q
A_lf0 :  z
A_lf0_lf0_er0 :  A_lf3_er2
A_lf1 :  q
A_lf1 :  z
A_lf2 :  q  q
A_lf2 :  y  A_lf0
A_lf2 :  z  z
A_lf3_er2 :  n  A_lf1
B_er1 :  C_er0  b
B_er1 :  b
C_er0 :  A
C_er0 :  D_er3
C_er0 :  E  e
D_er3 :  B_er1
E :  B_er1  f
E :  f
S :  A  S_lf0
S :  B_er1  S_lf1
S :  C_er0  x
S :  x
S_lf0 :  C_er0  x
S_lf0 :  x
S_lf1 :  C_er0  x
S_lf1 :  x
>>> cfg5 = cfg4.make_left_factored_cfg()
>>> for lhs, rhs in cfg5: print str(lhs), ': ', '  '.join(str(rhs_token) for rhs_token in rhs)
A :  l  A_lf0_lf0
A :  x  A_lf2
A_lf0 :  q
A_lf0 :  z
A_lf0_lf0 :  
A_lf0_lf0 :  A_lf0_lf0_er0
A_lf0_lf0_er0 :  A_lf3_er2
A_lf1 :  q
A_lf1 :  z
A_lf2 :  q  q
A_lf2 :  y  A_lf0
A_lf2 :  z  z
A_lf3_er2 :  n  A_lf1
B_er1 :  C_er0  b
B_er1 :  b
C_er0 :  A
C_er0 :  D_er3
C_er0 :  E  e
D_er3 :  B_er1
E :  B_er1  f
E :  f
S :  A  S_lf0
S :  B_er1  S_lf1
S :  C_er0  x
S :  x
S_lf0 :  C_er0  x
S_lf0 :  x
S_lf1 :  C_er0  x
S_lf1 :  x
>>> cfg.size, cfg1.size, cfg2.size, cfg3.size, cfg4.size, cfg5.size
(42, 44, 51, 55, 55, 57)
>>> cfg.num_productions, cfg1.num_productions, cfg2.num_productions, cfg3.num_productions, cfg4.num_productions, cfg5.num_productions
(17, 21, 25, 28, 28, 29)

make_nlrg_cfg()¶

Return a FrozenCfg that is equivalent to self but for which the non-left-recursive productions for each non-terminal are grouped into a new non-terminal; this follows Robert Moore’s non-left-recursion-grouping (NLRG) algorithm

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('A', ('x', 'y', 'q'))
>>> b.add_production('A', ('l', 'n', 'z'))
>>> b.add_production('A', ('l', 'n', 'q'))
>>> b.add_production('A', ('l',))
>>> b.add_production('A', ('x', 'z', 'z'))
>>> b.add_production('A', ('x', 'q', 'q'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C', 'x'))
>>> b.add_production('S', ('B', 'C', 'x'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' : 
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg1 = cfg.make_nlrg_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' : 
'B' : 'C' 'b'
'B' : 'b'
'C' : 'C_nlg'
'C' : 'D'
'C_nlg' : 'A'
'C_nlg' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'

make_no_epsilon_cfg()¶

Return a FrozenCfg that is equivalent to self but which contains no epsilon productions.

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('B', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('B',))
>>> b.add_production('D', ('(', 'C', ')'))
>>> b.add_production('E', ('B', '(', 'C', ')'))
>>> b.add_production('F', ('B', '(', 'C', ')', 'C'))
>>> b.add_production('F', ('B', '(', 'C', ')', 'C', 'C'))
>>> b.add_production('G', ('B',))
>>> b.add_production('G', ('B', 'B', 'B'))
>>> b.add_production('S', ('A',))
>>> b.add_production('S', ('A', 'F'))
>>> b.add_production('S', ('A', 'C'))
>>> cfg = FrozenCfg(b)
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B' : 
'B' : 'B' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'B'
'D' : '(' 'C' ')'
'E' : 'B' '(' 'C' ')'
'F' : 'B' '(' 'C' ')' 'C'
'F' : 'B' '(' 'C' ')' 'C' 'C'
'G' : 'B'
'G' : 'B' 'B' 'B'
'S' : 'A'
'S' : 'A' 'C'
'S' : 'A' 'F'
>>> cfg1 = cfg.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B_er1' : 'B_er1' 'b'
'B_er1' : 'b'
'C_er0' : 'A'
'C_er0' : 'B_er1'
'D' : '(' ')'
'D' : '(' 'C_er0' ')'
'E' : '(' ')'
'E' : '(' 'C_er0' ')'
'E' : 'B_er1' '(' ')'
'E' : 'B_er1' '(' 'C_er0' ')'
'F' : '(' ')'
'F' : '(' ')' 'C_er0'
'F' : '(' ')' 'C_er0' 'C_er0'
'F' : '(' 'C_er0' ')' 'C_er0'
'F' : '(' 'C_er0' ')' 'C_er0' 'C_er0'
'F' : 'B_er1' '(' ')'
'F' : 'B_er1' '(' ')' 'C_er0'
'F' : 'B_er1' '(' ')' 'C_er0' 'C_er0'
'F' : 'B_er1' '(' 'C_er0' ')'
'F' : 'B_er1' '(' 'C_er0' ')' 'C_er0'
'F' : 'B_er1' '(' 'C_er0' ')' 'C_er0' 'C_er0'
'G_er2' : 'B_er1'
'G_er2' : 'B_er1' 'B_er1'
'G_er2' : 'B_er1' 'B_er1' 'B_er1'
'S' : 'A'
'S' : 'A' 'C_er0'
'S' : 'A' 'F'

make_no_left_recursion_cfg()¶

Return a FrozenCfg that is equivalent to self but which contains no left-recursive production chains.

>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C'))
>>> b.add_production('S', ('B', 'C'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C'
'S' : 'B' 'C'
>>> cfg1 = cfg.make_no_left_recursion_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'S' : 

non_terminals¶: A frozenset of the non-terminals in the grammar.

num_productions¶: The number of productions in the grammar.

size¶

The size of the grammar.

We follow Robert Moore in calculating the size of the grammar: the size is the number of non-terminal symbols plus the sum of the lengths of the right-hand-side sequences over all the productions in the grammar. By counting each non-terminal just once, instead of once for each of its productions, this size statistic more closely tracks storage requirements of actual implementations of grammar structures. An empty right-hand-side sequence (epsilon) is counted has having length one.

terminals¶: A frozenset of the terminals in the grammar.

class onyx.dataflow.simplecfg.FsaItem(*_)¶

Bases: tuple

Item in an Fsa decomposition of a grammar. An FsaItem is a production and a cursor position within the production. Specifically it’s a triple: the non-terminal for the production, the right-hand side sequence that the production expands to, and the zero-based cursor position within the rhs sequence.

An FsaItem is explicitly constructed via FsaItem.production_item() From this, other FsaItems can be obtained via the shifted, penultimate and initial properties.

A production

>>> lhs = 'A'
>>> rhs = 'a1', 'a2', 'a3'
>>> a = FsaItem.production_item(lhs, rhs)
>>> assert a.is_production ^ a.is_terminal
>>> a.is_production, a.is_terminal
(True, False)
>>> a.is_epsilon
False
>>> a.is_initial, a.is_penultimate, a.is_final
(True, False, False)

>>> a.lhs
'A'
>>> a.rhs_symbol
'a1'
>>> a.as_production
'A :  a1 a2 a3'

>>> a.cursor
0
>>> a.rhs_seq
('a1', 'a2', 'a3')
>>> while True:
...   a.lhs; a.rhs_symbol; print a
...   if a.is_penultimate: break
...   a = a.shifted
'A'
'a1'
A ( . a1 a2 a3 )
'A'
'a2'
A ( a1 . a2 a3 )
'A'
'a3'
A ( a1 a2 . a3 )

>>> a.is_penultimate
True
>>> a.is_final
False

>>> a.initial.is_initial
True
>>> a.penultimate.is_penultimate
True

>>> a.is_penultimate
True
>>> a.shifted
Traceback (most recent call last):
  ...
ValueError: cannot get shift of an FsaItem that is_epsilon or is_penultimate or is_terminal
>>> a.as_terminal
Traceback (most recent call last):
  ...
ValueError: expected is_terminal to be True, but FsaItem is_production is True: 'A ( a1 a2 . a3 )'

An epsilon

>>> e = FsaItem.production_item('E',())
>>> str(e)
'E ( . )'
>>> e.is_production, e.is_terminal
(True, False)
>>> e.is_initial, e.is_penultimate, e.is_final
(True, False, True)
>>> e.is_epsilon
True
>>> e.lhs
'E'
>>> e.as_production
'E :  '
>>> e.shifted
Traceback (most recent call last):
  ...
ValueError: cannot get shift of an FsaItem that is_epsilon or is_penultimate or is_terminal

A terminal

>>> b = FsaItem.terminal_item('x')
>>> b.is_production, b.is_terminal
(False, True)
>>> b.is_initial, b.is_penultimate, b.is_final
(True, True, False)
>>> b.is_epsilon
False
>>> b.as_terminal
'x'
>>> b.as_production
'x :  x'
>>> b.lhs
'x'
>>> b.rhs_symbol
'x'

>>> print b
x ( . x )
>>> print repr(b)
FsaItem(('x', ('x',), 0))
>>> b = b.shifted
Traceback (most recent call last):
  ...
ValueError: cannot get shift of an FsaItem that is_epsilon or is_penultimate or is_terminal

as_production¶

as_terminal¶

count¶: T.count(value) -> integer – return number of occurrences of value

cursor¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

index¶: T.index(value, [start, [stop]]) -> integer – return first index of value. Raises ValueError if the value is not present.

initial¶

is_epsilon¶

is_final¶

is_initial¶

is_penultimate¶

is_production¶

is_terminal¶

lhs¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

penultimate¶

static production_item(non_terminal, rhs_seq)¶

Factory staticmethod to create an initial FSA item for a production, where non_terminal is the left-hand-side token of the production, and rhs_seq is the sequence of tokens making up the right-hand-side of the production. Other FSA items for the production can be created from the shifted and penultimate properties.

Returns an FsaItem for the production.

Raises TypeError if non_terminal or any of the tokens in rhs_seq is a mutable object.

Raises ValueError if rhs_seq consists of one item and that item is equal to non_terminal.

rhs_seq¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

rhs_symbol¶

shifted¶

static singletons()¶: send(arg) -> send ‘arg’ into generator, return next yielded value or raise StopIteration.

static terminal_item(terminal)¶

Factory staticmethod to create an FSA item for terminal. This is a non-standard usage of FSA items for internal simplicity of the CFG decoders.

Returns an FsaItem for the terminal.

Raises TypeError if terminal is a mutable object.

class onyx.dataflow.simplecfg.ParserState(*_)¶

Bases: tuple

Helper for Cfg recognition. It’s a container for the immutablesignature of a recognizer state in the decoder, (is_sentential, earley_seeders), and a mutable member, extenders. When copy constructed, the signature is preserved, but the extenders is copied.

earley_seeders is sets of state_ids indexed by token

Make an empty one

>>> ParserState()
(False, frozendict_of_set({}), frozenset([]), dict_of(set, {}), dict_of(tuple, {}), dict_of(set, {}), {}, dict_of(set, {}))

Make a non-empty one

>>> r1 = ParserState(True, dict_of(set, (('a', set((0, 1))), ('b', set((10, 20, 30))))), 'abc')

Make a copy of it

>>> r2 = ParserState(r1)

Their completers, extenders and stems are distinct

>>> r1.completers['e'].add(101)
>>> r2.completers['f'].add(202)
>>> r1.extenders['c'].add(100)
>>> r2.extenders['d'].add(200)
>>> r1.stems['a'] = 'b'
>>> r2.stems['a'] = 'c'
>>> r1
(True, frozendict_of_set({'a': frozenset([0, 1]), 'b': frozenset([10, 20, 30])}), frozenset(['a', 'c', 'b']), dict_of(set, {'e': set([101])}), dict_of(tuple, {}), dict_of(set, {'c': set([100])}), {'a': 'b'}, dict_of(set, {}))
>>> r2
(True, frozendict_of_set({'a': frozenset([0, 1]), 'b': frozenset([10, 20, 30])}), frozenset(['a', 'c', 'b']), dict_of(set, {'f': set([202])}), dict_of(tuple, {}), dict_of(set, {'d': set([200])}), {'a': 'c'}, dict_of(set, {}))

completers¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

completers2¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

count¶: T.count(value) -> integer – return number of occurrences of value

extenders¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

extenders2¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

index¶: T.index(value, [start, [stop]]) -> integer – return first index of value. Raises ValueError if the value is not present.

is_sentential¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

legal_terminals¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

seeders¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

stems¶

itemgetter(item, ...) –> itemgetter object

Return a callable object that fetches the given item(s) from its operand. After, f=itemgetter(2), the call f(r) returns r[2]. After, g=itemgetter(2,5,3), the call g(r) returns (r[2], r[5], r[3])

onyx.dataflow.simplecfg.cfg_iter(cfg, starts)¶

Returns a generator that yields an iterable over the terminals for each sentence in the lanauge. Works in a breadth-first fashion over legal terminals, so lengths of sentential forms are monotonically increasing.

Finite special cases

Simple

>>> finite = '''
...   S  a b c D
...   D  x
...   D  y
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(finite), ['S']))
('abcx', 'abcy')

Nullable and finite

>>> nullable_finite = '''
...   S
...   S  a b c
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(nullable_finite), ['S'])) 

Empty (no terminals)

>>> empty = '''
...   S  E
...   E  S
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(empty), ['S']))
()

Ambiguous

>>> ambiguous = '''
...   S a X c
...   X AB CC
...   X A BC
...   AB A B
...   BC B CC
...   CC CCC
...   CCC CCCC
...   CCCC C
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(ambiguous), ['S']))
('aABCc',)

>>> ambiguous2 = '''
...   S S0 z z
...   S0  a b A c d
...   A B
...   A C
...   B x
...   C x
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(ambiguous2), ['S', 'S0']))
('abxcd', 'abxcdzz')

Late productions

>>> late = '''
...   S   a X b Y c Z d
...   X   X0
...   X0  x0 X1
...   X1  x1 X2
...   X2  x20
...   X2  x21
...   Y   Y0
...   Y0  y0 Y1
...   Y1  y1 Y2
...   Y2  y2
...   Z   Z0
...   Z0  z0 Z1
...   Z1  z1 Z2
...   Z2  z20
...   Z2  z21
... '''
>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(late), ['S']))
('ax0x1x20by0y1y2cz0z1z20d', 'ax0x1x20by0y1y2cz0z1z21d', 'ax0x1x21by0y1y2cz0z1z20d', 'ax0x1x21by0y1y2cz0z1z21d')
>>> tuple(' '.join(sent) for sent in cfg_iter(_cfg_from_string2(late), ['S']))
('a x0 x1 x20 b y0 y1 y2 c z0 z1 z20 d', 'a x0 x1 x20 b y0 y1 y2 c z0 z1 z21 d', 'a x0 x1 x21 b y0 y1 y2 c z0 z1 z20 d', 'a x0 x1 x21 b y0 y1 y2 c z0 z1 z21 d')

Infinite language

>>> exprs = '''
...   # toy expression grammar; various productions are commented out in order to
...   # make language enumeration managable in tests
...
...   number   digit
...   number   number digit 
...   digit    0
...   #digit    1
...
...   symbol   letter
...   symbol   letter symbol 
...   letter   a
...   letter   b
...
...   expr     tert
...   tert     addn
...   tert     addn ? addn : addn
...   addn     term
...   addn     addn + term
...   term     factor
...   term     term * factor
...   factor   primary
...   #factor   primary ^ factor
...
...   primary  ( expr )
...   primary  number
...   #primary  symbol
...
...   # making cond be a primary is not typical for a language, but it allows us
...   # to get ambiguity into expr rather than having statements
...   primary  cond
...
...   #cond     if( expr ) expr
...   #cond     if( expr ) expr el expr
...   cond     < expr > expr
...   cond     < expr > expr | expr
... '''

>>> max_len = 6
>>> unique = set()
>>> for count, sentence_iter in enumerate(cfg_iter(_cfg_from_string2(exprs), ['expr'])):
...   sentence = tuple(sentence_iter)
...   if len(sentence) > max_len: break
...   unique.add(sentence)
...   print ''.join(sentence),
0 00 (0) 0*0 0+0 000 (00) 0*00 0+00 00*0 00+0 0000 <0>0 ((0)) (0)*0 (0)+0 (0*0) (0+0) (000) 0*(0) 0*0*0 0*0+0 0*000 0+(0) 0+0*0 0+0+0 0+000 00*00 00+00 000*0 000+0 00000 0?0:0 <00>0 <0>00 ((00)) (0)*00 (0)+00 (0*00) (0+00) (00)*0 (00)+0 (00*0) (00+0) (0000) (<0>0) 0*(00) 0*0*00 0*0+00 0*00*0 0*00+0 0*0000 0*<0>0 0+(00) 0+0*00 0+0+00 0+00*0 0+00+0 0+0000 0+<0>0 00*(0) 00*0*0 00*0+0 00*000 00+(0) 00+0*0 00+0+0 00+000 000*00 000+00 0000*0 0000+0 000000 00?0:0 0?00:0 0?0:00 <(0)>0 <0*0>0 <0+0>0 <000>0 <00>00 <0>(0) <0>0*0 <0>0+0 <0>000 <0>0|0
>>> count == len(unique)
True
>>> len(unique)
86

Test an error message

>>> tuple(''.join(sent) for sent in cfg_iter(_cfg_from_string2(ambiguous), ['X', 'x']))
Traceback (most recent call last):
  ...
ValueError: expected set of start_non_terminals to be drawn from the set of non_terminals, but (also) got: x

Regression test for a deep bug:

>>> dial_prods = '''
...   D2      D D
...   D3      D2 D
...   EXD4    Pex D3
...   Pex     exch
...   D       0
...   D       1
... '''
>>> tuple(' '.join(sent) for sent in cfg_iter(_cfg_from_string2(dial_prods), ['EXD4']))
('exch 0 0 0', 'exch 0 0 1', 'exch 0 1 0', 'exch 0 1 1', 'exch 1 0 0', 'exch 1 0 1', 'exch 1 1 0', 'exch 1 1 1')

Bug used to give: (‘exch 0’, ‘exch 1’, ‘exch 0 0’, ‘exch 0 1’, ‘exch 1 0’, ‘exch 1 1’, ‘exch 0 0 0’, ‘exch 0 0 1’, ‘exch 0 1 0’, ‘exch 0 1 1’, ‘exch 1 0 0’, ‘exch 1 0 1’, ‘exch 1 1 0’, ‘exch 1 1 1’)

Regression test against some overly constrictive asserts

>>> infinite_bug = '''
...  S  a
...  S  b
...  S  S S
... '''
>>> for x in itertools.islice((' '.join(sent) for sent in cfg_iter(_cfg_from_string2(infinite_bug), ['S'])), 0, 80, 4): print x
a
b a
a b a
b b a
a a b a
a b b a
b a b a
b b b a
a a a b a
a a b b a
a b a b a
a b b b a
b a a b a
b a b b a
b b a b a
b b b b a
a a a a b a
a a a b b a
a a b a b a
a a b b b a

onyx.dataflow.simplecfg.fsa_dot_lines(successor_links, prediction_links, completion_links, graph_label='', graph_type='digraph', globals=())¶

onyx.dataflow.simplecfg.iter_nullable(productions)¶

Yield the nullable non-terminals in the productions. Productions is a mapping with non-terminal key and an iterable of sequences as value, where each (possibly empty) sequence consists of non-terminals and terminals.

This function returns the list of non-terminals which, through transitive closure, can derive an empty sentence.

>>> tuple(iter_nullable({}))
()
>>> tuple(iter_nullable({'x':(('x', 'y'),)}))
()
>>> tuple(iter_nullable({'x':((), ('x', 'y'))}))
('x',)
>>> tuple(iter_nullable({'x':((), ('x', 'y')), 'z':(('a', 'b'), ('x',)), 'a':(('a',),)}))
('x', 'z')

onyx.dataflow.simplecfg.plot_fsa(successor_links, prediction_links, completion_links)¶

class onyx.dataflow.simplecfg.production_dict(arg=())¶

Bases: onyx.builtin.dict_of_set

A specialized dictionary for working with CFG productions. The keys are non-terminal symbols. Each value is a set of the right-hand-side sequences for the productions corresponding to the left-hand-side key. An empty set is automatically added when a new key is used to access the dictionary.

clear¶: D.clear() -> None. Remove all items from D.

copy¶: D.copy() -> a shallow copy of D

static fromkeys()¶: dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v. v defaults to None.

get¶: D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.

has_key¶: D.has_key(k) -> True if D has a key k, else False

items¶: D.items() -> list of D’s (key, value) pairs, as 2-tuples

iter_flat¶: Returns a generator that yields a pair corresponding to each production in the set of productions. The first item in each pair is the non-terminal left-hand-side for the production, the second item is sequence of non-terminals and terminals making up the right-hand-side of the production.

iter_nullable¶

iteritems¶: D.iteritems() -> an iterator over the (key, value) items of D

iterkeys¶: D.iterkeys() -> an iterator over the keys of D

itervalues¶: D.itervalues() -> an iterator over the values of D

keys¶: D.keys() -> list of D’s keys

non_terminals¶: Return a set of the non_terminal symbols in the collection of productions.

pop¶: D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised

popitem¶: D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(key, value)¶

size¶: The size of the grammar. We follow Robert Moore in calculating the size of the grammar: the size is the number of non-terminal symbols plus the sum of the lengths of the right-hand-side sequences over all the productions in the grammar. By counting each non-terminal just once, instead of once for each of its productions, this size statistic more closely tracks storage requirements of actual implementations of grammar structures. An empty right-hand-side sequence (epsilon) is counted as having length one.

symbols¶: Return a set of all the symbols (non_terminals and terminals) in the collection of productions.

terminals¶: Return a set of the terminals symbols in the productions.

update(E, **kwargs)¶: D.update(E, **F) -> None. Update D from dict/iterable E and F. If E has a .keys() method, does: for k in E: D[k] = E[k] If E lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values¶: D.values() -> list of D’s values

viewitems¶: D.viewitems() -> a set-like object providing a view on D’s items

viewkeys¶: D.viewkeys() -> a set-like object providing a view on D’s keys

viewvalues¶: D.viewvalues() -> an object providing a view on D’s values

onyx.dataflow.simplecfg.recognizer_from_cfg(cfg, starts)¶

Simple recognizer for a Cfg. Source code shows how to use Cfg to recognize.

Here’s a the Hello World of recursive grammars

>>> binary_math = '''
...   expr     term
...   expr     expr + term
...   term     factor
...   term     term * factor
...   factor   primary
...   factor   primary ^ factor
...   primary  ( expr )
...   primary  number
...   primary  symbol
...   number   digit
...   number   number digit
...   digit    0
...   digit    1
...   # symbol starts with a letter1 followed by lettern, where lettern is zero or more letter1 and/or _ 
...   # so, in this language, a symbol cannot begin with _
...   symbol   letter1 lettern
...   letter1 a
...   letter1 b
...   # lettern shows that epsilons work, e.g. for empty lists
...   lettern
...   lettern  lettern letter1
...   lettern  lettern _
...   '''

Make a recognize function

>>> cfg = _cfg_from_string(binary_math)
>>> recognize = recognizer_from_cfg(cfg, ['expr'])

Helper for showing failures

>>> def fail(string, prefix):
...   print 'Failed: ', string; print '    --> ', ' ' * len(prefix) + '^'

Test out the recursion

>>> string1 = '001+100*(10+11^(1+1))'
>>> sentential, prefix = recognize(string1)
>>> sentential
True
>>> ''.join(prefix) == string1
True

Now try a sentence not in the language; the returned prefix shows how much successfully matched.

>>> string2 = '001+)100'
>>> sentential, prefix = recognize(string2)
>>> sentential
False
>>> ''.join(prefix)
'001+'
>>> fail(string2, prefix)
Failed:  001+)100
    -->      ^

Include some symbols

>>> string3 = 'ab*a_b+(01*bb)'
>>> sentential, prefix = recognize(string3)
>>> sentential
True
>>> ''.join(prefix) == string3
True

Now try a bogus symbol, can’t start with _

>>> string4 = 'ab*a_b+(01*_bb)'
>>> sentential, prefix = recognize(string4)
>>> sentential
False
>>> ''.join(prefix)
'ab*a_b+(01*'
>>> fail(string4, prefix)
Failed:  ab*a_b+(01*_bb)
    -->             ^

Make a new recognizer that has ‘expr’ as a start symbol, but that also has ‘lettern’ as a start symbol; this is not something you’d do for an expression grammar, but it lets you demonstrate a nullable language.

>>> recognize2 = recognizer_from_cfg(cfg, ['expr', 'lettern'])

Recap, inside an expr, you can have symbols. Symbols start with either ‘a’ or ‘b’, and then any number of ‘a’, ‘b’, or ‘_’; symbols do not start with ‘_’:

>>> recognize2('1+ab+a_')
(True, ['1', '+', 'a', 'b', '+', 'a', '_'])
>>> recognize2('1+_a')
(False, ['1', '+'])

Since lettern is a start state, you can have a non-empty lettern that begins with ‘_’

>>> recognize2('_ab')
(True, ['_', 'a', 'b'])

For a nullable start state such as lettern, the empty string is sentential

>>> recognize2('')
(True, [])

A sentence that is just a symbol (and thus an expr) is also a lettern, i.e. the grammar is ambiguous. The recognizer_from_cfg interface doesn’t reveal the ambiguity; it just recognizes that the token stream is sentential:

>>> recognize2('ab__ab')
(True, ['a', 'b', '_', '_', 'a', 'b'])

onyx.dataflow.simplecfg.render(dot_lines_iterable, temp_file_prefix='simplecfg_', display_tool_format='open -a /Applications/Graphviz.app %s')¶

class onyx.dataflow.simplecfg.tagged_tuple(tag, args)¶

Bases: tuple

count¶: T.count(value) -> integer – return number of occurrences of value

index¶: T.index(value, [start, [stop]]) -> integer – return first index of value. Raises ValueError if the value is not present.

tag¶

class onyx.dataflow.simplecfg.tagged_tuple2(tag, *args)¶

Bases: list

append¶: L.append(object) – append object to end

count¶: L.count(value) -> integer – return number of occurrences of value

extend¶: L.extend(iterable) – extend list by appending elements from the iterable

index¶: L.index(value, [start, [stop]]) -> integer – return first index of value. Raises ValueError if the value is not present.

insert¶: L.insert(index, object) – insert object before index

pop¶: L.pop([index]) -> item – remove and return item at index (default last). Raises IndexError if list is empty or index is out of range.

remove¶: L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse¶: L.reverse() – reverse IN PLACE

sort¶: L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1

tag¶

class onyx.dataflow.simplecfg.tagged_tuple3¶

Bases: list

append¶: L.append(object) – append object to end

count¶: L.count(value) -> integer – return number of occurrences of value

extend¶: L.extend(iterable) – extend list by appending elements from the iterable

index¶: L.index(value, [start, [stop]]) -> integer – return first index of value. Raises ValueError if the value is not present.

insert¶: L.insert(index, object) – insert object before index

pop¶: L.pop([index]) -> item – remove and return item at index (default last). Raises IndexError if list is empty or index is out of range.

remove¶: L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse¶: L.reverse() – reverse IN PLACE

sort¶: L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1

Previous topic

Next topic

This Page

`onyx.dataflow.simplecfg` – Tools for working with CFG grammars.¶

Navigation

Previous topic

Next topic

This Page

Quick search

onyx.dataflow.simplecfg – Tools for working with CFG grammars.¶

Navigation

`onyx.dataflow.simplecfg` – Tools for working with CFG grammars.¶