A LexiconBuilder can be constructed in empty form (the default), from a string, or from a FrozenLexicon object. See help(LexiconBuilder) for more details.
>>> lb0 = LexiconBuilder(_dict0)
A FrozenLexicon is a form of FrozenCfg and can be used anywhere a FrozenCfg can be used. In addition, FrozenLexicon has a few other properties.
>>> lex0 = FrozenLexicon(lb0)
>>> lex0.num_orthos
19
>>> lex0.num_prons
21
>>> print(lex0)
Lexicon with 19 orthographies and 21 prons
>>> lb1 = LexiconBuilder(lex0)
>>> lb1.add_from_strings(("this_word th i S w e r d", "that_word th ae t w e r d"))
>>> lex1 = FrozenLexicon(lb1)
>>> print(lex1)
Lexicon with 21 orthographies and 23 prons
Bases: onyx.dataflow.simplecfg.FrozenCfg
A module for using word and pronunciation collections.
A FrozenLexicon is a form of FrozenCfg and can be used anywhere a FrozenCfg can be used. In addition, FrozenLexicon has a few other properties.
Make FrozenLexicons from LexiconBuilders:
>>> lb0 = LexiconBuilder(_dict0)
>>> lex0 = FrozenLexicon(lb0)
>>> lex0.num_orthos
19
>>> lex0.num_prons
21
>>> lex0.num_phones
22
>>> print(lex0)
Lexicon with 19 orthographies and 21 prons
>>> lex0.size
89
>>> lex0.num_productions
21
>>> len(lex0.terminals)
22
>>> len(lex0.non_terminals)
19
>>> sorted(lex0.terminals)[0]
(onyx.util.singleton.Singleton('onyx.lexicon.PHONE'), '@')
>>> sorted(lex0.non_terminals)[0]
(onyx.util.singleton.Singleton('onyx.lexicon.WORD'), '</s>')
>>> # for lhs, rhs in lex0: print('%s ====> %s' % (lhs, rhs))
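The counts above suggest how a lexicon maps onto a grammar: num_orthos agrees with len(non_terminals), num_prons with num_productions, and num_phones with len(terminals). The following is a minimal standalone sketch of that bookkeeping, not the library's implementation. It assumes plain strings rather than the tagged (Singleton, string) symbols shown above, and the name lexicon_counts and the toy entries are hypothetical.

```python
def lexicon_counts(entries):
    """Return (num_orthos, num_prons, num_phones) for (word, phones) pairs."""
    prons_by_word = {}
    for word, phones in entries:
        # Assumption: prons for a word form a set, so a repeated
        # word/pron pair is counted only once.
        prons_by_word.setdefault(word, set()).add(tuple(phones))
    num_orthos = len(prons_by_word)
    num_prons = sum(len(prons) for prons in prons_by_word.values())
    phones_seen = set(
        phone
        for prons in prons_by_word.values()
        for pron in prons
        for phone in pron
    )
    return num_orthos, num_prons, len(phones_seen)


# Hypothetical toy lexicon: 'the' has two pronunciations.
toy = [
    ('this', ('th', 'i', 's')),
    ('that', ('th', 'ae', 't')),
    ('the', ('th', 'uh')),
    ('the', ('th', 'iy')),
]
counts = lexicon_counts(toy)  # (orthographies, prons, distinct phones)
```

A word with several pronunciations contributes one orthography but several prons, which is why num_prons can exceed num_orthos.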
Return a FrozenCfg that is equivalent to self but for which the productions for each non-terminal are all left factored. This means that there is no prefix sharing across the productions for a given non-terminal. Another way to state this is that there will be only one production for each direct left corner of each non-terminal.
>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('A', ('x', 'y', 'q'))
>>> b.add_production('A', ('l', 'n', 'z'))
>>> b.add_production('A', ('l', 'n', 'q'))
>>> b.add_production('A', ('l',))
>>> b.add_production('A', ('x', 'z', 'z'))
>>> b.add_production('A', ('x', 'q', 'q'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C', 'x'))
>>> b.add_production('S', ('B', 'C', 'x'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' :
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg1 = cfg.make_left_factored_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l' 'A_lf3'
'A' : 'x' 'A_lf2'
'A_lf0' : 'q'
'A_lf0' : 'z'
'A_lf1' : 'q'
'A_lf1' : 'z'
'A_lf2' : 'q' 'q'
'A_lf2' : 'y' 'A_lf0'
'A_lf2' : 'z' 'z'
'A_lf3' :
'A_lf3' : 'n' 'A_lf1'
'B' :
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg2 = cfg1.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg2: print str(lhs), ': ', ' '.join(str(rhs_token) for rhs_token in rhs)
A : l
A : l A_lf3_er2
A : x A_lf2
A_lf0 : q
A_lf0 : z
A_lf1 : q
A_lf1 : z
A_lf2 : q q
A_lf2 : y A_lf0
A_lf2 : z z
A_lf3_er2 : n A_lf1
B_er1 : C_er0 b
B_er1 : b
C_er0 : A
C_er0 : D_er3
C_er0 : E e
D_er3 : B_er1
E : B_er1 f
E : f
S : A C_er0 x
S : A x
S : B_er1 C_er0 x
S : B_er1 x
S : C_er0 x
S : x
>>> cfg3 = cfg2.make_left_factored_cfg()
>>> for lhs, rhs in cfg3: print str(lhs), ': ', ' '.join(str(rhs_token) for rhs_token in rhs)
A : l A_lf0_lf0
A : x A_lf2
A_lf0 : q
A_lf0 : z
A_lf0_lf0 :
A_lf0_lf0 : A_lf3_er2
A_lf1 : q
A_lf1 : z
A_lf2 : q q
A_lf2 : y A_lf0
A_lf2 : z z
A_lf3_er2 : n A_lf1
B_er1 : C_er0 b
B_er1 : b
C_er0 : A
C_er0 : D_er3
C_er0 : E e
D_er3 : B_er1
E : B_er1 f
E : f
S : A S_lf0
S : B_er1 S_lf1
S : C_er0 x
S : x
S_lf0 : C_er0 x
S_lf0 : x
S_lf1 : C_er0 x
S_lf1 : x
>>> cfg4 = cfg3.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg4: print str(lhs), ': ', ' '.join(str(rhs_token) for rhs_token in rhs)
A : l
A : l A_lf0_lf0_er0
A : x A_lf2
A_lf0 : q
A_lf0 : z
A_lf0_lf0_er0 : A_lf3_er2
A_lf1 : q
A_lf1 : z
A_lf2 : q q
A_lf2 : y A_lf0
A_lf2 : z z
A_lf3_er2 : n A_lf1
B_er1 : C_er0 b
B_er1 : b
C_er0 : A
C_er0 : D_er3
C_er0 : E e
D_er3 : B_er1
E : B_er1 f
E : f
S : A S_lf0
S : B_er1 S_lf1
S : C_er0 x
S : x
S_lf0 : C_er0 x
S_lf0 : x
S_lf1 : C_er0 x
S_lf1 : x
>>> cfg5 = cfg4.make_left_factored_cfg()
>>> for lhs, rhs in cfg5: print str(lhs), ': ', ' '.join(str(rhs_token) for rhs_token in rhs)
A : l A_lf0_lf0
A : x A_lf2
A_lf0 : q
A_lf0 : z
A_lf0_lf0 :
A_lf0_lf0 : A_lf0_lf0_er0
A_lf0_lf0_er0 : A_lf3_er2
A_lf1 : q
A_lf1 : z
A_lf2 : q q
A_lf2 : y A_lf0
A_lf2 : z z
A_lf3_er2 : n A_lf1
B_er1 : C_er0 b
B_er1 : b
C_er0 : A
C_er0 : D_er3
C_er0 : E e
D_er3 : B_er1
E : B_er1 f
E : f
S : A S_lf0
S : B_er1 S_lf1
S : C_er0 x
S : x
S_lf0 : C_er0 x
S_lf0 : x
S_lf1 : C_er0 x
S_lf1 : x
>>> cfg.size, cfg1.size, cfg2.size, cfg3.size, cfg4.size, cfg5.size
(42, 44, 51, 55, 55, 57)
>>> cfg.num_productions, cfg1.num_productions, cfg2.num_productions, cfg3.num_productions, cfg4.num_productions, cfg5.num_productions
(17, 21, 25, 28, 28, 29)
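The left-factoring step can be sketched in standalone Python. This is an illustration under stated assumptions, not the code behind make_left_factored_cfg(): the grammar is a plain dict from each non-terminal to a set of right-hand-side tuples, productions that share a first symbol are factored one symbol at a time, and the helper names (left_factor, fresh_name, the _lf numbering) are hypothetical. The helper non-terminals may be numbered differently from the output shown above, and the sketch does not guard against a helper name colliding with an existing symbol.

```python
from collections import defaultdict


def left_factor(productions):
    """Left-factor a grammar given as {lhs: set of rhs tuples}."""
    counters = defaultdict(int)
    result = {}

    def fresh_name(base):
        # Hypothetical naming scheme for the introduced non-terminals.
        name = '%s_lf%d' % (base, counters[base])
        counters[base] += 1
        return name

    def factor(lhs, base, rhs_set):
        # Group the right-hand sides by their first symbol (or by the
        # empty tuple for an epsilon production).
        groups = defaultdict(set)
        for rhs in rhs_set:
            groups[rhs[:1]].add(rhs)
        new_rhs_set = set()
        for head, group in groups.items():
            if len(group) == 1:
                # No sharing for this first symbol: keep the production.
                new_rhs_set |= group
            else:
                # Shared first symbol: keep it once and move the
                # remainders into a new helper non-terminal.
                helper = fresh_name(base)
                new_rhs_set.add(head + (helper,))
                factor(helper, base, set(rhs[1:] for rhs in group))
        result[lhs] = new_rhs_set

    for lhs, rhs_set in productions.items():
        factor(lhs, lhs, set(rhs_set))
    return result


# The seven productions for 'A' from the example above.
grammar = {
    'A': {
        ('x', 'y', 'z'), ('x', 'y', 'q'), ('l', 'n', 'z'), ('l', 'n', 'q'),
        ('l',), ('x', 'z', 'z'), ('x', 'q', 'q'),
    },
}
factored = left_factor(grammar)
num_productions = sum(len(rhs_set) for rhs_set in factored.values())
```

For these seven productions the sketch produces eleven productions over five non-terminals, the same counts as the A and A_lf rows of cfg1 above. After factoring, no two productions of any non-terminal begin with the same symbol.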
Return a FrozenCfg that is equivalent to self but for which the non-left-recursive productions for each non-terminal are grouped into a new non-terminal. This follows Robert Moore’s non-left-recursion-grouping (NLRG) algorithm.
>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('A', ('x', 'y', 'q'))
>>> b.add_production('A', ('l', 'n', 'z'))
>>> b.add_production('A', ('l', 'n', 'q'))
>>> b.add_production('A', ('l',))
>>> b.add_production('A', ('x', 'z', 'z'))
>>> b.add_production('A', ('x', 'q', 'q'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C', 'x'))
>>> b.add_production('S', ('B', 'C', 'x'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' :
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
>>> cfg1 = cfg.make_nlrg_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'l'
'A' : 'l' 'n' 'q'
'A' : 'l' 'n' 'z'
'A' : 'x' 'q' 'q'
'A' : 'x' 'y' 'q'
'A' : 'x' 'y' 'z'
'A' : 'x' 'z' 'z'
'B' :
'B' : 'C' 'b'
'B' : 'b'
'C' : 'C_nlg'
'C' : 'D'
'C_nlg' : 'A'
'C_nlg' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C' 'x'
'S' : 'B' 'C' 'x'
Return a FrozenCfg that is equivalent to self but which contains no epsilon productions.
>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ())
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('B', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('B',))
>>> b.add_production('D', ('(', 'C', ')'))
>>> b.add_production('E', ('B', '(', 'C', ')'))
>>> b.add_production('F', ('B', '(', 'C', ')', 'C'))
>>> b.add_production('F', ('B', '(', 'C', ')', 'C', 'C'))
>>> b.add_production('G', ('B',))
>>> b.add_production('G', ('B', 'B', 'B'))
>>> b.add_production('S', ('A',))
>>> b.add_production('S', ('A', 'F'))
>>> b.add_production('S', ('A', 'C'))
>>> cfg = FrozenCfg(b)
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B' :
'B' : 'B' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'B'
'D' : '(' 'C' ')'
'E' : 'B' '(' 'C' ')'
'F' : 'B' '(' 'C' ')' 'C'
'F' : 'B' '(' 'C' ')' 'C' 'C'
'G' : 'B'
'G' : 'B' 'B' 'B'
'S' : 'A'
'S' : 'A' 'C'
'S' : 'A' 'F'
>>> cfg1 = cfg.make_no_epsilon_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B_er1' : 'B_er1' 'b'
'B_er1' : 'b'
'C_er0' : 'A'
'C_er0' : 'B_er1'
'D' : '(' ')'
'D' : '(' 'C_er0' ')'
'E' : '(' ')'
'E' : '(' 'C_er0' ')'
'E' : 'B_er1' '(' ')'
'E' : 'B_er1' '(' 'C_er0' ')'
'F' : '(' ')'
'F' : '(' ')' 'C_er0'
'F' : '(' ')' 'C_er0' 'C_er0'
'F' : '(' 'C_er0' ')' 'C_er0'
'F' : '(' 'C_er0' ')' 'C_er0' 'C_er0'
'F' : 'B_er1' '(' ')'
'F' : 'B_er1' '(' ')' 'C_er0'
'F' : 'B_er1' '(' ')' 'C_er0' 'C_er0'
'F' : 'B_er1' '(' 'C_er0' ')'
'F' : 'B_er1' '(' 'C_er0' ')' 'C_er0'
'F' : 'B_er1' '(' 'C_er0' ')' 'C_er0' 'C_er0'
'G_er2' : 'B_er1'
'G_er2' : 'B_er1' 'B_er1'
'G_er2' : 'B_er1' 'B_er1' 'B_er1'
'S' : 'A'
'S' : 'A' 'C_er0'
'S' : 'A' 'F'
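The general idea of epsilon removal can be sketched as the textbook nullable-expansion procedure. This is an assumption about the approach, not a description of how make_no_epsilon_cfg() is implemented: the sketch does not rename symbols with an _er suffix as the output above does, it may not reproduce the library's output production for production, and a symbol that derives only the empty string would need further cleanup that is omitted here. The names nullable_symbols and remove_epsilons are hypothetical.

```python
from itertools import product


def nullable_symbols(productions):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs_set in productions.items():
            if lhs in nullable:
                continue
            # An empty rhs, or one made only of nullable symbols,
            # makes lhs nullable.
            if any(all(sym in nullable for sym in rhs) for rhs in rhs_set):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_epsilons(productions):
    """Expand each rhs over the presence or absence of nullable symbols."""
    nullable = nullable_symbols(productions)
    result = {}
    for lhs, rhs_set in productions.items():
        expanded = set()
        for rhs in rhs_set:
            # Each nullable symbol may be kept or dropped.
            choices = [
                ((sym,), ()) if sym in nullable else ((sym,),)
                for sym in rhs
            ]
            for parts in product(*choices):
                new_rhs = tuple(sym for part in parts for sym in part)
                if new_rhs:  # never emit an epsilon production
                    expanded.add(new_rhs)
        if expanded:
            result[lhs] = expanded
    return result


# A small grammar in the spirit of the example above.
grammar = {
    'A': {('x', 'y', 'z')},
    'B': {(), ('b',), ('B', 'b')},
    'C': {('A',), ('B',)},
    'D': {('(', 'C', ')')},
}
no_eps = remove_epsilons(grammar)
```

Here B and C are nullable, so D gains a second production without C, in the same way that D receives both '(' ')' and '(' 'C_er0' ')' in the output above.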
Return a FrozenCfg that is equivalent to self but which contains no left-recursive production chains.
>>> b = CfgBuilder()
>>> b.add_production('A', ('x', 'y', 'z'))
>>> b.add_production('B', ('b',))
>>> b.add_production('B', ('C', 'b'))
>>> b.add_production('C', ('A',))
>>> b.add_production('C', ('D',))
>>> b.add_production('C', ('E', 'e'))
>>> b.add_production('D', ('B',))
>>> b.add_production('E', ('B', 'f'))
>>> b.add_production('S', ('A', 'C'))
>>> b.add_production('S', ('B', 'C'))
>>> cfg = FrozenCfg(b, 'S')
>>> for lhs, rhs in cfg: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'A' : 'x' 'y' 'z'
'B' : 'C' 'b'
'B' : 'b'
'C' : 'A'
'C' : 'D'
'C' : 'E' 'e'
'D' : 'B'
'E' : 'B' 'f'
'S' : 'A' 'C'
'S' : 'B' 'C'
>>> cfg1 = cfg.make_no_left_recursion_cfg()
>>> for lhs, rhs in cfg1: print repr(lhs), ':', ' '.join(repr(rhs_token) for rhs_token in rhs)
'S' :
A frozenset of the non-terminals in the grammar.
The number of productions in the grammar.
The size of the grammar.
We follow Robert Moore in calculating the size of the grammar: the size is the number of non-terminal symbols plus the sum of the lengths of the right-hand-side sequences over all the productions in the grammar. By counting each non-terminal just once, instead of once for each of its productions, this size statistic more closely tracks the storage requirements of actual implementations of grammar structures. An empty right-hand-side sequence (epsilon) is counted as having length one.
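The size definition can be checked by hand with a short standalone sketch. It assumes the grammar is a plain dict from each non-terminal to a set of right-hand-side tuples, and that every non-terminal appears as a key; the function name grammar_size is hypothetical. The grammar below is the 17-production example used for make_left_factored_cfg() above.

```python
def grammar_size(productions):
    """Number of non-terminals plus total rhs length (epsilon counts as 1)."""
    num_non_terminals = len(productions)
    rhs_total = sum(
        max(len(rhs), 1)  # an empty rhs (epsilon) counts as length one
        for rhs_set in productions.values()
        for rhs in rhs_set
    )
    return num_non_terminals + rhs_total


grammar = {
    'A': {
        ('x', 'y', 'z'), ('x', 'y', 'q'), ('l', 'n', 'z'), ('l', 'n', 'q'),
        ('l',), ('x', 'z', 'z'), ('x', 'q', 'q'),
    },
    'B': {(), ('b',), ('C', 'b')},
    'C': {('A',), ('D',), ('E', 'e')},
    'D': {('B',)},
    'E': {('B', 'f')},
    'S': {('A', 'C', 'x'), ('B', 'C', 'x')},
}
size = grammar_size(grammar)  # 6 non-terminals + 36 rhs symbols
```

The six non-terminals contribute 6, and the right-hand sides contribute 19 + 4 + 4 + 1 + 2 + 6 = 36, which gives the value 42 reported as cfg.size in that example.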
A frozenset of the terminals in the grammar.
Bases: onyx.dataflow.simplecfg.CfgBuilder
A class for building word and pronunciation collections.
A LexiconBuilder can be constructed in empty form (the default), from a string, or from a FrozenLexicon object. In the string case, the string should be a collection of lines, with each line consisting of space-separated tokens. Each line represents one word/pron combination; the first token of the line is the word, the remaining tokens collectively are the pron.
>>> lb0 = LexiconBuilder(_dict0)
Add word/prons to a lexicon from a string source.
iterable should yield strings of tokens separated by spaces. Each string represents one word/pron combination; the first token of the string is the word, the remaining tokens collectively are the pron.
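The string format described here can be illustrated with a small standalone parser. This is a sketch of the format only, not the code behind add_from_strings(); the name parse_lexicon_lines is hypothetical, and skipping blank strings is an assumption rather than documented behavior.

```python
def parse_lexicon_lines(lines):
    """Split each string into (word, phones): first token, then the rest."""
    entries = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank strings are ignored
        word, phones = tokens[0], tuple(tokens[1:])
        entries.append((word, phones))
    return entries


entries = parse_lexicon_lines((
    "this_word th i S w e r d",
    "that_word th ae t w e r d",
))
```

Each resulting pair has the shape that the word and phones arguments described for adding a single word/pron take: a string and an iterable of strings.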
Add word/prons to a lexicon.
word should be a string and phones an iterable of strings.
The size of the grammar. We follow Robert Moore in calculating the size of the grammar: the size is the number of non-terminal symbols plus the sum of the lengths of the right-hand-side sequences over all the productions in the grammar. By counting each non-terminal just once, instead of once for each of its productions, this size statistic more closely tracks the storage requirements of actual implementations of grammar structures. An empty right-hand-side sequence (epsilon) is counted as having length one.
Add a set of productions to the CFG. The lhs argument is an immutable object that is the left-hand-side (non-terminal) for each of the productions. The rhs_set argument is a possibly-empty iterable of rhs sequences. Each rhs sequence is an iterable of immutable objects (symbols) that are the sequence of non-terminals and terminals that make up the right-hand-side of the given production. An empty rhs is used to add an epsilon production. The productions for a given non-terminal are treated as a set; this means that duplicate right-hand sides are ignored. See also add_production().
>>> builder = CfgBuilder()
>>> builder.add_production('A', ('x', 'y', 'zoo'))
>>> builder.update_production('B', ((), ('b',), ('B', 'b')))
>>> builder.size
9
>>> builder.update_production('Cows', (('A',), ('B',)))
>>> builder.size
12
>>> builder.add_production('Cows', ('A',))
>>> builder.size
12
>>> builder.update_production('Cows', (('B',), ('A',), ))
>>> builder.size
12
>>> 'A' in builder, 'zoo' in builder, 'Moo' in builder
(True, True, False)
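The set semantics in the session above (duplicate right-hand sides leave the size unchanged, and membership covers terminals as well as non-terminals) can be mimicked with a toy class. This is a hypothetical stand-in written for illustration, not CfgBuilder itself; the class name TinyCfgBuilder and its internals are assumptions.

```python
class TinyCfgBuilder(object):
    """Toy builder: productions for each lhs are stored as a set."""

    def __init__(self):
        self._productions = {}

    def add_production(self, lhs, rhs):
        # Adding an rhs that is already present has no effect.
        self._productions.setdefault(lhs, set()).add(tuple(rhs))

    def update_production(self, lhs, rhs_set):
        for rhs in rhs_set:
            self.add_production(lhs, rhs)

    @property
    def size(self):
        # Non-terminals plus rhs lengths, with epsilon counted as one.
        return len(self._productions) + sum(
            max(len(rhs), 1)
            for prods in self._productions.values()
            for rhs in prods
        )

    def __contains__(self, symbol):
        # True for any symbol used as an lhs or inside any rhs.
        return symbol in self._productions or any(
            symbol in rhs
            for prods in self._productions.values()
            for rhs in prods
        )


builder = TinyCfgBuilder()
builder.add_production('A', ('x', 'y', 'zoo'))
builder.update_production('B', ((), ('b',), ('B', 'b')))
size_before = builder.size
builder.update_production('Cows', (('A',), ('B',)))
builder.add_production('Cows', ('A',))  # duplicate, ignored
size_after = builder.size
```

Storing each non-terminal's productions in a set is what makes the repeated add_production and update_production calls idempotent.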