onyx.htkfiles.mlfprocess – Event streams for HTK master label files

Event stream processors are provided at two levels. MlfProcessor provides a low-level stream of per-line events, while MlfBlockProcessor provides a higher-level stream of per-record blocks built from those low-level events.

class onyx.htkfiles.mlfprocess.MlfBlockEvent(record_filename, labels)

Bases: object

class onyx.htkfiles.mlfprocess.MlfBlockProcessor(label_event_handler=None, sendee=None, sending=True, bypass_types=())

Bases: onyx.dataflow.streamprocess.ProcessorBase

Generator for blocks of mlf data from a stream of MlfProcessor events.

Event types sent: MlfBlockEvent

>>> files = tuple(os.path.join(module_dir, mlf_file) for mlf_file in ('f1.mlf', 'f2.mlf', 'mono.mlf'))
>>> result = []
>>> def handler0(label_evt): return label_evt.word_label
>>> mbp0 = MlfBlockProcessor(handler0, sendee=result.append)
>>> mp0 = MlfProcessor(sendee=mbp0.process)
>>> tfs = TextFileProcessor(files, sendee=mp0.process)
>>> while not tfs.finished():
...    tfs.process()
>>> len(result)
45
>>> for evt in result[:4]:
...    print evt
MlfBlock for */en_0638-A-001.rec, labels = [None, None, None]
MlfBlock for */en_0638-A-002.rec, labels = [None, None, None, None, None, None]
MlfBlock for */en_0638-A-001.rec, labels = ['right', None, None, '<sil>']
MlfBlock for */en_0638-A-002.rec, labels = ['oh', 'you', None, 'did', None, None, '<sil>']
>>> print result[4].record_filename
*/adg0_4_sr009.lab
>>> print result[4].labels[:20]
['sil', 'sh', 'ow', 'sp', 'dh', 'ax', 'sp', 'g', 'r', 'ih', 'dd', 'l', 'iy', 'z', 'sp', 'td', 'ch', 'r', 'ae', 'k']

label_event_handler is a callable which takes MlfLabelEvents; the values returned will be appended to the list for each record. If None, the label events themselves will be appended. sendee is a function to call with our output events.
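
The role of label_event_handler can be illustrated without the onyx classes. The following is a minimal sketch, not the library's implementation: RecordStart, Label, RecordEnd and collect_blocks are hypothetical stand-ins, and the phone_label field name is an assumption (only word_label appears in the examples above).

```python
from collections import namedtuple

# Hypothetical stand-ins for the real event classes; only the fields
# needed for this illustration are modelled.
RecordStart = namedtuple("RecordStart", "record_filename")
Label = namedtuple("Label", "phone_label word_label")
RecordEnd = namedtuple("RecordEnd", "record_filename")


def collect_blocks(events, label_event_handler=None):
    """Group a flat event stream into (record_filename, labels) pairs."""
    blocks = []
    labels = None
    for evt in events:
        if isinstance(evt, RecordStart):
            labels = []
        elif isinstance(evt, Label):
            # With no handler, the label event itself is stored.
            if label_event_handler is None:
                labels.append(evt)
            else:
                labels.append(label_event_handler(evt))
        elif isinstance(evt, RecordEnd):
            blocks.append((evt.record_filename, labels))
            labels = None
    return blocks


events = [
    RecordStart("*/en_0638-A-001.rec"),
    Label("r", "right"),
    Label("aI", None),
    Label("t", None),
    Label("sil", "<sil>"),
    RecordEnd("*/en_0638-A-001.rec"),
]
word_blocks = collect_blocks(events, lambda evt: evt.word_label)
raw_blocks = collect_blocks(events)
```

Here word_blocks holds one entry per record whose list contains whatever the handler returned for each label, while raw_blocks holds the label objects themselves.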

dc

A debug context for this processor. This attribute is an object returned by dcheck() in the onyx.util.debugprint module, and may be used for debug output. The tag for turning on such output is available as debug_tag.

debug_tag

Activate this tag to get debugging information; see onyx.util.debugprint.DebugPrint.

graph

Return a graph for this processor. By default this is just a single node whose label is the label of the processor; derived classes may wish to override this property.

label

Return a label for this processor. By default this is just the name of the class; derived classes may wish to override this property by providing a different label to __init__().

process(event)

Process an event from an MlfProcessor and, when a block is complete, generate an MlfBlockEvent and call the sendee with it.

send(result)

Internal function that pushes result into the sendee. Implementations of process() must call this to push results. To set up the sendee (the target of the push), clients of the processor must either initialize the object with a sendee or call set_sendee(). Processors created with a sendee of False will never send, but will not raise an error if send() is called.

sendee

The callable this processor will use to send events; see set_sendee().

sending

Whether this processor will currently send events at all; see set_sending().

set_sendee(sendee)

Clients call this to set up the callable where the processor will send its results.

set_sending(sending)

Clients call this to turn sending from a processor on or off.
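
The push model described by send(), set_sendee() and set_sending() can be sketched with a small stand-alone class. This is an illustration under stated assumptions, not the onyx ProcessorBase: TinyProcessor is a hypothetical name, and silently dropping results when sending is off or no sendee is set is an assumption of this sketch.

```python
class TinyProcessor(object):
    """Hypothetical push-style processor; not the onyx ProcessorBase."""

    def __init__(self, sendee=None, sending=True):
        self._sendee = sendee
        self._sending = sending

    def set_sendee(self, sendee):
        # The sendee is any callable that accepts one result.
        self._sendee = sendee

    def set_sending(self, sending):
        self._sending = bool(sending)

    def send(self, result):
        # Assumption: results are dropped when sending is off
        # or when no sendee has been configured.
        if self._sending and self._sendee is not None:
            self._sendee(result)

    def process(self, event):
        # A trivial transformation so the data flow is visible.
        self.send(event.upper())


received = []
upper = TinyProcessor()
upper.set_sendee(received.append)
upper.process("sil")
upper.set_sending(False)
upper.process("sp")  # dropped while sending is off
upper.set_sending(True)
upper.process("ow")
```

Because the sendee is just a callable, the bound process method of one processor can serve as the sendee of another, which is how the doctests above chain TextFileProcessor, MlfProcessor and MlfBlockProcessor.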

static std_process_prologue(process_function)

Subclasses may use this decorator on their process function to implement the usual bypass and process semantics and to set up the debug context returned by dc().

class onyx.htkfiles.mlfprocess.MlfEventBase

Bases: object

class onyx.htkfiles.mlfprocess.MlfLabelEvent

Bases: onyx.htkfiles.mlfprocess.MlfEventBase

class onyx.htkfiles.mlfprocess.MlfProcessor(sendee=None, sending=True, bypass_types=())

Bases: onyx.dataflow.streamprocess.ProcessorBase

Generator for mlf events from a stream of character lines (as produced, e.g., from reading an mlf file) or TextFileEvents.

Event types sent: MlfRecordStartEvent, MlfLabelEvent, MlfRecordEndEvent

>>> lines = ['#!MLF!#',
...          '"*/en_0638-A-001.rec"',
...          '0 300000 sil 47.421978',
...          '300000 1400000 lr7 301.527985',
...          '1400000 2100000 sil 130.250153',
...          '.',
...          '"*/en_0638-A-002.rec"',
...          '0 400000 sil -8.702469',
...          '400000 800000 n7 14.018845',
...          '800000 1100000 I 19.814428',
...          '1100000 1700000 ng 103.180771',
...          '1700000 2500000 d 24.743431',
...          '2500000 3800000 U -52.839127',
...          '.']
>>> result = []
>>> mes = MlfProcessor(sendee=result.append)
>>> result
[]
>>> for line in lines:
...     mes.process(line)
>>> len(result)
13
>>> for evt in result:
...    print evt
Record start: */en_0638-A-001.rec
0 300000 sil 47.421978
300000 1400000 lr7 301.527985
1400000 2100000 sil 130.250153
Record end: */en_0638-A-001.rec
Record start: */en_0638-A-002.rec
0 400000 sil -8.702469
400000 800000 n7 14.018845
800000 1100000 I 19.814428
1100000 1700000 ng 103.180771
1700000 2500000 d 24.743431
2500000 3800000 U -52.839127
Record end: */en_0638-A-002.rec
>>> lines2 = ['#!MLF!#',
...           '"*/en_0638-A-001.rec"',
...           '0 300000 r 33.539131 right',
...           '300000 1500000 aI 289.424377',
...           '1500000 1800000 t 48.881046',
...           '1800000 2100000 sil 56.252335 <sil>',
...           '.',
...           '"*/en_0638-A-002.rec"',
...           '0 800000 oU -22.766861 oh',
...           '800000 1100000 y -0.764874 you',
...           '1100000 1400000 u 20.385105',
...           '1400000 1700000 d 47.047123 did',
...           '1700000 2000000 I 13.943106',
...           '2000000 2500000 d -4.789762',
...           '2500000 3800000 sil -126.102829 <sil>',
...           '.']
>>> for line in lines2:
...     mes.process(line)
>>> len(result)
28
>>> for evt in result:
...    if isinstance(evt, MlfLabelEvent) and  evt.word_label:
...        print evt
0 300000 r 33.539131 right
1800000 2100000 sil 56.252335 <sil>
0 800000 oU -22.766861 oh
800000 1100000 y -0.764874 you
1400000 1700000 d 47.047123 did
2500000 3800000 sil -126.102829 <sil>

Test an error condition by starting beyond the record_filename marker in a record. Note that just starting beyond the header string isn’t an error(!) since we’re just feeding more records under the “original” header.

>>> try: 
...     for line in lines[2:]:
...         mes.process(line)
... except MlfProcessorError, e:
...     print e
Unexpected line in stream (file UNKNOWN, line 30): 0 300000 sil 47.421978
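
The record framing exercised by this error test can be sketched as a small state machine. This is a rough illustration, not the MlfProcessor implementation: frame_records and FramingError are hypothetical names, and rejecting a header line in the middle of a record is an assumption of this sketch.

```python
class FramingError(Exception):
    """Hypothetical error type for this sketch."""


def frame_records(lines):
    """Yield (kind, payload) tuples for a stream of MLF text lines."""
    state = "need_header"
    filename = None
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        quoted = len(line) >= 2 and line.startswith('"') and line.endswith('"')
        if line == "#!MLF!#":
            # Assumption: a header is only legal outside a record.
            if state == "in_record":
                raise FramingError(
                    "Unexpected line in stream (line %d): %s" % (lineno, line))
            state = "between_records"
        elif state == "between_records" and quoted:
            filename = line[1:-1]
            state = "in_record"
            yield ("start", filename)
        elif state == "in_record" and line == ".":
            state = "between_records"
            yield ("end", filename)
        elif state == "in_record":
            yield ("label", line)
        else:
            # E.g. a label line with no preceding record filename.
            raise FramingError(
                "Unexpected line in stream (line %d): %s" % (lineno, line))


good = ['#!MLF!#', '"*/a.rec"', '0 300000 sil 47.421978', '.']
framed = list(frame_records(good))

bad = ['#!MLF!#', '0 300000 sil 47.421978']
try:
    list(frame_records(bad))
    error_message = None
except FramingError as exc:
    error_message = str(exc)
```

As in the test above, a label line that arrives before any record filename is rejected, while a further record filename after a completed record is accepted under the original header.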

Test use of TextFileProcessor as input

>>> files = tuple(os.path.join(module_dir, mlf_file) for mlf_file in ('f1.mlf', 'f2.mlf'))
>>> result = []
>>> mes2 = MlfProcessor(sendee=result.append)
>>> tfs = TextFileProcessor(files, sendee=mes2.process)
>>> while not tfs.finished():
...    tfs.process()
>>> for evt in result:
...    print evt
Record start: */en_0638-A-001.rec
0 300000 sil 47.421978
300000 1400000 lr7 301.527985
1400000 2100000 sil 130.250153
Record end: */en_0638-A-001.rec
Record start: */en_0638-A-002.rec
0 400000 sil -8.702469
400000 800000 n7 14.018845
800000 1100000 I 19.814428
1100000 1700000 ng 103.180771
1700000 2500000 d 24.743431
2500000 3800000 U -52.839127
Record end: */en_0638-A-002.rec
Record start: */en_0638-A-001.rec
0 300000 r 33.539131 right
300000 1500000 aI 289.424377
1500000 1800000 t 48.881046
1800000 2100000 sil 56.252335 <sil>
Record end: */en_0638-A-001.rec
Record start: */en_0638-A-002.rec
0 800000 oU -22.766861 oh
800000 1100000 y -0.764874 you
1100000 1400000 u 20.385105
1400000 1700000 d 47.047123 did
1700000 2000000 I 13.943106
2000000 2500000 d -4.789762
2500000 3800000 sil -126.102829 <sil>
Record end: */en_0638-A-002.rec

dc

A debug context for this processor. This attribute is an object returned by dcheck() in the onyx.util.debugprint module, and may be used for debug output. The tag for turning on such output is available as debug_tag.

debug_tag

Activate this tag to get debugging information; see onyx.util.debugprint.DebugPrint.

graph

Return a graph for this processor. By default this is just a single node whose label is the label of the processor; derived classes may wish to override this property.

label

Return a label for this processor. By default this is just the name of the class; derived classes may wish to override this property by providing a different label to __init__().

make_label_event(line)

Make an event corresponding to the label represented in ‘line’.

Currently, we can deal with lines containing 1, 4, or 5 tokens, always white-space separated (which is the HTK spec). The full spec would allow other numbers of tokens, and the interpretation of the tokens used here matches the files we have at present, but the HTK spec would allow other interpretations of lines with 4 or 5 tokens. Unfortunately, there doesn’t seem to be any in-band way to disambiguate these cases.
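
The token handling described above can be sketched as follows. This is a minimal illustration rather than the body of make_label_event: split_label_line is a hypothetical helper, and the reading of the fields (start time, end time, label, score, optional word label) is an assumption inferred from the example files shown on this page.

```python
def split_label_line(line):
    """Split one MLF label line into named fields (1, 4, or 5 tokens)."""
    tokens = line.split()  # whitespace-separated tokens
    if len(tokens) == 1:
        # A bare label with no timing or score.
        return {"start": None, "end": None, "label": tokens[0],
                "score": None, "word": None}
    if len(tokens) in (4, 5):
        start, end, label, score = tokens[:4]
        # Assumption: an optional fifth token is a word label.
        word = tokens[4] if len(tokens) == 5 else None
        return {"start": int(start), "end": int(end), "label": label,
                "score": float(score), "word": word}
    raise ValueError("unsupported number of tokens (%d): %r"
                     % (len(tokens), line))


with_word = split_label_line("0 300000 r 33.539131 right")
without_word = split_label_line("300000 1500000 aI 289.424377")
bare = split_label_line("sil")
```

Lines with 2 or 3 tokens are rejected here; the HTK format permits them, but, as noted above, their interpretation cannot be disambiguated in-band.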

parse_line(line)

process(event)

Process either a line of text or a TextLine event and, if it yields an MlfEvent, call the sendee with that event.

send(result)

Internal function that pushes result into the sendee. Implementations of process() must call this to push results. To set up the sendee (the target of the push), clients of the processor must either initialize the object with a sendee or call set_sendee(). Processors created with a sendee of False will never send, but will not raise an error if send() is called.

sendee

The callable this processor will use to send events; see set_sendee().

sending

Whether this processor will currently send events at all; see set_sending().

set_sendee(sendee)

Clients call this to set up the callable where the processor will send its results.

set_sending(sending)

Clients call this to turn sending from a processor on or off.

static std_process_prologue(process_function)

Subclasses may use this decorator on their process function to implement the usual bypass and process semantics and to set up the debug context returned by dc().

verify_state(acceptable, line, filename=None)

exception onyx.htkfiles.mlfprocess.MlfProcessorError

Bases: exceptions.StandardError

args

message

class onyx.htkfiles.mlfprocess.MlfRecordEndEvent

Bases: onyx.htkfiles.mlfprocess.MlfEventBase

class onyx.htkfiles.mlfprocess.MlfRecordStartEvent

Bases: onyx.htkfiles.mlfprocess.MlfEventBase