onyx.signalprocessing.htkmfcc – Implement a simple version of HTK’s Mel Cepstral front end.

class onyx.signalprocessing.htkmfcc.CachingDude(factory)

Bases: object

This object caches values it generates via the factory argument passed to the constructor. An instance is used via indexing syntax, instance[key], where key must be immutable. If the instance has seen key before, it returns the value it previously cached for key. If it hasn’t seen key before, it calls factory with key, then caches and returns the value that factory returns.
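A minimal sketch of this memoizing pattern (a hypothetical reimplementation of the described behavior, not the module's actual source):

```python
class CachingDudeSketch(object):
    # Cache values produced by factory, keyed by immutable index keys.
    def __init__(self, factory):
        self._factory = factory
        self._cache = dict()
    def __getitem__(self, key):
        # First sight of key: call the factory and remember the result.
        if key not in self._cache:
            self._cache[key] = self._factory(key)
        return self._cache[key]

calls = []
def square(key):
    calls.append(key)
    return key * key

squares = CachingDudeSketch(square)
```

Indexing `squares[7]` twice calls the factory only once; the second lookup is served from the cache.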

class onyx.signalprocessing.htkmfcc.Dither(scale)

Bases: object

Adds randomness to data samples. The constructor takes a numeric scale argument, typically positive.

The returned instance is callable with a single Numpy array argument, data. On each call it generates samples from the uniform interval [-abs(scale), abs(scale)) with the same shape as data. For positive scale the randomness is reproducible; for negative scale it is not. Note that the standard deviation (root-mean-square amplitude) of these generated samples is abs(scale) / sqrt(3), approximately 0.6 * abs(scale).

The generated samples are added in-place to data. Returns the modified data array.

>>> data = np.arange(24, dtype=np.float).reshape((2,3,4))
>>> dither0 = Dither(1)
>>> x = dither0(data)
>>> x is data
True
>>> data
array([[[  0.09762701,   1.43037873,   2.20552675,   3.08976637],
        [  3.8473096 ,   5.29178823,   5.87517442,   7.783546  ],
        [  8.92732552,   8.76688304,  10.58345008,  11.05778984]],
<BLANKLINE>
       [[ 12.13608912,  13.85119328,  13.14207212,  14.1742586 ],
        [ 15.04043679,  17.66523969,  18.5563135 ,  19.7400243 ],
        [ 20.95723668,  21.59831713,  21.92295872,  23.56105835]]])
>>> y = dither0(np.zeros(5))
>>> y
array([-0.76345115,  0.27984204, -0.71329343,  0.88933783,  0.04369664])

Explicitly show reproducibility (it’s implicit in these doctests succeeding)

>>> dither1 = Dither(1)
>>> (dither1(np.arange(24)) == data.flat).all()
True
>>> (dither1(np.zeros(5)) == y).all()
True

Non-reproducibility via a negative scale

>>> dither_1 = Dither(-1)
>>> (dither_1(np.arange(24)) == data.flat).any()
False
>>> (dither_1(np.zeros(5)) == y).any()
False

Edge case, scale of zero, no signal gets added

>>> dither2 = Dither(0)
>>> z = np.copy(data)
>>> z is data
False
>>> (dither2(z) == data).all()
True
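The reproducible behavior above can be sketched with a seeded numpy RandomState. The sign-of-scale seeding policy shown here is an assumption for illustration, not the module's actual mechanism:

```python
import numpy as np

class DitherSketch(object):
    # Add uniform noise from [-|scale|, |scale|) to data, in place.
    def __init__(self, scale):
        # Assumed policy: positive scale -> fixed seed (reproducible runs),
        # negative scale -> entropy-based seed (non-reproducible runs).
        self._rand = np.random.RandomState(0 if scale > 0 else None)
        self._scale = abs(scale)
    def __call__(self, data):
        if self._scale:
            data += self._rand.uniform(-self._scale, self._scale, data.shape)
        return data

d = DitherSketch(1)
x = np.zeros(5)
y = d(x)   # x is modified in place and returned
```

A second `DitherSketch(1)` instance produces the same first-call samples, mirroring the reproducibility shown in the doctests.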
class onyx.signalprocessing.htkmfcc.Hamming

Bases: object

An instance of this class can be used to do Hamming window weighting. The instance is callable with a Numpy array which it will (usually) modify in place. It performs the Hamming weighting along the final dimension and returns the resulting ndarray.

The instance caches the necessary Hamming data it uses, so there is no penalty for using the same instance to perform the weighting on differently shaped data. The module attribute, hamming, is an instance of Hamming and can be used for all Hamming-windowing work.

While the Hamming window is popular, consider using the module’s kaiser58. The Hamming window does have a narrow main lobe, but its largest sidelobe is attenuated by only 44 dB. It doesn’t achieve the 58 dB attenuation of the kaiser58 window until 18 to 20 times its main-lobe width. The kaiser58 window has a main lobe that is 1.3 times the width of the Hamming window’s main lobe, but its attenuation is 58 dB or better everywhere outside the main lobe.

>>> x = hamming(np.ones(10))
>>> x
array([ 0.08      ,  0.18761956,  0.46012184,  0.77      ,  0.97225861,
        0.97225861,  0.77      ,  0.46012184,  0.18761956,  0.08      ])
>>> hamming(np.array((np.ones(10), np.ones(10) * -2)))
array([[ 0.08      ,  0.18761956,  0.46012184,  0.77      ,  0.97225861,
         0.97225861,  0.77      ,  0.46012184,  0.18761956,  0.08      ],
       [-0.16      , -0.37523911, -0.92024368, -1.54      , -1.94451721,
        -1.94451721, -1.54      , -0.92024368, -0.37523911, -0.16      ]])
>>> (hamming(np.ones(10)) == x).all()
True
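The weights shown above are the standard length-N Hamming window, w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)), which numpy exposes as np.hamming. A sketch of the weighting along the final dimension (this sketch returns a new array rather than modifying in place):

```python
import numpy as np

def hamming_weight(data):
    # Weight along the final dimension with a length-matched Hamming window.
    n = data.shape[-1]
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))
    return data * window

x = hamming_weight(np.ones(10))   # the window itself, since data is all ones
```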
class onyx.signalprocessing.htkmfcc.HtkDelta(window)

Bases: object

HTK-style delta processing.

>>> triang = np.array(range(6) + range(6, -1, -1) + range(13)).reshape((2, -1)).T
>>> triang.T
array([[ 0,  1,  2,  3,  4,  5,  6,  5,  4,  3,  2,  1,  0],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
>>> from itertools import chain
>>> gap = None,
>>> d0 = HtkDelta(0)
>>> np.vstack(y for x in chain(triang, gap) for y in d0(x)).T
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
>>> d1 = HtkDelta(1)
>>> np.vstack(y for x in chain(triang, gap) for y in d1(x)).T
array([[ 0.5,  1. ,  1. ,  1. ,  1. ,  1. ,  0. , -1. , -1. , -1. , -1. ,
        -1. , -0.5],
       [ 0.5,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,
         1. ,  0.5]])
>>> d2 = HtkDelta(2)
>>> np.vstack(y for x in chain(triang, gap) for y in d2(x)).T
array([[ 0.5,  0.8,  1. ,  1. ,  1. ,  0.6,  0. , -0.6, -1. , -1. , -1. ,
        -0.8, -0.5],
       [ 0.5,  0.8,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,  1. ,
         0.8,  0.5]])
>>> d3 = HtkDelta(3)
>>> np.vstack(y for x in chain(triang, gap) for y in d3(x)).T
array([[ 0.5       ,  0.71428571,  0.89285714,  1.        ,  0.78571429,
         0.42857143,  0.        , -0.42857143, -0.78571429, -1.        ,
        -0.89285714, -0.71428571, -0.5       ],
       [ 0.5       ,  0.71428571,  0.89285714,  1.        ,  1.        ,
         1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.89285714,  0.71428571,  0.5       ]])
>>> d3(None)
()
>>> data = np.array((np.arange(80), np.arange(0, 160, 2)))
>>> data.shape
(2, 80)
>>> data.shape = 2, 16, 5
>>> data = np.transpose(data, (1,0,2))
>>> for buf in data: print 'b', buf; print 'd', d3(buf)
b [[0 1 2 3 4]
 [0 2 4 6 8]]
d ()
b [[ 5  6  7  8  9]
 [10 12 14 16 18]]
d ()
b [[10 11 12 13 14]
 [20 22 24 26 28]]
d ()
b [[15 16 17 18 19]
 [30 32 34 36 38]]
d (array([[ 2.5,  2.5,  2.5,  2.5,  2.5],
       [ 5. ,  5. ,  5. ,  5. ,  5. ]]),)
b [[20 21 22 23 24]
 [40 42 44 46 48]]
d (array([[ 3.57142857,  3.57142857,  3.57142857,  3.57142857,  3.57142857],
       [ 7.14285714,  7.14285714,  7.14285714,  7.14285714,  7.14285714]]),)
b [[25 26 27 28 29]
 [50 52 54 56 58]]
d (array([[ 4.46428571,  4.46428571,  4.46428571,  4.46428571,  4.46428571],
       [ 8.92857143,  8.92857143,  8.92857143,  8.92857143,  8.92857143]]),)
b [[30 31 32 33 34]
 [60 62 64 66 68]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[35 36 37 38 39]
 [70 72 74 76 78]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[40 41 42 43 44]
 [80 82 84 86 88]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[45 46 47 48 49]
 [90 92 94 96 98]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 50  51  52  53  54]
 [100 102 104 106 108]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 55  56  57  58  59]
 [110 112 114 116 118]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 60  61  62  63  64]
 [120 122 124 126 128]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 65  66  67  68  69]
 [130 132 134 136 138]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 70  71  72  73  74]
 [140 142 144 146 148]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)
b [[ 75  76  77  78  79]
 [150 152 154 156 158]]
d (array([[  5.,   5.,   5.,   5.,   5.],
       [ 10.,  10.,  10.,  10.,  10.]]),)

For now, we use None as the gap or EOF signifier. The first None flushes the delta queue; any subsequent Nones do nothing.

>>> for x in d3(None): print repr(x)
array([[ 4.46428571,  4.46428571,  4.46428571,  4.46428571,  4.46428571],
       [ 8.92857143,  8.92857143,  8.92857143,  8.92857143,  8.92857143]])
array([[ 3.57142857,  3.57142857,  3.57142857,  3.57142857,  3.57142857],
       [ 7.14285714,  7.14285714,  7.14285714,  7.14285714,  7.14285714]])
array([[ 2.5,  2.5,  2.5,  2.5,  2.5],
       [ 5. ,  5. ,  5. ,  5. ,  5. ]])
>>> d3(None)
()

The window argument is the one-sided length of the window used for HTK-style delta processing.
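HTK's delta coefficients are linear-regression deltas: for one-sided window length W, the delta at frame t is the sum over w = 1..W of w*(c[t+w] - c[t-w]), divided by 2*sum(w**2), with the boundary frames replicated at the edges. A whole-utterance sketch of that computation (the class above computes the same thing incrementally, buffering frames and flushing when it sees None):

```python
import numpy as np

def htk_deltas(frames, window):
    # Whole-utterance HTK deltas:
    #   d[t] = sum_{w=1..W} w * (c[t+w] - c[t-w]) / (2 * sum_{w=1..W} w**2)
    # with first and last frames replicated at the boundaries.
    frames = np.asarray(frames, dtype=float)
    if window == 0:
        return np.zeros_like(frames)
    padded = np.concatenate([frames[:1].repeat(window, axis=0),
                             frames,
                             frames[-1:].repeat(window, axis=0)])
    out = np.zeros_like(frames)
    num = len(frames)
    for w in range(1, window + 1):
        out += w * (padded[window + w:window + w + num]
                    - padded[window - w:window - w + num])
    return out / (2.0 * sum(w * w for w in range(1, window + 1)))

# A slope-1 ramp: interior deltas are 1.0, boundary-replicated edges taper off.
deltas = htk_deltas(np.arange(13.0).reshape(-1, 1), 2)
```

For window 2 on the ramp this yields 0.5 and 0.8 at the edges and 1.0 in the interior, matching the second row of the d2 doctest above.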

onyx.signalprocessing.htkmfcc.HzToMel(hz)

Given the Hertz frequency, hz, return the corresponding Mel frequency. See the inverse function MelToHz().

>>> int(HzToMel(100) + 0.5)
150
>>> HzToMel(6300)
2595.0
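The scale is the conventional HTK mel mapping, mel = 2595 * log10(1 + hz/700), whose inverse is hz = 700 * (10**(mel/2595) - 1). A sketch of the pair (compare MelToHz() below):

```python
import math

def hz_to_mel(hz):
    # Standard HTK mel scale.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Note that 6300 Hz maps to exactly 2595 mel, since 1 + 6300/700 = 10 and log10(10) = 1, which is why the doctest value comes out exact.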
class onyx.signalprocessing.htkmfcc.Kaiser(beta)

Bases: object

An instance of this class can be used to do Kaiser window weighting. The instance is callable with a Numpy array which it will (usually) modify in place. It performs the Kaiser weighting along the final dimension and returns the resulting ndarray. The constructor is called with one argument, beta, which is the value of beta to use for the discrete Kaiser window.

The instance caches the necessary Kaiser data it uses, so there is no penalty for using the same instance to perform the weighting on differently shaped data.

The module attribute, kaiser58, is an instance of Kaiser with beta=8. This beta gives first sidelobes that are 58 dB below the main lobe. This attenuation is comparable to that of the Blackman window, but with a main lobe that is approximately 10% narrower than the Blackman window’s.

>>> x = kaiser58(np.ones(10))
>>> x
array([ 0.00233883,  0.0653422 ,  0.28589032,  0.65247867,  0.95473877,
        0.95473877,  0.65247867,  0.28589032,  0.0653422 ,  0.00233883])
>>> kaiser58(np.array((np.ones(10), np.ones(10) * -2)))
array([[ 0.00233883,  0.0653422 ,  0.28589032,  0.65247867,  0.95473877,
         0.95473877,  0.65247867,  0.28589032,  0.0653422 ,  0.00233883],
       [-0.00467766, -0.1306844 , -0.57178064, -1.30495735, -1.90947755,
        -1.90947755, -1.30495735, -0.57178064, -0.1306844 , -0.00467766]])
>>> (kaiser58(np.ones(10)) == x).all()
True
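The kaiser58 weights are the length-matched discrete Kaiser window with beta=8, which numpy exposes as np.kaiser. A sketch of the weighting step (returning a new array rather than modifying in place):

```python
import numpy as np

def kaiser58_weight(data):
    # Weight along the final dimension with a beta=8 Kaiser window.
    return data * np.kaiser(data.shape[-1], 8)

x = kaiser58_weight(np.ones(10))   # the window itself, since data is all ones
```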
class onyx.signalprocessing.htkmfcc.MelFilter(low, mid, high, normalize=False)

Bases: object

A Mel filter. The low, mid, and high breakpoints are given as sample indices; they may be floats with fractional values. The Mel filter is piecewise linear and (conceptually) passes through the points (low, 0), (mid, 1), (high, 0).

The Mel filter provides the positive values along those two linear segments at the integer indices between low and high. These are the indices into the vector of amplitude or energy values to which the filter will be applied. If the optional constructor argument normalize is True (default False), the weights are scaled so that they sum to one.

When an instance is called with a vector of values, it applies its weights to the part of the vector corresponding to the integer indices of the filter with positive weights, and then returns the summed value.
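The weights can be sketched directly from the breakpoints: each integer index strictly between low and high takes its weight from whichever linear segment it falls on. This is a hypothetical helper mirroring the description, not the class internals:

```python
import math

def mel_filter_weights(low, mid, high):
    # Weights on the segments (low,0)-(mid,1) and (mid,1)-(high,0),
    # sampled at the integer indices strictly between low and high.
    start = int(math.floor(low)) + 1   # first index with a positive weight
    stop = int(math.ceil(high))        # exclusive upper bound
    def weight(i):
        if i <= mid:
            return (i - low) / (mid - low)      # rising segment
        return (high - i) / (high - mid)        # falling segment
    return start, [weight(i) for i in range(start, stop)]

start, weights = mel_filter_weights(10, 14, 22)
```

For the (10, 14, 22) breakpoints this reproduces the eleven weights shown in the doctest below, starting at index 11.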

Here’s a simple example using integer breakpoints and an asymmetrical filter

>>> m = MelFilter(10, 14, 22)
>>> m(np.ones(24))
6.0

A look inside shows the weights

>>> m._filter
array([ 0.25 ,  0.5  ,  0.75 ,  1.   ,  0.875,  0.75 ,  0.625,  0.5  ,
        0.375,  0.25 ,  0.125])

And the range of indices these weights are applied to

>>> m._select
(Ellipsis, slice(11, 22, None))

Note that the resulting filter can be used on multichannel (or multi-dimensional) data. Filtering is done along the right-most dimension. The returned data has one fewer dimension than the input, dropping the rightmost dimension; i.e. output.shape == input.shape[:-1]. This has implications when multiple filters are being used and you want the combined outputs of those filters to appear as the rightmost dimension; e.g. see the implementation of MelFilterBank.__call__() below.

>>> input = np.array([np.ones(24), 2 * np.ones(24)], dtype=np.float)
>>> input.shape
(2, 24)
>>> output = m(input)
>>> output.shape
(2,)
>>> output
array([  6.,  12.])
>>> input2 = input[np.newaxis, : ,np.newaxis, :]
>>> input2.shape
(1, 2, 1, 24)
>>> output2 = m(input2)
>>> output2.shape
(1, 2, 1)

In general, the supplied indices will have fractional parts due to both the temporal sampling rate and the spectral sampling interval. For example, consider a 44,100 Hz sampled signal. Frequency components are calculated from a 1,024-point buffer of real wave data (23 msec) using the FFT. The FFT returns 1,024 (complex) values in an integer-indexed array, where the frequency corresponding to an index is (index * 44100 / 1024). If you want a Mel filter with breakpoint frequencies of 500, 560, and 625, you would specify the filter with indices of 11.6, 13, and 14.5, calculated simply by multiplying each frequency by the ratio of buffer length to sample frequency, 1024/44100.

>>> m = MelFilter(11.6, 13, 14.5)

Again, looking inside we have only three weights

>>> m._filter
array([ 0.28571429,  1.        ,  0.33333333])

And the integer indices (into the 1,024-point set of FFT outputs) corresponding to those weights: 12, 13, and 14.

>>> m._select
(Ellipsis, slice(12, 15, None))
>>> m(np.ones(15)) 
1.6190476...

A similar example, but using normalization

>>> MelFilter(11.6, 13, 14.5, normalize=True)(np.ones(15))
1.0
class onyx.signalprocessing.htkmfcc.MelFilterBank(num_filters, low_hz, high_hz, sample_rate_hz, num_unit_circle_samples, inv_warp_fn=None, normalize=False)

Bases: object

A simple Mel filterbanking object.

Returns a filterbanking object with num_filters filters. The lowest-frequency filter cuts off at low_hz and the highest-frequency filter cuts off at high_hz. The filters are triangle filters, spaced linearly on the Mel axis with 50% overlap, as per the HTK written spec.

The sample_rate_hz and num_unit_circle_samples determine the support indices of the resulting filters – for most uses num_unit_circle_samples should be the same as the length of the Discrete Fourier Transform (FFT) that’s being used.

Optional inv_warp_fn is a function that can be used to do additional warping of the Mel filters. Optional normalize (default False) can be set to True to cause the weights for each filter to sum to one.
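The described layout can be sketched by spacing num_filters + 2 edge frequencies evenly on the mel axis between low_hz and high_hz, converting each back to a fractional FFT-bin index, and taking consecutive overlapping triples as the (low, mid, high) breakpoints of the triangle filters. This is a sketch of that 50%-overlap spacing (ignoring inv_warp_fn), not the class's actual code:

```python
import math

def hz_to_mel(hz):
    # Standard HTK mel scale and its inverse.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_breakpoints(num_filters, low_hz, high_hz,
                       sample_rate_hz, num_unit_circle_samples):
    # num_filters + 2 edges, evenly spaced in mel between low_hz and high_hz.
    lo_mel, hi_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi_mel - lo_mel) / (num_filters + 1)
    bins_per_hz = float(num_unit_circle_samples) / sample_rate_hz
    edges = [mel_to_hz(lo_mel + i * step) * bins_per_hz
             for i in range(num_filters + 2)]
    # Each filter spans three consecutive edges: 50% overlap with neighbors.
    return [tuple(edges[i:i + 3]) for i in range(num_filters)]

bps = filter_breakpoints(30, 200, 3500, 44100, 1024)
```

Because the spacing is even in mel but the bin axis is linear in frequency, the high-frequency filters come out several times wider than the low-frequency ones, as the first example below observes.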

This example shows that the higher-frequency filters in this ‘telephone’ set of filters cover 4.5 times as much spectrum as the lower-frequency filters

>>> fb = MelFilterBank(30, 200, 3500, 44100, 1024)
>>> res1 = fb(np.ones(513))
>>> ref = np.array([ 1.07500422,  1.12112248,  1.33344889,  1.22044253,  1.24887204,
...                 1.50392032,  1.38542223,  1.61813593,  1.54067957,  1.76965499,
...                 1.8112514 ,  1.86211777,  1.97671068,  2.07826614,  2.16557503,
...                 2.37578273,  2.34643364,  2.5892086 ,  2.62946463,  2.79979753,
...                 2.96687913,  3.11538792,  3.2500031 ,  3.40087175,  3.62335014,
...                 3.78696251,  3.95911884,  4.16730022,  4.42668438,  4.58136463], dtype=np.float)
>>> np.allclose(ref, res1)
True

Smaller filterbank, show that normalizing works

>>> fb = MelFilterBank(10, 100, 2000, 44100, 1024, normalize=True)
>>> res2 = fb(np.ones(513))
>>> ref = np.array([ 0.99999994,  1.        ,  1.        ,  0.99999988,  1.        ,
...                 1.        ,  1.        ,  1.        ,  1.        ,  0.99999988], dtype=np.float)
>>> np.allclose(ref, res2)
True

Use on multichannel data

>>> res3 = fb(np.array([np.ones(513), 2 * np.ones(513)]))
>>> ref = np.array([[ 1.        ,  1.        ,  0.99999994,  0.99999994,  1.        ,
...                  1.        ,  1.        ,  1.00000012,  1.        ,  0.99999988],
...                [ 2.        ,  2.        ,  1.99999988,  1.99999988,  2.        ,
...                  2.        ,  2.        ,  2.00000024,  2.        ,  1.99999976]], dtype=np.float)
>>> np.allclose(ref, res3)
True

Show how error messages can lead you to construct a well-formed filter bank

>>> fb = MelFilterBank(30, 200, 7999, 16000, 1024)
>>> fb = MelFilterBank(30, 200, 8000, 16000, 1024)
Traceback (most recent call last):
  ...
ValueError: expected high_hz to be less than half the sample rate, got high_hz 8000, sample_rate_hz 16000
>>> fb = MelFilterBank(-30, None, None, None, None)
Traceback (most recent call last):
  ...
ValueError: expected num_filters to be a positive number, got -30
>>> fb = MelFilterBank(30, -30, None, None, None)
Traceback (most recent call last):
  ...
ValueError: expected low_hz to be a non-negative number, got -30
>>> fb = MelFilterBank(30, 300, -30, None, None)
Traceback (most recent call last):
  ...
ValueError: expected high_hz to be a positive number, got -30
>>> fb = MelFilterBank(30, 300, 200, None, None)
Traceback (most recent call last):
  ...
ValueError: expected high_hz to be to be greater than 300, got 200
>>> fb = MelFilterBank(30, 300, 500, -30, None)
Traceback (most recent call last):
  ...
ValueError: expected sample_rate_hz to be a positive number, got -30
>>> fb = MelFilterBank(30, 300, 500, 400, None)
Traceback (most recent call last):
  ...
ValueError: expected high_hz to be less than half the sample rate, got high_hz 500, sample_rate_hz 400
>>> fb = MelFilterBank(30, 300, 500, 8000, None)
Traceback (most recent call last):
  ...
ValueError: expected num_unit_circle_samples to be a positive number, got None
>>> fb = MelFilterBank(30, 300, 500, 8000, 128) 
Traceback (most recent call last):
  ...
ValueError: too narrow a triangular band for integer samples (4..., 4.89..., 4.99...)
>>> fb = MelFilterBank(5, 300, 500, 8000, 128)
onyx.signalprocessing.htkmfcc.MelToHz(mel)

Given the Mel frequency, mel, return the corresponding Hertz frequency. See the inverse function HzToMel().

>>> int(MelToHz(150) + 0.5)
100
>>> MelToHz(2595)
6300.0
class onyx.signalprocessing.htkmfcc.Preemphasis(alpha)

Bases: object

XXX should do the preemphasis along the last dimension!

A stateful first-difference preemphasis object using the alpha value provided to the constructor. The instance is callable and expects a Numpy ndarray as its argument. It implements the following along the first dimension of data, where data can be any shape, with the restriction that the first dimension must be two or larger on the first call following a reset():

output[n] = data[n] - alpha * data[n-1]

The data argument may be overwritten, so pass data.copy() as the argument if you need to keep the original data. The output of each call has the same shape as its input and always has dtype=numpy.float. (To implement this shape feature, the very first item in the output following a call to reset() is faked to be the same as the second item in that output.) Note that the size of the first dimension of data may change on each call to the instance; the other dimensions, and the dimensionality, may change only on the first call following a call to reset().
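The stateless core of the recurrence is one line of numpy. A sketch covering a single fresh-start call (the class additionally carries the last sample of one call over into the next):

```python
import numpy as np

def preemphasize_fresh(data, alpha):
    # output[n] = data[n] - alpha * data[n-1] along the first dimension;
    # output[0] is faked to equal output[1], as after a reset().
    data = np.asarray(data, dtype=float)
    out = np.empty_like(data)
    out[1:] = data[1:] - alpha * data[:-1]
    out[0] = out[1]
    return out

y = preemphasize_fresh(np.arange(18), 31.0 / 32)   # alpha = 0.96875
```

On the arange(18) input this reproduces the first doctest below: a faked 1.0 at position 0, then 1.0, 1.03125, ... up to 1.5.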

Examples:

Construct a preemphasis filter with an alpha with only 5 bits of precision that’s close to 0.97, a typical preemphasis alpha. As such we get easily reproduced floating point results on these tests.

>>> pre = Preemphasis(0x1f / (1 << 5))
>>> pre.alpha
0.96875

First example calls the instance with a shape (18,) ndarray:

>>> x = tuple(pre(np.arange(18)))
>>> x
(1.0, 1.0, 1.03125, 1.0625, 1.09375, 1.125, 1.15625, 1.1875, 1.21875, 1.25, 1.28125, 1.3125, 1.34375, 1.375, 1.40625, 1.4375, 1.46875, 1.5)

The second example resets the instance, then calls it with three shape (6,) ndarrays which, if concatenated, would equal the array in the first example. The outputs are the same, showing the stateful behavior:

>>> pre.reset()
>>> y = tuple(x for i in xrange(3) for x in pre(np.arange(6*i, 6*(i+1))))
>>> y == x
True

A multidimensional example:

>>> z1 = np.arange(6)
>>> z2 = np.column_stack((z1, z1 * -1))
>>> z2.shape
(6, 2)
>>> pre.reset()
>>> pre(z2)
array([[ 1.     , -1.     ],
       [ 1.     , -1.     ],
       [ 1.03125, -1.03125],
       [ 1.0625 , -1.0625 ],
       [ 1.09375, -1.09375],
       [ 1.125  , -1.125  ]])

XXX example using 1-d alpha array

A brief aside, hinting at floating point’s subtleties and pitfalls:

Our alpha has 5 bits (the highest order bit being implicit in the binary representation).

>>> floatutils.float_to_readable_string(pre.alpha)
'+(-0001)0xf000000000000'

Here’s an 8-bit value very close to 0.98, another common alpha for preemphasis:

>>> x = 0xfb / (1 << 8)
>>> x
0.98046875
>>> floatutils.float_to_readable_string(x)
'+(-0001)0xf600000000000'

Note that 0.97 and 0.98 are poor choices for reproducibility because their floating-point approximations have full precision:

>>> floatutils.float_to_readable_string(0.97)
'+(-0001)0xf0a3d70a3d70a'
>>> floatutils.float_to_readable_string(0.98)
'+(-0001)0xf5c28f5c28f5c'
alpha
reset()
class onyx.signalprocessing.htkmfcc.Selector

Bases: object

The single instance, selector, is used to return the object used as an index. Helpful for capturing fancy-indexing constants.

The module attribute, selector, is an instance of Selector, intended to be used for all such selecting work.

>>> selector[3:4]
slice(3, 4, None)
>>> selector[2, ..., 3:4, np.newaxis, :, :-1, ::]
(2, Ellipsis, slice(3, 4, None), None, slice(None, None, None), slice(None, -1, None), slice(None, None, None))
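The whole trick fits in one method; a sketch:

```python
class SelectorSketch(object):
    # __getitem__ hands back whatever index object Python constructed
    # from the subscript syntax, making fancy-index constants capturable.
    def __getitem__(self, index):
        return index

sel = SelectorSketch()
captured = sel[3:4]   # a reusable slice object
```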
onyx.signalprocessing.htkmfcc.do_fake_processing(processor)
onyx.signalprocessing.htkmfcc.fake_normal_data(shape, seed=None)

Given the numpy shape, return a callable, gen, that generates random data with that shape. If seed is given and is not None, the results from successive calls to gen are reproducible whenever the same seed is used.

>>> shape = 2, 3
>>> seed = 0
>>> g = fake_normal_data(shape, seed)
>>> g0 = g()
>>> g0
array([[ 1.76405235,  0.40015721,  0.97873798],
       [ 2.2408932 ,  1.86755799, -0.97727788]])
>>> g()
array([[ 0.95008842, -0.15135721, -0.10321885],
       [ 0.4105985 ,  0.14404357,  1.45427351]])
>>> (fake_normal_data(shape, seed)() == g0).all()
True
onyx.signalprocessing.htkmfcc.get_power_of_two_roundup(x)

Round up to the smallest power of two that is equal to or greater than x, where 1 is considered the smallest possible power of two.

The indexable module attribute, power_of_two_roundup, caches the values.

>>> get_power_of_two_roundup(1023)
1024
>>> power_of_two_roundup[1024]
1024
>>> power_of_two_roundup[0]
1
>>> tuple(power_of_two_roundup[x] for x in xrange(34))
(1, 1, 2, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 64)
>>> tuple(power_of_two_roundup[x] for x in (31, 32, 33))
(32, 32, 64)
>>> tuple(power_of_two_roundup[x/4] for x in xrange(18))
(1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 8)
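The round-up itself can be sketched with integer bit length, which avoids the floating-point log misrounding that can occur at exact powers of two; the indexable caching wrapper is a separate concern:

```python
import math

def power_of_two_roundup_sketch(x):
    # Smallest power of two >= x; anything at or below 1 rounds up to 1.
    if x <= 1:
        return 1
    n = int(math.ceil(x))          # fractional inputs round up first
    return 1 << (n - 1).bit_length()
```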
onyx.signalprocessing.htkmfcc.htkmfcc()
onyx.signalprocessing.htkmfcc.htkparams2ndarray(stream)

Parse stream, a file-like object with a read() method, for an “HTK Format Parameter File” object.

Return a tuple of (parameters, sample_period_100ns, target_kind, target_modifiers), where parameters is a two-dimensional ndarray of float values, sample_period_100ns is the sample period of the parameters in units of 100 nanoseconds, target_kind is a string giving the HTK basic target, and target_modifiers is a possibly-empty sequence of one-character strings giving the HTK target modifiers. The shape of parameters is (num_samples, sample_dim).

See also: ndarray2htkparams().
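The fixed part of the format is a 12-byte big-endian header, struct format '>iihh': num_samples, the sample period in 100 ns units, the sample size in bytes, and a parameter-kind code. The field names here follow the DebugPrint trace in the example below; the helper itself is hypothetical:

```python
import struct

def parse_htk_header(raw):
    # 12-byte HTK parameter-file header: two big-endian int32s, two int16s.
    num_samples, sample_period_100ns, sample_size, param_kind = \
        struct.unpack('>iihh', raw[:12])
    return num_samples, sample_period_100ns, sample_size, param_kind

# Round-trip the header values shown in the trace below (0o20007 is the
# octal param_kind code for FBANK_0).
header = struct.pack('>iihh', 40, 500, 200, 0o20007)
```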

>>> array0 = np.arange(2000, dtype=np.float32).reshape((-1, 50))
>>> array0.shape
(40, 50)
>>> str(array0.dtype)
'float32'
>>> array0[0][0], array0[-1][-1]
(0.0, 1999.0)
>>> binary = ndarray2htkparams(array0, 500, 'FBANK', '0')
>>> array1, sample_rate_100ns, target, mods = htkparams2ndarray(StringIO(binary))
>>> sample_rate_100ns
500
>>> target
'FBANK'
>>> mods
['0']
>>> array1.shape
(40, 50)
>>> str(array1.dtype)
'float32'
>>> array1[0][0], array1[-1][-1]
(0.0, 1999.0)
>>> with debugprint.DebugPrint('htk_format'):
...   array1, sample_rate_100ns, target, mods = htkparams2ndarray(StringIO(binary))
htk_format: htkparams2ndarray(): header_format >iihh
htk_format: htkparams2ndarray(): header_size 12
htk_format: htkparams2ndarray(): num_samples 40
htk_format: htkparams2ndarray(): sample_period_100ns 500
htk_format: htkparams2ndarray(): sample_size 200
htk_format: htkparams2ndarray(): param_kind 020007
htk_format: htkparams2ndarray(): data_size 8000
htk_format: htkparams2ndarray(): file_size 8012
htk_format: htkparams2ndarray(): basic FBANK 020007 07
htk_format: htkparams2ndarray(): target FBANK_0
onyx.signalprocessing.htkmfcc.make_check_nan_hook(label)
onyx.signalprocessing.htkmfcc.make_dct(length)

Return a 2-D ndarray for calculating Discrete Cosine Transform coefficients on vectors with length elements.

The module attribute, dct, caches the ndarrays it’s seen, and is suitable for all DCT work.

>>> make_dct(1)
array([[ 1.41421356]])
>>> res4 = make_dct(4)
>>> ref = np.array([[ 0.70710677,  0.70710677,  0.70710677,  0.70710677],
...                [ 0.65328145,  0.27059805, -0.27059811, -0.65328151],
...                [ 0.49999997, -0.49999997, -0.49999991,  0.50000018],
...                [ 0.27059805, -0.65328145,  0.65328151, -0.27059838]], dtype=np.float)
>>> np.allclose(ref, res4)
True

Using the module’s dct attribute

>>> dct[1]
array([[ 1.41421356]])
>>> dct[1] is dct[1]
True
>>> dct[12] is dct[12]
True
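The matrix entries follow the DCT-II convention HTK uses, dct[k][n] = sqrt(2/N) * cos(pi * k * (n + 0.5) / N); a sketch:

```python
import numpy as np

def make_dct_sketch(length):
    # dct[k, n] = sqrt(2/N) * cos(pi * k * (n + 0.5) / N)
    n = np.arange(length)
    k = n[:, np.newaxis]
    return np.sqrt(2.0 / length) * np.cos(np.pi * k * (n + 0.5) / length)

m4 = make_dct_sketch(4)
```

Row 0 is the constant sqrt(2/N), which is why make_dct(1) is sqrt(2) and the first row of the 4-point matrix is 0.7071... throughout.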
onyx.signalprocessing.htkmfcc.make_fft_abs_processor(sample_nsec, frame_nsec, window_nsec, max_channels=1, channel_index=0, dither=0, preemcoef=0, zmeansource=False, usehamming=False, doublefft=False)

Builds a processor for the windowing and FFT part of a typical speech front end. Returns a tuple of samples_per_sec, fft_size, and the processor.

Numpy ndarrays of real wave data should be fed in with shape (num_channels, num_samples), where num_samples >= 2 but is otherwise unconstrained. The channel_index argument selects which channel is used. The processor will push ndarrays (of length fft_size // 2 + 1) of the magnitudes of the spectrum (from 0 Hz to sample_frequency / 2 inclusive). The bin-to-hertz factor is simply samples_per_sec / fft_size.

>>> sample_nsec =    22676  # 44100 Hz
>>> frame_nsec =  10000000  # 10 msec
>>> window_nsec = 25600000  # 25.6 msec
>>> with debugprint.DebugPrint('frontend'):
...   samples_per_sec, fft_size, processor = make_fft_abs_processor(sample_nsec, frame_nsec, window_nsec,
...                                                                 zmeansource=True, usehamming=True)
frontend: sample_nsec 22676  frame_nsec 10000000  window_nsec 25600000
frontend: samples_per_sec 44100  slide_samples 441  window_samples 1129
frontend: fft_length 2048
>>> #processor.graph.dot_display(globals=('rankdir=LR',))
>>> samples_per_sec, fft_size
(44100, 2048)
onyx.signalprocessing.htkmfcc.make_htk_chain_processor(htk_specs, sendee=None, sending=True, bypass_types=())

Make an HTK signal processing chain.

The htk_specs argument is a dictionary of HTK signal processing configuration. sendee, sending, and bypass_types are the standard initializer arguments for a Processor.

Returns a ChainProcessor that implements the signal processing for htk_specs.

The ChainProcessor has an attrdict attribute, details, that contains details about the configuration of the chain.

>>> bypass_types = tuple, singleton.Singleton
>>> specs = dict((('ENORMALISE', False), ('SOURCEKIND', 'WAVEFORM'), ('STEREOMODE', 'LEFT'), ('SOURCERATE', 226.757), ('SAVEWITHCRC', False), ('TARGETRATE', 100000.0), ('LOFREQ', 250), ('HIFREQ', 3500), ('ADDDITHER', 0.0001)))
>>> specs['TARGETKIND'] = 'MFCC'
>>> chain00 = make_htk_chain_processor(specs, bypass_types=bypass_types)
>>> do_fake_processing(chain00)  
details:
  basic_target   MFCC
  ceplifter   22
  dither   0.0001
  do_0   False
  do_cep   True
  do_log   True
  fft_length   2048
  hifreq   3500
  lofreq   250
  max_chans   2
  nsec_per_source_sample   22676
  nsec_per_target_sample   10000000
  nsec_per_window   25600000
  numceps   12
  numchans   20
  preemcoef   0.97
  samples_per_sec   44100
  slide_samples   441
  stereoindex   0
  stereomode   LEFT
  target_mods   frozenset([])
  targetkind   MFCC
  usehamming   True
  usepower   False
  window_samples   1129
  zmeansource   False
nsec_per_source_sample 22676  nsec_per_target_sample 10000000  nsec_per_window 25600000
samples_per_sec 44100  slide_samples 441  window_samples 1129
fft_length 2048
numchans 20  lofreq 250  hifreq 3500
len(chain) 11
sample_count 100000  sample_count / slide_samples 226
len(res) 225
len(res[0]) 12  len(res[-1]) 12
res[0].min() -12.275...  res[-1].min() -13.133...
res[0].max() 5.800...  res[-1].max() 4.976...
res.T[0].min() -14.429...  res.T[-1].min() -6.989...
res.T[0].max() -9.578...  res.T[-1].max() 9.754...
elapsed time ... seconds
>>> specs['TARGETKIND'] = 'MFCC_0'
>>> chain01 = make_htk_chain_processor(specs, bypass_types=bypass_types)  
>>> do_fake_processing(chain01)  
details:
  basic_target   MFCC
  ceplifter   22
  dither   0.0001
  do_0   True
  do_cep   True
  do_log   True
  fft_length   2048
  hifreq   3500
  lofreq   250
  max_chans   2
  nsec_per_source_sample   22676
  nsec_per_target_sample   10000000
  nsec_per_window   25600000
  numceps   12
  numchans   20
  preemcoef   0.97
  samples_per_sec   44100
  slide_samples   441
  stereoindex   0
  stereomode   LEFT
  target_mods   frozenset(['0'])
  targetkind   MFCC_0
  usehamming   True
  usepower   False
  window_samples   1129
  zmeansource   False
nsec_per_source_sample 22676  nsec_per_target_sample 10000000  nsec_per_window 25600000
samples_per_sec 44100  slide_samples 441  window_samples 1129
fft_length 2048
numchans 20  lofreq 250  hifreq 3500
len(chain) 11
sample_count 100000  sample_count / slide_samples 226
len(res) 225
len(res[0]) 12  len(res[-1]) 12
res[0].min() -12.275...  res[-1].min() -13.133...
res[0].max() 215.856...  res[-1].max() 230.777...
res.T[0].min() -14.429...  res.T[-1].min() 207.305...
res.T[0].max() -9.578...  res.T[-1].max() 241.468...
elapsed time ... seconds
>>> specs['TARGETKIND'] = 'MELSPEC'
>>> chain02 = make_htk_chain_processor(specs, bypass_types=bypass_types)  
>>> do_fake_processing(chain02)  
details:
  basic_target   MELSPEC
  dither   0.0001
  do_cep   False
  do_log   False
  fft_length   2048
  hifreq   3500
  lofreq   250
  max_chans   2
  nsec_per_source_sample   22676
  nsec_per_target_sample   10000000
  nsec_per_window   25600000
  numchans   20
  preemcoef   0.97
  samples_per_sec   44100
  slide_samples   441
  stereoindex   0
  stereomode   LEFT
  target_mods   frozenset([])
  targetkind   MELSPEC
  usehamming   True
  usepower   False
  window_samples   1129
  zmeansource   False
nsec_per_source_sample 22676  nsec_per_target_sample 10000000  nsec_per_window 25600000
samples_per_sec 44100  slide_samples 441  window_samples 1129
fft_length 2048
numchans 20  lofreq 250  hifreq 3500
len(chain) 7
sample_count 100000  sample_count / slide_samples 226
len(res) 225
len(res[0]) 20  len(res[-1]) 20
res[0].min() 2.735...  res[-1].min() 2.186...
res[0].max() 98.621...  res[-1].max() 137.300...
res.T[0].min() 1.101...  res.T[-1].min() 54.749...
res.T[0].max() 8.554...  res.T[-1].max() 195.682...
elapsed time ... seconds
>>> specs['TARGETKIND'] = 'FBANK'
>>> chain03 = make_htk_chain_processor(specs, bypass_types=bypass_types)  
>>> do_fake_processing(chain03)  
details:
  basic_target   FBANK
  dither   0.0001
  do_cep   False
  do_log   True
  fft_length   2048
  hifreq   3500
  lofreq   250
  max_chans   2
  nsec_per_source_sample   22676
  nsec_per_target_sample   10000000
  nsec_per_window   25600000
  numchans   20
  preemcoef   0.97
  samples_per_sec   44100
  slide_samples   441
  stereoindex   0
  stereomode   LEFT
  target_mods   frozenset([])
  targetkind   FBANK
  usehamming   True
  usepower   False
  window_samples   1129
  zmeansource   False
nsec_per_source_sample 22676  nsec_per_target_sample 10000000  nsec_per_window 25600000
samples_per_sec 44100  slide_samples 441  window_samples 1129
fft_length 2048
numchans 20  lofreq 250  hifreq 3500
len(chain) 9
sample_count 100000  sample_count / slide_samples 226
len(res) 225
len(res[0]) 20  len(res[-1]) 20
res[0].min() 1.006...  res[-1].min() 0.782...
res[0].max() 4.591...  res[-1].max() 4.922...
res.T[0].min() 0.0965...  res.T[-1].min() 4.002...
res.T[0].max() 2.146...  res.T[-1].max() 5.276...
elapsed time ... seconds
>>> specs0 = parse_htk_specs(StringIO(_specs0_str))
>>> specs0['SOURCERATE'] = 226.757  # 44100 Hz
>>> chain0 = make_htk_chain_processor(specs0, bypass_types=bypass_types)  
>>> do_fake_processing(chain0)  
details:
  basic_target   MFCC
  ceplifter   22
  dither   0.0
  do_0   True
  do_cep   True
  do_log   True
  fft_length   2048
  hifreq   3800
  lofreq   125
  max_chans   1
  nsec_per_source_sample   22676
  nsec_per_target_sample   10000000
  nsec_per_window   25000000
  numceps   12
  numchans   24
  preemcoef   0.97
  samples_per_sec   44100
  slide_samples   441
  stereoindex   0
  stereomode   None
  target_mods   frozenset(['0'])
  targetkind   MFCC_0
  usehamming   True
  usepower   True
  window_samples   1102
  zmeansource   True
nsec_per_source_sample 22676  nsec_per_target_sample 10000000  nsec_per_window 25000000
samples_per_sec 44100  slide_samples 441  window_samples 1102
fft_length 2048
numchans 24  lofreq 125  hifreq 3800
len(chain) 12
sample_count 100000  sample_count / slide_samples 226
len(res) 225
len(res[0]) 12  len(res[-1]) 12
res[0].min() -25.695...  res[-1].min() -26.635...
res[0].max() 316.909...  res[-1].max() 345.476...
res.T[0].min() -28.808...  res.T[-1].min() 299.286...
res.T[0].max() -18.624...  res.T[-1].max() 367.724...
elapsed time ... seconds
>>> #chain0.processor.graph.dot_display(globals=('rankdir=LR',))
>>> sp1 = make_htk_chain_processor(_specs1)
Traceback (most recent call last):
   ...
ValueError: unimplemented TARGETKIND value: PLP
>>> make_htk_chain_processor(dict((('ZMEANSOURCE', True), ('Foobar', None))))
Traceback (most recent call last):
   ...
ValueError: unexpected configuration parameter: Foobar
>>> make_htk_chain_processor(dict((('ZMEANSOURCE', True), ('Foobar', None), ('Baz', True))))
Traceback (most recent call last):
   ...
ValueError: unexpected configuration parameters: Baz Foobar
>>> make_htk_chain_processor(dict((('ENORMALISE', False), ('SOURCEKIND', 'WAVEFORM'), ('SAVEWITHCRC', False),)))
Traceback (most recent call last):
   ...
ValueError: unexpected empty value for TARGETKIND parameter: None
>>> make_htk_chain_processor(dict((('ENORMALISE', False), ('SOURCEKIND', 'WAVEFORM'), ('SAVEWITHCRC', False), ('TARGETKIND', 'MFCC_D_S'))))
Traceback (most recent call last):
   ...
ValueError: unexpected TARGETKIND value: S
>>> make_htk_chain_processor(dict((('ENORMALISE', False), ('SOURCEKIND', 'WAVEFORM'), ('SAVEWITHCRC', False), ('TARGETKIND', 'MFCC_D_S_YY'))))
Traceback (most recent call last):
   ...
ValueError: unexpected TARGETKIND values: S YY
onyx.signalprocessing.htkmfcc.make_lifter(numchans, lifter)

Return an array of liftering factors for vectors with numchans items, using lifter as the liftering coefficient. If lifter is 1, the vector of factors is essentially all ones. Otherwise, lifter should be an integer, typically between 1.5 and 2.0 times numchans.

The indexable module attribute, lifter, caches the vectors.

>>> np.allclose(make_lifter(10, 1), np.ones(10))
True
>>> ref = np.array([  2.5654633 ,   4.09905815,   5.5695653 ,   6.94704914,
...                  8.20346832,   9.31324577,  10.25378895,  11.00595188,
...                 11.55442238,  11.88803577,  12.        ,  11.88803577], dtype=np.float)
>>> np.allclose(make_lifter(12, 22), ref)
True
>>> np.allclose(lifter[10, 1], np.ones(10))
True
>>> np.allclose(lifter[12, 22], ref)
True
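
The factors follow HTK's liftering formula, 1 + (lifter / 2) * sin(pi * n / lifter) for n = 1 .. numchans. A minimal NumPy sketch of that formula (make_lifter_sketch is a hypothetical name, not the library function, and it omits the caching):

```python
import numpy as np

def make_lifter_sketch(numchans, lifter):
    # HTK liftering weights: w[n] = 1 + (L/2) * sin(pi * n / L), n = 1..numchans.
    # For lifter == 1 the sine terms are sin(pi * n), numerically ~0, so the
    # factors are essentially all ones.
    n = np.arange(1, numchans + 1)
    return 1.0 + (lifter / 2.0) * np.sin(np.pi * n / lifter)
```

For numchans=12 and lifter=22 this reproduces the reference vector above: the first factor is about 2.565 and the peak of 12.0 falls at the eleventh position, where sin reaches pi/2.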
onyx.signalprocessing.htkmfcc.make_mel_specs(num_filters, low_hz, high_hz, inv_warp_fn=None, round_to_int=True)

Return a sequence of the break-point frequencies (in Hertz) for a Mel filterbank. Each filter is specified by three parameters, two of which are shared with the parameters for the neighboring filters, so num_filters + 2 values are returned. The filters overlap 50%, such that the low-cutoff of one filter is the center of the preceding filter, and similarly for the high-cutoff. The filters are spaced evenly on the Mel scale, so they are non-linearly spaced on the Hz scale. Note that responses are intended to be zero at both low_hz and high_hz. If optional round_to_int (default True) is True, then the returned break-point frequency values are rounded to integers; otherwise floats are returned.

Specs for a typical set of filters.

>>> num_mel = 31
>>> specs = make_mel_specs(num_mel, 200, 3500)
>>> len(specs) == num_mel + 2
True
>>> specs
(200, 244, 291, 340, 391, 445, 501, 561, 623, 688, 756, 828, 904, 983, 1066, 1153, 1244, 1340, 1441, 1546, 1657, 1773, 1895, 2023, 2158, 2299, 2446, 2602, 2764, 2935, 3114, 3303, 3500)

Show that the bandwidth is uniform in the Mel space.

>>> spec_hzs = make_mel_specs(3, 200, 3500)
>>> mels = tuple(int(HzToMel(spec_hz)+0.5) for spec_hz in spec_hzs)
>>> mels
(283, 717, 1151, 1585, 2019)
>>> mels[2] - mels[0]
868
>>> mels[2]-mels[0] == mels[3]-mels[1] == mels[4]-mels[2]
True
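
The break points come from spacing num_filters + 2 values evenly on the Mel scale, using HTK's mapping Mel = 2595 * log10(1 + Hz / 700), then mapping back to Hertz. A hedged sketch of that construction (the names are illustrative, not the library's implementation, and the inv_warp_fn hook is omitted):

```python
import math

def hz_to_mel(hz):
    # HTK's Mel scale: Mel = 2595 * log10(1 + Hz / 700)
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def make_mel_specs_sketch(num_filters, low_hz, high_hz, round_to_int=True):
    # Place num_filters + 2 break points evenly on the Mel scale, then map
    # each break point back to Hertz.
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    hzs = (mel_to_hz(low_mel + i * step) for i in range(num_filters + 2))
    if round_to_int:
        return tuple(int(hz + 0.5) for hz in hzs)
    return tuple(hzs)
```

Under these assumptions the sketch reproduces the break points shown above, e.g. 200, 244, 291, 340, ... for 31 filters between 200 Hz and 3500 Hz.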
onyx.signalprocessing.htkmfcc.make_melcepstral_processor(samples_per_sec, fft_length, numchans, low_cutoff, high_cutoff, numceps, c0=False, ceplifter=0, inv_warp_fn=None)

Make and return a Mel-Cepstral processor. The resulting processor expects audio input in the form of a Numpy ndarray with dimension 2 and shape (nchannels, nsamples). It will pass any tuple or Singleton through directly.

>>> melcep = make_melcepstral_processor(samples_per_sec=44100, fft_length=2048, numchans=24, low_cutoff=300, high_cutoff=3500, numceps=12, c0=True, ceplifter=22)
>>> #melcep.graph.dot_display(globals=('rankdir=LR',))
>>> collector = list()
>>> melcep.set_sendee(collector.append)
>>> melcep.send(())
>>> melcep.send((1,2))
>>> melcep.send(('a', 'b'))
>>> collector
[(), (1, 2), ('a', 'b')]
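
The cepstral stage of such a processor is conventionally a DCT-II of the log Mel filterbank energies, c[i] = sqrt(2/N) * sum_j m[j] * cos(pi * i * (j - 0.5) / N), as in HTK. A sketch of that single step under those assumptions (mel_to_cepstra_sketch is a hypothetical name, not the processor's code):

```python
import numpy as np

def mel_to_cepstra_sketch(log_mel, numceps):
    # HTK-style DCT-II: c[i] = sqrt(2/N) * sum_j m[j] * cos(pi * i * (j - 0.5) / N)
    # for i = 1..numceps, applied to one frame of log filterbank energies m.
    log_mel = np.asarray(log_mel, dtype=float)
    N = len(log_mel)
    j = np.arange(1, N + 1)
    scale = np.sqrt(2.0 / N)
    return np.array([scale * np.dot(log_mel, np.cos(np.pi * i * (j - 0.5) / N))
                     for i in range(1, numceps + 1)])
```

A flat (constant) filterbank frame has no spectral shape, so all the cepstral coefficients it produces are essentially zero.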
onyx.signalprocessing.htkmfcc.make_triangle_function(left, mid, right)

Returns a function that calculates values of a rising-then-falling triangular function. The rising line of the triangle goes through points (left, 0) and (mid, 1), and the falling line of the triangle goes through points (mid, 1) and (right, 0). Within the support domain, (left, right), the function is non-negative, and outside the support domain the function is negative. This formulation is useful for building some simple discrete weighting functions, e.g. overlapping triangular Mel filters.

>>> f = make_triangle_function(-2, 0, 2)
>>> tuple(f(x) for x in xrange(-3, 4)) 
(-0.5, 0.0, 0.5, 1.0, 0.5, ...0.0, -0.5)
>>> f = make_triangle_function(100.25, 141.8, 200.5)
>>> tuple((x, int(100*f(x))) for x in xrange(90, 220, 10))
((90, -24), (100, 0), (110, 23), (120, 47), (130, 71), (140, 95), (150, 86), (160, 68), (170, 51), (180, 34), (190, 17), (200, 0), (210, -16))
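
A minimal sketch of one way to get this behavior: take the minimum of the two lines, which traces the triangle inside (left, right) and goes negative outside it (the name is hypothetical; this is not necessarily the library's implementation):

```python
def make_triangle_function_sketch(left, mid, right):
    # Rising line through (left, 0) and (mid, 1); falling line through
    # (mid, 1) and (right, 0). Their pointwise minimum is the triangle on
    # (left, right) and is negative outside that support, matching the
    # documented behavior.
    rise = 1.0 / (mid - left)
    fall = 1.0 / (right - mid)
    return lambda x: min((x - left) * rise, (right - x) * fall)
```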
onyx.signalprocessing.htkmfcc.nd_io(f)

A function wrapper that applies our semantics for ndarray types and overwriting: all DSP data is float, a called function takes ownership of its arguments, and a called function has no ownership of the value it returns. Only the first argument gets this treatment; the remaining arguments are handed to f unchanged.

onyx.signalprocessing.htkmfcc.nd_io2(f)

A method wrapper (it has a self argument) that applies our semantics for ndarray types and overwriting: all DSP data is float, a called function takes ownership of its arguments, and a called function has no ownership of the value it returns.
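
A hedged sketch of what such a wrapper might look like for the function case: coerce the first argument to a float ndarray and pass the rest through (nd_io_sketch is a hypothetical name; this is not the library's implementation and it omits the overwriting bookkeeping):

```python
import functools
import numpy as np

def nd_io_sketch(f):
    # Coerce the first argument to a float ndarray before calling f; the
    # remaining arguments pass through untouched.
    @functools.wraps(f)
    def wrapper(arg0, *rest):
        return f(np.asarray(arg0, dtype=float), *rest)
    return wrapper
```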

onyx.signalprocessing.htkmfcc.ndarray2htkparams(array, sample_period_100ns, target='USER', modifiers=(), stream=None)

Format the two-dimensional numpy array into “HTK Format Parameter File” format. sample_period_100ns is the sample period in units of 100 nanoseconds (the HTK standard). Optional target, default ‘USER’, is the HTK basic-parameter-kind, and optional modifiers is an iterable of HTK parameter-kind-modifiers. The array must be two-dimensional, with shape (num_samples, sample_dim), and must consist of float data.

See the HTK Book for more details about the format of the HTK Format Parameter File.

If optional stream is given and is not None, it should be a stream-like object with a write() method; the bytes of the formatted HTK Format Parameter File are written to it, and None is returned. Otherwise, a string of the bytes of the formatted HTK file is returned.

See also htkparams2ndarray().
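
The debug output in the doctests below shows the file layout: a 12-byte big-endian '>iihh' header followed by big-endian float32 samples. A sketch of just the header packing, using the HTK Book's octal parameter-kind codes (the names here are illustrative, not the library's):

```python
import struct

# HTK basic parameter-kind codes and qualifier bit flags (octal), per the HTK Book.
HTK_KINDS = {'WAVEFORM': 0, 'LPC': 1, 'LPREFC': 2, 'LPCEPSTRA': 3, 'LPDELCEP': 4,
             'IREFC': 5, 'MFCC': 6, 'FBANK': 7, 'MELSPEC': 8, 'USER': 9,
             'DISCRETE': 10, 'PLP': 11}
HTK_QUALIFIERS = {'E': 0o100, 'N': 0o200, 'D': 0o400, 'A': 0o1000, 'Z': 0o4000,
                  'K': 0o10000, '0': 0o20000, 'V': 0o40000, 'T': 0o100000}

def htk_header_sketch(num_samples, sample_period_100ns, sample_dim, target, modifiers=()):
    # 12-byte big-endian header ('>iihh'): sample count, sample period in
    # 100 ns units, bytes per sample (4-byte floats), and the parameter kind
    # with the qualifier bits OR-ed in.
    param_kind = HTK_KINDS[target]
    for modifier in modifiers:
        param_kind |= HTK_QUALIFIERS[modifier]
    return struct.pack('>iihh', num_samples, sample_period_100ns,
                       4 * sample_dim, param_kind)
```

For MFCC with modifiers ('D', '0') this yields param_kind 020406 octal, matching the debug trace below.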

>>> array = np.arange(2000, dtype=np.float32).reshape((-1, 50))
>>> binary2 = ndarray2htkparams(array, 500, 'FBANK', ('D', '0'))
>>> hashlib.md5(binary2).hexdigest()
'f2d474c99d62c332ee57d3d5a039f33f'
>>> stream = StringIO()
>>> ndarray2htkparams(array, 500, target='FBANK', modifiers=('D', '0'), stream=stream)
>>> stream.getvalue() == binary2
True
>>> stream.close()
>>> with debugprint.DebugPrint('htk_format'):
...   binary = ndarray2htkparams(np.arange(20, dtype=np.float32).reshape((2, -1)), 500, 'MFCC', ('D', '0'))
htk_format: ndarray2htkparams(): num_samples 2
htk_format: ndarray2htkparams(): sample_period_100ns 500
htk_format: ndarray2htkparams(): sample_dim 10
htk_format: ndarray2htkparams(): sample_size 40
htk_format: ndarray2htkparams(): param_kind 020406
htk_format: ndarray2htkparams(): header_format '>iihh'
htk_format: ndarray2htkparams(): sample_format '>ffffffffff'
htk_format: ndarray2htkparams(): file_size 92
>>> len(binary)
92
>>> hashlib.md5(binary).hexdigest()
'd99cef2253157186e50122bdc0ddb788'

Error cases

>>> ndarray2htkparams(None, None)
Traceback (most recent call last):
  ...
TypeError: expected 'ndarray', got 'NoneType'
>>> ndarray2htkparams(np.array(()), None)
Traceback (most recent call last):
  ...
ValueError: expected a two-dimensional ndarray, got ndarray with shape (0,)
>>> ndarray2htkparams(np.array((), dtype=np.float32).reshape((0,0)), None)
Traceback (most recent call last):
  ...
TypeError: expected sample_period_100ns to be 'int', got 'NoneType'
>>> ndarray2htkparams(np.array((), dtype=np.float32).reshape((0,0)), 100, 'foo')
Traceback (most recent call last):
  ...
ValueError: unexpected target 'foo', expected one of: ANON DISCRETE FBANK IREFC LPC LPCEPSTRA LPDELCEP LPREFC MELSPEC MFCC PLP USER WAVEFORM
>>> ndarray2htkparams(np.array((), dtype=np.float32).reshape((0,0)), 100, 'MFCC', ('0', 'X'))
Traceback (most recent call last):
  ...
ValueError: unexpected modifiers 'X'; expected one or more of: '0' 'A' 'D' 'E' 'K' 'N' 'T' 'V' 'Z'
onyx.signalprocessing.htkmfcc.parse_htk_specs(iterable, comment_prefix='#')

Parse simple text specs. This does not attempt to follow HTK’s syntax conventions very far.

>>> specs = parse_htk_specs(StringIO(_specs0_str))
>>> for key in sorted(specs.keys()): print key, '=', specs[key]
CEPLIFTER = 22
ENORMALISE = False
HIFREQ = 3800
LOFREQ = 125
NUMCEPS = 12
NUMCHANS = 24
PREEMCOEF = 0.97
SAVECOMPRESSED = False
SAVEWITHCRC = False
SOURCEKIND = WAVEFORM
TARGETKIND = MFCC_0
TARGETRATE = 100000
USEHAMMING = True
USEPOWER = True
WINDOWSIZE = 250000.0
ZMEANSOURCE = True
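
A minimal sketch of a parser with these conventions, inferred from the output above: strip comments, split on '=', map T/F to booleans, and coerce numeric-looking values (parse_htk_specs_sketch is illustrative, not the library's parser):

```python
def parse_htk_specs_sketch(iterable, comment_prefix='#'):
    # Parse 'NAME = VALUE' lines: T/F become booleans, numeric-looking values
    # are coerced to int or float, everything else stays a string.
    specs = {}
    for line in iterable:
        line = line.split(comment_prefix, 1)[0].strip()
        if not line:
            continue
        key, _, value = (part.strip() for part in line.partition('='))
        if value in ('T', 'TRUE'):
            value = True
        elif value in ('F', 'FALSE'):
            value = False
        else:
            for caster in (int, float):
                try:
                    value = caster(value)
                    break
                except ValueError:
                    pass
        specs[key] = value
    return specs
```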
onyx.signalprocessing.htkmfcc.remove_dc(arg0, *rest)

Removes the mean (DC) component of data, where the mean is computed across the last axis. Usually modifies the data in place. Returns the modified data object.

>>> w = np.arange(0,20,2)
>>> w
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
>>> remove_dc(w)
array([-9., -7., -5., -3., -1.,  1.,  3.,  5.,  7.,  9.])
>>> x = np.arange(20)
>>> x.shape = 2, 10
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> remove_dc(x)
array([[-4.5, -3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5,  4.5],
       [-4.5, -3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5,  4.5]])
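
The operation amounts to subtracting a keepdims mean along the last axis. A NumPy sketch under that assumption (the name is hypothetical; integer input is first copied to float, so in-place modification applies to the float array):

```python
import numpy as np

def remove_dc_sketch(data):
    # Subtract the per-row mean (computed across the last axis) in place;
    # keepdims=True makes the mean broadcast back over the last axis.
    data = np.asarray(data, dtype=float)
    data -= data.mean(axis=-1, keepdims=True)
    return data
```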