onyx.am.gaussian – Simple Gaussian models and mixtures

This module provides the classes GaussianModelBase, DummyModel, GaussianMixtureModel, and GmmAccumSet, along with the factory function make_gmmA.
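The doctests on this page assume that NumPy is imported as np and that float_to_readable_string is in scope. A plausible setup would be the following (the onyx.util.floatutils location of float_to_readable_string is an assumption, not something this page states):

>>> import numpy as np
>>> from onyx.am.gaussian import (GaussianModelBase, DummyModel,
...     GaussianMixtureModel, GmmAccumSet, make_gmmA)
>>> from onyx.util.floatutils import float_to_readable_string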

class onyx.am.gaussian.DummyModel(dimension, value=0.0)

Bases: onyx.am.gaussian.GaussianModelBase

DummyModel - return a constant score

>>> dm = DummyModel(3)
>>> dm2 = DummyModel(3)
>>> dm == dm2
True
>>> x = np.array([0, 0, 0])
>>> dm.score(x)
0.0
>>> dm.set_value(3.141)
>>> dm == dm2
False
>>> dm.score(x)
3.141
>>> dm.score_components(x)
array([ 3.141])
add_adaptation_data(data)
begin_adapting()
copy()
covariance_type
dimension
end_adapting()
num_components
score(x)
score_components(x)

Return a numpy vector of weight * likelihood products.

static seed(seed=None)

Use a seed other than None for reproducible randomness in the sample() function.

set_means(m)
set_value(v)
set_vars(v)
set_weights(w)
class onyx.am.gaussian.GaussianMixtureModel(dimension, covariance_type, num_components, covar_range=None)

Bases: onyx.am.gaussian.GaussianModelBase

Gaussian mixture models

>>> m0 = GaussianMixtureModel(3, GaussianModelBase.DIAGONAL_COVARIANCE, 2)
>>> m1 = GaussianMixtureModel(3, GaussianModelBase.DIAGONAL_COVARIANCE, 2)
>>> m0.set_weights(np.array((0.75, 0.25)))
>>> m1.set_weights(np.array((0.75, 0.25)))
>>> mu = np.array(((1, 1, 1), (3, 3, 3)))
>>> m0.set_means(mu)
>>> m1.set_means(mu)
>>> v = np.array(((1, 1, 1), (1, 1, 1)))
>>> m0.set_vars(v)
>>> m1.set_vars(v)
>>> print m0
Gmm: (Type = diagonal, Dim = 3, NumComps = 2)
Weights   Models
 0.7500       Means: 1.0000 1.0000 1.0000    Vars: 1.0000 1.0000 1.0000
 0.2500       Means: 3.0000 3.0000 3.0000    Vars: 1.0000 1.0000 1.0000
>>> print m1
Gmm: (Type = diagonal, Dim = 3, NumComps = 2)
Weights   Models
 0.7500       Means: 1.0000 1.0000 1.0000    Vars: 1.0000 1.0000 1.0000
 0.2500       Means: 3.0000 3.0000 3.0000    Vars: 1.0000 1.0000 1.0000
>>> m0 == m1
True
>>> m2 = m0.copy()
>>> m0 == m2
True
>>> s0 = m0.score([0,0,0])
>>> float_to_readable_string(s0)    
'+(-0007)0x5c2d69462ba21'
>>> s1 = m0.score([1,1,1])
>>> float_to_readable_string(s1)    
'+(-0005)0x866d5e87388e3'
>>> s2 = m0.score([2,2,2])
>>> float_to_readable_string(s2)    
'+(-0007)0xd03c4e0dff270'
>>> comp_scores = m0.get_likelihoods([2,2,2])
>>> [float_to_readable_string(x) for x in comp_scores]
['+(-0007)0xd03c4e0dff270', '+(-0007)0xd03c4e0dff270']
>>> weighted_scores = comp_scores * m0.weights
>>> [float_to_readable_string(x) for x in weighted_scores]
['+(-0007)0x5c2d3a8a7f5d4', '+(-0009)0xd03c4e0dff270']
>>> comps = m0.score_components([2,2,2])
>>> [float_to_readable_string(x) for x in comps]
['+(-0007)0x5c2d3a8a7f5d4', '+(-0009)0xd03c4e0dff270']
>>> comps.sum() == s2
True
>>> seq = np.array(([0,0,0],[1,1,1],[2,2,2],[3,3,3]))
>>> seq.shape
(4, 3)
>>> seq = np.rollaxis(seq, 1)
>>> seq.shape
(3, 4)
>>> comp_scores2 = m0.get_likelihoods_for_sequence(seq)
>>> comp_scores2.shape
(2, 4)
>>> (comp_scores2[:,2] == comp_scores).all()
True
>>> comp_scores3 = m0.score_components_for_sequence(seq)
>>> comp_scores3.shape
(2, 4)
>>> (comp_scores3[:,2] == comps).all()
True
>>> s3 = m0.get_log_score_for_sequence(np.array(([0,0,0],[1,1,1])).transpose())
>>> float_to_readable_string(s3)
'-(+0002)0xe5a488d2645a4'
>>> s3 == np.log(s0) + np.log(s1)
True
>>> s3 = m0.get_log_score_for_sequence(np.array(([0,0,0],[100,100,100])).transpose())
>>> float_to_readable_string(s3)
'-(+0009)0x5c92619f2f2f1'

Here’s an example of priming the means and variances. Provide a 1-component GaussianMixtureModel as the primer; it is used to initialize every component.

>>> mu_primer = np.array((10, 20, 1000))
>>> var_primer = np.array((1, 22, 10000))
>>> GaussianModelBase.seed(0)
>>> primer = GaussianMixtureModel(3, GaussianModelBase.DIAGONAL_COVARIANCE, 1)
>>> primer.set_model(mu_primer, var_primer)
>>> m3 = GaussianMixtureModel(3, GaussianModelBase.DIAGONAL_COVARIANCE, 2)
>>> m3.init_models(primer)
>>> print m3
Gmm: (Type = diagonal, Dim = 3, NumComps = 2)
Weights   Models
 0.5000       Means: 9.9081 20.0762 1034.9414    Vars: 1.0000 22.0000 10000.0000
 0.5000       Means: 9.9518 23.3150 923.3682    Vars: 1.0000 22.0000 10000.0000
>>> m4 = m3.copy()
>>> print m4
Gmm: (Type = diagonal, Dim = 3, NumComps = 2)
Weights   Models
 0.5000       Means: 9.9081 20.0762 1034.9414    Vars: 1.0000 22.0000 10000.0000
 0.5000       Means: 9.9518 23.3150 923.3682    Vars: 1.0000 22.0000 10000.0000

Test that invalid mixture weights are rejected.

>>> bad_weights1 = np.array([0.2, -0.8])
>>> m4.set_weights(bad_weights1)
Traceback (most recent call last):
...
ValueError: Bad argument to set_weights: expected all non-negative values, but got [ 0.2 -0.8]
>>> bad_weights2 = np.array([0.0, 0.0])
>>> m4.set_weights(bad_weights2)
Traceback (most recent call last):
...
ValueError: Bad argument to set_weights: expected positive weight sum, but got 0.0

Test that weights scaled by a constant factor behave the same as normalized weights under interpolation (weights are normalized internally).

>>> m5 = GaussianMixtureModel(3, GaussianModelBase.FULL_COVARIANCE, 2)
>>> m6 = GaussianMixtureModel(3, GaussianModelBase.FULL_COVARIANCE, 2)
>>> weights1 = np.array([0.25, 0.75])
>>> weights2 = np.array([0.75, 0.25])
>>> scaled1 = weights1 * 10
>>> scaled2 = weights2 * 10
>>> m5.set_weights(weights1)
>>> m5.set_weights(weights2, 0.5)
>>> m5.weights
array([ 0.5,  0.5])
>>> m6.set_weights(scaled1)
>>> m6.set_weights(scaled2, 0.5)
>>> (m5.weights == m6.weights).all()
True
>>> (np.array(m6.covar_range) == np.array((2.0E-20, 2.0E+20))).all()
True
add_adaptation_data(data)
begin_adapting()
copy()

Return a deep copy of this model.

covar_range
covariance_type
static dcScoring()
dimension
end_adapting()
get_likelihoods(x)

Return an array of likelihoods, one for each component, for datapoint x.

get_likelihoods_for_sequence(seq)

seq should be a NumPy array of datapoints with shape (dim, N), where N is the length of the sequence. Return a 2-d array of likelihoods, one for each component in the model and each datapoint in seq.

get_log_score_for_sequence(seq)

Get the log likelihood for the points in a data iterable. The model assumption is that the points can be treated independently, so it is sufficient to sum their log-likelihoods.

init_models(primer=None, mode='random')
num_components
relevances
sample()

Randomly sample from a GMM. A component is chosen with p(i) = w_i, then sampled.

>>> gmm = GaussianMixtureModel(2, GaussianModelBase.DIAGONAL_COVARIANCE, 2)
>>> gmm.set_means(np.array([[1.0, -1.0], [-1.0, 1.0]]))
>>> gmm.set_vars(np.array([[1.0, 1.0], [1.0, 1.0]]))
>>> gmm.set_weights(np.array([0.6, 0.4]))
>>> gmm.seed(0)
>>> gmm.sample()
array([-0.23626814,  0.15374746])
>>> gmm.sample()
array([ 1.69882769, -1.09636785])
>>> gmm.sample()
array([-0.98880416,  2.14990452])
>>> gmm = GaussianMixtureModel(2, GaussianModelBase.FULL_COVARIANCE, 2)
>>> gmm.set_means(np.array([[1.0, -1.0], [-1.0, 1.0]]))
>>> gmm.set_vars(np.array([[[1.0, 0.3], [0.3, 1.0]], [[1.0, -0.1], [-0.1, 1.0]]]))
>>> gmm.set_weights(np.array([0.6, 0.4]))
>>> gmm.seed(0)
>>> gmm.sample()
array([-0.23626814,  0.08161617])
>>> gmm.sample()
array([ 1.69882769, -0.88228076])
>>> gmm.sample()
array([-0.98880416,  2.14302097])
score(x)

Score the GMM for datapoint x. See also score_components() and get_likelihoods()

score_components(x)

Return a numpy vector of weight * likelihood products for datapoint x.

score_components_for_sequence(seq)

seq should be a NumPy array of datapoints with shape (dim, N), where N is the length of the sequence. Return a 2-d array of weight * likelihood products, one for each component in the model and each datapoint in seq.

score_sequence(seq)

Score the GMM for a sequence of datapoints seq. seq should be a NumPy array of datapoints with shape (dim, N), where N is the length of the sequence. Return a 1-d array of total scores, one for each datapoint in seq. See also score(), score_components_for_sequence(), and get_likelihoods_for_sequence().
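As a sketch (not from the original page) of how score_sequence() presumably relates to the per-datapoint methods, reusing m0, seq, and s2 from the class doctest above, and assuming column i of the result equals score() of column i of seq:

>>> seq_scores = m0.score_sequence(seq)
>>> seq_scores.shape
(4,)
>>> np.allclose(seq_scores[2], s2)
True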

static seed(seed=None)

Use a seed other than None for reproducible randomness in the sample() function.
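A minimal sketch of the reproducibility this provides (the sampled values themselves are omitted, since they depend on the underlying generator):

>>> m = GaussianMixtureModel(2, GaussianModelBase.DIAGONAL_COVARIANCE, 1)
>>> m.set_model(np.zeros(2), np.ones(2))
>>> GaussianMixtureModel.seed(0)
>>> a = m.sample()
>>> GaussianMixtureModel.seed(0)
>>> b = m.sample()
>>> (a == b).all()
True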

set_means(m, rel_factors=None)

m is an np array of mean vectors with shape (num_components, dimension). When this object has only one component, m may instead be a single mean vector of shape (dimension,).

set_model(m=None, v=None)
set_relevances(values)

Set relevances for adaptation - see adapt()

values must be a tuple of three non-negative numbers: the first is used for weights, the second for means, and the third for variances.
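A minimal usage sketch, reusing m4 from the class doctest above; the relevance values here are illustrative only:

>>> m4.set_relevances((16.0, 16.0, 16.0))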

set_vars(v, rel_factors=None)

v is an np array of variance vectors or covariance matrices with shape (num_components, dimension) for diagonal covariance, or (num_components, dimension, dimension) for full covariance. When this object has only one component and diagonal covariance, v may instead be a single variance vector of shape (dimension,); with one component and full covariance, v may be a single matrix of shape (dimension, dimension).
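The single-component full-covariance case is not exercised by the doctests above; a sketch, assuming the relaxed shape described here:

>>> fc = GaussianMixtureModel(3, GaussianModelBase.FULL_COVARIANCE, 1)
>>> fc.set_vars(np.eye(3))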

set_weights(w, rel_factors=None)
setup_for_scoring()
weights
class onyx.am.gaussian.GaussianModelBase(dimension, covariance_type)

Bases: object

>>> m = GaussianModelBase(4, GaussianModelBase.DUMMY_COVARIANCE)
>>> try:
...    m.dimension = 5
... except AttributeError:
...    print "OK, dimension not settable"
... else:
...    print "Problem! dimension was settable"
OK, dimension not settable
>>> try:
...    m.covariance_type = GaussianModelBase.FULL_COVARIANCE
... except AttributeError:
...    print "OK, covariance_type not settable"
... else:
...    print "Problem! covariance_type was settable"
OK, covariance_type not settable
add_adaptation_data(data)
begin_adapting()
covariance_type
dimension
end_adapting()
static seed(seed=None)

Use a seed other than None for reproducible randomness in the sample() function.

class onyx.am.gaussian.GmmAccumSet(num_comps, dim, covariance_type)

Bases: object

Internal class used for accumulation during GMM training; you should not need to construct these yourself.

>>> g0 = GmmAccumSet(10, 33, GaussianModelBase.DIAGONAL_COVARIANCE)
accum_one_frame(frame, comp_scores, gamma)
accum_sequence(seq, comp_scores, gammas)
apply_accum()

Compute weights, means, and vars from accum set.

clear()
dimension
merge_accum_set(other)

Merge another accum set into this one

num_components
num_frames_accumulated
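Although GmmAccumSet objects are created internally, the accumulate/apply cycle presumably fits together roughly as in the sketch below; the use of score_components() as the source of comp_scores and the reading of gamma as an external occupancy weight are assumptions, not documented here:

# accums = GmmAccumSet(gmm.num_components, gmm.dimension, gmm.covariance_type)
# for frame in frames:
#     comp_scores = gmm.score_components(frame)         # assumed source of scores
#     accums.accum_one_frame(frame, comp_scores, gamma) # gamma: assumed occupancy weight
# accums.apply_accum()                                  # reestimate weights, means, and vars
# Partial accumulators (e.g. from parallel workers) can be combined first:
# accums.merge_accum_set(other_accums)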
onyx.am.gaussian.make_gmmA(dimension, num_components, mean_base=1.5, covar=onyx.util.singleton.Singleton('onyx.am.gaussian.GaussianModelBase.DIAGONAL_COVARIANCE'))

Make a GMM with the given dimension and number of components. The means of the components will be set, in all dimensions, to (mean_base * i) for the (1-based) i-th component.
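A usage sketch based on that description (the weights and variances make_gmmA assigns are not documented here, so printed values are omitted):

>>> g = make_gmmA(3, 2)
>>> g.num_components
2

Per the description, the component means are (1.5, 1.5, 1.5) and (3.0, 3.0, 3.0).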