See in particular ModelManager, a base class for managers, and GmmMgr, a manager for Gaussian mixture models.
Bases: onyx.am.modelmgr.ModelManager
A container and manager for GaussianModels.
models is an iterable yielding model instances, which must all have the same covariance type.
Bases: onyx.am.modelmgr.ModelManager
A container and manager for underlying Gmms.
E.g., Hmm models use this class to hold the models in each of their states. Models must all have the same covariance type. Accumulation and adaptation are handled here.
>>> dim = 12
>>> covar_type = gaussian.GaussianModelBase.DIAGONAL_COVARIANCE
>>> num_comps = 3
>>> gmm0 = gaussian.GaussianMixtureModel(dim, covar_type, num_comps)
>>> gmm0.init_models()
>>> gmm1 = gmm0.copy()
GmmMgrs can be created empty:
>>> gmm_mgr0 = GmmMgr(dim)
>>> print gmm_mgr0.to_string()
GmmMgr with no models (dimension 12)
Or with an iterable of models:
>>> gmm_mgr1 = GmmMgr((gmm0, gmm1))
>>> print gmm_mgr1.to_string()
GmmMgr with 2 models of type <class 'onyx.am.gaussian.GaussianMixtureModel'>
(dimension 12, covariance_type onyx.am.gaussian.GaussianModelBase.DIAGONAL_COVARIANCE)
Or from another GmmMgr:
>>> gmm_mgr2 = GmmMgr(gmm_mgr1)
>>> print gmm_mgr2.to_string()
GmmMgr with 2 models of type <class 'onyx.am.gaussian.GaussianMixtureModel'>
(dimension 12, covariance_type onyx.am.gaussian.GaussianModelBase.DIAGONAL_COVARIANCE)
Or from a dict giving the number of components for each model, along with the other required specifications:
>>> num_comp_tuple = (2, 3, 2, 4)
>>> init_dict = dict(dimension=dim, covar_type=covar_type, num_comps=num_comp_tuple)
>>> gmm_mgr3 = GmmMgr(init_dict)
>>> print gmm_mgr3.to_string()
GmmMgr with 4 models of type <class 'onyx.am.gaussian.GaussianMixtureModel'>
(dimension 12, covariance_type onyx.am.gaussian.GaussianModelBase.DIAGONAL_COVARIANCE)
The other specifications may include primers and/or covariance bounds:
>>> m0 = gaussian.GaussianMixtureModel(dim, covar_type, 1)
>>> m0.set_means(xrange(dim))
>>> m0.set_vars([1.0] * dim)
>>> m1 = gaussian.GaussianMixtureModel(dim, covar_type, 1)
>>> m1.set_means(xrange(dim, 0, -1))
>>> m1.set_vars([1.0] * dim)
>>> primers = (m0, m1, m1, m0)
>>> covar_bounds = (1.0E-10, 100.0)
>>> init_dict['primers'] = primers
>>> init_dict['covar_bounds'] = covar_bounds
>>> gmm_mgr4 = GmmMgr(init_dict)
>>> print gmm_mgr4.to_string(full=True)
GmmMgr with 4 models of type <class 'onyx.am.gaussian.GaussianMixtureModel'>
(dimension 12, covariance_type onyx.am.gaussian.GaussianModelBase.DIAGONAL_COVARIANCE)
Initialization can take four forms, each taking a single argument. First, an int argument constructs an empty GmmMgr for models of the given dimension (number of features). Second, another GmmMgr can be passed in, in which case the new GmmMgr is a deep copy of the argument. Third, an iterable of models can be passed in, in which case the new GmmMgr holds those models in the order iterated; the iterable should yield GaussianMixtureModel or DummyModel instances, all with the same covariance type. Finally, a dictionary may be provided with keys num_comps, dimension, and covar_type, and optionally primers and covar_bounds. Here num_comps is an iterable giving the number of components for each model, dimension is a positive integer, and covar_type is either GaussianMixtureModel.DIAGONAL_COVARIANCE or GaussianMixtureModel.FULL_COVARIANCE; a new GaussianMixtureModel instance is created for each element of num_comps. primers, if provided, is an iterable of GaussianMixtureModel instances used to initialize all the components of each model, so it must be as long as num_comps, and the priming models should have covariance type covar_type. covar_bounds, if provided, is a pair of numbers (min, max) used to bound the covariance of each model. See GaussianMixtureModel.
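The four initialization forms amount to dispatching on the type of the single argument. The sketch below is hypothetical: ToyGmm and ToyGmmMgr are stand-ins rather than the onyx classes, primers and covar_bounds are omitted, and taking the dimension from the first model is an assumption. Note that the dict case must be checked before the generic-iterable case, because a dict is itself iterable.

```python
import copy

# Hypothetical stand-ins; these are not the onyx classes.
DIAGONAL, FULL = "diagonal", "full"


class ToyGmm(object):
    def __init__(self, dimension, covar_type, num_comps):
        self.dimension = dimension
        self.covar_type = covar_type
        self.num_comps = num_comps


class ToyGmmMgr(object):
    def __init__(self, init):
        if isinstance(init, int):
            # Form 1: an empty manager for models of the given dimension.
            self.dimension, self.models = init, ()
        elif isinstance(init, ToyGmmMgr):
            # Form 2: a deep copy of another manager.
            self.dimension = init.dimension
            self.models = copy.deepcopy(init.models)
        elif isinstance(init, dict):
            # Form 4: new models built from a specification.
            # Checked before form 3 because a dict is itself iterable.
            dim, ctype = init["dimension"], init["covar_type"]
            self.dimension = dim
            self.models = tuple(ToyGmm(dim, ctype, n) for n in init["num_comps"])
        else:
            # Form 3: an iterable of models, kept in iteration order.
            self.models = tuple(init)
            # Assumption: take the dimension from the first model, if any.
            self.dimension = self.models[0].dimension if self.models else None
        if len(set(m.covar_type for m in self.models)) > 1:
            raise ValueError("models must all have the same covariance type")
```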
Accumulate statistics for a sequence of data points. mi is a model index; there must be an accumulator set up for it (see ensure_accumulators()). seq is a Numpy array of observations with shape (dimension, N), where N is the number of frames. comp_scores is a Numpy array of shape (num_components, N) containing the component scores for the given model. If gamma is not None, it must be a Numpy array of shape (N+1,) containing weights; this argument is intended for use by Baum-Welch implementations. This call can only be made when the adaptation state is ACCUMULATING.
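The shapes above fit the usual EM bookkeeping for a diagonal-covariance mixture. As a sketch only, assuming comp_scores holds non-negative per-component likelihoods, each frame's scores can be normalized into component posteriors that weight zeroth-, first-, and second-order statistics. accumulate_diag is a hypothetical helper that uses nested lists in place of Numpy arrays and omits gamma, since the text does not say how its N+1 entries align with the N frames.

```python
def accumulate_diag(seq, comp_scores):
    """Hypothetical sketch: (counts, sums, sq_sums) for a diagonal GMM.

    seq[d][n] is feature d of frame n (shape (dimension, N));
    comp_scores[k][n] is the score of component k on frame n
    (shape (num_components, N)), assumed to be a non-negative likelihood.
    """
    dim, n_frames = len(seq), len(seq[0])
    n_comps = len(comp_scores)
    counts = [0.0] * n_comps
    sums = [[0.0] * dim for _ in range(n_comps)]
    sq_sums = [[0.0] * dim for _ in range(n_comps)]
    for n in range(n_frames):
        total = sum(comp_scores[k][n] for k in range(n_comps))
        if total <= 0.0:
            continue  # no component claims this frame
        for k in range(n_comps):
            post = comp_scores[k][n] / total  # posterior of component k
            counts[k] += post
            for d in range(dim):
                x = seq[d][n]
                sums[k][d] += post * x
                sq_sums[k][d] += post * x * x
    return counts, sums, sq_sums
```

Because the posteriors on each frame sum to one, the counts add up to the number of frames that received any score.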
Add some models to a manager.
models should be an iterable of either GaussianMixtureModels or DummyModels, which must all have the same dimension and covariance type.
Apply all active accumulators. This call can only be made when the adaptation state is APPLYING.
Clear all existing accumulators. This call can only be made when the adaptation state is INITIALIZING.
Dump the accumulator dictionary from this GmmMgr into fh, which should be a file-like object open for writing.
Make sure there are accumulators for the given models. If they need to be created, they will also be cleared, but this call will have no effect on accumulators that already exist.
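The "create if missing, otherwise leave alone" rule is what lets several clients that share a model call ensure_accumulators() without wiping each other's statistics. A minimal sketch of that rule follows; the accums dict and the make_cleared factory are hypothetical, not the GmmMgr signature.

```python
def ensure_accumulators(accums, model_indices, make_cleared):
    """Hypothetical sketch: give each listed model a cleared accumulator.

    accums maps model index -> accumulator; make_cleared() returns a fresh,
    cleared accumulator. Existing entries are never replaced, so statistics
    gathered for a model shared by several clients are not lost.
    """
    for mi in model_indices:
        if mi not in accums:
            accums[mi] = make_cleared()
    return accums
```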
Get the value of the normalizing accumulator for a model. This call can only be made when the adaptation state is APPLYING.
Load and return the accumulator dictionary stored in fh, which should be a file-like object open for reading.
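The on-disk format of the accumulator dictionary is not stated here. Assuming it is a pickle, as write_gmm_mgr() uses for whole managers, a dump/load pair might look like the following; the function names and the plain-list accumulator values are hypothetical. Because pickle writes bytes, fh should be opened in binary mode, and an in-memory io.BytesIO works for testing.

```python
import io
import pickle


def dump_accum_dict(accums, fh):
    # Assumption: pickle format; fh must accept bytes.
    pickle.dump(accums, fh, pickle.HIGHEST_PROTOCOL)


def load_accum_dict(fh):
    # Returns whatever mapping dump_accum_dict() stored.
    return pickle.load(fh)


buf = io.BytesIO()
dump_accum_dict({0: [0.75, 1.25], 3: [2.0]}, buf)
buf.seek(0)  # rewind before reading back
restored = load_accum_dict(buf)
```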
Merge another accumulation dict into the one in this instance. other_dict should be a mapping from model indices to GmmAccumSet instances. For each model, the covariance type, number of components, and dimension must match. This call can only be made when the adaptation state is ACCUMULATING.
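Merging works because accumulated statistics are sums over frames, so statistics gathered separately can simply be added. In the hypothetical sketch below each value is a (counts, sums) pair standing in for a GmmAccumSet, and a shape check stands in for the covariance-type, component-count, and dimension checks. How the real method treats a model index absent from this instance is not stated; here it is assumed to be copied in.

```python
def merge_accum_dicts(mine, other):
    """Hypothetical sketch: add other's per-model statistics into mine."""
    for mi, (o_counts, o_sums) in other.items():
        if mi not in mine:
            # Assumption: an unseen model index is simply adopted.
            mine[mi] = (list(o_counts), [list(row) for row in o_sums])
            continue
        counts, sums = mine[mi]
        same_shape = (len(counts) == len(o_counts) and len(sums) == len(o_sums)
                      and all(len(a) == len(b) for a, b in zip(sums, o_sums)))
        if not same_shape:
            raise ValueError("accumulator shapes differ for model %r" % (mi,))
        for k, c in enumerate(o_counts):
            counts[k] += c  # zeroth-order statistics add
        for row, o_row in zip(sums, o_sums):
            for d, v in enumerate(o_row):
                row[d] += v  # first-order statistics add
    return mine
```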
Bases: object
Abstract base class for model managers of various levels.
This class supports storing a collection of models as a tuple with indexed access, as well as transition through a series of states as part of the adaptation process. A state machine enforces the cycle of initialization, accumulation, and adaptation. The intent is that a single high-level client signals progress in this cycle via explicit calls to set_adaptation_state, while multiple lower-level clients each request initialization, accumulation and adaptation of the models they are using. It is expected that multiple low-level clients will share the same models, so more than one client may be making such requests for a given model.
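The cycle can be pictured as a small state machine over the three states named in the GmmMgr method descriptions: INITIALIZING, ACCUMULATING and APPLYING. In this hypothetical sketch the high-level client calls set_adaptation_state() and low-level requests call require(); the class, require(), and the rule that only the next state in the cycle is accepted are assumptions, not the ModelManager implementation.

```python
class AdaptationStateMachine(object):
    """Hypothetical sketch of INITIALIZING -> ACCUMULATING -> APPLYING."""

    INITIALIZING, ACCUMULATING, APPLYING = "INITIALIZING", "ACCUMULATING", "APPLYING"
    # Assumption: each state may only advance to the next one in the cycle.
    _NEXT = {INITIALIZING: ACCUMULATING,
             ACCUMULATING: APPLYING,
             APPLYING: INITIALIZING}

    def __init__(self):
        self.state = self.INITIALIZING

    def set_adaptation_state(self, new_state):
        # Called by the single high-level client to advance the cycle.
        if new_state != self._NEXT[self.state]:
            raise ValueError("cannot go from %s to %s" % (self.state, new_state))
        self.state = new_state

    def require(self, state, what):
        # Called on behalf of low-level clients before clearing,
        # accumulating or applying.
        if self.state != state:
            raise RuntimeError("%s requires %s, but state is %s"
                               % (what, state, self.state))
```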
Read a GmmMgr from a file. file should be a file-like object open for reading on a pickle file, such as one written by write_gmm_mgr(). Returns a new GmmMgr.
For tests and examples, see write_gmm_mgr().
Score data_array selectively according to selector. data_seq should be a Numpy array of features. selector should be a dictionary mapping frame indices to iterables of model indices. On a given frame, all the models indicated will be scored; if a frame index is missing as a key or maps to an empty iterable, no models will be scored for that frame. Note that xrange(model_mgr.num_models) provides an iterable that scores all models. Returns a dictionary mapping frame indices to an inner dictionary that maps model indices to scores. If include_intruders_rank is given, every model is scored on every frame, and any model that scores better than the selected model at that rank (rank 0 being the top score) is also included in the result, along with its score. Note that this can mean considerably more scoring is done.
This function is meant for diagnostic purposes; no attempt is made to make it particularly efficient.
>>> dimension = 2
>>> num_components = 3
>>> generator = gaussian.make_gmmA(dimension, num_components, mean_base=3.14)
>>> gmm0 = gaussian.make_gmmA(dimension, num_components, mean_base=2.6)
>>> gmm1 = gaussian.make_gmmA(dimension, num_components, mean_base=2.7)
>>> gmm2 = gaussian.make_gmmA(dimension, num_components, mean_base=2.8)
>>> gmm3 = gaussian.make_gmmA(dimension, num_components, mean_base=2.9)
>>> gmm4 = gaussian.make_gmmA(dimension, num_components, mean_base=3.0)
>>> gmm5 = gaussian.make_gmmA(dimension, num_components, mean_base=3.1)
>>> mm1 = GmmMgr((gmm0, gmm1, gmm2, gmm3, gmm4, gmm5))
>>> gaussian.GaussianMixtureModel.seed(0)
>>> num_obs = 80
>>> data = [generator.sample() for i in xrange(num_obs)]
>>> arr_data = np.rollaxis(np.array(data), 1)
>>> selector = {0: (1, 2), 13: (0, 2), 42: xrange(mm1.num_models), 43: (0, 1, 2), 79: ()}
>>> result = score_selected_models(mm1, arr_data, selector)
>>> len(result)
5
>>> result.keys()
[0, 42, 43, 13, 79]
>>> def dump_frame_dict(d):
...     for (m, s) in d.items(): print(m, floatutils.float_to_readable_string(s))
>>> dump_frame_dict(result[42])
(0, '+(-0005)0x41f6d7f93264e')
(1, '+(-0005)0x48f5b86f74169')
(2, '+(-0005)0x49d28f711f922')
(3, '+(-0005)0x44471119375fa')
(4, '+(-0005)0x388f5e572b910')
(5, '+(-0005)0x2750b42d3ddee')
>>> dump_frame_dict(result[79])
>>> dump_frame_dict(result[0])
(1, '+(-0008)0x627d4e8903f43')
(2, '+(-0007)0x5ce5961608061')
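Illustrating the use of include_intruders_rank: on frame 42 the threshold is the rank-0 (top) score among the selected models 2, 3 and 4, which comes from model 2. No other model scores better than that on this frame, so no intruders are added and the result holds only the three selected scores.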
>>> selector = {42: (2, 3, 4),}
>>> with DebugPrint("modelmgr:score_selected_models"):
...     result = score_selected_models(mm1, arr_data, selector, include_intruders_rank=0)
modelmgr:score_selected_models: For frame 42
modelmgr:score_selected_models: selected score threshold is 0.0402615357971 (from model 2)
modelmgr:score_selected_models: selected count is 3, total count is 3, intruder count is 0
>>> dump_frame_dict(result[42])
(2, '+(-0005)0x49d28f711f922')
(3, '+(-0005)0x44471119375fa')
(4, '+(-0005)0x388f5e572b910')
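The selection and intruder logic can be sketched without any Gaussian machinery by passing in a scoring function. In this hypothetical sketch score_fn(mi, frame) returns a score where higher is better, frames is a plain list, and intruders are only considered on frames present in selector (the description above says every frame is scored). Clamping an out-of-range rank to the worst selected score is also an assumption.

```python
def score_selected(score_fn, num_models, frames, selector, intruders_rank=None):
    """Hypothetical sketch of selective scoring with optional intruders.

    Returns {frame index: {model index: score}}; a frame mapped to an
    empty iterable gets an empty inner dict.
    """
    result = {}
    for f, model_indices in selector.items():
        chosen = dict((mi, score_fn(mi, frames[f])) for mi in model_indices)
        if intruders_rank is not None and chosen:
            ranked = sorted(chosen.values(), reverse=True)  # best first
            # Assumption: clamp the rank to the worst selected score.
            threshold = ranked[min(intruders_rank, len(ranked) - 1)]
            for mi in range(num_models):
                if mi not in chosen:
                    s = score_fn(mi, frames[f])
                    if s > threshold:
                        chosen[mi] = s  # an intruder beats the threshold
        result[f] = chosen
    return result
```

With a toy score such as -abs(x - mi), a frame whose value is closest to an unselected model shows that model appearing as an intruder once a rank is given.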
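Write gmm_mgr to file, which should be a file-like object open for writing. The format is a Python pickle, so a real file should be opened in binary mode, as in the round-trip test below.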
>>> f = cStringIO.StringIO()
>>> dim = 2
>>> num_components = 3
>>> weights = np.array((0.25, 0.5, 0.25), dtype=float)
>>> mu = np.array(((1, 1), (2, 2), (3, 3)), dtype=float)
>>> v = np.array(((1, 1), (1, 1), (1, 1)), dtype=float)
>>> gmm0 = gaussian.GaussianMixtureModel(dim, gaussian.GaussianMixtureModel.DIAGONAL_COVARIANCE, num_components)
>>> gmm0.set_weights(weights)
>>> gmm0.set_means(mu)
>>> gmm0.set_vars(v)
>>> gmm0.set_relevances((10.0, 10.0, 10.0))
>>> gmm1 = gaussian.GaussianMixtureModel(dim, gaussian.GaussianMixtureModel.DIAGONAL_COVARIANCE, num_components)
>>> gmm1.set_weights(weights)
>>> gmm1.set_means(mu)
>>> gmm1.set_vars(v)
>>> gmm_mgr0 = GmmMgr((gmm0, gmm1))
Round-trip test
>>> with onyx.util.opentemp('wb', suffix='_gmm.pickle', prefix='onyx_modelmgr_test_') as (filename, outfile):
...     write_gmm_mgr(gmm_mgr0, outfile)
>>> with open(filename, 'rb') as infile:
...     gmm_mgr2 = read_gmm_mgr(infile)
>>> os.remove(filename)
>>> gmm_mgr0 == gmm_mgr2
True