Introduction to Onyx

Onyx is for research and development on machine-learning algorithms. It was originally developed at the Human Language Technology Center of Excellence. Onyx simplifies the process of taking a computational research idea and turning it into a working experiment. It uses a dataflow model of how data is manipulated during an experiment. With Onyx, it is straightforward to take a machine-learning idea as outlined on a whiteboard, e.g. with data flowing along arrows and algorithmic processing happening in blocks, and turn that idea into a working experimental configuration. It is also easy to take a working configuration and either extend it or embed it in a larger experiment.

A key design feature of Onyx is very strong support for what are referred to as streaming or online machine-learning algorithms. Online algorithms are increasingly necessary to deal with the rapidly growing volume of data available to machine-learning technology. They are distinct from more traditional batch algorithms. One defining feature of online algorithms is that they only get to examine a given piece of data for a limited time: once they are done with the data they must let it go and cannot store it for later. Online algorithms are a rational response (perhaps the only rational response) to the fact that the volume of available data is far outstripping the resources needed to store it.
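To make the "examine once, then let go" constraint concrete, here is a minimal sketch in plain Python. It does not use any Onyx API; the names `RunningStats` and `summarize` are hypothetical and chosen only for illustration. It keeps a running mean and variance with Welford's update, so each observation is folded into a fixed-size summary and never stored.

```python
class RunningStats:
    """Online (streaming) mean and variance using Welford's update.

    Hypothetical illustration; not part of Onyx.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one observation into the summary; x itself is not retained.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance of everything seen so far.
        return self._m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume any iterable once and return the accumulated statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

A batch algorithm would instead collect the whole stream into a list before computing anything, so its memory use would grow with the data; the sketch above uses constant memory regardless of stream length.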

Onyx is written in Python, a high-level interpreted language that is well suited both to exploratory research and to advanced technology prototyping and development. Python and Onyx make it very easy to build online algorithms and models. The language itself is easy to learn, and Onyx makes it easy to implement each step in an algorithm as a simple function or a simple object. Onyx is then used to connect these algorithmic blocks into a dataflow graph. The experiment can be started, and the models and algorithmic state can be examined and changed in situ, that is, while the experiment is running.
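The idea of small blocks connected into a dataflow can be sketched in plain Python with generators. This is only an illustration of the concept and is not Onyx's API: `tokenize`, `drop_short`, `WordCounter`, `connect`, and `run_example` are hypothetical names, and a simple linear chain stands in for a general graph.

```python
from collections import Counter


def tokenize(lines):
    # Block implemented as a simple function: lines in, lowercase words out.
    for line in lines:
        for word in line.lower().split():
            yield word


def drop_short(words, min_len=4):
    # Block that filters the stream, passing on only the longer words.
    for word in words:
        if len(word) >= min_len:
            yield word


class WordCounter:
    """Block implemented as a simple object; its state can be inspected."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, words):
        for word in words:
            self.counts[word] += 1
            yield word  # pass data through so further blocks could follow


def connect(source, *blocks):
    # Chain the blocks so that each one consumes the previous one's output.
    stream = source
    for block in blocks:
        stream = block(stream)
    return stream


def run_example():
    lines = [
        "Data flows through arrows",
        "processing happens in blocks",
        "data flows on",
    ]
    counter = WordCounter()
    flow = connect(lines, tokenize, drop_short, counter)
    for _ in flow:  # pull every item through the chain
        pass
    return counter.counts
```

Because the chain is lazy, calling `next(flow)` advances it one item at a time, and `counter.counts` can be read between steps, which loosely mirrors the idea of examining state while an experiment is running.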

Key Features

Features of Onyx include:

  • deep support for experiments with online machine-learning algorithms
  • a dataflow architecture that supports factoring problems into small algorithmic components
  • access to the interpreter to examine live objects, models, algorithm state, etc.
  • easy to produce reliable experimental frameworks for colleagues to use
  • transparent use of multithreading
  • transparent access to grid computing (SGE environment)
  • high-performance C libraries for numerical work (NumPy and SciPy)
  • straightforward mechanisms for integrating external executables into a dataflow
  • detailed documentation with verified example code
  • open-source access to all Python source code, illustrating numerous best practices
  • tutorial examples