Advanced Analytics

Because Eureka is fundamentally a random-access storage system (near-direct access across a good many dimensions) with a full statistical complement, it can serve as a good basis for machine learning and can take over hypothesis evaluation directly from machine learning and deep learning systems.

The first, and most trivial, example is the simple extraction of feature values. As covered before, the feature in question can be interrogated directly, skipping all other parts of the corpus except the information in the neighborhood of the feature. The extracted features would then generally be sampled to provide a training set, a secondary training set, and a testing set, or they might be sampled simply to reduce the training and/or testing data carefully to a size small enough for the task at hand. Traditionally this would have been performed in multiple steps, likely with multiple passes through the full dataset (sequential access). With Eureka, a pseudorandom selection can instead be made through direct (and deterministic) access, producing a fair sample of an arbitrary portion of the original dataset.
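The deterministic pseudorandom selection described above can be sketched with hash-based bucketing: each record id is hashed together with a fixed seed into a value in [0, 1), and that value alone decides whether the record is sampled and which split it lands in. This is an illustrative stand-in, not Eureka's documented mechanism. The names `unit_hash`, `assign_split` and `partition`, the `seed` string, and the 70/15/15 fractions are all hypothetical, and the "secondary training set" is assumed here to play the role of a validation set.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def unit_hash(record_id: str, seed: str = "split-v1") -> float:
    """Deterministically map a record id to a pseudorandom value in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{record_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a double and never rounds to 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_split(
    record_id: str,
    train: float = 0.70,        # hypothetical training fraction
    validation: float = 0.15,   # hypothetical secondary (validation) fraction
    sample_rate: float = 1.0,   # fraction of the portion to keep at all
    seed: str = "split-v1",     # hypothetical seed; change it to re-draw
) -> Optional[str]:
    """Return 'train', 'validation', 'test', or None if the record is not sampled."""
    if not 0.0 < sample_rate <= 1.0 or train < 0 or validation < 0 or train + validation > 1.0:
        raise ValueError("invalid split fractions")
    u = unit_hash(record_id, seed)
    if u >= sample_rate:
        return None              # outside the down-sampled portion
    v = u / sample_rate          # rescale the kept portion back to [0, 1)
    if v < train:
        return "train"
    if v < train + validation:
        return "validation"
    return "test"


def partition(record_ids: Iterable[str], **kwargs) -> Dict[str, List[str]]:
    """Bucket ids into disjoint splits in a single pass, with no shuffling step."""
    splits: Dict[str, List[str]] = {"train": [], "validation": [], "test": []}
    for rid in record_ids:
        label = assign_split(rid, **kwargs)
        if label is not None:
            splits[label].append(rid)
    return splits
```

Because each decision depends only on the id and the seed, the same record always lands in the same split, no matter which portion of the dataset is being read or in what order. Lowering `sample_rate` also yields a subset of the records kept at a higher rate, so a training set can be shrunk to a workable size without re-drawing it from scratch.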

Turning to hypothesis evaluation: because all relevant statistics are available in the context of the query evaluation, a hypothesis such as A (+) B (+) C (+) X can be evaluated as a stack ranking of X, limited by a specific confidence interval. The optimization works in two dimensions. First, we can skip the generally vast majority of non-relevant information in the corpus. Second, we can randomly sample internally over A (+) B (+) C (+) {X0, X1, X2, …}, and once the X set settles statistically, the evaluation is complete. Note that all of the frequencies involved (i.e. for A, B, C, and each Xy) are already precomputed, which aids our statistics, and that the two dimensions of optimization are independent, so their efficiency gains multiply.
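A minimal sketch of the sample-until-settled loop follows. It assumes the records have already been narrowed to the A (+) B (+) C neighborhood (the first dimension) and are represented as Python sets of feature names, and it interprets the stack ranking as a ranking by lift, P(X | A, B, C) / P(X), using the precomputed corpus frequency of each X. The Wilson score interval and the fixed width threshold `max_width` are choices made here for "settles statistically", not something the text specifies; `rank_candidates` and its parameters are hypothetical. Checking the intervals after every batch is a form of repeated peeking, so the nominal 95% coverage is somewhat optimistic.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def rank_candidates(
    matching: Sequence[Set[str]],  # records already narrowed to A (+) B (+) C
    candidates: Sequence[str],     # the candidate set {X0, X1, X2, ...}
    base_freq: Dict[str, float],   # precomputed corpus frequency of each X (> 0)
    batch: int = 200,              # hypothetical: records sampled per step
    max_width: float = 0.05,       # hypothetical: stop once every interval is this narrow
    seed: int = 0,
) -> Tuple[List[Tuple[str, float, Tuple[float, float]]], int]:
    """Return (candidates ranked by lift with lift intervals, records sampled)."""
    if not matching:
        return [], 0
    batch = max(1, batch)
    order = list(range(len(matching)))
    random.Random(seed).shuffle(order)  # random sampling without replacement
    hits = {x: 0 for x in candidates}
    n = 0
    while n < len(order):
        for i in order[n:n + batch]:
            record = matching[i]
            for x in candidates:
                if x in record:
                    hits[x] += 1
        n = min(n + batch, len(order))
        intervals = {x: wilson_interval(hits[x], n) for x in candidates}
        if all(hi - lo <= max_width for lo, hi in intervals.values()):
            break  # every estimate has settled; skip the remaining records
    ranked = []
    for x in candidates:
        lo, hi = intervals[x]
        b = base_freq[x]
        # Lift = P(X | A,B,C) / P(X); the precomputed base frequency is treated as exact.
        ranked.append((x, (hits[x] / n) / b, (lo / b, hi / b)))
    ranked.sort(key=lambda t: t[1], reverse=True)
    return ranked, n
```

With `max_width=0.05`, the worst case (a co-occurrence rate near 0.5) needs on the order of 1,500 samples however many records match, which is where the second, independent saving comes from: it applies on top of whatever the first dimension already skipped.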