The state of the art in big data processing creates numerous duplicates of the original, or source-of-truth, data in ascending tiers of metadata and transformations of that data. This process builds multiple levels of intermediates with dependencies on one or more source-of-truth repositories, with some metadata at the lower tiers common to multiple applications and metadata at the upper tiers specific to a single application.
The metadata and derivatives are used to break the linear query cost of the source-of-truth repositories, yet they themselves cost at least linear time to construct. Beyond this multiplicity, the source of truth must generally be retained, which adds a management cost: updates and amendments must be applied not only to the source but throughout the entire derivative stack.
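The maintenance cost described above can be made concrete with a toy example. The tiers, records, and functions below are purely illustrative (the text names no specific system): each derived tier depends on the one beneath it, so a single amendment to the source forces linear-cost rebuilds all the way up the stack.

```python
# Toy derivative stack: source of truth at the bottom, derived tiers above.
source = [3, 1, 4, 1, 5]          # hypothetical source-of-truth records

def build_tier1(src):
    # tier 1: per-record transform, shared by several hypothetical apps
    return [x * x for x in src]

def build_tier2(t1):
    # tier 2: application-specific aggregate over tier 1
    return sum(t1)

tier1 = build_tier1(source)
tier2 = build_tier2(tier1)

# One amendment to the source must propagate through every tier --
# the management cost the text describes.
source[2] = 7
tier1 = build_tier1(source)       # linear rebuild
tier2 = build_tier2(tier1)        # and again for each dependent tier
```

Real systems amortize the rebuilds incrementally, but the dependency chain, and the obligation to keep every tier consistent with the retained source, is the same.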
The Eureka data store is built on the idea that the original data need not be retained but can instead be recovered, in part or in whole, from a conservative set of statistics. Proliferation of metadata is generally unnecessary with the Eureka data store because it allows for general direct access. Finally, comprehension, learning, and advanced analysis would benefit from directly available precomputed statistics, random access to data, and integrated hypothesis testing. In short, algorithms would be able to dive deeper into large data more quickly than the traditional digests of the status quo allow. Hence the Eureka data store would be inherently faster, more resource-efficient, more flexible, and more generally applicable than the myriad other technical choices currently employed.
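A minimal sketch of the idea, under a strong simplifying assumption: the text does not specify which statistics Eureka keeps, so a plain value histogram stands in here purely for illustration. The point is the shape of the workflow, that queries are answered directly from a retained statistics set, and the original data is recovered from it rather than stored.

```python
from collections import Counter

class StatsStore:
    """Illustrative stand-in: retain statistics, not the raw data."""

    def __init__(self, values):
        self.hist = Counter(values)   # value -> frequency
        self.n = len(values)
        self.total = sum(values)
        # raw `values` are deliberately not kept

    def count_in_range(self, lo, hi):
        # direct access to a statistic; no scan of the original data
        return sum(c for v, c in self.hist.items() if lo <= v <= hi)

    def mean(self):
        # precomputed aggregate, available without recomputation
        return self.total / self.n

    def recover(self):
        # recover the original data in whole, up to ordering
        return sorted(self.hist.elements())

store = StatsStore([3, 1, 4, 1, 5, 9, 2, 6])
store.count_in_range(1, 4)   # answered from the histogram alone
store.recover()              # original multiset, order lost
```

An exact histogram is lossless only up to ordering and scales with the number of distinct values; a practical design would presumably use compact, conservative summaries with bounded error, but the division of labor is the same.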
The gains of the Eureka Data Store are due as much to a change in how the problem is perceived as to the specific techniques Eureka employs. The next sections address illusory assumptions in the standard big data architectures.