The Illusion of Resource Control

The very notion of big data defines data, whatever that is, by its being big, whatever that means.  Abstracting data as fields in records, or characterizing it by the combined size or the statistical distribution of the values in those fields, only exacerbates the ambiguity.  Qualifying order-of-magnitude numbers is difficult enough once they rise beyond the countable range, but a change of format, structure, or codec can make the same data vary enormously in size.
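To make that last point concrete, here is a small, illustrative Python sketch. The records, their shape, and their count are assumptions invented for the example rather than measurements of any real installation; the point is only to show how far the byte count of the "same" data moves when the representation changes.

```python
import csv
import gzip
import io
import json

# Hypothetical records: the same logical data, measured under different encodings.
records = [{"id": i, "value": i * 0.5, "label": f"item-{i}"} for i in range(10_000)]

# JSON encoding: field names are repeated in every record.
as_json = json.dumps(records).encode("utf-8")

# CSV encoding: field names appear only once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "value", "label"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue().encode("utf-8")

# The same JSON run through gzip, one possible codec among many.
as_json_gz = gzip.compress(as_json)

for name, blob in [("json", as_json), ("csv", as_csv), ("json+gzip", as_json_gz)]:
    print(f"{name:>10}: {len(blob):>9,} bytes")
```

Running this shows the identical logical content spanning a wide range of sizes, which is exactly why "how many bytes" is a shaky definition of "big."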

In practice, big data practitioners typically fall back on the raw amount of total space used.  If this sounds at all reasonable, consider the following.  Given that it could take three hundred gallons of paint to cover my house, one could draw drastically different conclusions.  On the one hand, the figure could indicate that I own possibly the largest and most expensive property in my city.  On the other hand, it could indicate a terrible painter.

I have claimed, and we can reasonably deduce, that the metadata in any big data system consumes a good multiple of the resources of the original data.  Across the big data technology stack, one feature regularly presents itself: the opportunity to tune performance or coverage against resource allocation.  This sounds reasonable on the surface, since in practice these decisions involve significant quantities of resources, but what if the baseline cost itself is not very reasonable?
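As a back-of-the-envelope sketch of that question, consider the arithmetic below. The overhead multiple, the tunable fraction, and the data volume are all invented for illustration, not measurements of any particular system; the shape of the result is what matters.

```python
# Hypothetical figures only: an assumed raw data volume, an assumed overhead
# multiple for metadata/indexes/replicas, and an assumed share of that
# overhead exposed as tunable knobs.
raw_tb = 10               # terabytes of original data (assumed)
overhead_multiple = 4     # overhead as a multiple of the raw data (assumed)
tunable_fraction = 0.25   # share of the overhead that tuning can remove (assumed)

baseline = raw_tb * (1 + overhead_multiple)            # footprint before tuning
savings = raw_tb * overhead_multiple * tunable_fraction

print(f"baseline footprint:      {baseline} TB")
print(f"after aggressive tuning: {baseline - savings} TB "
      f"({(baseline - savings) / raw_tb:.1f}x the raw data)")
```

Under these assumptions, even exercising every knob aggressively leaves a footprint several times the raw data: the knobs control a slice of the overhead, not the baseline itself.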

I examine some actual numbers in the performance section, but insofar as the à la carte approach used in most big data installations is inefficient in contrast to the holistic approach of Eureka, the opportunity to pare down expensive options represents only the illusion of economy.