{"id":290,"date":"2017-07-11T04:37:36","date_gmt":"2017-07-11T04:37:36","guid":{"rendered":"http:\/\/eurekadata.net\/?page_id=290"},"modified":"2017-07-16T19:37:32","modified_gmt":"2017-07-16T19:37:32","slug":"order","status":"publish","type":"page","link":"https:\/\/eurekadata.net\/index.php\/introduction\/order\/","title":{"rendered":"The Illusion of Organization"},"content":{"rendered":"<p>The standard model of large data applications is about providing order or organization around discrete pieces of information. \u00a0There are many consequences to this perspective, and many opportunities open up as the perspective is shed.<\/p>\n<p>The consequence of the discrete-pieces orientation is that your datum frequently resides somewhere deep down in some data store; that datum is replicated all over the place (possibly with transformation, but here again each instance lives in one place); and those data are referenced through a mechanism that reflects the storage organization more than the data stored. \u00a0As evidence that this might not be the most efficient way, consider that in all natural brains a contrary mechanism prevails. \u00a0In brains the signal is distributed, and connections form or strengthen around the signals instead of the signals themselves being replicated everywhere.<\/p>\n<p class=\"p1\">In the standard model the emphasis is on what is seen, not on what is important.<span class=\"Apple-converted-space\">\u00a0 <\/span>It is as though a comprehensibly small piece of data can simply be scaled up to an incomprehensibly large one, which is a false premise.<span class=\"Apple-converted-space\">\u00a0 <\/span>Incomprehensibly large data is incomprehensible, at least to our native human perspective, and that should raise the question: does the current model actually provide some kind of order?<\/p>\n<p class=\"p1\">Let us perform a brief mental experiment. 
\u00a0Imagine what row 81985529216486895\u00a0looks like in an HTTP access log table of 1152921504606846975 rows.<\/p>\n<pre>81985529216486895:1152921504606846975 xx.xxx.xxx.xxx - - [xx\/xxx\/xxxx:xx:xx:xx -xxxx] \"xxxx \/xx-xxxxx\/xxxxx-xxxx.xxx xxxx\/x.x\" xxx xxx \"xxxx:\/\/xxxxxxxxxx.xxx\/xx-xxxxx\/xxxx.xxx?xxxx=xxx&amp;xxxxx<\/pre>\n<p class=\"p1\">It is very likely that you came up with a concrete example, because a fixed or concrete example seems in your mind a better representation. \u00a0But how reasonable a placeholder for the actual data did you produce? \u00a0Even when guided by a good amount of expertise, your chance of a match is essentially zero. \u00a0You are really not much better off than a Monte Carlo approach of constructing a random-length string with random values. \u00a0Consider also that you have answered the wrong question. \u00a0Instead of what line 81985529216486895 <em>looks like<\/em> you&#8217;ve likely answered the much more challenging question of what line\u00a081985529216486895\u00a0<em>is<\/em>. \u00a0The former question is much more useful and comes up far more often than the latter.<\/p>\n<p>The alternative to a fixed or concrete instance is to come up with statistics, which addresses the challenge far more naturally. \u00a0For example, you could have reasonably deduced that the log line in question follows a specific distribution in byte length, or field count, or average field length. \u00a0You may deduce as well that each kind of field has its own distribution in length. \u00a0The line is split into first-order fields by a single space, and subfields are themselves split out of those first-order fields by spaces or other delimiters. \u00a0The values, whether taken over the entire log, a single line, or any given field or sub-field within a line, occur with particular frequencies. 
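To make this concrete, here is a minimal sketch of the statistical characterization described above; the sample log lines and the space-delimited field handling are illustrative assumptions, not part of any particular system:

```python
# Summarize an access log statistically rather than storing concrete rows:
# distributions of line byte length, field count, per-position field
# lengths, and value frequencies.
from collections import Counter
from statistics import mean

# Hypothetical sample lines standing in for a much larger log.
sample_lines = [
    '127.0.0.1 - - [11/Jul/2017:04:37:36 -0700] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.5 - - [11/Jul/2017:04:37:37 -0700] "POST /wp-login.php HTTP/1.1" 302 88',
    '127.0.0.1 - - [11/Jul/2017:04:37:38 -0700] "GET /feed/ HTTP/1.1" 404 301',
]

# Distribution of byte lengths and of first-order field counts
# (first-order fields are split on a single space, as described above).
byte_lengths = [len(line.encode()) for line in sample_lines]
field_counts = [len(line.split(" ")) for line in sample_lines]

# Each field position gets its own distribution of observed lengths.
field_lengths = {}
for line in sample_lines:
    for i, field in enumerate(line.split(" ")):
        field_lengths.setdefault(i, []).append(len(field))

# Value frequencies across the whole log, here over first-order fields.
frequencies = Counter(f for line in sample_lines for f in line.split(" "))

print("mean line bytes:", mean(byte_lengths))
print("field counts:", field_counts)
print("mean length of field 0:", mean(field_lengths[0]))
print("most common field value:", frequencies.most_common(1))
```

Even this toy summary answers "what does row N look like" (ten space-delimited fields, an IP-sized first field, a dominant "-" placeholder value) without ever recalling row N itself.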
\u00a0Even mild effort and minimal expertise reveal much utility when you try this approach.<\/p>\n<p>Coming up with a concrete example is like living in the world of small data and imposing that perspective on the large. \u00a0The alternate approach is to apply statistics to the world of large data, and the combination of big data and statistics makes a lot of intuitive sense. \u00a0So just where do statistics fall in the organization and order of large data in the standard methods? \u00a0Statistics, to the degree that they are available at all, are appended outside the\u00a0<em>source of truth<\/em>\u00a0instead of being viewed as an integral part of the data itself.<\/p>\n<p>Where the massive magnitude of the data should intuitively drive us to the value of statistics, the standard methods for organizing information are more attuned to the world of small data, a world where statistics, by virtue of small sample size, can offer little for order and organization. \u00a0Let us again revisit fictitious log line\u00a081985529216486895 from the perspective of its being recalled by an application or service. \u00a0First consider that, barring some monitor or specialty tool, it is not likely that any application would utilize the whole piece of data. \u00a0Second, there is certainly a lot of other data here (the other 1152921504606846974 rows); even when hundreds or thousands of events roll up together into a result, it is very unlikely that any one row would have a direct connection with a result or output, even in a service that is invoked billions of times each day. \u00a0Consider that row in a statistical sense, however, and this one row could affect an unfathomable number of features derived from the data set. 
\u00a0In a sense it is likely to come up (perhaps even multiple times) as a participant in the results of each service call.<\/p>\n<p>If statistics are the natural form of the large data system, then it should be very intuitive to organize data around a statistical basis. \u00a0Support for direct access to an instance of the original data is a secondary or tertiary priority of such a system. \u00a0Eureka Data was designed around such a principle; consequently it is less sequential and less beholden to the original form of the data, and it presents fewer of the artifacts we add with our standard repositories.<\/p>\n<p>Paradoxically, compared with other big data systems and their challenge of linking together internal references to reach a copy of the pristine representation of the original data, the Eureka Data system is likely to recall a single instance of data more quickly, even if that means recalling the data not from one pristine spot but as the result of a number of statistics and relationships.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/eurekadata.net\/index.php\/introduction\/change\/\">next<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The standard model of large data applications is about providing order or organization around discrete pieces of information. \u00a0There are many consequences to this perspective, and many opportunities open up as the perspective is shed. 
The consequence of the discrete-pieces orientation is that your datum frequently resides somewhere deep down in some data store; that datum &hellip; <a href=\"https:\/\/eurekadata.net\/index.php\/introduction\/order\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">The Illusion of Organization<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":238,"menu_order":1,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/pages\/290"}],"collection":[{"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/comments?post=290"}],"version-history":[{"count":9,"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/pages\/290\/revisions"}],"predecessor-version":[{"id":379,"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/pages\/290\/revisions\/379"}],"up":[{"embeddable":true,"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/pages\/238"}],"wp:attachment":[{"href":"https:\/\/eurekadata.net\/index.php\/wp-json\/wp\/v2\/media?parent=290"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}