Structural Overview

At the highest level, Eureka presents your corpus through three views: the Token view, the Join view, and the Corpus view. Complementing these is the concept of space, which is elaborated on in the section on tokens.

  • The Token view contains all the information about all the tokens in your corpus.
  • The Join view connects the Token view with the Corpus view and provides frequency ranking insight.
  • The Corpus view represents the stream of the original text itself: the order, or relative positions, of the tokens.
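As a rough mental model only, the three views can be pictured as three small data structures built from the same text. The sketch below is a toy illustration in Python, not Eureka's implementation or API: the names stream, token_view, join_view, and classify are hypothetical, and treating runs of whitespace as their own tokens is an assumption made here to echo the concept of space.

```python
import re

text = "the answer is 3.14159 and the answer is 42"

# Corpus view (toy): the ordered stream of tokens, i.e. relative positions.
# Assumption: runs of whitespace are kept as tokens of their own.
stream = re.findall(r"\s+|\S+", text)

def classify(tok):
    """Hypothetical token attribute: a coarse kind for each unique token."""
    if tok.isspace():
        return "space"
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", tok):
        return "num"
    return "word"

# Token view (toy): one entry per unique token, holding its attributes.
token_view = {
    tok: {"id": i, "kind": classify(tok)}
    for i, tok in enumerate(dict.fromkeys(stream))
}

# Join view (toy): links each token id to its positions in the stream,
# which also yields a frequency ranking.
join_view = {}
for pos, tok in enumerate(stream):
    join_view.setdefault(token_view[tok]["id"], []).append(pos)

ranking = sorted(join_view.items(), key=lambda kv: len(kv[1]), reverse=True)
```

In this picture, questions about what tokens exist touch only token_view, questions about how often or where they occur go through join_view, and questions about order or neighbours need stream.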

These views, as designed, are irreducible. Understanding them is helpful, if not required, when constructing selections, operations, and statistical calculations. For more information on what each view yields, see Access Via Three Views.

For instance, it is not strictly critical, but perhaps conceptually helpful, to understand that:

  • An intra-numeric regular expression count such as Count(RegEx(num:’\+?3.1415[0-9]*’)), which determines the count of unique tokens matching Pi to five digits, occurs entirely within the Token view.
  • Counting occurrences of those tokens prefaced by the word ‘answer’ is a join of the Token view and the Join view, e.g. Count(word:’answer’ (+) space:<any> (+) RegEx(num:’\+?3.1415[0-9]*’)).
  • Finally, discovering that a frequent answer could be Pi purely by ranking a selected sequence is a full-stack operation. It involves the Token view for the selection criteria, the Join view for the sub-selection filter and the coefficients for the confidence sampling criteria, and the Corpus view as the random sampling source.
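The three cases above can be imitated on a toy token stream in plain Python. This is a sketch under stated assumptions, not Eureka syntax or behaviour: the sample text, the names PI, pi_tokens, hits, and followers are all hypothetical; the dot in the pattern is escaped here so that it matches a literal decimal point; and the last step ranks exhaustively instead of using random sampling or confidence coefficients.

```python
import re
from collections import Counter

# Hypothetical sample text; whitespace runs are kept as separate tokens.
text = "answer 3.14159 maybe 3.1415 answer 3.14159 answer 42 answer 3.14159"
stream = re.findall(r"\s+|\S+", text)

# Pattern for Pi to five digits (dot escaped in this sketch).
PI = re.compile(r"\+?3\.1415[0-9]*")

# Case 1, Token view only: count the unique tokens that match the pattern.
pi_tokens = {tok for tok in set(stream) if PI.fullmatch(tok)}
unique_pi_count = len(pi_tokens)

# Case 2, Token view plus positions: occurrences of the sequence
# word 'answer', any space, a Pi-like number.
hits = [
    stream[i + 2]
    for i in range(len(stream) - 2)
    if stream[i] == "answer"
    and stream[i + 1].isspace()
    and stream[i + 2] in pi_tokens
]

# Case 3, all three ideas together: rank whatever follows 'answer'
# by frequency (exhaustive here; no sampling in this sketch).
followers = Counter(
    stream[i + 2]
    for i in range(len(stream) - 2)
    if stream[i] == "answer" and stream[i + 1].isspace()
)
top_token, top_count = followers.most_common(1)[0]
```

On this sample, the first count looks only at the set of unique tokens, the second needs to know which tokens sit next to each other, and the third needs the full ordered stream to rank the followers of 'answer'.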