The Voynich text generator

Description of the table:

The table is a set of 168 individual elements. For convenience the table is set out in 14
rows of 12 elements. Each element contains several items. The first item is the count of
the occurrences of the pair throughout the sample. Below this number is the unique pair
as found in the EVA sample. There are 168 of these in the sample. For the purposes of this
table I have omitted lines with 'weirdo' characters to avoid complicating the table unduly.

Below the pair is a set of 1 to 3 flags. These indicate whether the pair can start a VMS
word, be in mid-word position or end a VMS word. These are S, M and X respectively. Below
this is a list of letters that are found to follow the pair to form either initial, mid
or terminating triplets. Each letter also has a set of 1 to 3 flags. These are I if the
triplet can initiate a word, C if it can continue a word or X if it can terminate a word.
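
As a rough illustration of the layout just described, one element could be held in a small
Python structure. This is only a sketch: the class name Element, its field names and the
two helper functions are assumptions made for the example and are not part of the table.
The values are copied from the 'in' element.

```python
from dataclasses import dataclass, field

# Meaning of the flags, as described above.
PAIR_FLAGS = {"S": "can start a word", "M": "can be mid word", "X": "can end a word"}
LETTER_FLAGS = {
    "I": "triplet can initiate a word",
    "C": "triplet can continue a word",
    "X": "triplet can terminate a word",
}


@dataclass
class Element:
    """One table element (hypothetical names, not from the source)."""
    count: int   # occurrences of the pair in the sample
    pair: str    # the two-letter EVA pair
    flags: str   # 1 to 3 of "S", "M", "X"
    letters: dict = field(default_factory=dict)  # next letter -> 1 to 3 of "I", "C", "X"


# The 'in' element as it appears in the table.
IN = Element(count=441, pair="in", flags="MX", letters={"d": "CX", "y": "X", "c": "C"})


def describe_pair(element):
    """Spell out what the pair flags allow."""
    return [PAIR_FLAGS[f] for f in element.flags]


def describe_letter(element, letter):
    """Spell out what the flags of one following letter allow."""
    return [LETTER_FLAGS[f] for f in element.letters[letter]]
```

Here describe_pair(IN) lists the mid-word and word-ending roles only, since 'in' has no S flag.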

Use of the table:

If we take the example of the pair 'in' we find that the pair never starts a word in the
EVA sample (it has no S flag) and so cannot be used to start a word. If we now look at the
pair 'ch' it has all three flags and so we can use this as the start of an example word.

Using 'ch' we scan the list of letters in the column beneath it to find a letter with the I
flag that can form an initial triplet. There are quite a few for 'ch'. In this case let us
choose 'o'.
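
The two checks made so far (does the pair have the S flag, and which of its letters have
the I flag) can be sketched in Python as below. The dictionary layout and the function
names can_start and initial_letters are assumptions for illustration; the flag values are
copied from the 'in' and 'ch' columns of the table.

```python
# pair -> (pair flags, {following letter: letter flags}); layout is an assumption.
TABLE = {
    "in": ("MX", {"d": "CX", "y": "X", "c": "C"}),
    "ch": ("SMX", {
        "e": "ICX", "l": "IX", "m": "X", "o": "ICX", "s": "ICX", "y": "ICX",
        "a": "IC", "c": "IC", "d": "IC", "k": "IC", "p": "I", "r": "IC", "t": "IC",
    }),
}


def can_start(pair):
    """True if the pair carries the S flag."""
    return "S" in TABLE[pair][0]


def initial_letters(pair):
    """Letters under the pair whose triplet can initiate a word (I flag)."""
    return [letter for letter, flags in TABLE[pair][1].items() if "I" in flags]
```

With this excerpt can_start("in") is false, and 'o' is among the letters returned by
initial_letters("ch"), while 'm' (flagged X only) is not.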

Now we have the sequence 'cho'. To find the next letter we take the last two letters, 'ho',
and look up that element. We find that the flags for 'ho' show that it cannot start a word
but can be a mid-word pair or a terminator. Likewise every letter listed under 'ho' is
flagged either C for continuing a word or X for terminating a word.

Let's choose 'l' for our next selection, giving us 'chol'. Looking up 'ol' we can then
choose 'd', giving us 'chold'. Now we continue with the pair 'ld'. We find that 'ld' has 3
continuing letters. In this case I will choose 'a' as it forms the beginning of the popular
word terminator 'aiin'.

Now we have 'cholda'. Looking up 'da' we find only 3 continuation letters. These are 'l',
'r' and 'i'. This would restrict the opportunities when reaching this point through the
table and would account for a high percentage of word endings through this route,
particularly in the case of 'l' and 'r' as these are also flagged as word terminating by
the X flag. So with these we would have words ending 'dal' and 'dar'.

In this case let us choose the 'i' as indicated above. This gives us 'choldai'. Looking
up 'ai' we now find that we have 4 terminators. Two of these, 'n' and 'r', could provide
the words 'choldain' or 'choldair'. They could however also continue a word if necessary.

Let us select 'i' however. It can only continue a word here, as its other flag marks it
as an initiator and not a terminator. This gives us 'choldaii'. Moving to the entry for
'ii' we find only one continuator, which would append another 'i'. All other letters will
terminate only. We now have four word ending choices which would produce the words
'choldaiim', 'choldaiin', 'choldaiir' and 'choldaiis'.
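
The walk above can be replayed mechanically. The sketch below rebuilds 'choldaiin' one
letter at a time and checks at each step that the chosen letter carries the flag being
relied on. The intermediate 'l' (giving 'chol') is read off the 'ho' column, where it is
flagged CX. The functions column and walk and the dictionary layout are assumptions for
illustration; the flags are copied from the seven columns the walk passes through.

```python
def column(text):
    """Turn 'o ICX l IX' into {'o': 'ICX', 'l': 'IX'}."""
    tokens = text.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


# pair -> (pair flags, {following letter: letter flags})
TABLE = {
    "ch": ("SMX", column("e ICX l IX m X o ICX s ICX y ICX a IC c IC d IC k IC p I r IC t IC")),
    "ho": ("MX", column("d CX k CX l CX m X p CX r CX s CX t CX y X a C c C e C f C i C o C")),
    "ol": ("SMX", column("d ICX o ICX s ICX y ICX a IC c IC t IC e C")),
    "ld": ("MX", column("g X s X y X a C c C o C")),
    "da": ("SMX", column("l ICX m IX n IX r ICX s IX d I i IC k I y I")),
    "ai": ("SM", column("m X n CX r CX s X i IC c C")),
    "ii": ("SM", column("m X n X r X s X i IC")),
}


def walk(start, steps, table=TABLE):
    """Build a word from a start pair and a list of (letter, required flag) choices."""
    pair_flags, _ = table[start]
    if "S" not in pair_flags:
        raise ValueError(f"{start!r} cannot start a word")
    word = start
    for letter, needed in steps:
        pair = word[-2:]  # the last two letters select the next element
        letter_flags = table[pair][1].get(letter, "")
        if needed not in letter_flags:
            raise ValueError(f"{pair!r} + {letter!r} lacks the {needed} flag")
        word += letter
    return word


# First letter needs I, middle letters need C, the final letter needs X.
STEPS = [("o", "I"), ("l", "C"), ("d", "C"), ("a", "C"), ("i", "C"), ("i", "C"), ("n", "X")]
print(walk("ch", STEPS))  # choldaiin
```

Swapping the last step for ("m", "X"), ("r", "X") or ("s", "X") gives the other three endings.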

NOTE: We need not generate a triplet at all. If the current pair has the X flag it can
terminate a word without the need for selecting a letter from its list. This does not
explain single letters as words in the VMS.
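
Putting the rules together, including the note above, a random generator might look like
the following sketch. Several things here are assumptions and not taken from the table:
the uniform random choice between letters, the stop_prob and max_len parameters, the
treatment of a pair missing from the excerpt as a dead end, and the function names. The
table excerpt is deliberately incomplete, so only a few word shapes can come out of it.

```python
import random

# Excerpt only: pair -> (pair flags, {following letter: letter flags}).
TABLE = {
    "ch": ("SMX", {"o": "ICX", "e": "ICX", "y": "ICX", "m": "X"}),
    "ho": ("MX", {"l": "CX", "d": "CX", "y": "X", "a": "C"}),
    "ol": ("SMX", {"d": "ICX", "y": "ICX", "e": "C"}),
    "ld": ("MX", {"y": "X", "a": "C"}),
    "da": ("SMX", {"l": "ICX", "r": "ICX", "i": "IC"}),
    "ai": ("SM", {"m": "X", "n": "CX", "r": "CX", "i": "IC"}),
    "ii": ("SM", {"n": "X", "r": "X", "i": "IC"}),
}


def try_word(table, rng, stop_prob=0.3, max_len=10):
    """One attempt at a word; returns None if the walk hits a dead end."""
    starts = sorted(p for p, (flags, _) in table.items() if "S" in flags)
    word = rng.choice(starts)
    while len(word) < max_len:
        pair_flags, letters = table.get(word[-2:], ("", {}))
        # Note rule: a pair flagged X may end the word with no further letter.
        if "X" in pair_flags and rng.random() < stop_prob:
            return word
        first = len(word) == 2
        wanted = "I" if first else "CX"  # first letter needs I, later ones C or X
        options = sorted(l for l, f in letters.items() if set(f) & set(wanted))
        if not options:
            return word if "X" in pair_flags else None
        letter = rng.choice(options)
        letter_flags = letters[letter]
        word += letter
        can_continue = first or "C" in letter_flags
        if "X" in letter_flags and (not can_continue or rng.random() < stop_prob):
            return word
    return None  # too long without a valid ending


def generate_word(table, rng, attempts=1000):
    """Retry until one attempt ends validly."""
    for _ in range(attempts):
        word = try_word(table, rng)
        if word is not None:
            return word
    raise RuntimeError("no word found")


rng = random.Random(1)
print([generate_word(TABLE, rng) for _ in range(5)])
```

Because the table holds no probabilities, every choice here is a coin toss; weighting the
choices by the pair counts would be a further assumption.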

The question to ask here is: what property links one continuator to another more often
than through other routes? It is these links between continuators that I believe are the
way forward.
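
One possible first step, offered only as a sketch, is to list for each pair the pairs it
can lead to through a letter flagged C. Reading "link" this way is an assumption, as are
the function name continuator_links and the dictionary layout. The table gives no triplet
counts, so the sketch shows which links exist, not how often each is taken.

```python
# Excerpt: pair -> (pair flags, {following letter: letter flags}).
TABLE = {
    "ld": ("MX", {"g": "X", "s": "X", "y": "X", "a": "C", "c": "C", "o": "C"}),
    "da": ("SMX", {"l": "ICX", "m": "IX", "n": "IX", "r": "ICX", "s": "IX",
                   "d": "I", "i": "IC", "k": "I", "y": "I"}),
    "ai": ("SM", {"m": "X", "n": "CX", "r": "CX", "s": "X", "i": "IC", "c": "C"}),
    "ii": ("SM", {"m": "X", "n": "X", "r": "X", "s": "X", "i": "IC"}),
}


def continuator_links(table):
    """Map each pair to the pairs reachable through a C-flagged letter."""
    links = {}
    for pair, (_, letters) in table.items():
        # Appending a letter moves the lookup to (second letter of pair) + letter.
        links[pair] = [pair[1] + letter for letter, flags in letters.items() if "C" in flags]
    return links


for pair, targets in continuator_links(TABLE).items():
    print(pair, "->", targets)
```

For this excerpt the chain used in the worked example shows up as 'ld' to 'da', 'da' to
'ai', and 'ai' to 'ii', with 'ii' linking only to itself.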

Now for the table itself:

 1075  818   454   450   441   415   405   403   366   353   251   240
 ch    ho    ai    ol    in    hy    ii    or    da    sh    ok    he
 SMX   MX    SM    SMX   MX    MX    SM    SMX   SMX   SMX   SMX   MX
 e ICX d CX  m X   d ICX d CX  d CX  m X   l X   l ICX e ICX o ICX o CX
 l IX  k CX  n CX  o ICX y X   s CX  n X   y X   m IX  o ICX y ICX s CX
 m X   l CX  r CX  s ICX c C   a C   r X   c IC  n IX  y IX  a IC  t CX
 o ICX m X   s X   y ICX       k C   s X   o IC  r ICX a IC  c IC  y CX
 s ICX p CX  i IC  a IC        p C   i IC  s IC  s IX  c I   e IC  a C
 y ICX r CX  c C   c IC        t C         a C   d I   d I   s IC  c C
 a IC  s CX        t IC                    d C   i IC  f I   h C   e C
 c IC  t CX        e C                           k I   k I         k C
 d IC  y X                                       y I   t I
 k IC  a C
 p I   c C
 r IC  e C
 t IC  f C
       i C
       o C


 233   229   212   208   196   194   163   162   151   149   149   136
 ot    qo    ct    th    dy    od    tc    kc    ar    ey    ha    ee
 SMX   SMX   SM    MX    SMX   SMX   SM    SM    SMX   MX    MX    SMX
 o ICX a I   h CX  d X   d ICX g X   h ICX h ICX y X   r X   d CX  e ICX
 y ICX c I   y X   l X   c IC  l IX              e I   k C   l CX  g X
 a IC  d I   c I   o CX  k I   o ICX             a C         m X   n X
 c IC  e I   h I   y CX  o I   y IX              c C         n X   o ICX
 e I   f I         a C   p I   a IC              o C         r CX  s ICX
 l I   k I         e C   t I   s IC                          i C   y ICX
 s IC  l I         r C         c C                           p C   a IC
 e C   o I                     e C                                 k C
       p I
       s I
       t IC
       y I


 121   94    87    87    63    79    78    77    76    67    62    59
 eo    ko    al    to    do    yt    ka    yk    ky    ty    ck    kh
 MX    SMX   MX    SMX   SMX   SM    SMX   SM    SMX   SMX   SM    MX
 d CX  d ICX a X   d ICX o X   y IX  i ICX y IX  d IX  d IC  h ICX o CX
 l CX  l ICX d CX  l ICX a IC  a IC  l CX  a IC  t IC  c C   y IX  y X
 m X   m X   g X   m X   d I   c IC  m X   c IC  c C   o C   c I   a C
 r CX  r ICX s X   r ICX i I   d I   n X   e IC  k C   s C   o I   e C
 s X   s IX  y X   y IX  k I   e I   r IX  o I         t C         h C
 a C   y X   c C   e IC  l I   o IC  y X   s IC
 c C   a IC  o C   i IC  r IC  s I   d C
 e C   c I         s I   t I
 k C   i IC        a C
 t C   k I         c C
       t I         k C
       o C


 59    57    53    52    46    45    42    42    41    41    41    39
 oc    ke    ta    pc    am    yd    dc    om    hc    ir    yc    ld
 SM    SM    SM    SM    MX    SMX   SM    X     M     MX    SM    MX
 h IC  g X   l ICX h ICX o X   s X   h IC        f C   d CX  h IC  g X
 k IC  o IC  m X   o X         y IC  t C         h C   o X   p C   s X
 t IC  y X   n X   y X         a IC              k C   c C         y X
       c IC  r IX              l IC              p C   i C         a C
       e IC  i IC              o I               t C               c C
       s C   o I               e C                                 o C


 39    37    36    35    35    35    34    33    33    31    30    28
 oi    ks    so    ea    oa    os    oy    cp    ph    lo    op    an
 SM    SM    SMX   M     SM    SMX   SMX   SM    M     SMX   SMX   X
 n X   h ICX c IC  l CX  l ICX h ICX k IC  h IC  a C   d ICX c IC
 i IC        d IC  m X   n X   s X   t I         e C   l ICX o IC
 r I         e I   n X   r ICX a IC  d C         h C   m X   y IC
             i I   r X   i IC  c IC              o C   r IX  a C
             k I   s X   o I   o I                     c IC
             l I   y X                                 k I
             o I   i C                                 i C
             r I
             s I
             t I
             y I


 27    26    26    23    22    21    21    20    19    18    17    17
 es    oe    sa    te    ro    hk    ry    ly    ds    ts    ls    sc
 MX    SM    SM    SMX   SMX   M     SMX   MX    SMX   SM    MX    SM
 e CX  p X   i IC  y IX  l IX  y X   c I   s C   f X   h IC  y X   h IC
 y X   e IC  r I   e IC  m X   a C         t C   h IC        a C   k I
 o C   k IC  t I   o IC  d IC  c C                           h C
       o C               k I   e C                           s C
                         t I   o C


 16    14    14    14    14    13    13    12    12    11    10    9
 ra    ek    ht    lc    po    rc    sy    cf    fh    yp    de    oo
 SM    SM    M     SM    SM    SM    SX    SM    S     SM    SM    SM
 l X   y X   y X   h IC  d X   h IC  a I   h IC  y X   c IC  y IX  r X
 m X   s I   a C   f C   l IX        k I         a C   o I   e IC  s X
 r X   a C   c C   p C   c I                     h C         o I   c IC
 i IC  c C   o C         e I                     o C               a C
       e C   s C         i I                                       i C
       h C               r IC                                      k C
       o C               t I


 8     8     8     7     7     7     7     7     6     6     5     5
 fc    py    ys    dl    en    hd    hs    rd    fo    hl    ad    et
 SM    SMX   SX    SMX   X     M     MX    M     SM    MX    MX    SM
 h IC  d IC  h IC  o IX        s X   y X   y X   l IX  o C   y X   y IX
       c C         s I         y X   e C         a I         a C   a I
                               a C   h C         c I               o I
                                                 d IC


 5     5     5     5     5     4     4     4     4     4     3     3
 im    la    of    se    yo    ay    dk    hp    lt    ps    as    dd
 MX    MX    SM    SMX   SM    MX    S     M     M     S     MX    SM
 m X   l X   c IC  d X   a I   t C   o I   a C   y X   h I   a C   o I
       r X   o C   y X   d I         s I   c C   a C               y I
       e C   y C   e I   r I         y I         c C
       i C               k C                     o C


 3     3     3     3     3     3     3     3     2     2     2     2
 dg    ec    hh    hr    is    pa    ri    ya    ak    ao    at    dt
 X     M     M     MX    X     SMX   M     SM    M     SM    SMX   SM
       h C   y X   a C         l X   n X   i IC  y X   r X   a I   c I
       t C         e C         i I                                 o I
                               d C


 2     2     2     2     2     2     2     2     2     2     2     2
 ed    eg    fy    ic    il    le    mo    nd    ny    pd    re    rs
 X     X     M     M     M     M     X     MX    X     S     MX    M
             d C   h C   s X   r X         y X         r I   s X   h C
                   d C                                 y I
                   i C
                   l C
                   n C
                   o C
                   r C


 2     2     2     2     2     2     2     1     1     1     1     1
 sk    ss    td    ye    ac    ae    ap    co    cs    di    dp    dr
 S     M     SM    S     M     M     M     S     S     SM    S     M
 a I   h C   g X   e I   h C   n X   c C   y I   h I   i I   c I   a C
 y I         a I


 1     1     1     1     1     1     1     1     1     1     1     1
 el    ep    er    fa    fs    hf    hm    id    ik    io    lg    lk
 X     X     X     S     S     M     X     M     M     M     X     M
                   c I   h I   y C         y X   a C   d C         o I


 1     1     1     1     1     1     1     1     1     1     1     1
 mm    nc    no    oh    oq    qk    rg    rl    sf    tl    yf    yr
 X     M     M     S     S     S     X     X     X     M     S     X
       h C   l X   e I   o I   o I                     y X   o I
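

To use the table in a program it has to be read back out of this column layout. The sketch
below assumes what the listing appears to show: one leading space on each line, then
columns 6 characters wide, with a count row, a pair row, a flag row and then the letter
rows. The names cells and parse_band and the output layout are assumptions. The sample
band is the first three columns of the row beginning 'fc'.

```python
WIDTH = 6   # assumed width of each column
OFFSET = 1  # assumed single leading space on every line

BAND = [
    " 8     8     8",
    " fc    py    ys",
    " SM    SMX   SX",
    " h IC  d IC  h IC",
    "       c C",
]


def cells(line):
    """Cut one line into its fixed-width cells, stripped of padding."""
    body = line[OFFSET:]
    return [body[i:i + WIDTH].strip() for i in range(0, len(body), WIDTH)]


def parse_band(lines):
    """Parse one band (count row, pair row, flag row, letter rows) into a dict."""
    counts, pairs, flags, *letter_rows = [cells(line) for line in lines]
    table = {}
    for col, pair in enumerate(pairs):
        letters = {}
        for row in letter_rows:
            if col < len(row) and row[col]:  # skip blank cells in short columns
                letter, letter_flags = row[col].split()
                # A letter listed twice in one column (as 'e' is under 'ot')
                # has its flags joined; this is a design choice of the sketch.
                letters[letter] = letters.get(letter, "") + letter_flags
        table[pair] = {"count": int(counts[col]), "flags": flags[col], "letters": letters}
    return table


print(parse_band(BAND)["py"])
```

The whole table could be handled the same way by splitting the listing into bands at the
blank lines and parsing each band in turn.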