The George Boole Fanclub.

Verbose Patterns in the Voynich Manuscript


The structure of the Voynich Manuscript has been analysed by Jorge Stolfi. His analysis has identified an internal structure which he has termed crust-mantle-core. Full details of this analysis can be found at his web site.

This document takes as its basis the work that Jorge Stolfi has produced and applies it to a technique of encipherment that aims to mimic the method found in the Voynich Manuscript. It uses what I term verbose groups to accomplish this.

The verbose groups were determined using an analysis of the text in the EVA transcription that first counted the frequency of digraphs. This produced an initial curve that had a distinct peak at the high end. The resulting pair counts are shown in the table below.

ch 1075   ho 815    ai 454    ol 448    in 438
hy 415    ii 405    or 400    da 365    sh 353
ok 245    he 240    th 208    ot 231    qo 229
ct 212    ha 148    dy 194    od 190    kc 161
tc 160    ar 150    ey 149    ee 136    eo 121
ko 93     al 87     to 87     do 82     ka 78
ky 75     yt 73     yk 70     ty 66     ck 62
kh 59     ke 57     oc 55     ta 53     pc 52
am 46     om 42     dc 41     hc 41     ir 41
yc 39     yd 39     oi 39     ks 36     ea 35
oa 35     os 35     ld 34     so 34     ph 33
cp 33     oy 33     op 30     lo 29     an 27
oe 26     es 26     te 23     sa 22     ro 22
hk 21     ry 21     ly 20     ts 18     ds 16
ls 15     ra 14     ek 14     po 14     ht 14
sc 12     cf 12     fh 12     sy 12     de 10
yp 10     rc 10     oo 9      lc 8      py 8
hd 7      rd 7      dl 7      en 7      hs 7
ys 7      fc 6      hl 6      fo 6      la 5
ad 5      se 5      of 5      im 5      et 5
hp 4      ps 4      ay 4      pa 3      ya 3
ec 3      dd 3      hh 3      ri 3      dk 3
yo 3      as 3      is 3      ic 2      ed 2
nd 2      pd 2      td 2      le 2      re 2
ye 2      dg 2      eg 2      ak 2      sk 2
il 2      ao 2      mo 2      hr 2      rs 2
ss 2      at 2      ny 2      fa 1      ac 1
id 1      ae 1      yf 1      lg 1      rg 1
oh 1      di 1      lk 1      qk 1      el 1
tl 1      hm 1      mm 1      io 1      no 1
ap 1      dp 1      ep 1      oq 1      dr 1
er 1      yr 1      cs 1      fs 1      dt 1
lt 1      fy 1
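
The counting step behind these figures can be sketched in a few lines of Python. This is only a minimal sketch: the function name digraph_counts and the sample words are my own for illustration, and it assumes that pairs overlap and are counted within words only, not across word breaks.

```python
from collections import Counter


def digraph_counts(words):
    """Count overlapping two-letter pairs inside each word."""
    counts = Counter()
    for word in words:
        # zip(word, word[1:]) yields every adjacent pair of letters in the word
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


# Illustrative sample only; a real run would read the whole EVA transcription.
sample = ["chol", "chor", "daiin"]
for pair, count in digraph_counts(sample).most_common():
    print(pair, count)
```

Sorting the result with most_common() gives the descending order used in the table.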

Two things are immediately apparent from this first table. First, the pairs ch and ho stand well clear of the rest. As ch is thought to be a single glyph, this is no surprise. The word chol is very common in the manuscript, so it was assumed at this stage that the group cho was not in fact a verbose group. Second, pairs with low counts are far more numerous than pairs with high counts. All occurrences of ch were then replaced with an uppercase A and the pair counts were recalculated to determine what effect this would have. The results are shown in the table below.

Ao 524    ai 454    ol 450    in 441    da 406
ii 405    or 403    sh 353    ho 296    ok 251
ka 240    Ay 235    ot 233    qo 229    ta 216
ct 212    th 208    dy 196    od 194    hy 184
ar 153    Ae 149    ey 149    ee 136    eo 121
ko 94     al 92     he 92     aA 88     to 87
do 83     yt 79     yk 77     ky 76     ty 67
ha 63     ck 62     kh 59     oa 58     ke 57
pA 55     am 47     ya 43     om 42     sa 42
ir 41     yd 45     ld 39     oi 39     ea 37
ks 37     oc 36     so 36     os 35     oy 34
cp 33     ph 33     Ac 31     lo 31     op 30
ra 29     an 28     es 27     oe 26     te 23
ro 22     ry 21     ly 20     Ak 19     ds 19
ts 18     lA 17     ls 17     ek 14     po 14
at 13     sy 13     cf 12     fh 12     yp 11
Ad 10     as 10     de 10     fa 9      hc 9
oo 9      py 8      ys 8      dl 7      en 7
rd 7      fo 6      Ap 5      et 5      im 5
of 5      se 5      yo 5      dk 4      hk 4
lt 4      ps 4      dd 3      dg 3      hh 3
ht 3      is 3      ri 3      dc 2      dt 2
ed 2      eg 2      fy 2      hd 2      il 2
lc 2      le 2      mo 2      nd 2      ny 2
pd 2      re 2      rs 2      sk 2      ss 2
td 2      ye 2      co 1      cs 1      di 1
dp 1      dr 1      ec 1      el 1      ep 1
er 1      fs 1      hf 1      hl 1      hr 1
iA 1      ic 1      id 1      ik 1      io 1
lg 1      lk 1      mm 1      nA 1      no 1
oh 1      oq 1      qk 1      rg 1      rl 1
sc 1      sf 1      tl 1      yc 1      yf 1
yr 1

We now have a smoother curve but still too many low-occurrence groups. We can immediately replace sh with a single letter, and likewise groups such as cfh, ckh, etc. If an analysis of trigraphs is then carried out, combinations such as iin and iir can also be amalgamated.
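
The replace-and-recount step might be sketched as follows. Only the use of A for ch comes from the analysis above. The other stand-in letters (S, F, K, P, T), the MERGES table and the function names are hypothetical choices of mine, and the longest groups are tried first as a precaution.

```python
import re
from collections import Counter

# Multi-letter glyphs and their single-letter stand-ins. Only ch -> A is taken
# from the text; the remaining stand-ins are hypothetical.
MERGES = {"cfh": "F", "ckh": "K", "cph": "P", "cth": "T", "ch": "A", "sh": "S"}

# Try longer groups before shorter ones when matching.
_PATTERN = re.compile(
    "|".join(sorted((re.escape(g) for g in MERGES), key=len, reverse=True))
)


def merge_glyphs(word):
    """Replace each multi-letter glyph in a word with its stand-in letter."""
    return _PATTERN.sub(lambda match: MERGES[match.group()], word)


def ngram_counts(words, n):
    """Count overlapping n-letter groups inside each word (n=3 for trigraphs)."""
    counts = Counter()
    for word in words:
        for start in range(len(word) - n + 1):
            counts[word[start:start + n]] += 1
    return counts


merged = [merge_glyphs(w) for w in ["chol", "shckhy", "daiin", "aiin"]]
print(merged)
print(ngram_counts(merged, 3).most_common(3))
```

Running ngram_counts with n=3 on the merged text is one way to spot candidates such as iin and iir for further amalgamation.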

Once this has been completed we arrive at the set of verbose groups shown below, each followed by its count. A group preceded by a dot can start a word, a group followed by a dot can end a word, and a group surrounded by dots is a stand-alone word. Groups with no dots appear within words but do not start or end them.

.a 1         a 16         a. 7
             ae 1
             ai 3         ai. 1
             aii 2
             aim 1        aim. 2
             ain 3        ain. 58
             air 6        air. 21
                          ais. 2
             aiil 2
                          aiim. 2
.aiin. 7     aiin 2       aiin. 328

             aiir 1       aiir. 8
                          aiiin. 4
             aiiir 1
             ak 2

.al. 3       al 23        al. 61
.am. 4       am 2         am. 40
.an. 1                    an. 27
             ap 1
.ar. 5
.ar 2        ar 19        ar. 125
.c 2
.ch 539      ch 532       ch. 4
.cfh 10      cfh 2
.ck 3        ck 2
.ckh 30      ckh 26       ckh. 1
.cph 28      cph 5
.cth. 2
.cth 160     cth 45       cth. 1
.ct 2        ct 2
.d. 10
.d 405       d 59         d. 20
.dy. 52
.dy 14       dy 4         dy. 126

.e 5         e 149        e. 9
.ee 5        ee 102       ee. 2
.ek 1        ek 6
             eo 4         eo. 16
                          ey. 96
.f 11        f 4          f. 1
.g. 3                     g. 7
             h 6
                          in. 2
                          iin. 1
                          iiin. 1
.iiin 1
.k 129       k 199
.l 7         l 5          l. 12
             ls 1
.m. 1                     m. 2
.n. 1                     n. 7
.o. 4
.o 40        o 141        o. 89
.od. 1
.od 16       od 62        od. 22
.oe 7        oe 3
.of 1        of 2
                          oin. 2
.oir. 1
.oiin. 3                  oiin. 27
             oiir 1       oiir. 2
                          oiis. 1
                          oiiin. 2
.ok 117      ok 47        ok. 2
.ol. 14
.ol 29       ol 67        ol. 338
.om. 1                    om. 41
.op 14       op 6         op. 1
.or. 25
.or 5        or 37        or. 336
.os. 4
.os 4        os 2         os. 15
.ot. 1
.ot 110      ot 36        ot. 1
             oy 1         oy. 22
.p 37        p 25         p. 1
.q 1
.qo 228      qo 1
.r. 8
.r 13        r 3          r. 3
.s. 37
.s 71        s 16         s. 45
.sh. 2
.sh 257      sh 90        sh. 4
.so. 1
.so 7
.t 77        t 187        t. 4
.y. 25
.y 183       y 71         y. 621
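
The dot notation can be produced mechanically once each word has been split into groups. The sketch below is one possible way of doing it, not the procedure actually used for the table: the greedy longest-match split, the function names and the tiny group list are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(word, groups):
    """Split a word into verbose groups by greedy longest match (an assumption)."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens, pos = [], 0
    while pos < len(word):
        for group in ordered:
            if word.startswith(group, pos):
                tokens.append(group)
                pos += len(group)
                break
        else:
            # No known group matches here: keep the letter as its own token.
            tokens.append(word[pos])
            pos += 1
    return tokens


def positional_counts(words, groups):
    """Count each group under its dotted label: .g start, g. end, .g. alone."""
    counts = Counter()
    for word in words:
        tokens = tokenize(word, groups)
        for index, token in enumerate(tokens):
            lead = "." if index == 0 else ""
            trail = "." if index == len(tokens) - 1 else ""
            counts[lead + token + trail] += 1
    return counts


# Toy word list and group list, for illustration only.
print(positional_counts(["daiin", "chol", "aiin", "okaiin"],
                        ["d", "aiin", "ch", "ol", "ok"]))
```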

This final analysis leaves us with a discrete set of verbose groups. We can now use these, along with an analysis of the Voynich text, to produce a lookalike encipherment. An analysis of page f1r produces the following set of verbose groups.

d    s    ch   sh   ok   cfh  cph  p    k    os   od   a    ai   y
o    ot   ckh  cth  t    ek   eo   ch   sh   ey   dy   ee   or   r
al   ain  om   oiin oy   ol   o    an   ar   e    y    aiir aiil in
s    aiin air  oin

These groups can now be used in the attempted encipherment. The suggestion has been put forward that the plaintext may somehow have been reordered or sorted before encipherment. One method that would achieve this effect is to first strip all spaces from the plaintext and break it into groups of alphabetically ordered strings. The sample plaintext used here is from John Dee's Tuba Veneris.

vocatus sive citationes
sex spiritum sub
veneris dominio
existentium
ubi docetur
methodus perficiendi
sigillum veneris
eiusque tubam circuli
compositio nomina
propis spirituum
eorum vocatus et sigilla
cum horum praeparatione
libri consecratio
operantis ritus spirituum
valedictio cum
aliis adhuc pluribus
in opere observandis

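Once the spaces have been removed, new spaces are inserted wherever a letter does not come later in the alphabet than the letter before it. The letters within each resulting group therefore run in ascending alphabetical order, and a doubled letter is always split across two groups.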

v o c atu s s iv e cit at io n es
s ex s p ir itu msu b
v en er is do m in io
ex ist ent iu mu
bi do cetu r
m et ho du s p er fi ci en di
s i gil lu mv en er is
eiu s qu etu b am cir cu l i
co mp os it io no m in a
pr op is s p ir itu u m
eoru mv o c atu s et s i gil l a
cu m horu mpr aep ar at io n e
l i br i co ns e cr at io
op er ant is r itu s s p ir itu u m
v al e di ct io cu m
al i is adhu cp lu r i bu s
inop er eo bs erv an dis
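
The regrouping shown above can be sketched as a split into runs of strictly ascending letters. The function name sorted_runs is my own, and the sketch treats each line of plaintext on its own, which is an assumption.

```python
def sorted_runs(line):
    """Strip spaces, then split into groups of strictly ascending letters."""
    letters = "".join(line.split())
    runs = []
    for letter in letters:
        if runs and letter > runs[-1][-1]:
            runs[-1] += letter      # still ascending: extend the current group
        else:
            runs.append(letter)     # equal or earlier letter: start a new group
    return runs


print(" ".join(sorted_runs("veneris dominio")))
# prints: v en er is do m in io
```

Because the comparison is strict, a doubled letter such as the ss in "vocatus sive" always ends up in two separate groups.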

This introduces a fair number of one letter groups but these should be masked by the verbose nature of the substitutions. We can now assemble a table using the verbose groups constructed from the analysis of page f1r. We can then use this to encipher the reconstructed plaintext. A little thought now needs to be applied to the positions of the verbose groups in this table so that it will adhere to the crust-mantle-core paradigm. Any crust groups that normally start a word will appear at the start of the alphabet. Any that normally terminate a word will appear at the end. The core groups will gravitate towards the centre of the alphabet and be immediately surrounded by the mantle groups. The resulting table is shown below.

cipher substitution table (crust mantle core)

a   b   c   d   e   f   g   h   i   l   m

d   s   ch  sh  ok  cfh cph p   k   os  od
a   ai  y   o   ot  ckh cth t   ek  eo  ch


n   o   p   q   r   s   t    u    v    x
sh  ey  dy  ee  or  r   al   ain  om   oiin
oy  ol  o   an  ar  e   y    aiir aiil in
                    s   aiin air  oin

It can be seen from this that all plaintext letters have at least two substitutions. These can be used at the encipherer's discretion to upset statistical analysis. The letters at the end of the table, from s to v, have an extra substitution that can be used to increase word length where too many short words are found. As long as an inserted dead plaintext letter is always enciphered with this extra verbose group, these letters can be ignored when deciphering the text. For example, if a letter t is inserted after the single plaintext letter r, we get the new form rt. As can be seen from the table above, the r can be substituted by either 'or' or 'ar'. As a dead letter, however, the only substitution for the t is 'aiin'. If we now apply the substitutions to our raw alphabetical strings we arrive at the following encipherment.

om.ol.ch.dalain.s.r.kom.ok.chekaiin.dal.kol.sh.ots
r.okoiin.s.dy.kar.kalain.odsair.
om.otoy.okar.eks.shol.od.ksh.kol
otoiin.ksaiin.okshy.kain.odaiir
sek.shol.chotalain.or
od.okal.pol.shain.r.dy.otar.ckhek.chek.okoy.shek
s.ek.cthekos.osair.chom.okoy.otar.ks
otkain.r.eeair.okalair.s.dch.chekar.chaiir.os.ek
chol.cho.olr.kal.kol.shol.ch.ksh.d
dyar.oldy.ks.r.dy.ekar.kalain.air.ch
otolarain.choin.ol.ch.dalair.r.okaiin.r.k.cphkos
chair.ch.polarain.choar.dokdy.dar.dal.kol.sh.ok
os.ek.sar.k.chol.shr.ot.chor.dal.kol
oldy.otar.dshy.ks.ar.kalain.r.r.dy.ekar.kalain.aiir.ch
om.dos.ot.shek.chy.kol.chain.ch
deo.ek.ks.dshpain.chdy.osair.ar.ek.sain
eksholdy.okor.otol.ais.okaraiil.dsh.sheks
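
The substitution table lends itself to a small lookup sketch. The names LIVE, DEAD and encipher are illustrative. The sketch assumes that the third-row groups under s to v are reserved for dead letters, and it marks a dead letter by writing it in uppercase.

```python
# Live substitutions: the first two rows under each plaintext letter.
LIVE = {
    "a": ("d", "a"),       "b": ("s", "ai"),     "c": ("ch", "y"),
    "d": ("sh", "o"),      "e": ("ok", "ot"),    "f": ("cfh", "ckh"),
    "g": ("cph", "cth"),   "h": ("p", "t"),      "i": ("k", "ek"),
    "l": ("os", "eo"),     "m": ("od", "ch"),    "n": ("sh", "oy"),
    "o": ("ey", "ol"),     "p": ("dy", "o"),     "q": ("ee", "an"),
    "r": ("or", "ar"),     "s": ("r", "e"),      "t": ("al", "y"),
    "u": ("ain", "aiir"),  "v": ("om", "aiil"),  "x": ("oiin", "in"),
}
# Third-row groups for s to v, reserved here for dead letters (assumption).
DEAD = {"s": "s", "t": "aiin", "u": "air", "v": "oin"}


def encipher(groups, choices=()):
    """Encipher plaintext groups into dot-separated cipher words.

    `choices` supplies 0 or 1 for each live letter in order, selecting the
    first or second substitution; once it runs out, the first is used.
    """
    picks = iter(choices)
    words = []
    for group in groups:
        word = ""
        for letter in group:
            if letter.isupper():
                word += DEAD[letter.lower()]           # dead letter: fixed group
            else:
                word += LIVE[letter][next(picks, 0)]   # live letter: chosen row
        words.append(word)
    return ".".join(words)


print(encipher("v en er is do m in io".split()))
# prints: om.oksh.okor.kr.shey.od.ksh.key
```

Always taking the first substitution is only for reproducibility. In practice the encipherer varies the choice from letter to letter.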

Note that the enciphered text above does not use the dead letter insertion method, so all of its verbose substitutions relate to actual plaintext. We can now select an individual line of plaintext with a large number of single letter groups:

l i br i co ns e cr at io

Keeping each group in alphabetical order, and referring to our cipher table, we can expand the plaintext groups thus:

lT iV br iT co ns eU cr at io

Here the dead letters are indicated in uppercase. Applying the encipherment we get:

osaiin.koin.sor.ekaiin.chol.she.okair.chor.dal.ekey
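
The padding step can be sketched as below. The function name pad_single_letters is hypothetical, and the check that a dead letter must come later in the alphabet than the letter it follows is my reading of "alphabetical integrity". The choice of which dead letter to use is left to the encipherer and is passed in here as a string.

```python
DEAD_LETTERS = "STUV"  # plaintext letters that have an extra, dead-only group


def pad_single_letters(groups, dead_letters):
    """Append a dead letter (uppercase) to each single-letter group.

    `dead_letters` lists the encipherer's picks in order; when it runs out,
    the remaining single-letter groups are left as they are.
    """
    supply = iter(dead_letters)
    padded = []
    for group in groups:
        dead = next(supply, None) if len(group) == 1 else None
        if dead is None:
            padded.append(group)
            continue
        # The padded group must stay in ascending alphabetical order.
        if dead not in DEAD_LETTERS or dead.lower() <= group:
            raise ValueError(f"{dead!r} cannot follow {group!r}")
        padded.append(group + dead)
    return padded


line = "l i br i co ns e cr at io".split()
print(" ".join(pad_single_letters(line, "TVTU")))
# prints: lT iV br iT co ns eU cr at io
```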

The padded line now flows more smoothly, and the encipherer has control of the output. The idea of verbose groups shown here also reflects the thinking of Nick Pelling. The main basis for this work was the original VMS text generator proposed by Jon Grove and the excellent analysis of Jorge Stolfi. I do not propose this as a definitive solution. I am hoping that further research may give new insight into the method used in the VMS. This work should be considered an open resource for anyone interested in further investigation.

Jeff Haley
June 2005