Verbose Patterns in the Voynich Manuscript
The structure of the Voynich Manuscript has been analysed
by Jorge Stolfi. His analysis has identified an internal
structure which he has termed crust-mantle-core. Full details
of this analysis can be found at his web site.
This document takes as its basis the work that Jorge Stolfi
has produced and applies it to a technique of encipherment
that aims to mimic the method found in the Voynich Manuscript.
It uses what I term verbose groups to accomplish this.
The verbose groups were determined using an analysis of the
text in the EVA transcription that first counted the frequency
of digraphs. This produced an initial curve that had a distinct
peak at the high end. The resulting pair counts are shown in the
table below.
| ch 1075 | ho 815 | ai 454 | ol 448 | in 438 |
| hy 415 | ii 405 | or 400 | da 365 | sh 353 |
| ok 245 | he 240 | th 208 | ot 231 | qo 229 |
| ct 212 | ha 148 | dy 194 | od 190 | kc 161 |
| tc 160 | ar 150 | ey 149 | ee 136 | eo 121 |
| ko 93 | al 87 | to 87 | do 82 | ka 78 |
| ky 75 | yt 73 | yk 70 | ty 66 | ck 62 |
| kh 59 | ke 57 | oc 55 | ta 53 | pc 52 |
| am 46 | om 42 | dc 41 | hc 41 | ir 41 |
| yc 39 | yd 39 | oi 39 | ks 36 | ea 35 |
| oa 35 | os 35 | ld 34 | so 34 | ph 33 |
| cp 33 | oy 33 | op 30 | lo 29 | an 27 |
| oe 26 | es 26 | te 23 | sa 22 | ro 22 |
| hk 21 | ry 21 | ly 20 | ts 18 | ds 16 |
| ls 15 | ra 14 | ek 14 | po 14 | ht 14 |
| sc 12 | cf 12 | fh 12 | sy 12 | de 10 |
| yp 10 | rc 10 | oo 9 | lc 8 | py 8 |
| hd 7 | rd 7 | dl 7 | en 7 | hs 7 |
| ys 7 | fc 6 | hl 6 | fo 6 | la 5 |
| ad 5 | se 5 | of 5 | im 5 | et 5 |
| hp 4 | ps 4 | ay 4 | pa 3 | ya 3 |
| ec 3 | dd 3 | hh 3 | ri 3 | dk 3 |
| yo 3 | as 3 | is 3 | ic 2 | ed 2 |
| nd 2 | pd 2 | td 2 | le 2 | re 2 |
| ye 2 | dg 2 | eg 2 | ak 2 | sk 2 |
| il 2 | ao 2 | mo 2 | hr 2 | rs 2 |
| ss 2 | at 2 | ny 2 | fa 1 | ac 1 |
| id 1 | ae 1 | yf 1 | lg 1 | rg 1 |
| oh 1 | di 1 | lk 1 | qk 1 | el 1 |
| tl 1 | hm 1 | mm 1 | io 1 | no 1 |
| ap 1 | dp 1 | ep 1 | oq 1 | dr 1 |
| er 1 | yr 1 | cs 1 | fs 1 | dt 1 |
| lt 1 | fy 1 |
Two things become immediately apparent when studying this table. Firstly, the pairs ch and ho stand out well above the rest. As ch is thought to be a single glyph, this is no surprise. The word chol is very common in the manuscript, so it was assumed at this stage that the group cho was not in fact a verbose group. Secondly, groups with few occurrences outnumber those with many. All occurrences of ch were then replaced with an uppercase A and the pair counts were recalculated to determine what effect this would have. The results are shown in the table below.
| Ao 524 | ai 454 | ol 450 | in 441 | da 406 |
| ii 405 | or 403 | sh 353 | ho 296 | ok 251 |
| ka 240 | Ay 235 | ot 233 | qo 229 | ta 216 |
| ct 212 | th 208 | dy 196 | od 194 | hy 184 |
| ar 153 | Ae 149 | ey 149 | ee 136 | eo 121 |
| ko 94 | al 92 | he 92 | aA 88 | to 87 |
| do 83 | yt 79 | yk 77 | ky 76 | ty 67 |
| ha 63 | ck 62 | kh 59 | oa 58 | ke 57 |
| pA 55 | am 47 | ya 43 | om 42 | sa 42 |
| ir 41 | yd 45 | ld 39 | oi 39 | ea 37 |
| ks 37 | oc 36 | so 36 | os 35 | oy 34 |
| cp 33 | ph 33 | Ac 31 | lo 31 | op 30 |
| ra 29 | an 28 | es 27 | oe 26 | te 23 |
| ro 22 | ry 21 | ly 20 | Ak 19 | ds 19 |
| ts 18 | lA 17 | ls 17 | ek 14 | po 14 |
| at 13 | sy 13 | cf 12 | fh 12 | yp 11 |
| Ad 10 | as 10 | de 10 | fa 9 | hc 9 |
| oo 9 | py 8 | ys 8 | dl 7 | en 7 |
| rd 7 | fo 6 | Ap 5 | et 5 | im 5 |
| of 5 | se 5 | yo 5 | dk 4 | hk 4 |
| lt 4 | ps 4 | dd 3 | dg 3 | hh 3 |
| ht 3 | is 3 | ri 3 | dc 2 | dt 2 |
| ed 2 | eg 2 | fy 2 | hd 2 | il 2 |
| lc 2 | le 2 | mo 2 | nd 2 | ny 2 |
| pd 2 | re 2 | rs 2 | sk 2 | ss 2 |
| td 2 | ye 2 | co 1 | cs 1 | di 1 |
| dp 1 | dr 1 | ec 1 | el 1 | ep 1 |
| er 1 | fs 1 | hf 1 | hl 1 | hr 1 |
| iA 1 | ic 1 | id 1 | ik 1 | io 1 |
| lg 1 | lk 1 | mm 1 | nA 1 | no 1 |
| oh 1 | oq 1 | qk 1 | rg 1 | rl 1 |
| sc 1 | sf 1 | tl 1 | yc 1 | yf 1 |
| yr 1 |
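The pair counting behind the two tables above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: it assumes the transcription is plain EVA text with words separated by dots or whitespace, and that pairs are not counted across word boundaries, which the text does not specify. The hypothetical `merges` parameter reproduces the replacement of ch by A, and `n=3` gives trigraph counts of the kind used in the next step.

```python
import re
from collections import Counter


def ngram_counts(text, n=2, merges=None):
    """Count letter n-grams inside the words of an EVA-style transcription.

    Assumption: words are separated by dots or whitespace, and n-grams are
    never counted across a word boundary.
    `merges` maps multi-letter groups to single placeholder symbols, e.g.
    {"ch": "A"}; longer groups are replaced first so that a short group
    cannot break up a longer one.
    """
    for group in sorted(merges or {}, key=len, reverse=True):
        text = text.replace(group, merges[group])
    counts = Counter()
    for word in re.split(r"[.\s]+", text):
        # Slide a window of width n along the word.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


sample = "qokeedy.chol.chor.daiin"
for pair, count in ngram_counts(sample).most_common(3):
    print(pair, count)
```

Calling `ngram_counts(text, merges={"ch": "A"})` on the same transcription would give counts of the form shown in the second table, subject to the boundary assumptions above.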
We now have a smoother curve but still too many low-occurrence
groups. We can immediately replace sh with a single letter, and
likewise groups such as cfh and ckh. If an analysis of trigraphs
is then carried out, combinations such as iin and iir can also
be amalgamated.
Once this has been completed we arrive at the set of verbose groups
shown below. A dot before a group means it can start a word, a dot
after it means it can end a word, and dots on both sides indicate a
stand-alone word. Groups with no dots appear within words but do not
start or end them.
.a 1 a 16 a. 7
ae 1
ai 3 ai. 1
aii 2
aim 1 aim. 2
ain 3 ain. 58
air 6 air. 21
ais. 2
aiil 2
aiim. 2
.aiin. 7 aiin 2 aiin. 328
aiir 1 aiir. 8
aiiin. 4
aiiir 1
ak 2
.al. 3 al 23 al. 61
.am. 4 am 2 am. 40
.an. 1 an. 27
ap 1
.ar. 5
.ar 2 ar 19 ar. 125
.c 2
.ch 539 ch 532 ch. 4
.cfh 10 cfh 2
.ck 3 ck 2
.ckh 30 ckh 26 ckh. 1
.cph 28 cph 5
.cth. 2
.cth 160 cth 45 cth. 1
.ct 2 ct 2
.d. 10
.d 405 d 59 d. 20
.dy. 52
.dy 14 dy 4 dy. 126
.e 5 e 149 e. 9
.ee 5 ee 102 ee. 2
.ek 1 ek 6
eo 4 eo. 16
ey. 96
.f 11 f 4 f. 1
.g. 3 g. 7
h 6
in. 2
iin. 1
iiin. 1
.iiin 1
.k 129 k 199
.l 7 l 5 l. 12
ls 1
.m. 1 m. 2
.n. 1 n. 7
.o. 4
.o 40 o 141 o. 89
.od. 1
.od 16 od 62 od. 22
.oe 7 oe 3
.of 1 of 2
oin. 2
.oir. 1
.oiin. 3 oiin. 27
oiir 1 oiir. 2
oiis. 1
oiiin. 2
.ok 117 ok 47 ok. 2
.ol. 14
.ol 29 ol 67 ol. 338
.om. 1 om. 41
.op 14 op 6 op. 1
.or. 25
.or 5 or 37 or. 336
.os. 4
.os 4 os 2 os. 15
.ot. 1
.ot 110 ot 36 ot. 1
oy 1 oy. 22
.p 37 p 25 p. 1
.q 1
.qo 228 qo 1
.r. 8
.r 13 r 3 r. 3
.s. 37
.s 71 s 16 s. 45
.sh. 2
.sh 257 sh 90 sh. 4
.so. 1
.so 7
.t 77 t 187 t. 4
.y. 25
.y 183 y 71 y. 621
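The dot notation used in this list can be produced mechanically once each word has been split into verbose groups. The sketch below assumes a greedy longest-match split against a known set of groups; the text does not say how words were divided, so `tokenize` and `positional_counts` are hypothetical illustrative names.

```python
from collections import Counter


def tokenize(word, groups):
    """Split a word into verbose groups by greedy longest match (an assumption)."""
    longest = max(map(len, groups), default=1)
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in groups:
                tokens.append(piece)
                i += size
                break
        else:
            # No known group matches here: keep the single letter as a group.
            tokens.append(word[i])
            i += 1
    return tokens


def positional_counts(words, groups):
    """Label each group with the dot notation: '.g' starts a word,
    'g.' ends it, '.g.' is a whole word and a bare 'g' is word-internal."""
    counts = Counter()
    for word in words:
        tokens = tokenize(word, groups)
        last = len(tokens) - 1
        for i, token in enumerate(tokens):
            label = ("." if i == 0 else "") + token + ("." if i == last else "")
            counts[label] += 1
    return counts
```

For example, `positional_counts(["chol", "chol", "daiin", "ol"], {"ch", "ol", "d", "aiin"})` counts `.ch` and `ol.` twice each, and `.d`, `aiin.` and `.ol.` once each.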
This final analysis leaves us with a discrete set of verbose groups. We can now use these, along with an analysis of the Voynich text, to produce a lookalike encipherment. An analysis of page f1r produces a set of verbose groups, shown below.
d s ch sh ok cfh cph p k os od a ai y o ot ckh cth t ek eo ch sh ey dy ee or r al ain om oiin oy ol o an ar e y aiir aiil in s aiin air oin
These groups can now be used in the attempted encipherment. The suggestion has been put forward that the plaintext may somehow have been reordered or sorted before encipherment. One method that would achieve this effect is to first strip all spaces from the plaintext and break it into groups of alphabetically ordered strings. The sample plaintext used here is from John Dee's Tuba Veneris.
vocatus sive citationes sex spiritum sub veneris dominio existentium ubi docetur methodus perficiendi sigillum veneris eiusque tubam circuli compositio nomina propis spirituum eorum vocatus et sigilla cum horum praeparatione libri consecratio operantis ritus spirituum valedictio cum aliis adhuc pluribus in opere observandis
Once the spaces have been removed, new ones are inserted so that the text is split into groups whose letters are in ascending alphabetical order.
v o c atu s s iv e cit at io n es s ex s p ir itu msu b v en er is do m in io ex ist ent iu mu bi do cetu r m et ho du s p er fi ci en di s i gil lu mv en er is eiu s qu etu b am cir cu l i co mp os it io no m in a pr op is s p ir itu u m eoru mv o c atu s et s i gil l a cu m horu mpr aep ar at io n e l i br i co ns e cr at io op er ant is r itu s s p ir itu u m v al e di ct io cu m al i is adhu cp lu r i bu s inop er eo bs erv an dis
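Judging from the segmentation above, a new group starts whenever the next letter is not strictly later in the alphabet than the previous one, so a doubled letter such as ss is split. The sketch below applies that assumed rule mechanically; it will not reproduce the hand segmentation in every place (for instance, it joins the b v of "sub veneris" into bv). The function name `alphabetic_runs` is hypothetical.

```python
def alphabetic_runs(plaintext):
    """Strip spaces and punctuation, then split the letters into runs in which
    each letter is strictly later in the alphabet than the one before it."""
    runs = []
    for letter in (c for c in plaintext.lower() if c.isalpha()):
        if runs and letter > runs[-1][-1]:
            runs[-1] += letter  # still ascending: extend the current run
        else:
            runs.append(letter)  # equal or earlier letter: start a new run
    return runs


print(" ".join(alphabetic_runs("libri consecratio")))
# l i br i co ns e cr at io
```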
This introduces a fair number of one-letter groups, but these should be masked by the verbose nature of the substitutions. We can now assemble a table from the verbose groups obtained in the analysis of page f1r and use it to encipher the regrouped plaintext. A little thought now needs to be given to the positions of the verbose groups in this table so that it adheres to the crust-mantle-core paradigm. Crust groups that normally start a word are placed at the start of the alphabet, and those that normally end a word at the end. The core groups gravitate towards the centre of the alphabet and are immediately surrounded by the mantle groups. The resulting table is shown below.
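Cipher substitution table (crust-mantle-core)
| plaintext | a | b | c | d | e | f | g | h | i | l | m |
| 1st | d | s | ch | sh | ok | cfh | cph | p | k | os | od |
| 2nd | a | ai | y | o | ot | ckh | cth | t | ek | eo | ch |
| plaintext | n | o | p | q | r | s | t | u | v | x |
| 1st | sh | ey | dy | ee | or | r | al | ain | om | oiin |
| 2nd | oy | ol | o | an | ar | e | y | aiir | aiil | in |
| 3rd | | | | | | s | aiin | air | oin | |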
It can be seen from this that every plaintext letter has at least two substitutions. These can be used at the encipherer's discretion to upset statistical analysis. The letters s to v at the end of the table have an extra substitution that can be used to increase word length where too many short words are found. As long as an inserted dead plaintext letter is always replaced by the correct verbose group, these letters can be ignored when deciphering the text. For example, if a letter t is inserted after the single-letter plaintext group r, we get the new form rt. As the table above shows, the r can be substituted by either 'or' or 'ar'; however, as a dead letter the only substitution for the t is 'aiin'. If we now apply the substitutions to our raw alphabetical strings, we arrive at the following encipherment.
om.ol.ch.dalain.s.r.kom.ok.chekaiin.dal.kol.sh.ots
r.okoiin.s.dy.kar.kalain.odsair.
om.otoy.okar.eks.shol.od.ksh.kol
otoiin.ksaiin.okshy.kain.odaiir
sek.shol.chotalain.or
od.okal.pol.shain.r.dy.otar.ckhek.chek.okoy.shek
s.ek.cthekos.osair.chom.okoy.otar.ks
otkain.r.eeair.okalair.s.dch.chekar.chaiir.os.ek
chol.cho.olr.kal.kol.shol.ch.ksh.d
dyar.oldy.ks.r.dy.ekar.kalain.air.ch
otolarain.choin.ol.ch.dalair.r.okaiin.r.k.cphkos
chair.ch.polarain.choar.dokdy.dar.dal.kol.sh.ok
os.ek.sar.k.chol.shr.ot.chor.dal.kol
oldy.otar.dshy.ks.ar.kalain.r.r.dy.ekar.kalain.aiir.ch
om.dos.ot.shek.chy.kol.chain.ch
deo.ek.ks.dshpain.chdy.osair.ar.ek.sain
eksholdy.okor.otol.ais.okaraiil.dsh.sheks
It should be noted here that this encipherment does not use the dead letter
insertion method and so all verbose substitutions relate to actual plaintext.
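The substitution step can be sketched as a lookup in the cipher table. This sketch assumes the table exactly as transcribed; each live letter may take any of its listed substitutions (the encipherment above uses 'aiin' for the t of cit, for example), chosen by a pluggable `choose` function that stands in for the encipherer's discretion. Letters written in uppercase are treated as dead letters and always take the third substitution. The names `CIPHER_TABLE`, `encipher_group` and `encipher` are hypothetical.

```python
import random

# The cipher substitution table, one list of verbose groups per plaintext
# letter. For s, t, u and v the third entry is the extra substitution that
# dead letters must use.
CIPHER_TABLE = {
    "a": ["d", "a"], "b": ["s", "ai"], "c": ["ch", "y"], "d": ["sh", "o"],
    "e": ["ok", "ot"], "f": ["cfh", "ckh"], "g": ["cph", "cth"],
    "h": ["p", "t"], "i": ["k", "ek"], "l": ["os", "eo"], "m": ["od", "ch"],
    "n": ["sh", "oy"], "o": ["ey", "ol"], "p": ["dy", "o"], "q": ["ee", "an"],
    "r": ["or", "ar"], "s": ["r", "e", "s"], "t": ["al", "y", "aiin"],
    "u": ["ain", "aiir", "air"], "v": ["om", "aiil", "oin"],
    "x": ["oiin", "in"],
}


def encipher_group(group, choose=random.choice):
    """Encipher one alphabetical group; uppercase letters are dead letters."""
    parts = []
    for letter in group:
        options = CIPHER_TABLE[letter.lower()]
        if letter.isupper():
            if len(options) < 3:
                raise ValueError(f"{letter} cannot be used as a dead letter")
            parts.append(options[2])  # dead letter: extra substitution only
        else:
            parts.append(choose(options))  # live letter: encipherer's choice
    return "".join(parts)


def encipher(groups, choose=random.choice):
    """Encipher a list of groups, joining the words with EVA-style dots."""
    return ".".join(encipher_group(g, choose) for g in groups)


def first(options):
    """Deterministic chooser: always take the first listed substitution."""
    return options[0]


print(encipher(["lT", "iV", "br"], choose=first))
# osaiin.koin.sor
```

With the default `random.choice` the substitutions vary from run to run, as the text intends; letters missing from the table (j, k, w, y, z) raise a `KeyError`.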
We can now select an individual line of plaintext with a large number of
single-letter groups:
l i br i co ns e cr at io
Preserving the alphabetical order within each group, and with reference to our
cipher table, we can expand the plaintext groups thus:
lT iV br iT co ns eU cr at io
Here the dead letters are indicated in uppercase. Applying the encipherment we
get:
osaiin.koin.sor.ekaiin.chol.she.okair.chor.dal.ekey
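The expansion step can be checked mechanically. The sketch below assumes that "alphabetical integrity" means a dead letter must come strictly later in the alphabet than the last letter of the group it extends, and that only s to v (the letters with a third substitution) may serve as dead letters; `add_dead_letter` and `strip_dead_letters` are hypothetical helper names.

```python
DEAD_LETTERS = "stuv"  # letters with an extra, dead-letter substitution


def add_dead_letter(group, dead):
    """Append a dead letter, shown in uppercase, if it keeps the group
    in strictly ascending alphabetical order."""
    if dead not in DEAD_LETTERS:
        raise ValueError(f"{dead!r} has no dead-letter substitution")
    if group and dead <= group[-1]:
        raise ValueError(f"{dead!r} would break the alphabetical order of {group!r}")
    return group + dead.upper()


def strip_dead_letters(group):
    """Recover the live plaintext by dropping the uppercase (dead) letters."""
    return "".join(c for c in group if not c.isupper())


print([add_dead_letter(g, d) for g, d in [("l", "t"), ("i", "v"), ("e", "u")]])
# ['lT', 'iV', 'eU']
```

`strip_dead_letters` removes the padding again once a group has been deciphered, so `strip_dead_letters("lT")` gives back `l`.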
This now flows more smoothly, and the encipherer has control over the output.
The idea of verbose groups shown here also reflects the thinking of Nick
Pelling. The main basis for this work was the original VMS text generator
proposed by Jon Grove and the excellent analysis of Jorge Stolfi. I do not
propose this as a definitive solution. I am hoping that further research may
give new insight into the method used in the VMS.
This work should be considered an open resource for anyone interested in further
investigation.
Jeff Haley
June 2005