
ANAGRAMS IN ANCIENT TEXTS

 (Nostradamus, Shakespeare, Dante, Moliere, Spenser)

Allan Webber 2004

 

 

A DIARY OF THE COMPUTER ANALYSIS BY ALLAN WEBBER OF 340,000 WORDS FOUND IN WELL KNOWN TEXTS.

This series of 2 papers shows my efforts to establish a set of  measures for gauging the reality of anagrammatic coding. Hopefully it will prove useful to all those interested in coding.

The first stage examines the statistical evidence when comparing a variety of ancient texts.  

The second stage examines patterns between the larger anagrams found in the texts.

This exercise was designed with the aim of  definitively testing Nostradamus' text for coding.

 

Tuesday 2nd November 2004- The US presidential campaign ends tonight, the Melbourne Cup is run this afternoon, it's raining in what has been a dry and hot spring.


Three computers work away while I type upon a slower machine. They have been working on the same task for 10 days and tomorrow they should be finished. I have seen some of the analysis- enough to be excited and anxious to get on with the task in hand.

The analysis is designed to test the manner in which anagrams are found in any text and to find what is normal and what implies a hidden code. It tests over 340,000 English words (including derivatives) against well-known texts in different languages.

The analysis arises out of my scepticism, a scepticism that many would consider at odds with the efforts I have put into my Nostradamus research.

I can see no convincing evidence that any living person can see into the future or is psychic. If such effects were possible then in this populous world there should be some evidence that did not wilt when placed under scrutiny. Instead there is only folk wisdom that relies on its delicate nature to explain its inability to be confirmed. Yet if there was really any person with such insight they would be the subject of great respect especially in the fields of investment and military adventurism.

I am a sceptic but only where reason and evidence support this view. I  don't  like to take a view based on lack of evidence for that seems a mentally lazy course.

And the present offers ample evidence, testable in a way that the past is not. Purported prophets of the past usually gain strength not from the evidence but from the clouding of reality that tends to come with time.

Having due regard to what is apparent in the present world I cannot believe in the ability of individuals to prophesy despite what others might make of my writings. But in this one case, that of Nostradamus, I have found it much more difficult to find the flaw. Each of my writings published on the internet serves its purpose in my ongoing attempt to find the evidence that Nostradamus could or could not see into the future, to resolve one way or the other this enigmatic writing. I have no doubt Nostradamus' writing is coded but was it, is it, prophetic? If Nostradamus was truly able to prophesy it seems counter to normality and reason.

For over thirty years I have established ground rules to place Nostradamus' work in a testable modern setting. This should have let me relinquish any thought that Nostradamus could see into the future. Yet on every occasion I have been left with a bigger mountain of coincidence to overcome.

My work has led me into a view that there is an unusual and inexplicable patterning in Nostradamus' writings. It seems to hold too many beautifully arranged English anagrams.

For example there is a verse where the anagrams "American" and "Disaster" occur complete and alongside each other. To what extent is this coincidence? Can it be considered to be 'proof' of coding?
           CIII. Verse 90 L.3 "Vn chef de clasSE ISTRA DE CARMANIe "

Only recently I had cause to use my programs to search for the complete anagram of "acronyms", a form of coding using the first letter of lines to generate messages. There is one occurrence, and in the same line anagrams saying "My codes" can clearly be seen.
              IX. Verse 13 L.4 "MYs DECOuuers par feu de buRANCOYS-Mys"

How many such coincidences add up to proof? At what point can I say "There is a code in Nostradamus"? And at what point would I be forced to link this with the much more unnatural statement "Nostradamus could really see into the future"?

Of course it is possible to prove that a person has incorporated a code without it supporting any other claims as to their abilities. But in this case my method of proof makes it difficult to distinguish the two for the codes appear to be in Modern English and Nostradamus was a 16th century Frenchman. Is this premise wrong and the code really in French? To what extent is this my delusion, my falsely seeing what I seek? 

Over the last year I have concentrated on finding whole anagrams (see ANAGRAMS to clarify the term). It has been somewhat of a surprise to me to find the superabundance of anagrams in Nostradamus' ancient text: superabundant, highly complex and often seemingly inexplicably connected with each other and the original text.

The question arises as to how important is this data. It would seem at a casual glance that the finding of words such as "American" and "Disaster" alongside each other is very significant when there are only 4 anagrams of "American" and 2 of "Disaster" in all of the 3742 lines of text of Nostradamus. But is it?

It is easy to be misled about the nature of chance and to include false probabilities in our assumptions. The classic example of this is the probability of 2 people having the same birth date in any class of 30 people. The real probability of this turns out to be high (about 70%), not the one in 12 chance (30 x 1/365) that people mostly assume. The false calculation occurs when the question is approached from the point of view of a match occurring rather than from the view of it not occurring.
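The birthday figures can be checked with a few lines of Python. This is only a sketch under the usual textbook assumptions (365 equally likely birthdays, no leap years); the function name is illustrative and not part of the original analysis.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Works from the complement: the chance that nobody matches.
    Assumes every day of the year is equally likely.
    """
    p_no_match = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


naive_estimate = 30 / 365            # the mistaken "one in 12" reasoning
actual = prob_shared_birthday(30)    # the complement-based calculation

print(f"naive estimate: {naive_estimate:.3f}")
print(f"actual value:   {actual:.3f}")
```

Working from "no match occurring" is what makes the calculation tractable, which is the point made in the paragraph above.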

So over time I have experienced increasing dissatisfaction at the ease with which I have found material in Nostradamus' text that supports my hypothesis. It seems too easy to get, much easier than people might assume.

I knew I was able to analyse Nostradamus' work in a much more scientific way than the 'ad hoc' approach I had been using and I set out to test the texts of Nostradamus to their fullest extent. My gut-feeling was that these anagrams in Nostradamus were more than chance but the left-hand side of my brain chided me that disciplined testing of this proposition would show my feelings to be falsely based.

In order to settle this conflict within myself I set out to gain approval of my brain's left-side. Firstly I gathered from various dictionaries on the internet a set of source words - all being at least five letters long - that I could use for testing various texts. I ended up with a base of over 340,000 words.

Now testing this amount of data is no simple task. It is only in the last few years that such a data base could be applied to the analysis of Nostradamus work. There are 3742 lines of text to be analysed and each word has to be tested. This means over 1,200 million tests have to be done.

Ten years ago, with the technology available to me then, it would have taken over 8 months just to run the tests on Nostradamus text. And beyond 20 years ago the task would have been impossible. It seems fair to say that this is the first time since Nostradamus wrote his work that such a test is possible.

Today, using 3 machines that are not the fastest around I can accomplish the analysis  task on all my sources  in about 14 days.

In order to achieve something useful I needed to test other sources and I resolved to use short sections from a range of texts that fairly compared to Nostradamus work and which allowed some greater understanding of anagrams within a wider context. I ended up choosing poetry totalling some 1,800 lines which is about half that of the Nostradamus text. I chose 2 old French classics, an Italian classic and 2 old English classics in order to test various aspects of anagrams and the potential of my methods to uncover hidden coding. I also included the English translation of the Italian classic to act as a control. See TEXT SELECTION for further details.

Each text (and the word list) was converted into a standard form while ensuring that the integrity of the sources remained intact. For instance the Nostradamus text is the format I started using 10 years ago and hasn't been changed since. See TEXT CONVERSION PRINCIPLES for details. I believe the comparisons made possible by these processes are fair and free of distortion.

The rules I apply to the anagrams are also critical for they are designed to ensure rigour in this particular analysis. Imperfect anagrams are not allowed -the program  used DOESN'T look for letter substitutions, composite words or anagrams in incomplete state (although I have programs that allow these aspects to be tested).

One important variable aspect is the way the program treats end-of-line lettering. Each line is considered as a circle so anagrams can be found right up to the last letter (with the rest at the beginning of the line). A second important aspect is that every letter in the line is considered to be a potential starting point for an anagram, not just the letters that begin words. See RULES FOR FINDING ANAGRAMS for principles. There are other rules that could be applied but I have chosen these on which to base the analysis. The important issue here is that these rules are mandatory and mechanical and not able to be chosen at will- there is no discretion to choose that which benefits or detracts from a particular result.

The program for this whole anagram analysis is fairly simple (compared to that for split-anagrams) and for those who have an interest in building a routine for themselves I set out the basics in HOW THE ANALYSIS PROGRAM WORKS.

Tuesday 9th November 2004- There is a difficulty in any research designed to test for a negative result in that the desire for the research to come up with something unexpected can conflict with the need for a disciplined outcome. This has been my dilemma over the last week because the analysis turned up a result that showed Nostradamus' text is different to the other texts analysed. 

Although I felt my past work suggested otherwise I expected the results of disciplined analysis to show that Nostradamus' text was no different to the other texts examined. Yet this is not the case, so I was forced to re-examine every element of the research before I could proceed further.

 I re-wrote the program using a completely different paradigm (See SECOND PROGRAM) and ran it against the first. The new  program was far faster than the earlier one (for the new one allowed me to use a faster technology called SQL) and because of the slowness of progress in the earlier program there was a great deal of merging output from different computers which gave a potential for error. By checking the output of one program against the other I could eliminate any error of this kind. There were a minor number of omissions that had occurred and the program was rerun to redress this problem. The data from both analyses now matches and I am confident that the final product is free of similar errors.

There is a difference in the size of data from each source and this had to be normalised against a standard for the results to be meaningful. This was done by counting the letters in each text (for this gave the number of starting points for anagrams) and then using this number I proportioned each to match the count in the text of Nostradamus. This letter-count data is gathered very easily in a database such as Access (Summing length of lines) and is not prone to error. After reviewing this aspect I am confident it is a valid approach and not the source of  any bias in the results.

(11th November- I realised that I had not checked the validity of the entry on which I was basing the count. This turned out to be an error, for I had used the lines with blanks and other characters in them whereas I should have used the lines with all these removed. Correcting this produced a significant change in the ratio, which increases the count for non-Nostradamus texts and increases the probability that his text is not different from the rest. My left-hand brain (the rational side) is confident that this will cause the analysis to properly show no difference between Nostradamus and the other texts. My right-hand side's smug confidence that the results were going in favour of Nostradamus lies in tatters, and now it is more in hope than anticipation that it awaits the completion of the analysis. Final values used for letter count are Nostradamus: 116,871, Spenser: 19,092, Roland: 18,763, Moliere: 7,089, Shakespeare: 5,682, Dante-Eng: 4,630, Dante-It: 3,794.)
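The normalisation described above can be sketched as follows. This is a minimal illustration, assuming the scaling is a simple ratio of letter counts; the letter counts are the final values quoted above, while the function names and any raw anagram count passed in are hypothetical.

```python
# Final letter counts quoted in the diary entry of 11th November.
LETTER_COUNTS = {
    "Nostradamus": 116_871,
    "Spenser": 19_092,
    "Roland": 18_763,
    "Moliere": 7_089,
    "Shakespeare": 5_682,
    "Dante-Eng": 4_630,
    "Dante-It": 3_794,
}


def count_letters(line):
    """Count letters only, ignoring blanks and other characters."""
    return sum(1 for ch in line if ch.isalpha())


def scale_factor(source, reference="Nostradamus"):
    """Factor that proportions a source's counts to the reference text."""
    return LETTER_COUNTS[reference] / LETTER_COUNTS[source]


def normalised_count(raw_count, source):
    """Scale a raw anagram count as if the source were as long as Nostradamus."""
    return raw_count * scale_factor(source)


for name in LETTER_COUNTS:
    print(f"{name:12s} x {scale_factor(name):6.2f}")
```

Counting letters only (not blanks or punctuation) matters because every letter, and nothing else, is a starting point for a cluster.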

The word base is another potential source of error but only if it had inbuilt influence from me. The sources for my word base are completely independent of my work. They can be downloaded from the following sites for those wishing to verify the independent worthiness of my sources.


These sources have not been vetted by me in any way and include misspellings, foreign words and a large number of what seem to be poorly constructed words. The unique words of five or more letters within these two sources were merged into a single base. The very simplest words in the five-letter group were removed prior to use with any text. Each text was exposed to this same lexicon without any prejudice on my part and this lexicon seems to me to be a valid base for the comparison.

Alongside is shown the distribution of words by length in the word base used.

There is no doubt there will be a masking effect produced by the nonsensical structure of many words in this lexicon but as the number is not vast and it should be equally distributed this should have no significant bearing on the assessment as to the possible use of anagrams in any text.

Some of the texts are in English (or a translation) and this must be a source of bias if the redundant anagrams (those that are identical to the test word) are included. To ensure fair comparison redundant anagrams were marked out as such and a result including and excluding these words was then possible for all the different sources.

 

ANAGRAMS

A word group is  considered to be an anagram of another word group when it consists of  exactly the same letters.

The following is an example of a complete set of anagrams:

  • act, cat

Although having the same letters, the following are NOT ANAGRAMS for they are not words:

  •  atc, cta, tca, tac

It can be seen that only a limited number of the possible letter permutations are words in their own right.

(NOTE: The number of possible permutations grows quite dramatically as the length of the set increases)

Further examples of anagrams:

  • withdraw, "draw with"
  • disaster, "ass tried"
  • attention, "into a tent"
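The letter-matching part of this definition, and the growth in permutations noted above, can be illustrated with a short Python sketch. The function names are illustrative only. Note that the sketch checks letters alone; deciding whether a cluster is a real word still needs a word list.

```python
from collections import Counter
from math import factorial


def letter_counts(text):
    """Count the letters of a word group, ignoring blanks and case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def same_letters(a, b):
    """True when both word groups consist of exactly the same letters."""
    return letter_counts(a) == letter_counts(b)


def distinct_arrangements(word):
    """Number of different orderings of the letters in `word`."""
    counts = letter_counts(word)
    total = factorial(sum(counts.values()))
    for repeats in counts.values():
        total //= factorial(repeats)   # repeated letters are interchangeable
    return total


print(same_letters("disaster", "ass tried"))
print(distinct_arrangements("act"), distinct_arrangements("disaster"))
```

Three letters give only 6 orderings, while the eight letters of 'disaster' give 20,160, of which very few are words or word groups.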

 

 

 

CLUSTER TYPES

In this paper I classify letter clusters in the following way:

  • Anagrammatic:
    Cluster forms a word found in the word list.

  • Redundant:
    The cluster is identical to the word being tested

  • Sub-cluster:
    A larger cluster found in the same line is a derivative form, e.g. 'cat' is a sub-cluster if 'cats' exists.

 

 

TEXT SELECTION

The texts used for comparison are

  • Nostradamus Prophecies:
    942 quatrains plus an extra verse -3742 lines
    16th Century French text exactly as presented in Erika Cheetham's "The Final Prophecies of Nostradamus" (Wagner Books-1993)

 

  • Moliere:
    Chapter 1 of "L'ecole des Femmes"-  17th Century French play in verse (sourced from internet)

 

  • Roland's Chanson:
    Chapter 1 of this 11th  or 12th century minstrel play (sourced from internet)

 

  • Dante (Italian):
    Chapter 1 of Dante's 'The Divine Comedy'. Italian poem (sourced from internet)

  • Dante (Modern English):
    Chapter 1 of Dante's 'The Divine Comedy'. English translation of the Italian poem (sourced from internet)

 

  • Spenser (Old English):
    Chapter 1 of "The Faerie Queene". 16th Century English poem (sourced from internet)

 

  • Shakespeare (Old English):
    First section of "The English Helicon". Sonnets by Shakespeare, 17th Century (sourced from internet)

 

 

 

TEXT & WORD LIST CONVERSION PRINCIPLES

The principles used in the conversion were strictly the following:

  • 'w' always replaced with 'uu'
  • '&' always replaced with 'et'
  • all punctuation eliminated
  • all stressed letters replaced by normal values.

EXCEPTION: 10 years ago I amended the Cheetham text of Nostradamus by interpreting the 'i' as a 'j' where it seemed mandatory - no changes have been made since.

 

RULES FOR FINDING ANAGRAMS

From each line being tested for a particular word:

  • all blanks are removed
  • the length of each cluster  is the length of the word
  • testable clusters are  unbroken strings
  • lines are made circular by adding the line to itself.
  • every letter in the line generates a testable cluster (i.e. not based on start of words).
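These rules can be sketched in a few lines of Python. This is a minimal illustration, not the original program: the function names are hypothetical, positions are numbered from 1, and the text is assumed to have already been through the conversion principles (for example 'w' replaced with 'uu').

```python
def strip_line(line):
    """Remove blanks and anything else that is not a letter."""
    return "".join(ch for ch in line.lower() if ch.isalpha())


def circular_clusters(line, size):
    """Yield (position, cluster) for every letter of the line.

    The line is made circular by adding it to itself, so clusters that
    start near the end are completed from the beginning of the line.
    """
    text = strip_line(line)
    if size > len(text):
        return                      # word is longer than the line
    doubled = text + text
    for start in range(len(text)):  # every letter is a starting point
        yield start + 1, doubled[start:start + size]


clusters = list(circular_clusters("Estant assis de nuict secret estude", 9))
print(len(clusters), clusters[0], clusters[-1])
```

A line of 30 letters therefore always yields 30 testable clusters, whatever the length of the word being tested.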
 

HOW THE ANALYSIS PROGRAM WORKS

The program is primarily based on rejection of each word not acceptance.

The infrequency of letters is used to index words and clusters to produce greater speed. The order, from rarest to most common, is 'kjzxqyvhbfgmcpdolitunsrae'.

The program includes functions that turn any cluster or line into one of three strings in order of letter infrequency:

  1. Base anagram- all anagrams will be the same using this function e.g. 'three' and 'there' become 'htree'
  2. Base anagram count- similar to 1 but in groups of three characters, a letter followed by a two-digit count, e.g. 'three' and 'there' are 'H01T01R01E02'
  3. Full letter count- a full alphabetic string showing count of letters
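The three functions might look something like the following Python sketch. The first two follow the examples given above. The exact format of the full letter count is not stated, so a two-digit count per letter in alphabetical order is assumed here. The order string has 25 letters (no 'w', which the conversion principles replace with 'uu'); all function names are illustrative.

```python
from collections import Counter

# Letters from rarest to most common, as listed in the text.
RARITY_ORDER = "kjzxqyvhbfgmcpdolitunsrae"
RANK = {ch: i for i, ch in enumerate(RARITY_ORDER)}


def prepare(text):
    """Lower-case, replace 'w' with 'uu' and keep only indexed letters."""
    text = text.lower().replace("w", "uu")
    return [ch for ch in text if ch in RANK]


def base_anagram(text):
    """Letters sorted rarest first; identical for all anagrams."""
    return "".join(sorted(prepare(text), key=RANK.__getitem__))


def base_anagram_count(text):
    """Letter plus two-digit count, rarest letter first."""
    counts = Counter(prepare(text))
    ordered = sorted(counts, key=RANK.__getitem__)
    return "".join(f"{ch.upper()}{counts[ch]:02d}" for ch in ordered)


def full_letter_count(text):
    """Two-digit count for every letter in alphabetical order (assumed format)."""
    counts = Counter(prepare(text))
    return "".join(f"{counts[ch]:02d}" for ch in sorted(RANK))


print(base_anagram("three"), base_anagram("there"))
print(base_anagram_count("three"))
```

Putting the rarest letters first means a mismatch is usually detected within the first character or two, which is where the speed comes from.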

The program takes each word in the lexicon in turn and finds all the anagrams within all the lines of text.

The program does this in the following way:

  1. It takes the first two letters given in the base anagram count for the test word and, using the full letter count on each and every line, finds the maximum number of every letter in the lines of text containing this 2-letter combination.
  2. Rules out the test word if its base anagram count exceeds this maximum for any letter and marks that word as excluded.
  3. Tests each remaining word against every line of text.
  4. Uses base anagram count on the line and rules out the word for that line if the count of any letter in the word exceeds its count in the line.
  5. Tests every position in the line by comparing the base anagrams of cluster and word.
  6. Accepts any word that step 5 says is a match, records the result as 'in-text' together with its location, and moves on to the next line.
  7. If step 5 fails, finds the first letter of rejection by using base anagram count and moves the start of the cluster to one letter past the first letter identified in the cluster.
  8. Repeats steps 5 to 7.
  9. When every line is completed, the word is marked as processed and the analysis moves to the next word.

EXAMPLE

looking for 'assistant' in 'Estant assis de nuict secret estude'

'assistant' has 9 letters, 5 of them are unique; word structure is A02-I01-N01-S03-T02

line has 30 letters so 30 clusters of 9 to be tested using string 'estantassisdenuictsecretestudeESTANTAS',

line structure for a,i,n,s,t is A02-I02-N02-S06-T05 so word could be in line- therefore proceed with test.

first cluster of 9 letters is 'estantass', which has structure A02-I00-N01-S03-T02. Does not equal A02-I01-N01-S03-T02. Analysis can stop after result for 'e' (letter not in word) or 'i' (wrong count)- whichever comes first.

second cluster of 9 letters is 'stantassi', which has structure A02-I01-N01-S03-T02 and is equal to 'assistant' (the third cluster, 'tantassis', matches as well). Analysis returns position 2 and moves on to next test.
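The line-level rejection and the cluster test with skip-ahead can be sketched in Python as below. It is a simplified illustration rather than the original program: it uses letter counters in place of the base anagram strings, it lists every matching start position instead of stopping at the first hit in a line, and the function names are hypothetical. The skip is only as far as is safe: past the offending letter when that letter is not in the word at all, otherwise past the first occurrence of the over-represented letter.

```python
from collections import Counter


def strip_line(text):
    """Keep letters only, lower-cased."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def anagram_positions(word, line):
    """Return 1-based starts of all circular clusters that are anagrams of word."""
    word, text = strip_line(word), strip_line(line)
    n, size = len(text), len(word)
    if size == 0 or size > n:
        return []
    need = Counter(word)
    have = Counter(text)
    # Reject the word outright if the line lacks enough of any letter.
    if any(have[ch] < k for ch, k in need.items()):
        return []
    doubled = text + text              # make the line circular
    positions, start = [], 0
    while start < n:
        window = doubled[start:start + size]
        seen = Counter()
        skip = None
        for offset, ch in enumerate(window):
            seen[ch] += 1
            if seen[ch] > need[ch]:    # first letter of rejection
                if need[ch] == 0:
                    skip = offset + 1            # letter is not in the word
                else:
                    skip = window.index(ch) + 1  # one too many of this letter
                break
        if skip is None:
            positions.append(start + 1)
            start += 1
        else:
            start += skip
    return positions


line = "Estant assis de nuict secret estude"
print(anagram_positions("assistant", line))
```

For this line the sketch reports the clusters starting at positions 2 ('stantassi') and 3 ('tantassis'); every other cluster contains a letter that 'assistant' lacks, so most of the 30 starting points are skipped without a full comparison.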

 

 

HOW THE SECOND PROGRAM WORKS

The program is still primarily based on rejection of each word not acceptance.

This analysis takes each line and finds all the words in the lexicon that form anagrams within the line. (The earlier program reversed this by taking each word and finding all the lines that contained the word as an anagram).

The program used the following:

  1. A simplified "Full Letter Count" function that created a string 26 characters long with each character giving the count for the corresponding letter of the alphabet.
  2. A 'Compare A to B' function- which compares two letter-count strings and returns 'equal' when both strings are identical, 'A greater than B' if any element of A is greater than that same element in B, otherwise returns 'B greater than A'. (Important Note: the result 'A greater than B' does NOT mean string A is greater than string B as a whole, e.g. '213' is reported as greater than '321' because its third element is the larger.)
  3. A function that tested each location in a line and returned the position in the line where the anagram and sector letter-count functions are equal.
  4. A series of SQL structures to find all possible anagrams in a single line using the above functions.
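The first three functions can be sketched in Python as follows. This is an illustration under stated assumptions, not the original code: the letter count is held as a tuple of 26 integers rather than a 26-character string (so counts above 9 cause no difficulty), the SQL structures of item 4 are not reproduced, and the function names are hypothetical.

```python
import string


def full_letter_count(text):
    """Count of each letter a-z, as a tuple of 26 integers."""
    text = text.lower()
    return tuple(text.count(ch) for ch in string.ascii_lowercase)


def compare(a, b):
    """Element-by-element comparison of two letter counts."""
    if a == b:
        return "equal"
    if any(x > y for x, y in zip(a, b)):
        return "A greater than B"   # A cannot fit inside B
    return "B greater than A"


def find_position(word, line):
    """1-based start of the first circular cluster matching word, else None."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    text = "".join(ch for ch in line.lower() if ch.isalpha())
    target = full_letter_count(word)
    if not word or len(word) > len(text):
        return None
    # The word cannot be in the line if it needs more of any letter.
    if compare(target, full_letter_count(text)) == "A greater than B":
        return None
    doubled = text + text
    for start in range(len(text)):
        cluster = doubled[start:start + len(word)]
        if compare(full_letter_count(cluster), target) == "equal":
            return start + 1
    return None


print(compare((2, 1, 3), (3, 2, 1)))
print(find_position("assistant", "Estant assis de nuict secret estude"))
```

The element-wise meaning of 'A greater than B' is what makes the function useful for rejection: a single letter in excess is enough to rule a word out of a line.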

This program is about four times faster than the earlier program. It takes 2-3 minutes to compile the results for any line using an 800 MHz computer.

My intent is to build this version into a downloadable file that will be able to be used by anyone wanting to verify the results for individual lines.

Monday 15th November 2004- Both the emotional and rational minds are astounded by the result. The expectation of each was that after the adjustments described earlier there would be nothing of significance in a comparison of Nostradamus' text with the others. Below is the chart of the results of the analysis.

[Chart: LogCt1]

The graph displays words of length greater than 7. The word base was filtered by me for words below 6 letters as these smaller words become so easy to find that any deliberate anagrammatisation would be hard to see. The analysis of words below 8 letters is therefore discarded in the presentation of all charts.

The results show that there is a clear and consistent pattern in all groupings that place Nostradamus' text well above the other texts.

 

The right hand, emotional side of the brain is pleased but apprehensive at this result for although it seems this is a clear victory it knows the left-hand side too well: this result will be pursued to try and find its logical flaw.

And the emotional side is also disappointed for it aches for the issue to be resolved and a negative result would have allowed this quest to end. Nothing is proven by this result; it just continues the pattern whereby Nostradamus' text continues to intrigue and puzzle me.

Already the computers are running once more, running logical tests to try and resolve what it means when an analysis of a French text using English words leads to this surprising  result.

This process of understanding begins with the graph below, which uses a logarithmic scale to display the result. This shows much more clearly that as the length of words increases there is a distinct increase in the ratio of Nostradamus count compared to that of the Other texts.  

[Chart: Logct2]

A normal scale uses equal intervals on the vertical axis to show the same gap in count. In the previous graph (above) each label on the vertical scale is evenly spaced and is 500 counts greater than the one below it. The normal graph is excellent for small vertical ranges.

A logarithmic scale shows proportions much better when the graph covers a large range. In the graph alongside, each label on the vertical axis is evenly spaced but is 10 times greater than the one below it.

This graph suggests that a straight line could be drawn across the tops of each Series. Each line has a distinctive slope and they come together at some point to the left of the graph.

It would not be surprising to see such a result if there was deliberate coding for it is less masked where randomly generated anagrams are less frequent (i.e. longer words). 

Potential for Code in the Other Texts

The reason for choosing several texts in different languages was to try and quantify the extent of random anagram generation. Dante in modern English was expected to be particularly significant in this regard for it is nigh impossible for it to contain coding. Several of the texts were from a period in time where coding in text was very fashionable. It has been implied that Shakespeare's work may hide code in the form of anagrams but the sonnets selected are unlikely to be coded (for the nature of the word is paramount in sonnets). The same can be said for Moliere, who was writing a play that needed to be attuned to the ear, not the mind. Sagas are different, for in telling a story elegance is less important, so Dante's Italian and Roland's Chanson hold greater potential for concealing code.

 

[Chart: anagct1]

The results for each of the different sources are shown alongside, this time using a line rather than column graph.

Nostradamus' text remains above each of the other texts throughout the range (except at the 13+ stage, where Dante-Italian exceeds Nostradamus).

Dante's English version falls into the pattern that could be expected for an uncoded work. It has the fewest anagrams. Shakespeare's sonnets also fit the expectation but Moliere does not. It lies closer to the Nostradamus text than anticipated. So is it coded or is there an error or a structural explanation?

Roland's Chanson and Dante's Italian version show the signs that could imply coding- they lie beneath Nostradamus but above the ones expected not to show any signs of code.

[Chart: anagct2]

The two genuine English sources, Shakespeare and Spenser, show a slightly greater count than the Dante version but less than their French counterparts. This result is indeed intriguing and could be a reflection of coding but it might indicate a flaw in the methodology.

They imply a need for some simple tests.

The sources have unequal numbers of letters in them and it is important to test whether my equalisation factor distorts the result. The nature of the data is such that a simple test can be provided. The sum of the Other Source letters is very nearly equal to half the letters in Nostradamus. By using 2-line pairings of Nostradamus' four-line quatrains as a basis for comparison it can be shown the results still hold.

 

[Chart: Longwords1]

This graph gives additional confirmation of a trend apparent in the earlier charts. It implies that the best chance of determining the existence of code occurs in the analysis of longer words found in each text.

The evidence, although it doesn't prove there is coding, does have a direct bearing on the search, for it indicates the things which need to be tested.

The research at this point sheds little light on whether this coding was in English and certainly it tells us nothing about the purpose of any code that might be contained in any text. These are aspects that can perhaps be resolved once some of the issues regarding this part of the search are more fully answered.

Amongst the significant questions that are raised by these results is the reason why the texts written in foreign languages show a tendency to imply code when examined with an English word base. Even if they are in code it would seem unlikely they are coded in their own language. Simple logic suggests that testing with an English base should not reveal a coding pattern in another language. However simple logic often leads to wrong conclusions (as in the case of "Which one of a lead, wooden and glass ball of the same size will fall fastest when dropped?"- they all fall at the same rate).

The simple logic suggesting English (or another foreign base) would not work is undermined by the connections of profiles in different languages.

In this series of analyses I am not actually looking for English words in other language texts, I am looking for anagrams of English words.

In preparing for this analysis I have drawn upon the Moby reference source to construct both a French and an Italian word base. I have then examined them using my anagram profile functions and MS Access query tools. These foreign word bases are not as big as the English word base but they show that between 16% and 20% of the anagram profiles in the smaller bases are common to each other and the English word base.

1st Source   Unique profiles   2nd Source   Unique profiles   Common profiles
English      310,000           French       120,000           24,955 (20%)
French       120,000           Italian      52,000            8,002 (16%)
Italian      52,000            English      310,000           9,251 (18%)

This is a high percentage, and if to this is added the very significant number that have very similar profiles then we will always get a large number of anagrams no matter what language is used. However, the expectation would be that the variance between those with and without coding should increase when analysed with the appropriate language base (that in which the code is written).
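The comparison of profiles between two word bases can be sketched in Python with sets. The original work used anagram profile functions and MS Access queries; the version below is only an illustration, with the profile taken to be the sorted letters of a word and with tiny made-up word lists standing in for the real bases.

```python
def profile(word):
    """Anagram profile: the letters of the word in sorted order."""
    return "".join(sorted(ch for ch in word.lower() if ch.isalpha()))


def common_profiles(words_a, words_b):
    """Return (unique in A, unique in B, common, share of the smaller base)."""
    profiles_a = {profile(w) for w in words_a}
    profiles_b = {profile(w) for w in words_b}
    shared = profiles_a & profiles_b
    smaller = min(len(profiles_a), len(profiles_b))
    share = len(shared) / smaller if smaller else 0.0
    return len(profiles_a), len(profiles_b), len(shared), share


# Hypothetical miniature word lists, for illustration only.
english = ["rose", "sore", "niche", "table"]
french = ["oser", "chien", "livre"]

print(common_profiles(english, french))
```

Here 'rose' and 'sore' count as one English profile, and that profile is shared with the French 'oser', just as 'niche' shares a profile with 'chien'. The percentage is taken against the smaller of the two bases, as in the table.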

It is evident from this that I should at least run a full test using the French Word base to see if the results hold and to see what is revealed.

This research has shown that a huge number of anagrams of varying lengths occur by chance. (The number of occurrences is certainly much larger than I expected.)

I consequently believe that the finding of anagrammatic structures in any text is extremely likely and that the ability to find chance sequences that suggest a meaningful relationship is also high.

From this we can conclude that:

  • the finding of a pattern in a very small and restricted set can't and shouldn't imply it was deliberately implanted.
  • the superabundance of anagrams to be found in all texts means it cannot be argued that chance alone supports the import of any single message. 
  • it renders meaningless those anagrams formed by letter substitutions- such practices are no more than  a license to fantasise. 
  • Gematria and Temurah methods of reading texts would appear to have the same order of credibility as tea-leaf reading.

Yet we know code does exist in many texts and in order to assess it we need a more rigid understanding of what is abnormal and what can occur by chance.

In order to go beyond this issue of probability and create a firmer base, an understanding of the findings in the earliest part of the research is essential. In order to pursue this I have created two different functions to randomise the order in the search lines. This means they contain the same letters but the syllabic structures are destroyed. This has then been used to analyse 1 in every 9 of the search lines in each Word base. Exact correlations can therefore be created for each line that is examined in its original and randomised state.
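One way such a randomisation could be done is sketched below in Python. The two functions actually used are not described, so this is an assumption: a plain shuffle of the letters of each line, with an optional seed so that a run can be repeated, and a simple selection of every ninth line. The names are illustrative.

```python
import random


def randomise_line(line, seed=None):
    """Return the letters of the line in random order.

    The result contains exactly the same letters as the original,
    but any syllabic structure is destroyed.
    """
    rng = random.Random(seed)
    letters = [ch for ch in line.lower() if ch.isalpha()]
    rng.shuffle(letters)
    return "".join(letters)


def sample_lines(lines, step=9):
    """Select 1 in every `step` lines, starting with the first."""
    return lines[::step]


original = "Estant assis de nuict secret estude"
shuffled = randomise_line(original, seed=1)
print(shuffled)
```

Because the original and shuffled lines hold identical letters, any difference in their anagram counts can only come from the ordering of the letters.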

Tuesday 16th November 2004- The French analyses  are complete and I have the computers running once more. They are testing randomisation when the word base is in English.

Since there are a different number of words in the English and French data bases it needs to be highlighted that my comparison for different languages is of form rather than quantity. It is difficult to know at this stage what impact trebling the word base has; it is highly unlikely to produce 3 times as many anagrams (per 100k letters) and is unlikely to have a definable factor. However in the two bases I have used this factor for non-redundant anagrams seems to be between 1.40 (unique word-forms only) and 1.55 (non-unique forms included, e.g. repetitions such as rat, art, tar).

The English word base I have used is so large because it includes a vast number of technical names and terms from a range of sciences and human endeavours. These are not part of the French word-base although many (such as chemical names) would be identical in each.

The reason for running the French word base is that a pattern seems to have emerged from the English analysis - it shows a distinct difference in the sources that is consistent over the full range of word-lengths.

The graphs below comparing the new French analysis to the earlier English one use a slightly different format to that used earlier- (logarithmic columns where each interval is ten times the one below it). They are divided into two groups (Count 1 and Count 2) and the results for the English and French word bases are shown alongside each other.

Significantly, when looking at the new results using a French lexicon base, there is no change in the order of the sources and there is no change in the consistency of the result over the range of word-lengths.

[Graphs: anagram counts using the French base (135k words) and the English base (340k words)]

It can be seen that as the size of the word data base increases it produces different results for short and long words. In the larger data base more random anagrams are produced for short words, which masks the results. Randomly generated anagrams should drop off rapidly as the length of the word increases and this seems to be the case. If there is a code then it is likely that it would show up more easily in the long-word region.

 

[Graphs: second anagram count (Ana 2) using the French base and the English base]

One reason for running the program using the French data was to test whether the low result for the English sources was due to something inbuilt into my program that prejudiced the result against the host language. These graphs show that this is not the case: the English sources still perform poorly while all the French sources perform very well. Dante's English version performs worst in both languages. This English version of Dante was tested as a form of control (it is highly improbable that it is in code) and its placement provides no reason to dismiss the theory that these anagrams are revealing the potential for hidden code. Shakespeare's Sonnets support this view, since they give a better comparative performance under English analysis than in French. We can therefore conclude that, on the sample tested, my methodology produces consistent outcomes across various languages.

There are still important issues to be resolved because at this point we have observed a difference but not offered any reason as to why the English and French sources differ in the counts found in both word bases. However, English is built on a historical basis and offers many more alternative word forms than French. These alternatives are derived from a whole host of invader-based sources and therefore English is likely to perform well across many languages.

In the next section of this paper I look at the classes of words in Nostradamus' text in order to discover whether there are imbalances or themes that support the idea of coding. In the rest of this paper I look at the impact of randomising letters, for those more interested in coding methodology.

 

 

Sunday 21st November 2004- I have spent the last few days trying to understand and then present the results of my random-letter analyses.

My hope was to produce a formula for the measurement of coding in any text. Although I cannot provide a totally definitive measure, I believe the following provides a fair basis for comparison: Adjusted Anagram Count = Anagram Count x Word-Base-Size-Factor (1.55 for French, 1.0 for English).
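The adjustment can be written as a short function. This is a minimal sketch using the two factors quoted above; the names `WORD_BASE_SIZE_FACTOR` and `adjusted_count` are hypothetical, and the raw counts in the example are made up for illustration.

```python
# Factors quoted in the text: counts made with the French word base are
# scaled by 1.55, counts made with the English word base are left as they are.
WORD_BASE_SIZE_FACTOR = {"french": 1.55, "english": 1.0}


def adjusted_count(anagram_count, word_base):
    """Adjusted Anagram Count = Anagram Count x Word-Base-Size-Factor."""
    return anagram_count * WORD_BASE_SIZE_FACTOR[word_base]


# Illustrative raw counts only, not results from the analysis.
print(round(adjusted_count(1000, "french")))   # 1550
print(round(adjusted_count(1000, "english")))  # 1000
```

The factor depends on the word base used for the count, not on the language of the text being examined.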

I believe that it is essential in any measurement for any data to be assessed in the following ways.

  • Analysis using a word base in the same language as that in which the text is written.
  • Analysis of the text using a different language word base
  • Analysis of a text known not to be coded.
  • At least one analysis of the randomised lettering of the lines of text.

The first two allow a measure to be made of the randomised and non-randomised generation of anagrams.

 

My results from randomising lettering.

The purpose of my randomised lettering test was to establish a base level for anagrams generated from a randomly ordered set of letters.

The first method I applied was to randomise the order of the lettering in the line based on a random number generator. This caused a marked drop in the count for every source. However, I realized this form of generator might well preserve syllabic structures (just as random shuffling of cards can preserve flushes etc.). In order to break these down I used a second method.
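The first method might look something like the sketch below. It is not the original function: it assumes that only the letters of the line are kept and that Python's standard `random` module stands in for the random number generator; `shuffle_line` and its `seed` parameter are hypothetical names.

```python
import random


def shuffle_line(line, seed=None):
    """Return the letters of a line in random order (non-letters dropped)."""
    letters = [ch for ch in line if ch.isalpha()]
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    rng.shuffle(letters)
    return "".join(letters)


original = "Fait psperer q n'est a croire vain"
shuffled = shuffle_line(original, seed=1)

# The shuffled line holds exactly the same letters as the original.
same_letters = sorted(shuffled) == sorted(ch for ch in original if ch.isalpha())
```

Fixing the seed allows the same randomised line to be regenerated when a result needs to be checked.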

The second method involved dividing the letters into four groups by position (letters 1, 5, 9, etc.; letters 2, 6, 10, etc.; and so on), then placing these groups one after the other. The result of this was a further significant fall in the number of anagrams.
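The four-way split can be sketched as follows. This is an assumed reading of the description above, in which non-letters are dropped first; `four_way_split` and its `parts` parameter are hypothetical names.

```python
def four_way_split(line, parts=4):
    """Take every 4th letter starting at positions 1, 2, 3 and 4 in turn,
    then place the resulting groups one after the other."""
    letters = [ch for ch in line if ch.isalpha()]
    groups = ["".join(letters[start::parts]) for start in range(parts)]
    return "".join(groups)


print(four_way_split("abcdefghij"))  # aeibfjcgdh
```

In the output, letters that were adjacent in the original line end up roughly a quarter of the line apart, which is what breaks up any surviving syllables.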

This analysis has an immediate bearing on the relevance of Nostradamus' text. The text I have used for the analysis is derived from Erika Cheetham's "The Final Prophecies of Nostradamus" (Warner Books, 1993). In her preface she says "Note to Reader: The French text of the quatrains reproduces as closely as possible that of the original edition of 1568 (Benoist Rigaud, Lyon)."

Now this text is most peculiar, replacing u's with v's and v's with u's and using many spellings that no one has been able to attribute to normal patterns, e.g. the fourth line says "Fait psperer q n'est a croire vain." Given this oddity throughout the text, it would be expected that Nostradamus' text should be below the other texts, not above them, when any anagram count is taken. I can conclude on the basis of the randomisation effect that the high count for Nostradamus' text is likely to be understated.

There is also a significant pattern to be found when comparing the results where the text and word-base are in the same language with those where they differ. This is much easier to interpret after a fair allowance is made for the difference in word-base size (I applied a factor of 1.55). A consistent fall applies when a text is analysed with a word-base in a language different from the source. This fall persists even when the line of text is randomised. The results using my two methods of randomisation lead to similar values and imply that there is a 1 in 6 decrease in count level whenever measuring between English and French. This is independent of the direction (i.e. English text to French word base, or French text to English word base). This has to be due to structural constraints between the two languages and in particular to the frequencies of letter usage.

Count of words with more than 6 letters - non-redundant anagrams per 100k letters

Text in original order

Source            Wordbase    Wordbase    Adjusted    Adjusted    Ratio
                  Same-Lang   Diff-Lang   Same-Lang   Diff-Lang   S/D
French Sources    93413       113388      144790      113388      1.28
Nost-Fr           98188       123724      152191      123724      1.23
English Sources   79842       32155       79842       49840       1.60

Text randomised by 4 way split

Source            Wordbase    Wordbase    Adjusted    Adjusted    Ratio
                  Same-Lang   Diff-Lang   Same-Lang   Diff-Lang   S/D
French Sources    42960       57246       66588       57246       1.16
Nost-Fr           44451       61547       68899       61547       1.12
English Sources   42873       24135       42873       37409       1.15
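The ratio column for the randomised text can be reproduced from the raw counts. The sketch below assumes that the 1.55 factor is applied to whichever count was made with the French word base, which is what the adjusted columns suggest; `same_to_diff_ratio` is a hypothetical name.

```python
FRENCH_BASE_FACTOR = 1.55  # size factor quoted in the text


def same_to_diff_ratio(same_lang, diff_lang, text_language):
    """Ratio of adjusted same-language to adjusted different-language counts."""
    if text_language == "french":
        # The same-language count used the French base, so it is scaled up.
        return same_lang * FRENCH_BASE_FACTOR / diff_lang
    # English text: the different-language count used the French base.
    return same_lang / (diff_lang * FRENCH_BASE_FACTOR)


# Raw counts from the randomised (4 way split) table.
rows = {
    "French Sources": (42960, 57246, "french"),
    "Nost-Fr": (44451, 61547, "french"),
    "English Sources": (42873, 24135, "english"),
}
ratios = {
    name: same_to_diff_ratio(same, diff, lang)
    for name, (same, diff, lang) in rows.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Run on these counts, the function gives 1.16, 1.12 and 1.15, matching the last column of the randomised table.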

Although this ratio is smaller for the Nostradamus data, the difference isn't enough to conclude that Nostradamus' text is coded in any language other than French (if at all).

However, this fall in count between languages also implies that the count for Nostradamus' text is understated. His text is known to include Provencal, Latin and a variety of language variations. At best their impact should be neutral but in all probability they would mean the count given is understated.

There is a distinct difference in the results for the English sources and the French sources. Part of this arises because the English base is less English than might be expected. It incorporates many French words that are used in English. There are 13,870 words that are common to the French and English data bases.

When these words are removed and each data base is showing its true language characteristics, there is still a bias against anagrams in the English sources. A significant shift is however seen in the ratios of same-language to different-language anagram counts. Nostradamus' text falls further behind, implying that it is less pure in its use of the French language.
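Removing the shared words is a simple set operation. The sketch below uses a handful of made-up words rather than the real data bases, and `remove_shared_words` is a hypothetical name.

```python
def remove_shared_words(french_words, english_words):
    """Drop words found in both word bases, leaving each with only its own."""
    french, english = set(french_words), set(english_words)
    shared = french & english
    return french - shared, english - shared, shared


# Tiny illustrative lists; the real bases hold 135k and 340k words.
french_only, english_only, shared = remove_shared_words(
    ["table", "maison", "danger", "chien"],
    ["table", "house", "danger", "dog"],
)
print(sorted(shared))  # ['danger', 'table']
```

Because the shared words are removed from both sides, the relative sizes of the two bases change, which may call for a different size factor than before.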

 

Source            Wordbase    Wordbase    Adjusted    Adjusted    Ratio
                  Same-Lang   Diff-Lang   Same-Lang   Diff-Lang   S/D
French Sources    78244       93666       125190      93666       1.33
Nost-Fr           86165       109413      137864      109413      1.26
English Sources   79590       32611       79590       52178       1.52

The difference between the English and French counts may still include a structural component but even so the results show that once more Nostradamus' text is different to the other French texts. Once more through its high anagram counts it consistently points to the possibility of anagrammatic code.

At this point I would conclude that, contrary to expectations, Nostradamus' text has withstood my analysis. It consistently outperforms the other texts across all categories. Further, the nature of the text makes this result more relevant because there are valid reasons for believing the count to be understated.

These conclusions imply that the analysis should be taken a stage further, since there is still a possibility that Nostradamus' text is not purely in French and may be multilingual.

In this part of the analysis I have looked solely at the statistical nature of the anagrams, devoid of any relevance. The next step is to analyse the words that have been uncovered and see if they hold some measurable significance. The next stages will perform the following:

  • Analyse the longer words into categories
  • Determine whether the count for these categories deviates from that within the word-base.
  • Analyse whether there is any association between words in specific locations.


