Most writers of cryptography texts, to my mind, spend a disproportionate amount of time and space carefully discussing the cryptanalysis of the Vigenère cipher. Maybe it’s because this is the first “non-trivial” cipher most students learn, and its cryptanalysis is also slightly non-trivial. Anyway, who am I to buck this trend? So this post shows how to do it in Sage.
First, the cipher itself. It’s a polyalphabetic cipher, where each letter of the plaintext is shifted by an amount given by a keyword; this key being repeated as much as required:
T H I S I S T H E P L A I N T E X T T O B E E N C R Y P T E D
K E Y W O R D K E Y W O R D K E Y W O R D K E Y W O R D K E Y
In this example, and using the correspondence A=0, B=1 up to Y=24, Z=25, we see that every seventh letter is shifted by the same amount: letters in the 1st, 8th, 15th, 22nd positions are shifted by K=10; letters in the 2nd, 9th, 16th 23rd positions are shifted by E=4, and so on. The resulting ciphertext is
D L G O W J W R I N H O Z Q D I V P H F E O I L Y F P S D I B
What makes this cipher seem so strong is that similar letters in the plaintext are not necessarily encrypted to the same letters in the ciphertext: notice for example that the first two I’s are encrypted to Y and O; and that similar letters in the ciphertext do not necessarily correspond to the same letters in the plaintext.
However, if the length of the keyword can be determined, then we know that every
th letter is shifted by the same amount. If there are enough letters to perform a frequency analysis (and using the fact that the most common letter in English text is E), then the value of the shift can be found. One way of determining the keyword length is to shift the ciphertext by a given amount and check all the coincidences (equality of letters) between the ciphertext and its shift. A shift with a large number of coincidences may be the length of the keyword. This method is called Kasiski’s method after its 19th century discoverer.
For an example, I have a very long ciphertext which has been obtained with a Vigenère cipher. You can see it here. Anyway, it’s nearly 20,000 characters long. To find the length of the keyword which was used, the first step is to write a little program to perform a cyclic shift of a string:
def cshift(str,n): slen=len(str) return str[n:slen+1]+str[0:n]
Now to find the coincidences with different shifts:
sage: clen=len(ct)
sage: for i in range(20):
ctx=cshift(ct,i)
coin=0
for j in range(clen):
if ct[j]==ctx[j]:
coin=coin+1
print i,coin
and this returns the output:
0 19369
1 683
2 782
3 791
4 728
5 675
6 655
7 1284
8 712
9 734
10 718
11 764
12 708
13 718
14 1192
15 709
16 716
17 734
18 792
19 697
Neglecting the first output, we see that the greatest number of coincidences occur for shifts of 7 and 14. This would seem to indicate that the keyword has length 7.
Now we break up the ciphertext into seven groups:
sage: ct7s=['','','','','','','']
sage: sage: for i in range(clen):
ct7s[i%7]=ct7s[i%7]+ct[i]
The next step is to find which of the letters is most common in each group. Here’s one way:
sage: alph='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
sage: for i in alph:
print i, ct7s[0].count(i)
which produces
A 1 B 71 C 1 D 217 E 41 F 78 G 123 H 366 I 64 J 52 K 160 L 192 M 2 N 27 O 102 P 86 Q 187 R 196 S 40 T 2 U 167 V 188 W 236 X 74 Y 23 Z 71
from which it is obvious that the most common letter in this group is H. Repeating this same procedure for the other six groups enables us to build up a table of the most common letter in each group:
0 H 1 M 2 G 3 O 4 I 5 R 6 W
(Of course, this entire process can be easily automated; on the other hand it’s quite nice to do everything separately one step at a time.) Now, the most common letter in English is E, which has the value 4. If H corresponds to E, that means that for group 0, there has been a shift of 3, which corresponds to the letter D. This is the first letter of the keyword. And in fact every other letter of the keyword can be obtained by shifting back by 4 from each common letter. This produces the keyword
DICKENS.
Applying this to decrypt the ciphertext produces:
WHETHERISHALLTURNOUTTOBETHEHEROOFMYOWNLIFEORWHETHERTHATSTATION WILLBEHELDBYANYBODYELSETHESEPAGESMUSTSHOW ... BOURNEOFALLSUCHTRAVELLERSANDTHEMOUNDABOVETHEASHESANDTHEDUST THATONCEWASHEWITHOUTWHOMIHADNEVERBEEN
- it’s the first chapter of “David Copperfield”, by Charles Dickens, in uppercase with all punctuation removed.
Filed under: Cryptography, Sage
this is all very nice! …but: as far as i can see, you do not use any of the sage-specific features. all of your code used here runs on plain python.