Cryptographic hash functions sit at the core of many secure applications. They are vital to digital signatures for example: rather than signing a message, the hash of the message is signed. A cryptographic hash may be considered as a sort of digital fingerprint: it identifies a message in the same way that a fingerprint identifies a person by associating a message – no matter its size – with a string of bits of fixed length. To do this, such a hash must satisfy the following:
- It must be fast to compute.
- Given a hash value
, it should not be possible to find a message
for which
. This is called pre-image resistance.
- Given a message
and its hash value
, it should not be possible to find another message
for which
. This is called weak collision resistance.
- It should not be possible to find any two different messages
and
for which
. This is called strong collision resistance.
Most cryptographic hash functions are based on bit operations, and are designed to be blisteringly fast, as well as satisfying the other properties above. See the links on the wikipedia page for some examples.
However, if we are prepared to trade speed for simplicity, a very nice hash function was developed by Adi Shamir. Take two large primes and
, and a number
whose multiplicative order is large in the ring
. If a message
can be encoded as a large integer, the hash is defined as
.
By the discrete logarithm problem, this is pre-image resistant. It is also collision resistant, for if
then we would require
.
But determining requires factoring
.
This hash function can be easily implemented. First we have a little function which turns any ASCII text into a large number, by treating the ASCII values of each character as digits in base 256:
def str2num(s):
return ZZ(map(ord,s),256)
Next, to find which has a large multiplicative order. The largest possible order modulo
is
, and this will be obtained if
is a primitive root of both
and
. Given that a number can be tested to be a primitive root by checking that
for all prime factors of
, we have the following little program for finding a common primitive root:
def common_primroot(p,q):
# Finds a primitive root common to both p and q
ps = [f for f,e in factor(p-1)]
qs = [f for f,e in factor(q-1)]
a=2
while true:
outp = true
for d in ps:
if power_mod(a,(p-1)//d,p)==1:
outp=false
outq = true
for d in qs:
if power_mod(a,(q-1)//d,q)==1:
outp=false
if outp and outq:
return a
break
else:
a+=1
Now the hash, for say, 160 bits. This can be computed with these steps:
- Choose two primes
and
and a suitable value
.
- Append the string “xx” to the text. This will ensure that the empty string has a non-trivial hash.
- Turn the text into a number
and compute
- Take the residue modulo
, turn the result into hexadecimal, and pad out to 40 hexadecimal characters.
All this can be done in one fell swoop:
def qd_hash(pl,a,p,q):
# Implements Shamir's hash function h = a^m (mod p*q) where a is a
# number of high multiplicative order mod p*q. This works well if
# a is chosen to be a common primitive root of both p and q
return hex(power_mod(a,str2num(pl+'xx'),p*q)%(2^160)).zfill(40)
Let’s try it out, we’ll choose two 150-bit primes, and find a common primitive root:
sage: p=next_prime(randint(2^150,2^151));p 1681637307496557466781353395191134574935425217 sage: q=next_prime(randint(2^150,2^151));q 1592026707039533251493751206235271693959169369 sage[142]: a=common_primroot(p,q); a 7
For the message text, we choose the first few lines of Shakespeare’s “Richard III”. Also take another text differing by one character, and the empty string:
sage: pl="Now is the winter of our discontent/ Made glorious summer by this sun of York;/ And all the clouds that lour'd upon our house/ In the deep bosom of the ocean buried." sage: pl1="Now is the winter of our discontent/ Made glorious summer by this sun of York;/ And all the clouds that lour'd upon our house/ In the deep bosom of the ocean buried-" sage: pl2=''
Now hash them:
sage: qd_hash(pl,a,p,q) '53ed334a9d8937f0b42bec194b0be729de2a7da3' sage: qd_hash(pl1,a,p,q) '15b5735208cdb11ab27f1b52f5da4f0132f88137' sage: qd_hash(pl2,a,p,q) '28c2ec5fb67e2b1af21ce8dd1a79f4c347197c34'
A tiny change in the plaintext causes a complete change in the hash, as we would hope.
I know this is an old post, but I just came across it. I like! Strange that this is the first time I’ve run into it. What’s the avalanche look like? Pretty uniform distribution collision potential? One thing to point out though is that it’s implementation in the real-world would be incredibly slow compared to other hashing algorithms as–no matter what you do–it depends on division operations. And of course, if we break out of native int limitations (undoubtedly as a target hash would need to be at least 128 bits to be even remotely useful), that’s a whole new world of slow. Simply ratcheting up the size of something like SHA (aka 512) or implementing an ISAAC derived hasher, etc, etc, would be far faster and, in the real-world, undoubtedly just as secure. Still, great food for thought though and I like how the algorithm demonstrates some of the fundamentals of modern cryptography (DH key exchange immediately pops in my mind).