It’s well known that in a group of 23 people, the odds are good that at least two of them will share the same birthday. This initially counter-intuitive result is easily explained by noting that the probability of people having different birthdays is
assuming that there are only 365 days in a year, and that each birthday is equally likely. This expression can be more concisely written as
The probability of at least two birthdays being the same is then
For this value is slightly greater than 0.5:
sage: def perm(n,k): return binomial(n,k)*factorial(k) ....: sage: (1-perm(365,23)/365^23).n() 0.507297234323985
This fact that birthday “collisions” are surprisingly likely for a relatively small group, is the foundation for much of the theory of cryptographic hash functions, where security is related to the difficulty of finding collisions: two different inputs with the same output.
But recently I discovered that at least three people known to me had the same birthday. What were the odds of this, I wondered? In general, how many people do you need to have better than even odds for at least three people to share a birthday? Being generally pretty poor at probability (at school I answered a probability question in a local mathematics magazine, and had my answer printed, only to discover later that it was in fact wrong), I did a search. The best answer, with a nice explanation, is here.
Here’s how this author, Dr Rick, solved it. As for the simpler birthday problem above, first find the probability that any birthday is shared by at most two people. We can build this answer up in stages. To start, suppose there are people, what is the probability that there are exactly four pairs who share (different) birthdays?
Start with assuming that the first two people have the same birthday, and the second pair has the same birthday (which is different to the birthday of the first pair), and then the third pair share a birthday (again which is different to the first two pair birthdays) and similarly for the fourth pair. This probability can be computed by choosing any four birthdays for the four pairs, and ensuring that all the other people have different birthdays. Start by noting that there are
ways of choosing four different birthdays (one for each pair) and
ways of choosing the first pair. This leaves
ways of choosing the second pair. And the number of ways of choosing the third and fourth pairs will be
respectively. Then we require that the other people all have different birthdays chosen from the remaining 361 possible days. Putting all this together gives us the total probability of exactly fours pairs of people, each pair of which share a birthday:
where the last expression is of course the probability of people having different birthdays from a total of possible days.
This expression can be simplified. First note that
Now the above expression can be written as
Now we can also express the permutation and binomial coefficient as factorials:
This means that the probability of exactly pairs of equal birthdays is
and so the total probability of pairs of birthdays (but no triples) is
Thus the probability of at least one trio of birthdays is
This is the same formula which is given (without derivation) in Anirban DasGupta, The matching, birthday and the strong birthday problem: a contemporary review, Journal of Statistical Planning and Inference 130 (2005), 377-389. (For this reference, I am indebted to Michael Lugo on Mathematics StackExchange).
Now for some numerical experiments:
sage: def trio(n): return (1-sum((factorial(365)*factorial(n))/\ ....: (factorial(i)*factorial(n-2*i)*factorial(365-n+i)*2^i*365^n)\ ....: for i in range(n//2+1))).n() ....: sage: trio(87) 0.499454850631401 sage: trio(88) 0.511065110624730
and we see we need a group of 88 people to have better than even odds that three of them share a birthday.