In 2015 I was at a convention playing some game or other, and this question came up. I’d heard “four” at some point, and said so. The guy sitting next to me insisted that I was quite wrong, that “seven” was the answer, that there were mathematical proofs, and that he even had a web site for a particular game that linked to pages explaining why.
Well, to cut right to the chase, I’ll tell you what I would have liked to have told that guy at the time: he’s wrong. Not because the mathematics are wrong. The problem is, the question is wrong. You see, the mathematical evidence was made by mathematicians, not game players, and what they answered was the question “How many times do you have to shuffle a deck of 52 cards to rearrange them into a fairly random order?”
This is not the same as the question that all the non-mathematicians are asking, which is “How many times do I have to shuffle a deck in order to make a game fair?” The answer, not all that surprisingly, depends a lot on just which card game we’re talking about!
The New York Times did a pretty good article on the relevant research, and there’s also a text-only version available. If you want to feel your eyes bug out of your head, Prof. Mann (Dept. of Mathematics, Harvard) wrote a paper discussing many of the details. (To give credit where due, the author of the original paper, Persi Diaconis, issued an update about 15 years later where he notes that four shuffles is, in fact, sufficient for some games.)
Prof. Mann’s paper is, well, large chunks of it are way, way beyond my math skills, but he does cover two really important components of the question. (1) What is “random,” and (2) what is “shuffling?”
Shuffling first. There are actually lots and lots of ways to scramble a sequence, but when we’re talking about a deck of cards, we’re talking about what’s called a “riffle shuffle.” Take the deck, split it roughly in half, then recombine the stacks by interleaving them. There’s a bit of a problem with this process. Say you’re starting with a brand-new deck, and it comes from the factory arranged with the red suits first, then the black, and each suit is in Ace to King order. After one shuffle you’ll have a mixture of black and red cards, but the Ace of Spades will still be before the 2 of Spades, which is before the 3, and so on.
There are stage magicians who learn to shuffle ‘perfectly,’ which is to say, they can shuffle a deck so that the cards combine left, right, left, right; one card and only one comes from each half of the deck during the shuffle. The result is extremely UN-random. Fortunately, most humans can’t do that; cards come together in clumps of one, two, or three cards at a time. The upshot of that is important if you want to have a computer shuffle for you.
There is a widely-accepted scheme for this, and here’s how it works. You start by “cutting the deck” into two parts. Now, you take a card from the bottom of one of the stacks, and put it on your deck. Keep going until you’ve put all the cards back on the deck. You decide which part to take a card from based on their size. If the left stack is twice as big as the right stack, then you should be taking cards from it twice as often. Since the ratio of cards in the stacks changes every time you take a card from one of them, you have to recalculate the chance of taking a card from one or the other after every card, but that’s what computers are good for.
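As a rough illustration, the scheme just described could be sketched in Python like this. It is not the program behind the numbers in this article. The function name `riffle_shuffle` and its parameters are made up for the example, and choosing the cut point uniformly at random is an assumption, since the description above doesn't say how the cut is picked.

```python
import random

def riffle_shuffle(deck, cut=None, rng=random):
    """Return a new list: one riffle shuffle of deck (index 0 is the top card)."""
    if cut is None:
        # Assumption: any cut point is equally likely.
        cut = rng.randint(0, len(deck))
    left, right = list(deck[:cut]), list(deck[cut:])
    built = []  # the new deck, built from the bottom card upward
    while left or right:
        total = len(left) + len(right)
        # Take from the left stack with probability len(left) / total,
        # recalculated after every card because the stack sizes change.
        if rng.random() * total < len(left):
            built.append(left.pop())   # bottom card of the left stack
        else:
            built.append(right.pop())  # bottom card of the right stack
    built.reverse()  # the first card placed ends up on the bottom
    return built
```

Calling `riffle_shuffle(list(range(52)))` gives one shuffle of a fresh deck. Cards that came from the same stack keep their relative order, which is exactly the weakness described earlier with the Ace, 2, and 3 of Spades.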
Prof. Mann also spends time talking about how to measure the randomness, going into ways to measure “distance,” “rising sequences,” and other exotica. I’m going to stick to something easier to explain. I want to know how much shuffling I have to do in order to have a roughly equal chance of getting any card in the deck.
I’m going to start with dealing bridge, or hearts. In both games, you deal out all the cards to four players, play the cards to make tricks of four cards each, then shuffle them together and deal them out again. Seven shuffles comes close to putting a deck into random order. But in this case, I don’t care! As soon as I get my cards, I’m going to sort them into suits anyway. What I care about is getting every card in the deck equally often, and not being able to know ahead of time which ones I’m getting.
Now, when the dealer deals out the cards, I will receive every 4th card in the deck. Thus, in order to get a random hand, shuffling doesn’t have to make sure that a card is equally likely to appear anywhere in the deck, but just that it has roughly even chances of moving to a position that gets it dealt to a random player.
In order to get a sense of what that took, I wrote a computer program to deal hands, count cards, and give me a score of the results. Let’s say I get dealt a thousand hands. I’ll see 25% of the deck each hand. If the results are ‘perfectly’ random, I ought to be dealt each card 250 times during the game. So I’m grading by how close a shuffling scheme gets me to that. If the card I get dealt most often shows up 3 times as often as my least-seen card, that shuffling scheme will be scored as 33 on a scale of 1 to 100: (# of appearances of rarest card)/(# of appearances of commonest card). If my rarest card shows up 240 times and the most common 260, that would get a score of 92. Personally, unless there’s money on the line, I’ll be quite happy with anything over 90.
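The grading can be sketched in a few lines of Python. This is only an illustration of the arithmetic, with hypothetical names (`hand_for`, `evenness_score`), and it assumes the dealer hands out one card at a time so that a given seat receives every 4th card.

```python
def hand_for(deck, seat=0, players=4):
    """Cards one seat receives when the deck is dealt one card at a time."""
    return deck[seat::players]

def evenness_score(counts):
    """Rarest card's count divided by the commonest card's count, times 100."""
    counts = list(counts)
    return round(100 * min(counts) / max(counts))
```

With these, `evenness_score([240, 260])` comes out to 92 and `evenness_score([100, 300])` to 33, matching the two examples above. A card that never shows up at all would need a count of 0 in the list, which drops the score to 0.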
Let’s put the computer to work. I had it deal a million hands, with 100,000 of them after one riffle shuffle, another 100,000 after two shuffles, and so forth, and got the following results: one shuffle scored an abysmal 13, three shuffles scored 61, and six shuffles got me to 90. Seven scores 93.
Saaaaay, wait a minute. Didn’t I say that seven was wrong? I did. We’re just getting started.
First of all, Prof. Mann splits the deck into two parts at some random point in the deck. But in the real world, humans don’t shuffle a deck by splitting it into one stack of seven cards and one of 45. We split it right near the middle pretty much all the time. It’s really hard to shuffle two parts of the deck when one side is much larger than the other, and, frankly, I think most people would feel that shuffling five cards into a stack of 47 cards isn’t very random. And they’d be right. Making the computer cut near the middle helped quite a bit, scoring a 92 after the fifth shuffle.
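One simple way to make a program cut near the middle is sketched below. The details here are pure assumption: the name `cut_near_middle`, the `spread` parameter, and the choice of a few cards either side of center are all invented for the example, and nothing above says this is how the cut was actually modeled.

```python
import random

def cut_near_middle(deck_size, spread=3, rng=random):
    """Pick a cut point within `spread` cards of the middle of the deck."""
    middle = deck_size // 2
    # Assumption: every offset from -spread to +spread is equally likely.
    return middle + rng.randint(-spread, spread)
```

For a 52-card deck with the default spread, the cut always lands between 23 and 29 cards deep, instead of anywhere from 0 to 52.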
By the way, even when I asked for a dozen shuffles, or even two dozen, I still didn’t see a score much higher than 97. I’m not sure if it’s just the nature of probability, or an artifact of shuffling with a computer’s pseudo-random number generator. Just so’s ya know.
Second of all, who plays hearts by opening a brand new deck for every hand? After each hand, the cards have been somewhat reordered, true, but not exactly the same every time. With trick-taking games, cards are going to tend to clump by suit. In order to include that factor, I decided to pretend the first 13 cards were diamonds, the next 13 were clubs, and so on. I had the computer “players” sort their cards numerically (as in, card 2, card 5, card 6, card 8...card 46, card 49, card 51), which would arrange their hands by suit, and then I made ‘tricks’ by having them each play their first card, then their second card, and so on. The result is that the first trick might have cards 0, 2, 3, 5; the second trick 1, 4, 6, 8; the third 7, 9, 11, 12; and so on. Tricks are played face-up, but usually stacked face down, so when I was done ‘collecting’ the tricks, the deck order as it went to the next dealer to be shuffled would be ...7, 9, 11, 12, 1, 4, 6, 8, 0, 2, 3, 5.
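Here is a minimal Python sketch of that play-and-collect model, written to reproduce the example ordering above. The function name `collect_tricks` is made up, and it assumes cards within a trick stay in seat order and that each new trick lands on top of the previous ones.

```python
def collect_tricks(deck, players=4):
    """Deal, sort each hand, play card k from every hand as trick k,
    then stack the tricks so the last trick played ends up on top."""
    # Each seat receives every 4th card, then sorts its hand numerically.
    hands = [sorted(deck[seat::players]) for seat in range(players)]
    # Trick k is made of every player's k-th lowest card, in seat order.
    tricks = [[hand[k] for hand in hands] for k in range(len(hands[0]))]
    next_deck = []
    for trick in reversed(tricks):  # last trick first, i.e. on top
        next_deck.extend(trick)
    return next_deck
```

Feeding it a 12-card deck in which the four hands are 0, 1, 7 and 2, 4, 9 and 3, 6, 11 and 5, 8, 12 returns 7, 9, 11, 12, 1, 4, 6, 8, 0, 2, 3, 5, the same tail end shown in the example.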
What I did not include in this model are situations where players don’t play their highest value for each suit every time. There’s no trumping or sloughing. I’ve got one player taking all the tricks, instead of splitting them up among all the players. In short, I’ve left out many things that would increase the randomness of the deck going into the next hand.
There’s a catch. My previous scoring method won’t work here. With the cards jiggling around like this, it’s super-easy to see each card in the deck fairly evenly. What could happen, for example, is that you might get dealt most of the same cards that the player on your left had last time, or you might get the cards from the first two tricks every time. That’s not good, but it wouldn’t show up with the scoring method I was using. So I decided to check to see how often I would get the same card two hands in a row. The odds of that happening should be 25% in a four-player game where we deal out the whole deck.
I’m sure you won’t be too surprised to learn that with only two shuffles, the worst card only showed up two hands in succession 16% of the time. The best one showed up 25% of the time. With four shuffles, the worst was 22% and best was 25%. Five shuffles was enough to put the score (worst/best) back in the 90% range, with the average chance of seeing a card two hands in a row at 24.9%, right where it should be.
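The two-hands-in-a-row check might look something like the Python sketch below. It reads “how often I get the same card two hands in a row” as: of the times a card was in my hand, how often was it in my very next hand too. That reading is an assumption, and `repeat_rates` is a hypothetical name.

```python
from collections import Counter

def repeat_rates(hands):
    """For each card, the fraction of times it was held in one hand
    and then held again in the very next hand."""
    held = Counter()      # times the card was in a hand that had a successor
    repeated = Counter()  # times it was also in that next hand
    for previous, current in zip(hands, hands[1:]):
        current_cards = set(current)
        for card in previous:
            held[card] += 1
            if card in current_cards:
                repeated[card] += 1
    return {card: repeated[card] / held[card] for card in held}
```

Given a long list of one player's consecutive hands, the smallest and largest values in the result are the “worst” and “best” cards, and their ratio is the score. In a four-player game that deals out the whole deck, every value should hover around 0.25.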
So now we’re at the point where you’ll see every card in the deck about the same number of times, and you’re rather unlikely to see ‘clumps’ of specific cards carry over from one hand to the next, even if you should be clever enough to be able to identify those clumps. But there’s still something else missing from our scenario: the cut.
Cutting the cards does not make the cards more random. But what it does do is randomize who gets the hand. “Shooting the moon” in the game of Hearts means that one player took all the tricks that had hearts in them. Usually it means they got most of the tricks, period, but for the sake of argument, let’s say that this results in a serious clump of hearts, and that the next dealer doesn’t shuffle very well, so that somebody’s likely to get more than their fair share of hearts. Which player gets it is still going to be quite random, because cutting a single card deeper or shallower will shift the hands left or right.
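The effect of the cut is easy to check with a short Python sketch. The helper names `cut_deck` and `deal` are invented for the example, and it assumes a 52-card deck dealt one card at a time to four players.

```python
def cut_deck(deck, depth):
    """Move the top `depth` cards to the bottom of the deck."""
    return deck[depth:] + deck[:depth]

def deal(deck, players=4):
    """Deal one card at a time; return each seat's hand as a set."""
    return [set(deck[seat::players]) for seat in range(players)]
```

Dealing `list(range(52))` as-is and then again after `cut_deck(deck, 1)` produces the same four hands, but each one lands one seat over. Because 52 divides evenly by four, cutting never changes what the four hands are, only who holds them.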
If you’re dealing Pinochle, then boy are you in luck. It’s a 48-card deck, but there are two cards of each value, so there are only 24 unique cards. Fewer unique values means it’s easier to mix them up. The duplicate cards did mess up my scoring system a bit. If the current hand has two Jacks of Hearts, and the previous hand had one, then my program will count +2 for the Jack, even though only one of them actually came back again. On the other hand, since all the cards are being treated that way, I still want to see a narrow range between the card that comes back in the next hand the least, and the one that comes back the most. After three shuffles, the score was 92.
I think it’s worth repeating that these numbers do not include the fact that sometimes my tricks will go on the top of the deck, and sometimes they’ll go on the bottom, that cards will be played less sequentially than my model does, and other bits of real-world noise that help nudge the randomness of the final deal upwards.
Or we can go back to the original process, where I start each shuffle with a perfectly ordered deck. It took six shuffles to get a standard 52-card deck to score 90. It only takes four riffle shuffles of a pinochle deck to reach 90. Five shuffles earns a score of 95. Well, unless you’re one of those pinochle players who deals out cards three at a time. That’s really bad. Now you’ll have to shuffle the deck six times before the score will break 90.
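Dealing three at a time changes which deck positions go to which player, which a small Python sketch makes concrete. The name `deal_in_packets` and its parameters are hypothetical.

```python
def deal_in_packets(deck, players=4, packet=3):
    """Deal `packet` cards at a time to each seat in turn."""
    hands = [[] for _ in range(players)]
    for start in range(0, len(deck), packet):
        seat = (start // packet) % players
        hands[seat].extend(deck[start:start + packet])
    return hands
```

With a 48-card deck, the first seat gets positions 0, 1, 2, then 12, 13, 14, and so on. Neighboring cards travel to the same hand together, so presumably the shuffle has to work harder to break up runs of adjacent cards before the deal looks even.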
What if you’re playing a game where the hand size is seven? In that case, position does matter. If cards at the bottom don’t get shuffled up to the top, they’ll never be seen. So, you do need to shuffle seven times if you’re playing Crazy 8’s or Uno or King’s Corners or another game with a seven-card hand. And you’re a fanatic. Realistically, it probably only matters if you’re playing Poker, or if you’re playing at the senior center.
I started out by criticizing mathematicians who didn’t understand games. I readily confess that I have left myself wide open to criticism of being a game designer who doesn’t understand probability. I have pointed out some of the places where I took some short cuts. I welcome feedback or comments on this topic. You can reach me at “games” @ this domain.