Gemp Shuffler Investigation
If you've played on Gemp for any length of time, you've probably heard someone say — or said yourself — that the shuffler feels off. Maybe you've experienced it firsthand: you mulligan and get back half the same cards. Or you keep drawing the same clump of companions every game. Or your deck just feels like it isn't really shuffling.
This post presents the results of an investigation into whether the shuffler actually has a problem.
What People Report
The complaints generally fall into a few buckets:
- "I mulligan and get the same cards back." You pitch your opening hand, shuffle up, draw again — and three of the same cards are right back staring at you.
- "The shuffle just doesn't feel random." Cards seem to clump. You go half the game without seeing a card you're running four copies of, or you draw all four in a row.
- "This never seems to happen when I shuffle at my kitchen table!"
The question is: are these experiences evidence of a bug, or evidence of something else?
What We Did
To test this properly, we needed to remove human bias from the equation; hard numbers only.
We built a test harness using a 60-card Highlander deck: 60 unique cards, one copy of each, no duplicates. This matters because if you mulligan a normal deck and see "another copy" of a card you had before, that could be a completely different physical copy. With a Highlander deck, if a card comes back after a mulligan, it is the exact same card.
We then ran two tests:
Test 1: Shuffle Fairness. We created 50,000 fresh games, shuffled the deck, and drew an opening hand each time. Then we checked: does every card in the deck show up with roughly equal frequency? Or are some cards mysteriously favored?
Test 2: Mulligan Overlap. We simulated the mulligan 50,000 times: shuffle, draw 8 cards, put them all back, shuffle again, draw 6 (the standard mulligan penalty). Then we counted exactly how many cards from the original hand reappeared.
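For anyone who wants to try this at home, here is a minimal Python sketch of both tests. It is not the actual Gemp harness (which runs against the Java server); the function names are made up for illustration, and Python's `random.Random` stands in for Gemp's RNG, so it only shows what a fair shuffle should produce.

```python
import random
from collections import Counter

DECK_SIZE = 60      # Highlander deck: every card is unique
HAND_SIZE = 8       # opening hand
MULLIGAN_SIZE = 6   # hand size after the mulligan penalty


def fairness_counts(trials, rng):
    """Count how often each card lands in a freshly shuffled opening hand."""
    counts = Counter()
    for _ in range(trials):
        deck = list(range(DECK_SIZE))
        rng.shuffle(deck)
        counts.update(deck[:HAND_SIZE])
    return counts


def chi_squared(counts, trials):
    """Pearson-style statistic: sum of (observed - expected)^2 / expected."""
    expected = trials * HAND_SIZE / DECK_SIZE
    return sum((counts[c] - expected) ** 2 / expected for c in range(DECK_SIZE))


def mulligan_overlap(rng):
    """Draw 8, put them back, reshuffle everything, draw 6, count repeats."""
    deck = list(range(DECK_SIZE))
    rng.shuffle(deck)
    first_hand = set(deck[:HAND_SIZE])
    rng.shuffle(deck)  # the whole deck is reshuffled, hand included
    return len(first_hand.intersection(deck[:MULLIGAN_SIZE]))


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only so reruns are repeatable
    trials = 50_000
    counts = fairness_counts(trials, rng)
    print("chi-squared:", round(chi_squared(counts, trials), 2))
    overlaps = Counter(mulligan_overlap(rng) for _ in range(trials))
    for k in sorted(overlaps):
        print(k, f"{overlaps[k] / trials:.2%}")
```

Your exact numbers will differ from ours because the seed and RNG are different, but the shape of the results should be the same.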
The Results
Shuffle Fairness: PASS
Every card appeared in the opening hand almost exactly as often as every other card, across 50,000 games. The statistical test (chi-squared) came in at 49.82 against a failure threshold of 77.93. Not a single card out of 60 was flagged as appearing unusually often or rarely. The shuffle is uniform.
Full output:
=== SEEDING BIAS TEST ===
Iterations: 50000 | Hand size: 8 | Deck size: 60
Expected frequency per card: 6666.7 (stddev 76.0)
Card Observed Expected Z-score
------------------------------------------
1_1 6556 6666.7 -1.46
1_2 6628 6666.7 -0.51
1_3 6730 6666.7 0.83
1_4 6788 6666.7 1.60
1_5 6757 6666.7 1.19
1_6 6696 6666.7 0.39
1_7 6632 6666.7 -0.46
1_8 6599 6666.7 -0.89
1_9 6847 6666.7 2.37
1_10 6624 6666.7 -0.56
1_11 6623 6666.7 -0.57
1_12 6713 6666.7 0.61
1_13 6771 6666.7 1.37
1_14 6616 6666.7 -0.67
1_15 6672 6666.7 0.07
1_16 6634 6666.7 -0.43
1_17 6724 6666.7 0.75
1_18 6697 6666.7 0.40
1_19 6577 6666.7 -1.18
1_20 6799 6666.7 1.74
1_21 6804 6666.7 1.81
1_22 6598 6666.7 -0.90
1_23 6673 6666.7 0.08
1_24 6692 6666.7 0.33
1_25 6546 6666.7 -1.59
1_26 6870 6666.7 2.68
1_27 6648 6666.7 -0.25
1_28 6645 6666.7 -0.29
1_29 6639 6666.7 -0.36
1_30 6575 6666.7 -1.21
1_31 6604 6666.7 -0.82
1_32 6628 6666.7 -0.51
1_33 6571 6666.7 -1.26
1_34 6634 6666.7 -0.43
1_35 6573 6666.7 -1.23
1_36 6673 6666.7 0.08
1_37 6704 6666.7 0.49
1_38 6659 6666.7 -0.10
1_39 6593 6666.7 -0.97
1_40 6653 6666.7 -0.18
1_41 6772 6666.7 1.39
1_42 6703 6666.7 0.48
1_43 6631 6666.7 -0.47
1_44 6556 6666.7 -1.46
1_45 6713 6666.7 0.61
1_46 6724 6666.7 0.75
1_47 6770 6666.7 1.36
1_48 6578 6666.7 -1.17
1_49 6652 6666.7 -0.19
1_50 6748 6666.7 1.07
1_51 6593 6666.7 -0.97
1_52 6654 6666.7 -0.17
1_53 6681 6666.7 0.19
1_54 6730 6666.7 0.83
1_55 6567 6666.7 -1.31
1_56 6639 6666.7 -0.36
1_57 6684 6666.7 0.23
1_58 6635 6666.7 -0.42
1_59 6625 6666.7 -0.55
1_60 6680 6666.7 0.18
Chi-squared: 49.82 (df=59, critical=77.93 at p=0.05)
Outliers (|z| > 3): 0 of 60 cards
Verdict: PASS — no significant bias detected
Mulligan Overlap: PASS, but the expected results are surprising
Here's the summary from 50,000 mulligan trials:
| Cards repeated from original hand | How often it happened | How often math says it should happen |
|---|---|---|
| 0 (completely fresh hand) | 40.77% | 40.67% |
| 1 card repeated | 41.49% | 41.53% |
| 2 cards repeated | 15.24% | 15.14% |
| 3 cards repeated | 2.33% | 2.47% |
| 4+ cards repeated | 0.17% | 0.20% |
The observed results match the mathematically predicted distribution almost perfectly. Chi-squared: 5.92 against a failure threshold of 9.46.
Full output:
=== MULLIGAN OVERLAP TEST ===
Iterations: 50000 | Deck: 60 | Initial hand: 8 | Mulligan hand: 6
Expected overlap: 0.80 cards
Overlap Observed Expected Obs% Exp%
-------------------------------------------------------
0 20384 20332.6 40.77% 40.67%
1 20745 20765.2 41.49% 41.53%
2 7620 7570.6 15.24% 15.14%
3 1166 1236.0 2.33% 2.47%
4 81 92.7 0.16% 0.19%
5 4 2.9 0.01% 0.01%
6 0 0.0 0.00% 0.00%
Chi-squared: 5.92 (df=4, critical=9.46 at p=0.05)
Verdict: PASS — overlap matches hypergeometric expectation
Read that table again, though. Only 41% of mulligans give you a completely fresh hand. The majority of the time, about 59%, you will see at least one card from your original hand come back.
This is similar to the so-called Birthday Paradox: with 23 people in a room, the chance that at least two of them share a birthday is about 50%. That surprises most people, whose intuition says you would need closer to 100 people or more. Here, likewise, you should expect most mulligans to give you back at least 1 card you shuffled away.
And that's with a Highlander deck. In a real deck with 4 copies of key cards, you're not just getting the same card back — you're getting cards that look identical from a much larger pool. If you're running 4 copies of five different staples, that's 20 of your 60 cards that all feel like "the same stuff." No shuffler in the world is going to make those feel rare.
Technical Details
For those who want to peek under the hood:
The RNG. Gemp uses java.util.Collections.shuffle() backed by java.util.concurrent.ThreadLocalRandom. This is a solid PRNG — it's seeded per-thread from System.nanoTime() mixed with internal probe values. It's not java.util.Random with its weaker linear scramble, and it doesn't suffer from the classic "two Random instances created at the same millisecond get the same seed" problem.
The mulligan code path. When you mulligan, the server removes your hand, appends those cards back to the deck, and runs a full Collections.shuffle() on the entire deck before drawing your new hand. There is no partial shuffle, no "put them on top and cut" — it's a complete Fisher-Yates shuffle of all 60 cards.
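To make the mulligan path concrete, here is a rough Python sketch of the same steps. Gemp itself is Java and relies on Collections.shuffle(), so `fisher_yates_shuffle` and `mulligan` below are hypothetical names written only to illustrate the algorithm, not the server's code.

```python
import random


def fisher_yates_shuffle(cards, rng):
    """Shuffle the list in place; every ordering is equally likely."""
    for i in range(len(cards) - 1, 0, -1):
        j = rng.randrange(i + 1)  # pick any position from 0 through i
        cards[i], cards[j] = cards[j], cards[i]
    return cards


def mulligan(deck, hand, new_hand_size, rng):
    """Return the hand to the deck, reshuffle everything, draw a new hand."""
    full_deck = deck + hand                  # hand cards go back in
    fisher_yates_shuffle(full_deck, rng)     # full shuffle, not a cut
    return full_deck[:new_hand_size], full_deck[new_hand_size:]
```

The point to notice is that the old hand has no special position once the loop has run: each returned card is as likely to end up anywhere in the deck as any other card.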
Why the overlap math works out this way. The probability of getting exactly k cards back from your original hand follows the hypergeometric distribution. You're drawing 6 cards from a 60-card deck where 8 are "marked" (your original hand). The expected overlap is 6 x 8/60 = 0.8 cards. That's less than 1 — but the distribution is lumpy. Getting 0 or 1 back are both very common (~ 41% each), while getting 2 back happens about 1 in 7 mulligans. Rare enough to feel wrong, common enough to happen regularly.
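If you would rather check the math than take our word for it, the hypergeometric probabilities are easy to compute yourself. This is a small Python sketch (the function name `overlap_pmf` is just for illustration) using the same numbers as above: 60 cards, 8 marked, 6 drawn.

```python
from math import comb


def overlap_pmf(k, deck=60, marked=8, drawn=6):
    """Chance that exactly k of the drawn cards are marked (hypergeometric)."""
    return comb(marked, k) * comb(deck - marked, drawn - k) / comb(deck, drawn)


for k in range(7):
    print(k, f"{overlap_pmf(k):.2%}")

# Expected overlap, computed from the distribution itself
print("mean:", round(sum(k * overlap_pmf(k) for k in range(7)), 2))
```

The percentages it prints should line up with the Exp% column in the mulligan output, and the mean should come out at the same 0.8 cards.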
The birthday paradox factor. In a real (non-Highlander) deck, perceived overlap is dramatically amplified. Consider: if you run 4 copies each of 5 key cards, you have 20 cards that are "memorable." The chance of drawing at least one of those 20 in any 6-card draw is approximately 92%. So even with a perfect shuffle, you'll almost always see something familiar, because your deck is built to give you those cards.
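The "at least one memorable card" figure can be worked out the same way, by computing the chance of drawing none of them and subtracting from 1. A quick Python sketch, assuming 20 memorable cards in a 60-card deck and a 6-card draw (the function name is made up for this post):

```python
from math import comb


def p_at_least_one(deck=60, memorable=20, drawn=6):
    """Chance that a draw contains at least one of the memorable cards."""
    p_none = comb(deck - memorable, drawn) / comb(deck, drawn)
    return 1 - p_none


print(f"{p_at_least_one():.1%}")  # prints 92.3%
```

Change `memorable` to match your own deck to see how quickly the number climbs toward 100%.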
Bottom line. We tested it. The shuffler does what a shuffler should do. The human brain is just really bad at intuiting probability, and really good at remembering the time it got the same three cards back after a mulligan.