METHODOLOGY · ARTICLE

Six Bins

Why one event’s record is much closer to a coin flip than the precise-looking percentage suggests. Read individual tournament results as stories, not measurements.

The Editor · Protocol · Methodology

“Player X had a 60% win rate at the event.” It sounds like a measurement. It has a decimal-ready shape, the same shape as a faction’s 53.4% in a season table, and it invites the same kind of trust.

But at a five-round event there is no 55% personal win rate. There is no 67%. A player finishes a five-round event with exactly one of six records — 0–5, 1–4, 2–3, 3–2, 4–1, or 5–0 — and those map to exactly six win rates: 0%, 20%, 40%, 60%, 80%, 100%. Six bins. Every player at the event lands in one of them. The reported “win rate” isn’t a number on a continuous dial; it is which of six boxes a player fell into.
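The six bins can be listed mechanically. The sketch below is only an illustration: the function name `possible_win_rates` is made up for this example, and it assumes every game ends in a win or a loss, with no draws.

```python
from fractions import Fraction


def possible_win_rates(rounds: int) -> list[Fraction]:
    """Every win rate a player can post over `rounds` games.

    Assumption for this sketch: each game is a win or a loss (no draws).
    """
    return [Fraction(wins, rounds) for wins in range(rounds + 1)]


bins = possible_win_rates(5)
print([f"{float(rate):.0%}" for rate in bins])
# ['0%', '20%', '40%', '60%', '80%', '100%']
```

A value such as 55% simply is not in the list: no five-game record produces it.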

The number is a sample, not the thing itself

A player has some underlying skill, and against the field they face it produces some true win probability. That true rate is continuous — a player can genuinely be a 0.54 or a 0.58. What you observe at one event is not that rate. It is five games, each a win-or-loss, sampled from it. In statistical terms the event record is a binomial sample of size five, and the reported win rate is the sample proportion. Sample proportions at a sample size of five are coarse, and they are noisy.

How noisy is worth seeing concretely.

One player, one true rate, six possible weekends

Take a genuinely above-average player whose true win probability is 0.55 — better than the field, the kind of player who should be winning more than they lose. Run their five-round event a thousand times over and the records come out like this:

Distribution of 5-round records for a true rate of p = 0.55. The most likely single result, 3–2, accounts for only about a third of weekends.

Record	Win rate	Probability
5–0	100%	~5%
4–1	80%	~21%
3–2	60%	~34%
2–3	40%	~28%
1–4	20%	~11%
0–5	0%	~2%

Same player. Same skill. Same true rate, every time. And still: roughly one weekend in twenty they go 5–0, and roughly one weekend in nine they go 1–4. Both happen. Neither is the player getting better or worse between events — it is the same 0.55 rate, landing in different bins. The single most likely result, 3–2, accounts for only about a third of weekends; most of the time this player does not post their most-likely record.
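The table above is the binomial distribution for five games at p = 0.55, and it can be reproduced in a few lines. This is a minimal sketch: the function name `record_probabilities` is hypothetical, and it assumes five independent win-or-loss games at a fixed rate.

```python
from math import comb


def record_probabilities(p: float, rounds: int = 5) -> dict[str, float]:
    """Probability of each final record, assuming independent games
    that each end in a win with probability `p` (no draws)."""
    probs = {}
    for wins in range(rounds, -1, -1):
        losses = rounds - wins
        # Binomial probability of exactly `wins` wins in `rounds` games.
        probs[f"{wins}-{losses}"] = comb(rounds, wins) * p**wins * (1 - p) ** losses
    return probs


for record, prob in record_probabilities(0.55).items():
    print(f"{record}: {prob:.1%}")
```

Changing `p` shows how little the picture moves: a 0.50 player and a 0.60 player produce heavily overlapping spreads across the same six records.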

Put a number on the spread. For n games at true rate p, the standard deviation of the observed win rate is √(p(1 − p)/n); at a sample size of five and a rate near 0.55 that comes to about 22 percentage points. That is the error bar around a single event’s individual result. Reading “they went 4–1” as evidence a player sits at a true 80% rate is reading roughly twenty points of noise as if it were signal.
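The 22-point figure is the usual standard deviation of a sample proportion. A minimal sketch of the arithmetic follows; the function name `win_rate_sd` is made up for this example, and independent games are assumed.

```python
from math import sqrt


def win_rate_sd(p: float, rounds: int) -> float:
    """Standard deviation of the observed win rate over `rounds`
    independent games at true win probability `p`."""
    return sqrt(p * (1 - p) / rounds)


sd = win_rate_sd(0.55, 5)
print(f"standard deviation: {sd:.1%}")

# How far a 4-1 result (80%) sits from the true 55%, in standard deviations.
z = (0.80 - 0.55) / sd
print(f"4-1 is {z:.2f} standard deviations above the true rate")
```

Seen this way, a 4–1 weekend from a 0.55 player is a little over one standard deviation above their true rate, which is an ordinary outcome.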

Why the season tables still look smooth

If individual results are this coarse, why does a faction’s win-rate history across a season look like a smooth curve rather than a staircase of six values?

Because smoothness is a property of the aggregation, not of the outcomes underneath it. Compile thousands of individual records — across many players, many factions, many events — and the six-bin discreteness of any one player washes out. The histogram fills in. But every observation feeding that smooth curve was still one player in one of six boxes. The curve’s smoothness is real; the precision it seems to promise about any single contributing result is not. “53.4%” is a perfectly good summary of a large pile of games. It does not mean any individual result in the pile was measured to a tenth of a percent.
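A small simulation makes the contrast visible. It is a sketch under simplifying assumptions, not a model of real season data: every simulated player shares one hypothetical true rate of 0.55, games are independent, and the names `simulate_event_wins`, `individual_rates`, and `pooled_rate` are invented for the example.

```python
import random


def simulate_event_wins(n_players: int, p: float, rounds: int = 5, seed: int = 0) -> list[int]:
    """Win counts for `n_players`, each playing `rounds` independent games
    at the same true win probability `p` (a deliberate simplification)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(rounds)) for _ in range(n_players)]


wins = simulate_event_wins(n_players=2000, p=0.55)

# Each individual result can only be one of the six bins.
individual_rates = sorted({w / 5 for w in wins})
print("distinct individual win rates:", individual_rates)

# The pooled figure, by contrast, can take almost any value.
pooled_rate = sum(wins) / (len(wins) * 5)
print(f"pooled win rate: {pooled_rate:.1%}")
```

The pooled figure comes out with a meaningful decimal place because it summarizes ten thousand games; each of the two thousand records feeding it is still one of six values.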

The same insight, two zoom levels

This is the companion to The Big Soup Problem, seen from closer in. Big Soup looks between events and says: events are distinct clusters, and pooling them oversmooths real variation. Six Bins looks inside a single event and says: even here, individual outcomes are discrete and noisy, and stacking them up manufactures an apparent precision that no single result ever had. One insight, two scales: the precision implied by the decimal point lives in neither the single event nor the honest combination of events.

Closing — stories, not measurements

The Archive dwells on this because it governs how the lower-level numbers should be read. “Player Y went 4–1 at the GT” is a story — a true and worth-telling story, but a story, with all the contingency that word carries. “Faction Z went 60% at this regional” is the same story scaled up: a small set of discrete, correlated outcomes, smoothed by aggregation and then printed with a decimal point that implies a precision the underlying data was never able to carry.

None of this means the casual “60% win rate” is wrong to say. It is fine shorthand. The point is only that the number means considerably less than its form suggests — and that a result from a single event is best read as a story about what happened, not a measurement of how good someone is.

Footnote — Swiss makes it worse

The six-bin picture above quietly assumes the five games are independent draws. In Swiss pairings they are not. A player’s record after round three determines who they are paired against in round four: the 5–0 player spent the weekend at the top of the bracket, the 1–4 player spent it at the bottom. The games are correlated by design. That correlation doesn’t shrink the noise; it means the standard significance tests, which assume independent observations, overstate their confidence on top of everything described here. Swiss Isn’t Random takes up that thread on its own.
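The dependence can be seen in a toy simulation. Everything below is an assumption made for illustration: a 32-player field, skills drawn from a normal distribution, a logistic win model, pairing by current wins with random tiebreaks, and repeat pairings allowed. It is not the pairing software any real event uses, and the function names are invented.

```python
import math
import random
from collections import defaultdict


def simulate_swiss(rng: random.Random, n_players: int = 32, rounds: int = 5):
    """One toy Swiss event. Returns final wins and the skills each player faced."""
    skill = [rng.gauss(0.0, 1.0) for _ in range(n_players)]  # hypothetical skill scale
    wins = [0] * n_players
    faced = [[] for _ in range(n_players)]
    for _ in range(rounds):
        order = list(range(n_players))
        rng.shuffle(order)                   # random tiebreak within a record
        order.sort(key=lambda i: -wins[i])   # stable sort: best records first
        for a, b in zip(order[0::2], order[1::2]):
            # Assumed logistic win model on the skill difference.
            p_a = 1.0 / (1.0 + math.exp(skill[b] - skill[a]))
            winner = a if rng.random() < p_a else b
            wins[winner] += 1
            faced[a].append(skill[b])
            faced[b].append(skill[a])
    return wins, faced


def mean_opponent_skill_by_wins(n_events: int = 200, seed: int = 0) -> dict[int, float]:
    """Average opponent skill faced, grouped by a player's final win count."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for _ in range(n_events):
        wins, faced = simulate_swiss(rng)
        for w, opponents in zip(wins, faced):
            buckets[w].append(sum(opponents) / len(opponents))
    return {w: sum(v) / len(v) for w, v in sorted(buckets.items())}


for w, avg in mean_opponent_skill_by_wins().items():
    print(f"{w} wins: mean opponent skill {avg:+.2f}")
```

In this sketch the players who finish with more wins have faced stronger fields on average, so the five games in a record are not five draws against the same opposition.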