Inside the Magic of Matchmaking

Keeping you out of Elo hell is more complicated than you might think.

When you’re trying to strap together a top-flight multiplayer game, every role in a studio has to carry its weight. But while we might consider some more glamorous than others (the designer honing the perfect combat mechanic, or the artist casting the color and form of the battle map), there’s one job on the other side of the glittering gulf that’s a lot more important than most would give it credit for: the engineer who balances the matchmaking system that holds the whole thing together.

In the cutthroat world of competitive multiplayer mega-games, proper matchmaking is vital to ensuring that an otherwise great game can endure long term when pitted against the Dotas and Counter-Strikes of the world. But while developers pour hours into optimizing the experience for players of all skill levels, it remains one of the most controversial and widely discussed aspects of modern gaming. Like the janitor at your workplace, you’re unlikely to appreciate or even think about the matchmaking algorithm when it works properly, but you can definitely tell when it hasn’t been doing its job.

“You might describe it as a thankless job,” said Joost van Dongen, lead programmer and co-founder of Ronimo Games, which is best known for the smash success indie MOBA Awesomenauts.

Van Dongen said that, while the team at Ronimo hoped for the best when it released its game in mid-2012, it never really expected to seriously compete with the likes of League of Legends or Dota 2. After all, it was a tiny indie dev trying to make a mark in a genre dominated by two titans of the industry.

Credit: Ronimo Games

Much to Ronimo’s surprise, however, the game not only sold shockingly well, but a die-hard core of players kept playing it and playing it, for thousands of hours—a use case that van Dongen and his fellow developers never even anticipated.

“We thought that people would use it as a break from League, play for a dozen hours,” he said. “But no. They just kept getting better and better, to the point that they’ve put in thousands of hours. Eventually, we realized that we needed to rethink our matchmaking system to make a better play experience possible for these hardcore players.”

Van Dongen knows this problem well; it’s become a meme in its own right. If you’ve sunk hundreds of hours into one of the handful of competitive games that have a permanent throne in Steam’s “most-played” charts, you’ve probably heard some spirited forum-goer complaining about “Elo hell.” While the term lacks a precise definition, it usually refers to a period of time where a player is struggling to find entertaining matches, especially in a team-based game, where coordination often makes the difference between a jubilant victory and a crushing defeat.

The “Elo” in “Elo hell” refers to Arpad Elo, a physics professor who invented the first robust competitive ranking system for chess, which was later applied to many other games. The logic behind the system was fairly simple: Resting on an assumption that chess performance follows a traditional bell curve (or normal distribution), the system gives points to players who win a match and subtracts points from players who lose. If a player with a higher score beats a player with a lower score, they receive fewer points than they would for beating a player of equal or higher score, and vice versa.
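For the curious, the point exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular game’s formula: it uses the logistic curve common in modern implementations rather than the normal curve Elo originally assumed, and the 400-point scale and K-factor of 32 are conventional chess values chosen here as assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Estimated chance that A beats B, based only on the rating gap."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one decisive game.

    k is an assumed K-factor; it caps how many points one game can move.
    """
    # The less expected the win was, the more points change hands.
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# A favorite beating an underdog earns little; an upset earns a lot.
favorite_wins = update_elo(winner=1700, loser=1500)
underdog_wins = update_elo(winner=1500, loser=1700)
print(favorite_wins, underdog_wins)
```

Because the winner’s gain is exactly the loser’s loss, the total number of points in the pool stays constant from match to match.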

While the particular math behind each individual system might vary, the bedrock foundations of Elo’s theory have rarely been challenged. Rather, as van Dongen put it, it’s the act of applying these principles to your individual game that thwarts so many developers. Elo’s original system was designed specifically for chess, a symmetric two-player game. Trying to translate that to a team-based game like Awesomenauts was a process of trial-and-error, which van Dongen said is not uncommon for popular multiplayer games. (For example, Rocket League continued to futz with its formula well into its heyday, introducing entirely new progress systems in its second and fifth competitive seasons.)

Rocket League
Credit: Psyonix

According to van Dongen, in the early days of Awesomenauts, if a player had a much lower skill rating than their teammates but still managed to win, the system would attempt to correct the apparent incongruity by showering them with additional points. Eventually, sneaky enthusiasts figured this out, and began to willingly throw their early games in order to goose their rating and make later progress faster.

Once Ronimo discovered this behavior, the developer adjusted the system to specifically give fewer points to players of lower skill levels who were buoyed by their more elite comrades. “This is the exact kind of problem that you often run into with matchmaking systems,” van Dongen said. “If your game is successful, there will always be a group of players trying to game it for their own amusement, so you have to design accordingly.”

According to Dr. Mark Glickman, Senior Lecturer on Statistics at Harvard University, these issues often spring from the same key omission. Most traditional Elo systems don’t have a way to determine how accurate their measure of a player’s skill level is—as far as the system is concerned, the rating of a player with 1,000 games notched is just as reliable as one with 10 games under their belt.

Counter-Strike: Global Offensive
Credit: Valve Corporation

That’s why Dr. Glickman devised his own rating scale, which he dubbed “Glicko.” The scale is now used in several high-profile games, including Counter-Strike: Global Offensive. While the exact differences between Glicko and Elo are buried in fairly technical math, it basically boils down to one singular addition: the concept of “ratings reliability,” essentially a way for the system to know whether or not to trust itself.

When a player registers many games played in the same rating period, they’re considered to have high reliability. The more reliable your opponent’s rating is, the more points you get when you defeat them, all else being equal. Of course, it cuts the other way, too: If you have a high reliability, your rating is less likely to swing dramatically, and you thus get fewer points for wins. Glicko also takes into account periods of inactivity: if you’ve racked up a lot of matches overall but haven’t played in a year or two, the rating system will do you no favors.
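Here is a rough Python sketch of how that reliability term behaves, following the published Glicko-1 formulas, where reliability is tracked as a “ratings deviation” (RD) and a lower RD means a more trusted rating. Two simplifications are assumptions of this sketch: each rating period contains a single game, and the inactivity constant c is an arbitrary illustrative value. It should not be read as the version running in any particular game.

```python
import math

Q = math.log(10) / 400.0


def g(rd: float) -> float:
    """Discount factor: the less reliable a rating (high RD), the less it counts."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def inflate_rd(rd: float, periods_inactive: int, c: float = 35.0,
               max_rd: float = 350.0) -> float:
    """Grow uncertainty while a player is away (c is an assumed constant)."""
    return min(math.sqrt(rd**2 + c**2 * periods_inactive), max_rd)


def glicko_update(r: float, rd: float, opp_r: float, opp_rd: float,
                  score: float) -> tuple[float, float]:
    """One-game Glicko-1 update; score is 1 for a win, 0 for a loss, 0.5 for a draw."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (r - opp_r) / 400.0))
    d2_inv = Q**2 * g_opp**2 * expected * (1.0 - expected)
    denom = 1.0 / rd**2 + d2_inv
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1.0 / denom)  # every game played shrinks the uncertainty
    return new_r, new_rd


# Same win, same opponent rating: only the reliability of each side changes.
print(glicko_update(1500, 50, 1500, 30, 1.0))    # trusted opponent
print(glicko_update(1500, 50, 1500, 350, 1.0))   # unproven opponent
print(glicko_update(1500, 300, 1500, 30, 1.0))   # the winner is unproven
```

The first call should yield a larger gain than the second, and the third a far larger swing than either, which mirrors the behavior described above.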

“If you were watching two people play chess who you had never seen play chess, and someone asked you, ‘Who’s more likely to win?’ you’d probably say, ‘Gee, I don’t know,’” Dr. Glickman said. “It’s basically a coin flip, right? Well, that’s exactly what the system takes into account.”

As van Dongen took great pains to point out, your competitive rating is far from the only factor that comes into play when you try to get into a match. In an online game, ping is king, and trying to keep every region of the globe populated sometimes seems impossible. (“Australia has it the worst,” he said. “They’re just so far from everybody. You can’t do a lot about that.”) In terms of other indie multiplayer games, he said that many studios like to make their matchmaking systems as complicated as possible, with many variables taken into account across different game modes. To van Dongen, this is the number one mistake an indie studio can make, because it can splinter your tiny playerbase and inflate matchmaking times.

“If you want to make a complex system, you better have a massive amount of players, or what players you do have are going to have a bad experience,” he said. “That’s just how it works.”

For his part, Dr. Glickman said there’s still a lot of room for these ranking systems to improve in their own right, especially when it comes to games that aren’t quite as simple and elegant as chess. An Elo system, for instance, typically assumes that ratings are transitive: If player A usually beats player B, and player B usually beats player C, then player A will almost always triumph over player C. Dr. Glickman has always found this particular construction galling, since you don’t have to look very far in the world of sports to find underdogs managing to overcome the odds. (An example from the world of boxing: George Foreman blew out Joe Frazier in two rounds; Muhammad Ali beat George Foreman; yet Joe Frazier managed to decisively defeat Ali in their first meeting.) Dr. Glickman is currently working with one of his graduate students on a system that might solve this problem, but he emphasized that he won’t be able to know how effective it is until it’s out in the world.
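To see why a single-number rating cannot capture a rock-paper-scissors trio like the boxing example, consider this small Python check. It is only an illustration, using the standard logistic Elo curve with an assumed 400-point scale: it searches a grid of ratings for three players who are each favored over the next in a loop, and finds none.

```python
from itertools import product


def expected_score(rating_a: float, rating_b: float) -> float:
    """Estimated chance that A beats B under a logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def is_cycle(a: float, b: float, c: float) -> bool:
    """True if A is favored over B, B over C, and C over A."""
    return (expected_score(a, b) > 0.5
            and expected_score(b, c) > 0.5
            and expected_score(c, a) > 0.5)


ratings = range(1000, 2001, 100)
cycles = [trio for trio in product(ratings, repeat=3) if is_cycle(*trio)]
print(len(cycles))
```

Being favored simply means having the higher number, and three numbers cannot each be larger than the next in a circle, so the list of cycles comes back empty.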

Dota 2 splash art, and possible artist’s rendering of Elo hell.
Credit: Valve Corporation

After working on Awesomenauts off-and-on since its 2012 release, van Dongen said he’s learned a few lessons on how Ronimo will approach its next multiplayer game. More than anything, he feels that splitting player progression into a glowing “level” bar that creeps ever-higher even when you lose and an invisible Elo score that fluctuates with every win or loss is key to a happy playerbase. He views displaying the point loss to a losing player as a sort of “double punishment”—they already feel bad because they lost, so why twist the knife by quantifying it? While the first bar would reward players more when they accomplish feats of skill, or simply win matches, it mostly tracks with playtime, which gives players a sense of constant advancement even when they’re getting utterly shellacked out there.
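A toy Python sketch of that two-track idea follows. Every name and number in it is hypothetical, invented for illustration rather than taken from Ronimo: the visible level only ever goes up and mostly follows playtime, while the hidden rating moves in both directions and is never shown to the player.

```python
from dataclasses import dataclass

# All constants below are assumed values for illustration only.
XP_PER_MINUTE = 10
WIN_BONUS_XP = 50
XP_PER_LEVEL = 1000


@dataclass
class PlayerProgress:
    xp: int = 0                     # visible progress; never decreases
    hidden_rating: float = 1500.0   # used for matchmaking; never displayed

    @property
    def level(self) -> int:
        """The number shown on the glowing bar."""
        return 1 + self.xp // XP_PER_LEVEL

    def record_match(self, minutes_played: int, won: bool,
                     rating_delta: float) -> None:
        """Award playtime XP (plus a small win bonus) and adjust the hidden rating."""
        self.xp += minutes_played * XP_PER_MINUTE + (WIN_BONUS_XP if won else 0)
        self.hidden_rating += rating_delta


player = PlayerProgress()
player.record_match(minutes_played=20, won=False, rating_delta=-16.0)
print(player.level, player.xp, player.hidden_rating)
```

Even after a loss, the visible XP total rises while the rating drop stays out of sight.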

This is part of van Dongen’s overall philosophy of matchmaking, which he likened to a form of stage magic. When I told him that multiple notable game developers like Ubisoft and Psyonix refused to talk to me for this piece, he simply laughed.

“Yes, of course. A magician doesn’t reveal his secrets,” he said. “Matchmaking takes science into account, but it’s more of an illusion than anything. You want to convince your players that they’re having a good experience, and a big part of that is them trusting the system…. Most players have to feel a sense that they’re getting better at the game to keep playing, even when they’ve stopped improving as players. Maybe they plateaued in skill, maybe they just stopped taking it as seriously, maybe they’ve reached the edge of their talent. Whatever the reason, you have to keep that feeling of progression, or they will drop the game. It’s a balancing act, for sure. The more your players know, the more they can exploit the system.”

Header image: Blizzard Entertainment
