Wordle: A Spanish data scientist’s strategy for solving 99% of the puzzles

There are two types of social media users: those who are addicted to Wordle and those who are intrigued by the little green and yellow squares their friends keep sharing on Twitter. The simple word game asks players to guess a hidden five-letter word in six tries. After each guess, a correct letter in the correct position turns green, a correct letter in the wrong position turns yellow, and a letter that is not in the word turns grey. It may sound basic, but the formula has gone viral in recent months: the game has grown from 90 players in November to 300,000 today.
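The coloring rule is easier to reason about as a function. Below is a minimal Python sketch, not the game's own code. The name `score` and the `G`/`Y`/`-` encoding are illustrative choices, and the handling of repeated letters (a letter turns yellow only as many times as it still appears unmatched in the answer) is our assumption about the rule, which the description above does not spell out.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Return Wordle-style feedback: 'G' = green, 'Y' = yellow, '-' = grey."""
    result = ["-"] * 5
    # Answer letters not matched exactly; only these can still earn a yellow.
    unmatched = Counter()
    # First pass: right letter in the right place is green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: right letter in the wrong place is yellow, but only as
    # many times as that letter is still unaccounted for in the answer.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

print(score("aeros", "orate"))  # YYYY-
print(score("lolly", "belly"))  # --GGG
```

In the second example, both l's in "belly" are already claimed by greens, so the first l of the guess stays grey.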

The inventor of the original English version, Welsh-born software engineer Josh Wardle, created it during the pandemic to entertain his partner, a word-game addict, as he told The New York Times. He then had his family play it in a group chat, and it later became popular through Twitter and Facebook.

The simplicity of the game is, in fact, the key to its success. The player lands on a no-frills page with no cost, no registration and no ads, and tries to guess a five-letter word in a format similar to the classic code-breaking game Mastermind. Wardle sets only one challenge a day, an idea he borrowed from The New York Times’ Spelling Bee, so players have to wait 24 hours to play again, and the game is perhaps more engaging because of it. After a few rounds, a question arises: is there an optimal method for solving the puzzle in fewer tries?

Esteban Moro, professor, researcher and data scientist at Carlos III University of Madrid and visiting professor at the Massachusetts Institute of Technology (MIT), set out to answer this question scientifically. In a blog post, he outlined a strategy that solves 99% of the 206 challenges Wordle has posed so far in fewer than six steps, although the method cannot be applied to other versions of the game, such as those circulating in Spanish and even in Galician.

His strategy rests on two elements: starting the game with a word identified as the best opening, and making each subsequent attempt by following a simple rule. But how do you find that rule?

Moro used R, a free programming language for statistical analysis, to replicate Wordle on his computer. He built a game with the same rules that includes all 12,972 five-letter words Wordle accepts as valid guesses. The program then simulated game after game, always opening with the word “aeros”, which contains the five most commonly used letters in English. For each of the next five attempts, it chose a word at random from all those still consistent with the clues. With these instructions, the program found the solution in fewer than six steps 80% of the time when the hidden word was chosen at random, with an average of 5.1 attempts. When the hidden word was one of the more than 200 puzzles Wardle had already published, it solved almost 90% of them, with an average of 4.7 tries.
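The random-choice strategy can be sketched in a few lines. This is a minimal Python illustration, not Moro's R code: the tiny `WORDS` list is a hypothetical stand-in for the 12,972-word dictionary, and `score`, `play` and `evaluate` are illustrative names. The `score` function implements the coloring rule under our assumption that a repeated letter turns yellow only while unmatched copies remain. The key step is the filter: after each guess, only words that would have produced exactly the same colors remain possible answers.

```python
import random
from collections import Counter

# Hypothetical toy dictionary standing in for Wordle's 12,972 valid words.
WORDS = ["aeros", "orate", "crane", "slate", "belly", "jelly", "ghost", "pilot"]

def score(guess, answer):
    """Wordle feedback: 'G' green, 'Y' yellow, '-' grey."""
    result = ["-"] * 5
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def play(answer, words, rng, first="aeros", max_tries=6):
    """Play one game; return the number of tries used, or None if unsolved.
    Assumes the answer is in `words`, so at least one candidate always remains."""
    candidates = list(words)
    guess = first
    for attempt in range(1, max_tries + 1):
        feedback = score(guess, answer)
        if feedback == "GGGGG":
            return attempt
        # Keep only words that would have produced exactly this feedback.
        candidates = [w for w in candidates if score(guess, w) == feedback]
        # Random strategy: every remaining candidate is equally likely.
        guess = rng.choice(candidates)
    return None

def evaluate(answers, words, seed=0):
    """Success rate, and mean tries over solved games, for a list of answers."""
    rng = random.Random(seed)
    results = [play(a, words, rng) for a in answers]
    solved = [r for r in results if r is not None]
    rate = len(solved) / len(results)
    mean_tries = sum(solved) / len(solved) if solved else float("nan")
    return rate, mean_tries

if __name__ == "__main__":
    print(evaluate(WORDS, WORDS))
```

Run over the full dictionary with many answers, a loop like `evaluate` is the kind of experiment behind figures such as the 80% success rate reported above; the toy list here will not reproduce them.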

But there was a way to improve those statistics. Other researchers had found that the game’s solutions are not chosen at random from the more than 12,000 possibilities: some words are more likely to appear than others. By cross-referencing the answers to previous puzzles with a list of the most commonly used English words, Moro confirmed that Wardle chooses frequently used words, something the game’s inventor also pointed out in his interview with The New York Times, where he said he avoided rare words. “It makes perfect sense,” Moro says from his home in Boston. “For the game to be a success, it must be simple and playable, and choosing the most common terms means that, in the end, we all succeed within a few tries.”
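Cross-referencing amounts to comparing two proportions: the share of past answers found in a list of common words, and the same share across the whole dictionary. A minimal Python sketch follows; all three lists are hypothetical placeholders, not Wordle's real lists or the frequency source Moro used.

```python
def common_share(words, common):
    """Fraction of `words` that appear in the set of common words."""
    return sum(w in common for w in words) / len(words)

# Hypothetical toy lists; real ones would be Wordle's past answers, its full
# dictionary and a frequency-ranked list of English words.
PAST_ANSWERS = ["crane", "ghost", "lunch", "pious"]
DICTIONARY = ["aeros", "crane", "ghost", "lunch", "pious", "orate", "felly", "welly"]
COMMON = {"crane", "ghost", "lunch", "orate", "belly"}

answers_share = common_share(PAST_ANSWERS, COMMON)   # 3 of 4 answers are common
baseline_share = common_share(DICTIONARY, COMMON)    # 4 of 8 dictionary words are
print(answers_share, baseline_share)                 # 0.75 0.5
```

If answers are common words far more often than dictionary words in general, they are not drawn uniformly at random; with real data one would also check that the gap is too large to be chance.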


Moro then changed the algorithm. He programmed the simulations so that, still opening with the word “aeros”, each subsequent guess is the most commonly used English word among all the remaining possibilities, using a tool that ranks words by frequency of use. The results barely improve for randomly chosen words, but the strategy proves much more effective on the words Wardle had already used in his challenges: the program solves 97% of the puzzles in 3.9 attempts on average.
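The change affects only how the next guess is picked from the remaining candidates. A minimal Python sketch, assuming a hypothetical frequency table `FREQ` with invented counts in place of the ranking tool Moro used; `score` implements the coloring rule (with our assumption that a repeated letter turns yellow only while unmatched copies remain), included so the block runs on its own.

```python
from collections import Counter

def score(guess, answer):
    """Wordle feedback: 'G' green, 'Y' yellow, '-' grey."""
    result = ["-"] * 5
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def next_guess(candidates, frequency):
    """Frequency strategy: pick the most used word still possible.
    Words missing from the table count as frequency 0."""
    return max(candidates, key=lambda w: frequency.get(w, 0))

def play_frequent(answer, words, frequency, first="aeros", max_tries=6):
    """Return tries used, or None if unsolved; assumes `answer` is in `words`."""
    candidates = list(words)
    guess = first
    for attempt in range(1, max_tries + 1):
        feedback = score(guess, answer)
        if feedback == "GGGGG":
            return attempt
        candidates = [w for w in candidates if score(guess, w) == feedback]
        guess = next_guess(candidates, frequency)
    return None

# Hypothetical toy dictionary and invented usage counts.
WORDS = ["aeros", "orate", "crane", "slate", "belly", "jelly", "ghost"]
FREQ = {"crane": 50, "slate": 40, "ghost": 30, "belly": 20, "jelly": 10, "orate": 5}

for answer in ["crane", "belly", "jelly"]:
    print(answer, play_frequent(answer, WORDS, FREQ))
```

With these toy counts, "belly" is found on the second try because it outranks "jelly", while "jelly" needs a third: the rule pays off precisely when the answer is itself a common word.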

“I’m a data scientist and as such I’m always looking for those biases and patterns that help us create algorithms. So what I did was see that there was a bias in the words that Wardle chose, and exploit that to improve the strategy,” Moro explains.

Could the method be improved further? Perhaps by changing the starting word? The letters of “aeros” are the five most frequently used in English (as Edgar Allan Poe pointed out in the cryptographic puzzle in his famous short story The Gold-Bug), but Moro noticed that in the more than 200 solutions published so far in the original version of the game, the “t” appeared more often than the “s”. He therefore replaced the opening word “aeros” with “orate” and, keeping the rule of always choosing the most frequently used word thereafter, the algorithm solved 99% of the puzzles Wardle had posed. Moro points out, however, that this two-point improvement could be a statistical fluke and that more data would be needed to assess whether it is significant.
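The observation about “t” and “s” comes down to counting letters in past solutions. A minimal Python sketch; the five-word `SOLUTIONS` list is hypothetical, and ranking openers by summing the counts of their distinct letters is a simplifying heuristic of ours, whereas Moro compared openers by rerunning his full simulation.

```python
from collections import Counter

def letter_counts(words):
    """For each letter, count how many words contain it (once per word)."""
    counts = Counter()
    for w in words:
        counts.update(set(w))
    return counts

def opener_score(opener, counts):
    """Heuristic: sum the counts of the opener's distinct letters."""
    return sum(counts[c] for c in set(opener))

# Hypothetical toy sample standing in for the 200-plus published solutions.
SOLUTIONS = ["crate", "toast", "water", "stone", "pilot"]

counts = letter_counts(SOLUTIONS)
print(counts["t"], counts["s"])       # 5 2
print(opener_score("aeros", counts))  # 13
print(opener_score("orate", counts))  # 16
```

Swapping “s” for “t” raises the score here only because the toy sample is rich in t's; whether the swap helps on real data is exactly what the simulation, and more puzzles, would have to show.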

Wardle deliberately chooses common words to make the game friendlier. But a super-difficult Wordle could be programmed using rarer words, or words that share most of their letters with many others. “In English, it would be quite difficult to guess the word ‘belly’, for example, because many words end in the same three letters,” Moro explains. If rare words were deliberately included, Moro’s method would not work: one would have to detect the new biases and adjust the algorithm so that it chooses, for example, the least used words or those that most resemble others.
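One rough way to gauge how treacherous a word is: count how many dictionary words fit what a player might know about it. A minimal Python sketch using a regular expression in which `.` stands for an unknown letter; the word list is a hypothetical toy, and this count is our illustration rather than a difficulty measure Moro describes.

```python
import re

def pattern_matches(pattern, words):
    """Words that fully match a regex pattern; '.' marks an unknown letter."""
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]

# Hypothetical toy list; a real check would use Wordle's full dictionary.
WORDS = ["belly", "jelly", "bully", "dolly", "silly", "orate", "crane"]

matches = pattern_matches("..lly", WORDS)
print(matches)  # ['belly', 'jelly', 'bully', 'dolly', 'silly']
```

The more words share a word's ending, the more guesses a player may burn telling them apart, which is the effect Moro describes for “belly”.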

Beyond finding the best possible strategy, Wordle raises another recurring question: why has it become so popular all over the world? Moro isn’t alone in believing that part of the answer is the serenity it brings to our high-octane lifestyles. “Because Wardle only publishes one puzzle a day, this slowness gives us synchronized, unhurried social interaction. And that’s one of the game’s successes,” he explains.

Sean N. Ayres