FIFA World Cup: Machine learning predicts two possible winners

By Sourabh Kulesh | Published on 13 Jun 2018
  • A research team from Germany used bookmakers’ odds and a random-forest approach to narrow the field to two teams that could win this year’s FIFA World Cup tournament.

Lionel Messi’s Argentina lost four straight major FIFA cup finals, including one in the 2014 World Cup in Brazil, leaving fans brooding and hoping that this year’s tournament will finally see the little talisman lift an international trophy. But a team of researchers from Germany has concluded that one of two European teams -- Germany or Spain -- will be crowned the winner of the FIFA World Cup 2018, which kicks off on June 14 in Russia.

Based on their Machine Learning (ML) project, and after simulating the entire tournament 100,000 times, the team of researchers led by Andreas Groll from the Technical University of Dortmund has said that if either Germany or Spain reaches the final, that team will win the tournament. If the two teams meet in the final, Germany has the better odds of winning the showdown. The researchers did not say who would be crowned champion if neither team reaches the final.
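
To give a sense of what simulating a tournament many times means, here is a minimal Python sketch of a Monte Carlo simulation of a four-team knockout bracket. It is not the researchers’ model: the strength ratings, the rule that a team’s win chance is proportional to its rating, and the function names are all assumptions made up for illustration.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented for illustration only.
STRENGTH = {"Brazil": 2.0, "Germany": 1.9, "Spain": 1.8, "France": 1.6}

def play(a, b, rng):
    """Return the winner of one match; win chance is proportional to strength."""
    p_a = STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])
    return a if rng.random() < p_a else b

def simulate_knockout(teams, rng):
    """Play single-elimination rounds until one team remains.

    Neighbouring entries in the list meet each other, so the list length
    must be a power of two.
    """
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], rng) for i in range(0, len(teams), 2)]
    return teams[0]

def title_probabilities(teams, runs=100_000, seed=0):
    """Estimate each team's title chance as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(list(teams), rng) for _ in range(runs))
    return {team: wins[team] / runs for team in teams}

probs = title_probabilities(["Brazil", "France", "Germany", "Spain"])
for team, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{team}: {p:.1%}")
```

Because the bracket decides who meets whom, a strong team with a hard path can end up with a lower title chance than a slightly weaker team with an easier one.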

For their results, the team started from the odds offered by a number of bookmakers, who employ professional statisticians to analyse extensive databases of results and quantify the probability of different outcomes of any possible match and of the tournament. Based on these odds, the researchers found that Brazil is the clear favourite to win the 2018 World Cup, with a probability of 16.6 percent, followed by Germany (15.8 percent) and Spain (12.5 percent).
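
The article does not describe how the odds were turned into probabilities. One common convention, sketched below in Python, is to invert decimal odds and rescale the results so that they sum to 1, which strips out the bookmaker’s margin. The odds, team names and function name here are hypothetical, and the researchers’ actual procedure may differ.

```python
def implied_probabilities(decimal_odds):
    """Invert decimal odds and rescale so the probabilities sum to 1."""
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())  # exceeds 1 when the bookmaker builds in a margin
    return {team: value / total for team, value in raw.items()}

# Hypothetical decimal odds, not taken from any real bookmaker.
odds = {"Team A": 2.0, "Team B": 4.0, "Team C": 4.0, "Team D": 8.0}
probs = implied_probabilities(odds)
for team, p in probs.items():
    print(f"{team}: {p:.1%}")
```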

The team then used this outcome together with a combination of ML and conventional statistics, a method called the random-forest approach, to identify the likely winner. The technique has emerged in recent years as a powerful way to analyse large amounts of data while avoiding some of the pitfalls of other methods.

The approach builds on the decision-tree method, in which an outcome is calculated at each branch by reference to a set of training data. Because a single decision tree is prone to overfitting, a problem in which the results become distorted at the later stages of the branching process, the researchers turned to the random-forest technique. Instead of calculating the outcome at every branch, it calculates the outcome of randomly selected branches, repeats this many times with a different random selection each time, and averages the results of all those trees.
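
The averaging idea can be sketched in a few lines of Python. The toy ensemble below trains many one-split trees, each on a random resample of the data and a randomly chosen feature, and averages their votes. It is a simplified stand-in, not the researchers’ model: real random forests grow deeper trees and pick random feature subsets at every split, and the data points and function names here are invented for illustration.

```python
import random

def majority(labels, default):
    """Return the majority 0/1 label (ties go to 0), or default if empty."""
    return int(2 * sum(labels) > len(labels)) if labels else default

def fit_stump(rows, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = majority(left, 0)
        right_label = majority(right, left_label)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def fit_forest(rows, n_trees=200, seed=0):
    """Train each one-split tree on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        feature = rng.randrange(n_features)        # random feature per tree
        forest.append(fit_stump(sample, feature))
    return forest

def predict_proba(forest, x):
    """Average the 0/1 votes of all trees."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)

# Invented data: (rating gap, home flag) -> 1 if the team won, else 0.
rows = [((-3, 0), 0), ((-2, 1), 0), ((-1, 0), 0), ((-1, 1), 1),
        ((0, 0), 0), ((1, 0), 1), ((2, 1), 1), ((3, 0), 1)]
forest = fit_forest(rows)
p_strong = predict_proba(forest, (3, 1))
p_weak = predict_proba(forest, (-3, 0))
print(p_strong, p_weak)
```

Any single tree in this ensemble can be thrown off by the particular rows it happened to draw; averaging over many of them smooths out those quirks, which is how the technique counters overfitting.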

The technique revealed which factors matter most in determining the outcome and narrowed the field to two teams favoured to win the World Cup. The data took into consideration economic factors such as a country’s GDP and population, FIFA’s ranking of national teams, and properties of the teams themselves, such as their average age, the number of Champions League players they have and home advantage, among others.

According to the model, Spain is the most likely winner at the start of the tournament, with a probability of 17.8 percent. Given the structure of the tournament and the possibility of upsets, that picture may change as the competition progresses.

“Spain is slightly favored over Germany mainly due to the fact that Germany has a comparatively high chance to drop out in the round-of-sixteen,” Groll said, adding that based on the entire tournament simulation and on the most probable tournament course, “instead of the Spanish the German team would win the World Cup.”

According to the research, if Germany clears the group phase of the competition, it is more likely to face strong opposition in the 16-team knockout phase; the researchers put Germany’s chances of reaching the quarter-finals at 58 percent.

By contrast, Spain is likely to face weaker opposition in the round of 16 and has a 73 percent chance of reaching the quarter-finals. If both make the quarter-finals, they have a more or less equal chance of winning.

Sourabh Kulesh

A journalist at heart; has knowledge of a wide gamut of topics related to enterprise and consumer tech.