Using Google’s PageRank algorithm to find the most crucial players in the build-up phase – data analysis
Google is one of the world’s most valuable companies, with an unquestionable hold over the search engine market, and influence in many other areas. There are many stats that show just how big Google is, but this hardly seems like something that needs to be proven. Everyone knows Google!
When you ask Google a question, it searches an index of web pages and orders the results using its ranking algorithm, which depends on over 200 parameters. It is likely that many of these are unique to Google, but Google’s founders – Larry Page and Sergey Brin – only wrote an academic paper about one: PageRank. And Page and Brin’s impact has not been limited to search; PageRank has been adapted for many uses beyond the web, from finding the leaders of terrorist networks to optimising engineering based on traffic flow. PageRank gives a measure of the quality of any webpage, or person, or stop-sign – and it can be used in football too, to find the most crucial player in a passing network, in a metric we will call PassRank.
The basic idea of PageRank is that the importance of any web page can be judged by the pages that link to it. If many pages link to a specific page, it is likely that the page is both relevant and important, and that it is a trustworthy source of information. This is all the more true if the pages linking to it are themselves respected – an important site has a reputation to uphold, which would be damaged if it linked to poor-quality sites, or sites that did not add any useful information. As a result, a score can be assigned to every page, based on the scores of the pages that link to it.
Likewise, if many players pass to a central figure, then it is likely that he is pretty good, especially if the players passing to him are themselves good – and so more likely to have made the right decision when distributing the ball.
How does this work in practice? We can represent the passes between players in a team by a directed graph, with edges for passes and weights representing the number of times each pass is made. This graph can be summarised by a transition matrix, T, whose entries give the proportion of passes that a given player made to a particular recipient. To illustrate this, consider the reduced network of Arsenal attackers below, taken from the Arsenal v West Ham match on the opening day of the 2015-16 season.
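The construction of T described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the article: the function name `transition_matrix` and the column convention (entry T[j][i] is the share of player i's passes that went to player j, so that T can multiply a column vector of probabilities) are assumptions, as is the four-player ordering. Only Cazorla's pass counts are quoted in the text, so only his column is filled in.

```python
from fractions import Fraction


def transition_matrix(pass_counts, players):
    """Build T so that T[j][i] is the share of player i's passes that
    went to player j. Every column with at least one pass sums to 1."""
    index = {name: k for k, name in enumerate(players)}
    n = len(players)
    T = [[Fraction(0) for _ in range(n)] for _ in range(n)]
    for passer, targets in pass_counts.items():
        total = sum(targets.values())
        if total == 0:
            continue  # this player made no passes inside the network
        for receiver, count in targets.items():
            T[index[receiver]][index[passer]] = Fraction(count, total)
    return T


# Assumed ordering of the reduced network.
players = ["Cazorla", "Özil", "Oxlade-Chamberlain", "Giroud"]

# Only Cazorla's counts appear in the article; the other players' counts
# are not given there, so their columns are left empty here.
pass_counts = {"Cazorla": {"Özil": 15, "Oxlade-Chamberlain": 5, "Giroud": 3}}

T = transition_matrix(pass_counts, players)
cazorla_column = [T[j][0] for j in range(len(players))]
print(cazorla_column)
```

Exact fractions are used so that the column can be compared directly with the 15/23, 5/23 and 3/23 quoted for Cazorla; floats would work equally well on real data.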
Let us now follow the ball in a random series of passes. It can start at any player with equal probability – say it starts with Cazorla. Of the 23 passes Cazorla made (to another attacker) all game, 15 went to Özil, giving a probability of 15/23 that the ball will be at Özil after one pass; likewise, it will be at Oxlade-Chamberlain or Giroud with probabilities 5/23 and 3/23 respectively. So if we represent the probabilities of each player having the ball by a vector, with $v_0$ the starting vector (in this example, all of the probability on Cazorla), then after one pass we get:
$$v_1 = T v_0 = (0,\ 15/23,\ 5/23,\ 3/23)^\top \quad \text{(ordered Cazorla, Özil, Oxlade-Chamberlain, Giroud)}$$
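After two passes, the probability is given by $T^2 v_0$, and so on. Given certain conditions, which are normally satisfied (roughly, that the ball can get from any player to any other, and that the network does not cycle with a fixed period), the sequence $T^3 v_0$, $T^4 v_0$, … eventually converges to an equilibrium – in this case, with Özil being the most likely player to have the ball.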
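The convergence described above can be checked numerically by applying T repeatedly until the vector stops changing. The sketch below uses a made-up three-player network (players 0, 1 and 2, with invented pass shares) purely for illustration, not the Arsenal data. The helper names `mat_vec` and `power_iterate` are also assumptions. T follows the convention that column i holds the shares of player i's passes.

```python
def mat_vec(T, v):
    """Multiply matrix T (list of rows) by column vector v."""
    return [sum(T[j][i] * v[i] for i in range(len(v))) for j in range(len(T))]


def power_iterate(T, v0, tol=1e-12, max_steps=10_000):
    """Apply T repeatedly to v0 until successive vectors differ by < tol.
    Returns the final vector and the number of passes simulated."""
    v = list(v0)
    for step in range(1, max_steps + 1):
        nxt = mat_vec(T, v)
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt, step
        v = nxt
    raise RuntimeError("no convergence; the network may be periodic")


# Hypothetical pass shares: column i lists where player i's passes go,
# so each column sums to 1. These numbers are invented for the example.
T = [
    [0.0, 0.5, 0.6],
    [0.7, 0.0, 0.4],
    [0.3, 0.5, 0.0],
]

from_player_0, steps = power_iterate(T, [1.0, 0.0, 0.0])
from_uniform, _ = power_iterate(T, [1 / 3, 1 / 3, 1 / 3])
print(steps, from_player_0)
print(from_uniform)
```

Both starting vectors should end at the same equilibrium, which is the sense in which the starting player stops mattering once enough passes have been made.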
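We also have to account for the chance that the team loses the ball and wins it back. Initially, we represent this by the value α, set to be the probability that an attempted pass is successful. With probability α the ball moves on according to T; otherwise, with probability 1 − α, the pass fails and we restart with the vector of equal probabilities. So our updated rule is in fact:
$$v_{n+1} = \alpha T v_n + (1 - \alpha)\, v_0$$
where $v_n$ is the vector of probabilities after n passes and $v_0$ is now the vector of equal probabilities.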
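Assuming the rule takes the usual damped PageRank form (keep passing with probability α, restart otherwise), the iteration might look like the following sketch. The function name `pass_rank`, the value α = 0.8 and the three-player matrix are all hypothetical. The optional `restart` argument stands in for a restart vector other than equal probabilities.

```python
def pass_rank(T, alpha, restart=None, tol=1e-12, max_steps=10_000):
    """Iterate v <- alpha * T v + (1 - alpha) * restart until it settles.
    T[j][i] is the share of player i's passes that went to player j."""
    n = len(T)
    if restart is None:
        restart = [1.0 / n] * n  # equal probability for every player
    v = list(restart)
    for _ in range(max_steps):
        passed = [sum(T[j][i] * v[i] for i in range(n)) for j in range(n)]
        nxt = [alpha * p + (1 - alpha) * r for p, r in zip(passed, restart)]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    raise RuntimeError("no convergence within max_steps")


# Invented pass shares for three players; each column sums to 1.
T = [
    [0.0, 0.5, 0.6],
    [0.7, 0.0, 0.4],
    [0.3, 0.5, 0.0],
]

scores = pass_rank(T, alpha=0.8)    # 0.8 is an assumed pass success rate
no_passing = pass_rank(T, alpha=0.0)  # every pass fails: scores equal restart
print(scores)
```

One consequence of the restart term is that the iteration settles for any α below 1, even on networks where plain repeated multiplication by T would oscillate.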
What is the point of this? Well, it gives us an idea of how central each player is in a team’s attacking build-up play, and of how often their teammates look to pass them the ball. As a consequence, we can also see if a team over-relies on certain players, and who to close down more carefully when we are defending. To test the method, consider the results when applying it to Barcelona in 2014-15:
We can see that Messi is at the centre of most attacks, unsurprisingly, and there is also a strong preference to utilise the full-backs and Neymar on the left wing. Players like Busquets, Xavi and Rakitic act as pivots for attacks, helping to recycle possession. Predictably, defenders and goalkeepers are least central; perhaps more unexpected is that Suárez is also relatively low down, ranked 12th, since he tends to finish attacks rather than help in the build-up.
There are a couple of ways we can modify this to address slightly different questions. First, we can weight passes by Expected Threat (xT) gain instead, to see which players are key to creating dangerous attacks – rather than just in recycling possession. (We do not need to normalise now, as players should not be punished for a large total xT gain.)
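As a rough illustration of the reweighting step, the sketch below sums the xT gain of every pass between each pair of players and leaves the totals unnormalised. The pass list, the xT values and the function name `xt_weight_matrix` are invented for the example. The article does not spell out how scores are then computed from these weights, so the sketch stops at the matrix.

```python
def xt_weight_matrix(passes, players):
    """W[j][i] is the total xT gained by passes from player i to player j.
    No normalisation is applied, so entries can be large or negative."""
    index = {name: k for k, name in enumerate(players)}
    n = len(players)
    W = [[0.0] * n for _ in range(n)]
    for passer, receiver, xt_gain in passes:
        W[index[receiver]][index[passer]] += xt_gain
    return W


players = ["A", "B", "C"]  # placeholder names

# (passer, receiver, xT gain) triples; all values are made up.
passes = [
    ("A", "B", 0.25),
    ("A", "B", 0.125),
    ("B", "A", -0.5),   # a backwards pass that lowers the threat
    ("B", "C", 0.75),
    ("C", "A", -0.25),
]

W = xt_weight_matrix(passes, players)
for row in W:
    print(row)
```

Because the columns are not rescaled, a player whose passes add a lot of threat in total keeps that advantage, while a player who mostly recycles the ball can end up with negative entries.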
The results show that Iniesta was most influential in creating attacks, as were Neymar and Xavi. Some players have negative values, because they tended to recycle the ball instead of increasing the threat of an attack. Most interestingly, Messi is at the bottom of the list. This is most likely because he often dribbles or shoots, which aren’t accounted for in the xT model, and because he picks the ball up in dangerous positions, so his passes have little xT available to gain. Because of these limitations, the xT results are most meaningful for midfielders.
Looking at the tables together also shows us something about the efficiency of players – Iniesta is clearly very good at creating chances out of relatively little, whereas Busquets, who is high in the first table but low in the second, receives the ball a lot but is not as attacking-minded.
We could have also optimised the values of α (how often the team restarts an attack) and $v_0$ (for example, to favour positions or players that start with the ball more often). But even without these improvements, this is a tool that can highlight players who are critical in a team’s attacking play, and therefore should be shut down or even man-marked by the defence – especially early in an attack.