Number crunching in sports – for business and pleasure

Mats Franzén
Institute for Housing and Urban Research, Uppsala University

Christophe Ley & Yves Dominicy (eds.)
Science Meets Sports: When statistics are more than numbers
244 pages, hardcover
Newcastle upon Tyne: Cambridge Scholars Publishing 2020
ISBN 978-1-5275-5856-4

Sport is for positivists. The world of sport is constantly generating new results, and if records are not broken all the time, then every result is noted and judged with the records as a yardstick or against something else, such as personal best or longest losing streak. The number of figures is growing all the time and continues to captivate at least one kind of sports nerd. New generations of results nerds are constantly being added as an inherent part of the world of sport.

Now, you don’t have to be a nerd to be interested in statistics. Every sports fan notices figures that seem interesting; even if you don’t collect them, you will always remember some for your own team or favorite athlete. But one can also ponder the figures more deeply by making them the subject of statistical analysis. Statistical analysis can reveal patterns in what seems to happen, and it can shake and question taken-for-granted truths. In sports, this applies not least when interest turns to what lies behind the results. Here I am thinking not primarily of conditions such as access to facilities, sponsors and other so-called contextual or external factors, but of the possibility of statistically analyzing, for example, matches.

Here a minor revolution is now underway, thanks to new data collection technologies that give access to datasets of unprecedented size and quality. Sports practice itself can thus be subjected to more penetrating analyses, and this, as I see it, is where the most exciting new possibilities lie. The anthology Science Meets Sports is devoted to these new possibilities for the statistical analysis of different sports. Unfortunately, access to these unprecedented amounts of data is taken as a given starting point for the various analyses presented in the anthology. The analyses are thus based on existing datasets, without any closer discussion of how the data were collected or constructed. For those who wish to take advantage of the possibilities this revolution creates, this is an unnecessary obstacle, if perhaps not an insurmountable one.


The anthology consists of ten chapters. Six of them are devoted directly to the practice of sports, the actual performances. This is where the new analytical possibilities are most exciting, which is why I leave to the reader the chapter on measurement (which is clearly basic), together with the chapters on ranking and prediction models, on betting odds in tennis, and on how to draw up as fair a schedule as possible for competitions in which everyone meets everyone. Of the six chapters on sports practice, five are devoted to ball sports, and four of these clearly demonstrate the potential of the new statistical models for analyzing matches or sequences within matches. Before I get to them, however, I would like to address the two chapters dealing with other aspects of sports practice.

First, a chapter on the inner game of tennis, the mentality the sport demands. Tennis is an individual sport with its own mental demands, about which we mainly possess a never-ending stream of so-called anecdotal evidence. More systematic analysis of mental strength has been done, but it is limited by being, as a rule, based on self-reported data whose validity is ‘neither safe nor confirmable’, as Stephanie Kovalchik sums it up. With the new data, however, it is possible to analyze how players handle different moments in matches and to construct players’ mental profiles from their actual performance. An unresolved problem here, though, is how to distinguish player psychology from strategic choices. Analysis of video sequences from matches, including the players’ facial expressions, also makes it possible to capture the connection between a player’s emotional reaction to one point and its effect on the next point.
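To make the idea of a performance-based mental profile concrete, here is a minimal sketch, assuming a hypothetical point-by-point record in which each point is flagged as a "pressure" point (for instance a break point). It compares a player’s point-win rate under pressure with the rate on other points using a two-proportion z statistic. The record format, the `pressure` flag and the choice of test are illustrative assumptions, not the chapter’s method, and such a naive comparison cannot by itself separate psychology from strategy or opponent quality.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PointRecord:
    won: bool       # did the player win the point?
    pressure: bool  # hypothetical flag, e.g. break point or deuce


def pressure_profile(points):
    """Compare point-win rates on pressure points and on all other points.

    Returns (rate_pressure, rate_other, z), where z is a two-proportion
    z statistic; a negative z means the player wins fewer pressure points.
    """
    press = [p.won for p in points if p.pressure]
    other = [p.won for p in points if not p.pressure]
    r_press = sum(press) / len(press)
    r_other = sum(other) / len(other)
    # Pooled win rate under the null hypothesis of no pressure effect.
    pooled = (sum(press) + sum(other)) / (len(press) + len(other))
    se = sqrt(pooled * (1 - pooled) * (1 / len(press) + 1 / len(other)))
    return r_press, r_other, (r_press - r_other) / se


# Invented example: 30 of 50 pressure points won, 400 of 600 other points won.
points = ([PointRecord(True, True)] * 30 + [PointRecord(False, True)] * 20
          + [PointRecord(True, False)] * 400 + [PointRecord(False, False)] * 200)
print(pressure_profile(points))
```

With more data, the same comparison could be split by score situation to build a fuller profile, but the confound between mentality and tactics remains.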

The second of these chapters deals with a simple lie, or myth: the idea of a causal relationship between type of running shoe and risk of injury. The point is that running injuries are about overtraining, and that the triggering mechanism lies within the body. The idea is, moreover, something of a stock argument for shoe manufacturers. And although the design of a running shoe can modify the risk of injury for a particular runner, switching to a completely different shoe from the one you are used to clearly increases that risk. We know this much thanks to several studies, some with an almost experimental design (with a control group). But we still don’t know how much running training we can withstand, if we insist on testing the limit.


Match analyses are presented for four kinds of sport: basketball, baseball, football, and net sports (here tennis and table tennis). The basketball chapter is broad in scope and devoted to what can be done with the R package BasketballAnalyzeR. In addition to producing a large amount of descriptive data about teams and individual players, such as attempted and made three-point shots per player, the assist network between players, and defensive statistics for each team (steals, blocked shots and rebounds), it offers exciting possibilities, which I particularly liked, for using network analysis to see how a particular team works: how different players depend on each other. The analysis can make visible the importance of every player, not just the stars.
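The assist network mentioned above can be illustrated with a small Python sketch. It does not use or imitate BasketballAnalyzeR, which is an R package; the input format, a list of hypothetical `(passer, scorer)` pairs with `None` for unassisted baskets, and the "involvement share" measure are assumptions for illustration. Each assist counts once for the passer and once for the scorer, so a role player who feeds the stars shows up even if they rarely score.

```python
from collections import Counter, defaultdict


def assist_network(events):
    """Weighted directed edges passer -> scorer from (passer, scorer) pairs.

    Unassisted baskets are recorded with passer None and skipped.
    """
    return Counter((passer, scorer) for passer, scorer in events
                   if passer is not None)


def involvement(edges):
    """Assists given, assisted baskets received, and each player's share of
    all assist-network involvement (every assist counts for both ends)."""
    given, received = defaultdict(int), defaultdict(int)
    for (passer, scorer), n in edges.items():
        given[passer] += n
        received[scorer] += n
    total = 2 * sum(edges.values())
    players = set(given) | set(received)
    # .get avoids inserting zero entries into the defaultdicts.
    share = {p: (given.get(p, 0) + received.get(p, 0)) / total for p in players}
    return dict(given), dict(received), share


events = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "A"), (None, "A")]
edges = assist_network(events)
print(involvement(edges))
```

Richer network measures (centrality, clustering) build on exactly this kind of weighted edge list.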

Analytically, baseball is a relatively simple sport. Data are primarily collected for each pitch. The chapter analyzes why home runs became increasingly common over five consecutive seasons (2015–19), a trend believed to make the sport more boring because fewer balls are put in play. For each batted ball it has become possible to measure the launch angle and the exit velocity. The probability that a batted ball results in a home run can then be modeled with an additive model combining smooth functions of these batted-ball measurements with categorical effects of season and month (variation occurs both between and within seasons). The model is fitted to data from the four seasons 2015–18 and used to predict the number of home runs in the following season. It turns out to underestimate the number of home runs in 2019 by just under 5%. So how can the increase in home runs that year be explained? How the players hit obviously plays a part, but something remains unexplained. Within the sport, this has come to be discussed as possibly due to changes in the ball’s construction, something this reviewer is in no position to question.

The football chapter is a case study in positional data analysis. During a match, a player’s position is typically recorded around 135,000 times (which corresponds to 25 recordings per second over 90 minutes). Combined with the event log for a match, these data can be analyzed as sequences (trajectories) characterized by speed and acceleration as well as location on the pitch. With the help of motion models, admittedly difficult to interpret, control zones can then be constructed for players and for teams, and the positions of both teams together yield a kind of dynamic space. The dynamics lie in what happens in the situation over the next few seconds: if you lose the ball, the space and its dynamics immediately change. The point is not to reconstruct entire matches, but to find interesting and preferably complex situations where the outcome could have been different. To do this kind of counterfactual analysis, however, the positional data must be supplemented with more traditional video analysis. What this yields could be described as the most critical situations in a match: those that could have given it a different result.
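A control zone can be sketched with a deliberately simple time-to-arrival model, swapped in here for the chapter’s motion models: each player keeps moving with their current velocity during a reaction time and then runs straight at a maximum speed, and every cell of the pitch is assigned to the team whose player would get there first. The reaction time (0.7 s), the maximum speed (5 m/s), the 105 × 68 m pitch and the 1 m grid are assumptions for illustration. Because velocities enter the calculation, the zones shift from moment to moment, which is what makes the space dynamic.

```python
import numpy as np

REACTION_TIME = 0.7  # seconds; assumed value, not from the chapter
MAX_SPEED = 5.0      # metres per second; assumed value


def arrival_times(pos, vel, grid):
    """Time for one player to reach every grid cell: keep moving with the
    current velocity during the reaction time, then run straight at MAX_SPEED."""
    start = pos + vel * REACTION_TIME
    return REACTION_TIME + np.linalg.norm(grid - start, axis=-1) / MAX_SPEED


def control_map(home, away, length=105.0, width=68.0, step=1.0):
    """Boolean grid (rows = y, columns = x): True where some home player is
    expected to arrive strictly before every away player (ties go to away).
    home and away are lists of (position, velocity) pairs of 2-D arrays in metres."""
    xs = np.arange(step / 2, length, step)
    ys = np.arange(step / 2, width, step)
    grid = np.stack(np.meshgrid(xs, ys), axis=-1)  # shape (len(ys), len(xs), 2)
    t_home = np.min([arrival_times(p, v, grid) for p, v in home], axis=0)
    t_away = np.min([arrival_times(p, v, grid) for p, v in away], axis=0)
    return t_home < t_away


still = np.zeros(2)
home = [(np.array([30.0, 34.0]), still)]
away = [(np.array([75.0, 34.0]), still)]
ctrl = control_map(home, away)
print(ctrl.mean())  # share of the pitch controlled by the home team
```

Giving the home player a forward velocity moves the boundary towards the opponent, so the same positions can produce a different control map a second later.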


Tennis and table tennis can be modeled as so-called Markov chains. Each stroke then corresponds to a step in the chain of events, with varying probabilities for the possible outcomes. Net sports such as tennis can be analyzed as finite Markov chains, which provide the basis for so-called transition matrices: is the first serve returned or not, for example? The methodology is still under development and needs to be refined. On the one hand, match models can be constructed for two competing players that have a certain descriptive value; these can then be used for tactical purposes: which match sequences favor a particular player, and which should that player try to avoid? On the other hand, the validity of these matrices and models is limited from the outset, because the conditions they require cannot be expected to hold in a sport like tennis; players do not, for example, maintain the same level throughout an entire match. But there are ways of dealing with this difficulty, for instance by using variance measures, so Markov analyses are certainly here to stay, and to be refined.
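As an illustration of a finite Markov chain with a transition matrix, the sketch below models a single tennis game at the level of points rather than strokes (a coarser chain than the stroke-by-stroke models described in the anthology). It assumes a constant probability `p` that the server wins each point, which is exactly the kind of simplifying condition the review warns may not hold in real matches. The probability that the server holds is obtained from standard absorbing-chain algebra, B = (I − Q)⁻¹R, where Q holds transitions between transient states and R transitions into the two absorbing states.

```python
import numpy as np


def game_chain(p):
    """Transition matrix of one tennis game, seen from the server.

    p is the (assumed constant) probability that the server wins a point.
    Transient states: the 15 scores (i, j) with i, j in 0..3 except 3-3,
    plus deuce and the two advantage states; the last two states absorb.
    """
    scores = [(i, j) for i in range(4) for j in range(4) if (i, j) != (3, 3)]
    states = scores + ["deuce", "adv_server", "adv_receiver",
                       "server_wins", "receiver_wins"]
    idx = {s: k for k, s in enumerate(states)}

    def after(i, j):
        # Map a raw point count to a state; 3-3 is the same as deuce.
        if i == 4:
            return "server_wins"
        if j == 4:
            return "receiver_wins"
        return "deuce" if (i, j) == (3, 3) else (i, j)

    P = np.zeros((len(states), len(states)))
    for i, j in scores:
        P[idx[(i, j)], idx[after(i + 1, j)]] += p
        P[idx[(i, j)], idx[after(i, j + 1)]] += 1 - p
    P[idx["deuce"], idx["adv_server"]] = p
    P[idx["deuce"], idx["adv_receiver"]] = 1 - p
    P[idx["adv_server"], idx["server_wins"]] = p
    P[idx["adv_server"], idx["deuce"]] = 1 - p
    P[idx["adv_receiver"], idx["deuce"]] = p
    P[idx["adv_receiver"], idx["receiver_wins"]] = 1 - p
    P[idx["server_wins"], idx["server_wins"]] = 1.0
    P[idx["receiver_wins"], idx["receiver_wins"]] = 1.0
    return states, P


def hold_probability(p):
    """Probability that the server wins the game, starting from 0-0."""
    states, P = game_chain(p)
    t = len(states) - 2                    # number of transient states
    Q, R = P[:t, :t], P[:t, t:]
    B = np.linalg.solve(np.eye(t) - Q, R)  # absorption probabilities
    return B[states.index((0, 0)), 0]      # column 0 = "server_wins"


for p in (0.5, 0.6, 0.7):
    print(p, round(hold_probability(p), 4))
```

Even this coarse chain shows how a small edge per point is amplified over a game, which is the kind of structure that tactical use of transition matrices builds on.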

My impression of Science Meets Sports is that there are a number of promising approaches to the analysis of sporting practices, but they are still under development and for that reason cannot yet be turned into standard methods. In addition, they require access to data of a quantity and quality that is not easily affordable. On the other hand, in some cases, for some sports, they can produce descriptive results of various kinds, for the benefit and pleasure of more than the positivist sports nerds. Moreover, developing these methods in ways relevant to different sports is a challenge worth taking up. However, the meeting between the scientific study of sports and, say, mathematical statistics is probably most easily approached from the point of view of the sports-interested statistician.

Several of the new analytical approaches lend themselves to visual presentation of performance; the anthology makes diligent use of this, mostly in full color, but the graphics are often hard to read because of poor reproduction quality. The publisher could also have taken more care with the typography. Science Meets Sports demands a lot of its readers. However, for those who want to move forward in this field, it provides many useful references, and it serves as an introduction to the state of the art.

Copyright © Mats Franzén 2022

