Among avid football enthusiasts, nearly every supporter likes to assert their coaching prowess, especially in the aftermath of a disappointing match. Fans spend countless hours in passionate discussions, dissecting player performances, critiquing tactical decisions, and revisiting controversial incidents. Most of these opinions rest purely on a team’s weekend performance, yet this passion and controversy contribute to football’s immense popularity while also making the game difficult to analyze objectively.
Given the great number of playing styles and game tactics, we are constantly seeing new, creative ways of approaching the game. Beyond the spirited "pub debates", football experts are now interested in objective facts about their players and opponents to make informed, tailored and timely decisions and elevate their chances of success. This is where data analysis comes into play.
The main objective of football analytics is to comprehensively and objectively evaluate player performance through a meticulous examination of match event data. This process not only highlights the perceptible highs and lows of outstanding and lackluster performances but also offers nuanced insights into players who may not garner a lot of attention due to their specific roles or assignments on the field.
However, navigating the football analytics landscape presents challenges that cannot easily be overlooked. Some of the main challenges are:
1) Offensive Bias: As the primary objective of football is to score more goals than your opponents, event data is predominantly skewed towards offensive actions. This results in a noticeable imbalance between offensive and defensive statistics.
2) Player Roles and Statistics: Offensive players are often considered "volume" players, accumulating statistics through frequent involvement in the play. In contrast, defenders are deemed "situational" players, requiring readiness for infrequent but crucial tackles and duels. This discrepancy not only limits the availability of statistics for defenders but also challenges the notion that "more" necessarily equates to "better."
3) Opponent Quality Bias: Evaluations based on a single game are susceptible to bias stemming from the caliber of the opposing team. Statistics that appear commendable against top-tier teams may lose significance against lower-ranked opponents.
4) Normalization Challenges: While normalizing statistics "per 90 minutes" aids in achieving a more comprehensive representation of data, it introduces potential outliers for players with brief playing durations. Robust checks become imperative to mitigate distortions arising from such anomalies.
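To make the fourth point concrete, here is a minimal Python sketch of "per 90 minutes" normalization with a guard against tiny samples. The function name `per_90` and the 450-minute cutoff are illustrative assumptions, not values taken from the Kama Index itself.

```python
from typing import Optional

MIN_MINUTES = 450  # hypothetical cutoff, not a value from this article


def per_90(stat_total: float, minutes_played: float,
           min_minutes: float = MIN_MINUTES) -> Optional[float]:
    """Scale a season total to a per-90-minutes rate.

    Returns None when the player has too few minutes for the rate
    to be trusted (or no minutes at all).
    """
    if minutes_played <= 0 or minutes_played < min_minutes:
        return None
    return stat_total * 90.0 / minutes_played


# A substitute with 1 goal in 12 minutes would otherwise show 7.5 goals per 90.
print(per_90(1, 12))                  # None
print(per_90(1, 12, min_minutes=0))   # 7.5
print(per_90(10, 1800))               # 0.5
```

Returning `None` rather than a number keeps short-stint players out of the rankings instead of letting an inflated rate slip through.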
In light of these considerations, we have chosen to sidestep the intricate evaluations of individual matches, and instead concentrate on comprehensive assessments spanning entire tournaments. The rationale is that aggregating data across multiple matches helps mitigate the impact of various issues inherent in single-game analyses, providing a more holistic perspective on player performance.
In the preliminary stages of formulating our comprehensive performance index (Kama Index), it is important to look closely at the nature of the data, whose statistics are measured on very different scales:
Given our intent to construct the index through a synthesis of key statistical evaluations, a crucial prerequisite is to normalize these diverse statistics to a uniform scale, ideally ranging between 0 and 10. This normalization facilitates a seamless integration of scores, making it easier to compute the final index score. In essence, this approach can be conceptualized as a translation: scores change more rapidly where observations are densely concentrated, and the rate of change diminishes in regions where observations are less dense.
In the context of our statistical analysis, each type of data now corresponds to objective scores. Negative occurrences, such as fouls, possessions lost, and missed chances, have been inverted to align with a standardized scale. The subsequent task involves the judicious selection of scores most fitting for various player profiles.
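One way to obtain scores that move quickly where observations are dense and slowly where they are sparse is to score each value by its percentile rank in the sample. The sketch below assumes that approach; the article does not state the exact mapping used, and the function name `percentile_scores` and the `invert` flag are hypothetical. The flag flips the scale for negative occurrences such as fouls.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def percentile_scores(values: Sequence[float], invert: bool = False) -> List[float]:
    """Map raw values to 0-10 scores using their mid-rank in the sample."""
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        below = bisect_left(ordered, v)            # observations strictly below v
        equal = bisect_right(ordered, v) - below   # observations tied with v, itself included
        share = (below + 0.5 * equal) / n          # mid-rank share, strictly between 0 and 1
        if invert:
            share = 1.0 - share                    # fewer fouls, losses, misses => higher score
        scores.append(round(10.0 * share, 2))
    return scores


goals_per_90 = [0.1, 0.2, 0.2, 0.3, 1.5]
print(percentile_scores(goals_per_90))               # [1.0, 4.0, 4.0, 7.0, 9.0]
print(percentile_scores(goals_per_90, invert=True))  # [9.0, 6.0, 6.0, 3.0, 1.0]
```

In the example, the jump from 0.3 to 1.5 is large in raw terms but adds only two points, because few observations lie in that region.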
To recognize the distinct roles and characteristics of each footballer, we categorize players into seven distinct roles:
While we collect an extensive pool of over 300 statistics, we take care to curate a meaningful subset. Initial attempts with a mere 10 statistics revealed an undesirable flattening of results and a loss of valuable information. Consequently, for each outfield role, we select between 6 and 8 statistics, with goalkeepers requiring only 4 key statistics due to their comparatively limited dataset.
To derive the Kama Index, we use a very straightforward approach: a scalar product between the vector of scores and a vector of weights, where each weight represents the importance of the corresponding statistic. Normalization is achieved by dividing the result by the sum of the weights. Because this is a convex combination (with non-negative weights) of values within the range [0, 10], the Kama Index also falls within [0, 10].
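The weighted average described above can be sketched in a few lines of Python. The scores and weights in the example are made-up numbers for a hypothetical goalkeeper with four statistics, not actual Kama Index parameters.

```python
from typing import Sequence


def kama_index(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of 0-10 scores: dot(scores, weights) / sum(weights)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    dot = sum(s * w for s, w in zip(scores, weights))
    return dot / sum(weights)


# Hypothetical scores and weights for four goalkeeper statistics.
scores = [9.0, 6.0, 8.0, 7.0]
weights = [4, 1, 3, 2]
print(round(kama_index(scores, weights), 2))  # 8.0
```

Since the weights are non-negative and the result is divided by their sum, the output can never leave the range spanned by the input scores.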
In the future, we are contemplating an extension that evaluates performances based only on the last 1500 minutes played. This feature would allow recent form to be tracked throughout a season for players who surpass this threshold. Players falling below the threshold (because of squad hierarchy or injury) would have their data supplemented from the previous season, at the cost of reduced reliability.
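A rough sketch of how such a recent-form window might be selected is shown below. It is a simplification under stated assumptions: whole matches are taken, newest first, until at least 1500 minutes are covered, rather than cutting the oldest match at exactly 1500. The function name and the data layout are hypothetical.

```python
from typing import List, Optional, Tuple

WINDOW_MINUTES = 1500  # threshold named in the text

Match = Tuple[int, float]  # (minutes_played, stat_value), a hypothetical layout


def recent_window(matches: List[Match],
                  window: int = WINDOW_MINUTES) -> Optional[List[Match]]:
    """Return the most recent matches covering at least `window` minutes.

    `matches` must be in chronological order, oldest first. Returns None
    when the player has not reached the threshold, so the caller can fall
    back to previous-season data.
    """
    selected: List[Match] = []
    minutes = 0
    for match in reversed(matches):      # walk from the newest match backwards
        selected.append(match)
        minutes += match[0]
        if minutes >= window:
            selected.reverse()           # restore chronological order
            return selected
    return None


season = [(90, 1.0)] * 20                # 20 full matches, 1800 minutes
print(len(recent_window(season)))        # 17 matches (1530 minutes)
print(recent_window([(90, 1.0)] * 10))   # None (only 900 minutes)
```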
To showcase what the Kama Index looks like, we have selected and ranked a few Serie A players based on their performance so far this season.
We first take a look at goalkeepers, as they are the players whose evaluation is most likely to be misleading or misread. Goalkeepers playing for teams with weaker defenses may post seemingly impressive statistics, while those in teams with robust defenses may appear relatively inactive. Notably, Juventus’ goalkeeper, Wojciech Szczęsny, has emerged as one of the standout performers of the season. Despite a lower-than-average saves-per-game figure, attributable to the team's solid defensive organization, Szczęsny holds the league's best save percentage and also stands out in the build-up phase thanks to his impressive passing. Szczęsny has a Kama Index of 8.28 so far, one of the highest among goalkeepers in Serie A.
Transitioning to the center-back role, Fiorentina's defensive stalwart, Lucas Martinez Quarta, consistently garners recognition as a top-class performer in the league. Although he doesn’t shine with any superior physical attributes, he manages to excel as a consistent and versatile defender who displays stellar playmaking from deep positions and makes his presence known on the offensive end with three goals so far this season. He rakes in an impressive 7.97 Kama Index.
In the midfield, the zone characterized by the most diverse parameters, Hakan Çalhanoğlu emerges as the epitome of excellence, achieving the highest Kama Index in Serie A with an impressive 9.7. Çalhanoğlu's multifaceted skillset encompasses defensive intensity, superb playmaking, potent long-range shooting, and an impeccable record as a penalty taker.
And finally, turning our attention to strikers, we spotlight Matteo Politano, who has played a pivotal role in maintaining Napoli's resilience in a challenging season following their title-winning 22/23 campaign. While Osimhen and Kvaratskhelia are struggling to reach the heights they achieved in that campaign, Politano has been outstanding during the first half of the 23/24 season. Leading the standings for expected threat, he stands out for his dynamic dribbling, strategic movement and ability to create space for his cannon of a left foot. All of these qualities contribute to his impressive 8.69 Kama rating.
While our focus has primarily been on evaluating individual players, it's essential to highlight that the Kama Index is equally suitable for assessing team performance, with some minor distinctions. In order to give a complete, holistic picture of a team's capabilities, the Kama Team Index incorporates three different indices instead of the single one applied to players.
Similar to the player index, each of these three indices incorporates a unique set of parameters that embody the essence of the respective phases. In some instances, certain parameters that are unique to a given team are also considered in the evaluation.
While elite teams are naturally expected to excel across all three indices, the division into offensive, defensive, and transition phases also reveals a team’s playing philosophy and approach to the game. The Kama Index showcases the strengths and weaknesses of teams while providing valuable insights into their distinctive strategies.