The Analysis of Football Premier Leagues, Market Value and Performance of Players
All previous efforts in the sports analytics field addressing player valuation were conducted on European leagues. No previous research has been published targeting the Egyptian Premier League or any other league in the MENA region, which makes this research the first of its kind.
However, several studies on the topic have been presented for European national leagues such as the English Premier League, the German Bundesliga, the Spanish La Liga and the Italian Serie A. For the German league, Eschweiler and Vieth (2004) asked to what extent club-specific or player-specific factors influence transfer prices. Using data from the 1997/1998 through 2002/2003 seasons, they found that transfer prices depend significantly on both. The significant club-specific factors were revenues from the main sponsor, the average number of spectators in the prior season and qualification for international competition in the next season; among the player-specific factors, age and tenure (number of games played) proved the most significant.
For the same league and research question, Huebl and Swieter (2002) found a concave relationship between salary and both age and the number of Bundesliga games played, and a positive relationship between salary and variables such as national-team membership (yes/no), origin and player position (goalkeeper, defender, midfielder).
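A concave relationship of this kind is usually captured by adding squared terms to a regression. The specification below is an illustrative sketch, not Huebl and Swieter's exact model; the symbols $w_i$ (salary of player $i$), $a_i$ (age), $g_i$ (Bundesliga games played) and $\mathbf{x}_i$ (other controls, such as the national-team, origin and position dummies) are our assumptions. Concavity means the squared coefficients are negative, and setting the derivative with respect to age to zero gives the age at which salary peaks.

```latex
w_i = \beta_0 + \beta_1 a_i + \beta_2 a_i^2 + \gamma_1 g_i + \gamma_2 g_i^2
      + \mathbf{x}_i^\top \boldsymbol{\delta} + \varepsilon_i,
\qquad \beta_2 < 0,\; \gamma_2 < 0

\frac{\partial w_i}{\partial a_i} = \beta_1 + 2\beta_2 a_i = 0
\quad\Longrightarrow\quad
a^{*} = -\frac{\beta_1}{2\beta_2}
```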
For the Spanish league, Garcia-del-Barrio and Pujol (2004) investigated whether there is a "winner takes all" effect. They concluded that an index of performance, reputation, superstar status, age, international appearances, games in the Champions League/UEFA and position give rise to two segments in the labor market supply, and that the presence of this effect gives players identified as superstars significant bargaining power.
For the English Premier League, one of the most studied leagues on this topic, it was found that the largest costs for teams are the wages paid to players and the transfer fees paid to acquire them (Battle et al., 2011). Carmichael, Forrest and Simmons (1999) used English Premier League data to model transfer fees with an Ordinary Least Squares (OLS) cross-sectional model and found age, games played in the previous season and goals scored in the previous season to be significant determinants of transfer fees. Most papers, including those mentioned, give too much weight in their analysis to the number of goals scored by a player. This creates a clear bias towards players in attacking positions and makes the assessment unfair for players in non-attacking positions such as defenders, midfielders and, above all, goalkeepers.
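A cross-sectional OLS model of this type regresses the fee on player characteristics. The sketch below fits one by solving the normal equations X'X b = X'y with Gaussian elimination in pure Python. The regressors (age, previous-season games and goals) follow the description above, but the function names and the small, noise-free synthetic dataset are hypothetical and are not Carmichael, Forrest and Simmons's data.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fit(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b_1, ..., b_k] from the normal equations X'X b = X'y."""
    X = [[1.0, *map(float, r)] for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical players: (age, games last season, goals last season).
players = [(21, 30, 5), (24, 35, 12), (27, 20, 3), (30, 38, 8),
           (19, 10, 1), (33, 25, 2), (26, 32, 15)]
# Noise-free synthetic fees (millions) generated from known coefficients.
fees = [2.0 - 0.1 * a + 0.05 * g + 0.3 * s for a, g, s in players]
coef = ols_fit(players, fees)
print([round(c, 3) for c in coef])
```

Because the synthetic fees contain no noise, the fit should recover the generating coefficients (2.0, -0.1, 0.05, 0.3); on real transfer data the residuals would be non-zero and standard errors would be needed to judge significance.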
For the Italian Serie A, Lucifora and Simmons (2003) investigated what shapes players' earnings functions in the Italian league and whether there is a superstar effect. They found that variables representing player experience (age and appearances), performance (number of goals, number of strikes and number of assists), position and reputation are the most significant player-related variables, and that earnings show a convex structure across some performance measures.
The studies above identify the significant variables in each league. On the separate topic of predicting players' market value, there have also been several attempts, again all in European leagues, as discussed below.
Radu Tunaru, Ephraim Clark and Howard Viney (2014) tried to uncover the components of a footballer's price through a stochastic mathematical model based on geometric Brownian motion, Poisson processes and a jump-diffusion process. They concluded that player value varies from club to club depending on club turnover, and that the total number of performance points generated by the entire team is not the simple sum of each player's individual points, because synergy effects (the correlations between players) must be taken into account. Their model was described as better than decision-tree or discounted cash flow techniques because it recognizes the potential for upside movements in a player's value while limiting downside movements. However, they faced a problem of multidimensionality, which they classified as computational in nature.
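To make the ingredients concrete, the sketch below simulates one path of a player-value process that follows geometric Brownian motion with Poisson-driven jumps. This is an illustrative assumption, not Tunaru, Clark and Viney's actual specification, which also links value to team performance points. The parameter names (`mu`, `sigma`, `jump_rate`, `jump_log_size`) are hypothetical, every jump shifts log-value by the same fixed amount for simplicity, and the drift is not compensated for the jumps.

```python
import math
import random
from typing import List


def _poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_value_path(v0: float, mu: float, sigma: float, jump_rate: float,
                        jump_log_size: float, years: float, steps: int,
                        seed: int = 0) -> List[float]:
    """Simulate V_t with d(ln V) = (mu - sigma^2/2) dt + sigma dW + J dN.

    W is a Brownian motion, N a Poisson process with intensity `jump_rate`
    (jumps per year), and each jump shifts ln V by `jump_log_size`.
    The log-space update is exact for the GBM part on the time grid.
    """
    rng = random.Random(seed)
    dt = years / steps
    log_v = math.log(v0)
    path = [v0]
    for _ in range(steps):
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        jumps = _poisson(rng, jump_rate * dt)
        log_v += (mu - 0.5 * sigma ** 2) * dt + shock + jumps * jump_log_size
        path.append(math.exp(log_v))
    return path


# Hypothetical player: value 10 (millions), 5% drift, 30% volatility,
# on average one jump every two years, each jump cutting value by 20%.
path = simulate_value_path(10.0, 0.05, 0.30, 0.5, math.log(0.8), years=5, steps=60)
print(f"simulated value after 5 years: {path[-1]:.2f}")
```

Working in log-space keeps the simulated value strictly positive; averaging many such paths (with different seeds) would give a Monte Carlo estimate of expected value under these assumed dynamics.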
Erik van den Berg (2011) examined the valuation of football players in the English Premier League. They also analyzed the variance in transfer fees paid and received by English Premier League clubs to acquire players' services in the 2008-2009 and 2009-2010 seasons.
In addition, they investigated, through a Simple Linear Regression Model, whether asset characteristics are the main determinants of a player's value or whether other contextual variables (e.g. buying and selling club characteristics) "obscure" asset valuation. Their work led to the conclusion that individual player performance and innate ability are prominent determinants of transfer fees, although indirect measures of performance and/or innate ability appear more adequate for capturing these determinants. On the second question, they concluded that the size of the buying club pushes transfer fees up to levels unexplained by measures of performance or ability, which lends credence to the use of a bargaining framework.
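A simple linear regression has a closed-form solution: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below applies it, in the spirit of the buying-club-size argument, by regressing the fee paid on a club-size proxy. The variable names and all numbers are invented for illustration and are not van den Berg's data; R^2 is included to show how much of the fee variance such a single regressor explains.

```python
from statistics import mean
from typing import Sequence, Tuple


def simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x: Sequence[float], y: Sequence[float], intercept: float, slope: float) -> float:
    """Share of the variance in y explained by the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical transfers: buying-club revenue (millions) vs. fee paid (millions).
club_revenue = [50, 80, 120, 200, 300]
fee_paid = [2.0, 3.5, 5.0, 9.0, 12.5]
a, b = simple_linear_regression(club_revenue, fee_paid)
print(f"fee ~ {a:.2f} + {b:.4f} * revenue, R^2 = {r_squared(club_revenue, fee_paid, a, b):.3f}")
```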
On the methodological front, their model was limited by the inadequate data collection capabilities available at the time, and the bargaining framework is likely imperfect in the current liberalized transfer market. They also clearly stated that many prominent variables influencing transfer prices were unavailable, such as the remaining contract duration of transferred players, which is a critical determinant of bargaining power in transfer negotiations; any data on club budgets, or even balance-sheet data, would also have been very helpful for exploring the bargaining process. Among the other drawbacks, they noted that the data lacked detailed, accurate or reliable figures on stadium attendance and television audiences, as well as proxies for sponsorship deals.
Regarding future work, they recommended developing a methodology for typifying players, for example as extraordinarily exciting or unpredictable in their level of play. They also pointed out that current direct measures of player performance fall far short of the quality of the scouting reports that are the actual input for transfer decisions. Moreover, current measures understate the team element of the game and fail to adequately distinguish between the various player actions relevant to the outcome of games; they therefore fail to measure a player's actual contribution, namely his influence on the team's probability of winning or losing a game. They recommended incorporating comprehensive statistics such as the Player's Index as a proxy for player performance.
Finally, they highlighted the importance of separately investigating the various transfers of the same player, since players change clubs multiple times throughout their careers while their innate ability does not change greatly; such research would have a better chance of investigating the influence of investments in human capital and other factors on transfer fees.[7]
Yuan He (year not mentioned) aimed to predict the market value of top players using statistical modeling techniques.
Four modeling techniques were used: OLS, KNN (with different values of k), Ridge Regression (with different lambda values) and Principal Component Regression (PCR, with different numbers of components k). Ten-fold cross-validation was applied to each technique, meaning that each model was trained on nine of the ten folds of the data matrix and tested on the held-out fold, rotating through all folds. The root mean square (RMS) error between predictions and test data was used as the criterion for judging each model; the best model was PCR with k = 15, and this model was accordingly used to carry out the predictions.
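The selection procedure can be sketched as follows: shuffle the rows, split them into 10 folds, train on nine, compute the RMS error on the held-out fold and average over folds, then keep the setting with the lowest error. To stay self-contained, the sketch below uses a hand-written one-dimensional KNN regressor as the model and synthetic data; it does not reimplement Ridge or PCR, and the function names and data are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

Row = Tuple[float, float]  # (predictor value, market value)


def knn_predict(train: Sequence[Row], x: float, k: int) -> float:
    """Average target of the k training rows whose predictor is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_validated_rms(data: Sequence[Row], k: int, folds: int = 10, seed: int = 0) -> float:
    """Mean of the per-fold RMS prediction errors over `folds` splits."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    fold_errors = []
    for f in range(folds):
        test = rows[f::folds]  # every `folds`-th row forms the held-out fold
        train = [row for i, row in enumerate(rows) if i % folds != f]
        squared = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
        fold_errors.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_errors) / len(fold_errors)


# Synthetic data: one noisy predictor for 100 hypothetical players.
rng = random.Random(1)
data = [(i / 10, math.sin(i / 10) + rng.gauss(0.0, 0.1)) for i in range(100)]
scores = {k: cross_validated_rms(data, k) for k in (1, 5, 15, 50)}
best_k = min(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The same loop works for any model with a fit/predict step, so Ridge (varying lambda) or PCR (varying the number of components) could be swapped in for `knn_predict` and compared on the identical folds.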
In suggesting future work, the author stated that the best way to improve the model is to augment the data matrix both by rows and by columns: include data on as many players as possible, preferably more than 1000 players (for example, more than 5000 rows), and simultaneously introduce new predictors and add them to the matrix.
On the limitations front, it was stated that the predictors used did not cover every factor that could affect a player's market value. For example, major transfers and decisive performances in key games are both crucial to a player's value, yet neither was included. Moreover, players in different positions should be judged by different performance criteria: the number of goals scored may be a fair assessment of a striker's ability, but not of a defender's or a goalkeeper's. It was also highlighted that other crucial on-pitch variables, such as the number of assists, the number of clean sheets and tackles per game, would have made the evaluation of midfielders (assists), defenders (tackles) and goalkeepers (clean sheets) much better.
Jeroen Ruijg and Hans van Ophem (2014) proposed an estimation method that applies corrections for sample selectivity and also allows more observations to be used in a simple manner. They viewed the hiring of a player as threefold:
- the player has to be paid a salary as compensation for his efforts;
- if the player's contract with another football club is still valid, a transfer fee has to be paid to compensate the club formerly owning the player for his lost efforts during the rest of the contract; and
- the efforts and investments that the former club put into the player to bring him to his current level of performance also have to be compensated.
A linear regression model was used to estimate the transfer fee (in millions of pounds) from the characteristics of the football player and some other variables for the subsample of observed, positive transfer fees, in combination with an ordered probit model. In concluding their research, they stated that significance was very modest, with effects found only for age and the number of minutes played in the season before the transfer.
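For readers unfamiliar with the ordered probit, it assumes a latent index y* = x'beta + e with e standard normal, and the observed category j is the one whose thresholds satisfy kappa_(j-1) < y* <= kappa_j, so P(y = j) = Phi(kappa_j - x'beta) - Phi(kappa_(j-1) - x'beta). The sketch below computes these category probabilities only; the fee bands, thresholds and index value are hypothetical and not taken from Ruijg and van Ophem, and their sample-selectivity correction is not shown.

```python
import math
from typing import List, Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index: float, cutpoints: Sequence[float]) -> List[float]:
    """P(category j) = Phi(kappa_j - index) - Phi(kappa_{j-1} - index).

    `index` is the linear predictor x'beta; `cutpoints` are the strictly
    increasing thresholds kappa_1 < ... < kappa_{J-1}.
    """
    cutpoints = list(cutpoints)
    if any(lo >= hi for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    bounds = [-math.inf, *cutpoints, math.inf]
    return [std_normal_cdf(hi - index) - std_normal_cdf(lo - index)
            for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical fee bands (none, low, medium, high) with hypothetical thresholds.
probs = ordered_probit_probs(index=1.2, cutpoints=[0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

In estimation, beta and the thresholds would be chosen to maximize the sum of the log-probabilities of the observed categories.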
Age was found to have a positive effect on the transfer fee up to the age of 26, and playing games was also found to have a positive effect. Taller players seemed to be less attractive in the player market. In addition, playing rather than being a substitute usually increases the transfer fee; however, a positive effect of red cards received was also revealed. Other insights were that being left-footed is not a positive player characteristic, and that attackers, midfielders and defenders are in higher demand than goalkeepers.
In summary, age, average number of minutes played and not being a goalkeeper were found to be the most important determinants. A final surprising result was that the number of goals scored does not seem to have a big impact on a player's value.
Cornelius Arndt and Ulf Brefeld (2017) developed a multitask, regression-based approach for predicting the future performance of soccer players.
The multitask approach allows the authors to learn individual player models simultaneously as offsets to a general model. Multitask variants of ridge regression and ε-support vector regression were devised with a hashed joint feature space, and relevant features for the prediction were identified by a modified recursive feature elimination strategy. In concluding, the authors stated that the proposed multitask generalizations of ridge regression and support vector regression allowed player-specific models to be learned efficiently.
Empirical results on real-world data from the German Bundesliga showed that the data were often too sparse for learning individual player models, so the single-task models often performed better than the multitask extensions. An additional analysis of the distribution of the target variables showed a clear prior toward average grades in the data, which is inherently reproduced by all models.
Miao He, Ricardo Cachucho and Arno Knobbe (year not mentioned) investigated how the market value and performance of La Liga (Spanish league) players can be modeled using public data sources, applying a regression model to predict the real market value and assess a player's performance. Their model applied only to forward players; however, they recommended extending it to other positions with a model tailored to each league, and extending their study to include all other European leagues. They made it clear that, even if these recommendations were implemented, it would still be difficult to deal with players who have never been valued before, that is, players who have never been transferred and whose values have never been voted on through the voting system.