Once in every 4 years, the world celebrates a festival called “Fifa World Cup” and with that, everything seems to change. Priorities switch to football, and predictions switch to the teams and players that would perform in the tournament. Through the medium of this blog, I am going to predict the “World’s Best Playing XI” in 2018 and I would be using Python for the analytical implementation.
Analyze the Fifa Dataset to predict the World’s Best Playing XI in 2018!!
In my quest to carry out the above mentioned task, I stumbled upon an interesting dataset on Kaggle. I am going to stick with it and use it to predict the strongest 11 players taking part in this world cup 2018. Based on player availability, the best possible lineup is a 4-3-3. Using this dataset, I would be giving you a step by step approach to analyze various characteristics that would help us infer the best players for the World Cup 2018.
So, let’s get started :-)
Let’s start by importing the dataset and the required libraries in Python.
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import numpy as np % matplotlib inline df = pd.read_csv("FullData.csv") df.head(7)
It is evident from the above screenshot, there are 53 columns which include the following attributes:
Name, Nationality, National_Position, National_Kit, Club, Club_Position, Club_Kit, Club_Joining, Contract_Expiry, Rating, Height, Weight, Preffered_Foot, Birth_Date, Age, Preffered_Position, Work_Rate, Weak_foot, Skill_Moves, Ball_Control, Dribbling, Marking, Sliding_Tackle, Standing_Tackle, Aggression, Reactions, Attacking_Position, Interceptions, Vision, Composure, Crossing, Short_Pass, Long_Pass, Acceleration, Speed, Stamina, Strength, Balance, Agility, Jumping, Heading, Shot_Power, Finishing, Long_Shots, Curve, Freekick_Accuracy, Penalties, Volleys, GK_Positioning, GK_Diving, GK_Kicking, GK_Handling and GK_Reflexes.
Analyzing this huge dataset is a tedious task as it involves quite a few pre-processing steps. There might be a lot of redundant and unwanted columns, which can be removed. Therefore you can delete certain columns if needed, by writing the below code:
del df['National_Kit'] #deletes the column National_Kit df.head()
Once you have simplified this data, you can then start with the analysis part. Let us begin with the simplest plot. This graph gives us the number of players representing a particular country. Now, these graphs are best used to gain statistical insights.
plt.figure(figsize=(15,32)) sns.countplot(y = df.Nationality,palette="Set2") #Plot all the nations on Y Axis
Note: The plot generated from the above code will display all the football playing nations, but for the sake of simplicity, I have just displayed the top 28 countries which would give me the desired results.
Using this graph, we conclude that most of the players are from England, Argentina, Spain, France and Brazil. In this case, the graph won’t add a lot of value because we would be picking the best XI, and the results may vary.
Moving ahead with the analysis, you can try out different visualizations with the player’s age, preferred_position, rating, club etc. Let me show you one such visualization and then we’ll switch to our analysis of the best playing XI for the world cup.
plt.figure(figsize=(15,6)) sns.countplot(x="Age",data=df)
It is evident from the above screenshot that the majority of players are between the age of 20 and 29, with the largest peak of 25 years.
Now is the time we try and find the answer to the question put forth in the problem statement: Who will be the World’s Best Playing XI?
Let us begin our analysis. Let us start by considering the following playing formation, 4-3-3. So here, we need to find 4 best defenders, 3 best mid-fielders and 3 best attackers. Let us start our quest by finding a goalkeeper first.
World’s Best Playing XI: Finding The Best Goalkeeper
In order to get the best goalkeeper, I’ll be analyzing the data for the below mentioned parameters:
- Shot Stopper: A goalkeeper who is strong in stopping shots taken by opponents.
- Sweeper: A goalkeeper who is strong in playing with his feet and making passes.
#weights a = 0.5 b = 1 c= 2 d = 3 #GoalKeeping Characterstics df['gk_Shot_Stopper'] = (b*df.Reactions + b*df.Composure + a*df.Speed + a*df.Strength + c*df.Jumping + b*df.GK_Positioning + c*df.GK_Diving + d*df.GK_Reflexes + b*df.GK_Handling)/(2*a + 4*b + 2*c + 1*d) df['gk_Sweeper'] = (b*df.Reactions + b*df.Composure + b*df.Speed + a*df.Short_Pass + a*df.Long_Pass + b*df.Jumping + b*df.GK_Positioning + b*df.GK_Diving + d*df.GK_Reflexes + b*df.GK_Handling + d*df.GK_Kicking + c*df.Vision)/(2*a + 4*b + 3*c + 2*d)
Based on the above parameters, I’ll be predicting my best goalkeeper as per the dataset. Let us now plot these parameters:
plt.figure(figsize=(15,6)) # Generate sequential data and plot sd = df.sort_values('gk_Shot_Stopper', ascending=False)[:5] x1 = np.array(list(sd['Name'])) y1 = np.array(list(sd['gk_Shot_Stopper'])) sns.barplot(x1, y1, palette= "colorblind") plt.ylabel("Shot Stopping Score")
Based on the shot-stopper characteristics, it can be inferred that Manuel Neuer is the best goalkeeper as you can see he tops the above list. Let us now plot the other parameter(Sweeper) as well.
plt.figure(figsize=(15,6)) sd = df.sort_values('gk_Sweeper', ascending=False)[:5] x2 = np.array(list(sd['Name'])) y2 = np.array(list(sd['gk_Sweeper'])) sns.barplot(x2, y2, palette= "colorblind") plt.ylabel("Sweeping Score")
World’s Best Playing XI: Finding The Best Defenders
- Centre Backs: We need two center-backs. One who plays LCB and the other who plays RCB.
- Wing Backs: We again need two wing backs. One who plays on the Left and the other who plays on the right.
#Choosing Defenders df['df_centre_backs'] = ( d*df.Reactions + c*df.Interceptions + d*df.Sliding_Tackle + d*df.Standing_Tackle + b*df.Vision+ b*df.Composure + b*df.Crossing +a*df.Short_Pass + b*df.Long_Pass+ c*df.Acceleration + b*df.Speed + d*df.Stamina + d*df.Jumping + d*df.Heading + b*df.Long_Shots + d*df.Marking + c*df.Aggression)/(6*b + 3*c + 7*d) df['df_wb_Wing_Backs'] = (b*df.Ball_Control + a*df.Dribbling + a*df.Marking + d*df.Sliding_Tackle + d*df.Standing_Tackle + a*df.Attacking_Position + c*df.Vision + c*df.Crossing + b*df.Short_Pass + c*df.Long_Pass + d*df.Acceleration +d*df.Speed + c*df.Stamina + a*df.Finishing)/(4*a + 2*b + 4*c + 4*d)
plt.figure(figsize=(15,6)) sd = df[(df['Club_Position'] == 'LCB')].sort_values('df_centre_backs', ascending=False)[:5] x2 = np.array(list(sd['Name'])) y2 = np.array(list(sd['df_centre_backs'])) sns.barplot(x2, y2, palette=sns.color_palette("Blues_d")) plt.ylabel("LCB Score")
RIGHT CENTRAL DEFENDER:
plt.figure(figsize=(15,6)) sd = df[(df['Club_Position'] == 'RCB')].sort_values('df_centre_backs', ascending=False)[:5] x2 = np.array(list(sd['Name'])) y2 = np.array(list(sd['df_centre_backs'])) sns.barplot(x2, y2, palette=sns.color_palette("Blues_d")) plt.ylabel("RCB Score")
LEFT WING BACK:
plt.figure(figsize=(15,6)) sd = df[(df['Club_Position'] == 'LWB') | (df['Club_Position'] == 'LB')].sort_values('df_wb_Wing_Backs', ascending=False)[:5] x4 = np.array(list(sd['Name'])) y4 = np.array(list(sd['df_wb_Wing_Backs'])) sns.barplot(x4, y4, palette=sns.color_palette("Blues_d")) plt.ylabel("Left Back Score")
Since David Alaba’s team does not qualify in the world cup 2018, I’ll be picking Alex Sandro as the best LWB/LB defender.
RIGHT WING BACK:
plt.figure(figsize=(15,6)) sd = df[(df['Club_Position'] == 'RWB') | (df['Club_Position'] == 'RB')].sort_values('df_wb_Wing_Backs', ascending=False)[:5] x5 = np.array(list(sd['Name'])) y5 = np.array(list(sd['df_wb_Wing_Backs'])) sns.barplot(x5, y5, palette=sns.color_palette("Blues_d")) plt.ylabel("Right Back Score")
Moving ahead with the World’s Best Playing XI, it’s time we choose some midfielders.
World’s Best Playing XI: Finding The Best Mid-Fielders
- Playmaker: A playmaker is someone who will move the ball to the attacking 3rd from defence or midfield.
- Beast:A beast is a typical box-to-box player with loads of energy and who can boss the midfield.
- Controller:A controller is the person who is orchestrating your midfield engine by either sitting back or going forward based on dynamic needs.
#Midfielding Indices df['mf_playmaker'] = (d*df.Ball_Control + d*df.Dribbling + a*df.Marking + d*df.Reactions + d*df.Vision + c*df.Attacking_Position + c*df.Crossing + d*df.Short_Pass + c*df.Long_Pass + c*df.Curve + b*df.Long_Shots + c*df.Freekick_Accuracy)/(1*a + 1*b + 4*c + 4*d) df['mf_beast'] = (d*df.Agility + c*df.Balance + b*df.Jumping + c*df.Strength + d*df.Stamina + a*df.Speed + c*df.Acceleration + d*df.Short_Pass + c*df.Aggression + d*df.Reactions + b*df.Marking + b*df.Standing_Tackle + b*df.Sliding_Tackle + b*df.Interceptions)/(1*a + 5*b + 4*c + 4*d) df['mf_controller'] = (b*df.Weak_foot + d*df.Ball_Control + a*df.Dribbling + a*df.Marking + a*df.Reactions + c*df.Vision + c*df.Composure + d*df.Short_Pass + d*df.Long_Pass)/(2*c + 3*d + 4*a)
Let us plot each one of them.
PLAYMAKER:
plt.figure(figsize=(15,6)) ss = df[(df['Club_Position'] == 'CAM') | (df['Club_Position'] == 'LAM') | (df['Club_Position'] == 'RAM')].sort_values('mf_playmaker', ascending=False)[:5] x3 = np.array(list(ss['Name'])) y3 = np.array(list(ss['mf_playmaker'])) sns.barplot(x3, y3, palette=sns.diverging_palette(145, 280, s=85, l=25, n=5)) plt.ylabel("PlayMaker Score")
Beast:
plt.figure(figsize=(15,6)) ss = df[(df['Club_Position'] == 'RCM') | (df['Club_Position'] == 'RM')].sort_values('mf_beast', ascending=False)[:5] x2 = np.array(list(ss['Name'])) y2 = np.array(list(ss['mf_beast'])) sns.barplot(x2, y2, palette=sns.diverging_palette(145, 280, s=85, l=25, n=5)) plt.ylabel("Beast Score")
Next, let us pick the World’s best controller.
Controller:
plt.figure(figsize=(15,6)) # Generate some sequential data ss = df[(df['Club_Position'] == 'LCM') | (df['Club_Position'] == 'LM')].sort_values('mf_controller', ascending=False)[:5] x1 = np.array(list(ss['Name'])) y1 = np.array(list(ss['mf_controller'])) sns.barplot(x1, y1, palette=sns.diverging_palette(145, 280, s=85, l=25, n=5)) plt.ylabel("Controller Score")
As per the above analysis, I’ll pick Iniesta as the best controller/ Left Central Midfielder.
Having said that, below is the list of the best mid-fielders for this World Cup 2018:
Moving ahead with the World’s Best Playing XI, it’s time we choose attackers.
World’s Best Playing XI: Finding The Best Attackers
- Attacking Left Wing: He is a player, attacking from the left flank.
- Attacking Right Wing: He is a player, attacking from the right flank.
- Striker: He is a player attacking from the center.
#Attackers df['att_left_wing'] = (c*df.Weak_foot + c*df.Ball_Control + c*df.Dribbling + c*df.Speed + d*df.Acceleration + b*df.Vision + c*df.Crossing + b*df.Short_Pass + b*df.Long_Pass + b*df.Aggression + b*df.Agility + a*df.Curve + c*df.Long_Shots + b*df.Freekick_Accuracy + d*df.Finishing)/(a + 6*b + 6*c + 2*d) df['att_right_wing'] = (c*df.Weak_foot + c*df.Ball_Control + c*df.Dribbling + c*df.Speed + d*df.Acceleration + b*df.Vision + c*df.Crossing + b*df.Short_Pass + b*df.Long_Pass + b*df.Aggression + b*df.Agility + a*df.Curve + c*df.Long_Shots + b*df.Freekick_Accuracy + d*df.Finishing)/(a + 6*b + 6*c + 2*d) df['att_striker'] = (b*df.Weak_foot + b*df.Ball_Control + a*df.Vision + b*df.Aggression + b*df.Agility + a*df.Curve + a*df.Long_Shots + d*df.Balance + d*df.Finishing + d*df.Heading + c*df.Jumping + c*df.Dribbling)/(3*a + 4*b + 2*c + 3*d)
Let us plot all of them and find the best attackers in the world for our best XI.
plt.figure(figsize=(15,6)) ss = df[(df['Club_Position'] == 'LW') | (df['Club_Position'] == 'LM') | (df['Club_Position'] == 'LS')].sort_values('att_left_wing', ascending=False)[:5] x1 = np.array(list(ss['Name'])) y1 = np.array(list(ss['att_left_wing'])) sns.barplot(x1, y1, palette=sns.diverging_palette(255, 133, l=60, n=5, center="dark")) plt.ylabel("Left Wing")
It’s quite evident from the above plot that Ronaldo is the best Left Wing Attacker for World Cup 2018.
Next, let us plot the right wing attacker.
plt.figure(figsize=(15,6)) ss = df[(df['Club_Position'] == 'RW') | (df['Club_Position'] == 'RM') | (df['Club_Position'] == 'RS')].sort_values('att_right_wing', ascending=False)[:5] x2 = np.array(list(ss['Name'])) y2 = np.array(list(ss['att_right_wing'])) sns.barplot(x2, y2, palette=sns.diverging_palette(255, 133, l=60, n=5, center="dark")) plt.ylabel("Right Wing")
As per the above analysis, I’ll pick Lionel Messi as the right wing attacker for World Cup 2018.
Moving ahead with the World’s Best Playing XI, it’s time we choose our striker.
STRIKER:
plt.figure(figsize=(15,6)) ss = df[(df['Club_Position'] == 'ST') | (df['Club_Position'] == 'LS') | (df['Club_Position'] == 'RS') | (df['Club_Position'] == 'CF')].sort_values('att_striker', ascending=False)[:5] x3 = np.array(list(ss['Name'])) y3 = np.array(list(ss['att_striker'])) sns.barplot(x3, y3, palette=sns.diverging_palette(255, 133, l=60, n=5, center="dark")) plt.ylabel("Striker")