Playstyle for Victory

Analysis of Football European Teams

Using a database from Kaggle, this notebook analyzes some of the football teams from the most popular European leagues. The database comprises 7 tables and 199 columns.

Description of the tables:

This is a very extensive dataset with more than 11,000 players, 300 teams and more than 25,000 matches. Most of the attributes mentioned above are the ones I will use to answer the different questions.

Objectives

The main goal of this analysis is to use SQL (SQLite) to extract analytical information, answer specific questions and provide insights.

Technical skills used in SQLite:

Topics addressed in the analysis:

Main Question

Table Of Contents:

1. Information of the database

2. Data Wrangling

2.1 Analysis of Data Wrangling

3. Exploratory Data Analysis

3.1 Classes of Attributes

3.2 Summary of Classes of Attributes

3.3 Teams Rank

    3.3.1 Top 5 Teams per Attribute

    3.3.2 Defining Winners and Losers

    3.3.3 Ranking of Teams per Wins

    3.3.4 Best Teams per Season. Summary

3.4 Analyzing Team Attributes

    3.4.1 Summary of Team Attributes Analysis

3.5 Attributes vs Team Victory

    3.5.1 Attributes vs Team Victory. Summary of Results

3.6 Combined Attributes vs Team Victory

    3.6.1 Combined Attributes vs Team Victory. Summary of Results

3.7 Best Players

   3.7.1 Best Players Summary

4. Analysis of Results

5. Limitations

6. Future Ideas

Connecting the Jupyter notebook to the database file: database.sqlite

1. Extracting information from the database

Extracting name of the tables
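As a minimal sketch of this step, the table names can be read from SQLite's built-in `sqlite_master` catalog. The example below runs against a toy in-memory database rather than the real `database.sqlite`, and the two tables it creates are placeholders, not the actual schema.

```python
import sqlite3

# Toy in-memory stand-in for database.sqlite; both tables are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE League (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Team (id INTEGER PRIMARY KEY, team_long_name TEXT);
""")

# sqlite_master lists every schema object; keep only the tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

Against the real file, replacing `":memory:"` with the path to `database.sqlite` (and dropping the `CREATE TABLE` statements) should list the 7 tables.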

Leagues

Name of the leagues

Note: As I mentioned above, I will focus on the 5 best known leagues:

* England Premier League
* France Ligue 1
* Italy Serie A
* Germany 1. Bundesliga
* Spain LIGA BBVA

I will create a view with the data of only these leagues.

2. Wrangling Data

Creating a VIEW with the best leagues
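A view of this kind could look roughly like the sketch below. It assumes a `League` table with `id`, `country_id` and `name` columns (an assumption about the schema), uses a toy in-memory database, and the short league names in the `CASE` expression are illustrative choices rather than the exact ones used in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed shape of the League table, with three toy rows.
    CREATE TABLE League (id INTEGER PRIMARY KEY, country_id INTEGER, name TEXT);
    INSERT INTO League (id, country_id, name) VALUES
        (1, 1, 'Belgium Jupiler League'),
        (2, 2, 'England Premier League'),
        (3, 3, 'Spain LIGA BBVA');

    -- Keep only the five leagues of interest and give them shorter names.
    CREATE VIEW Best_leagues AS
    SELECT l.id,
           l.country_id,
           CASE l.name
               WHEN 'England Premier League' THEN 'Premier League'
               WHEN 'France Ligue 1'         THEN 'Ligue 1'
               WHEN 'Italy Serie A'          THEN 'Serie A'
               WHEN 'Germany 1. Bundesliga'  THEN 'Bundesliga'
               WHEN 'Spain LIGA BBVA'        THEN 'La Liga'
           END AS name
    FROM League AS l
    WHERE l.name IN ('England Premier League', 'France Ligue 1', 'Italy Serie A',
                     'Germany 1. Bundesliga', 'Spain LIGA BBVA');
""")

league_names = [
    row[0] for row in conn.execute("SELECT name FROM Best_leagues ORDER BY name")
]
print(league_names)
```

Because a view stores only the query, later sections can select from `Best_leagues` without repeating the filter.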

Creating a VIEW with the teams of the biggest leagues

Creating a VIEW for the players of the best teams

I will be analyzing the players that played in the teams that belonged to the best leagues (the main 5 leagues)

Testing the newly created views

Best_players

Best_teams

Best_leagues

END OF WRANGLING PROCEDURE

As a result of the wrangling step, 3 views were created to reduce the data to the subset of interest for this analysis. New columns such as birth_year, birth_month and birth_day were added to facilitate future calculations and analysis. The names of the leagues were changed to their popular names (what most people call them).

Major problems such as null data were not identified. According to the source, this dataset contains no null data. No structural problems or spelling issues were identified visually. As the dataset is very large, spelling issues in the data may still be encountered during the analysis and will be addressed when they appear.
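The birth_year, birth_month and birth_day columns mentioned above can be derived with SQLite's `strftime` function. The sketch below assumes a `Player` table whose `birthday` column stores text such as `'1987-06-24 00:00:00'`; the table shape and the sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns; the single row is toy data.
conn.execute(
    "CREATE TABLE Player (player_api_id INTEGER, player_name TEXT, birthday TEXT)"
)
conn.execute("INSERT INTO Player VALUES (1, 'Example Player', '1987-06-24 00:00:00')")

# strftime returns text, so CAST turns each part into an integer.
birth_row = conn.execute("""
    SELECT player_name,
           CAST(strftime('%Y', birthday) AS INTEGER) AS birth_year,
           CAST(strftime('%m', birthday) AS INTEGER) AS birth_month,
           CAST(strftime('%d', birthday) AS INTEGER) AS birth_day
    FROM Player
""").fetchone()
print(birth_row)
```

Storing the parts as integers makes later age calculations and groupings simpler than working with the raw text.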

3. Exploratory Data Analysis

Exploring the attributes table

Classes of Attributes

Let's take a closer look at our attributes and their classes

I will extract the range of the classes of our attributes
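One way to extract such a range is to group by the class column and take the minimum and maximum of the numeric column. The sketch below assumes a `Team_Attributes` table with `buildUpPlaySpeed` and `buildUpPlaySpeedClass` columns and uses invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed columns; the values are toy data.
    CREATE TABLE Team_Attributes (
        team_api_id INTEGER,
        buildUpPlaySpeed INTEGER,
        buildUpPlaySpeedClass TEXT
    );
    INSERT INTO Team_Attributes VALUES
        (1, 30, 'Slow'), (2, 45, 'Balanced'), (3, 66, 'Balanced'),
        (4, 70, 'Fast'), (5, 80, 'Fast');
""")

# One row per class with the lowest and highest value observed in that class.
speed_ranges = conn.execute("""
    SELECT buildUpPlaySpeedClass AS class,
           MIN(buildUpPlaySpeed) AS min_value,
           MAX(buildUpPlaySpeed) AS max_value
    FROM Team_Attributes
    GROUP BY buildUpPlaySpeedClass
    ORDER BY min_value
""").fetchall()
print(speed_ranges)
```

The same query shape applies to the other attribute and class column pairs.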

BuildUpPlaySpeed

BuildUpPlayDribbling

BuildUpPlayPassing

ChanceCreationPassingClass

ChanceCreationCrossingClass

ChanceCreationShootingClass

DefencePressureClass

DefenceAggressionClass

DefenceTeamWidthClass

Summary of Classes of Attributes

Before continuing, I will define the summarized attributes according to FIFA, in order to understand how they shape a team's style of play and how they relate to its victories or defeats.

Important Note: It is fair to say that in football, better statistical results do not always lead to more victories. Some attributes carry more weight than others. Outside factors such as luck and mistakes by players and referees during the game are not captured in this data. So, for the sake of this analysis, I will trust the numbers provided by the attributes to define the best teams, at least on these terms.

buildUpPlaySpeed:

Defines the speed at which attacks are put together:

Values Category
1 - 33 SLOW
34 - 66 BALANCED
67 - 100 FAST

buildUpPassing:

Affects passing distance and support from teammates:

Values Category
1 - 33 SHORT
34 - 66 MIXED
67 - 100 LONG

buildUpDribbling:

This parameter defines the creativity of the player in 1 on 1 situations.

Values Category
1 - 33 LITTLE
34 - 66 NORMAL
67 - 100 LOTS

ChanceCreationPassingClass:

Amount of risk in pass decision and run support:

Values Category
1 - 33 SAFE
34 - 66 NORMAL
67 - 100 RISKY

ChanceCreationCrossingClass:

The tendency / frequency of crosses into the box

Values Category
1 - 33 LITTLE
34 - 66 NORMAL
67 - 100 LOTS

ChanceCreationShootingClass:

The tendency / frequency of shots taken:

Values Category
1 - 33 LITTLE
34 - 66 NORMAL
67 - 100 LOTS

DefencePressureClass

Defines how high up the pitch the team will start pressuring:

Values Category
1 - 33 DEEP
34 - 66 MEDIUM
67 - 100 HIGH

DefenceAggressionClass:

Defines the team approach to tackling the ball possessor:

Values Category
1 - 33 CONTAIN
34 - 66 PRESS
67 - 100 DOUBLE

DefenceTeamWidthClass

Defines how much the team shifts to the ball side. A narrower width means the team tends to cover central positions, while wider teams tend to cover the wings/sides more.

Values Category
1 - 33 NARROW
34 - 66 NORMAL
67 - 100 WIDE

Source

Ranking Teams

Ranking Leagues according to their attributes

In the next analysis, I will extract the best leagues by parameters through the years to see the variation.

Leagues with best buildUpPlaySpeed

Creating a top 5 teams per Attribute per league

Top 5 teams with better play Passing score per league

Ranking the teams with better passing, in other words, those whose players' passes are more effective. A good passing score indicates a team whose passes reach their teammates.
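A top-N-per-league ranking like this one can be sketched with a window function inside a common table expression. The example below uses a hypothetical, already joined table `team_passing` with toy values and keeps the top 2 instead of the top 5 to stay short. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table: one passing score per team (toy data).
    CREATE TABLE team_passing (league TEXT, team TEXT, passing INTEGER);
    INSERT INTO team_passing VALUES
        ('La Liga', 'Team A', 70), ('La Liga', 'Team B', 60), ('La Liga', 'Team C', 50),
        ('Serie A', 'Team D', 55), ('Serie A', 'Team E', 65), ('Serie A', 'Team F', 40);
""")

# RANK() restarts at 1 for every league because of PARTITION BY.
top_teams = conn.execute("""
    WITH ranked AS (
        SELECT league, team, passing,
               RANK() OVER (PARTITION BY league ORDER BY passing DESC) AS team_rank
        FROM team_passing
    )
    SELECT league, team, passing, team_rank
    FROM ranked
    WHERE team_rank <= 2
    ORDER BY league, team_rank
""").fetchall()
print(top_teams)
```

The filter on `team_rank` has to sit outside the CTE because a window function cannot be used directly in a `WHERE` clause.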

Teams with better buildUpDribbling

Ranking the teams in each league with the best one-on-one player scores

Identifying Winners and Losers

Creating a view with the teams that won and lost each match

New columns:

who_loss: team_api_id of the team who lost the game

who_win: team_api_id of the team who won the game
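The two new columns can be derived by comparing home and away goals with `CASE` expressions. The sketch below assumes a `Match` table with `home_team_api_id`, `away_team_api_id`, `home_team_goal` and `away_team_goal` columns; the view name `match_results` is hypothetical and the rows are toy data. Draws leave both columns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed subset of the Match table's columns, with toy rows.
    CREATE TABLE "Match" (
        match_api_id INTEGER, season TEXT,
        home_team_api_id INTEGER, away_team_api_id INTEGER,
        home_team_goal INTEGER, away_team_goal INTEGER
    );
    INSERT INTO "Match" VALUES
        (1, '2008/2009', 10, 20, 2, 1),   -- home win
        (2, '2008/2009', 20, 10, 0, 0),   -- draw
        (3, '2008/2009', 20, 10, 1, 3);   -- away win

    -- A CASE without ELSE yields NULL, which is what a draw should produce.
    CREATE VIEW match_results AS
    SELECT match_api_id, season,
           CASE WHEN home_team_goal > away_team_goal THEN home_team_api_id
                WHEN home_team_goal < away_team_goal THEN away_team_api_id
           END AS who_win,
           CASE WHEN home_team_goal > away_team_goal THEN away_team_api_id
                WHEN home_team_goal < away_team_goal THEN home_team_api_id
           END AS who_loss
    FROM "Match";
""")

results = conn.execute(
    "SELECT match_api_id, who_win, who_loss FROM match_results ORDER BY match_api_id"
).fetchall()
print(results)
```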

Creating a view with the ranking of teams according to their victories per season

Columns: League, Team, Wins, Rank
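A ranking by victories per season can be sketched by counting wins in a common table expression and ranking the counts with a window function. The example uses a hypothetical `results` table holding one `who_win` value per match, omits the League column for brevity, and relies on toy data. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical per-match results; NULL marks a draw (toy data).
    CREATE TABLE results (season TEXT, who_win INTEGER);
    INSERT INTO results VALUES
        ('2008/2009', 10), ('2008/2009', 10), ('2008/2009', 10),
        ('2008/2009', 20), ('2008/2009', NULL),
        ('2009/2010', 20), ('2009/2010', 20), ('2009/2010', 10);
""")

# Count wins per team and season, then rank the teams within each season.
wins_ranking = conn.execute("""
    WITH wins AS (
        SELECT season, who_win AS team, COUNT(*) AS wins
        FROM results
        WHERE who_win IS NOT NULL
        GROUP BY season, who_win
    )
    SELECT season, team, wins,
           RANK() OVER (PARTITION BY season ORDER BY wins DESC) AS team_rank
    FROM wins
    ORDER BY season, team_rank
""").fetchall()
print(wins_ranking)
```

Swapping `who_win` for `who_loss` gives the corresponding ranking by losses.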

Creating a view with the ranking of teams according to their losses

Columns: League, Team, Losses, Rank

Extracting ranking of the teams with more victories

Image taken from Trollfootball

Top 5 Teams with more victories per season

Season 2008/2009

Season 2009/2010

Season 2010/2011

Season 2011/2012

Season 2012/2013

Season 2013/2014

Season 2014/2015

Best Teams per Season Summary

Image taken from golfm

In the analysis of the best teams per season, two teams are clearly at the top most of the time: Real Madrid and Barcelona. Over the last century, these teams have been the greatest rivals in football, and their matches against each other, known as 'The Classic' (El Clásico), are among the most popular and most watched around the world. They both belong to 'La Liga'. Manchester United is also one of the most popular teams. All of them (Juventus, Bayern Munich, Barcelona, Real Madrid and the others) have been the best teams in their leagues during these years.

Difference in Attributes for teams with more wins and losses

Attributes of teams with more losses

Next, I analyze some of the attributes of the teams with more losses during the seasons, so that I can later compare them to the teams with more wins and study which playstyles and attributes had a greater influence on achieving victory.

Top 5 Teams Per League with more Wins

Summary: Teams with more wins and losses

Surprisingly, there are not many differences between the playstyle of the teams with more wins and that of the teams with more losses. I will need to look at other parameters. However, there are some differences to take into account:

Similarities

Differences

Note: Even though some differences and particularities were extracted to identify playstyles that can lead teams to more victories, a different approach may yield better and more concise conclusions.

Attributes vs Team Victory

Image taken from 90min

In the following, I compare the different attributes that led to more victories across teams, using a different approach than before. Previously, I visually compared the teams with more victories and their attributes, trying to find a relationship between victory and class of attributes. However, this approach can be heavily biased for some attributes. Teams with more victories are usually teams with more economic power, in other words, the teams where the best players are. So individual talent sometimes makes the difference in a game, regardless of the strategy the team adopts. Other groups of teams are more similar in economic power and player quality, and for these teams playstyle may play a bigger role in achieving victories. With the following approach, I aim to reveal that relationship by grouping each class by category and counting the wins and losses teams had while using these categories.
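The grouping described above can be sketched as two aggregations, one joining on the winning team and one on the losing team, combined into a win-to-loss ratio per category. The tables `team_class` and `results` below are hypothetical simplifications with toy data; in the real database a team's class can change between dates, which this sketch ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical: one speed class per team (toy data).
    CREATE TABLE team_class (team_api_id INTEGER, speed_class TEXT);
    INSERT INTO team_class VALUES (1, 'Fast'), (2, 'Fast'), (3, 'Slow');

    -- Hypothetical decided matches: winner and loser ids (toy data).
    CREATE TABLE results (who_win INTEGER, who_loss INTEGER);
    INSERT INTO results VALUES (1, 3), (2, 3), (1, 2), (3, 2), (1, 3);
""")

# Count wins and losses per class separately, then divide.
class_ratio = conn.execute("""
    WITH wins AS (
        SELECT c.speed_class AS class, COUNT(*) AS wins
        FROM results AS r
        JOIN team_class AS c ON c.team_api_id = r.who_win
        GROUP BY c.speed_class
    ),
    losses AS (
        SELECT c.speed_class AS class, COUNT(*) AS losses
        FROM results AS r
        JOIN team_class AS c ON c.team_api_id = r.who_loss
        GROUP BY c.speed_class
    )
    SELECT w.class, w.wins, l.losses,
           ROUND(w.wins * 1.0 / l.losses, 2) AS win_loss_ratio
    FROM wins AS w
    JOIN losses AS l ON l.class = w.class
    ORDER BY win_loss_ratio DESC
""").fetchall()
print(class_ratio)
```

Multiplying by `1.0` forces floating-point division; without it SQLite would truncate the integer quotient.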

Speed Class

Dribbling Class

Passing Class

Positioning Class

Creation Passing Class

Creation Crossing Class

Creation Shooting Class

Creation Positioning Class

Defence Pressure Class

Defence Aggression Class

Defence Team Width Class

Defence Line Class

Summary of Attributes per Victory

Final Notes:

With this analysis, I confirm that attributes such as Passing Class and Positioning Class have an impact on their own on a team's chances of victory. The Free Form Positioning Class had a win-to-loss ratio above 2, meaning that teams using this playstyle won more than twice as often as they lost. In addition, the Dribbling and Crossing attributes also have an impact on this probability.

Passing Class and Dribbling Class are categories that mainly depend on the talent of the players, so the quality of the players may determine which playstyle is used in these categories. On the other hand, Positioning Class and Crossing Class may depend more on the team's strategy. These types of strategies are usually the result of training and team play style.

Combination of Attributes vs Victory

Image taken from hindustatimes

In this section, I rank the combination of attributes with more relevance for achieving victory, to identify the combination with the highest probability of victory.
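Ranking combinations works like the single-attribute case, except that the grouping uses several class columns at once. The sketch below uses a hypothetical, already joined table `outcomes` with one row per team and decided match; the class labels and counts are toy data, not results from the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table: classes used and whether the team won (1) or lost (0).
    CREATE TABLE outcomes (passing_class TEXT, positioning_class TEXT, won INTEGER);
    INSERT INTO outcomes VALUES
        ('Short', 'Free Form', 1), ('Short', 'Free Form', 1),
        ('Short', 'Free Form', 1), ('Short', 'Free Form', 0),
        ('Long', 'Organised', 1), ('Long', 'Organised', 0), ('Long', 'Organised', 0);
""")

# Group by the pair of classes; SUM(won) counts wins, SUM(1 - won) counts losses.
combo_ranking = conn.execute("""
    SELECT passing_class, positioning_class,
           SUM(won) AS wins,
           SUM(1 - won) AS losses,
           ROUND(SUM(won) * 1.0 / SUM(1 - won), 2) AS win_loss_ratio
    FROM outcomes
    GROUP BY passing_class, positioning_class
    ORDER BY win_loss_ratio DESC
""").fetchall()
print(combo_ranking)
```

A combination with zero losses would produce a NULL ratio, since SQLite returns NULL on division by zero, so such rows would need separate handling.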

Best possible combination

Teams which employ the best attributes

Analysis of Results: Combined Attributes vs Victory

Image taken from diario lateral

Best Players

Image taken from defensacentral

Rating of the best players

Creating a ranking of players according to their overall rating
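A player ranking of this kind can be sketched by reducing each player's ratings to a single value and ranking the result. The example assumes `Player` and `Player_Attributes` tables joined on `player_api_id`, with several `overall_rating` rows per player; taking the maximum rating is an illustrative choice, and the rows are toy data. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed subset of columns, with toy rows.
    CREATE TABLE Player (player_api_id INTEGER, player_name TEXT);
    CREATE TABLE Player_Attributes (player_api_id INTEGER, overall_rating INTEGER);
    INSERT INTO Player VALUES (1, 'Player A'), (2, 'Player B'), (3, 'Player C');
    INSERT INTO Player_Attributes VALUES (1, 90), (1, 94), (2, 92), (3, 85), (3, 88);
""")

# Best rating per player, then a dense rank so ties would share a position.
player_ranking = conn.execute("""
    WITH best AS (
        SELECT p.player_name, MAX(a.overall_rating) AS best_rating
        FROM Player AS p
        JOIN Player_Attributes AS a ON a.player_api_id = p.player_api_id
        GROUP BY p.player_api_id, p.player_name
    )
    SELECT player_name, best_rating,
           DENSE_RANK() OVER (ORDER BY best_rating DESC) AS rating_rank
    FROM best
    ORDER BY rating_rank
""").fetchall()
print(player_ranking)
```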

Players with some of the best attributes

Analysis of Results for players

As with team attributes, some attributes influence a player's rating more than others. At the same time, offensive players tend to get higher ratings than players in other positions because of their influence on their teams' victories. Other features also depend on the team's style of play.

Due to the limitations of SQLite, a more thorough analysis is difficult. An analysis similar to the one in the teams section would be repetitive and would require overly time-consuming queries because of the schema of this particular database.

4. Analysis of Results

Throughout the analysis of the football database, I have used SQLite features and techniques such as views, window functions, nested queries and common table expressions, among others. The results have been analyzed at the end of each section. What follows is a quick summary.

5. Limitations

6. Future Ideas