A Soccer Action Valuation Toolkit#

socceraction is a Python package for objectively quantifying the value of the individual actions performed by soccer players using event stream data. It contains the following components:

  • A set of API clients for loading event stream data from StatsBomb, Wyscout and Opta.

  • Converters from each of these providers’ proprietary data formats to the SPADL and atomic-SPADL formats, which are unified and expressive languages for on-the-ball player actions.

  • An implementation of the Expected Threat (xT) possession value framework.

  • An implementation of the VAEP and Atomic-VAEP possession value frameworks.

_images/actions_bra-bel.png

Quickstart#

Eager to get started valuing some soccer actions? This page gives a quick introduction on how to get started.

Installation#

First, make sure that socceraction is installed:

$ pip install "socceraction[statsbomb]"

For other installation options, check out our detailed installation instructions.

Loading event stream data#

First of all, you will need some data. Luckily, StatsBomb and Wyscout each provide a small, freely available dataset. The data module of socceraction makes it trivial to load these datasets as Pandas DataFrames. In this short introduction, we will work with StatsBomb’s dataset of the 2018 World Cup.

import pandas as pd
from socceraction.data.statsbomb import StatsBombLoader

# Set up the StatsBomb data loader
SBL = StatsBombLoader()

# View all available competitions
df_competitions = SBL.competitions()

# Create a dataframe with all games from the 2018 World Cup
df_games = SBL.games(competition_id=43, season_id=3).set_index("game_id")

Note

Keep in mind that by using the public StatsBomb data you are agreeing to their user agreement.

For each game, you can then retrieve a dataframe containing the teams, all players that participated, and all events that were recorded in that game. Specifically, we’ll load the data from the third place play-off game between England and Belgium.

game_id = 8657
df_teams = SBL.teams(game_id)
df_players = SBL.players(game_id)
df_events = SBL.events(game_id)

Converting to SPADL actions#

The event stream format is not well-suited for data analysis: some of the recorded information is irrelevant for valuing actions, each vendor uses its own custom format and definitions, and the events are stored as unstructured JSON objects. Therefore, socceraction uses the SPADL format for describing actions on the pitch. With the code below, you can convert the events to SPADL actions.

import socceraction.spadl as spadl

home_team_id = df_games.at[game_id, "home_team_id"]
df_actions = spadl.statsbomb.convert_to_actions(df_events, home_team_id)

With the matplotsoccer package, you can try plotting some of these actions:

import matplotsoccer as mps

# Select relevant actions
df_actions_goal = df_actions.loc[2196:2200]
# Replace result, actiontype and bodypart IDs by their corresponding name
df_actions_goal = spadl.add_names(df_actions_goal)
# Add team and player names
df_actions_goal = df_actions_goal.merge(df_teams).merge(df_players)
# Create the plot
mps.actions(
    location=df_actions_goal[["start_x", "start_y", "end_x", "end_y"]],
    action_type=df_actions_goal.type_name,
    team=df_actions_goal.team_name,
    result=df_actions_goal.result_name == "success",
    label=df_actions_goal[["time_seconds", "type_name", "player_name", "team_name"]],
    labeltitle=["time", "actiontype", "player", "team"],
    zoom=False
)
_images/eden_hazard_goal_spadl.png

Valuing actions#

We can now assign a numeric value to each of these individual actions that quantifies how much the action contributed towards winning the game. Socceraction implements three frameworks for doing this: xT, VAEP and Atomic-VAEP. In this quickstart guide, we will focus on the xT framework.

The expected threat or xT model overlays a \(M \times N\) grid on the pitch in order to divide it into zones. Each zone \(z\) is then assigned a value \(xT(z)\) that reflects how threatening teams are at that location, in terms of scoring. An example grid is visualized below.

_images/default_xt_grid.png
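For intuition, one common way to obtain such zone values is the formulation popularized by Karun Singh. In that formulation, a zone's value is the probability of shooting and scoring from it, plus the probability of moving the ball on and collecting the value of the destination zone. Because the definition is recursive, it can be solved by fixed-point iteration. The sketch below assumes that formulation and uses an invented three-zone toy example. It is not socceraction's own fitting code, which is covered in the detailed xT documentation.

```python
import numpy as np


def solve_xt(shot_prob, goal_prob, move_prob, transition, tol=1e-10, max_iter=1000):
    """Solve xT(z) = s(z) * g(z) + m(z) * sum_w T[z, w] * xT(w) by fixed-point iteration.

    All arrays are indexed by a flattened zone index; transition[z, w] is the
    probability that a move started in zone z ends successfully in zone w.
    Row sums below 1 model moves that lose possession.
    """
    xt = np.zeros(len(shot_prob))
    for _ in range(max_iter):
        new_xt = shot_prob * goal_prob + move_prob * (transition @ xt)
        if np.max(np.abs(new_xt - xt)) < tol:
            return new_xt
        xt = new_xt
    return xt


# Toy example with three zones ordered from own goal to opponent goal.
# All probabilities are invented for illustration.
shot_prob = np.array([0.00, 0.02, 0.20])  # s(z): P(shoot | ball in z)
goal_prob = np.array([0.00, 0.03, 0.12])  # g(z): P(goal | shot from z)
move_prob = 1.0 - shot_prob               # m(z): P(move | ball in z)
transition = np.array([                   # T[z, w]; rows sum to less than 1
    [0.50, 0.30, 0.05],
    [0.15, 0.50, 0.20],
    [0.05, 0.30, 0.45],
])
xt = solve_xt(shot_prob, goal_prob, move_prob, transition)
print(xt.round(4))
```

On a real pitch, the zones would be the cells of the M x N grid, flattened into a single index.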

The code below allows you to load league-wide xT values from the 2017-18 Premier League season (the 12x8 grid shown above). Instructions on how to train your own model can be found in the detailed documentation about xT.

import socceraction.xthreat as xthreat

url_grid = "https://karun.in/blog/data/open_xt_12x8_v1.json"
xT_model = xthreat.load_model(url_grid)

Subsequently, the model can be used to value actions that successfully move the ball between two zones. The value of such an action is the difference between the threat value of the zone where it ends and that of the zone where it starts. The xT framework does not assign a value to failed actions, shots, or defensive actions such as tackles.

df_actions_ltr = spadl.play_left_to_right(df_actions, home_team_id)
df_actions["xT_value"] = xT_model.rate(df_actions_ltr)
_images/eden_hazard_goal_xt.png
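The rating rule described above can be sketched with NumPy. This is a simplified illustration, not socceraction's rate() implementation. It makes several assumptions:

- a hypothetical `grid` array whose rows run along the width and whose columns run along the length of a 105 m x 68 m pitch;
- coordinates already oriented left to right (as with play_left_to_right() above);
- a boolean `successful_move` mask standing in for the checks on SPADL action types and results.

```python
import numpy as np

PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # SPADL pitch dimensions in metres


def zone_index(x, y, grid):
    """Map SPADL coordinates to (row, col) cells of a grid with rows along y and cols along x."""
    n_rows, n_cols = grid.shape
    col = np.clip((np.asarray(x, dtype=float) / PITCH_LENGTH * n_cols).astype(int), 0, n_cols - 1)
    row = np.clip((np.asarray(y, dtype=float) / PITCH_WIDTH * n_rows).astype(int), 0, n_rows - 1)
    return row, col


def rate_moves(start_x, start_y, end_x, end_y, successful_move, grid):
    """Value = xT of the end zone minus xT of the start zone; NaN for all other actions."""
    start_value = grid[zone_index(start_x, start_y, grid)]
    end_value = grid[zone_index(end_x, end_y, grid)]
    return np.where(successful_move, end_value - start_value, np.nan)


# Hypothetical 2 x 3 grid whose values grow towards the opponent's goal on the right.
grid = np.array([[0.01, 0.03, 0.10],
                 [0.01, 0.03, 0.10]])
values = rate_moves(
    start_x=[10.0, 40.0, 90.0], start_y=[20.0, 50.0, 34.0],
    end_x=[50.0, 95.0, 105.0], end_y=[20.0, 50.0, 34.0],
    successful_move=[True, True, False],  # the last action is, e.g., a shot
    grid=grid,
)
print(values)
```

Returning NaN instead of 0 for unrated actions keeps "not valued by xT" distinguishable from "valued, but neutral".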

Ready for more? Check out the detailed documentation about the data representation and action value frameworks.

Installation#

Before you can use socceraction, you’ll need to install it. This guide walks you through a minimal installation that is sufficient for working through the introduction.

Install Python#

Being a Python library, socceraction requires Python. Currently, socceraction supports Python version 3.9 – 3.11. Get the latest version of Python at https://www.python.org/downloads/ or with your operating system’s package manager.

You can verify that Python is installed by typing python from your shell; you should see something like:

Python 3.x.y
[GCC 4.x] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Install socceraction#

You’ve got two options to install socceraction.

Installing an official release with pip#

This is the recommended way to install socceraction. Simply run this command in your terminal of choice:

$ python -m pip install socceraction

You might have to install pip first. The easiest method is to use the standalone pip installer.

Installing the development version#

Socceraction is actively developed on GitHub, where the code is always available. You can easily install the development version with:

$ pip install git+https://github.com/ML-KULeuven/socceraction.git

However, if you want to modify the code, you should either clone the public repository:

$ git clone https://github.com/ML-KULeuven/socceraction.git

Or, download the zipball:

$ curl -OL https://github.com/ML-KULeuven/socceraction/archive/master.zip

Once you have a copy of the source, you can embed it in your own Python package, or install it into your site-packages easily:

$ cd socceraction
$ python -m pip install -e .

Verifying#

To verify that socceraction can be seen by Python, type python from your shell. Then at the Python prompt, try to import socceraction:

>>> import socceraction
>>> print(socceraction.__version__)

Loading data#

Socceraction provides API clients for various popular event stream data sources. These clients enable fetching event streams and their corresponding metadata as Pandas DataFrames using a unified data model. Alternatively, you can also use kloppy to load data.

Loading data with socceraction#

All API clients implemented in socceraction inherit from the EventDataLoader interface. This interface provides the following methods to retrieve data as Pandas DataFrames with a unified data model (i.e., a schema). The schema defines the minimal set of columns and their types that are returned by each method. Implementations of the EventDataLoader interface may add additional columns.

Method                            Output schema      Description
competitions()                    CompetitionSchema  All available competitions and seasons
games(competition_id, season_id)  GameSchema         All available games in a season
teams(game_id)                    TeamSchema         Both teams that participated in a game
players(game_id)                  PlayerSchema       All players that participated in a game
events(game_id)                   EventSchema        The event stream of a game
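To make this contract concrete, the sketch below implements the same five methods on top of in-memory DataFrames. InMemoryLoader is a hypothetical class. It does not subclass socceraction's EventDataLoader, so it runs without socceraction installed. Its columns are a small, assumed subset of the schemas above, and it reuses ids from the StatsBomb tables later on this page.

```python
import pandas as pd


class InMemoryLoader:
    """Hypothetical loader exposing the five EventDataLoader methods over DataFrames."""

    def __init__(self, games, teams, players, events):
        self._games, self._teams = games, teams
        self._players, self._events = players, events

    def competitions(self) -> pd.DataFrame:
        # One row per (competition, season) pair that has at least one game.
        cols = ["competition_id", "season_id"]
        return self._games[cols].drop_duplicates().reset_index(drop=True)

    def games(self, competition_id, season_id) -> pd.DataFrame:
        g = self._games
        mask = (g["competition_id"] == competition_id) & (g["season_id"] == season_id)
        return g[mask].reset_index(drop=True)

    def teams(self, game_id) -> pd.DataFrame:
        game = self._games.loc[self._games["game_id"] == game_id].iloc[0]
        team_ids = [game["home_team_id"], game["away_team_id"]]
        return self._teams[self._teams["team_id"].isin(team_ids)].reset_index(drop=True)

    def players(self, game_id) -> pd.DataFrame:
        return self._players[self._players["game_id"] == game_id].reset_index(drop=True)

    def events(self, game_id) -> pd.DataFrame:
        return self._events[self._events["game_id"] == game_id].reset_index(drop=True)


# Tiny dataset reusing ids from the StatsBomb tables on this page.
games = pd.DataFrame({
    "game_id": [8658, 8657], "competition_id": [43, 43], "season_id": [3, 3],
    "home_team_id": [771, 782], "away_team_id": [785, 768],
})
teams = pd.DataFrame({"team_id": [771, 785], "team_name": ["France", "Croatia"]})
players = pd.DataFrame({
    "game_id": [8658, 8658], "team_id": [771, 785], "player_id": [3009, 5463],
    "player_name": ["Kylian Mbappé Lottin", "Luka Modrić"],
})
events = pd.DataFrame({"game_id": [8658], "team_id": [771], "type_name": ["Starting XI"]})

loader = InMemoryLoader(games, teams, players, events)
print(loader.teams(game_id=8658))
```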

Currently, the following data providers are supported: StatsBomb, Wyscout, and Opta.

Loading StatsBomb data#

The StatsBombLoader class provides an API client enabling you to fetch StatsBomb event stream data as Pandas DataFrames. This document provides an overview of the available data sources and how to access them.

Setup#

To be able to load StatsBomb data, you’ll first need to install a few additional dependencies which are not included in the default installation of socceraction. You can install these additional dependencies by running:

$ pip install "socceraction[statsbomb]"
Connecting to a data store#

First, you have to create a StatsBombLoader object and configure it for the data store you want to use. The StatsBombLoader supports loading data from the StatsBomb Open Data repository, from the official StatsBomb API, and from local files.

Open Data repository#

StatsBomb has made event stream data of certain leagues freely available for public non-commercial use at https://github.com/statsbomb/open-data. This open data can be accessed without authentication, but its use is subject to a user agreement. The code below shows how to set up an API client that can fetch data from the repository.

# optional: suppress warning about missing authentication
import warnings
from statsbombpy.api_client import NoAuthWarning
warnings.simplefilter('ignore', NoAuthWarning)

from socceraction.data.statsbomb import StatsBombLoader

api = StatsBombLoader(getter="remote", creds=None)

Note

If you publish, share or distribute any research, analysis or insights based on this data, StatsBomb requires you to state the data source as StatsBomb and use their logo.

StatsBomb API#

API access is for paying customers only. Authentication can be done by setting environment variables named SB_USERNAME and SB_PASSWORD to your login credentials. Alternatively, the constructor accepts an argument creds to pass your login credentials in the format {"user": "", "passwd": ""}.

from socceraction.data.statsbomb import StatsBombLoader

# set authentication credentials as environment variables
import os
os.environ["SB_USERNAME"] = "your_username"
os.environ["SB_PASSWORD"] = "your_password"
api = StatsBombLoader(getter="remote")

# or provide authentication credentials as a dictionary
api = StatsBombLoader(getter="remote", creds={"user": "", "passwd": ""})
Local directory#

A final option is to load data from a local directory. The path to this directory is passed to the constructor through the root argument.

from socceraction.data.statsbomb import StatsBombLoader

api = StatsBombLoader(getter="local", root="data/statsbomb")

Note that the data should be organized in the same way as the StatsBomb Open Data repository, which corresponds to the following file hierarchy:

root
├── competitions.json
├── events
│   ├── <match_id>.json
│   └── ...
├── lineups
│   ├── <match_id>.json
│   └── ...
├── matches
│   ├── <competition_id>
│   │   ├── <season_id>.json
│   │   └── ...
│   └── ...
└── three-sixty
    ├── <match_id>.json
    └── ...
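Before pointing the loader at a local directory, it can help to check that the files for a match are where this hierarchy expects them. The helpers below are hypothetical convenience functions, not part of socceraction. They only build the paths implied by the tree above and report which of them are missing.

```python
from pathlib import Path


def expected_statsbomb_files(root, competition_id, season_id, match_id, with_360=False):
    """List the files the open-data layout above implies for a single match."""
    root = Path(root)
    paths = [
        root / "competitions.json",
        root / "matches" / str(competition_id) / f"{season_id}.json",
        root / "events" / f"{match_id}.json",
        root / "lineups" / f"{match_id}.json",
    ]
    if with_360:
        paths.append(root / "three-sixty" / f"{match_id}.json")
    return paths


def missing_statsbomb_files(root, competition_id, season_id, match_id, with_360=False):
    """Return the expected files that do not exist on disk (an empty list means the layout is OK)."""
    expected = expected_statsbomb_files(root, competition_id, season_id, match_id, with_360)
    return [path for path in expected if not path.exists()]


# Example: check the files needed for the 2018 World Cup third place play-off.
print(missing_statsbomb_files("data/statsbomb", competition_id=43, season_id=3, match_id=8657))
```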
Loading data#

Next, you can load the match event stream data and metadata by calling the corresponding methods on the StatsBombLoader object.

StatsBombLoader.competitions()#
df_competitions = api.competitions()

season_id  competition_id  competition_name   country_name   competition_gender  season_name
106        43              FIFA World Cup     International  male                2022
30         72              Women’s World Cup  International  female              2019
3          43              FIFA World Cup     International  male                2018

StatsBombLoader.games()#
df_games = api.games(competition_id=43, season_id=3)

game_id  season_id  competition_id  competition_stage  game_day  game_date            home_team_id  away_team_id  home_score  away_score  venue                     referee_id
8658     3          43              Final              7         2018-07-15 17:00:00  771           785           4           2           Stadion Luzhniki          730
8657     3          43              3rd Place Final    7         2018-07-14 16:00:00  782           768           2           0           Saint-Petersburg Stadium  741

StatsBombLoader.teams()#
df_teams = api.teams(game_id=8658)

team_id  team_name
771      France
785      Croatia

StatsBombLoader.players()#
df_players = api.players(game_id=8658)

game_id  team_id  player_id  player_name           nickname       jersey_number  is_starter  starting_position_id  starting_position_name  minutes_played
8658     771      3009       Kylian Mbappé Lottin  Kylian Mbappé  10             True        12                    Right Midfield          95
8658     785      5463       Luka Modrić                          10             True        13                    Right Center Midfield   95

StatsBombLoader.events()#
df_events = api.events(game_id=8658)

column                event 1                               event 2                               event 3
event_id              47638847-fd43-4656-b49c-cff64e5cfc0a  0c04305d-5615-4520-9be5-7c232829954b  c5e17439-efe2-480b-9cff-1600998674d7
index                 1                                     2                                     3
period_id             1                                     1                                     1
timestamp             1900-01-01                            1900-01-01                            1900-01-01
minute                0                                     0                                     0
second                0                                     0                                     0
type_id               35                                    35                                    18
type_name             Starting XI                           Starting XI                           Half Start
possession            1                                     1                                     1
possession_team_id    771                                   771                                   771
possession_team_name  France                                France                                France
play_pattern_id       1                                     1                                     1
play_pattern_name     Regular Play                          Regular Play                          Regular Play
team_id               771                                   785                                   771
team_name             France                                Croatia                               France
duration              0.0                                   1.412                                 0.0
extra                 {…}                                   {…}                                   {}
related_events        []                                    []                                    ['7e1460eb-c572-4059-8cd4-cec4857f818d']
player_id
player_name
position_id
position_name
location
under_pressure        False                                 False                                 False
counterpress          False                                 False                                 False
game_id               8658                                  8658                                  8658

If 360 data snapshots are available for the game, they can be loaded by passing load_360=True to the events() method. This will add two columns to the events dataframe: visible_area_360 and freeze_frame_360. The former contains the visible area of the pitch in the 360 snapshot, while the latter contains the player locations in the 360 snapshot.

df_events = api.events(game_id=3788741, load_360=True)

Loading Wyscout data#

The WyscoutLoader class provides an API client enabling you to fetch Wyscout event stream data as Pandas DataFrames. This document provides an overview of the available data sources and how to access them.

Note

Currently, only version 2 of the Wyscout API is supported. See https://github.com/ML-KULeuven/socceraction/issues/156 for progress on version 3 support.

Connecting to a data store#

First, you have to create a WyscoutLoader object and configure it for the data store you want to use. The WyscoutLoader supports loading data from the official Wyscout API and from local files. Additionally, the PublicWyscoutLoader class can be used to load a publicly available dataset.

Wyscout API#

Wyscout API access requires a separate subscription. Wyscout currently offers three different packs: a Database Pack (match sheet data), a Stats Pack (statistics derived from match event data), and an Events Pack (raw match event data). A subscription to the Events Pack is required to access the event stream data.

Authentication can be done by setting environment variables named WY_USERNAME and WY_PASSWORD to your login credentials (i.e., client id and secret). Alternatively, the constructor accepts an argument creds to pass your login credentials in the format {"user": "", "passwd": ""}.

from socceraction.data.wyscout import WyscoutLoader

# set authentication credentials as environment variables
import os
os.environ["WY_USERNAME"] = "your_client_id"
os.environ["WY_PASSWORD"] = "your_secret"
api = WyscoutLoader(getter="remote")

# or provide authentication credentials as a dictionary
api = WyscoutLoader(getter="remote", creds={"user": "", "passwd": ""})
Local directory#

Data can also be loaded from a local directory. The path to this directory is passed to the constructor through the root argument.

from socceraction.data.wyscout import WyscoutLoader

api = WyscoutLoader(getter="local", root="data/wyscout")

The loader uses the directory structure and file names to determine which files should be parsed to retrieve the requested data. Therefore, the local directory should follow a predefined file hierarchy. By default, the loader expects the following file hierarchy:

root
├── competitions.json
├── seasons_<competition_id>.json
├── matches_<season_id>.json
└── matches
    ├── events_<game_id>.json
    └── ...

If your local directory has a different file hierarchy, you can specify this custom hierarchy by passing the feeds argument to the constructor. A wide range of file names and directory structures are supported. However, the competition, season, and game identifiers must be included in the file names to be able to locate the corresponding files for each entity.

from socceraction.data.wyscout import WyscoutLoader

api = WyscoutLoader(getter="local", root="data/wyscout", feeds={
    "competitions": "competitions.json",
    "seasons": "seasons_{competition_id}.json",
    "games": "matches_{season_id}.json",
    "events": "matches/events_{game_id}.json",
})

The {competition_id}, {season_id}, and {game_id} placeholders will be replaced by the corresponding id values when data is retrieved.
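Conceptually, this substitution behaves like Python's str.format(). The snippet below only illustrates how the templates above resolve to concrete paths. The id values are arbitrary examples, and the loader's internal code may differ.

```python
# Feed templates from the example above.
feeds = {
    "competitions": "competitions.json",
    "seasons": "seasons_{competition_id}.json",
    "games": "matches_{season_id}.json",
    "events": "matches/events_{game_id}.json",
}

# Arbitrary example ids; str.format ignores keyword arguments a template does not use.
ids = {"competition_id": 10, "season_id": 20, "game_id": 30}
paths = {feed: template.format(**ids) for feed, template in feeds.items()}
print(paths["events"])  # matches/events_30.json
```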

Soccer logs dataset#

As part of the “A public data set of spatio-temporal match events in soccer competitions” paper, Wyscout made an event stream dataset available for research purposes. The dataset covers the 2017/18 season of the Spanish, Italian, English, German, and French first divisions. In addition, it includes the data of the 2018 World Cup and the 2016 European Championship. The dataset is available at https://figshare.com/collections/Soccer_match_event_dataset/4415000/2.

As the format of this dataset is slightly different from the format of the official Wyscout API, a separate PublicWyscoutLoader class is provided to load this dataset. This loader will download the dataset once and extract it to the specified root directory.

from socceraction.data.wyscout import PublicWyscoutLoader

api = PublicWyscoutLoader(root="data/wyscout")
Loading data#

Next, you can load the match event stream data and metadata by calling the corresponding methods on the WyscoutLoader object.

Loading Opta data#

Opta’s event stream data comes in many different flavours. The OptaLoader class provides an API client enabling you to fetch data from the following data feeds as Pandas DataFrames:

  • Opta F1, F9 and F24 JSON feeds

  • Opta F7 and F24 XML feeds

  • StatsPerform MA1 and MA3 JSON feeds

  • WhoScored.com JSON data

Currently, only loading data from local files is supported.

Connecting to a data store#

First, you have to create an OptaLoader object and configure it for the data feeds you want to use.

Generic setup#

To set up an OptaLoader, you have to specify the root directory, the filename hierarchy of the feeds, and a parser for each feed. For example:

from socceraction.data.opta import OptaLoader, parsers

api = OptaLoader(
    root="data/opta",
    feeds={
        "f7": "f7-{competition_id}-{season_id}-{game_id}.xml",
        "f24": "f24-{competition_id}-{season_id}-{game_id}.xml",
    },
    parser={
        "f7": parsers.F7XMLParser,
        "f24": parsers.F24XMLParser,
    },
)

Since the loader uses the directory structure and file names to determine which files should be parsed, the root directory should have a predefined file hierarchy defined in the feeds argument. A wide range of file names and directory structures are supported. However, the competition, season, and game identifiers must be included in the file names to be able to locate the corresponding files for each entity. For example, you might have grouped feeds by competition and season as follows:

root
├── competition_<competition_id>
│   ├── season_<season_id>
│   │   ├── f7_<game_id>.xml
│   │   └── f24_<game_id>.xml
│   └── ...
└── ...

In this case, you can use the following feeds configuration:

feeds = {
    "f7": "competition_{competition_id}/season_{season_id}/f7_{game_id}.xml",
    "f24": "competition_{competition_id}/season_{season_id}/f24_{game_id}.xml",
}
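Because the identifiers are embedded in the file names, a template can also be turned into a pattern that recovers them from a path. The sketch below shows one way to do this with a regular expression. template_to_regex and parse_ids are hypothetical helpers, not the OptaLoader's internal implementation. They assume forward slashes, integer ids, and that each placeholder appears at most once.

```python
import re


def template_to_regex(template):
    """Turn e.g. 'f7_{game_id}.xml' into a regex with one named group per placeholder."""
    # With a capturing group, re.split alternates literal text (even indices)
    # and placeholder names (odd indices).
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>\\d+)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def parse_ids(template, path):
    """Return the ids encoded in `path`, or None if it does not match the template."""
    match = template_to_regex(template).fullmatch(path)
    return {key: int(value) for key, value in match.groupdict().items()} if match else None


feeds = {
    "f7": "competition_{competition_id}/season_{season_id}/f7_{game_id}.xml",
    "f24": "competition_{competition_id}/season_{season_id}/f24_{game_id}.xml",
}
print(parse_ids(feeds["f24"], "competition_8/season_2017/f24_12345.xml"))
# {'competition_id': 8, 'season_id': 2017, 'game_id': 12345}
```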

Note

On Windows, the backslash character should be used as a path separator.

Furthermore, a few standard configurations are provided. These are listed below.

Opta F7 and F24 XML feeds#
from socceraction.data.opta import OptaLoader

api = OptaLoader(root="data/opta", parser="xml")

The root directory should have the following structure:

root
├── f7-{competition_id}-{season_id}.xml
├── f24-{competition_id}-{season_id}-{game_id}.xml
└── ...
Opta F1, F9 and F24 JSON feeds#
from socceraction.data.opta import OptaLoader

api = OptaLoader(root="data/opta", parser="json")

The root directory should have the following structure:

root
├── f1-{competition_id}-{season_id}.json
├── f9-{competition_id}-{season_id}.json
├── f24-{competition_id}-{season_id}-{game_id}.json
└── ...
StatsPerform MA1 and MA3 JSON feeds#
from socceraction.data.opta import OptaLoader

api = OptaLoader(root="data/statsperform", parser="statsperform")

The root directory should have the following structure:

root
├── ma1-{competition_id}-{season_id}.json
├── ma3-{competition_id}-{season_id}-{game_id}.json
└── ...
WhoScored#

WhoScored.com is a popular website that provides detailed live match statistics. These statistics are compiled from Opta’s event feed, which can be scraped from the website’s source code using a library such as soccerdata. Once you have downloaded the raw JSON data, you can parse it using the OptaLoader with:

from socceraction.data.opta import OptaLoader

api = OptaLoader(root="data/whoscored", parser="whoscored")

The root directory should have the following structure:

root
├── {competition_id}-{season_id}-{game_id}.json
└── ...

Alternatively, the soccerdata library provides a wrapper that directly returns an OptaLoader object for a scraped dataset.

import soccerdata as sd

# Set up a scraper for the 2021/2022 Premier League season
ws = sd.WhoScored(leagues="ENG-Premier League", seasons=2021)
# Scrape all games and return an OptaLoader object
api = ws.read_events(output_fmt='loader')

Warning

Scraping data from WhoScored.com violates their terms of service. Legally, scraping this data is therefore a grey area. If you decide to use this data anyway, this is your own responsibility.

Loading data#

Next, you can load the match event stream data and metadata by calling the corresponding methods on the OptaLoader object.

Loading data with kloppy#

Like socceraction, kloppy implements a unified data model for soccer data. The main differences between kloppy and socceraction are: (1) kloppy supports more data sources (including tracking data), (2) kloppy uses a more flexible object-based data model, in contrast to socceraction’s dataframe-based model, and (3) kloppy covers a more complete set of events, while socceraction focuses on on-the-ball events. Thus, we recommend using kloppy if you want to load data from a source that is not supported by socceraction, or when your analysis is not limited to on-the-ball events.

The following code snippet shows how to load data from StatsBomb using kloppy:

from kloppy import statsbomb

dataset = statsbomb.load_open_data(match_id=8657)

Instructions for loading data from other sources can be found in the kloppy documentation.

You can then convert the data to the SPADL format using the convert_to_actions() function:

from socceraction.spadl.kloppy import convert_to_actions

spadl_actions = convert_to_actions(dataset, game_id=8657)

Note

Currently, the data model of kloppy is only complete for StatsBomb data. If you use kloppy to load data from other sources and convert it to the SPADL format, you may lose some information.

Data representation#

Socceraction uses a tabular, action-oriented data format, as opposed to the event-oriented formats of commercial vendors. The distinction is that actions are the subset of events that are performed by a player. For example, a pass is an action, whereas an event signifying the end of the game is not. Unlike the vendors’ event stream formats, we always store the same attributes for each action. Leaving out optional information snippets allows us to store the data in a table and to apply automatic analysis tools more easily.

Socceraction implements two versions of this action-oriented data format: SPADL and Atomic-SPADL.

SPADL#

Definitions#

SPADL (Soccer Player Action Description Language) represents a game as a sequence of on-the-ball actions \([a_1, a_2, \ldots, a_m]\), where \(m\) is the total number of actions that happened in the game. Each action is a tuple of the same twelve attributes:

Attribute    Description
game_id      the ID of the game in which the action was performed
period_id    the ID of the game period in which the action was performed
seconds      the action’s start time
player       the player who performed the action
team         the player’s team
start_x      the x location where the action started
start_y      the y location where the action started
end_x        the x location where the action ended
end_y        the y location where the action ended
action_type  the type of the action (e.g., pass, shot, dribble)
result       the result of the action (e.g., success or fail)
bodypart     the player’s body part used for the action

Start and End Locations

SPADL uses a standardized coordinate system with the origin at the bottom left of the pitch and a uniform field of 105 m x 68 m. For the direction of play, SPADL uses the “home team attacks to the right” convention. This can be converted with the play_left_to_right() function so that lower x-coordinates always represent the own half of the team performing the action.

_images/spadl_coordinates.png
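The direction-of-play conversion essentially mirrors every action of the away team through the centre of the pitch, so that both teams attack from left to right. The sketch below illustrates this idea with plain pandas. flip_away_team is a hypothetical stand-in for play_left_to_right(). It assumes a DataFrame with team_id, start_x, start_y, end_x and end_y columns on the 105 m x 68 m pitch.

```python
import pandas as pd

FIELD_LENGTH, FIELD_WIDTH = 105.0, 68.0  # SPADL pitch dimensions in metres


def flip_away_team(actions: pd.DataFrame, home_team_id) -> pd.DataFrame:
    """Mirror the away team's actions so that both teams attack left to right."""
    flipped = actions.copy()  # leave the caller's DataFrame untouched
    away = flipped["team_id"] != home_team_id
    for col in ("start_x", "end_x"):
        flipped.loc[away, col] = FIELD_LENGTH - flipped.loc[away, col]
    for col in ("start_y", "end_y"):
        flipped.loc[away, col] = FIELD_WIDTH - flipped.loc[away, col]
    return flipped


# Hypothetical actions: team 1 is the home team, team 2 the away team.
actions = pd.DataFrame({
    "team_id": [1, 2],
    "start_x": [10.0, 100.0], "start_y": [20.0, 10.0],
    "end_x": [30.0, 90.0], "end_y": [20.0, 30.0],
})
print(flip_away_team(actions, home_team_id=1))
```

Mirroring both axes (rather than only x) amounts to a 180-degree rotation, which preserves each team's notion of left and right wing.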
Action Type

The action type attribute can have 22 possible values. These are pass, cross, throw-in, crossed free kick, short free kick, crossed corner, short corner, take-on, foul, tackle, interception, shot, penalty shot, free kick shot, keeper save, keeper claim, keeper punch, keeper pick-up, clearance, bad touch, dribble and goal kick. A detailed definition of each action type is available here.

Result

The result attribute can either have the value success, to indicate that an action achieved its intended result, or the value fail, if this was not the case. An example of a successful action is a pass which reaches a teammate. An example of an unsuccessful action is a pass which goes over the sideline. Some action types can have special results. These are offside (for passes, corners and free kicks), own goal (for shots), and yellow card and red card (for fouls).

Body Part

The body part attribute can have 4 possible values: foot, head, other and none. For Wyscout, which does not distinguish between the head and other body parts, a special body part head/other is used.

All actions, except for some dribbles, are derived from an event in the original event stream data. They can be linked back to the original data by the original_event_id attribute. Synthetic dribbles are added to fill gaps between two events. These synthetic dribbles do not have an original_event_id.
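The idea behind synthetic dribbles can be sketched as follows. When an action ends somewhere other than where the next action by the same team starts, the gap is bridged by a dribble. The thresholds below (at least 3 m, at most 60 m, and less than 10 s between the two actions) are assumptions for illustration, and the function is hypothetical. Consult the converter source in socceraction for the exact rules it applies.

```python
import pandas as pd

# Assumed thresholds for illustration only.
MIN_DRIBBLE_LENGTH = 3.0     # metres
MAX_DRIBBLE_LENGTH = 60.0    # metres
MAX_DRIBBLE_DURATION = 10.0  # seconds


def synthetic_dribbles(actions: pd.DataFrame) -> pd.DataFrame:
    """Create a dribble for each gap between consecutive actions of the same team."""
    nxt = actions.shift(-1)  # the next action, aligned with the current row
    gap = ((actions.end_x - nxt.start_x) ** 2 + (actions.end_y - nxt.start_y) ** 2) ** 0.5
    dt = nxt.time_seconds - actions.time_seconds
    mask = (
        (actions.team_id == nxt.team_id)
        & (actions.period_id == nxt.period_id)
        & (gap >= MIN_DRIBBLE_LENGTH)
        & (gap <= MAX_DRIBBLE_LENGTH)
        & (dt < MAX_DRIBBLE_DURATION)
    )
    prev, nxt = actions[mask], nxt[mask]
    dribbles = pd.DataFrame({
        "period_id": prev.period_id,
        "time_seconds": (prev.time_seconds + nxt.time_seconds) / 2,
        "team_id": prev.team_id,
        # the player who performs the next action is assumed to carry the ball
        "player_id": nxt.player_id.astype(actions.player_id.dtype),
        "start_x": prev.end_x, "start_y": prev.end_y,
        "end_x": nxt.start_x, "end_y": nxt.start_y,
        "type_name": "dribble",
    })
    dribbles["original_event_id"] = None  # synthetic: no source event
    return dribbles.reset_index(drop=True)


# Two passes by hypothetical players 10 and 20 of team 1, with a gap in between.
actions = pd.DataFrame({
    "period_id": [2, 2], "time_seconds": [2179.0, 2184.0],
    "team_id": [1, 1], "player_id": [10, 20],
    "start_x": [37.1, 70.6], "start_y": [44.8, 42.2],
    "end_x": [53.8, 87.4], "end_y": [48.2, 49.1],
})
dribbles = synthetic_dribbles(actions)
print(dribbles)
```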

Example#

Socceraction currently implements converters for StatsBomb, Wyscout, and Opta event stream data. We’ll use StatsBomb data to illustrate the API, but the API of the other converters is identical.

First, we load the event stream data of the third place play-off in the 2018 FIFA World Cup between Belgium and England.

from socceraction.data.statsbomb import StatsBombLoader

SBL = StatsBombLoader()
df_events = SBL.events(game_id=8657)

These events can now be converted to SPADL using the convert_to_actions() function of the StatsBomb converter.

import socceraction.spadl as spadl

df_actions = spadl.statsbomb.convert_to_actions(df_events, home_team_id=782)  # home team of game 8657

The obtained dataframe represents the body part, result, action type, players and teams with numeric IDs. The code below adds their corresponding names.

df_actions = (
  spadl
  .add_names(df_actions)  # add action type, result and body part names
  .merge(SBL.teams(game_id=8657))  # add team names
  .merge(SBL.players(game_id=8657))  # add player names
)

Below are the five actions in the SPADL format leading up to Belgium’s second goal.

game_id  period_id  seconds  team     player     start_x  start_y  end_x  end_y  actiontype  result   bodypart
8657     2          2179     Belgium  Witsel     37.1     44.8     53.8   48.2   pass        success  foot
8657     2          2181     Belgium  De Bruyne  53.8     48.2     70.6   42.2   dribble     success  foot
8657     2          2184     Belgium  De Bruyne  70.6     42.2     87.4   49.1   pass        success  foot
8657     2          2185     Belgium  Hazard     87.4     49.1     97.9   38.7   dribble     success  foot
8657     2          2187     Belgium  Hazard     97.9     38.7     105    37.4   shot        success  foot

Here is the same phase visualized using the matplotsoccer package:

_images/eden_hazard_goal_spadl.png

See also

This notebook gives an example of the complete pipeline to download public StatsBomb data and convert it to the SPADL format.

Atomic-SPADL#

Definitions#

Atomic-SPADL is an alternative version of SPADL which removes the result attribute from SPADL and adds a few new action types. Each action is now a tuple of the following eleven attributes:

Attribute    Description
game_id      the ID of the game in which the action was performed
period_id    the ID of the game period in which the action was performed
seconds      the action’s start time
player       the player who performed the action
team         the player’s team
x            the x location where the action started
y            the y location where the action started
dx           the distance covered by the action along the x-axis
dy           the distance covered by the action along the y-axis
action_type  the type of the action (e.g., pass, shot, dribble)
bodypart     the player’s body part used for the action

In this representation, all actions are atomic in the sense that they are always completed successfully without interruption. Consequently, while SPADL treats a pass as one action consisting of both the initiation and the receival of the pass, Atomic-SPADL sees giving and receiving a pass as two separate actions. Because not all passes successfully reach a teammate, Atomic-SPADL introduces an interception action if the ball was intercepted by the other team, or an out action if the ball went out of play. Atomic-SPADL similarly divides shots, free kicks, and corners into two separate actions. In practice, this representation helps to distinguish the contribution of the player who initiates the action (e.g., gives the pass) from that of the player who completes it (e.g., receives the pass).
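The split of a single SPADL pass into atomic actions can be illustrated with a simplified sketch. It is not convert_to_atomic(): it handles one pass only. The follow-up action names (receival, interception, out) come from the text above. The dictionary layout, the `next_player` key and the `ball_out` flag are hypothetical.

```python
def split_pass(p: dict, ball_out: bool = False) -> list:
    """Split one SPADL pass into an atomic pass followed by a receival, interception or out."""
    atomic_pass = {
        "player": p["player"], "type": "pass",
        "x": p["start_x"], "y": p["start_y"],
        # Atomic-SPADL stores displacements instead of end locations.
        "dx": p["end_x"] - p["start_x"], "dy": p["end_y"] - p["start_y"],
    }
    if p["result"] == "success":
        follow_up, player = "receival", p.get("next_player")      # a teammate controls the ball
    elif ball_out:
        follow_up, player = "out", None                           # the ball left the pitch
    else:
        follow_up, player = "interception", p.get("next_player")  # an opponent won the ball
    # The follow-up action happens where the pass ended and covers no distance.
    second = {"player": player, "type": follow_up,
              "x": p["end_x"], "y": p["end_y"], "dx": 0.0, "dy": 0.0}
    return [atomic_pass, second]


# De Bruyne's pass to Hazard from the SPADL example.
de_bruyne_pass = {"player": "De Bruyne", "next_player": "Hazard", "result": "success",
                  "start_x": 70.6, "start_y": 42.2, "end_x": 87.4, "end_y": 49.1}
for action in split_pass(de_bruyne_pass):
    print(action)
```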

Example#

SPADL actions can be converted to their atomic version with the convert_to_atomic() function.

import socceraction.atomic.spadl as atomicspadl

df_atomic_actions = atomicspadl.convert_to_atomic(df_actions)

This is what Belgium’s second goal against England in the third place play-off of the 2018 FIFA World Cup looks like in the Atomic-SPADL format.

game_id  period_id  seconds  team     player     x      y     dx    dy     actiontype  bodypart
8657     2          2179     Belgium  Witsel     37.1   44.8  0.0   0.0    dribble     foot
8657     2          2179     Belgium  Witsel     37.1   44.8  16.8  3.4    pass        foot
8657     2          2180     Belgium  De Bruyne  53.8   48.2  0.0   0.0    receival    foot
8657     2          2181     Belgium  De Bruyne  53.8   48.2  16.8  -6.0   dribble     foot
8657     2          2184     Belgium  De Bruyne  70.6   42.2  16.8  6.9    pass        foot
8657     2          2184     Belgium  Hazard     87.4   49.1  0.0   0.0    receival    foot
8657     2          2185     Belgium  Hazard     87.4   49.1  10.6  -10.3  dribble     foot
8657     2          2187     Belgium  Hazard     97.9   38.7  7.1   -1.4   shot        foot
8657     2          2187     Belgium  Hazard     105.0  37.4  0.0   0.0    goal        foot

_images/eden_hazard_goal_atomicspadl.png

See also

This notebook gives an example of the complete pipeline to download public StatsBomb data and convert it to the Atomic-SPADL format.

Valuing actions#

Once you’ve collected the data and converted it to the SPADL format, you can start valuing the contributions of soccer players. This document gives a general introduction to action-valuing frameworks and links to a detailed discussion of the three implemented frameworks.

General idea#

When considering event stream data, a soccer match can be viewed as a sequence of \(n\) consecutive on-the-ball actions \(\left[a_1, a_2, \ldots, a_n\right]\) (e.g., [pass, dribble,…, interception]). Action-valuing frameworks aim to assign a numeric value to each of these individual actions that quantifies how much the action contributed towards winning the game. This value should reflect both the circumstances under which the action was performed and its longer-term effects. This is illustrated in the figure below:

a sequence of actions with action values

However, rather than directly assigning values to actions, the existing approaches all start by assigning values to game states. To illustrate the underlying intuition, consider the pass below:

example action

The effect of the pass was to change the game state:

example action changes gamestate

The figure on the left shows the game in state \(S_{i-1} = \{a_1, \ldots, a_{i-1}\}\), right before Benzema passes to Valverde, and the one on the right shows the game in state \(S_i = \{a_1, \ldots, a_{i-1}, a_i\}\) just after Valverde successfully controlled the pass.

Consequently, a natural way to assess the usefulness of an action is to assign a value to each game state. An action’s usefulness is then simply the difference between the value of the post-action game state \(S_i\) and the value of the pre-action game state \(S_{i-1}\). This can be expressed as:

\[U(a_i) = V(S_i) - V(S_{i-1}),\]

where \(V\) captures the value of a particular game state.
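As a small worked example of this formula, the sketch below turns a sequence of game-state values \(V(S_0), V(S_1), \ldots, V(S_n)\) into action values. The numbers are invented for illustration. How \(V\) is obtained is left open here, since that is exactly where the frameworks below differ.

```python
def action_values(state_values):
    """Return [U(a_1), ..., U(a_n)] where U(a_i) = V(S_i) - V(S_{i-1}).

    state_values[0] is V(S_0), the value of the state before the first action.
    """
    return [after - before for before, after in zip(state_values, state_values[1:])]


# Invented state values for a sequence of four actions (five game states).
values = action_values([0.25, 0.5, 0.75, 0.5, 1.0])
print(values)  # [0.25, 0.25, -0.25, 0.5]
```

The action values telescope: their sum always equals \(V(S_n) - V(S_0)\), here \(1.0 - 0.25 = 0.75\).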

The differences between action-valuing frameworks arise in (1) how they represent a game state \(S_i\), that is, which features (such as the ball’s location or the score difference) they use to capture the relevant aspects of the game at a specific point in time; and (2) how they assign a value \(V\) to a specific game state.

Implemented frameworks#

The socceraction package implements three frameworks to assess the impact of the individual actions performed by soccer players: Expected Threat (xT), VAEP and Atomic-VAEP.

Expected Threat (xT)#

The expected threat or xT model is a possession-based model. That is, it divides matches into possessions, which are periods of the game where the same team has control of the ball. The key insights underlying xT are that (1) players perform actions with the intention to increase their team’s chance of scoring, and (2) the chance of scoring can be adequately captured by only considering the location of the ball.

Point (2) means that xT represents a game state solely by the current location of the ball. Therefore, xT overlays an \(M \times N\) grid on the pitch to divide it into zones. Each zone \(z\) is then assigned a value \(xT(z)\) that reflects how threatening teams are at that location, in terms of scoring. These xT values are illustrated in the figure below.

_images/default_xt_grid.png
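To see how such a grid is used, the sketch below maps a ball location to its zone. It then values a successful ball-progressing action as the xT of the zone where the action ends minus the xT of the zone where it starts, which is how moves are rated in Karun Singh’s formulation of xT. The sketch makes several assumptions. The pitch is 105 m × 68 m with the origin in a corner, x runs along the length and y along the width, and the NumPy array holding \(xT(z)\) has shape (w, l). The helpers zone and rate_move are hypothetical and do not reproduce socceraction’s internal grid orientation.

```python
import numpy as np

FIELD_LENGTH = 105.0  # assumed pitch length in meters (x-axis)
FIELD_WIDTH = 68.0    # assumed pitch width in meters (y-axis)


def zone(x, y, l=16, w=12):
    """Return the (row, col) index of the grid cell that contains (x, y)."""
    # Clamp so that points on the far touchline or goal line fall in the last cell.
    col = min(max(int(x / FIELD_LENGTH * l), 0), l - 1)
    row = min(max(int(y / FIELD_WIDTH * w), 0), w - 1)
    return row, col


def rate_move(grid, start, end):
    """xT value of a successful move: xT(end zone) - xT(start zone)."""
    w, l = grid.shape
    return grid[zone(*end, l=l, w=w)] - grid[zone(*start, l=l, w=w)]


# Toy surface (not a learned one) where threat only grows with the column index.
toy_grid = np.tile(np.arange(16, dtype=float), (12, 1)) / 100
print(rate_move(toy_grid, start=(10.0, 34.0), end=(60.0, 34.0)))
```

Because only the ball’s start and end zones matter, two passes between the same zones receive the same value, whoever plays them.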

The value of each zone can be learned with a Markov decision process. The corresponding code is shown below. For an intuitive explanation of how this works, we refer to Karun Singh’s blog post.

import pandas as pd
from socceraction.data.statsbomb import StatsBombLoader
import socceraction.spadl as spadl
import socceraction.xthreat as xthreat

# 1. Load a set of actions to train the model on
SBL = StatsBombLoader()
df_games = SBL.games(competition_id=43, season_id=3)
dataset = [
    {
        **game,
        'actions': spadl.statsbomb.convert_to_actions(
            events=SBL.events(game['game_id']),
            home_team_id=game['home_team_id']
        )
    }
    for game in df_games.to_dict(orient='records')
]

# 2. Convert direction of play + add names
df_actions_ltr = pd.concat([
  spadl.play_left_to_right(game['actions'], game['home_team_id'])
  for game in dataset
])
df_actions_ltr = spadl.add_names(df_actions_ltr)

# 3. Train xT model with 16 x 12 grid
xTModel = xthreat.ExpectedThreat(l=16, w=12)
xTModel.fit(df_actions_ltr)

# 4. Rate ball-progressing actions
# xT should only be used to value actions that move the ball
# and that keep the current team in possession of the ball
df_mov_actions = xthreat.get_successful_move_actions(df_actions_ltr)
df_mov_actions["xT_value"] = xTModel.rate(df_mov_actions)

See also

This notebook gives an example of the complete pipeline to train and apply an xT model.
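Under the hood, the xT values are the solution of a recursive equation. In Karun Singh’s formulation, the threat of zone \(z\) is \(xT(z) = s_z g_z + m_z \sum_{z'} T_{z \to z'}\, xT(z')\). Here \(s_z\) and \(m_z\) are the probabilities that a player in zone \(z\) shoots or moves the ball, \(g_z\) is the probability that a shot from \(z\) is scored, and \(T_{z \to z'}\) is the probability that a move from \(z\) ends in \(z'\).

The minimal sketch below solves this equation by repeated substitution (value iteration) over flattened zones. The function solve_xt, its arguments and its stopping rule are assumptions for illustration. socceraction computes the same ingredients with helpers such as scoring_prob(), action_prob() and move_transition_matrix(), but their exact interface is not reproduced here.

```python
import numpy as np


def solve_xt(shot_prob, move_prob, goal_prob, transition, eps=1e-5, max_iter=100):
    """Solve xT(z) = s_z * g_z + m_z * sum_z' T[z, z'] * xT(z') by iteration.

    shot_prob, move_prob, goal_prob: shape (n_zones,)
    transition: shape (n_zones, n_zones); row z holds the probabilities that a
    move starting in zone z ends in each zone z'.
    """
    s = np.asarray(shot_prob, dtype=float)
    m = np.asarray(move_prob, dtype=float)
    g = np.asarray(goal_prob, dtype=float)
    T = np.asarray(transition, dtype=float)

    xt = np.zeros_like(g)
    for _ in range(max_iter):
        # Payoff of shooting now plus expected payoff of moving the ball on.
        new_xt = s * g + m * (T @ xt)
        converged = np.max(np.abs(new_xt - xt)) < eps
        xt = new_xt
        if converged:
            break
    return xt


# Two-zone toy example: zone 0 always moves the ball into zone 1; zone 1 shoots
# half of the time (scoring 20% of those shots) and otherwise stays in zone 1.
xt = solve_xt(
    shot_prob=[0.0, 0.5],
    move_prob=[1.0, 0.5],
    goal_prob=[0.0, 0.2],
    transition=[[0.0, 1.0], [0.0, 1.0]],
)
print(xt)  # close to [0.2, 0.2]
```

Analytically, zone 1 satisfies \(xT = 0.5 \cdot 0.2 + 0.5 \cdot xT\), so \(xT = 0.2\). Zone 0 inherits that value because it always moves the ball into zone 1.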

VAEP#

VAEP (Valuing Actions by Estimating Probabilities) is based on the insight that players tend to perform actions with two possible intentions:

  1. increase the chance of scoring a goal in the short-term future and/or,

  2. decrease the chance of conceding a goal in the short-term future.

Valuing an action then requires assessing the change in probability for both scoring and conceding as a result of an action. Thus, VAEP values a game state as:

\[V(S_i) = P_{score}(S_i, t) - P_{concede}(S_i, t),\]

where \(P_{score}(S_i, t)\) and \(P_{concede}(S_i, t)\) are the probabilities that team \(t\), which possesses the ball in state \(S_i\), will score or concede, respectively, within the next \(k\) actions (by default, \(k = 10\)).

The remaining challenge is to “learn” \(P_{score}(S_i, t)\) and \(P_{concede}(S_i, t)\). To this end, a gradient boosted binary classifier is trained on historical data to predict how a game state will turn out, based on what happened in similar game states that arose in past games. VAEP also uses a richer representation of the game state: it considers the last three actions that happened during the game, \(S_i = \{a_{i-2}, a_{i-1}, a_i\}\). With the code below, you can convert the SPADL actions of a game to these game states:

import socceraction.vaep.features as fs

# 1. convert actions to game states
gamestates = fs.gamestates(actions, 3)
gamestates = fs.play_left_to_right(gamestates, home_team_id)
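Conceptually, a game state pairs every action with the actions that preceded it. The toy version below shows that idea with pandas. It is not socceraction’s implementation of fs.gamestates(). The function name toy_gamestates and the choice to repeat the first action when fewer than three earlier actions exist are assumptions made for this example.

```python
import pandas as pd


def toy_gamestates(actions, nb_prev_actions=3):
    """Return [a_i, a_{i-1}, a_{i-2}, ...] as row-aligned copies of `actions`.

    Row j of the k-th frame holds action a_{j-k}; at the start of the match,
    where no such action exists, the first action is repeated (an assumption).
    """
    n = len(actions)
    states = []
    for k in range(nb_prev_actions):
        positions = [max(j - k, 0) for j in range(n)]
        shifted = actions.iloc[positions].copy()
        shifted.index = actions.index  # align with the current action a_i
        states.append(shifted)
    return states


actions = pd.DataFrame({'type_name': ['pass', 'dribble', 'shot']})
states = toy_gamestates(actions)

# One wide row per game state, with suffixes _a0 (a_i), _a1 (a_{i-1}), _a2 (a_{i-2}).
wide = pd.concat([s.add_suffix(f'_a{k}') for k, s in enumerate(states)], axis=1)
print(wide)
```

Keeping the frames row-aligned means that a feature computed for each frame can be joined back into a single row per game state.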

Each game state is then represented using three types of features. The first category includes characteristics of the action itself, such as its location and type, as well as more complex relationships such as the distance and angle to the goal. The second category captures the context of the action, such as the current tempo of the game, by comparing the properties of consecutive actions; examples include the distance covered and the time elapsed between consecutive actions. The third category captures the current game context by looking at things such as the time remaining in the match and the current score differential. The table below gives an overview of the features that can be used to encode a game state \(S_i = \{a_{i-2}, a_{i-1}, a_i\}\):

Transformer      Feature                  Description
---------------  -----------------------  -----------
actiontype()     actiontype(_onehot)_ai   The (one-hot encoding) of the action’s type.
result()         result(_onehot)_ai       The (one-hot encoding) of the action’s result.
bodypart()       bodypart(_onehot)_ai     The (one-hot encoding) of the body part used to perform the action.
time()           time_ai                  Time in the match the action takes place, recorded to the second.
startlocation()  start_x_ai               The x pitch coordinate of the action’s start location.
                 start_y_ai               The y pitch coordinate of the action’s start location.
endlocation()    end_x_ai                 The x pitch coordinate of the action’s end location.
                 end_y_ai                 The y pitch coordinate of the action’s end location.
startpolar()     start_dist_to_goal_ai    The distance to the center of the goal from the action’s start location.
                 start_angle_to_goal_ai   The angle between the action’s start location and the center of the goal.
endpolar()       end_dist_to_goal_ai      The distance to the center of the goal from the action’s end location.
                 end_angle_to_goal_ai     The angle between the action’s end location and the center of the goal.
movement()       dx_ai                    The distance covered by the action along the x-axis.
                 dy_ai                    The distance covered by the action along the y-axis.
                 movement_ai              The total distance covered by the action.
team()           team_ai                  Boolean indicating whether the team that had possession in action \(a_{i-2}\) still has possession in the current action.
time_delta()     time_delta_i             Seconds elapsed between \(a_{i-2}\) and the current action.
space_delta()    dx_a0i                   The distance covered from action \(a_{i-2}\) to \(a_{i}\) along the x-axis.
                 dy_a0i                   The distance covered from action \(a_{i-2}\) to \(a_{i}\) along the y-axis.
                 mov_a0i                  The total distance covered from action \(a_{i-2}\) to \(a_{i}\).
goalscore()      goalscore_team           The number of goals scored by the team executing the action.
                 goalscore_opponent       The number of goals scored by the other team.
                 goalscore_diff           The goal difference between both teams.

import pandas as pd
import socceraction.vaep.features as fs

# 2. compute features
xfns = [fs.actiontype, fs.result, ...]
X = pd.concat([fn(gamestates) for fn in xfns], axis=1)
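As an example of what a polar transformer such as startpolar() measures, the sketch below computes a distance and an angle to the goal for a single location. It assumes SPADL coordinates on a 105 m × 68 m pitch, with play already oriented left to right so that the opponent’s goal is centered at (105, 34). The function polar_to_goal is hypothetical, and socceraction’s exact angle convention may differ.

```python
import math

GOAL_X, GOAL_Y = 105.0, 34.0  # assumed center of the opponent's goal


def polar_to_goal(x, y):
    """Return (distance, angle) from (x, y) to the center of the goal.

    The angle is measured relative to the x-axis (the direction of play):
    0 when straight in front of the goal, pi/2 on the goal line itself.
    """
    dx = abs(GOAL_X - x)
    dy = abs(GOAL_Y - y)
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)  # well defined even when dx == 0
    return distance, angle


print(polar_to_goal(94.0, 34.0))  # (11.0, 0.0): penalty spot, straight on
```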

For estimating \(P_{score}(S_i, t)\), each game state is given a positive label (= 1) if the team that possesses the ball after action \(a_i\) scores a goal in the subsequent \(k\) actions. Otherwise, a negative label (= 0) is given to the game state. Analogously, for estimating \(P_{concede}(S_i, t)\), each game state is given a positive label (= 1) if the team that possesses the ball after action \(a_i\) concedes a goal in the subsequent \(k\) actions. If not, a negative label (= 0) is given to the game state.

import socceraction.vaep.labels as lab

# 3. compute labels
yfns = [lab.scores, lab.concedes]
Y = pd.concat([fn(actions) for fn in yfns], axis=1)
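The labeling rule can be sketched in a few lines of plain Python. This is a simplified stand-in for lab.scores and lab.concedes, not their implementation. The hypothetical helper toy_labels takes the team that performed each action and, for each action, the team credited with a goal resulting from it (None if there was no goal), which lets own goals be credited to the opponent. It also assumes that the team of interest is the one performing \(a_i\) and that the window of \(k\) actions starts at \(a_i\) itself.

```python
def toy_labels(team_ids, goal_by, k=10):
    """Return (scores, concedes) labels for every action.

    team_ids[i]: team performing action a_i.
    goal_by[i]: team credited with a goal resulting from a_i, or None.
    The window is a_i and the k - 1 actions after it (an assumption).
    """
    scores, concedes = [], []
    for i, team in enumerate(team_ids):
        window = [g for g in goal_by[i:i + k] if g is not None]
        scores.append(int(any(g == team for g in window)))
        concedes.append(int(any(g != team for g in window)))
    return scores, concedes


scores, concedes = toy_labels(
    team_ids=['BEL', 'BEL', 'ENG', 'BEL'],
    goal_by=[None, None, None, 'BEL'],
    k=2,
)
print(scores, concedes)  # [0, 0, 0, 1] [0, 0, 1, 0]
```

With \(k = 2\), only the ENG action right before the goal and the scoring action itself receive a positive label.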

VAEP models the scoring and conceding probabilities separately, as these effects may be asymmetric and context-dependent. Hence, it trains a separate gradient boosted tree model to predict each of them from the current game state.

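# 4. load or train models
models = {
  "scores": Classifier(...),
  "concedes": Classifier(...)
}

# 5. predict scoring and conceding probabilities for each game state
# (testX holds the features of the game states that should be valued)
Y_hat = pd.DataFrame()
for col in ["scores", "concedes"]:
    # predict_proba returns one column per class; keep P(label = 1)
    Y_hat[col] = models[col].predict_proba(testX)[:, 1]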

Using these probabilities, VAEP defines the offensive value of an action as the change in scoring probability before and after the action.

\[\Delta P_\textrm{score}(a_{i}, t) = P^{k}_\textrm{score}(S_i, t) - P^{k}_\textrm{score}(S_{i-1}, t)\]

This change will be positive if the action increased the probability that the team which performed the action will score (e.g., a successful tackle to recover the ball). Similarly, VAEP defines the defensive value of an action as the change in conceding probability.

\[\Delta P_\textrm{concede}(a_{i}, t) = P^{k}_\textrm{concede}(S_i, t) - P^{k}_\textrm{concede}(S_{i-1}, t)\]

This change will be positive if the action increased the probability that the team will concede a goal (e.g., a failed pass). Finally, the total VAEP value of an action is the difference between that action’s offensive value and defensive value.

\[V_\textrm{VAEP}(a_i) = \Delta P_\textrm{score}(a_{i}, t) - \Delta P_\textrm{concede}(a_{i}, t)\]

import socceraction.vaep.formula as vaepformula

# 6. compute VAEP value
values = vaepformula.value(actions, Y_hat["scores"], Y_hat["concedes"])
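The sketch below spells out the formula for a short sequence of actions. Because each probability is estimated from the perspective of the team that performed the corresponding action, the previous state’s probabilities must be swapped when possession changes: the opponent’s chance of scoring in \(S_{i-1}\) is team \(t\)’s chance of conceding, and vice versa. The helper toy_vaep_values is hypothetical, and giving the first action prior probabilities of zero is an assumption. socceraction’s vaepformula.value() may handle such edge cases (for example, the state right after a goal) differently.

```python
def toy_vaep_values(team_ids, p_scores, p_concedes):
    """Return V_VAEP(a_i) = dP_score(a_i, t) - dP_concede(a_i, t) for each action.

    p_scores[i], p_concedes[i]: P_score(S_i, t_i) and P_concede(S_i, t_i), from
    the perspective of the team t_i that performed action a_i.
    """
    values = []
    for i, team in enumerate(team_ids):
        if i == 0:
            prev_score, prev_concede = 0.0, 0.0  # assumption: no earlier state
        elif team_ids[i - 1] == team:
            prev_score, prev_concede = p_scores[i - 1], p_concedes[i - 1]
        else:
            # S_{i-1} was valued for the opponent: swap scoring and conceding.
            prev_score, prev_concede = p_concedes[i - 1], p_scores[i - 1]
        offensive = p_scores[i] - prev_score
        defensive = p_concedes[i] - prev_concede
        values.append(offensive - defensive)
    return values


vaep_values = toy_vaep_values(
    team_ids=['BEL', 'BEL', 'ENG'],
    p_scores=[0.25, 0.5, 0.125],
    p_concedes=[0.125, 0.125, 0.25],
)
print(vaep_values)  # [0.125, 0.25, 0.25]
```

The third action, by ENG, gets a positive value mainly because it lowers ENG’s conceding probability from 0.5 (BEL’s scoring probability in the previous state) to 0.25.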

See also

A set of notebooks illustrates the complete pipeline to train and apply a VAEP model:

  1. compute features and labels

  2. estimate scoring and conceding probabilities

  3. compute VAEP values and top players

Atomic-VAEP#

When building models to value actions, a major point of debate is how to handle the results of actions. In other words, should a model distinguish between a failed and a successful pass? On the one hand, an action should be valued on all its properties, and whether or not the action was successful (e.g., did a pass reach a teammate, was a shot converted into a goal) plays a crucial role in how useful the action was. That is, if you want to measure a player’s contribution during a match, successful actions are important. This is the viewpoint of SPADL and VAEP.

On the other hand, including the result of an action intertwines the contribution of the player who started the action (e.g., provides the pass) and the player who completes it (e.g., receives the pass). Perhaps a pass was not successful because of its recipient’s poor touch or because he was not paying attention. It would seem unfair to penalize the player who provided the pass in such a circumstance. Hence, it can be useful to generalize over possible results of an action to arrive at an action’s “expected value”.

The combination of Atomic-SPADL and VAEP accommodates this alternative viewpoint. Atomic-SPADL removes the “result” attribute from SPADL and adds a few new action and event types. This affects the features that can be computed to represent each game state. By default, Atomic-VAEP uses the following features to encode a game state \(S_i = \{a_{i-2}, a_{i-1}, a_i\}\):

Transformer       Feature                  Description
----------------  -----------------------  -----------
actiontype()      actiontype(_onehot)_ai   The (one-hot encoding) of the action’s type.
bodypart()        bodypart(_onehot)_ai     The (one-hot encoding) of the body part used to perform the action.
time()            time_ai                  Time in the match the action takes place, recorded to the second.
team()            team_ai                  Boolean indicating whether the team that had possession in action \(a_{i-2}\) still has possession in the current action.
time_delta()      time_delta_i             Seconds elapsed between \(a_{i-2}\) and the current action.
location()        x_ai                     The x pitch coordinate of the action.
                  y_ai                     The y pitch coordinate of the action.
polar()           dist_to_goal_ai          The distance from the action’s location to the center of the goal.
                  angle_to_goal_ai         The angle between the action’s location and the center of the goal.
movement_polar()  mov_d_ai                 The distance covered by the action.
                  mov_angle_ai             The direction in which the action was executed (relative to the top left of the field).
direction()       dx_ai                    Direction of the action, expressed as the x-component of the unit vector.
                  dy_ai                    Direction of the action, expressed as the y-component of the unit vector.
goalscore()       goalscore_team           The number of goals scored by the team executing the action.
                  goalscore_opponent       The number of goals scored by the other team.
                  goalscore_diff           The goal difference between both teams.

The computation of the labels and the VAEP formula are similar to the standard VAEP model.
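Returning to the feature table above, the movement_polar() and direction() transformers describe the displacement of an atomic action. The sketch below converts a displacement (dx, dy) into a distance, an angle, and the components of a unit vector. It is a hypothetical helper rather than socceraction’s code. In particular, the angle here is measured from the positive x-axis with math.atan2, which need not match the “top left of the field” convention mentioned in the table. Actions without movement (such as a receival) are assumed to get a zero direction vector.

```python
import math


def movement_features(dx, dy):
    """Return (mov_d, mov_angle, unit_dx, unit_dy) for a displacement (dx, dy)."""
    mov_d = math.hypot(dx, dy)
    if mov_d == 0:
        # Actions without movement (e.g. a receival) get a zero direction vector.
        return 0.0, 0.0, 0.0, 0.0
    mov_angle = math.atan2(dy, dx)  # radians, relative to the positive x-axis
    return mov_d, mov_angle, dx / mov_d, dy / mov_d


print(movement_features(3.0, 4.0))
```

For a displacement of (3, 4), the action covers 5 m and its unit direction vector is (0.6, 0.8).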

Empirically, we have noticed two benefits of using the Atomic-SPADL representation. First, the standard SPADL representation tends to assign shots a value that is the difference between the shot’s true outcome and its xG score. Hence, a few goals or misses can have an outsized effect on a player’s VAEP score, particularly for players who do not take many shots. In contrast, Atomic-SPADL assigns shots a value closer to their xG score, which often better matches domain experts’ intuitions on action values.

Second, Atomic-SPADL leads to more robust action values and player ratings. A good rating system should capture the true quality of all players. Although some fluctuations in performances are possible across games, over the course of a season a few outstanding performances (possibly stemming from a big portion of luck) should not dramatically alter an assessment of a player. In our prior work comparing VAEP to xT, one advantage of xT was that it produced more stable ratings. Using Atomic-SPADL helps alleviate this weakness.

See also

A set of notebooks illustrates the complete pipeline to train and apply an Atomic-VAEP model:

  1. compute features and labels

  2. estimate scoring and conceding probabilities

  3. compute VAEP values and top players

FAQ#

Q: What is socceraction? Socceraction is an open source Python package that primarily provides an implementation of the VAEP possession value framework. However, the package also provides a number of other features, such as API clients for loading data from the most popular data providers and converters for each of these data providers’ proprietary data formats to a common action-based data format (i.e., SPADL) that enables subsequent data analysis. Therefore, socceraction can take away some of the heavy data preprocessing burden from researchers and data scientists who are interested in working with soccer event stream data.

Q: Where can I get event stream data? Both StatsBomb and Wyscout provide a free sample of their data. Alternatively, you can buy a subscription to the event data feed from StatsBomb, Wyscout or Opta (Stats Perform). Instructions on how to load the data from each of these sources with socceraction are available in the documentation.

Q: What license is socceraction released under? Socceraction is released under the MIT license. You are free to use, modify and redistribute socceraction in any way you see fit. However, if you do use socceraction in your research, please cite our research papers. When you use socceraction in public work or when building a product or service using socceraction, we kindly request that you include the following attribution text in all advertising and documentation:

This product includes socceraction created by the <a href="https://dtai.cs.kuleuven.be/sports/">DTAI Sports Analytics lab</a>,
available from <a href="https://github.com/ML-KULeuven/socceraction">https://github.com/ML-KULeuven/socceraction</a>.

socceraction.data#

StatsBomb

Module for loading StatsBomb event data

Opta

Module for loading Opta event data and the derived formats used by Stats Perform and WhoScored

Wyscout

Module for loading Wyscout event data

socceraction.data.base#

Implements serializers for the event data of various providers.

Serializers#

socceraction.data.base.EventDataLoader

Load event data either from a remote location or from a local folder.

Schema#

socceraction.data.schema.CompetitionSchema

Definition of a dataframe containing a list of competitions and seasons.

socceraction.data.schema.TeamSchema

Definition of a dataframe containing the list of teams of a game.

socceraction.data.schema.PlayerSchema

Definition of a dataframe containing the list of players on the teamsheet of a game.

socceraction.data.schema.GameSchema

Definition of a dataframe containing a list of games.

socceraction.data.schema.EventSchema

Definition of a dataframe containing event stream data of a game.

socceraction.data.statsbomb#

Module for loading StatsBomb event data.

Serializers#

socceraction.data.statsbomb.StatsBombLoader

Load StatsBomb data either from a remote location or from a local folder.

Schema#

socceraction.data.statsbomb.StatsBombCompetitionSchema

Definition of a dataframe containing a list of competitions and seasons.

socceraction.data.statsbomb.StatsBombTeamSchema

Definition of a dataframe containing the list of teams of a game.

socceraction.data.statsbomb.StatsBombPlayerSchema

Definition of a dataframe containing the list of players of a game.

socceraction.data.statsbomb.StatsBombGameSchema

Definition of a dataframe containing a list of games.

socceraction.data.statsbomb.StatsBombEventSchema

Definition of a dataframe containing event stream data of a game.

socceraction.data.opta#

Module for loading Opta event data.

Serializers#

socceraction.data.opta.OptaLoader

Load Opta data feeds from a local folder.

Schema#

socceraction.data.opta.OptaCompetitionSchema

Definition of a dataframe containing a list of competitions and seasons.

socceraction.data.opta.OptaTeamSchema

Definition of a dataframe containing the list of teams of a game.

socceraction.data.opta.OptaPlayerSchema

Definition of a dataframe containing the list of players of a game.

socceraction.data.opta.OptaGameSchema

Definition of a dataframe containing a list of games.

socceraction.data.opta.OptaEventSchema

Definition of a dataframe containing event stream data of a game.

socceraction.data.wyscout#

Module for loading Wyscout event data.

Serializers#

socceraction.data.wyscout.WyscoutLoader

Load event data either from a remote location or from a local folder.

socceraction.data.wyscout.PublicWyscoutLoader

Load the public Wyscout dataset.

Schema#

socceraction.data.wyscout.WyscoutCompetitionSchema

Definition of a dataframe containing a list of competitions and seasons.

socceraction.data.wyscout.WyscoutTeamSchema

Definition of a dataframe containing the list of teams of a game.

socceraction.data.wyscout.WyscoutPlayerSchema

Definition of a dataframe containing the list of players of a game.

socceraction.data.wyscout.WyscoutGameSchema

Definition of a dataframe containing a list of games.

socceraction.data.wyscout.WyscoutEventSchema

Definition of a dataframe containing event stream data of a game.

socceraction.spadl#

Implementation of the SPADL language.

Converters#

socceraction.spadl.statsbomb.convert_to_actions

Convert StatsBomb events to SPADL actions.

socceraction.spadl.opta.convert_to_actions

Convert Opta events to SPADL actions.

socceraction.spadl.wyscout.convert_to_actions

Convert Wyscout events to SPADL actions.

socceraction.spadl.kloppy.convert_to_actions

Convert a Kloppy event data set to SPADL actions.

Schema#

socceraction.spadl.SPADLSchema

Definition of a SPADL dataframe.

Config#

socceraction.spadl.config.field_length

The length of the pitch (in meters).

socceraction.spadl.config.field_width

The width of the pitch (in meters).

socceraction.spadl.config.actiontypes

The list of SPADL action type names.

socceraction.spadl.config.bodyparts

The list of SPADL body part names.

socceraction.spadl.config.results

The list of SPADL result names.

Utility functions#

socceraction.spadl.play_left_to_right

Perform all actions in the same playing direction.

socceraction.spadl.add_names

Add the type name, result name and bodypart name to a SPADL dataframe.

socceraction.spadl.actiontypes_df

Return a dataframe with the type id and type name of each SPADL action type.

socceraction.spadl.bodyparts_df

Return a dataframe with the bodypart id and bodypart name of each SPADL body part.

socceraction.spadl.results_df

Return a dataframe with the result id and result name of each SPADL result.

socceraction.xthreat#

Implements the xT framework.

Model#

socceraction.xthreat.ExpectedThreat

An implementation of the Expected Threat (xT) model.

Utility functions#

socceraction.xthreat.load_model

Create a model from a pre-computed xT value surface.

socceraction.xthreat.get_move_actions

Get all ball-progressing actions.

socceraction.xthreat.get_successful_move_actions

Get all successful ball-progressing actions.

socceraction.xthreat.scoring_prob

Compute the probability of scoring when taking a shot for each cell.

socceraction.xthreat.action_prob

Compute the probability of taking an action in each cell of the grid.

socceraction.xthreat.move_transition_matrix

Compute the move transition matrix from the given actions.

socceraction.vaep#

Implements the VAEP framework.

Model#

socceraction.vaep.VAEP

An implementation of the VAEP framework.

Utility functions#

socceraction.vaep.features

Implements the feature transformers of the VAEP framework.

socceraction.vaep.labels

Implements the label transformers of the VAEP framework.

socceraction.vaep.formula

Implements the formula of the VAEP framework.

socceraction.atomic.spadl#

Converters#

socceraction.atomic.spadl.convert_to_atomic

Convert regular SPADL actions to atomic actions.

Schema#

socceraction.atomic.spadl.AtomicSPADLSchema

Definition of an Atomic-SPADL dataframe.

Config#

socceraction.atomic.spadl.config.field_length

The length of the pitch (in meters).

socceraction.atomic.spadl.config.field_width

The width of the pitch (in meters).

socceraction.atomic.spadl.config.actiontypes

The list of Atomic-SPADL action type names.

socceraction.atomic.spadl.config.bodyparts

The list of Atomic-SPADL body part names.

Utility functions#

socceraction.atomic.spadl.play_left_to_right

Perform all actions in the same playing direction.

socceraction.atomic.spadl.add_names

Add the type name and bodypart name to an Atomic-SPADL dataframe.

socceraction.atomic.spadl.actiontypes_df

Return a dataframe with the type id and type name of each Atomic-SPADL action type.

socceraction.atomic.spadl.bodyparts_df

Return a dataframe with the bodypart id and bodypart name of each Atomic-SPADL body part.

socceraction.atomic.vaep#

Implements the Atomic-VAEP framework.

Model#

socceraction.atomic.vaep.AtomicVAEP

An implementation of the VAEP framework for atomic actions.

Utility functions#

socceraction.atomic.vaep.features

Implements the feature transformers of the Atomic-VAEP framework.

socceraction.atomic.vaep.labels

Implements the label transformers of the Atomic-VAEP framework.

socceraction.atomic.vaep.formula

Implements the formula of the Atomic-VAEP framework.

Contributor guide#

This document lays out guidelines and advice for contributing to this project. If you’re thinking of contributing, please start by reading this document and getting a feel for how contributing to this project works. If you have any questions, feel free to reach out to either Tom Decroos or Pieter Robberechts, the primary maintainers.

The guide is split into sections based on the type of contribution you’re thinking of making.

Bug reports#

Bug reports are hugely important! Before you raise one, though, please check through the GitHub issues, both open and closed, to confirm that the bug hasn’t been reported before.

When filing an issue, make sure to answer these questions:

  • Which Python version are you using?

  • Which version of socceraction are you using?

  • What did you do?

  • What did you expect to see?

  • What did you see instead?

The best way to get your bug fixed is to provide a test case, and/or steps to reproduce the issue.

Feature requests#

Socceraction is not actively developed. Its primary use is to enable reproducibility of our research. If you believe a feature is missing, feel free to raise a feature request on the Issue Tracker, but please do be aware that the overwhelming likelihood is that your feature request will not be accepted.

Documentation contributions#

Documentation improvements are always welcome! The documentation files live in the docs/ directory of the codebase. They’re written in reStructuredText, and use Sphinx to generate the full suite of documentation.

You do not have to set up a development environment to make small changes to the docs. Instead, you can edit files directly on GitHub and suggest changes.

When contributing documentation, please do your best to follow the style of the documentation files. This means a soft-limit of 79 characters wide in your text files and a semi-formal, yet friendly and approachable, prose style.

When presenting Python code, use single-quoted strings ('hello' instead of "hello").

Code contributions#

If you intend to contribute code, do not feel the need to sit on your contribution until it is perfectly polished and complete. It helps everyone involved for you to seek feedback as early as you possibly can. Submitting an early, unfinished version of your contribution for feedback can save you from putting a lot of work into a contribution that is not suitable for the project.

Setting up your development environment#

You need Python 3.7.1+ and the following tools:

  • Poetry

  • Nox

Install the package with development requirements:

$ poetry install

You can now run an interactive Python session.

$ poetry run python

Steps for submitting code#

When contributing code, you’ll want to follow this checklist:

  1. Fork the repository on GitHub.

  2. Run the tests to confirm they all pass on your system. If they don’t, you’ll need to investigate why they fail. If you’re unable to diagnose this yourself, raise it as a bug report.

  3. Write tests that demonstrate your bug or feature. Ensure that they fail.

  4. Make your change.

  5. Run the entire test suite again, confirming that all tests pass including the ones you just added.

  6. Make sure your code follows the code style discussed below.

  7. Send a GitHub Pull Request to the main repository’s master branch. GitHub Pull Requests are the expected method of code collaboration on this project.

Testing the project#

Download the test data:

$ poetry run python tests/datasets/download.py

Run the full test suite:

$ nox

List the available Nox sessions:

$ nox --list-sessions

You can also run a specific Nox session. For example, invoke the unit test suite like this:

$ nox --session=tests

Unit tests are located in the tests directory, and are written using the pytest testing framework.

Code style#

The socceraction codebase uses the PEP 8 code style. In addition, we have a few guidelines:

  • Line-length can exceed 79 characters, to 100, when convenient.

  • Line-length can exceed 100 characters, when doing otherwise would be terribly inconvenient.

  • Always use single-quoted strings (e.g. '#soccer'), unless a single-quote occurs within the string.

To ensure all code conforms to this format, you can format the code using the pre-commit hooks:

$ nox --session=pre-commit

Docstrings are to follow the numpydoc guidelines.

Submitting changes#

Open a pull request to submit changes to this project.

Your pull request needs to meet the following guidelines for acceptance:

  • The Nox test suite must pass without errors and warnings.

  • Include unit tests.

  • If your changes add functionality, update the documentation accordingly.

Feel free to submit early, though. We can always iterate on this.

To run linting and code formatting checks before committing your change, you can install pre-commit as a Git hook by running the following command:

$ nox --session=pre-commit -- install

It is recommended to open an issue before starting work on anything.

First steps#

Are you new to soccer event stream data and possession value frameworks? Check out our interactive explainer and watch Lotte Bransen’s and Jan Van Haaren’s presentation in Friends of Tracking. Once familiar with the basic concepts, you can move on to the quickstart guide or continue with the hands-on video tutorials of the Friends of Tracking series:

  • Valuing actions in soccer (video, slides)

    This presentation expands on the content of the introductory presentation by discussing the technicalities behind the VAEP framework for valuing actions of soccer players as well as the content of the hands-on video tutorials in more depth.

  • Tutorial 1: Run pipeline (video, notebook, notebook on Google Colab)

    This tutorial demonstrates the entire pipeline, from ingesting the raw Wyscout match event data to producing ratings for soccer players. It touches upon the following four topics: downloading and preprocessing the data, valuing game states, valuing actions and rating players.

  • Tutorial 2: Generate features (video, notebook, notebook on Google Colab)

    This tutorial demonstrates the process of generating features and labels. This tutorial touches upon the following three topics: exploring the data in the SPADL representation, constructing features to represent actions and constructing features to represent game states.

  • Tutorial 3: Learn models (video, notebook, notebook on Google Colab)

    This tutorial demonstrates the process of splitting the dataset into a training set and a test set, learning baseline models using conservative hyperparameters for the learning algorithm, optimizing the hyperparameters for the learning algorithm and learning the final models.

  • Tutorial 4: Analyze models and results (video, notebook, notebook on Google Colab)

    This tutorial demonstrates the process of analyzing the importance of the features that are included in the trained machine learning models, analyzing the predictions for specific game states, and analyzing the resulting player ratings.

Note

The video tutorials are based on version 0.2.0 of the socceraction library. If a more recent version of the library is installed, the code may need to be adapted.

Getting help#

Having trouble? We’d like to help!

Contributing#

Learn about the development process itself and about how you can contribute in our developer guide.

Research#

If you make use of this package in your research, please consider citing the following papers.

  • Tom Decroos, Lotte Bransen, Jan Van Haaren, and Jesse Davis. “Actions speak louder than goals: Valuing player actions in soccer.” In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1851-1861. 2019.

    [pdf, bibtex]

  • Maaike Van Roy, Pieter Robberechts, Tom Decroos, and Jesse Davis. “Valuing on-the-ball actions in soccer: a critical comparison of xT and VAEP.” In Proceedings of the AAAI-20 Workshop on Artificial Intelligence in Team Sports. AI in Team Sports Organising Committee, 2020.

    [pdf, bibtex]