.. currentmodule:: socceraction.data.wyscout
=========================
Loading Wyscout data
=========================
The :class:`WyscoutLoader` class provides an API client enabling you to fetch
`Wyscout event stream data`_ as Pandas DataFrames. This document provides an
overview of the available data sources and how to access them.

.. note::

   Currently, only version 2 of the Wyscout API is supported.
   See https://github.com/ML-KULeuven/socceraction/issues/156
   for progress on version 3 support.

--------------------------
Connecting to a data store
--------------------------
First, you have to create a :class:`WyscoutLoader` object and configure it
for the data store you want to use. The :class:`WyscoutLoader` supports
loading data from the official Wyscout API and from local files. Additionally,
the :class:`PublicWyscoutLoader` class can be used to load a publicly
available dataset.
Wyscout API
=============
Wyscout API access requires a separate
subscription. Wyscout currently offers three different packs:
a Database Pack (match sheet
data), a Stats Pack (statistics derived from match event data), and an Events
Pack (raw match event data). A subscription to the Events Pack is required to
access the event stream data.
To authenticate, set the environment variables ``WY_USERNAME`` and
``WY_PASSWORD`` to your login credentials (i.e., client id and secret).
Alternatively, pass your login credentials to the constructor through the
``creds`` argument, in the format
``{"user": "your_client_id", "passwd": "your_secret"}``.

.. code-block:: python

    import os

    from socceraction.data.wyscout import WyscoutLoader

    # set authentication credentials as environment variables
    os.environ["WY_USERNAME"] = "your_client_id"
    os.environ["WY_PASSWORD"] = "your_secret"
    api = WyscoutLoader(getter="remote")

    # or provide authentication credentials as a dictionary
    api = WyscoutLoader(
        getter="remote",
        creds={"user": "your_client_id", "passwd": "your_secret"},
    )

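The two ways of supplying credentials described above can be pictured with
a small stand-alone sketch. The helper ``resolve_creds`` is hypothetical and is
not part of socceraction; the rule that an explicit dictionary takes
precedence over the environment variables is an assumption made for
illustration only.

```python
import os


def resolve_creds(creds=None, environ=None):
    """Return a {"user": ..., "passwd": ...} dict (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if creds is not None:
        # Assumption of this sketch: an explicit dictionary wins.
        return {"user": creds["user"], "passwd": creds["passwd"]}
    try:
        # Otherwise fall back to the documented environment variables.
        return {
            "user": environ["WY_USERNAME"],
            "passwd": environ["WY_PASSWORD"],
        }
    except KeyError as err:
        raise RuntimeError(f"Missing environment variable: {err.args[0]}") from None
```

Reading the credentials from the environment keeps them out of source files
that may end up under version control.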
Local directory
===============
Data can also be loaded from a local directory. Pass the path to this
directory as the ``root`` argument of the constructor.

.. code-block:: python

    from socceraction.data.wyscout import WyscoutLoader

    api = WyscoutLoader(getter="local", root="data/wyscout")

The loader uses the directory structure and file names to determine which files
should be parsed to retrieve the requested data. Therefore, the local directory
should have a predefined file hierarchy. By default, it expects the following
file hierarchy:

.. code-block::

    root
    ├── competitions.json
    ├── seasons_<competition_id>.json
    ├── matches_<season_id>.json
    └── matches
        ├── events_<game_id>.json
        └── ...

If your local directory has a different file hierarchy, you can specify
this custom hierarchy by passing the ``feeds`` argument to the constructor.
A wide range of file names and directory structures are supported. However,
the competition, season, and game identifiers must be included in the file
names to be able to locate the corresponding files for each entity.

.. code-block:: python

    from socceraction.data.wyscout import WyscoutLoader

    api = WyscoutLoader(getter="local", root="data/wyscout", feeds={
        "competitions": "competitions.json",
        "seasons": "seasons_{competition_id}.json",
        "games": "matches_{season_id}.json",
        "events": "matches/events_{game_id}.json",
    })

The ``{competition_id}``, ``{season_id}``, and ``{game_id}`` placeholders
will be replaced by the corresponding id values when data is retrieved.
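The placeholder substitution can be sketched with the standard library
alone. The helpers ``placeholders`` and ``resolve_feed`` and the game id
``1234`` are hypothetical and only illustrate the idea; they do not
reproduce how the loader resolves files internally.

```python
from pathlib import Path
from string import Formatter

feeds = {
    "competitions": "competitions.json",
    "seasons": "seasons_{competition_id}.json",
    "games": "matches_{season_id}.json",
    "events": "matches/events_{game_id}.json",
}


def placeholders(template):
    """Return the set of placeholder names used in a feed template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def resolve_feed(root, feeds, feed, **ids):
    """Fill in a feed template and join it with the root directory."""
    return Path(root) / feeds[feed].format(**ids)


path = resolve_feed("data/wyscout", feeds, "events", game_id=1234)
print(path.as_posix())  # data/wyscout/matches/events_1234.json
```

A check such as ``placeholders(feeds["events"]) == {"game_id"}`` is one way to
confirm that a custom template still contains the identifier needed to
locate the file.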
Soccer logs dataset
===================
As part of the "A public data set of spatio-temporal match events in soccer
competitions" paper, Wyscout made an event stream dataset available for
research purposes. The dataset covers the 2017/18 season of the Spanish,
Italian, English, German, and French first divisions. In addition, it includes
data from the 2018 World Cup and the 2016 European Championship. The dataset
is available at https://figshare.com/collections/Soccer_match_event_dataset/4415000/2.
As the format of this dataset is slightly different from the format of the
official Wyscout API, a separate :class:`PublicWyscoutLoader` class is
provided to load this dataset. This loader will download the dataset once and
extract it to the specified ``root`` directory.

.. code-block:: python

    from socceraction.data.wyscout import PublicWyscoutLoader

    api = PublicWyscoutLoader(root="data/wyscout")

------------
Loading data
------------
Next, you can load the match event stream data and metadata by calling the
corresponding methods on the :class:`WyscoutLoader` object.
- :func:`WyscoutLoader.competitions()`
- :func:`WyscoutLoader.games()`
- :func:`WyscoutLoader.teams()`
- :func:`WyscoutLoader.players()`
- :func:`WyscoutLoader.events()`
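
As a rough illustration of how these methods might be combined, the sketch
below collects the events of every game in a season. It rests on
assumptions not stated in this document: that ``games()`` takes a competition
id and a season id, that ``events()`` takes a game id, and that the games table
has a ``game_id`` column. The helper ``load_season_events`` is hypothetical.

```python
def load_season_events(loader, competition_id, season_id):
    """Collect the events of every game in one season, keyed by game id."""
    # Assumed signature: games(competition_id, season_id)
    games = loader.games(competition_id, season_id)
    # Assumed column name: "game_id"; assumed signature: events(game_id)
    return {game_id: loader.events(game_id) for game_id in games["game_id"]}
```

Given a configured loader ``api``, the helper would be called as
``load_season_events(api, competition_id, season_id)``, with the two ids taken
from the table returned by ``api.competitions()``.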
.. _Wyscout event stream data: https://footballdata.wyscout.com/