Loading Wyscout data¶
The WyscoutLoader
class provides an API client enabling you to fetch
Wyscout event stream data as Pandas DataFrames. This document provides an
overview of the available data sources and how to access them.
Note
Currently, only version 2 of the Wyscout API is supported. See https://github.com/ML-KULeuven/socceraction/issues/156 for progress on version 3 support.
Connecting to a data store¶
First, you have to create a WyscoutLoader
object and configure it
for the data store you want to use. The WyscoutLoader
supports
loading data from the official Wyscout API and from local files. Additionally,
the PublicWyscoutLoader
class can be used to load a publicly
available dataset.
Wyscout API¶
Wyscout API access requires a separate subscription. Wyscout currently offers three different packs: a Database Pack (match sheet data), a Stats Pack (statistics derived from match event data), and an Events Pack (raw match event data). A subscription to the Events Pack is required to access the event stream data.
Authentication can be done by setting environment variables named
WY_USERNAME
and WY_PASSWORD
to your login credentials (i.e., client id
and secret). Alternatively, the constructor accepts an argument creds
to
pass your login credentials in the format {"user": "", "passwd": ""}
.
from socceraction.data.wyscout import WyscoutLoader
# set authentication credentials as environment variables
import os
os.environ["WY_USERNAME"] = "your_client_id"
os.environ["WY_PASSWORD"] = "your_secret"
api = WyscoutLoader(getter="remote")
# or provide authentication credentials as a dictionary
api = WyscoutLoader(getter="remote", creds={"user": "", "passwd": ""})
Local directory¶
Data can also be loaded from a local directory. This local directory
can be specified by passing the root
argument to the constructor,
specifying the path to the local data directory.
from socceraction.data.wyscout import WyscoutLoader
ap = WyscoutLoader(getter="local", root="data/wyscout")
The loader uses the directory structure and file names to determine which files should be parsed to retrieve the requested data. Therefore, the local directory should have a predefined file hierarchy. By default, it expects following file hierarchy:
root
├── competitions.json
├── seasons_<competition_id>.json
├── matches_<season_id>.json
└── matches
├── events_<game_id>.json
└── ...
If your local directory has a different file hierarchy, you can specify
this custom hierarchy by passing the feeds
argument to the constructor.
A wide range of file names and directory structures are supported. However,
the competition, season, and game identifiers must be included in the file
names to be able to locate the corresponding files for each entity.
from socceraction.data.wyscout import WyscoutLoader
ap = WyscoutLoader(getter="local", root="data/wyscout", feeds={
"competitions": "competitions.json",
"seasons": "seasons_{competition_id}.json",
"games": "matches_{season_id}.json",
"events": "matches/events_{game_id}.json",
}))
The {competition_id}
, {season_id}
, and {game_id}
placeholders
will be replaced by the corresponding id values when data is retrieved.
Soccer logs dataset¶
As part of the “A public data set of spatio-temporal match events in soccer competitions” paper, Wyscout made an event stream dataset available for research purposes. The dataset covers the 2017/18 season of the Spanish, Italian, English, German, and French first division. In addition, it includes the data of the 2018 World Cup and the 2016 European championship. The dataset is available at https://figshare.com/collections/Soccer_match_event_dataset/4415000/2.
As the format of this dataset is slightly different from the format of the
official Wyscout API, a separate PublicWyscoutLoader
class is
provided to load this dataset. This loader will download the dataset once and
extract it to the specified root
directory.
from socceraction.data.wyscout import PublicWyscoutLoader
api = PublicWyscoutLoader(root="data/wyscout")
Loading data¶
Next, you can load the match event stream data and metadata by calling the
corresponding methods on the WyscoutLoader
object.