Working with Local Dataset#

In this tutorial, we show how to use your own local dataset with the Dataset class, which helps you manage and process your eye-tracking data.

Preparations#

We import pymovements as the alias pm for convenience.

[1]:
import pymovements as pm

For demonstration purposes, we will use the raw data provided by the Toy dataset, a sample dataset that comes with pymovements.

We will download the resources of this dataset to a local directory to simulate a local dataset. All downloaded archive files are automatically extracted and then removed. The dataset directory will be data/my_dataset.

After that, we no longer need the Python object and delete it (the files on your system stay in place). Don’t worry if these lines are confusing; they are not relevant to your use case.

Just keep in mind that we now have some files with gaze data in the directory data/my_dataset.

[2]:
toy_dataset = pm.Dataset('ToyDataset', path='data/my_dataset')
toy_dataset.download(remove_finished=True)

del toy_dataset
INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.

Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/my_dataset/downloads/pymovements-toy-dataset.zip
Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/my_dataset/raw
100%|██████████| 23/23 [00:00<00:00, 345.74it/s]

Define your Experiment#

To use the Dataset class, we first need to create an Experiment instance. This class represents the properties of the experiment, such as the screen dimensions and sampling rate.

[3]:
experiment = pm.gaze.Experiment(
    screen_width_px=1280,
    screen_height_px=1024,
    screen_width_cm=38,
    screen_height_cm=30.2,
    distance_cm=68,
    origin='upper left',
    sampling_rate=1000,
)

Parameters for File Parsing#

We also define a filename_format which is a pattern expression used to match and extract values from filenames of data files in the dataset. For example, r'trial_{text_id:d}_{page_id:d}.csv' will match filenames that follow the pattern trial_{text_id}_{page_id}.csv and extract the values of text_id and page_id for each file.

[4]:
filename_format = {'gaze': r'trial_{text_id:d}_{page_id:d}.csv'}
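
As a rough illustration of what such a pattern does, the following sketch translates the curly-brace pattern into a regular expression using only the standard library. The helper name curly_to_regex and the file name trial_1_2.csv are made up for this example, and this is not necessarily how pymovements parses filenames internally.

```python
import re


def curly_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern like 'trial_{text_id:d}_{page_id:d}.csv' into a regex."""
    regex = ''
    pos = 0
    for field in re.finditer(r'\{(\w+)(?::(\w+))?\}', pattern):
        # Escape the literal text between fields (for example the '.' in '.csv').
        regex += re.escape(pattern[pos:field.start()])
        name, spec = field.group(1), field.group(2)
        # ':d' restricts the field to digits; any other field matches lazily.
        regex += f'(?P<{name}>\\d+)' if spec == 'd' else f'(?P<{name}>.+?)'
        pos = field.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)


matcher = curly_to_regex(r'trial_{text_id:d}_{page_id:d}.csv')

# Hypothetical filename that follows the pattern.
extracted = matcher.fullmatch('trial_1_2.csv').groupdict()
print(extracted)  # {'text_id': '1', 'page_id': '2'}

# A filename that does not follow the pattern is not matched.
print(matcher.fullmatch('notes.txt'))  # None
```

Note that the extracted values are still strings at this point.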

Both values of text_id and page_id are numeric. We can use a map to define the casting of these values.

[5]:
filename_format_schema_overrides = {
    'gaze': {
        'text_id': int,
        'page_id': int,
    },
}
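
To see why the casting matters, here is a small standalone sketch. It assumes the values arrive as strings after filename matching; the variable names and the example IDs are hypothetical and only serve the illustration.

```python
# Values extracted from filenames start out as strings (assumption for this sketch).
extracted = {'text_id': '10', 'page_id': '2'}

# Same shape as the inner dictionary of the schema overrides above.
schema = {'text_id': int, 'page_id': int}

# Apply the configured type to each value; unknown keys stay strings.
typed = {key: schema.get(key, str)(value) for key, value in extracted.items()}
print(typed)  # {'text_id': 10, 'page_id': 2}

# Without casting, IDs sort lexicographically instead of numerically.
print(sorted(['10', '2']))  # ['10', '2']
print(sorted([10, 2]))      # [2, 10]
```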

We can also adjust how the CSV files are read. Here, we specify that the separator in the CSV files is a tab ('\t').

[6]:
custom_read_kwargs = {
    'gaze': {'separator': '\t'},
}
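
The effect of the separator can be shown with the standard library alone. The sketch below does not use pymovements' own reader; the in-memory file content and its column layout (timestamp, x, y) are assumptions made for the illustration.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated gaze file (column layout assumed).
raw_text = 'timestamp\tx\ty\n2415266\t176.8\t140.2\n2415267\t176.7\t139.8\n'

# Reading with the tab separator splits each line into three fields.
rows = list(csv.DictReader(io.StringIO(raw_text), delimiter='\t'))
print(rows[0])  # {'timestamp': '2415266', 'x': '176.8', 'y': '140.2'}

# Reading with the default comma separator leaves each line as one field.
wrong = list(csv.DictReader(io.StringIO(raw_text)))
print(list(wrong[0].keys()))  # ['timestamp\tx\ty']
```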

Column Definitions#

The trial_columns argument can be used to specify which columns define a single trial.

This is important for correctly applying all preprocessing methods.

For this very small single-user dataset, a trial is defined by just text_id and page_id.

[7]:
trial_columns = ['text_id', 'page_id']
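
One reason trial boundaries matter is that sample-to-sample computations should not run across two trials. The following standalone sketch illustrates the idea with a handful of made-up samples (the x values are borrowed from the end of one recording and the start of another). It is a plain-Python illustration, not the grouping logic used inside pymovements.

```python
from itertools import groupby

trial_columns = ['text_id', 'page_id']

# Made-up samples: the last two of one trial followed by the first two of the next.
samples = [
    {'text_id': 1, 'page_id': 1, 'x': 650.0},
    {'text_id': 1, 'page_id': 1, 'x': 649.7},
    {'text_id': 2, 'page_id': 1, 'x': 106.2},
    {'text_id': 2, 'page_id': 1, 'x': 107.2},
]


def trial_key(sample):
    """Identify the trial a sample belongs to."""
    return tuple(sample[column] for column in trial_columns)


# Naive differencing treats the switch between trials as a huge eye movement.
naive = [b['x'] - a['x'] for a, b in zip(samples, samples[1:])]

# Differencing per trial never compares samples from different trials.
per_trial = {}
for key, group in groupby(samples, key=trial_key):
    xs = [sample['x'] for sample in group]
    per_trial[key] = [b - a for a, b in zip(xs, xs[1:])]

print(len(naive), {key: len(diffs) for key, diffs in per_trial.items()})
```

The naive list contains a jump of roughly -543.5 pixels at the trial boundary, while the per-trial result contains only the small within-trial differences.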

The time_column and pixel_columns arguments can be used to correctly map the columns in your dataframes. If the time unit differs from the default of milliseconds (ms), you must also specify time_unit for correct computations.

Depending on the content of your dataset, you can alternatively provide position_columns, velocity_columns and acceleration_columns.

Specifying these columns is needed for correctly applying preprocessing methods. For example, if you want to apply the pix2deg method, you will need to specify pixel_columns accordingly.

If your dataset has gaze positions available only in degrees of visual angle, you have to specify the position_columns instead.

[8]:
time_column = 'timestamp'
time_unit = 'ms'
pixel_columns = ['x', 'y']

Define and load the Dataset#

Next, we use all these definitions to create a DatasetDefinition by passing in the dataset name, the Experiment instance, and other optional parameters such as the filename format and custom CSV reading parameters.

[9]:
dataset_definition = pm.DatasetDefinition(
    name='my_dataset',
    has_files={'gaze': True, 'precomputed_events': False, 'precomputed_reading_measures': False},
    experiment=experiment,
    filename_format=filename_format,
    filename_format_schema_overrides=filename_format_schema_overrides,
    custom_read_kwargs=custom_read_kwargs,
    time_column=time_column,
    time_unit=time_unit,
    pixel_columns=pixel_columns,
)

Finally, we create a Dataset instance by passing the DatasetDefinition and specifying the directory path.

[10]:
dataset = pm.Dataset(
    definition=dataset_definition,
    path='data/my_dataset/',
)

If you have a root data directory that holds all your local datasets, you can additionally define the paths of the dataset.

The root parameter sets the root data directory. The dataset, raw, preprocessed, and events parameters define the names of the directories for the dataset, raw data, preprocessed data, and events data, respectively.

[11]:
dataset_paths = pm.DatasetPaths(
    root='data/',
    raw='raw',
    preprocessed='preprocessed',
    events='events',
)

dataset = pm.Dataset(
    definition=dataset_definition,
    path=dataset_paths,
)
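
The resulting directory layout can be pictured with pathlib. This sketch assumes that the dataset directory is named after the dataset ('my_dataset'), which matches the paths printed in the output of the load call further down; it only mirrors the layout and does not call pymovements.

```python
from pathlib import Path

root = Path('data/')
dataset_name = 'my_dataset'  # assumption: taken from the definition's name

# Each subdirectory name is appended to the dataset directory.
dataset_dir = root / dataset_name
raw_dir = dataset_dir / 'raw'
preprocessed_dir = dataset_dir / 'preprocessed'
events_dir = dataset_dir / 'events'

for directory in (dataset_dir, raw_dir, preprocessed_dir, events_dir):
    print(directory.as_posix())
```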

Now let’s load the dataset into memory. Here we select a subset including the first page of texts with ID 1 and 2.

[12]:
subset = {
    'text_id': [1, 2],
    'page_id': 1,
}

dataset.load(subset=subset)
[12]:
Dataset
  • DatasetDefinition
    • dict (1 items)
      • dict (1 items)
        • '\t'
    • Experiment
      • EyeTracker
        • 1000
      • 1000
      • Screen
        • 68
        • 30.2
        • 1024
        • 'upper left'
        • 38
        • 1280
        • 15.599386487782953
        • -15.599386487782953
        • 12.508044410882546
        • -12.508044410882546
    • dict (1 items)
      • 'trial_{text_id:d}_{page_id:d}.csv'
    • dict (1 items)
      • dict (2 items)
        • <class 'int'>
        • <class 'int'>
    • True
    • 'my_dataset'
    • list (2 items)
      • 'x'
      • 'y'
    • list (1 items)
      • ResourceDefinition
        • 'gaze'
        • 'trial_{text_id:d}_{page_id:d}.csv'
        • dict (2 items)
          • <class 'int'>
          • <class 'int'>
    • 'timestamp'
    • 'ms'
  • dict (1 items)
    • DataFrame (3 columns, 2 rows)
      shape: (2, 3)
      text_id  page_id  filepath
      i64      i64      str
      1        1        "aeye-lab-pymovements-toy-datas…
      2        1        "aeye-lab-pymovements-toy-datas…
  • list (2 items)
    • Gaze
      • DataFrame (6 columns, 23054 rows)
        shape: (23_054, 6)
        time     stimuli_x  stimuli_y  text_id  page_id  pixel
        i64      f64        f64        i64      i64      list[f64]
        2415266  -1.0       -1.0       1        1        [176.8, 140.2]
        2415267  -1.0       -1.0       1        1        [176.7, 139.8]
        2415268  -1.0       -1.0       1        1        [176.7, 139.3]
        2415269  -1.0       -1.0       1        1        [176.6, 139.3]
        2415270  -1.0       -1.0       1        1        [176.7, 139.3]
        …        …          …          …        …        …
        2438315  -1.0       -1.0       1        1        [649.9, 633.9]
        2438316  -1.0       -1.0       1        1        [650.1, 633.7]
        2438317  -1.0       -1.0       1        1        [650.2, 633.5]
        2438318  -1.0       -1.0       1        1        [650.0, 633.2]
        2438319  -1.0       -1.0       1        1        [649.7, 633.1]
      • Events
        • DataFrame (6 columns, 0 rows)
          shape: (0, 6)
          text_id  page_id  name  onset  offset  duration
          i64      i64      str   i64    i64     i64
        • list (2 items)
          • 'text_id'
          • 'page_id'
      • list (2 items)
        • 'text_id'
        • 'page_id'
      • Experiment (same values as in the DatasetDefinition above)
    • Gaze
      • DataFrame (6 columns, 29660 rows)
        shape: (29_660, 6)
        time     stimuli_x  stimuli_y  text_id  page_id  pixel
        i64      f64        f64        i64      i64      list[f64]
        1788369  -1.0       -1.0       2        1        [106.2, 90.3]
        1788370  -1.0       -1.0       2        1        [107.2, 91.6]
        1788371  -1.0       -1.0       2        1        [109.9, 94.4]
        1788372  -1.0       -1.0       2        1        [113.3, 98.2]
        1788373  -1.0       -1.0       2        1        [118.3, 102.7]
        …        …          …          …        …        …
        1818024  -1.0       -1.0       2        1        [357.0, 715.0]
        1818025  -1.0       -1.0       2        1        [357.1, 714.9]
        1818026  -1.0       -1.0       2        1        [357.1, 714.9]
        1818027  -1.0       -1.0       2        1        [357.2, 714.5]
        1818028  -1.0       -1.0       2        1        [357.2, 714.0]
      • Events
        • DataFrame (6 columns, 0 rows)
          shape: (0, 6)
          text_id  page_id  name  onset  offset  duration
          i64      i64      str   i64    i64     i64
        • list (2 items)
          • 'text_id'
          • 'page_id'
      • list (2 items)
        • 'text_id'
        • 'page_id'
      • Experiment (same values as in the DatasetDefinition above)
  • PosixPath('data/my_dataset')
  • DatasetPaths
    • PosixPath('data/my_dataset')
    • PosixPath('data/my_dataset/downloads')
    • PosixPath('data/my_dataset/events')
    • PosixPath('data/my_dataset/precomputed_events')
    • PosixPath('data/my_dataset/precomputed_reading_measures')
    • PosixPath('data/my_dataset/preprocessed')
    • PosixPath('data/my_dataset/raw')
    • PosixPath('data')
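
The subset used in the load call can be read as a filter over the extracted filename values: a list means "any of these values", a single value means "exactly this value". The following plain-Python sketch mimics that reading; the file records are hypothetical and the function is not part of pymovements.

```python
# Hypothetical records as they might result from filename matching.
files = [
    {'text_id': 0, 'page_id': 1, 'filepath': 'trial_0_1.csv'},
    {'text_id': 1, 'page_id': 1, 'filepath': 'trial_1_1.csv'},
    {'text_id': 1, 'page_id': 2, 'filepath': 'trial_1_2.csv'},
    {'text_id': 2, 'page_id': 1, 'filepath': 'trial_2_1.csv'},
]

subset = {'text_id': [1, 2], 'page_id': 1}


def in_subset(record, subset):
    """Return True if the record satisfies every column condition."""
    for column, allowed in subset.items():
        # Treat a single value like a one-element list.
        values = allowed if isinstance(allowed, (list, tuple, set, range)) else [allowed]
        if record[column] not in values:
            return False
    return True


selected = [record['filepath'] for record in files if in_subset(record, subset)]
print(selected)  # ['trial_1_1.csv', 'trial_2_1.csv']
```

This matches the two rows (text_id 1 and 2, each with page_id 1) listed in the output above.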

Use the Dataset#

Once we have created the Dataset instance, we can use its methods to preprocess and analyze data in our local dataset.

[13]:
dataset.gaze[0]
[13]:
Gaze
  • DataFrame (6 columns, 23054 rows)
    shape: (23_054, 6)
    time     stimuli_x  stimuli_y  text_id  page_id  pixel
    i64      f64        f64        i64      i64      list[f64]
    2415266  -1.0       -1.0       1        1        [176.8, 140.2]
    2415267  -1.0       -1.0       1        1        [176.7, 139.8]
    2415268  -1.0       -1.0       1        1        [176.7, 139.3]
    2415269  -1.0       -1.0       1        1        [176.6, 139.3]
    2415270  -1.0       -1.0       1        1        [176.7, 139.3]
    …        …          …          …        …        …
    2438315  -1.0       -1.0       1        1        [649.9, 633.9]
    2438316  -1.0       -1.0       1        1        [650.1, 633.7]
    2438317  -1.0       -1.0       1        1        [650.2, 633.5]
    2438318  -1.0       -1.0       1        1        [650.0, 633.2]
    2438319  -1.0       -1.0       1        1        [649.7, 633.1]
  • Events
    • DataFrame (6 columns, 0 rows)
      shape: (0, 6)
      text_id  page_id  name  onset  offset  duration
      i64      i64      str   i64    i64     i64
    • list (2 items)
      • 'text_id'
      • 'page_id'
  • list (2 items)
    • 'text_id'
    • 'page_id'
  • Experiment (same values as shown for the loaded Dataset above)

Here we use the pix2deg method to convert the pixel coordinates to degrees of visual angle.

[14]:
dataset.pix2deg()

dataset.gaze[0]
[14]:
Gaze
  • DataFrame (7 columns, 23054 rows)
    shape: (23_054, 7)
    time     stimuli_x  stimuli_y  text_id  page_id  pixel           position
    i64      f64        f64        i64      i64      list[f64]       list[f64]
    2415266  -1.0       -1.0       1        1        [176.8, 140.2]  [-11.420403, -9.148145]
    2415267  -1.0       -1.0       1        1        [176.7, 139.8]  [-11.422806, -9.157834]
    2415268  -1.0       -1.0       1        1        [176.7, 139.3]  [-11.422806, -9.169943]
    2415269  -1.0       -1.0       1        1        [176.6, 139.3]  [-11.42521, -9.169943]
    2415270  -1.0       -1.0       1        1        [176.7, 139.3]  [-11.422806, -9.169943]
    …        …          …          …        …        …               …
    2438315  -1.0       -1.0       1        1        [649.9, 633.9]  [0.260146, 3.038748]
    2438316  -1.0       -1.0       1        1        [650.1, 633.7]  [0.265149, 3.033792]
    2438317  -1.0       -1.0       1        1        [650.2, 633.5]  [0.26765, 3.028836]
    2438318  -1.0       -1.0       1        1        [650.0, 633.2]  [0.262648, 3.021402]
    2438319  -1.0       -1.0       1        1        [649.7, 633.1]  [0.255144, 3.018924]
  • Events
    • DataFrame (6 columns, 0 rows)
      shape: (0, 6)
      text_id  page_id  name  onset  offset  duration
      i64      i64      str   i64    i64     i64
    • list (2 items)
      • 'text_id'
      • 'page_id'
  • list (2 items)
    • 'text_id'
    • 'page_id'
  • Experiment (same values as shown for the loaded Dataset above)
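
To make the conversion concrete, here is a minimal standard-library sketch of a pixel-to-degree computation for a flat screen with the origin in the upper left corner. The function below is written for this illustration and is not pymovements' implementation; it assumes that the screen center lies at (resolution - 1) / 2 pixels. With the screen parameters from the Experiment above, it reproduces the first position value of the output to about four decimal places.

```python
import math


def pix2deg_sketch(pixel, screen_px, screen_cm, distance_cm):
    """Convert one pixel coordinate to degrees of visual angle (illustrative)."""
    # Shift the coordinate so that 0 corresponds to the screen center.
    centered_px = pixel - (screen_px - 1) / 2
    # Convert pixels to centimeters on the screen surface.
    centered_cm = centered_px * screen_cm / screen_px
    # The visual angle follows from the on-screen offset and the eye-to-screen distance.
    return math.degrees(math.atan2(centered_cm, distance_cm))


# First sample of the first recording: pixel = [176.8, 140.2]
x_deg = pix2deg_sketch(176.8, screen_px=1280, screen_cm=38, distance_cm=68)
y_deg = pix2deg_sketch(140.2, screen_px=1024, screen_cm=30.2, distance_cm=68)

print(round(x_deg, 3), round(y_deg, 3))  # -11.42 -9.148
```

Negative values mean that the gaze point lies to the left of and above the screen center.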

We can use the pos2vel method to calculate the velocity of the gaze position.

[15]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)

dataset.gaze[0]
[15]:
Gaze
  • DataFrame (8 columns, 23054 rows)
    shape: (23_054, 8)
    time     stimuli_x  stimuli_y  text_id  page_id  pixel           position                 velocity
    i64      f64        f64        i64      i64      list[f64]       list[f64]                list[f64]
    2415266  -1.0       -1.0       1        1        [176.8, 140.2]  [-11.420403, -9.148145]  [-0.772495, -4.238523]
    2415267  -1.0       -1.0       1        1        [176.7, 139.8]  [-11.422806, -9.157834]  [-0.686663, -4.671012]
    2415268  -1.0       -1.0       1        1        [176.7, 139.3]  [-11.422806, -9.169943]  [-0.257498, -3.806023]
    2415269  -1.0       -1.0       1        1        [176.6, 139.3]  [-11.42521, -9.169943]   [1.459231, -1.557032]
    2415270  -1.0       -1.0       1        1        [176.7, 139.3]  [-11.422806, -9.169943]  [4.034446, 1.556983]
    …        …          …          …        …        …               …                        …
    2438315  -1.0       -1.0       1        1        [649.9, 633.9]  [0.260146, 3.038748]     [0.268004, -3.451512]
    2438316  -1.0       -1.0       1        1        [650.1, 633.7]  [0.265149, 3.033792]     [-0.357339, -3.982536]
    2438317  -1.0       -1.0       1        1        [650.2, 633.5]  [0.26765, 3.028836]      [-0.982682, -3.982549]
    2438318  -1.0       -1.0       1        1        [650.0, 633.2]  [0.262648, 3.021402]     [-1.69736, -3.54005]
    2438319  -1.0       -1.0       1        1        [649.7, 633.1]  [0.255144, 3.018924]     [-2.233368, -2.389544]
  • Events
    • DataFrame (6 columns, 0 rows)
      shape: (0, 6)
      text_id  page_id  name  onset  offset  duration
      i64      i64      str   i64    i64     i64
    • list (2 items)
      • 'text_id'
      • 'page_id'
  • list (2 items)
    • 'text_id'
    • 'page_id'
  • Experiment (same values as shown for the loaded Dataset above)
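
For intuition about the Savitzky-Golay method, the sketch below computes a first derivative in plain Python. It relies on the fact that, for a symmetric window and a polynomial of degree 2, the slope of the least-squares fit at the window center reduces to sum(k * x[i + k]) / sum(k * k). The function name is made up for this example, and unlike the library call above it simply skips samples near the edges instead of padding them, so it illustrates the technique rather than pymovements' implementation.

```python
def savgol_velocity_sketch(positions, sampling_rate, window_length=7):
    """First derivative via a degree-2 Savitzky-Golay fit (interior samples only)."""
    half = window_length // 2
    offsets = range(-half, half + 1)
    norm = sum(k * k for k in offsets)  # 28 for window_length=7
    velocities = []
    for i in range(half, len(positions) - half):
        # Slope of the fitted polynomial at the window center, per sample.
        slope = sum(k * positions[i + k] for k in offsets) / norm
        # Scale from "per sample" to "per second".
        velocities.append(slope * sampling_rate)
    return velocities


# Synthetic gaze position in degrees, moving 0.01 deg per sample at 1000 Hz.
positions = [0.01 * i for i in range(20)]
velocities = savgol_velocity_sketch(positions, sampling_rate=1000)

print(len(velocities))          # 14
print(round(velocities[0], 6))  # 10.0
```

A constant drift of 0.01 degrees per sample at 1000 Hz yields a velocity of 10 degrees per second, as expected.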