Preprocessing Raw Gaze Data
What you will learn in this tutorial:
- how to transform pixel coordinates into degrees of visual angle
- how to transform positional data into velocity data
Preparations
We import pymovements as the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
INFO:pymovements.dataset.dataset:
You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
Please cite the referenced publication if you intend to use the dataset in your research.
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 305.10it/s]
[2]:
(The returned Dataset object is displayed as an interactive summary. It shows the ToyDataset definition, the Experiment metadata — a 1000 Hz eye tracker and a 1280×1024 px screen of 38×30.2 cm viewed from 68 cm, origin in the upper left — the 20 loaded Gaze objects with their raw samples (time, stimuli_x, stimuli_y, text_id, page_id, pixel), still-empty Events frames, and the dataset paths under data/ToyDataset.)
We can verify that all files have been loaded by checking the fileinfo attribute:
[3]:
dataset.fileinfo
[3]:
{'gaze': shape: (20, 3)
┌─────────┬─────────┬─────────────────────────────────┐
│ text_id ┆ page_id ┆ filepath │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════════╪═════════╪═════════════════════════════════╡
│ 0 ┆ 1 ┆ aeye-lab-pymovements-toy-datas… │
│ 0 ┆ 2 ┆ aeye-lab-pymovements-toy-datas… │
│ 0 ┆ 3 ┆ aeye-lab-pymovements-toy-datas… │
│ 0 ┆ 4 ┆ aeye-lab-pymovements-toy-datas… │
│ 0 ┆ 5 ┆ aeye-lab-pymovements-toy-datas… │
│ … ┆ … ┆ … │
│ 3 ┆ 1 ┆ aeye-lab-pymovements-toy-datas… │
│ 3 ┆ 2 ┆ aeye-lab-pymovements-toy-datas… │
│ 3 ┆ 3 ┆ aeye-lab-pymovements-toy-datas… │
│ 3 ┆ 4 ┆ aeye-lab-pymovements-toy-datas… │
│ 3 ┆ 5 ┆ aeye-lab-pymovements-toy-datas… │
└─────────┴─────────┴─────────────────────────────────┘}
Now let’s inspect our gaze dataframe:
[4]:
dataset.gaze[0]
[4]:
shape: (17_223, 6)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┘
(The Gaze object also lists an empty Events frame, the trial columns 'text_id' and 'page_id', and the Experiment metadata shown above.)
Apart from some trial identifier columns, we see the columns time and pixel.
Preprocessing
We now want to transform these pixel coordinates into degrees of visual angle. This is done by calling pix2deg():
[5]:
dataset.pix2deg()
dataset.gaze[0]
[5]:
shape: (17_223, 7)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┘
(The rest of the Gaze display — the empty Events frame, the trial columns and the Experiment metadata — is unchanged.)
The processed result has been added as a new column named position to our gaze dataframe.
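To build some intuition for what pix2deg() computes, here is a minimal sketch of the underlying geometry, using the screen properties from the Experiment above (1280×1024 px, 38×30.2 cm, viewed from 68 cm). The helper name pix2deg_sketch is made up for illustration and is not part of the pymovements API:

import numpy as np

# Minimal sketch of the pixel-to-degree geometry (illustrative only, not the
# pymovements implementation): center the pixel coordinate on the screen,
# convert it to centimeters, then take the arctangent w.r.t. the viewing distance.
def pix2deg_sketch(value_px, screen_px, screen_cm, distance_cm):
    centered_px = value_px - (screen_px - 1) / 2            # origin at the screen center
    value_cm = centered_px * screen_cm / screen_px          # pixels -> centimeters
    return np.degrees(np.arctan2(value_cm, distance_cm))    # centimeters -> degrees

# First pixel sample from the dataframe above: [206.8, 152.4]
print(pix2deg_sketch(206.8, 1280, 38.0, 68.0))  # ≈ -10.70, close to the x position above
print(pix2deg_sketch(152.4, 1024, 30.2, 68.0))  # ≈ -8.85, close to the y position above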
Additionally, we would like to have velocity data available too. Four different methods are available (a small sketch of the underlying differences follows this list):
- preceding: takes only the single preceding sample into account for the velocity calculation. The noisiest variant.
- neighbors: takes the two neighboring samples (one on each side) into account. A bit less noisy.
- smooth: extends the neighborhood to two samples on each side, yielding a smoother conversion.
- savitzky_golay: uses the Savitzky-Golay differentiation filter. You can specify additional parameters like window_length and degree. With suitable parameters this typically gives the best results.
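The first methods boil down to simple finite differences over the position samples. Here is a minimal, hand-rolled sketch of the preceding and neighbors ideas (illustrative only, not the pymovements implementation; the sample values are rounded from the x positions shown above):

import numpy as np

# Finite-difference sketch of the 'preceding' and 'neighbors' ideas
# (illustrative only, not the pymovements implementation).
pos = np.array([-10.6976, -10.6952, -10.6928, -10.6904, -10.6928])  # position in degrees
fs = 1000  # sampling rate in Hz, as in the Experiment above

v_preceding = (pos[1:] - pos[:-1]) * fs      # difference to the single preceding sample
v_neighbors = (pos[2:] - pos[:-2]) / 2 * fs  # central difference over the two neighbors

print(v_preceding)  # noisiest estimate, in degrees per second
print(v_neighbors)  # slightly smoother estimate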
Let’s use the fivepoint method first:
[6]:
dataset.pos2vel(method='fivepoint')
dataset.gaze[0]
[6]:
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┬──────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                ┆ velocity                 │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     ┆ ---                      │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╪══════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [null, null]             │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [null, null]             │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.610194, -5.256267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [0.402548, -4.447465]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [0.402561, -3.234462]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       ┆ …                        │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-63.266374, -21.085616] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-63.249652, -19.431326] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-60.359624, -15.710061] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [null, null]             │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [null, null]             │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┴──────────────────────────┘
(The rest of the Gaze display — the empty Events frame, the trial columns and the Experiment metadata — is unchanged.)
The processed result has been added as a new column named velocity to our gaze dataframe.
We can also use the Savitzky-Golay differentiation filter with some additional parameters like this:
[7]:
dataset.pos2vel(method='savitzky_golay', degree=2, window_length=7)
dataset.gaze[0]
[7]:
shape: (17_223, 8)
┌─────────┬───────────┬───────────┬─────────┬─────────┬────────────────┬─────────────────────────┬──────────────────────────┐
│ time    ┆ stimuli_x ┆ stimuli_y ┆ text_id ┆ page_id ┆ pixel          ┆ position                ┆ velocity                 │
│ ---     ┆ ---       ┆ ---       ┆ ---     ┆ ---     ┆ ---            ┆ ---                     ┆ ---                      │
│ i64     ┆ f64       ┆ f64       ┆ i64     ┆ i64     ┆ list[f64]      ┆ list[f64]               ┆ list[f64]                │
╞═════════╪═══════════╪═══════════╪═════════╪═════════╪════════════════╪═════════════════════════╪══════════════════════════╡
│ 1988145 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.8, 152.4] ┆ [-10.697598, -8.852399] ┆ [1.207641, -3.119165]    │
│ 1988146 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [206.9, 152.1] ┆ [-10.695183, -8.859678] ┆ [1.20764, -4.072198]     │
│ 1988147 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.8] ┆ [-10.692768, -8.866956] ┆ [1.035119, -4.765267]    │
│ 1988148 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.1, 151.7] ┆ [-10.690352, -8.869381] ┆ [1.207654, -4.245382]    │
│ 1988149 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [207.0, 151.5] ┆ [-10.692768, -8.874233] ┆ [1.552735, -2.339263]    │
│ …       ┆ …         ┆ …         ┆ …       ┆ …       ┆ …              ┆ …                       ┆ …                        │
│ 2005363 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [361.0, 415.4] ┆ [-6.932438, -2.386672]  ┆ [-62.062479, -20.465552] │
│ 2005364 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [358.0, 414.5] ┆ [-7.006376, -2.408998]  ┆ [-61.343786, -18.073031] │
│ 2005365 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [355.8, 413.8] ┆ [-7.060582, -2.426362]  ┆ [-53.501231, -14.617634] │
│ 2005366 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [353.1, 413.2] ┆ [-7.12709, -2.441245]   ┆ [-41.879965, -10.276475] │
│ 2005367 ┆ -1.0      ┆ -1.0      ┆ 0       ┆ 1       ┆ [351.2, 412.9] ┆ [-7.173881, -2.448686]  ┆ [-27.710881, -6.112645]  │
└─────────┴───────────┴───────────┴─────────┴─────────┴────────────────┴─────────────────────────┴──────────────────────────┘
(The rest of the Gaze display — the empty Events frame, the trial columns and the Experiment metadata — is unchanged.)
This has overwritten our velocity column. As we can see, the resulting values differ slightly from the fivepoint results.
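For intuition, Savitzky-Golay differentiation fits a low-order polynomial to a sliding window of samples and evaluates its first derivative. A rough sketch of the same idea using SciPy's savgol_filter (an illustration of the principle, not the pymovements implementation; SciPy's polyorder corresponds to the degree argument above, and the position trace here is synthetic) could look like this:

import numpy as np
from scipy.signal import savgol_filter

# Rough sketch of Savitzky-Golay differentiation (illustrative only): fit a
# degree-2 polynomial over a 7-sample window and evaluate its first derivative.
fs = 1000                                       # sampling rate in Hz
pos = np.cumsum(np.random.randn(100)) / 50      # synthetic 1-D position trace in degrees
vel = savgol_filter(pos, window_length=7, polyorder=2, deriv=1, delta=1 / fs)
print(vel[:5])                                  # velocity estimates in degrees per second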
What you have learned in this tutorial:
- transforming pixel coordinates into degrees of visual angle by using Dataset.pix2deg()
- transforming positional data into velocity data by using Dataset.pos2vel()
- passing additional keyword arguments when using the Savitzky-Golay differentiation filter