Saving and Loading Preprocessed Data#
What you will learn in this tutorial:#
how to save your preprocessed data
how to load your preprocessed data
Preparations#
We import pymovements under the alias pm for convenience.
[1]:
import pymovements as pm
Let’s start by downloading our ToyDataset and loading its data:
[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
dataset.download()
dataset.load()
INFO:pymovements.dataset.dataset:
You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
host or distribute any dataset resources and only provides a convenient interface to
download the public dataset resources that were published by their respective authors.
Please cite the referenced publication if you intend to use the dataset in your research.
Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 310.77it/s]
Now that the data is loaded, let’s do some preprocessing:
[3]:
dataset.pix2deg()
dataset.pos2vel()
dataset.gaze[0]
[3]:
shape: (17_223, 8)
time     stimuli_x  stimuli_y  text_id  page_id  pixel           position                 velocity
i64      f64        f64        i64      i64      list[f64]       list[f64]                list[f64]
1988145  -1.0       -1.0       0        1        [206.8, 152.4]  [-10.697598, -8.852399]  [null, null]
1988146  -1.0       -1.0       0        1        [206.9, 152.1]  [-10.695183, -8.859678]  [null, null]
1988147  -1.0       -1.0       0        1        [207.0, 151.8]  [-10.692768, -8.866956]  [1.610194, -5.256267]
1988148  -1.0       -1.0       0        1        [207.1, 151.7]  [-10.690352, -8.869381]  [0.402548, -4.447465]
1988149  -1.0       -1.0       0        1        [207.0, 151.5]  [-10.692768, -8.874233]  [0.402561, -3.234462]
…        …          …          …        …        …               …                        …
2005363  -1.0       -1.0       0        1        [361.0, 415.4]  [-6.932438, -2.386672]   [-63.266374, -21.085616]
2005364  -1.0       -1.0       0        1        [358.0, 414.5]  [-7.006376, -2.408998]   [-63.249652, -19.431326]
2005365  -1.0       -1.0       0        1        [355.8, 413.8]  [-7.060582, -2.426362]   [-60.359624, -15.710061]
2005366  -1.0       -1.0       0        1        [353.1, 413.2]  [-7.12709, -2.441245]    [null, null]
2005367  -1.0       -1.0       0        1        [351.2, 412.9]  [-7.173881, -2.448686]   [null, null]
This added two columns to each gaze frame: position, which holds the gaze position in degrees of visual angle, and velocity.
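To make the two new columns less of a black box, here is a minimal standard-library sketch of what such conversions could look like. It is not pymovements' implementation: the screen geometry (1280 x 1024 px, 38 x 30.2 cm, 68 cm viewing distance), the 1000 Hz sampling rate, and the five-point difference scheme are assumptions read off the toy dataset's settings and the values in the output above, and both function names are hypothetical.

```python
import math

# Assumed experiment settings of the toy dataset (not taken from the library source).
SCREEN_PX = (1280, 1024)
SCREEN_CM = (38.0, 30.2)
DISTANCE_CM = 68.0
SAMPLING_RATE_HZ = 1000.0


def pixel_to_degrees(value_px, screen_px, screen_cm, distance_cm=DISTANCE_CM):
    """Convert one pixel coordinate to degrees of visual angle (hypothetical helper)."""
    centered_px = value_px - (screen_px - 1) / 2  # move the origin to the screen center
    centered_cm = centered_px * screen_cm / screen_px
    return math.degrees(math.atan2(centered_cm, distance_cm))


def five_point_velocity(positions, sampling_rate=SAMPLING_RATE_HZ):
    """Five-point central difference; the two samples at each edge stay None."""
    velocities = [None] * len(positions)
    for i in range(2, len(positions) - 2):
        diff = positions[i + 1] + positions[i + 2] - positions[i - 1] - positions[i - 2]
        velocities[i] = diff * sampling_rate / 6
    return velocities


# First pixel sample of the first trial: [206.8, 152.4]
x_deg = pixel_to_degrees(206.8, SCREEN_PX[0], SCREEN_CM[0])
y_deg = pixel_to_degrees(152.4, SCREEN_PX[1], SCREEN_CM[1])
print(round(x_deg, 4), round(y_deg, 4))

# Horizontal positions of the first five samples, copied from the output above.
x_positions = [-10.697598, -10.695183, -10.692768, -10.690352, -10.692768]
x_velocity = five_point_velocity(x_positions)
print(x_velocity)
```

Under these assumptions the first sample comes out at roughly (-10.70, -8.85) degrees and the third horizontal velocity at about 1.61 degrees per second, in line with the position and velocity columns shown above. The null entries at the start and end of the velocity column are what a five-point window would produce, since it needs two neighbors on each side.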
Saving#
Saving your preprocessed data is as simple as:
[4]:
dataset.save_preprocessed()
All of the preprocessed data is saved into this directory:
[5]:
dataset.paths.preprocessed
[5]:
PosixPath('data/ToyDataset/preprocessed')
Let’s confirm it by printing all the new files in this directory:
[6]:
print(list(dataset.paths.preprocessed.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.feather'), 
PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.feather'), PosixPath('data/ToyDataset/preprocessed/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.feather')]
All of the files have been saved into Dataset.paths.preprocessed as feather files.
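Note that glob() makes no promise about ordering, which is why the listing above is not sorted by trial. If a stable order is useful, the trial identifiers can be parsed from the file names. The following is a minimal sketch using only the standard library; the helper name parse_trial_ids is hypothetical, and a temporary directory with empty files stands in for the real preprocessed directory.

```python
import re
import tempfile
from pathlib import Path

# Matches names such as trial_0_1.feather or trial_2_3.csv.
TRIAL_PATTERN = re.compile(r'trial_(\d+)_(\d+)\.\w+$')


def parse_trial_ids(filepath):
    """Return (text_id, page_id) parsed from a trial file name (hypothetical helper)."""
    match = TRIAL_PATTERN.search(Path(filepath).name)
    if match is None:
        raise ValueError(f'unexpected file name: {filepath}')
    return int(match.group(1)), int(match.group(2))


# Stand-in for dataset.paths.preprocessed, mimicking the layout printed above.
root = Path(tempfile.mkdtemp())
data_dir = root / 'aeye-lab-pymovements-toy-dataset-6cb5d66' / 'data'
data_dir.mkdir(parents=True)
for name in ['trial_3_5.feather', 'trial_0_2.feather', 'trial_1_4.feather', 'trial_0_1.feather']:
    (data_dir / name).touch()

# Sort numerically by (text_id, page_id) instead of relying on glob order.
files = sorted(root.glob('*/*/*'), key=parse_trial_ids)
sorted_names = [f.name for f in files]
print(sorted_names)
```

On the real dataset, replacing root with dataset.paths.preprocessed should give the same kind of sorted listing for all 20 trials.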
If we want to save the data into an alternative directory and use a different file format such as csv, we can pass both as arguments:
[7]:
dataset.save_preprocessed(preprocessed_dirname='preprocessed_csv', extension='csv')
Let’s confirm again by printing all the new files in this alternative directory:
[8]:
alternative_dirpath = dataset.path / 'preprocessed_csv'
print(list(alternative_dirpath.glob('*/*/*')))
[PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_2_4.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_3.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_5.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_1_2.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_4.csv'), 
PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_0_1.csv'), PosixPath('data/ToyDataset/preprocessed_csv/aeye-lab-pymovements-toy-dataset-6cb5d66/data/trial_3_5.csv')]
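Comparing two long printed lists by eye is error-prone. One way to check that both exports cover the same trials is to compare the file stems of the two directories. This is a sketch under assumptions: trial_stems is a hypothetical helper, and temporary directories with empty files stand in for the directories written by save_preprocessed().

```python
import tempfile
from pathlib import Path


def trial_stems(dirpath, extension):
    """Collect file stems (e.g. 'trial_0_1') of all files with the given extension."""
    return {path.stem for path in Path(dirpath).glob(f'*/*/*.{extension}')}


# Temporary stand-ins for the two export directories, using the layout shown above.
root = Path(tempfile.mkdtemp())
for dirname, extension in [('preprocessed', 'feather'), ('preprocessed_csv', 'csv')]:
    data_dir = root / dirname / 'aeye-lab-pymovements-toy-dataset-6cb5d66' / 'data'
    data_dir.mkdir(parents=True)
    for text_id in range(4):
        for page_id in range(1, 6):
            (data_dir / f'trial_{text_id}_{page_id}.{extension}').touch()

feather_stems = trial_stems(root / 'preprocessed', 'feather')
csv_stems = trial_stems(root / 'preprocessed_csv', 'csv')
# Symmetric difference: stems that are present in only one of the two directories.
missing = feather_stems ^ csv_stems
print(len(feather_stems), len(csv_stems), sorted(missing))
```

On the real dataset, root would be dataset.path, and an empty difference indicates that every trial was written in both formats.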
Loading#
Now let’s imagine that the preprocessing and saving were done in another file and we only want to load the preprocessed data.
We simulate this by initializing a new dataset object. We don’t need to download any additional data.
[9]:
events_dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')
The preprocessed data can now simply be loaded by setting preprocessed to True:
[10]:
events_dataset.load(preprocessed=True)
events_dataset.gaze[0]
[10]:
shape: (17_223, 8)
time     stimuli_x  stimuli_y  pixel           position                 velocity                  text_id  page_id
i64      f64        f64        list[f64]       list[f64]                list[f64]                 i64      i64
1988145  -1.0       -1.0       [206.8, 152.4]  [-10.697598, -8.852399]  [null, null]              0        1
1988146  -1.0       -1.0       [206.9, 152.1]  [-10.695183, -8.859678]  [null, null]              0        1
1988147  -1.0       -1.0       [207.0, 151.8]  [-10.692768, -8.866956]  [1.610194, -5.256267]     0        1
1988148  -1.0       -1.0       [207.1, 151.7]  [-10.690352, -8.869381]  [0.402548, -4.447465]     0        1
1988149  -1.0       -1.0       [207.0, 151.5]  [-10.692768, -8.874233]  [0.402561, -3.234462]     0        1
…        …          …          …               …                        …                         …        …
2005363  -1.0       -1.0       [361.0, 415.4]  [-6.932438, -2.386672]   [-63.266374, -21.085616]  0        1
2005364  -1.0       -1.0       [358.0, 414.5]  [-7.006376, -2.408998]   [-63.249652, -19.431326]  0        1
2005365  -1.0       -1.0       [355.8, 413.8]  [-7.060582, -2.426362]   [-60.359624, -15.710061]  0        1
2005366  -1.0       -1.0       [353.1, 413.2]  [-7.12709, -2.441245]    [null, null]              0        1
2005367  -1.0       -1.0       [351.2, 412.9]  [-7.173881, -2.448686]   [null, null]              0        1
By default, the preprocessed directory and the feather extension are used.
For alternative directory names or other file formats, you can use the following:
[11]:
events_dataset.load(
    preprocessed=True,
    preprocessed_dirname='preprocessed_csv',
    extension='csv',
)
events_dataset.gaze[0]
[11]:
shape: (17_223, 8)
time     stimuli_x  stimuli_y  text_id  page_id  pixel           position                 velocity
i64      f64        f64        i64      i64      list[f64]       list[f64]                list[f64]
1988145  -1.0       -1.0       0        1        [206.8, 152.4]  [-10.697598, -8.852399]  [null, null]
1988146  -1.0       -1.0       0        1        [206.9, 152.1]  [-10.695183, -8.859678]  [null, null]
1988147  -1.0       -1.0       0        1        [207.0, 151.8]  [-10.692768, -8.866956]  [1.610194, -5.256267]
1988148  -1.0       -1.0       0        1        [207.1, 151.7]  [-10.690352, -8.869381]  [0.402548, -4.447465]
1988149  -1.0       -1.0       0        1        [207.0, 151.5]  [-10.692768, -8.874233]  [0.402561, -3.234462]
…        …          …          …        …        …               …                        …
2005363  -1.0       -1.0       0        1        [361.0, 415.4]  [-6.932438, -2.386672]   [-63.266374, -21.085616]
2005364  -1.0       -1.0       0        1        [358.0, 414.5]  [-7.006376, -2.408998]   [-63.249652, -19.431326]
2005365  -1.0       -1.0       0        1        [355.8, 413.8]  [-7.060582, -2.426362]   [-60.359624, -15.710061]
2005366  -1.0       -1.0       0        1        [353.1, 413.2]  [-7.12709, -2.441245]    [null, null]
2005367  -1.0       -1.0       0        1        [351.2, 412.9]  [-7.173881, -2.448686]   [null, null]
What you have learned in this tutorial:#
saving your preprocessed data using Dataset.save_preprocessed()
loading your preprocessed data using Dataset.load(preprocessed=True)
using custom directory names by specifying preprocessed_dirname
using file formats other than the default feather format by specifying extension