Downloading Public Datasets

What you will learn in this tutorial:

  • how to download and extract one of the available public datasets

  • how to customize the default directory structure

Preparations

We import pymovements under the alias pm for convenience.

[1]:
import pymovements as pm

pymovements provides a library of publicly available datasets.

You can browse through the available dataset definitions here: Datasets

For this tutorial we will limit ourselves to the ToyDataset due to its minimal space requirements.

Other datasets can be downloaded by simply replacing ToyDataset with one of the other available datasets.

Initialization

First we initialize our public dataset by specifying its name and the path to its dataset directory.

All files of the dataset will then be placed in that directory:

[2]:
dataset = pm.Dataset('ToyDataset', path='data/ToyDataset')

dataset.path
[2]:
PosixPath('data/ToyDataset')

If you only want to specify a root directory which contains all your datasets, you can pass a DatasetPaths instance.

The directory of your dataset will have the same name as in the dataset definition.

[3]:
dataset_paths = pm.DatasetPaths(root='data/')
dataset = pm.Dataset('ToyDataset', path=dataset_paths)

dataset.path
[3]:
PosixPath('data/ToyDataset')

You can also specify an alternative name for the directory of your downloaded dataset.

[4]:
dataset_paths_alt = pm.DatasetPaths(root='data/', dataset='my_dataset')
dataset_alt = pm.Dataset('ToyDataset', path=dataset_paths_alt)

dataset_alt.path
[4]:
PosixPath('data/my_dataset')
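
The path resolution shown in the cells above can be pictured with a small standard-library sketch. PathsSketch and its properties are hypothetical names used only for illustration and are not part of pymovements; the subdirectory names downloads and raw are taken from the outputs shown later in this tutorial.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PathsSketch:
    """Hypothetical stand-in for illustration; not the pymovements implementation."""

    root: str = 'data'
    dataset: str = 'ToyDataset'  # assumed fallback: the name from the dataset definition
    downloads: str = 'downloads'
    raw: str = 'raw'

    @property
    def dataset_dir(self) -> Path:
        # The dataset directory is nested inside the root directory.
        return Path(self.root) / self.dataset

    @property
    def downloads_dir(self) -> Path:
        # Subdirectories are nested inside the dataset directory.
        return self.dataset_dir / self.downloads

    @property
    def raw_dir(self) -> Path:
        return self.dataset_dir / self.raw


default = PathsSketch(root='data/')
custom = PathsSketch(root='data/', dataset='my_dataset')

print(default.dataset_dir)
print(custom.dataset_dir)
print(custom.downloads_dir)
```

In this sketch, changing dataset only renames the dataset directory; every subdirectory follows along because it is derived from dataset_dir.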

Downloading

The dataset will then be downloaded by calling:

[5]:
dataset.download()
INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.

Using already downloaded and verified file: data/ToyDataset/downloads/pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 304.27it/s]
[5]:
Dataset
  • DatasetDefinition
    DatasetDefinition
    • None
      None
    • dict (0 items)
      • dict (1 items)
        • dict (4 items)
          • list (5 items)
            • 'timestamp'
            • 'x'
            • (3 more)
          • dict (5 items)
            • Float64
              Float64
            • Float64
              Float64
            • (3 more)
          • (2 more)
      • None
        None
      • Experiment
        Experiment
        • EyeTracker
          EyeTracker
          • None
            None
          • None
            None
          • None
            None
          • None
            None
          • 1000
            1000
          • None
            None
          • None
            None
        • 1000
          1000
        • Screen
          Screen
          • 68
            68
          • 30.2
            30.2
          • 1024
            1024
          • 'upper left'
            'upper left'
          • 38
            38
          • 1280
            1280
          • 15.599386487782953
            15.599386487782953
          • -15.599386487782953
            -15.599386487782953
          • 12.508044410882546
            12.508044410882546
          • -12.508044410882546
            -12.508044410882546
      • None
        None
      • dict (1 items)
        • 'trial_{text_id:d}_{page_id:d}.csv'
          'trial_{text_id:d}_{page_id:d}.csv'
      • dict (1 items)
        • dict (2 items)
          • <class 'int'>
            <class 'int'>
          • <class 'int'>
            <class 'int'>
      • True
        True
      • 'pymovements Toy Dataset'
        'pymovements Toy Dataset'
      • dict (0 items)
        • 'ToyDataset'
          'ToyDataset'
        • list (2 items)
          • 'x'
          • 'y'
        • None
          None
        • list (1 items)
          • ResourceDefinition
            • 'gaze'
              'gaze'
            • 'pymovements-toy-dataset.zip'
              'pymovements-toy-dataset.zip'
            • 'trial_{text_id:d}_{page_id:d}.csv'
              'trial_{text_id:d}_{page_id:d}.csv'
            • dict (2 items)
              • <class 'int'>
                <class 'int'>
              • <class 'int'>
                <class 'int'>
            • '4da622457637a8181d86601fe17f3aa8'
              '4da622457637a8181d86601fe17f3aa8'
            • str
              'http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/'
        • 'timestamp'
          'timestamp'
        • 'ms'
          'ms'
        • None
          None
        • None
          None
      • list (0 items)
        • DataFrame (0 columns, 0 rows)
          shape: (0, 0)
        • list (0 items)
          • PosixPath('data/ToyDataset')
            PosixPath('data/ToyDataset')
          • DatasetPaths
            DatasetPaths
            • PosixPath('data/ToyDataset')
              PosixPath('data/ToyDataset')
            • PosixPath('data/ToyDataset/downloads')
              PosixPath('data/ToyDataset/downloads')
            • PosixPath('data/ToyDataset/events')
              PosixPath('data/ToyDataset/events')
            • PosixPath('data/ToyDataset/precomputed_events')
              PosixPath('data/ToyDataset/precomputed_events')
            • PosixPath
              PosixPath('data/ToyDataset/precomputed_reading_measures')
            • PosixPath('data/ToyDataset/preprocessed')
              PosixPath('data/ToyDataset/preprocessed')
            • PosixPath('data/ToyDataset/raw')
              PosixPath('data/ToyDataset/raw')
            • PosixPath('data')
              PosixPath('data')
          • list (0 items)
            • list (0 items)
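
The log line "Using already downloaded and verified file" suggests that an existing archive is checked before it is downloaded again. A minimal sketch of such a check is given here, assuming that the 32-character hexadecimal string in the resource definition is an MD5 digest. The helper names md5_of_file and is_verified are hypothetical and not part of pymovements.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open('rb') as f:
        # Read fixed-size chunks so large archives need not fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path: Path, expected_md5: str) -> bool:
    """Return True if the file exists and matches the expected digest."""
    return path.is_file() and md5_of_file(path) == expected_md5
```

A downloader built on this idea would skip the network request whenever is_verified returns True for the target file.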

As we see from the download message, the dataset resource has been downloaded to a downloads directory.

You can get the path to this directory from the Dataset.paths.downloads attribute:

[6]:
dataset.paths.downloads
[6]:
PosixPath('data/ToyDataset/downloads')

You can also specify a custom name for the downloads directory during initialization:

[7]:
dataset_paths_3 = pm.DatasetPaths(root='data/', downloads='new_downloads')
dataset_3 = pm.Dataset('ToyDataset', path=dataset_paths_3)

dataset_3.paths.downloads
[7]:
PosixPath('data/ToyDataset/new_downloads')

By default, all archives are recursively extracted to Dataset.paths.raw:

[8]:
dataset.paths.raw
[8]:
PosixPath('data/ToyDataset/raw')

If you want to remove the downloaded archives after extraction to save some space, you can set remove_finished to True:

[9]:
dataset.extract(remove_finished=True)
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 307.23it/s]
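
To illustrate what recursive extraction with remove_finished could look like, here is a minimal sketch built on the standard-library zipfile module. It is not the pymovements implementation: the function extract_archive is hypothetical, it handles only zip archives, and unpacking a nested archive into a directory named after it is an assumption.

```python
import zipfile
from pathlib import Path


def extract_archive(archive: Path, destination: Path, remove_finished: bool = False) -> None:
    """Extract a zip archive and any zip archives nested inside it."""
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    if remove_finished:
        # Delete the archive once its contents have been extracted.
        archive.unlink()
    # Collect nested archives first, then unpack each one next to itself.
    for nested in list(destination.rglob('*.zip')):
        extract_archive(nested, nested.with_suffix(''), remove_finished)
```

The sketch assumes that the archive is stored outside the destination directory, as with the separate downloads and raw directories above; otherwise the archive would be picked up again as a nested archive.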

The remove_finished argument is also available for the Dataset.download() method:

[10]:
dataset.download(remove_finished=True)
INFO:pymovements.dataset.dataset:
        You are downloading the pymovements Toy Dataset. Please be aware that pymovements does not
        host or distribute any dataset resources and only provides a convenient interface to
        download the public dataset resources that were published by their respective authors.

        Please cite the referenced publication if you intend to use the dataset in your research.

Downloading http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/ to data/ToyDataset/downloads/pymovements-toy-dataset.zip
Checking integrity of pymovements-toy-dataset.zip
Extracting pymovements-toy-dataset.zip to data/ToyDataset/raw
100%|██████████| 23/23 [00:00<00:00, 305.97it/s]

Loading into memory

Public datasets are instances of the Dataset class and thus provide all of its functionality.

Hence, we can load the data into our working memory by using the common load() method:

[11]:
dataset.load()
[11]:
                                      
                                      Dataset
                                      • DatasetDefinition
                                        DatasetDefinition
                                        • None
                                          None
                                        • dict (0 items)
                                          • dict (1 items)
                                            • dict (4 items)
                                              • list (5 items)
                                                • 'timestamp'
                                                • 'x'
                                                • (3 more)
                                              • dict (5 items)
                                                • Float64
                                                  Float64
                                                • Float64
                                                  Float64
                                                • (3 more)
                                              • (2 more)
                                          • None
                                            None
                                          • Experiment
                                            Experiment
                                            • EyeTracker
                                              EyeTracker
                                              • None
                                                None
                                              • None
                                                None
                                              • None
                                                None
                                              • None
                                                None
                                              • 1000
                                                1000
                                              • None
                                                None
                                              • None
                                                None
                                            • 1000
                                              1000
                                            • Screen
                                              Screen
                                              • 68
                                                68
                                              • 30.2
                                                30.2
                                              • 1024
                                                1024
                                              • 'upper left'
                                                'upper left'
                                              • 38
                                                38
                                              • 1280
                                                1280
                                              • 15.599386487782953
                                                15.599386487782953
                                              • -15.599386487782953
                                                -15.599386487782953
                                              • 12.508044410882546
                                                12.508044410882546
                                              • -12.508044410882546
                                                -12.508044410882546
                                          • None
                                            None
                                          • dict (1 items)
                                            • 'trial_{text_id:d}_{page_id:d}.csv'
                                              'trial_{text_id:d}_{page_id:d}.csv'
                                          • dict (1 items)
                                            • dict (2 items)
                                              • <class 'int'>
                                                <class 'int'>
                                              • <class 'int'>
                                                <class 'int'>
                                          • True
                                            True
                                          • 'pymovements Toy Dataset'
                                            'pymovements Toy Dataset'
                                          • dict (0 items)
                                            • 'ToyDataset'
                                              'ToyDataset'
                                            • list (2 items)
                                              • 'x'
                                              • 'y'
                                            • None
                                              None
                                            • list (1 items)
                                              • ResourceDefinition
                                                • 'gaze'
                                                  'gaze'
                                                • 'pymovements-toy-dataset.zip'
                                                  'pymovements-toy-dataset.zip'
                                                • 'trial_{text_id:d}_{page_id:d}.csv'
                                                  'trial_{text_id:d}_{page_id:d}.csv'
                                                • dict (2 items)
                                                  • <class 'int'>
                                                    <class 'int'>
                                                  • <class 'int'>
                                                    <class 'int'>
                                                • '4da622457637a8181d86601fe17f3aa8'
                                                  '4da622457637a8181d86601fe17f3aa8'
                                                • str
                                                  'http://github.com/aeye-lab/pymovements-toy-dataset/zipball/6cb5d663317bf418cec0c9abe1dde5085a8a8ebd/'
                                            • 'timestamp'
                                              'timestamp'
                                            • 'ms'
                                              'ms'
                                            • None
                                              None
                                            • None
                                              None
                                          • list (0 items)
                                            • dict (1 items)
                                              • DataFrame (3 columns, 20 rows)
                                                shape: (20, 3)
                                                text_id  page_id  filepath
                                                i64      i64      str
                                                0        1        "aeye-lab-pymovements-toy-datas…
                                                0        2        "aeye-lab-pymovements-toy-datas…
                                                0        3        "aeye-lab-pymovements-toy-datas…
                                                0        4        "aeye-lab-pymovements-toy-datas…
                                                0        5        "aeye-lab-pymovements-toy-datas…
                                                …        …        …
                                                3        1        "aeye-lab-pymovements-toy-datas…
                                                3        2        "aeye-lab-pymovements-toy-datas…
                                                3        3        "aeye-lab-pymovements-toy-datas…
                                                3        4        "aeye-lab-pymovements-toy-datas…
                                                3        5        "aeye-lab-pymovements-toy-datas…
                                            • list (20 items)
                                              • Gaze
                                                • DataFrame (6 columns, 17223 rows)
                                                  shape: (17_223, 6)
                                                  time     stimuli_x  stimuli_y  text_id  page_id  pixel
                                                  i64      f64        f64        i64      i64      list[f64]
                                                  1988145  -1.0       -1.0       0        1        [206.8, 152.4]
                                                  1988146  -1.0       -1.0       0        1        [206.9, 152.1]
                                                  1988147  -1.0       -1.0       0        1        [207.0, 151.8]
                                                  1988148  -1.0       -1.0       0        1        [207.1, 151.7]
                                                  1988149  -1.0       -1.0       0        1        [207.0, 151.5]
                                                  …        …          …          …        …        …
                                                  2005363  -1.0       -1.0       0        1        [361.0, 415.4]
                                                  2005364  -1.0       -1.0       0        1        [358.0, 414.5]
                                                  2005365  -1.0       -1.0       0        1        [355.8, 413.8]
                                                  2005366  -1.0       -1.0       0        1        [353.1, 413.2]
                                                  2005367  -1.0       -1.0       0        1        [351.2, 412.9]
                                                • Events
                                                  Events
                                                  • DataFrame (6 columns, 0 rows)
                                                    shape: (0, 6)
                                                    text_id  page_id  name  onset  offset  duration
                                                    i64      i64      str   i64    i64     i64
                                                  • list (2 items)
                                                    • 'text_id'
                                                    • 'page_id'
                                                • list (2 items)
                                                  • 'text_id'
                                                  • 'page_id'
                                                • Experiment
                                                  Experiment
                                                  • EyeTracker
                                                    EyeTracker
                                                    • None
                                                      None
                                                    • None
                                                      None
                                                    • None
                                                      None
                                                    • None
                                                      None
                                                    • 1000
                                                      1000
                                                    • None
                                                      None
                                                    • None
                                                      None
                                                  • 1000
                                                    1000
                                                  • Screen
                                                    Screen
                                                    • 68
                                                      68
                                                    • 30.2
                                                      30.2
                                                    • 1024
                                                      1024
                                                    • 'upper left'
                                                      'upper left'
                                                    • 38
                                                      38
                                                    • 1280
                                                      1280
                                                    • 15.599386487782953
                                                      15.599386487782953
                                                    • -15.599386487782953
                                                      -15.599386487782953
                                                    • 12.508044410882546
                                                      12.508044410882546
                                                    • -12.508044410882546
                                                      -12.508044410882546
                                              • Gaze
                                                • DataFrame (6 columns, 29799 rows)
                                                  shape: (29_799, 6)
                                                  time     stimuli_x  stimuli_y  text_id  page_id  pixel
                                                  i64      f64        f64        i64      i64      list[f64]
                                                  2008305  -1.0       -1.0       0        2        [141.4, 153.6]
                                                  2008306  -1.0       -1.0       0        2        [141.1, 153.2]
                                                  2008307  -1.0       -1.0       0        2        [140.7, 152.8]
                                                  2008308  -1.0       -1.0       0        2        [140.6, 152.7]
                                                  2008309  -1.0       -1.0       0        2        [140.5, 152.6]
                                                  …        …          …          …        …        …
                                                  2038099  -1.0       -1.0       0        2        [273.8, 773.8]
                                                  2038100  -1.0       -1.0       0        2        [273.8, 774.1]
                                                  2038101  -1.0       -1.0       0        2        [273.9, 774.5]
                                                  2038102  -1.0       -1.0       0        2        [274.0, 774.4]
                                                  2038103  -1.0       -1.0       0        2        [274.0, 773.9]
                                                • Events
                                                  • DataFrame (6 columns, 0 rows)
                                                    shape: (0, 6)
                                                    text_id  page_id  name  onset  offset  duration
                                                    i64      i64      str   i64    i64     i64
                                                  • list (2 items)
                                                    • 'text_id'
                                                    • 'page_id'
                                                • list (2 items)
                                                  • 'text_id'
                                                  • 'page_id'
                                                • Experiment
                                                  • EyeTracker
                                                    • None
                                                    • None
                                                    • None
                                                    • None
                                                    • 1000
                                                    • None
                                                    • None
                                                  • 1000
                                                  • Screen
                                                    • 68
                                                    • 30.2
                                                    • 1024
                                                    • 'upper left'
                                                    • 38
                                                    • 1280
                                                    • 15.599386487782953
                                                    • -15.599386487782953
                                                    • 12.508044410882546
                                                    • -12.508044410882546
                                              • (18 more)
                                            • PosixPath('data/ToyDataset')
                                            • DatasetPaths
                                              • PosixPath('data/ToyDataset')
                                              • PosixPath('data/ToyDataset/downloads')
                                              • PosixPath('data/ToyDataset/events')
                                              • PosixPath('data/ToyDataset/precomputed_events')
                                              • PosixPath('data/ToyDataset/precomputed_reading_measures')
                                              • PosixPath('data/ToyDataset/preprocessed')
                                              • PosixPath('data/ToyDataset/raw')
                                              • PosixPath('data')
                                            • list (0 items)
                                              • list (0 items)
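
The path entries in the output above all sit below a single dataset directory. As a rough illustration of that layout, the following sketch rebuilds the same paths with `pathlib` only. The helper `default_layout` is hypothetical and is not part of pymovements; the subdirectory names are copied from the output above, and nothing here claims to reproduce how `DatasetPaths` is implemented.

```python
from pathlib import Path


def default_layout(root, dataset_name):
    """Hypothetical helper: rebuild the directory layout seen in the output above."""
    dataset_dir = Path(root) / dataset_name
    # Subdirectory names copied from the DatasetPaths entries printed above.
    subdirs = (
        'downloads',
        'events',
        'precomputed_events',
        'precomputed_reading_measures',
        'preprocessed',
        'raw',
    )
    return {name: dataset_dir / name for name in subdirs}


layout = default_layout('data', 'ToyDataset')
for name, path in layout.items():
    print(f'{name}: {path}')
```

Changing either argument moves the whole tree, which mirrors what the `root` and `dataset` arguments of `DatasetPaths` did earlier in this tutorial.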

Let’s verify that we have correctly scanned the dataset files:

[12]:
dataset.fileinfo
[12]:
{'gaze': shape: (20, 3)
 ┌─────────┬─────────┬─────────────────────────────────┐
 │ text_id ┆ page_id ┆ filepath                        │
 │ ---     ┆ ---     ┆ ---                             │
 │ i64     ┆ i64     ┆ str                             │
 ╞═════════╪═════════╪═════════════════════════════════╡
 │ 0       ┆ 1       ┆ aeye-lab-pymovements-toy-datas… │
 │ 0       ┆ 2       ┆ aeye-lab-pymovements-toy-datas… │
 │ 0       ┆ 3       ┆ aeye-lab-pymovements-toy-datas… │
 │ 0       ┆ 4       ┆ aeye-lab-pymovements-toy-datas… │
 │ 0       ┆ 5       ┆ aeye-lab-pymovements-toy-datas… │
 │ …       ┆ …       ┆ …                               │
 │ 3       ┆ 1       ┆ aeye-lab-pymovements-toy-datas… │
 │ 3       ┆ 2       ┆ aeye-lab-pymovements-toy-datas… │
 │ 3       ┆ 3       ┆ aeye-lab-pymovements-toy-datas… │
 │ 3       ┆ 4       ┆ aeye-lab-pymovements-toy-datas… │
 │ 3       ┆ 5       ┆ aeye-lab-pymovements-toy-datas… │
 └─────────┴─────────┴─────────────────────────────────┘}
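
A quick completeness check can complement reading the table by eye. The sketch below assumes that the 20 files correspond to text ids 0 to 3 crossed with page ids 1 to 5; this is inferred from the table shape and its visible rows, not stated by the dataset definition. The helper `missing_combinations` is hypothetical, and the `rows` list is a stand-in for what you would obtain from the real table, for example via the polars calls `dataset.fileinfo['gaze'].select('text_id', 'page_id').rows()`.

```python
from itertools import product


def missing_combinations(rows, text_ids, page_ids):
    """Hypothetical helper: list expected (text_id, page_id) pairs absent from rows."""
    present = set(rows)
    return [pair for pair in product(text_ids, page_ids) if pair not in present]


# Stand-in data (assumption): 4 texts with 5 pages each, matching shape (20, 3).
rows = [(text_id, page_id) for text_id in range(4) for page_id in range(1, 6)]

missing = missing_combinations(rows, text_ids=range(4), page_ids=range(1, 6))
print(f'{len(rows)} files scanned, {len(missing)} combinations missing')
```

An empty result means every expected text and page combination was found during scanning.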
                                                

Wonderful, all of our data has been downloaded and loaded successfully!

What you have learned in this tutorial:

  • how to initialize a public dataset

  • how to download and extract dataset resources

  • how to customize the default directory structure

  • how to load the dataset into your working memory