RepTrace

Data Staging

RepTrace does not download large public datasets automatically and does not store staged data in git. Stage datasets under data/, which is excluded from version control, then validate the benchmark manifest before running decoding.

The repository tracks only code, documentation, and lightweight benchmark manifests. .gitignore excludes data/, results/, and large local data formats such as *.fif, *.h5, and *.hdf5.
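
The corresponding .gitignore entries look roughly like this (a sketch; the repository's actual file is authoritative):

```
data/
results/
*.fif
*.h5
*.hdf5
```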

NOD-EEG and NOD-MEG

The first recommended public benchmark is NOD-EEG because it is smaller than NOD-MEG and provides preprocessed epoch data.

For each subject, RepTrace expects two staged files: an epoch file (sub-<ID>_epo.fif) and a detailed events CSV (sub-<ID>_events.csv).

The NOD records document detailed trial information under derivatives/detailed_events/sub-<subID>_events.csv. In NOD-EEG, epoch data are stored in the preprocessed epochs derivative area under names like sub-01_eeg_epo.fif; stage both files under the simpler local filenames shown below.

For a small first pilot, download only one subject’s epoch file and detailed events file from OpenNeuro ds005811. The openneuro-py client supports --include filters, which avoid downloading the full dataset snapshot:

python -m pip install openneuro-py

openneuro-py download \
  --dataset=ds005811 \
  --target-dir=data/openneuro_ds005811_sub01 \
  --include="derivatives/preprocessed/epochs/sub-01_eeg_epo.fif" \
  --include="derivatives/detailed_events/sub-01_events.csv" \
  --max-concurrent-downloads=2 \
  --metadata-timeout=60

Then stage the files under the local layout expected by benchmarks/nod_animate_sub01.csv:

mkdir -p data/nod
cp data/openneuro_ds005811_sub01/derivatives/preprocessed/epochs/sub-01_eeg_epo.fif \
  data/nod/sub-01_epo.fif
cp data/openneuro_ds005811_sub01/derivatives/detailed_events/sub-01_events.csv \
  data/nod/sub-01_events.csv

On Windows PowerShell, the staging commands are:

New-Item -ItemType Directory -Force -Path data\nod
Copy-Item data\openneuro_ds005811_sub01\derivatives\preprocessed\epochs\sub-01_eeg_epo.fif `
  data\nod\sub-01_epo.fif
Copy-Item data\openneuro_ds005811_sub01\derivatives\detailed_events\sub-01_events.csv `
  data\nod\sub-01_events.csv

The expected staged file sizes for the sub-01 pilot are about 215 MB for the epoch file and less than 1 MB for the events CSV.
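
Before running validation, it can help to confirm both pilot files landed where the manifest expects them. A minimal sketch (check_subject is a hypothetical helper, not part of RepTrace):

```shell
# Hypothetical helper: succeeds only if both staged files exist
# for the given subject ID under the given data root.
check_subject() {
  local root="$1" sid="$2"
  [ -f "${root}/sub-${sid}_epo.fif" ] && [ -f "${root}/sub-${sid}_events.csv" ]
}

# Example: report the sub-01 pilot staging status.
if check_subject data/nod 01; then
  echo "sub-01 staged"
else
  echo "sub-01 incomplete" >&2
fi
```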

For a 5-subject pilot, the same commands can be repeated in a loop:

for sid in 01 02 03 04 05; do
  openneuro-py download \
    --dataset=ds005811 \
    --target-dir="data/openneuro_ds005811_sub${sid}" \
    --include="derivatives/preprocessed/epochs/sub-${sid}_eeg_epo.fif" \
    --include="derivatives/detailed_events/sub-${sid}_events.csv" \
    --max-concurrent-downloads=2 \
    --metadata-timeout=60

  mkdir -p data/nod
  cp "data/openneuro_ds005811_sub${sid}/derivatives/preprocessed/epochs/sub-${sid}_eeg_epo.fif" \
    "data/nod/sub-${sid}_epo.fif"
  cp "data/openneuro_ds005811_sub${sid}/derivatives/detailed_events/sub-${sid}_events.csv" \
    "data/nod/sub-${sid}_events.csv"
done

PowerShell equivalent:

foreach ($sid in 1..5) {
  $s = "{0:D2}" -f $sid
  & openneuro-py download `
    --dataset=ds005811 `
    --target-dir="data/openneuro_ds005811_sub$s" `
    --include="derivatives/preprocessed/epochs/sub-${s}_eeg_epo.fif" `
    --include="derivatives/detailed_events/sub-${s}_events.csv" `
    --max-concurrent-downloads=2 `
    --metadata-timeout=60

  New-Item -ItemType Directory -Force -Path data\nod | Out-Null
  Copy-Item "data\openneuro_ds005811_sub$s\derivatives\preprocessed\epochs\sub-${s}_eeg_epo.fif" `
    "data\nod\sub-${s}_epo.fif"
  Copy-Item "data\openneuro_ds005811_sub$s\derivatives\detailed_events\sub-${s}_events.csv" `
    "data\nod\sub-${s}_events.csv"
}

The available NOD-EEG preprocessed epoch subjects are:

sub-01 sub-02 sub-03 sub-04 sub-05 sub-06 sub-07 sub-08 sub-09
sub-10 sub-11 sub-12 sub-13 sub-14 sub-24 sub-26 sub-27 sub-29 sub-30

After staging those 19 subjects, the expected data/nod/ footprint is about 3 GB: the first nine epoch files are about 215 MB each, and the remaining ten epoch files are about 108 MB each.
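
A quick count of the flat staged files confirms the full staging. A sketch (count_staged is a hypothetical helper; pass the data root as the first argument):

```shell
# Hypothetical helper: count staged epoch and events files under a data root.
# Uses shell globbing rather than ls so the output is deterministic.
count_staged() {
  local root="${1:-data/nod}" f n_epo=0 n_csv=0
  for f in "$root"/sub-*_epo.fif; do [ -e "$f" ] && n_epo=$((n_epo + 1)); done
  for f in "$root"/sub-*_events.csv; do [ -e "$f" ] && n_csv=$((n_csv + 1)); done
  echo "epoch files: $n_epo, events files: $n_csv (expected 19 each)"
}

count_staged data/nod
```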

Alternative Download Methods

For IPS-Stuttgart runners, the private staged NOD-EEG share can be copied with rclone over OwnCloud/WebDAV. The workflows use the repository secrets NODEEG_WEBDAV_URL, NODEEG_DATA_KEY, and NODEEG_DATA_PASSWORD; local runs can use the same environment variable names:

python scripts/download_nodeeg_data.py \
  --data-root data/nod \
  --require-subject-count 19
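
For a local run, the same variable names can be exported before invoking the script. The values below are placeholders, not real endpoints or credentials:

```shell
# Placeholder values; substitute the real share URL and credentials.
export NODEEG_WEBDAV_URL="https://cloud.example.org/remote.php/webdav"
export NODEEG_DATA_KEY="share-key-placeholder"
export NODEEG_DATA_PASSWORD="share-password-placeholder"
```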

If the WebDAV share stores the staged files below a subdirectory, pass it with --remote-path, for example:

python scripts/download_nodeeg_data.py \
  --data-root data/nod \
  --remote-path staged/nod \
  --require-subject-count 19

The downloader copies only the flat files expected by RepTrace: sub-*_epo.fif and sub-*_events.csv.

OpenNeuro’s browser download, the Deno OpenNeuro CLI, or DataLad/git-annex can also be used. The Deno CLI command downloads a dataset snapshot:

deno run -A jsr:@openneuro/cli download ds005811 data/openneuro_ds005811

After a full download, copy the same two files from the snapshot into data/nod/.
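
For sub-01, that staging step might look like the sketch below (stage_sub01 is a hypothetical helper; the snapshot directory is whatever was passed to the download command above):

```shell
# Hypothetical helper: copy sub-01's two files from a ds005811 snapshot
# into the flat layout RepTrace expects.
stage_sub01() {
  local snap="${1:-data/openneuro_ds005811}" dest="${2:-data/nod}"
  mkdir -p "$dest"
  cp "$snap/derivatives/preprocessed/epochs/sub-01_eeg_epo.fif" "$dest/sub-01_epo.fif"
  cp "$snap/derivatives/detailed_events/sub-01_events.csv" "$dest/sub-01_events.csv"
}
```

Invoke it as, for example, stage_sub01 data/openneuro_ds005811 data/nod.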

Example target layout:

data/nod/sub-01_epo.fif
data/nod/sub-01_events.csv
data/nod/sub-02_epo.fif
data/nod/sub-02_events.csv
data/nod/sub-03_epo.fif
data/nod/sub-03_events.csv
data/nod/sub-04_epo.fif
data/nod/sub-04_events.csv
data/nod/sub-05_epo.fif
data/nod/sub-05_events.csv
data/nod/sub-06_epo.fif
data/nod/sub-06_events.csv
...
data/nod/sub-30_epo.fif
data/nod/sub-30_events.csv

Then update benchmarks/nod_animate_sub01.csv if the staged filenames differ.

Validate Before Decoding

Run:

python -m reptrace.validate_manifest \
  benchmarks/nod_animate_sub01.csv \
  --report-out results/nod_animate_sub01_validation.csv

Only run the benchmark after validation passes:

python -m reptrace.benchmark \
  benchmarks/nod_animate_sub01.csv \
  --out-dir results/nod_animate_sub01 \
  --aggregate-out results/nod_animate_sub01_summary.csv \
  --plot-out results/nod_animate_sub01_summary.png \
  --chance 0.5