RepTrace does not download large public datasets automatically and does not
store staged data in git. Stage datasets outside version control under
data/, then validate the benchmark manifest before running decoding.
The repository tracks only code, documentation, and lightweight benchmark
manifests. .gitignore excludes data/, results/, and large local data
formats such as *.fif, *.h5, and *.hdf5.
The first recommended public benchmark is NOD-EEG because it is smaller than NOD-MEG and provides preprocessed epoch data.
NOD-EEG is OpenNeuro dataset ds005811 (NOD-MEG is ds005810). For each subject,
RepTrace expects two staged files: sub-01_epo.fif and sub-01_events.csv.
The NOD datasets document detailed trial information under
derivatives/detailed_events/sub-&lt;subID&gt;_events.csv. In NOD-EEG, epoch data are
stored in the preprocessed epochs derivative area and named like
sub-01_eeg_epo.fif; stage them under the simpler local filenames shown below.
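The rename from the derivative filename to the staged filename only strips the
_eeg infix. A small sketch makes the mapping explicit (staged_name is an
illustrative helper, not part of RepTrace):

```python
import re

def staged_name(derivative_name: str) -> str:
    """Map a NOD-EEG derivative filename to the simpler staged filename,
    e.g. 'sub-01_eeg_epo.fif' -> 'sub-01_epo.fif'. Events CSVs pass through
    unchanged because they carry no '_eeg' infix."""
    return re.sub(r"_eeg(?=_epo\.fif$)", "", derivative_name)

print(staged_name("sub-01_eeg_epo.fif"))   # sub-01_epo.fif
print(staged_name("sub-01_events.csv"))    # sub-01_events.csv
```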
For a small first pilot, download only one subject’s epoch file and detailed
events file from OpenNeuro ds005811. The openneuro-py client supports
include filters, which avoids downloading the full dataset snapshot:
python -m pip install openneuro-py
openneuro-py download \
--dataset=ds005811 \
--target-dir=data/openneuro_ds005811_sub01 \
--include="derivatives/preprocessed/epochs/sub-01_eeg_epo.fif" \
--include="derivatives/detailed_events/sub-01_events.csv" \
--max-concurrent-downloads=2 \
--metadata-timeout=60
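The two include filters follow a fixed template per subject. A hypothetical
helper (not part of RepTrace or openneuro-py) can generate them for any
zero-padded subject ID, which keeps multi-subject scripts consistent:

```python
def include_paths(subject: str) -> list[str]:
    """Build the two --include filters for one subject's pilot download.
    `subject` is the zero-padded subject ID, e.g. '01'."""
    return [
        f"derivatives/preprocessed/epochs/sub-{subject}_eeg_epo.fif",
        f"derivatives/detailed_events/sub-{subject}_events.csv",
    ]

print(include_paths("01"))
```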
Then stage the files under the local layout expected by
benchmarks/nod_animate_sub01.csv:
mkdir -p data/nod
cp data/openneuro_ds005811_sub01/derivatives/preprocessed/epochs/sub-01_eeg_epo.fif \
data/nod/sub-01_epo.fif
cp data/openneuro_ds005811_sub01/derivatives/detailed_events/sub-01_events.csv \
data/nod/sub-01_events.csv
On Windows PowerShell, the staging commands are:
New-Item -ItemType Directory -Force -Path data\nod
Copy-Item data\openneuro_ds005811_sub01\derivatives\preprocessed\epochs\sub-01_eeg_epo.fif `
data\nod\sub-01_epo.fif
Copy-Item data\openneuro_ds005811_sub01\derivatives\detailed_events\sub-01_events.csv `
data\nod\sub-01_events.csv
The expected staged file sizes for the sub-01 pilot are about 215 MB for the epoch file and less than 1 MB for the events CSV.
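A quick sanity check against those expectations can catch truncated downloads
before validation. This is a sketch, not RepTrace functionality; the size
bounds are rough guesses derived from the sub-01 figures above:

```python
import tempfile
from pathlib import Path

def check_staged_subject(data_root, subject):
    """Return a list of problems with one subject's staged files (empty = OK).
    Size bounds are loose expectations, not hard requirements."""
    root = Path(data_root)
    epo = root / f"sub-{subject}_epo.fif"
    events = root / f"sub-{subject}_events.csv"
    problems = [f"missing: {p}" for p in (epo, events) if not p.is_file()]
    if epo.is_file() and epo.stat().st_size < 50 * 1024 * 1024:
        problems.append(f"suspiciously small epoch file: {epo}")
    if events.is_file() and events.stat().st_size > 1024 * 1024:
        problems.append(f"unexpectedly large events file: {events}")
    return problems

# Demo on an empty temporary directory: both files are reported missing.
demo_root = tempfile.mkdtemp()
problems = check_staged_subject(demo_root, "01")
print(problems)
```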
For a 5-subject pilot, the same commands can be repeated in a loop:
for sid in 01 02 03 04 05; do
openneuro-py download \
--dataset=ds005811 \
--target-dir="data/openneuro_ds005811_sub${sid}" \
--include="derivatives/preprocessed/epochs/sub-${sid}_eeg_epo.fif" \
--include="derivatives/detailed_events/sub-${sid}_events.csv" \
--max-concurrent-downloads=2 \
--metadata-timeout=60
mkdir -p data/nod
cp "data/openneuro_ds005811_sub${sid}/derivatives/preprocessed/epochs/sub-${sid}_eeg_epo.fif" \
"data/nod/sub-${sid}_epo.fif"
cp "data/openneuro_ds005811_sub${sid}/derivatives/detailed_events/sub-${sid}_events.csv" \
"data/nod/sub-${sid}_events.csv"
done
PowerShell equivalent:
foreach ($sid in 1..5) {
$s = "{0:D2}" -f $sid
& openneuro-py download `
--dataset=ds005811 `
--target-dir="data/openneuro_ds005811_sub$s" `
--include="derivatives/preprocessed/epochs/sub-$s`_eeg_epo.fif" `
--include="derivatives/detailed_events/sub-$s`_events.csv" `
--max-concurrent-downloads=2 `
--metadata-timeout=60
New-Item -ItemType Directory -Force -Path data\nod | Out-Null
Copy-Item "data\openneuro_ds005811_sub$s\derivatives\preprocessed\epochs\sub-$s`_eeg_epo.fif" `
"data\nod\sub-$s`_epo.fif"
Copy-Item "data\openneuro_ds005811_sub$s\derivatives\detailed_events\sub-$s`_events.csv" `
"data\nod\sub-$s`_events.csv"
}
The available NOD-EEG preprocessed epoch subjects are:
sub-01 sub-02 sub-03 sub-04 sub-05 sub-06 sub-07 sub-08 sub-09
sub-10 sub-11 sub-12 sub-13 sub-14 sub-24 sub-26 sub-27 sub-29 sub-30
After staging those 19 subjects, the expected data/nod/ footprint is about
3 GB: the first nine epoch files are about 215 MB each, and the remaining ten
epoch files are about 108 MB each.
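The ~3 GB figure follows directly from the per-file sizes:

```python
# Rough data/nod/ footprint for the 19-subject staging (events CSVs are
# under 1 MB each and negligible here).
first_nine_mb = 9 * 215   # sub-01..sub-09 epoch files
remaining_mb = 10 * 108   # the other ten epoch files
total_mb = first_nine_mb + remaining_mb
print(total_mb)  # 3015 MB, i.e. about 3 GB
```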
For IPS-Stuttgart runners, the private staged NOD-EEG share can be copied with
rclone over OwnCloud/WebDAV. The workflows use the repository secrets
NODEEG_WEBDAV_URL, NODEEG_DATA_KEY, and NODEEG_DATA_PASSWORD; local runs can
use the same environment variable names:
python scripts/download_nodeeg_data.py \
--data-root data/nod \
--require-subject-count 19
If the WebDAV share stores the staged files below a subdirectory, pass it with
--remote-path, for example:
python scripts/download_nodeeg_data.py \
--data-root data/nod \
--remote-path staged/nod \
--require-subject-count 19
The downloader copies only the flat files expected by RepTrace:
sub-*_epo.fif and sub-*_events.csv.
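That flat-file contract amounts to two glob patterns. A minimal sketch using
Python's fnmatch shows the filter (is_staged_file is illustrative, not the
downloader's actual implementation):

```python
from fnmatch import fnmatch

PATTERNS = ("sub-*_epo.fif", "sub-*_events.csv")

def is_staged_file(name: str) -> bool:
    """True if `name` matches one of the flat filenames RepTrace expects."""
    return any(fnmatch(name, pattern) for pattern in PATTERNS)

names = ["sub-01_epo.fif", "sub-01_events.csv", "dataset_description.json"]
print([n for n in names if is_staged_file(n)])
```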
Alternatively, use OpenNeuro’s browser download, the Deno-based OpenNeuro CLI, or DataLad/git-annex. The Deno CLI downloads a full dataset snapshot:
deno run -A jsr:@openneuro/cli download ds005811 data/openneuro_ds005811
After a full download, copy the same two files from the snapshot into
data/nod/.
Example target layout:
data/nod/sub-01_epo.fif
data/nod/sub-01_events.csv
data/nod/sub-02_epo.fif
data/nod/sub-02_events.csv
data/nod/sub-03_epo.fif
data/nod/sub-03_events.csv
data/nod/sub-04_epo.fif
data/nod/sub-04_events.csv
data/nod/sub-05_epo.fif
data/nod/sub-05_events.csv
data/nod/sub-06_epo.fif
data/nod/sub-06_events.csv
...
data/nod/sub-30_epo.fif
data/nod/sub-30_events.csv
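The full 19-subject layout can be generated from the subject list above; a
small sketch (SUBJECTS and expected_layout are illustrative helpers, not
RepTrace API):

```python
# The 19 NOD-EEG preprocessed-epoch subjects listed above.
SUBJECTS = ["01", "02", "03", "04", "05", "06", "07", "08", "09",
            "10", "11", "12", "13", "14", "24", "26", "27", "29", "30"]

def expected_layout(data_root: str = "data/nod") -> list[str]:
    """All 38 staged paths (one epoch file and one events CSV per subject)."""
    paths = []
    for sid in SUBJECTS:
        paths.append(f"{data_root}/sub-{sid}_epo.fif")
        paths.append(f"{data_root}/sub-{sid}_events.csv")
    return paths

print(len(expected_layout()))  # 38
```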
Then update benchmarks/nod_animate_sub01.csv if the staged filenames differ.
Run:
python -m reptrace.validate_manifest \
benchmarks/nod_animate_sub01.csv \
--report-out results/nod_animate_sub01_validation.csv
Only run the benchmark after validation passes:
python -m reptrace.benchmark \
benchmarks/nod_animate_sub01.csv \
--out-dir results/nod_animate_sub01 \
--aggregate-out results/nod_animate_sub01_summary.csv \
--plot-out results/nod_animate_sub01_summary.png \
--chance 0.5