Dataset specs
NeuRepTrace dataset specs move study-specific path, participant, and role conventions out of Python modules and into a versioned YAML or JSON file. Loader code still performs interpretation: MATLAB parsing, array indexing, metadata joins, feature extraction, MNE object creation, and shape validation remain Python responsibilities.
The intended boundary is:
- the spec describes where files are and how those files map to logical roles;
- loader adapters read those files and normalize them to NeuRepTrace inputs;
- decoding, calibration, temporal generalization, observations, and result tables stay in NeuRepTrace workflows.
Example
schema_version: 1
dataset:
  id: pymegdec_meg
  root: "${PYMEGDEC_DATA_DIR}"
  format: matlab_struct
participants:
  ids: ["1-4", 6, 8, 9, 10, "13-27"]
  files:
    main: "Part{participant}Data.mat"
    cue: "Part{participant}CueData.mat"
roles:
  train:
    file_role: main
  validation:
    file_role: cue
matlab:
  variable: data
  index_path: [0]
features:
  data_field: trial
  time_field: time
  label_field: trialinfo
  output: trials_channels_time
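To make the mapping from logical roles to files concrete, here is a minimal Python sketch of how a role could resolve to a path for one participant. It is not NeuRepTrace's implementation: the function name resolve_role_path, the dictionary form of the spec, and the placeholder directory are all assumptions made for illustration.

```python
import os
from pathlib import Path

# Hypothetical in-memory form of part of the example spec,
# roughly what a YAML parser would return.
spec = {
    "dataset": {"root": "${PYMEGDEC_DATA_DIR}"},
    "participants": {
        "files": {
            "main": "Part{participant}Data.mat",
            "cue": "Part{participant}CueData.mat",
        }
    },
    "roles": {
        "train": {"file_role": "main"},
        "validation": {"file_role": "cue"},
    },
}


def resolve_role_path(spec, role, participant):
    """Return the path a logical role points to for one participant (illustrative)."""
    # Expand ${VAR} references in the dataset root from the environment.
    root = Path(os.path.expandvars(spec["dataset"]["root"]))
    # A logical role names a file role, which names a filename template.
    file_role = spec["roles"][role]["file_role"]
    template = spec["participants"]["files"][file_role]
    return root / template.format(participant=participant)


os.environ["PYMEGDEC_DATA_DIR"] = "/data/pymegdec"  # placeholder location
train_path = resolve_role_path(spec, "train", 6)
validation_path = resolve_role_path(spec, "validation", 6)
print(train_path.name)       # Part6Data.mat
print(validation_path.name)  # Part6CueData.mat
```

The sketch only builds paths. Reading the MATLAB struct and extracting arrays remains the loader adapter's job.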
Use participant_number in file templates when numeric formatting is required:
participants:
  ids: ["01-03"]
  files:
    epochs: "sub-{participant_number:02d}_epo.fif"
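The template uses Python's standard format-spec syntax, so its effect can be checked with plain str.format. The snippet below is only an illustration; converting the ID with int() is an assumption about how a numeric value would be derived, not a description of NeuRepTrace internals.

```python
template = "sub-{participant_number:02d}_epo.fif"

# The :02d format spec requires an integer, so the string ID "03"
# is converted first (assumed conversion, for illustration only).
filename = template.format(participant_number=int("03"))
print(filename)  # sub-03_epo.fif
```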
Validation
Validate a spec and the files it resolves to:
neureptrace dataset-spec validate examples/pymegdec/dataset.yml
Validate only the schema and path templates, without requiring private data files to exist:
neureptrace dataset-spec validate examples/pymegdec/dataset.yml --no-check-exists
Write a CSV validation report:
neureptrace dataset-spec validate examples/pymegdec/dataset.yml --report-out dataset_validation.csv
Inspect the concrete participant-file table:
neureptrace dataset-spec list-files examples/pymegdec/dataset.yml
neureptrace dataset-spec list-files examples/pymegdec/dataset.yml --format json
The standalone entry point is equivalent:
neureptrace-dataset-spec validate examples/pymegdec/dataset.yml
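For scripted checks, for example in CI, the validate invocations above can be assembled from Python. This is a sketch under assumptions: the helper name build_validate_command is hypothetical, it uses only the flags shown above, and it assumes that --no-check-exists and --report-out may be combined.

```python
def build_validate_command(spec_path, check_exists=True, report_out=None):
    """Build an argument list for `neureptrace dataset-spec validate` (illustrative)."""
    cmd = ["neureptrace", "dataset-spec", "validate", str(spec_path)]
    if not check_exists:
        # Validate only the schema and path templates.
        cmd.append("--no-check-exists")
    if report_out is not None:
        # Also write a CSV validation report.
        cmd += ["--report-out", str(report_out)]
    return cmd


cmd = build_validate_command(
    "examples/pymegdec/dataset.yml",
    check_exists=False,
    report_out="dataset_validation.csv",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd). How the command signals a failed validation (for example, through its exit code) is not stated here and should be checked before relying on it.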
Current schema
The current schema version is 1.
Required top-level sections:
- schema_version: currently 1.
- dataset: contains id, root, and format.
- participants: contains ids and files.
Optional top-level sections:
- roles: maps logical roles such as train and validation to entries in participants.files. When omitted, every file role becomes its own logical role.
- matlab: MATLAB loader options. The included MATLAB adapter supports variable, index_path, squeeze_first_element, squeeze_me, and struct_as_record.
- features, decoding, and outputs: declarative workflow hints for downstream adapters and scripts.
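The default for an omitted roles section can be pictured with a short Python sketch. The function name effective_roles is hypothetical, and the shape of the generated entries simply mirrors the file_role form used in the example spec. The real loader may represent this differently.

```python
def effective_roles(spec):
    """Return the role mapping, defaulting to one logical role per file role."""
    roles = spec.get("roles")
    if roles:
        return roles
    # No roles section: each entry in participants.files becomes its own role.
    return {name: {"file_role": name} for name in spec["participants"]["files"]}


spec_without_roles = {
    "participants": {
        "files": {
            "main": "Part{participant}Data.mat",
            "cue": "Part{participant}CueData.mat",
        }
    }
}
print(effective_roles(spec_without_roles))
# {'main': {'file_role': 'main'}, 'cue': {'file_role': 'cue'}}
```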
Participant IDs may be integers, strings, compact ranges such as "1-4", or mapping forms such as {range: ["08", "10"]} and {id: "sub-01"}.
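As a rough illustration of these forms, the following Python sketch expands a mixed ids list into a flat list. It makes several assumptions that are not confirmed by the spec: IDs are returned as strings, ranges are inclusive, and a zero-padded start bound such as "08" keeps its width across the range. The function names are hypothetical.

```python
import re

# Matches compact numeric ranges such as "1-4" or "01-03".
_RANGE = re.compile(r"^(\d+)-(\d+)$")


def _expand_range(start, stop):
    """Expand an inclusive range, keeping zero padding from the start bound."""
    width = len(start) if start.startswith("0") else 0
    return [str(n).zfill(width) for n in range(int(start), int(stop) + 1)]


def expand_ids(entries):
    """Expand integers, strings, compact ranges, and mapping forms (illustrative)."""
    ids = []
    for entry in entries:
        if isinstance(entry, int):
            ids.append(str(entry))
        elif isinstance(entry, dict) and "range" in entry:
            start, stop = entry["range"]
            ids.extend(_expand_range(str(start), str(stop)))
        elif isinstance(entry, dict) and "id" in entry:
            ids.append(str(entry["id"]))
        elif isinstance(entry, str):
            match = _RANGE.match(entry)
            if match:
                ids.extend(_expand_range(*match.groups()))
            else:
                ids.append(entry)  # plain string ID, used as-is
        else:
            raise TypeError(f"unsupported participant ID entry: {entry!r}")
    return ids


print(expand_ids(["1-4", 6, {"range": ["08", "10"]}, {"id": "sub-01"}]))
# ['1', '2', '3', '4', '6', '08', '09', '10', 'sub-01']
```

In this sketch the mapping form {id: "sub-01"} is what lets an ID containing a hyphen bypass range parsing, although a string like "sub-01" would not match the numeric range pattern anyway.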