Benchmarking
The first intended public benchmark is NOD-MEG/NOD-EEG. NeuRepTrace does not download large public datasets automatically; stage the relevant subject epochs and metadata locally first.
NOD Animate/Inanimate Pilot
Use the NOD preprocessed epochs file and the matching detailed events CSV. If
the metadata already contains stim_is_animate, pass it directly to
neureptrace.mne_time_decode or derive a named condition column first:
python -m neureptrace.metadata \
--events-csv data/nod/sub-01_events.csv \
--source-column stim_is_animate \
--positive-pattern "True" \
--label-column condition \
--positive-label animate \
--negative-label inanimate \
--out data/nod/sub-01_metadata_animate.csv
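As a rough illustration of what this derivation step amounts to, the following stdlib-only sketch maps a source column to a named label column. The helper name derive_condition and its exact-match rule are assumptions made for illustration; the real neureptrace.metadata module may interpret --positive-pattern differently (for example as a regular expression).

```python
import csv
import io


def derive_condition(rows, source_column, positive_pattern,
                     label_column, positive_label, negative_label):
    """Return copies of rows with a named label column added.

    Hypothetical helper: a row counts as positive when its source value
    equals positive_pattern exactly (an assumption of this sketch).
    """
    labelled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        is_positive = row[source_column].strip() == positive_pattern
        new_row[label_column] = positive_label if is_positive else negative_label
        labelled.append(new_row)
    return labelled


# Tiny in-memory stand-in for a detailed events CSV.
events_csv = io.StringIO("trial,stim_is_animate\n0,True\n1,False\n2,True\n")
rows = list(csv.DictReader(events_csv))
labelled = derive_condition(rows, "stim_is_animate", "True",
                            "condition", "animate", "inanimate")
print([row["condition"] for row in labelled])
```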
Then run the decoder:
python -m neureptrace.mne_time_decode \
--epochs data/nod/sub-01_epo.fif \
--metadata-csv data/nod/sub-01_metadata_animate.csv \
--label-column condition \
--group-column session \
--tmin -0.1 \
--tmax 0.8 \
--window-ms 20 \
--step-ms 10 \
--out results/nod_sub-01_animate.csv \
--observations-out results/nod_sub-01_animate_observations.csv \
--emission-mode both
The output CSV contains fold-wise accuracy, log loss, Brier score, and expected calibration error for each time window.
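For readers unfamiliar with these four metrics, here is a minimal sketch of how they can be computed for a binary task. The function name binary_metrics, the 0.5 decision threshold, and the equal-width confidence binning used for ECE are assumptions of this sketch; NeuRepTrace may bin or clip differently.

```python
import math


def binary_metrics(y_true, p_pos, n_bins=10):
    """Accuracy, log loss, Brier score, and ECE for binary probabilities.

    y_true holds 0/1 labels and p_pos holds the predicted P(class == 1).
    ECE here bins trials by top-class confidence in equal-width bins.
    """
    n = len(y_true)
    eps = 1e-15  # clip probabilities so log() stays finite
    correct, log_loss, brier = 0, 0.0, 0.0
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum conf, sum hits
    for y, p in zip(y_true, p_pos):
        hit = int((1 if p >= 0.5 else 0) == y)
        correct += hit
        p_clip = min(max(p, eps), 1.0 - eps)
        log_loss -= math.log(p_clip if y == 1 else 1.0 - p_clip)
        brier += (p - y) ** 2
        conf = max(p, 1.0 - p)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += hit
    # Weighted gap between mean confidence and accuracy within each bin.
    ece = sum(abs(hits / c - conf_sum / c) * c / n
              for c, conf_sum, hits in bins if c)
    return {"accuracy": correct / n, "log_loss": log_loss / n,
            "brier": brier / n, "ece": ece}


# Four synthetic trials, for illustration only.
metrics = binary_metrics([1, 0, 1, 0], [0.95, 0.15, 0.65, 0.75])
print(metrics)
```

Accuracy only checks the thresholded decision, while log loss and the Brier score penalize the confident miss on the last trial much more heavily than the three hits reward.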
For probability-driven model selection, tune inside each outer training fold with a proper scoring rule (log loss or Brier score) instead of accuracy:
python -m neureptrace.mne_time_decode ... --tune-hyperparameters --tuning-scoring neg_log_loss
python -m neureptrace.mne_time_decode ... --tune-hyperparameters --tuning-scoring neg_brier
The optional observations CSV keeps the held-out decoder probabilities before
they are reduced to accuracy or calibration summaries. Each row is one
trial/time-window observation with the fold, time, sample index, true class,
predicted class, confidence, probability assigned to the true class, and one
prob_class_* column per class. This is the output to use when treating decoder
traces as probabilistic evidence streams for HMMs or other temporal state
models.
Fit the first conservative temporal model from those observations:
python -m neureptrace.temporal_model \
results/nod_sub-01_animate_observations.csv \
--out-summary results/nod_sub-01_animate_temporal_model.csv \
--out-states results/nod_sub-01_animate_state_trace.csv \
--effect-window 0.1 0.8 \
--baseline-window -0.1 0.0 \
--n-permutations 100
This command treats each trial as a probability time series. The hidden states are the decoder classes, and the fitted parameter is the probability that the latent state persists between adjacent time bins. The summary reports persistence gain relative to a uniform-memory baseline, plus controls that shuffle time order, shuffle probability-label columns, and fit the same model in the pre-stimulus baseline window. The optional state trace CSV contains Viterbi states and posterior state probabilities for downstream sequence analyses.
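The idea of a single persistence parameter can be sketched as follows. This is not the package's implementation: it assumes the held-out decoder probabilities are used directly as emission scores, a symmetric transition matrix with one stay probability, a uniform initial state, and a simple grid search. The names sticky_log_likelihood and fit_stay are hypothetical, and the gain is measured here as a log-likelihood difference against the memoryless setting stay = 1/K.

```python
import math


def sticky_log_likelihood(trial_probs, stay):
    """Forward-algorithm log-likelihood of one trial under a sticky chain.

    trial_probs is a list of per-time-bin class probability lists.
    """
    k = len(trial_probs[0])
    switch = (1.0 - stay) / (k - 1)  # probability of moving to each other state
    alpha = [1.0 / k] * k
    log_lik = 0.0
    for t, probs in enumerate(trial_probs):
        if t > 0:
            total = sum(alpha)
            alpha = [stay * a + switch * (total - a) for a in alpha]
        alpha = [a * p for a, p in zip(alpha, probs)]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]  # rescale to avoid underflow
    return log_lik


def fit_stay(trials, grid_size=99):
    """Grid-search the stay probability; return (best_stay, gain)."""
    k = len(trials[0][0])
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]

    def total(stay):
        return sum(sticky_log_likelihood(trial, stay) for trial in trials)

    best_stay = max(grid, key=total)
    # Gain relative to a chain with no memory (uniform transitions).
    return best_stay, total(best_stay) - total(1.0 / k)


# Two synthetic trials whose evidence stays with one class throughout.
trials = [
    [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.75, 0.25]],
    [[0.3, 0.7], [0.2, 0.8], [0.35, 0.65], [0.1, 0.9]],
]
stay_hat, gain = fit_stay(trials)
print(stay_hat, gain)
```

Running the same fit on time-shuffled or label-shuffled probabilities, or on the pre-stimulus window, gives the control gains that the summary compares against.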
When the observations contain both calibrated and uncalibrated emissions, compare which emission mode gives cleaner state inference:
python -m neureptrace.emission_compare \
results/nod_sub-01_animate_temporal_model.csv \
--out-csv results/nod_sub-01_animate_emission_compare.csv \
--out-report results/nod_sub-01_animate_emission_compare.md
The comparison uses the temporal model's persistence gain. Its main value is the control margin: observed effect-window gain minus the strongest baseline-window, shuffled-time, or shuffled-label control gain. A positive calibrated-minus-uncalibrated margin is evidence that calibrated probabilities give cleaner state inference, not merely nicer reliability plots.
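The control-margin arithmetic is simple enough to show directly. The function name control_margin and all numbers below are invented for illustration; they are not results from any NOD run.

```python
def control_margin(observed_gain, control_gains):
    """Observed effect-window gain minus the strongest control gain."""
    return observed_gain - max(control_gains.values())


# Invented gains for the two emission modes (illustration only).
calibrated = control_margin(
    0.12, {"baseline_window": 0.01, "shuffled_time": 0.03, "shuffled_label": 0.02})
uncalibrated = control_margin(
    0.10, {"baseline_window": 0.02, "shuffled_time": 0.05, "shuffled_label": 0.01})

# A positive difference favors the calibrated emissions.
margin_difference = calibrated - uncalibrated
print(calibrated, uncalibrated, margin_difference)
```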
Ask the first NOD neuroscience question from the state traces:
python -m neureptrace.semantic_stages \
results/nod_sub-01_animate_state_trace.csv \
--out-time results/nod_sub-01_animate_semantic_stage_time.csv \
--out-stages results/nod_sub-01_animate_semantic_stages.csv \
--out-report results/nod_sub-01_animate_semantic_stages.md \
--posterior-threshold 0.6 \
--match-threshold 0.6 \
--min-duration 0.04
This asks whether the decoded semantic category for each trial becomes a stable latent state over contiguous time ranges. For NOD this corresponds to category staging, such as animate or inanimate evidence emerging over a post-stimulus interval. For navigation or planning data, use the same output shape with spatial bins or task states in place of semantic classes; the resulting stable stages become candidate trajectory segments that still need the temporal-model controls above.
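One way to picture stage extraction is as a run-length scan over a posterior trace. The sketch below assumes evenly spaced time bins and measures duration as the number of bins times the step; the function name stable_stages and that duration convention are assumptions, and the actual module may also apply the match threshold, which is omitted here.

```python
def stable_stages(times, posteriors, threshold=0.6, min_duration=0.04):
    """Return (start_time, end_time) for contiguous runs above threshold.

    A run is kept only if n_bins * step reaches min_duration.
    """
    step = times[1] - times[0]
    stages, start = [], None
    # The trailing 0.0 is a sentinel that closes a run ending at the last bin.
    for i, p in enumerate(list(posteriors) + [0.0]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if (i - start) * step >= min_duration - 1e-9:
                stages.append((times[start], times[i - 1]))
            start = None
    return stages


# Synthetic posterior for one state in one trial, 10 ms bins.
times = [i / 100 for i in range(10)]
posteriors = [0.5, 0.7, 0.8, 0.9, 0.7, 0.4, 0.65, 0.7, 0.3, 0.2]
stages = stable_stages(times, posteriors)
print(stages)
```

In this example the four-bin run is kept as a stage, while the later two-bin run is discarded because it is shorter than the minimum duration.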
Plot the single-subject result:
python -m neureptrace.plot_time_decode \
results/nod_sub-01_animate.csv \
--chance 0.5 \
--title "NOD sub-01 animate/inanimate" \
--out results/nod_sub-01_animate.png
After running several subjects, aggregate across subjects:
python -m neureptrace.results \
results/nod_sub-01_animate.csv \
results/nod_sub-02_animate.csv \
results/nod_sub-03_animate.csv \
--out results/nod_animate_summary.csv
Then plot the aggregate:
python -m neureptrace.plot_time_decode \
results/nod_animate_summary.csv \
--chance 0.5 \
--title "NOD animate/inanimate summary" \
--out results/nod_animate_summary.png
Manifest Runner
The same workflow can be run from a manifest:
python -m neureptrace.validate_manifest \
benchmarks/nod_animate_sub01.csv \
--report-out results/nod_animate_sub01_validation.csv
python -m neureptrace.benchmark \
benchmarks/nod_animate_sub01.csv \
--out-dir results/nod_animate_sub01 \
--aggregate-out results/nod_animate_sub01_summary.csv \
--plot-out results/nod_animate_sub01_summary.png \
--observation-dir results/nod_animate_sub01/observations \
--emission-mode both \
--chance 0.5
Manifest paths are resolved relative to the manifest file. The example manifest
expects staged files under data/nod/.
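A small sketch of what manifest-relative resolution means in practice. The helper name resolve_manifest_path, the manifest file name, and the entry are hypothetical; whether the runner also normalizes the resulting path is not stated here.

```python
from pathlib import Path


def resolve_manifest_path(manifest_file, entry):
    """Resolve a manifest entry against the manifest's own directory.

    Absolute entries are returned unchanged.
    """
    entry_path = Path(entry)
    if entry_path.is_absolute():
        return entry_path
    return Path(manifest_file).parent / entry_path


# Hypothetical manifest location and entry, for illustration only.
resolved = resolve_manifest_path("benchmarks/example_manifest.csv",
                                 "../data/nod/sub-01_epo.fif")
print(resolved)
```

The practical consequence is that a manifest can be run from any working directory, as long as the staged files keep the same position relative to the manifest file.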
Five-Subject Pilot
For a first multi-subject pass, use the same animate/inanimate task and run five subjects at once from a single manifest:
python -m neureptrace.validate_manifest \
benchmarks/nod_animate_first5.csv \
--report-out results/nod_animate_first5_validation.csv
python -m neureptrace.benchmark \
benchmarks/nod_animate_first5.csv \
--out-dir results/nod_animate_first5 \
--aggregate-out results/nod_animate_first5_summary.csv \
--plot-out results/nod_animate_first5_summary.png \
--chance 0.5
This keeps the experiment scope fixed (same preprocessing, same target labels, same window/grid parameters) and changes only the subject set.
Generate a compact Markdown report from the aggregate and subject-level result CSVs:
python -m neureptrace.report \
results/nod_animate_first5/summary.csv \
"results/nod_animate_first5/sub-*_time_decode.csv" \
--chance 0.5 \
--out results/nod_animate_first5/report.md
The report records the aggregate peak, baseline-window accuracy, effect-window accuracy, calibration metrics at the peak, and per-subject peaks.
Full NOD-EEG Pilot
After staging all available NOD-EEG preprocessed epoch files and detailed event files, validate the 19-subject manifest. This manifest uses 2 grouped folds because several subjects have only 2 unique session groups:
python -m neureptrace.validate_manifest \
benchmarks/nod_animate_all.csv \
--report-out results/nod_animate_all_validation.csv
Then run the same animate/inanimate benchmark over every staged NOD-EEG subject:
python -m neureptrace.benchmark \
benchmarks/nod_animate_all.csv \
--out-dir results/nod_animate_all \
--aggregate-out results/nod_animate_all/summary.csv \
--plot-out results/nod_animate_all/nod_animate_all_summary.png \
--chance 0.5
Make calibration explicit in the benchmark report:
python -m neureptrace.calibration \
results/nod_animate_all/summary.csv \
--out-report results/nod_animate_all/calibration_report.md
The calibration report orders models by effect-window ECE, then Brier score and log loss. Accuracy is included as context, but the report is designed to keep probability quality visible rather than treating it as a secondary metric.
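The ordering described above is a lexicographic sort. A minimal sketch, assuming summary rows with ece, brier, and log_loss fields; the decoder names and metric values are placeholders, not benchmark results.

```python
def rank_by_calibration(rows):
    """Order rows by ECE, breaking ties by Brier score, then log loss."""
    return sorted(rows, key=lambda r: (r["ece"], r["brier"], r["log_loss"]))


# Placeholder rows; accuracy is carried along but never used for ordering.
rows = [
    {"decoder": "decoder_a", "accuracy": 0.62, "ece": 0.04, "brier": 0.24, "log_loss": 0.67},
    {"decoder": "decoder_b", "accuracy": 0.64, "ece": 0.09, "brier": 0.25, "log_loss": 0.70},
    {"decoder": "decoder_c", "accuracy": 0.61, "ece": 0.04, "brier": 0.23, "log_loss": 0.66},
]
ranked = [row["decoder"] for row in rank_by_calibration(rows)]
print(ranked)
```

Note that the placeholder decoder with the highest accuracy ranks last, which is the behavior the report is designed to make visible.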
Run subject-level inference on the resulting subject CSVs:
python -m neureptrace.inference \
"results/nod_animate_all/sub-*_time_decode.csv" \
--chance 0.5 \
--n-permutations 10000 \
--cluster-alpha 0.05 \
--out-time results/nod_animate_all/inference_time.csv \
--out-clusters results/nod_animate_all/inference_clusters.csv
The inference command first averages folds within each subject, then runs a one-sided subject-level sign-flip test against chance at each time point. It also reports max-cluster-mass corrected p-values for contiguous above-threshold periods.
This larger run is the minimum useful scale for subject-level statistical testing. The 5-subject pilot is useful for smoke testing and early signal checking; reported claims should use the full staged manifest.
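The sign-flip test with max-cluster-mass correction described above can be sketched with the standard library alone. This is an illustration under assumptions, not the neureptrace.inference code: it takes one already fold-averaged accuracy time course per subject, uses a one-sample t-statistic, and forms clusters with a fixed t threshold rather than deriving the threshold from --cluster-alpha. All function names are hypothetical and the data are synthetic.

```python
import math
import random


def t_stat(values):
    """One-sample t-statistic against zero (0.0 if the variance is zero)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n) if var > 0 else 0.0


def cluster_masses(t_values, threshold):
    """Sum t-values within each contiguous run above the threshold."""
    masses, current = [], 0.0
    for t in t_values:
        if t > threshold:
            current += t
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses


def sign_flip_cluster_test(acc, chance=0.5, threshold=2.0,
                           n_permutations=1000, seed=0):
    """Return [(cluster_mass, corrected_p), ...] for observed clusters."""
    rng = random.Random(seed)
    diffs = [[a - chance for a in subject] for subject in acc]
    n_times = len(diffs[0])

    def t_series(signs):
        return [t_stat([s * d[t] for s, d in zip(signs, diffs)])
                for t in range(n_times)]

    observed = cluster_masses(t_series([1] * len(diffs)), threshold)
    null_max = []
    for _ in range(n_permutations):
        # Flip the sign of each subject's whole time course at random.
        signs = [rng.choice((-1, 1)) for _ in diffs]
        null_max.append(max(cluster_masses(t_series(signs), threshold),
                            default=0.0))
    return [(mass, (1 + sum(nm >= mass for nm in null_max)) / (1 + n_permutations))
            for mass in observed]


# Synthetic accuracies: 6 subjects, 2 chance-level bins, then 3 above chance.
acc = []
for i in range(6):
    baseline = [0.5 + 0.02 * (-1) ** (i + t) for t in range(2)]
    effect = [0.6 + 0.004 * (i - 2.5) for _ in range(3)]
    acc.append(baseline + effect)
results = sign_flip_cluster_test(acc)
print(results)
```

With six subjects there are only 64 distinct sign patterns, so the smallest achievable p-value is about 1/64. That floor is one reason small pilots cannot support strong subject-level claims.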
Second NOD-EEG Task
Use benchmarks/nod_superclass_canine_device_all.csv for a second public task
within the same staged NOD-EEG data. This task decodes ImageNet superclass
labels canine versus device, using only trials whose super_class exactly
matches one of those labels. The full staged set contains 7,293 canine trials
and 6,950 device trials across 19 subjects.
python -m neureptrace.validate_manifest \
benchmarks/nod_superclass_canine_device_all.csv \
--report-out results/nod_superclass_canine_device_all_validation.csv
python -m neureptrace.benchmark \
benchmarks/nod_superclass_canine_device_all.csv \
--out-dir results/nod_superclass_canine_device_all \
--aggregate-out results/nod_superclass_canine_device_all/summary.csv \
--plot-out results/nod_superclass_canine_device_all/summary.png \
--calibration-dir results/nod_superclass_canine_device_all/calibration \
--chance 0.5
python -m neureptrace.report \
results/nod_superclass_canine_device_all/summary.csv \
"results/nod_superclass_canine_device_all/sub-*_time_decode.csv" \
--chance 0.5 \
--out results/nod_superclass_canine_device_all/report.md
python -m neureptrace.inference \
"results/nod_superclass_canine_device_all/sub-*_time_decode.csv" \
--chance 0.5 \
--n-permutations 10000 \
--cluster-alpha 0.05 \
--out-time results/nod_superclass_canine_device_all/inference_time.csv \
--out-clusters results/nod_superclass_canine_device_all/inference_clusters.csv
python -m neureptrace.calibration \
results/nod_superclass_canine_device_all/summary.csv \
"results/nod_superclass_canine_device_all/calibration/*_calibration_bins.csv" \
--out-report results/nod_superclass_canine_device_all/calibration_report.md \
--out-bins results/nod_superclass_canine_device_all/reliability_bins.csv
This gives the paper a second semantic benchmark without changing dataset, preprocessing, CV logic, or reporting machinery.
Next NOD-EEG Task
Use benchmarks/nod_superclass_container_covering_all.csv for the next staged
task. This contrast decodes ImageNet superclass labels container versus
covering, using only trials whose super_class exactly matches one of those
labels. Both labels are inanimate, so this task tests category decoding beyond
the animate/inanimate distinction. The full staged set contains 5,215 container
trials and 4,809 covering trials across all 19 subjects.
python -m neureptrace.validate_manifest \
benchmarks/nod_superclass_container_covering_all.csv \
--report-out results/nod_superclass_container_covering_all_validation.csv
python -m neureptrace.benchmark \
benchmarks/nod_superclass_container_covering_all.csv \
--out-dir results/nod_superclass_container_covering_all \
--aggregate-out results/nod_superclass_container_covering_all/summary.csv \
--plot-out results/nod_superclass_container_covering_all/summary.png \
--calibration-dir results/nod_superclass_container_covering_all/calibration \
--chance 0.5 \
--resume
After the benchmark finishes, generate the same report, inference, calibration, and reliability outputs as for the canine/device superclass task:
python -m neureptrace.report \
results/nod_superclass_container_covering_all/summary.csv \
"results/nod_superclass_container_covering_all/sub-*_time_decode.csv" \
--out results/nod_superclass_container_covering_all/report.md \
--chance 0.5
python -m neureptrace.inference \
"results/nod_superclass_container_covering_all/sub-*_time_decode.csv" \
--chance 0.5 \
--n-permutations 10000 \
--out-time results/nod_superclass_container_covering_all/inference_time.csv \
--out-clusters results/nod_superclass_container_covering_all/inference_clusters.csv
python -m neureptrace.calibration \
results/nod_superclass_container_covering_all/summary.csv \
"results/nod_superclass_container_covering_all/calibration/*_calibration_bins.csv" \
--out-report results/nod_superclass_container_covering_all/calibration_report.md \
--out-bins results/nod_superclass_container_covering_all/reliability_bins.csv
python -m neureptrace.plot_calibration \
results/nod_superclass_container_covering_all/reliability_bins.csv \
--out results/nod_superclass_container_covering_all/reliability.png \
--time-window 0.1 0.8 \
--title "NOD container/covering calibration"
Decoder Comparison
NeuRepTrace supports standard probability-producing decoders with the decoder
manifest column or --decoder CLI option:
logistic: balanced multinomial logistic regression;
sparse_logistic: L1-regularized balanced logistic regression with the SAGA solver;
elastic_net_logistic: balanced logistic regression with SAGA elastic-net regularization;
ridge: calibrated balanced ridge classifier;
gaussian_nb: Gaussian naive Bayes;
lda: linear discriminant analysis;
shrinkage_lda: LDA with LSQR covariance shrinkage estimated inside each training fold;
linear_svm: calibrated balanced linear support vector machine.
Run the first-five-subject decoder comparison:
python -m neureptrace.benchmark \
benchmarks/nod_animate_decoders_first5.csv \
--out-dir results/nod_animate_decoders_first5 \
--aggregate-out results/nod_animate_decoders_first5/summary.csv \
--plot-out results/nod_animate_decoders_first5/summary.png \
--calibration-dir results/nod_animate_decoders_first5/calibration \
--chance 0.5
When a manifest contains a decoder column, result files are named like
sub-01_logistic_time_decode.csv, and aggregate summaries preserve the
decoder column rather than averaging decoders together.
Manifests can also pin fold-local feature preprocessing and nested decoder
tuning. The relevant columns are feature_preprocessor, pca_components,
tune_hyperparameters, tuning_cv_splits, tuning_scoring, and
tuning_c_grid. Supported feature preprocessors are none, pca,
pca-whiten, and anova-select. For anova-select, pca_components is the
percentage of highest-scoring ANOVA F-test features kept inside each training
fold. These settings are preserved in aggregate summaries, so tuned and
untuned variants are never averaged together accidentally.
Each benchmark also writes provenance.csv. This table has one row per run
condition and records decoder, emission mode, PCA mode/components, tuning grid,
compact selected-parameter counts, temporal mode/train window, and selected
plus effect-window accuracy, log_loss, brier, and ece values. Use it to
check whether an apparent gain changes accuracy as well as calibration metrics.
Run the tuned PCA-whitened logistic variant over all 19 staged subjects:
python -m neureptrace.benchmark \
benchmarks/nod_animate_logistic_tuned_pca_whiten_all.csv \
--out-dir results/nod_animate_logistic_tuned_pca_whiten_all \
--aggregate-out results/nod_animate_logistic_tuned_pca_whiten_all/summary.csv \
--plot-out results/nod_animate_logistic_tuned_pca_whiten_all/summary.png \
--calibration-dir results/nod_animate_logistic_tuned_pca_whiten_all/calibration \
--chance 0.5 \
--resume
That manifest uses feature_preprocessor=pca-whiten, pca_components=0.95,
tune_hyperparameters=true, a 2-fold inner CV, balanced-accuracy scoring, and
the C grid 0.01,0.1,1,10,100. PCA whitening and C tuning are fitted only on
the training split for each outer fold.
Run the tuned ANOVA feature-selection logistic variant over all 19 staged subjects:
python -m neureptrace.benchmark \
benchmarks/nod_animate_logistic_tuned_anova_select_all.csv \
--out-dir results/nod_animate_logistic_tuned_anova_select_all \
--aggregate-out results/nod_animate_logistic_tuned_anova_select_all/summary.csv \
--plot-out results/nod_animate_logistic_tuned_anova_select_all/summary.png \
--calibration-dir results/nod_animate_logistic_tuned_anova_select_all/calibration \
--chance 0.5 \
--resume
That manifest uses anova-select with an initial 20 percent setting, then
tunes both the selected feature percentile (10,20,40,60) and logistic C with
2-fold inner CV. This tests whether supervised fold-local denoising helps the
main animate/inanimate task without changing the outer held-out folds.
Run an explicit shrinkage-LDA variant over all 19 staged subjects:
python -m neureptrace.benchmark \
benchmarks/nod_animate_shrinkage_lda_all.csv \
--out-dir results/nod_animate_shrinkage_lda_all \
--aggregate-out results/nod_animate_shrinkage_lda_all/summary.csv \
--plot-out results/nod_animate_shrinkage_lda_all/summary.png \
--calibration-dir results/nod_animate_shrinkage_lda_all/calibration \
--chance 0.5 \
--resume
shrinkage_lda uses LinearDiscriminantAnalysis(solver="lsqr",
shrinkage="auto"). This is still fitted independently in each outer training
fold, but it regularizes covariance estimates that can be unstable in short
high-dimensional M/EEG windows.
Run the elastic-net logistic variant when dense logistic regression may be using too many weak noisy features but pure feature selection would be too aggressive:
python -m neureptrace.benchmark \
benchmarks/nod_animate_elastic_net_logistic_all.csv \
--out-dir results/nod_animate_elastic_net_logistic_all \
--aggregate-out results/nod_animate_elastic_net_logistic_all/summary.csv \
--plot-out results/nod_animate_elastic_net_logistic_all/summary.png \
--calibration-dir results/nod_animate_elastic_net_logistic_all/calibration \
--chance 0.5 \
--resume
The untuned manifest uses a fixed 50/50 L1/L2 mix. If
tune_hyperparameters=true is enabled for elastic_net_logistic, nested CV
searches both the C grid and the L1/L2 mixing grid 0.15,0.5,0.85.
Run the ridge classifier variant over all 19 staged subjects:
python -m neureptrace.benchmark \
benchmarks/nod_animate_ridge_all.csv \
--out-dir results/nod_animate_ridge_all \
--aggregate-out results/nod_animate_ridge_all/summary.csv \
--plot-out results/nod_animate_ridge_all/summary.png \
--calibration-dir results/nod_animate_ridge_all/calibration \
--chance 0.5 \
--resume
ridge uses a balanced RidgeClassifier with sigmoid calibration by default.
When tune_hyperparameters=true, nested CV searches alpha values
0.01,0.1,1,10,100.
Run the Gaussian naive Bayes variant over all 19 staged subjects:
python -m neureptrace.benchmark \
benchmarks/nod_animate_gaussian_nb_all.csv \
--out-dir results/nod_animate_gaussian_nb_all \
--aggregate-out results/nod_animate_gaussian_nb_all/summary.csv \
--plot-out results/nod_animate_gaussian_nb_all/summary.png \
--calibration-dir results/nod_animate_gaussian_nb_all/calibration \
--chance 0.5 \
--resume
gaussian_nb estimates class-conditional feature distributions independently.
When tune_hyperparameters=true, nested CV searches variance smoothing values
1e-12,1e-10,1e-9,1e-8,1e-6.
Run the slower tuned temporal train-window ensemble:
python -m neureptrace.benchmark \
benchmarks/nod_animate_logistic_tuned_temporal_ensemble_all.csv \
--out-dir results/nod_animate_logistic_tuned_temporal_ensemble_all \
--aggregate-out results/nod_animate_logistic_tuned_temporal_ensemble_all/summary.csv \
--plot-out results/nod_animate_logistic_tuned_temporal_ensemble_all/summary.png \
--calibration-dir results/nod_animate_logistic_tuned_temporal_ensemble_all/calibration \
--chance 0.5 \
--resume
This manifest combines --temporal-train-window 0.12 0.25 with
--tune-hyperparameters. For each outer fold, every model in the temporal
train-window ensemble is fitted on the outer training split and tunes C with
inner CV before its probabilities are averaged across train-window centers.
Compare raw decoder probabilities with temporal posterior smoothing without changing the decoder:
python -m neureptrace.benchmark \
benchmarks/nod_animate_logistic_temporal_smoothing_all.csv \
--out-dir results/nod_animate_logistic_temporal_smoothing_all \
--aggregate-out results/nod_animate_logistic_temporal_smoothing_all/summary.csv \
--plot-out results/nod_animate_logistic_temporal_smoothing_all/summary.png \
--calibration-dir results/nod_animate_logistic_temporal_smoothing_all/calibration \
--temporal-smoothing-dir results/nod_animate_logistic_temporal_smoothing_all/temporal_smoothing \
--temporal-smoothing-fit-window 0.1 0.8 \
--chance 0.5 \
--resume
When --temporal-smoothing-dir is supplied, the benchmark runner exports the
held-out probability observations, fits sticky forward-backward smoothing on
those same observations, writes smoothed posterior metrics, and aggregates raw
and smoothed rows into the same summary. The comparison appears as
emission_mode=calibrated versus
emission_mode=calibrated_temporal_posterior in both summary.csv and
provenance.csv.
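To make the smoothing step concrete, here is a minimal forward-backward sketch for a sticky chain. It assumes a symmetric transition matrix with a fixed stay probability, a uniform initial state, and decoder probabilities used directly as emission scores; the function name sticky_smooth is hypothetical, and the benchmark runner fits the stay probability rather than fixing it.

```python
def sticky_smooth(trial_probs, stay=0.9):
    """Posterior state probabilities for one trial via forward-backward."""
    k = len(trial_probs[0])
    switch = (1.0 - stay) / (k - 1)

    def transition(vec):
        # Multiply by the symmetric sticky transition matrix.
        total = sum(vec)
        return [stay * v + switch * (total - v) for v in vec]

    # Forward pass: filtered probabilities at each time bin.
    forward, alpha = [], [1.0 / k] * k
    for t, probs in enumerate(trial_probs):
        prior = alpha if t == 0 else transition(alpha)
        alpha = [a * p for a, p in zip(prior, probs)]
        norm = sum(alpha)
        alpha = [a / norm for a in alpha]
        forward.append(alpha)

    # Backward pass: fold in evidence from later time bins.
    posteriors, beta = [None] * len(trial_probs), [1.0] * k
    for t in range(len(trial_probs) - 1, -1, -1):
        post = [a * b for a, b in zip(forward[t], beta)]
        norm = sum(post)
        posteriors[t] = [p / norm for p in post]
        beta = transition([b * p for b, p in zip(beta, trial_probs[t])])
        scale = sum(beta)
        beta = [b / scale for b in beta]
    return posteriors


# Synthetic trial: the middle bin briefly favors the second class.
raw = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]
smoothed = sticky_smooth(raw)
print(smoothed[1])
```

In this example the raw middle bin favors the second class, but the smoothed posterior favors the first because both neighboring bins do.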
Run the sparse logistic variant when dense logistic regression may be using too many noisy sensor-time features:
python -m neureptrace.benchmark \
benchmarks/nod_animate_sparse_logistic_all.csv \
--out-dir results/nod_animate_sparse_logistic_all \
--aggregate-out results/nod_animate_sparse_logistic_all/summary.csv \
--plot-out results/nod_animate_sparse_logistic_all/summary.png \
--calibration-dir results/nod_animate_sparse_logistic_all/calibration \
--chance 0.5 \
--resume
Generate a decoder comparison report:
python -m neureptrace.report \
results/nod_animate_decoders_first5/summary.csv \
--out results/nod_animate_decoders_first5/report.md
For decoder comparisons, the report includes both raw effect-window accuracy and effect-window minus baseline-window accuracy. The baseline-corrected value is the more relevant number to report when a decoder shows pre-stimulus bias.
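The baseline correction is a difference of two window means. A short sketch, assuming inclusive window bounds and the same baseline and effect windows used elsewhere in this document; the helper name window_mean and the accuracy values are invented for illustration.

```python
def window_mean(times, values, start, stop):
    """Mean of values whose time falls inside [start, stop]."""
    selected = [v for t, v in zip(times, values) if start <= t <= stop]
    if not selected:
        raise ValueError("no time points inside the requested window")
    return sum(selected) / len(selected)


# Invented time course with a slight pre-stimulus bias above 0.5.
times = [-0.1, -0.05, 0.0, 0.1, 0.2, 0.3]
accuracy = [0.52, 0.53, 0.54, 0.60, 0.64, 0.62]

baseline = window_mean(times, accuracy, -0.1, 0.0)
effect = window_mean(times, accuracy, 0.1, 0.8)
corrected = effect - baseline
print(baseline, effect, corrected)
```

Here the raw effect-window accuracy sits 0.12 above chance, but only 0.09 of that survives once the pre-stimulus bias is subtracted.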
Create a calibration-aware decoder report and aggregate reliability bins:
python -m neureptrace.calibration \
results/nod_animate_decoders_first5/summary.csv \
"results/nod_animate_decoders_first5/calibration/*_calibration_bins.csv" \
--out-report results/nod_animate_decoders_first5/calibration_report.md \
--out-bins results/nod_animate_decoders_first5/reliability_bins.csv
Plot an effect-window reliability diagram from aggregate reliability bins:
python -m neureptrace.plot_calibration \
results/nod_animate_decoders_first5/reliability_bins.csv \
--out results/nod_animate_decoders_first5/reliability.png \
--time-window 0.1 0.8 \
--title "NOD animate/inanimate decoder calibration"
Run paired subject-level decoder statistics:
python -m neureptrace.paired_stats \
"results/nod_animate_decoders_first5/sub-*_time_decode.csv" \
--out-csv results/nod_animate_decoders_first5/paired_stats.csv \
--out-report results/nod_animate_decoders_first5/paired_stats.md \
--chance 0.5
Run the full 19-subject decoder comparison on a self-hosted GitHub Actions runner:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_decoders_all.csv \
-f output_dir=results/nod_animate_decoders_all \
-f n_permutations=10000
The workflow rewrites the committed manifest to use the supplied data_root,
runs logistic regression, LDA, and calibrated linear SVM across all 19 staged
NOD-EEG subjects, then uploads only compact summary, calibration, and inference
artifacts. The benchmark step uses --resume, so a rerun in the same output
directory skips completed subject-decoder rows whose result and calibration-bin
CSVs already exist. Use an absolute data_root when the self-hosted runner
keeps the NOD files outside the repository workspace.
The same workflow can run the tuned PCA-whitened logistic manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_logistic_tuned_pca_whiten_all.csv \
-f output_dir=results/nod_animate_logistic_tuned_pca_whiten_all \
-f n_permutations=10000
Or run the tuned ANOVA feature-selection logistic manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_logistic_tuned_anova_select_all.csv \
-f output_dir=results/nod_animate_logistic_tuned_anova_select_all \
-f n_permutations=10000
Or run the tuned temporal train-window ensemble:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_logistic_tuned_temporal_ensemble_all.csv \
-f output_dir=results/nod_animate_logistic_tuned_temporal_ensemble_all \
-f n_permutations=10000
Or run the shrinkage-LDA manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_shrinkage_lda_all.csv \
-f output_dir=results/nod_animate_shrinkage_lda_all \
-f n_permutations=10000
Or run the elastic-net logistic manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_elastic_net_logistic_all.csv \
-f output_dir=results/nod_animate_elastic_net_logistic_all \
-f n_permutations=10000
Or run the ridge manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_ridge_all.csv \
-f output_dir=results/nod_animate_ridge_all \
-f n_permutations=10000
Or run the Gaussian naive Bayes manifest:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_gaussian_nb_all.csv \
-f output_dir=results/nod_animate_gaussian_nb_all \
-f n_permutations=10000
Or run the raw-versus-smoothed posterior comparison:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_logistic_temporal_smoothing_all.csv \
-f output_dir=results/nod_animate_logistic_temporal_smoothing_all \
-f temporal_smoothing=true \
-f temporal_smoothing_fit_window_start=0.1 \
-f temporal_smoothing_fit_window_stop=0.8 \
-f temporal_smoothing_stay_grid_size=200 \
-f n_permutations=10000
Or run the sparse logistic L1 decoder:
gh workflow run nod-decoder-all.yml \
--repo IPS-Stuttgart/NeuRepTrace \
--ref main \
-f data_root=../data/nod \
-f manifest_csv=benchmarks/nod_animate_sparse_logistic_all.csv \
-f output_dir=results/nod_animate_sparse_logistic_all \
-f n_permutations=10000
After downloading or locating the workflow output directory, export only paper-safe artifacts into the compact export directory:
python scripts/export_paper_results.py \
results/nod_animate_decoders_all \
../NeuRepTrace-Paper/results/nod_animate_decoders_all \
--max-mb 50 \
--plot-reliability \
--reliability-window 0.1 0.8
Acceptance Target
The first useful milestone is not just above-chance accuracy. The benchmark should produce stable probability traces and calibration metrics that can be compared across subjects, sessions, and decoder variants.
For interrupted runs, rerun the same command with --resume. NeuRepTrace will keep
complete existing rows, regenerate missing rows, and rebuild the aggregate
summary and plot from the combined output set.
When --observation-dir is requested, resume mode also requires the matching
subject observation CSV before skipping a manifest row. This prevents a run from
appearing complete when metric summaries exist but the probability traces needed
for downstream state modeling are missing.
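The completeness rule can be sketched as a file-existence check. The helper name row_is_complete is hypothetical, and the file-name patterns are inferred from the globs used in this document (sub-*_time_decode.csv, *_calibration_bins.csv, *_observations.csv); the runner's real naming and checks may differ.

```python
import tempfile
from pathlib import Path


def row_is_complete(stem, out_dir, calibration_dir=None, observation_dir=None):
    """True only if every output requested for this row already exists."""
    required = [Path(out_dir) / f"{stem}_time_decode.csv"]
    if calibration_dir is not None:
        required.append(Path(calibration_dir) / f"{stem}_calibration_bins.csv")
    if observation_dir is not None:
        required.append(Path(observation_dir) / f"{stem}_observations.csv")
    return all(path.is_file() for path in required)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    obs_dir = out_dir / "observations"
    obs_dir.mkdir()
    (out_dir / "sub-01_time_decode.csv").write_text("time,accuracy\n")

    # Metrics exist, so a run without observations may skip this row.
    without_obs = row_is_complete("sub-01", out_dir)
    # The same row is incomplete once observations are requested.
    missing_obs = row_is_complete("sub-01", out_dir, observation_dir=obs_dir)

    (obs_dir / "sub-01_observations.csv").write_text("time,prob_class_animate\n")
    with_obs = row_is_complete("sub-01", out_dir, observation_dir=obs_dir)

print(without_obs, missing_obs, with_obs)
```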
After a manifest run with --observation-dir, fit the same temporal model across
all exported subject observations:
python -m neureptrace.temporal_model \
"results/nod_animate_all/observations/*_observations.csv" \
--out-summary results/nod_animate_all/temporal_model.csv \
--out-states results/nod_animate_all/state_trace.csv \
--n-permutations 100
Compare calibrated versus uncalibrated emissions:
python -m neureptrace.emission_compare \
results/nod_animate_all/temporal_model.csv \
--out-csv results/nod_animate_all/emission_compare.csv \
--out-report results/nod_animate_all/emission_compare.md
Then summarize category-conditioned stages:
python -m neureptrace.semantic_stages \
results/nod_animate_all/state_trace.csv \
--out-time results/nod_animate_all/semantic_stage_time.csv \
--out-stages results/nod_animate_all/semantic_stages.csv \
--out-report results/nod_animate_all/semantic_stages.md
Calibration-Aware Temporal-State Workflow
Use neureptrace.temporal_state_workflow to run the calibration-aware temporal-state pass
across the three staged NOD tasks: animate/inanimate, canine/device, and
container/covering. The workflow prepares runner-local manifests, validates all
19 NOD-EEG subjects, runs matched calibrated and uncalibrated emissions in the
same folds, fits sticky temporal models, compares controls, summarizes semantic
stages, and writes compact artifacts to the directory given by --compact-export-dir.
python -m neureptrace.temporal_state_workflow \
--data-root data/nod \
--out-dir results/temporal_state_inference \
--compact-export-dir ../NeuRepTrace-Compact-Results/results/temporal_state_inference \
--decoders logistic linear_svm \
--n-permutations 100
The top-level outputs are temporal_state_summary.csv, temporal_state_reliability.png,
temporal_state_evidence.md, and temporal_state_commands.md. The compact export intentionally
excludes large probability observation and state-trace CSVs.
For a smoke test, run one task and one subject with fewer permutations:
python -m neureptrace.temporal_state_workflow \
--data-root data/nod \
--out-dir results/temporal_state_smoke \
--task nod_animate \
--decoders linear_svm \
--max-subjects 1 \
--n-permutations 5 \
--stay-grid-size 20