The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting

Schroeder, F, Fairclough, S (ORCID: 0000-0002-7850-5688), Dehais, F and Richins, M (2025) The impact of cross-validation choices on pBCI classification metrics: lessons for transparent reporting. Frontiers in Neuroergonomics, 6.

Text: The impact of cross validation choices on pBCI classification metrics lessons for transparent reporting.pdf - Published Version (3MB)
Available under License Creative Commons Attribution.
Abstract

Neuroadaptive technologies are a type of passive brain-computer interface (pBCI) that aim to incorporate implicit user-state information into human-machine interactions by monitoring neurophysiological signals. Evaluating machine learning and signal processing approaches represents a core aspect of research into neuroadaptive technologies. These evaluations are often conducted offline under controlled laboratory conditions, where exhaustive analyses are possible. However, the manner in which classifiers are evaluated offline has been shown to affect reported accuracy levels, potentially biasing conclusions. In the current study, we investigated one of these sources of bias: the choice of cross-validation scheme, which is often not reported in sufficient detail. Across three independent electroencephalography (EEG) n-back datasets and 74 participants, we show how metrics and conclusions based on the same data can diverge under different cross-validation choices. A comparison of cross-validation schemes whose train and test subset boundaries either respect or ignore the block structure of data collection illustrated how the relative performance of classifiers varies significantly with the evaluation method used. By computing bootstrapped 95% confidence intervals of differences across datasets, we showed that the classification accuracies of Riemannian minimum distance to mean (RMDM) classifiers may differ by up to 12.7%, while those of a Filter Bank Common Spatial Pattern (FBCSP) based linear discriminant analysis (LDA) may differ by up to 30.4%. These differences across cross-validation implementations may affect the conclusions presented in research papers, which can complicate efforts to foster reproducibility. Our results exemplify why detailed reporting of data-splitting procedures should become common practice.
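
The abstract contrasts cross-validation splits that respect the block structure of data collection with splits that ignore it, and reports bootstrapped confidence intervals of the resulting accuracy differences. The following is a minimal, hypothetical Python sketch of that general idea, not the authors' pipeline: it uses simulated block-structured data, a plain logistic regression as a stand-in for the paper's RMDM and FBCSP+LDA classifiers, and scikit-learn's shuffled KFold versus GroupKFold as examples of block-ignorant and block-aware schemes.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GroupKFold, KFold, cross_val_score

    rng = np.random.default_rng(0)

    # Simulate 6 recording blocks of 40 trials each. As in a blocked n-back
    # design, every trial within a block shares the same workload label, and
    # each block carries its own additive offset (slow non-stationary drift).
    n_blocks, trials_per_block, n_features = 6, 40, 8
    blocks = np.repeat(np.arange(n_blocks), trials_per_block)
    y = blocks % 2                                   # label constant per block
    block_drift = rng.normal(scale=1.5, size=(n_blocks, n_features))
    X = (rng.normal(size=(blocks.size, n_features))  # trial-level noise
         + 0.4 * y[:, None]                          # weak genuine class signal
         + block_drift[blocks])                      # label-confounded drift

    clf = LogisticRegression(max_iter=1000)

    # Block-ignorant scheme: shuffled k-fold mixes trials from the same block
    # across train and test subsets, so the classifier can exploit the drift.
    acc_shuffled = cross_val_score(
        clf, X, y, cv=KFold(n_splits=6, shuffle=True, random_state=0))

    # Block-aware scheme: GroupKFold keeps whole blocks together, so each
    # test block's drift is never seen during training.
    acc_blockwise = cross_val_score(
        clf, X, y, cv=GroupKFold(n_splits=6), groups=blocks)

    print(f"shuffled k-fold accuracy: {acc_shuffled.mean():.3f}")
    print(f"block-wise accuracy:      {acc_blockwise.mean():.3f}")

    # Bootstrapped 95% confidence interval of the accuracy difference between
    # the two schemes (a toy analogue of the paper's across-dataset intervals).
    boot = (rng.choice(acc_shuffled, size=(10_000, 6)).mean(axis=1)
            - rng.choice(acc_blockwise, size=(10_000, 6)).mean(axis=1))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"95% CI of accuracy difference: [{lo:.3f}, {hi:.3f}]")

Because the simulated block drift correlates with the labels, the shuffled scheme reports an inflated accuracy relative to the block-wise scheme, illustrating why reporting which splitting procedure was used matters for interpreting pBCI results.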

Item Type: Article
Uncontrolled Keywords: EEG; cross-validation; electroencephalography; non-stationarity; pBCI; passive Brain-Computer Interfaces; workload; 46 Information and Computing Sciences; 4608 Human-Centred Computing; Networking and Information Technology R&D (NITRD)
Subjects: B Philosophy. Psychology. Religion > BF Psychology
Divisions: Psychology (from Sep 2019)
Publisher: Frontiers Media
Date of acceptance: 28 May 2025
Date of first compliant Open Access: 28 August 2025
Date Deposited: 28 Aug 2025 10:06
Last Modified: 28 Aug 2025 10:15
DOI or ID number: 10.3389/fnrgo.2025.1582724
URI: https://researchonline.ljmu.ac.uk/id/eprint/27010