Position paper: advocating for a structured methodology in developing data-driven predictive models for healthcare – evidence from a large-scale national study

Agius, S, Cassar, V, Magri, C, Khan, W (ORCID: 0000-0002-7511-3873) and Topham, L (ORCID: 0000-0002-6689-7944) (2025) Position paper: advocating for a structured methodology in developing data-driven predictive models for healthcare – evidence from a large-scale national study. Health and Technology. ISSN 2190-7188

Position paper advocating for a structured methodology.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

Background: Despite the growing adoption of predictive models in healthcare, the development process is often inconsistent and lacks methodological rigour. Many models are created ad hoc, without transparent handling of missing data, proper validation, or alignment with clinical workflows. These shortcomings have undermined trust, reproducibility, and generalisability, especially in high-stakes environments like emergency care.

Objectives: This position paper aims to advocate for the adoption of structured, transparent, and reproducible methodologies in the development of predictive models for healthcare. Drawing on a large-scale national study of emergency department (ED) visits in Malta, the paper demonstrates that methodological discipline, guided by data science principles, clinical expertise and an understanding of human decision-making behaviour, leads to safer, more trustworthy, and clinically relevant models.

Methods: Using over 32 million data points from 650,000 ED visits across six years, the study employed a structured modelling pipeline that integrated clinical and administrative data sources. The methodology included Cognitive Task Analysis (CTA) to map triage decision-making, rigorous feature engineering based on clinical workflows, handling of missing data through informed strategies, and robust model validation using XGBoost with stratified cross-validation and calibration analysis. Importantly, domain experts were involved throughout the development lifecycle to ensure clinical relevance and interpretability.

Results: The structured methodology enabled the development of predictive models that reflected the real-world complexity of ED triage, achieved strong performance, and gained clinician acceptance. The models aligned with staged clinical decision-making and were interpretable, trustworthy, and feasible to scale across healthcare environments. Through transparent documentation, robust calibration, and post-deployment monitoring protocols, the models demonstrated readiness for clinical integration.

Conclusions: The study confirms that structured, domain-informed methodologies are not only feasible at scale but essential for the responsible deployment of predictive models in healthcare. This approach ensures safety, fosters trust, promotes reproducibility and increases the likelihood that the model is used and adopted in real clinical settings. The authors call on researchers, developers, and regulators to establish such methodologies as the standard for AI and data-driven approaches in healthcare, particularly in high-stakes applications where poor model performance can lead to clinical harm.

Item Type: Article
Uncontrolled Keywords: 08 Information and Computing Sciences; 11 Medical and Health Sciences; 4202 Epidemiology; 4203 Health services and systems
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine
Divisions: Computer Science and Mathematics
Publisher: Springer
Date of acceptance: 31 July 2025
Date of first compliant Open Access: 18 August 2025
Date Deposited: 18 Aug 2025 10:25
Last Modified: 18 Aug 2025 10:30
DOI or ID number: 10.1007/s12553-025-01010-5
URI: https://researchonline.ljmu.ac.uk/id/eprint/26943