Optimized HDBSCAN clustering for reconstructing the merger history of the Milky Way: applications and limitations

Sante, A, Font, AS (ORCID: 0000-0001-8405-9883), Mistry, D (ORCID: 0000-0001-8300-109X), Ortega-Martorell, S (ORCID: 0000-0001-9927-3209) and Olier, I (ORCID: 0000-0002-5679-7501) (2026) Optimized HDBSCAN clustering for reconstructing the merger history of the Milky Way: applications and limitations. Monthly Notices of the Royal Astronomical Society, 547 (4). ISSN 0035-8711

stag503.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Clustering algorithms can help reconstruct the assembly history of the Milky Way by identifying groups of stars sharing similar properties in a kinematical or chemical abundance space. Despite being promising tools, their efficiency has not yet been fully tested in a realistic cosmological framework. We investigate the effectiveness of the HDBSCAN clustering algorithm in the recovery of the progenitors of Milky Way-type galaxies, using several systems from the Auriga suite of simulations. We develop a methodology aimed at improving the efficiency of the algorithm and avoiding fragmentation: first, we use a 12-dimensional feature space including a range of chemodynamical properties and stellar ages; second, we optimize the algorithm using information from the internal structure of the clusters of accreted stars. We show that our approach yields good results in terms of both purity and completeness of clusters for galaxies with different types of accretion histories. We also evaluate the decrease in efficiency due to contamination by in situ stars. For accreted-only haloes, the algorithm matches the recovered clusters well with the individual progenitors and is able to recover accretion events up to a redshift of accretion z_acc ∼ 3, whereas for accreted + in situ haloes it can only identify the more recent accretion events (z_acc < 1). However, the purity of the identified clusters remains remarkably high even in this case. Our results suggest that HDBSCAN can efficiently identify accreted debris in Milky Way-type galaxies in realistic conditions; however, it requires careful optimization to provide valid results.
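
The abstract evaluates clusters by purity and completeness but this record does not give the definitions used in the paper. The sketch below illustrates one common convention, stated here as an assumption: purity is the fraction of a cluster's stars that come from its dominant progenitor, and completeness is the fraction of that progenitor's stars that the cluster recovers. The function name `purity_and_completeness` and the toy labels are hypothetical; the only convention borrowed from HDBSCAN implementations is that unclustered (noise) points carry the label -1.

```python
from collections import Counter

NOISE = -1  # HDBSCAN implementations conventionally label noise points -1


def purity_and_completeness(cluster_labels, progenitor_ids):
    """Score each cluster against its dominant true progenitor.

    Assumed definitions (not taken from the paper):
      purity       = dominant-progenitor stars in cluster / stars in cluster
      completeness = dominant-progenitor stars in cluster / all stars of
                     that progenitor (including those left as noise)
    """
    if len(cluster_labels) != len(progenitor_ids):
        raise ValueError("labels and progenitor ids must have equal length")

    # Total number of stars belonging to each true progenitor.
    progenitor_sizes = Counter(progenitor_ids)

    # For every cluster, count how many stars come from each progenitor.
    members = {}
    for label, progenitor in zip(cluster_labels, progenitor_ids):
        if label == NOISE:
            continue
        members.setdefault(label, Counter())[progenitor] += 1

    scores = {}
    for label, counts in members.items():
        dominant, n_dominant = counts.most_common(1)[0]
        n_cluster = sum(counts.values())
        scores[label] = {
            "progenitor": dominant,
            "purity": n_dominant / n_cluster,
            "completeness": n_dominant / progenitor_sizes[dominant],
        }
    return scores


# Toy example: eight stars, two true progenitors "A" and "B".
cluster_labels = [0, 0, 0, 1, 1, -1, 1, 0]
progenitor_ids = ["A", "A", "B", "B", "B", "A", "B", "A"]
scores = purity_and_completeness(cluster_labels, progenitor_ids)
```

In this toy case cluster 0 is dominated by progenitor "A" but contains one "B" star, so its purity is below 1, while cluster 1 is pure but misses one "B" star. With simulated data, where every star particle has a known progenitor, the same bookkeeping could be applied to the labels returned by any HDBSCAN implementation.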

Item Type: Article
Uncontrolled Keywords: software: machine learning; software: simulations; Galaxy: halo; Galaxy: stellar content; 5101 Astronomical Sciences; 51 Physical Sciences; 0201 Astronomical and Space Sciences; Astronomy & Astrophysics; 5101 Astronomical sciences; 5107 Particle and high energy physics; 5109 Space sciences
Subjects: Q Science > QA Mathematics > QA76 Computer software
Q Science > QB Astronomy
Q Science > QC Physics
Divisions: Astrophysics Research Institute
Computer Science and Mathematics
Publisher: Oxford University Press
Date of acceptance: 11 March 2026
Date of first compliant Open Access: 29 April 2026
Date Deposited: 29 Apr 2026 13:45
Last Modified: 29 Apr 2026 13:45
DOI or ID number: 10.1093/mnras/stag503
URI: https://researchonline.ljmu.ac.uk/id/eprint/28492