An Automated Pipeline for Variability Detection and Classification for the Small Telescopes Installed at the Liverpool Telescope

McWhirter, PR

McWhirter, PR (2018) An Automated Pipeline for Variability Detection and Classification for the Small Telescopes Installed at the Liverpool Telescope. Doctoral thesis, Liverpool John Moores University.

Full text: 2018mcwhirterphd.pdf (Published Version, 7MB)
2018mcwhirterphdinternal.pdf (Submitted Version, access restricted, 7MB)

Abstract

The Small Telescopes at the Liverpool Telescope (STILT) is an almost decade-old project to install a number of wide-field optical instruments, named Skycams, on the Liverpool Telescope to monitor weather conditions and yield useful photometry on bright astronomical sources. The motivation behind this thesis is the development of algorithms and techniques that can automatically exploit the data generated during the first 1200 days of Skycam operation to catalogue variable sources in the La Palma sky. A previously developed pipeline reduces the Skycam images and produces photometric time-series data, known as light curves, for millions of objects. Of these, 590,492 objects have 100 or more data points of sufficient quality to attempt a variability analysis. The large volume and relatively high noise of these data necessitated the use of Machine Learning and sophisticated optimisation techniques to extract this information successfully. The Skycam instruments have no control over the orientation and pointing of the Liverpool Telescope and therefore resample areas of the sky highly irregularly. In astronomy, this resampling pattern is termed ‘cadence’. The unusually irregular Skycam cadence places increased strain on algorithms designed to detect periodicity in light curves. This thesis details the development of a period estimation method based on a novel implementation of a genetic algorithm combined with a generational clustering method. Named GRAPE (Genetic Routine for Astronomical Period Estimation), this algorithm deconstructs the space of possible periods for a light curve into regions in which the genetic population clusters. These regions are then fine-tuned using a k-means clustering algorithm to return a set of independent period candidates, which are then analysed using a Vuong closeness test to discriminate between aliased and true periods.
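The idea of a genetic search over trial periods followed by clustering of the final population can be sketched as below. This is a minimal, hypothetical illustration under stated assumptions, not GRAPE itself: the fitness function (fraction of variance explained by a least-squares sinusoid), tournament selection, blend crossover, Gaussian mutation, elitism, the population and generation sizes, and the helper names `sinusoid_fitness`, `kmeans_1d` and `genetic_period_search` are all assumptions made for this sketch. Each k-means cluster stands for one candidate region and is represented by its fittest member; the Vuong alias test is omitted here.

```python
import numpy as np


def sinusoid_fitness(t, y, freqs):
    """Fraction of variance explained by a least-squares sinusoid at each frequency."""
    y0 = y - y.mean()
    total = np.sum(y0 ** 2)
    scores = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        # Design matrix for y0 ~ a*sin(2*pi*f*t) + b*cos(2*pi*f*t)
        X = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(X, y0, rcond=None)
        scores[i] = 1.0 - np.sum((y0 - X @ coef) ** 2) / total
    return scores


def kmeans_1d(x, k, rng, iters=50):
    """Lloyd's k-means on 1-D data; returns a cluster label for every point."""
    centres = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean()
    return labels


def genetic_period_search(t, y, f_min, f_max, pop=200, gens=40, k=3, seed=0):
    """Hypothetical GRAPE-like search: returns up to k candidate periods, best first."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(f_min, f_max, pop)
    n_elite = pop // 10
    for _ in range(gens):
        fit = sinusoid_fitness(t, y, freqs)
        # Elitism: the fittest 10% survive unchanged
        elite = freqs[np.argsort(fit)[::-1][:n_elite]]
        # Tournament selection: the fitter of two random individuals becomes a parent
        a, b = rng.integers(0, pop, size=(2, pop - n_elite))
        parents = np.where(fit[a] > fit[b], freqs[a], freqs[b])
        # Blend crossover for roughly half of the parents, with a shuffled mate
        mates = rng.permutation(parents)
        w = rng.random(parents.size)
        cross = rng.random(parents.size) < 0.5
        children = np.where(cross, w * parents + (1 - w) * mates, parents)
        # Gaussian mutation explores the neighbourhood of each child
        children += rng.normal(0.0, 0.002 * (f_max - f_min), children.size)
        freqs = np.concatenate([elite, np.clip(children, f_min, f_max)])
    fit = sinusoid_fitness(t, y, freqs)
    labels = kmeans_1d(freqs, k, rng)
    # Each non-empty cluster is a candidate region, represented by its fittest member
    reps = np.array([freqs[labels == j][np.argmax(fit[labels == j])]
                     for j in range(k) if np.any(labels == j)])
    order = np.argsort(sinusoid_fitness(t, y, reps))[::-1]
    return 1.0 / reps[order]
```

For an irregularly sampled, noisy sinusoid, the first returned candidate should lie close to the injected period; the remaining entries are simply the best members of the other clusters and are not guaranteed to be physically meaningful periods.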
This thesis demonstrates the capability of GRAPE on a set of synthetic light curves built using both traditional regular cadence sampling and Skycam-style cadence for four different shapes of periodic light curve. The performance of GRAPE on these light curves is compared to that of a more traditional periodogram, whose returned peaks are then analysed using Vuong closeness tests. GRAPE performs similarly to the periodogram on all the light curve shapes but with lower computational complexity, allowing for more efficient light curve analysis. Automated classification of variable light curves has been explored over the last decade, and multiple features have been engineered to identify patterns in the light curves of different classes of variable star. Within the last few years, deep learning has come to prominence as a method of automatically generating informative representations of data for the solution of a desired problem, such as a classification task. A set of models using Random Forests, Support Vector Machines and Neural Networks was trained on a set of variable Skycam light curves from five classes. Using 16 features engineered from previous methods, an Area under the Curve (AUC) of 0.8495 was obtained. Replacing these features with the pixel intensities of a 100 by 20 pixel image representation produced an AUC of 0.6348, which improved to 0.7952 when the models were provided with additional context on the dimensionality of the image. Despite this inferior performance, the pixel importances of the trained models showed that they had learned features based on well-understood patterns in the different classes of light curve. Using features produced by Richards et al. and Kim & Bailer-Jones, a set of features for training machine learning classification models was constructed.
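The Vuong closeness test compares two non-nested models through their per-observation log-likelihoods: with m_i = l1_i - l2_i, the statistic is z = sqrt(n) * mean(m) / sd(m), which is approximately standard normal when the two models fit equally well. The sketch below applies it to two trial periods, assuming (hypothetically) a sinusoid-plus-offset model with Gaussian noise of known standard deviation `sigma`; the light curve models and likelihoods used in the thesis may differ, and the function names are illustrative. Because both models have the same number of parameters, no parameter-count correction is applied.

```python
import math

import numpy as np


def gaussian_loglik_per_point(t, y, period, sigma):
    """Per-observation Gaussian log-likelihood of a least-squares sinusoid at `period`."""
    phase = 2 * np.pi * t / period
    X = np.column_stack([np.ones_like(t), np.sin(phase), np.cos(phase)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)


def vuong_test(ll1, ll2):
    """Vuong closeness test on paired per-point log-likelihoods.

    Returns (z, p): z > 0 favours model 1, z < 0 favours model 2 and p is the
    two-sided p-value under the standard normal null. Both models here have the
    same number of parameters, so no penalty term is subtracted.
    """
    m = np.asarray(ll1) - np.asarray(ll2)
    z = math.sqrt(m.size) * m.mean() / m.std(ddof=1)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

A small p with z > 0 favours the first period over the second (for example, a candidate over its double-period alias); a large p means the data cannot distinguish the two.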
In addition to this set of features, a semi-supervised set of novel features was designed to describe the shape of light curves phased at the GRAPE candidate period. This thesis investigates the performance of the PolyFit algorithm of Prša et al., a technique that fits four piecewise polynomials with discontinuous knots, capable of connecting across the phase boundary at phases of zero and one. This method was designed to fit phased light curves of eclipsing binaries but was also described as fully capable on other variable star types. The optimisation method used by PolyFit is replaced by a novel genetic algorithm optimisation routine to fit the model to Skycam data, with a substantial improvement in performance. The PolyFit model is fitted at the candidate period and at twice this period for every classified light curve. This interpolation produces novel features which describe statistics similar to those of the previously developed methods but which appear significantly more resilient to the Skycam noise and are often preferred by the trained models. In addition, Principal Component Analysis (PCA) is applied to a set of 6897 variable light curves, showing that the first ten principal components are sufficient to describe 95% of the variance of the fitted models. This trained PCA model is retained and used to generate twenty novel shape features. Whilst these features are not dominant in their importance to the learned models, they have above-average importance and help distinguish some objects in the light curve classification task. The second principal component in particular is an important feature in discriminating short-period pulsating and eclipsing variables, as it appears to be an automatically learned robust skewness measure. The method described in this thesis produces 112 features of the Skycam light curves: 38 variability indices, which are quickly obtainable, and 74 features, which require the computation of a candidate period using GRAPE.
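Choosing the number of principal components that reaches a variance target can be sketched with a singular value decomposition. In this hypothetical example, each row is a fitted light curve model sampled on a fixed phase grid; the 100-point grid, the synthetic three-component shapes and the names `fit_pca` and `pca_features` are assumptions for illustration, not Skycam data or the thesis code. `fit_pca` keeps the fewest components whose cumulative explained variance reaches 95%, and `pca_features` projects shapes onto them to produce shape features.

```python
import numpy as np


def fit_pca(shapes, var_target=0.95):
    """PCA via SVD; keep the fewest components reaching `var_target` of the variance."""
    mean = shapes.mean(axis=0)
    centred = shapes - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    # First index where the cumulative explained variance reaches the target
    k = int(np.searchsorted(np.cumsum(ratio), var_target) + 1)
    return mean, vt[:k], ratio[:k]


def pca_features(shapes, mean, components):
    """Project fitted light curve shapes onto the retained principal axes."""
    return (shapes - mean) @ components.T
```

Keeping the fitted `mean` and `components` is what allows the same trained PCA model to generate features for light curves that were not part of the training set.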
A number of machine learning classifiers are investigated to produce high-performance models for the detection and classification of variable light curves from the Skycam dataset. A Random Forest classifier trained on 859 light curves from 12 object classes achieves a multi-class F1 score of 0.533. It would be computationally infeasible to produce all the features for every Skycam light curve; therefore, an automated pipeline has been developed which combines a Skycam trend removal pipeline, GRAPE and our machine-learned classifiers. The pipeline is initialised with a set of Skycam light curves of objects cross-matched with the American Association of Variable Star Observers (AAVSO) Variable Star Index (VSI), one of the most comprehensive catalogues of variable stars available. The learned models classify these cross-matched light curves using the full set of 112 features, and confident matches are selected to produce a training set for a binary variability detection model. This model uses only the 38 variability indices, allowing it to identify variable light curves rapidly without running GRAPE. This variability model, trained using a Random Forest classifier, obtains an F1 score of 0.702. Applying this model to the 590,492 Skycam light curves yields 103,790 variable candidates, of which 51,129 have been classified and are available for further analysis.
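Two steps of the pipeline lend themselves to a short sketch: keeping only confident classifications as training labels for the binary variability model, and scoring classifiers with F1. The probability threshold of 0.7 and the helper names are hypothetical. The abstract does not state how the multi-class F1 score is averaged, so the unweighted (macro) mean of per-class F1 scores used here is an assumption.

```python
import numpy as np


def select_confident(probs, threshold=0.7):
    """Rows whose top class probability reaches `threshold`, with their labels."""
    best = probs.argmax(axis=1)
    keep = np.flatnonzero(probs.max(axis=1) >= threshold)
    return keep, best[keep]


def f1_per_class(y_true, y_pred, labels):
    """F1 = 2*TP / (2*TP + FP + FN) for each class, treating it as 'positive'."""
    scores = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores[c] = float(2 * tp / denom) if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either array."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    return float(np.mean(list(f1_per_class(y_true, y_pred, labels).values())))
```

For the binary variability model, the F1 score of the "variable" class alone is obtained from `f1_per_class` with that single label.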

Item Type:	Thesis (Doctoral)
Uncontrolled Keywords:	Astronomical Databases; Observational methods; Data Analysis; Variable Stars; Eclipsing Binary; Liverpool Telescope; Astronomical Time-series; Light Curves; Period Estimation; Machine Learning; Classification; Random Forests; Feature Extraction
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science Q Science > QB Astronomy
Divisions:	Computer Science and Mathematics
Date of first compliant Open Access:	15 October 2018
Date Deposited:	15 Oct 2018 08:34
Last Modified:	08 Nov 2022 13:23
DOI or ID number:	10.24377/researchonline.ljmu.ac.uk.00009479
Supervisors:	Al-Jumeily, D, Steele, I and Hussain, A
URI:	https://researchonline.ljmu.ac.uk/id/eprint/9479