Hind, J (2019) RaSaR: A Novel Methodology for the Detection of Epistasis. Doctoral thesis, Liverpool John Moores University.
Abstract
Complex diseases, which affect a large proportion of today's population, demand more strategic methods to produce significant association results. Numerous disorders and diseases have yet to be linked to a causal genetic variant, despite research evidence indicating high genetic concordance. Breast cancer is one of the most prominent cancers in the female population, with approximately 55K new cases and approximately 11K deaths each year in the UK. The genetic component of breast cancer is a popular research area in which many genetic associations, from high to low penetrance, have been uncovered. The dataset used in this research was obtained from the DRIVE project, one of five introduced under the GAME-ON initiative. The general research use DRIVE dataset contains approximately 533K single-nucleotide polymorphisms (SNPs), more than 280K of which were sequenced with reference to the five most prominent cancers: colon, breast, ovarian, prostate and lung. SNPs are sequenced for approximately 28K subjects, of whom approximately 14K were diagnosed with one of three stages of breast cancer: unknown, in-situ and invasive. Epistasis analysis is a progressive approach that complements the ‘common disease, common variant’ hypothesis by highlighting the potential for connected networks of genetic variants to collaborate in producing a phenotypic expression. It is commonly performed either pairwise (variant vs variant) or with unlimited arity, which considers high-order interactions. This type of analysis increases the number of tests beyond that of a standard approach such as a genome-wide association study (GWAS), in which the false discovery rate (FDR) was already an issue; multiplying the number of tests at up to a factorial rate therefore worsens the FDR problem.
Further to this, epistasis introduces its own limitations of computational complexity, which depend on the analysis performed; in the most intensive case, a multivariate analysis introduces a time complexity of O(n!). Throughout this thesis, approaches, methods and techniques for epistasis analysis and GWAS are discussed, along with their limitations and how to address them. This thesis proposes a novel methodology and methods for the detection of epistasis, using interpretable methods and best practice to outline interactions through filtering processes. RaSaR refers to the process of Random Sampling Regularisation, which randomly splits the data to produce sample sets and conducts a voting system to regularise the significance and reliability of biological markers (SNPs). In parallel, the proposed methodology takes into consideration and adjusts for the common limitations of computational complexity and false discovery, using filter selection and a novel method of association analysis. Preliminary results are promising, outlining a concise detection of interactions using benchmarking against standard approaches that account for multiple testing. Results for the detection of epistasis in the classification of breast cancer patients indicated nine risk candidate interactions from five variants, and a single candidate variant with a high protective association.
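
The growth in the number of tests described in the abstract can be illustrated with a short calculation. The sketch below is not taken from the thesis: it assumes the rounded figure of 533,000 SNPs quoted above, counts the distinct SNP combinations at each interaction order, and shows a Bonferroni-style per-test threshold purely as a familiar illustration of multiple-testing pressure (Bonferroni controls the family-wise error rate and is not the thesis's FDR procedure). The function names are hypothetical.

```python
from math import comb


def interaction_tests(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(n_snps, order)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    Illustrative assumption only; not the correction used in the thesis.
    """
    return alpha / n_tests


# Assumption: approximate SNP count quoted in the abstract.
n_snps = 533_000

single = interaction_tests(n_snps, 1)   # standard single-variant GWAS
pairs = interaction_tests(n_snps, 2)    # pairwise epistasis
triples = interaction_tests(n_snps, 3)  # third-order interactions

for label, count in [("single", single), ("pairs", pairs), ("triples", triples)]:
    threshold = bonferroni_threshold(0.05, count)
    print(f"{label:>8}: {count:.3e} tests, per-test threshold {threshold:.3e}")
```

Moving from single-variant tests to pairwise tests alone raises the count from roughly 5.3e5 to roughly 1.4e11, which is why the abstract pairs epistasis analysis with filter selection before any association testing.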
Item Type: | Thesis (Doctoral) |
---|---|
Uncontrolled Keywords: | Genomics; RaSaR; Breast Cancer; SNPs; GWAS; Epistasis |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > R Medicine (General) |
Divisions: | Computer Science & Mathematics |
Date Deposited: | 14 Jun 2019 12:10 |
Last Modified: | 08 Nov 2022 14:47 |
DOI or ID number: | 10.24377/LJMU.t.00010883 |
Supervisors: | Lisboa, P, Hussain, A and Al-Jumeily, D |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/10883 |