Ruiz, H (2013) Fisher networks: A principled approach to retrieval-based classification. Doctoral thesis, Liverpool John Moores University.
Abstract
Due to technological advances in the acquisition and processing of information, current data mining applications involve databases of sizes that would have been unthinkable just two decades ago. However, real-world datasets are often riddled with irrelevant variables that not only fail to provide any meaningful information about the process of interest, but may also obscure the contribution of the truly informative data features. Taking into consideration the relevance of the different measures available can make the difference between reaching an accurate reflection of the underlying truth and obtaining misleading results that lead to erroneous conclusions.
Another important consideration in data analysis is the interpretability of the models used to fit the data. It is clear that performance must be a key aspect in deciding which methodology to use, but it should not be the only one. Models with an obscure internal operation see their practical usefulness effectively diminished by the difficulty of understanding the reasoning behind their inferences, which makes them less appealing to users who are not familiar with their theoretical basis.
This thesis proposes a novel framework for the visualisation and categorisation of data in classification contexts that tackles the two issues discussed above and provides an informative output of intuitive interpretation. The system is based on a Fisher information metric that automatically filters the contribution of variables depending on their relevance with respect to the classification problem at hand, measured by their influence on the posterior class probabilities.
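One common way to write a Fisher information metric driven by posterior class probabilities is sketched below. The notation (x for a data point, c for a class label, G(x) for the metric tensor) is assumed here for illustration and may differ from the one used in the thesis.

$$
G(\mathbf{x}) = \sum_{c} p(c \mid \mathbf{x})\, \nabla_{\mathbf{x}} \log p(c \mid \mathbf{x})\, \nabla_{\mathbf{x}} \log p(c \mid \mathbf{x})^{\top},
\qquad
d(\mathbf{x}, \mathbf{x} + d\mathbf{x})^{2} = d\mathbf{x}^{\top} G(\mathbf{x})\, d\mathbf{x}.
$$

Under this form, a displacement along a variable that leaves every p(c | x) unchanged has zero length, which is the sense in which irrelevant variables are filtered out.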
Fisher distances can then be used to calculate rigorous problem-specific similarity measures, which can be grouped into a pairwise adjacency matrix, thus defining a network. Following this novel construction process results in a principled visualisation of the data organised in communities that highlights the structure of the underlying class membership probabilities. Furthermore, the relational nature of the network can be used to reproduce the probabilistic predictions of the original estimates in a case-based approach, making them explainable by means of known cases in the dataset.
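The construction described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the softmax model with fixed weights `W` and `b`, the straight-line approximation to the Fisher distance, the Gaussian similarity with width `sigma`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def posterior(X, W, b):
    """Softmax posterior p(c|x); a stand-in for any probabilistic classifier."""
    Z = X @ W + b
    Z = Z - Z.max(axis=1, keepdims=True)       # numerical stability
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)

def fisher_matrix(x, W, b):
    """G(x) = sum_c p(c|x) * grad log p(c|x) * grad log p(c|x)^T."""
    p = posterior(x[None, :], W, b)[0]         # shape (C,)
    # For a softmax model: grad_x log p(c|x) = W[:, c] - sum_k p_k W[:, k]
    grads = W - (W @ p)[:, None]               # shape (D, C), one column per class
    return (grads * p) @ grads.T               # shape (D, D)

def fisher_distance(xi, xj, W, b, steps=20):
    """Path length under G along the straight segment from xi to xj
    (an approximation; the true geodesic can be shorter)."""
    dx = (xj - xi) / steps
    total = 0.0
    for t in (np.arange(steps) + 0.5) / steps:  # midpoint rule
        G = fisher_matrix(xi + t * (xj - xi), W, b)
        total += np.sqrt(max(dx @ G @ dx, 0.0))
    return total

def fisher_adjacency(X, W, b, sigma=1.0):
    """Pairwise Fisher distances D and Gaussian similarities A (assumed kernel)."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = fisher_distance(X[i], X[j], W, b)
    A = np.exp(-(D / sigma) ** 2)
    np.fill_diagonal(A, 0.0)                   # no self-loops in the network
    return D, A

# Toy model: feature 0 drives the class, feature 1 is irrelevant (zero weights).
W = np.array([[2.0, -2.0],
              [0.0,  0.0]])
b = np.zeros(2)
X = np.array([[0.0, 0.0],    # case 0
              [0.0, 5.0],    # case 1: differs from case 0 only in the irrelevant feature
              [1.0, 0.0]])   # case 2: differs from case 0 in the relevant feature
D, A = fisher_adjacency(X, W, b)
```

In this toy setup, cases 0 and 1 differ only in the zero-weight feature, so their Fisher distance vanishes and they are maximally similar in `A`, while case 2 sits at a positive distance from both. The matrix `A` can then be passed to any community detection or graph layout routine.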
The potential applications and usefulness of the framework are illustrated using several real-world datasets, giving examples of the typical output that the end user receives and how they can use it to learn more about the cases of interest as well as about the dataset as a whole.
Item Type: | Thesis (Doctoral) |
---|---|
Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Applied Mathematics (merged with Comp Sci 10 Aug 20) |
Date Deposited: | 27 Oct 2016 13:20 |
Last Modified: | 03 Sep 2021 23:26 |
DOI or ID number: | 10.24377/LJMU.t.00004371 |
Supervisors: | Lisboa, P, Jarman, I and Martin-Guerrero, JD |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/4371 |