The Categorical Data Conundrum: Heuristics for Classification Problems A Case Study on Domestic Fire Injuries

Reilly, D, Taylor, M, Fergus, P, Chalmers, C and Thompson, S (2022) The Categorical Data Conundrum: Heuristics for Classification Problems A Case Study on Domestic Fire Injuries. IEEE Access, 10. pp. 70113-70125.

Published Version. Available under a Creative Commons Attribution license.

Machine learning is well developed amongst the scientific community in terms of theoretical foundations (statistics and algorithms) and frameworks (TensorFlow, PyTorch, H2O). However, machine learning is heavily focused on numerical data, or numerical data mixed with some categorical data. For numerical datasets, scientists and engineers can enjoy reasonable success with only a limited knowledge of theoretical foundations and the inner workings of machine learning frameworks. However, it is a different story when dealing with purely categorical datasets, which require a deeper understanding of machine learning frameworks and associated encodings and algorithms in order to achieve success. This paper addresses the issues in handling purely categorical datasets for multi-classification problems and provides a set of heuristics for dealing with purely categorical data. In particular, issues such as pre-processing, feature encoding and algorithm selection are considered. The heuristics are then demonstrated through a case study, based on a categorical dataset of domestic fire injuries, covering a 10-year period. Novel contributions are made through the heuristics and the performance analysis of different encoding techniques. The case study itself also makes a novel contribution through the classification of different types of injuries, based on related features.

Item Type: Article
Uncontrolled Keywords: 08 Information and Computing Sciences; 09 Engineering; 10 Technology
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Computer Science & Mathematics
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
SWORD Depositor: A Symplectic
Date Deposited: 07 Oct 2022 09:16
Last Modified: 07 Oct 2022 09:30
DOI or ID number: 10.1109/ACCESS.2022.3187287
URI: https://researchonline.ljmu.ac.uk/id/eprint/17726