Reilly, D, Taylor, M, Fergus, P, Chalmers, C and Thompson, S (2022) The Categorical Data Conundrum: Heuristics for Classification Problems - A Case Study on Domestic Fire Injuries. IEEE Access, 10. pp. 70113-70125.
Text: The_Categorical_Data_Conundrum_Heuristics_for_Classification_ProblemsA_Case_Study_on_Domestic_Fire_Injuries (1).pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Machine learning is well developed amongst the scientific community in terms of theoretical foundations (statistics and algorithms) and frameworks (TensorFlow, PyTorch, H2O). However, machine learning is heavily focused on numerical data, or numerical data mixed with some categorical data. For numerical datasets, scientists and engineers can enjoy reasonable success with only a limited knowledge of theoretical foundations and the inner workings of machine learning frameworks. Purely categorical datasets are a different story: they require a deeper understanding of machine learning frameworks and the associated encodings and algorithms in order to achieve success. This paper addresses the issues in handling purely categorical datasets for multi-class classification problems and provides a set of heuristics for dealing with purely categorical data. In particular, issues such as pre-processing, feature encoding and algorithm selection are considered. The heuristics are then demonstrated through a case study, based on a categorical dataset of domestic fire injuries, covering a 10-year period. Novel contributions are made through the heuristics and the performance analysis of different encoding techniques. The case study itself also makes a novel contribution through the classification of different types of injuries, based on related features.
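The abstract does not name the encoding techniques the paper compares, so the following is only a minimal sketch of what "feature encoding" can mean for purely categorical data. It contrasts two widely used schemes, ordinal and one-hot encoding, in plain Python. The records, the column names (`room`, `cause`) and the helper functions are hypothetical and are not taken from the paper or its dataset.

```python
# Hypothetical records: column names and values are invented for illustration
# and do not come from the domestic fire injuries dataset.
rows = [
    {"room": "kitchen", "cause": "cooking"},
    {"room": "bedroom", "cause": "smoking"},
    {"room": "kitchen", "cause": "electrical"},
]


def fit_categories(rows):
    """Collect the sorted list of observed values for each column."""
    columns = sorted({key for row in rows for key in row})
    return {col: sorted({row[col] for row in rows}) for col in columns}


def ordinal_encode(rows, categories):
    """Replace each value with its index in the column's category list."""
    return [
        [categories[col].index(row[col]) for col in categories]
        for row in rows
    ]


def one_hot_encode(rows, categories):
    """Emit one 0/1 indicator per (column, value) pair."""
    return [
        [int(row[col] == value) for col in categories for value in categories[col]]
        for row in rows
    ]


categories = fit_categories(rows)
ordinal = ordinal_encode(rows, categories)
one_hot = one_hot_encode(rows, categories)
```

Ordinal encoding keeps one feature per column but imposes an arbitrary order on nominal values, which some algorithms may read as magnitude. One-hot encoding avoids that at the cost of one feature per distinct value.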
Item Type: | Article |
---|---|
Uncontrolled Keywords: | 08 Information and Computing Sciences; 09 Engineering; 10 Technology |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Computer Science & Mathematics |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
SWORD Depositor: | A Symplectic |
Date Deposited: | 07 Oct 2022 09:16 |
Last Modified: | 07 Oct 2022 09:30 |
DOI or ID number: | 10.1109/ACCESS.2022.3187287 |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/17726 |