Bhih, A and Johnson, P and Randles, M (2015) EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set. International Journal of Engineering Research & Technology (IJERT), 4 (1). pp. 553-557. ISSN 2278-0181
V4I1-IJERTV4IS010563.pdf - Published Version
Available under License Creative Commons Attribution.
Data mining is a long-established research topic that is making a comeback with the advent of Big Data. Clustering is an important component of data mining. As we enter the Big Data era, in which many real-world datasets consist of multi-dimensional features, clustering has been gaining importance within this topic. Traditional clustering algorithms often fail to detect meaningful clusters in high-dimensional data sets, and they become computationally expensive when dealing with data comprised of many dimensions. In this paper, we propose a modified technique designed to perform well on high-dimensional data sets. In the proposed method, we use Principal Component Analysis for dimension reduction before applying the standard EM algorithm. The performance of the proposed set of algorithms is evaluated on the basis of silhouette index and execution time.
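The pipeline outlined in the abstract (dimension reduction, then EM clustering, then scoring by silhouette index and execution time) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the helper names, the synthetic two-cluster data, the reduction to a single principal component via power iteration, and the two-component Gaussian mixture are all assumptions chosen to keep the example short.

```python
import math
import random
import time

def first_principal_component(rows, iters=100):
    """Project centered rows onto the top covariance eigenvector (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # assumed start vector; must not be orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]

def em_two_gaussians(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture; returns hard labels."""
    m = sum(xs) / len(xs)
    v0 = sum((x - m) ** 2 for x in xs) / len(xs)
    mu, var, pi = [min(xs), max(xs)], [v0, v0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate means, variances, and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
            pi[k] = nk / len(xs)
    return [0 if r[0] >= r[1] else 1 for r in resp]

def silhouette(xs, labels):
    """Mean silhouette index for 1-D points split into two clusters."""
    scores = []
    for i, x in enumerate(xs):
        same = [abs(x - y) for j, y in enumerate(xs)
                if j != i and labels[j] == labels[i]]
        other = [abs(x - y) for j, y in enumerate(xs) if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)
            continue
        a, b = sum(same) / len(same), sum(other) / len(other)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic data (an assumption): two well-separated groups in 5 dimensions.
random.seed(0)
data = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
data += [[random.gauss(5.0, 1.0) for _ in range(5)] for _ in range(30)]

start = time.perf_counter()
reduced = first_principal_component(data)   # PCA step
labels = em_two_gaussians(reduced)          # EM step on the reduced data
elapsed = time.perf_counter() - start
score = silhouette(reduced, labels)
print(f"silhouette={score:.3f}, time={elapsed:.4f}s")
```

Running EM on the one-dimensional projection instead of the original five features is what keeps each iteration cheap in this sketch; how many components to retain on real data is a separate choice that the abstract does not specify.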
|Uncontrolled Keywords:||Clustering; dimensionality reduction; Principal Component Analysis; Expectation Maximization|
|Subjects:||Q Science > QA Mathematics > QA75 Electronic computers. Computer science|
Electronics and Electrical Engineering
|Date Deposited:||12 Oct 2015 09:10|
|Last Modified:||12 Oct 2015 09:10|