SQL Injection Attack Classification through the Feature Extraction of SQL Query strings using a Gap-Weighted String Subsequence Kernel

Kifayat, K; Shi, Q; Askwith, RJ; McWhirter, PR

Kifayat, K, Shi, Q, Askwith, RJ and McWhirter, PR (2018) SQL Injection Attack Classification through the Feature Extraction of SQL Query strings using a Gap-Weighted String Subsequence Kernel. Journal of Information Security and Applications, 40. pp. 199-216. ISSN 2214-2126

paper_ieee_final_V3.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Publisher URL: https://doi.org/10.1016/j.jisa.2018.04.001

Abstract

SQL Injection Attacks are one of the most common methods behind data security breaches. Previous research has attempted to produce viable detection solutions that filter SQL Injection Attacks from regular queries. Unfortunately, this has proven to be a challenging problem, with many solutions suffering from disadvantages such as an inability to process queries in real time as a preventative measure, a lack of adaptability to differing types of attack, and a requirement for access to difficult-to-obtain information about the source application. This paper presents a novel solution that classifies SQL queries purely on the features of the initial query string. A Gap-Weighted String Subsequence Kernel algorithm is implemented to identify subsequences of shared characters between query strings and output a similarity metric. Finally, a Support Vector Machine is trained on the similarity metrics between known query strings, and the trained model is then used to classify unknown test queries. Because all feature data is gathered from the query strings, additional information from the source application is not required. The probabilistic nature of the learned models allows the solution to adapt to new threats whilst in operation. The proposed solution is evaluated using a number of test datasets derived from the Amnesia testbed datasets. The demonstration software achieved 97.07% accuracy for Select type queries and 92.48% accuracy for Insert type queries. This limited success rate is due to unsanitised quotation marks within legitimate inputs confusing the feature extraction. Using a test dataset that denies legitimate queries the use of unsanitised quotation marks, the Select and Insert query accuracy rose.
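
The similarity metric described in the abstract can be illustrated with a minimal sketch of a gap-weighted string subsequence kernel, written here as the standard dynamic-programming recursion for this kernel family. It is not the authors' implementation: the function names (`ssk`, `normalised_ssk`), the subsequence length `n = 2`, the decay factor `lam = 0.5`, and the sample query strings are all assumptions chosen for illustration, since the abstract does not state the paper's settings.

```python
from math import sqrt


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel of order n with decay lam (hypothetical helper).

    Counts character subsequences of length n shared by s and t, with each
    occurrence down-weighted by lam raised to the length it spans in each string.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds the auxiliary kernel K'_i for the prefixes s[:a], t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i for the current prefix s[:a]
            for b in range(i, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Sum over every pair of positions where the final characters match.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalised_ssk(s: str, t: str, n: int = 2, lam: float = 0.5) -> float:
    """Kernel value scaled to [0, 1] so that strings of different lengths compare fairly."""
    denom = sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Illustrative query strings only; they are not taken from the Amnesia datasets.
    benign = "SELECT * FROM users WHERE name = 'alice'"
    attack = "SELECT * FROM users WHERE name = '' OR '1'='1'"
    print(normalised_ssk(benign, attack))
```

A matrix of such pairwise values over the known query strings is the kind of input a Support Vector Machine with a precomputed kernel accepts, for example scikit-learn's `SVC(kernel="precomputed")`; whether the paper used that library is not stated in the abstract.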

Item Type:	Article
Uncontrolled Keywords:	Intrusion Detection, SQL injection attacks, data mining, String Subsequence Kernel, Support Vector Machine, Supervised Learning
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Computer Science and Mathematics
Publisher:	Elsevier
Date of acceptance:	2 April 2018
Date of first compliant Open Access:	25 April 2019
Date Deposited:	22 Feb 2018 10:37
Last Modified:	04 Sep 2021 10:44
URI:	https://researchonline.ljmu.ac.uk/id/eprint/8112
