

SQL Injection Attack Classification through the Feature Extraction of SQL Query strings using a Gap-Weighted String Subsequence Kernel

Kifayat, K, Shi, Q, Askwith, RJ and McWhirter, PR (2018) SQL Injection Attack Classification through the Feature Extraction of SQL Query strings using a Gap-Weighted String Subsequence Kernel. Journal of Information Security and Applications, 40. pp. 199-216. ISSN 2214-2126

paper_ieee_final_V3.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.



SQL Injection Attacks are one of the most common methods behind data security breaches. Previous research has attempted to produce viable detection solutions that filter SQL Injection Attacks from regular queries. Unfortunately, this has proven to be a challenging problem, and many solutions suffer from disadvantages such as an inability to process queries in real time as a preventative measure, a lack of adaptability to differing types of attack, and a requirement for difficult-to-obtain information about the source application. This paper presents a novel solution that classifies SQL queries purely on the features of the initial query string. A Gap-Weighted String Subsequence Kernel algorithm is implemented to identify subsequences of characters shared between query strings and to output a similarity metric. Finally, a Support Vector Machine is trained on the similarity metrics between known query strings and is then used to classify unknown test queries. Because all feature data is gathered from the query strings, no additional information from the source application is required. The probabilistic nature of the learned models allows the solution to adapt to new threats whilst in operation. The proposed solution is evaluated using a number of test datasets derived from the Amnesia testbed datasets. The demonstration software achieved 97.07% accuracy for Select type queries and 92.48% accuracy for Insert type queries. This limited success rate is due to unsanitised quotation marks within legitimate inputs confusing the feature extraction. On a test dataset in which legitimate queries are denied the use of unsanitised quotation marks, the Select and Insert query accuracy rose.
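To make the similarity metric described in the abstract concrete, the following is a minimal sketch of a gap-weighted string subsequence kernel, written from the standard dynamic-programming recursion for this kernel family rather than from the paper's own implementation, which is not reproduced on this page. The function names, the subsequence length `n`, the decay factor `lam`, and the example queries are all illustrative assumptions.

```python
import math


def ssk(s, t, n, lam):
    """Unnormalised gap-weighted subsequence kernel K_n(s, t).

    Counts common (possibly non-contiguous) subsequences of length n,
    each weighted by lam raised to the total span it covers in s and t.
    n and lam are assumed hyperparameters, not values from the paper.
    """
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary kernel K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close each length-n subsequence on a matching final character.
    k = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                k += lam * lam * kp[n - 1][a - 1][b - 1]
    return k


def normalised_ssk(s, t, n, lam):
    """Kernel value scaled to [0, 1] so string length does not dominate."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


def gram_matrix(queries, n, lam):
    """Pairwise similarity matrix over a list of query strings."""
    return [[normalised_ssk(p, q, n, lam) for q in queries] for p in queries]


if __name__ == "__main__":
    # Hypothetical queries for illustration only (not from the Amnesia data).
    queries = [
        "SELECT * FROM users WHERE name='bob'",
        "SELECT * FROM users WHERE name='alice'",
        "SELECT * FROM users WHERE name='' OR '1'='1'",
    ]
    for row in gram_matrix(queries, n=3, lam=0.5):
        print(["%.3f" % v for v in row])
```

A matrix built this way over labelled training queries could then be handed to a Support Vector Machine that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`). How the paper configures its classifier and chooses its kernel parameters is not stated in this record.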

Item Type: Article
Uncontrolled Keywords: Intrusion Detection, SQL injection attacks, data mining, String Subsequence Kernel, Support Vector Machine, Supervised Learning
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science & Mathematics
Publisher: Elsevier
Date Deposited: 22 Feb 2018 10:37
Last Modified: 04 Sep 2021 10:44
URI: https://researchonline.ljmu.ac.uk/id/eprint/8112