Tripathi, S (2025) E2RL : A Framework for Entity Extraction, Resolution and Linking. Doctoral thesis, Liverpool John Moores University.
Text: 2025shashiphd.pdf - Published Version (1MB). Available under License Creative Commons Attribution Non-commercial.
Abstract
Extracting information from large volumes of text, and inferring or storing insights from it, has become a much-needed process. Entity Extraction, Resolution, and Linking using knowledge graphs is a critical process that facilitates the accurate identification and association of entities across diverse datasets, thereby enhancing information retrieval, data integration, and intelligent decision-making systems. This doctoral research introduces the Entity Extraction, Resolution, and Linking (E2RL) Framework, a comprehensive solution designed to address the multifaceted challenges of information extraction. The E2RL Framework comprises three interconnected submodules: Entity Extraction, Resolution, and Linking. In the Entity Extraction submodule, innovative approaches such as TransCRF (Rai, Tripathi and Narang, 2022a) and SimNER (Tripathi and Rai, 2018a) are employed to identify entities within textual data with high precision and contextual understanding. The Entity Resolution submodule leverages both Tree-based and Probabilistic Approaches to reconcile and associate entities across different data sources, ensuring consistency and accuracy in entity representation. For establishing meaningful connections between identified entities, the Linking submodule integrates advanced graph linking algorithms, including FriendREC (Tripathi et al., 2019), LinkVec (S P Tripathi, Yadav and Rai, 2022), and other methods, which enhance the relational structure of knowledge graphs. The architecture of E2RL adopts the Model-View-ViewModel (MVVM) pattern and a layered structure (Tripathi and Narang, 2016a), promoting scalability, maintainability, and flexibility for future enhancements. This modular design facilitates seamless information retrieval, efficient data indexing for semantic searching, and the extraction of pivotal information, thereby supporting robust entity linking processes.
The research methodology encompasses an extensive literature review to identify gaps and advancements in existing Entity Extraction, Resolution, and Linking methodologies. A conceptual framework is developed to define the core components and their interrelationships, followed by the design and implementation of tailored algorithms for each submodule. Rigorous theoretical underpinnings and validation procedures ensure the robustness and validity of the proposed framework. Comprehensive evaluations of the E2RL Framework demonstrate its versatility and effectiveness in tackling complex information extraction challenges across various domains. The framework's adaptability makes it a valuable tool for researchers and practitioners, capable of integrating contextual embeddings, adapting to dynamic data changes, and scaling efficiently to incorporate external knowledge bases. Additionally, the E2RL Framework is optimized for real-time entity extraction, resolution, and linking, ensuring timely and accurate results for applications requiring instant data processing. In conclusion, the E2RL Framework offers a robust, scalable, and adaptable solution for enhancing Entity Extraction, Resolution, and Linking. By integrating advanced pre-processing techniques, sophisticated optimization algorithms, and innovative link prediction methodologies, E2RL effectively addresses critical challenges such as entity ambiguity, scalability, and computational efficiency. This research not only contributes valuable methodologies to the academic discourse but also provides a practical framework for real-world applications, paving the way for more intelligent and efficient information systems.
Item Type: | Thesis (Doctoral) |
---|---|
Uncontrolled Keywords: | Named Entity Recognition; Entity Extraction; Entity Resolution; Entity Linking; Generative AI; Large Language Models |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Computer Science and Mathematics |
Date of acceptance: | 20 June 2025 |
Date of first compliant Open Access: | 31 July 2025 |
Date Deposited: | 31 Jul 2025 15:27 |
Last Modified: | 31 Jul 2025 15:27 |
DOI or ID number: | 10.24377/LJMU.t.00026819 |
Supervisors: | Zhou, B, Atherton, P, Sheng, Y and Khan, W |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/26819 |