Fergus, P, Chalmers, C, Matthews, N, Nixon, S, Burger, A, Hartley, O, Sutherland, C, Lambin, X, Longmore, S and Wich, S (2024) Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data. Sensors, 24 (24). pp. 1-31. ISSN 1424-8220
Text: Towards Context Rich Automated Biodiversity Assessments.pdf - Published Version, available under a Creative Commons Attribution license (19MB)
Abstract
Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision–language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X localises and classifies species (mammals and birds) within images, and a Phi-3.5-vision-instruct model reads the YOLOv10-X bounding box labels to identify species, overcoming the vision–language model's limitation with hard-to-classify objects in images. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, adding rich ecological and environmental context to YOLO's species detection output. This combined output is then processed by the model's natural language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, such as species weight and IUCN status (information that cannot be obtained through direct visual analysis). From this enriched information, the system automatically generates structured reports, providing biodiversity stakeholders with deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. Our approach delivers contextually rich narratives that aid wildlife management decisions; by reducing manual effort and supporting timely decision making in conservation, it can potentially shift efforts from reactive to proactive.
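As a rough illustration of the two-stage pipeline the abstract describes, the sketch below wires a YOLOv10-X detector to a Phi-3.5-vision-instruct model: the detector supplies species labels, and the vision–language model adds scene-level context such as vegetation type and time of day. This is a minimal sketch assuming the Ultralytics `yolov10x.pt` weights and the Hugging Face `microsoft/Phi-3.5-vision-instruct` checkpoint; the prompt wording and output fields are illustrative, not the authors' exact configuration.

```python
# Hypothetical sketch of the two-stage pipeline: stage 1 detects and labels
# species, stage 2 has a vision-language model describe broader scene context.
# Model weights, prompt text, and output fields are assumptions for illustration.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from ultralytics import YOLO  # YOLOv10 is supported by the Ultralytics API

detector = YOLO("yolov10x.pt")  # stage 1: species detector (pretrained weights assumed)

vlm_id = "microsoft/Phi-3.5-vision-instruct"
processor = AutoProcessor.from_pretrained(vlm_id, trust_remote_code=True)
vlm = AutoModelForCausalLM.from_pretrained(
    vlm_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

def annotate(image_path: str) -> dict:
    """Detect species, then ask the VLM for ecological context,
    grounding it with the detector's bounding-box labels."""
    image = Image.open(image_path)
    result = detector(image_path)[0]
    labels = [result.names[int(c)] for c in result.boxes.cls]  # detected class names

    # Stage 2: the VLM reads the image plus the detector's labels and reports
    # broader variables (vegetation type, time of day) around the detections.
    prompt = (
        f"<|user|>\n<|image_1|>\nDetected species labels: {', '.join(labels)}. "
        "Describe the vegetation type and time of day.<|end|>\n<|assistant|>\n"
    )
    inputs = processor(prompt, [image], return_tensors="pt").to(vlm.device)
    generated = vlm.generate(
        **inputs, max_new_tokens=200, eos_token_id=processor.tokenizer.eos_token_id
    )
    generated = generated[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    context = processor.batch_decode(generated, skip_special_tokens=True)[0]
    return {"species": labels, "context": context}
```

In the full system described in the paper, per-image output like this would be aggregated across a camera-trap deployment and combined with RAG-retrieved facts (e.g., species weight, IUCN status) before the language model generates the structured report.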
Item Type: | Article |
---|---|
Uncontrolled Keywords: | wildlife conservation; deep learning; object detection; large language models; vision transformers; biodiversity monitoring; Biodiversity; Animals; Birds; Image Processing, Computer-Assisted; Conservation of Natural Resources; Artificial Intelligence; Mammals; Ecosystem; Machine Learning and Artificial Intelligence; Networking and Information Technology R&D (NITRD); 13 Climate Action; 15 Life on Land; 0301 Analytical Chemistry; 0502 Environmental Science and Management; 0602 Ecology; 0805 Distributed Computing; 0906 Electrical and Electronic Engineering |
Subjects: | G Geography. Anthropology. Recreation > GE Environmental Sciences; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T Technology (General) |
Divisions: | Astrophysics Research Institute; Biological and Environmental Sciences (from Sep 19); Computer Science and Mathematics |
Publisher: | MDPI |
SWORD Depositor: | A Symplectic |
Date Deposited: | 15 Jan 2025 16:19 |
Last Modified: | 15 Jan 2025 16:30 |
DOI or ID number: | 10.3390/s24248122 |
URI: | https://researchonline.ljmu.ac.uk/id/eprint/25284 |