Deep Reinforcement Learning for Battery Energy Storage Optimization and Residential Decarbonization in Grid-Deficient Environments: An Iraqi Case Study

Mohammed, A (ORCID: 0009-0003-8426-6964), Abdullah, BM (ORCID: 0000-0002-1281-148X), Shubbar, A (ORCID: 0000-0001-5609-1165), Zhang, Q (ORCID: 0000-0002-0651-469X), Aldhaibani, O (ORCID: 0000-0003-0235-2862), Cullen, J (ORCID: 0000-0003-0401-4613) and Salih, A (2026) Deep Reinforcement Learning for Battery Energy Storage Optimization and Residential Decarbonization in Grid-Deficient Environments: An Iraqi Case Study. Energies, 19 (5).

energies-19-01233-v2.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

In grid-deficient environments, residential energy systems incur severe carbon emission penalties because they must rely on diesel standby generators during supply interruptions. In Iraq, summer peak loads routinely exceed grid capacity, triggering prolonged generator operation and dramatically increasing household carbon footprints. This study presents a deep Q-network (DQN) reinforcement learning framework for intelligent battery energy storage system (BESS) scheduling that targets carbon emissions reduction through strategic peak shaving. The DQN agent learns battery dispatch strategies by internalizing diurnal patterns in load and solar generation through temporal state features, enabling anticipatory control without explicit external forecasting models. The system is trained on one year of operational data from a representative Iraqi residential installation and evaluated over the critical summer period (122 days, 35.5% grid unavailability). For the studied configuration, the results show a 54.8% CO2 reduction (306.5 kg versus a 677.4 kg baseline), a 25.5% reduction in generator runtime, and a 23.7% reduction in operating costs. The learned policy achieves 89.6% of perfect-foresight MILP performance while executing 35,000 times faster. A reward function sensitivity analysis across five weighting schemes indicates that, among the schemes tested, the 20:1 carbon-to-cost priority ratio best balances environmental and economic objectives. Ablation studies quantify the mechanism contributions: anticipatory pre-charging accounts for 58% of the total improvement, discharge optimization for 44%, and real-time PV coordination for 22%. These findings establish DQN-based BESS optimization as a practically deployable decarbonization approach for residential systems in grid-constrained developing regions.
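
As an illustration of the quantities in the abstract, the short Python sketch below recomputes the reported CO2 reduction from the two stated emission totals and shows one way a 20:1 carbon-to-cost reward and a temporal state feature could be written. Only the emission figures and the 20:1 ratio come from the abstract; the linear reward form, the sine/cosine hour encoding, and every function and constant name are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Tuple


def percent_reduction(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` versus `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline


# Emission totals reported in the abstract (kg CO2, 122-day summer period).
BASELINE_CO2_KG = 677.4
DQN_CO2_KG = 306.5

co2_reduction = percent_reduction(BASELINE_CO2_KG, DQN_CO2_KG)

# Hypothetical weights: only the 20:1 ratio is taken from the abstract.
W_CARBON = 20.0
W_COST = 1.0


def reward(co2_kg: float, cost: float) -> float:
    """Assumed linear per-step reward: a negative weighted penalty."""
    return -(W_CARBON * co2_kg + W_COST * cost)


def temporal_features(hour: float) -> Tuple[float, float]:
    """Assumed cyclic encoding of hour-of-day, so 23:00 and 00:00 are close."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


if __name__ == "__main__":
    print(f"CO2 reduction: {co2_reduction:.1f}%")
    print("Reward for 1 kg CO2, 5 cost units:", reward(1.0, 5.0))
    print("Features at 18:00:", temporal_features(18.0))
```

Under this assumed reward, one kilogram of CO2 is penalized as heavily as twenty cost units, which is one reading of a carbon-to-cost priority ratio; the paper itself should be consulted for the actual state, action, and reward definitions.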

Item Type: Article
Uncontrolled Keywords: 02 Physical Sciences; 09 Engineering; 33 Built environment and design; 40 Engineering; 51 Physical sciences
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Civil Engineering and Built Environment; Computer Science and Mathematics; Engineering
Publisher: MDPI
Date of acceptance: 25 February 2026
Date of first compliant Open Access: 2 March 2026
Date Deposited: 02 Mar 2026 14:55
Last Modified: 02 Mar 2026 14:55
DOI or ID number: 10.3390/en19051233
URI: https://researchonline.ljmu.ac.uk/id/eprint/28170