Abstract
Device-to-device (D2D) communication is a promising technology for fifth-generation (5G) wireless networks, offering enhanced system capacity, spectral efficiency, and energy efficiency. However, D2D links can interfere with cellular links, posing challenges for spectrum allocation and network quality assurance. This paper presents a novel approach to the resource allocation problem in D2D networks using multiagent reinforcement learning with a proximal policy optimization (PPO) algorithm. The proposed algorithm aims to optimize overall throughput and maximize the signal-to-interference-plus-noise ratio (SINR) while keeping computational complexity low. The study introduces two key techniques: staggered training and decentralized execution. Staggered training improves agent performance and reduces computational complexity by training agents sequentially, one at a time, allowing each agent to learn from the others' mistakes and avoid local minima. Decentralized execution improves scalability and system robustness by letting agents learn and act independently, without relying on communication with other agents; if an agent fails, the remaining agents continue operating. The results demonstrate a significant improvement in the energy efficiency (EE) and an enhancement in the quality of service (QoS) of the network. Overall, the algorithm proves to be a promising solution for resource allocation in multiagent D2D networks, offering notable gains in EE and QoS while remaining scalable to large networks.
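Since the abstract only names the two techniques, the toy sketch below illustrates how staggered training and decentralized execution could fit together in a multiagent D2D power-allocation setting. Everything in it is an illustrative assumption rather than the paper's actual algorithm: the channel model, the discrete power levels, the number of agents, and the heavily simplified single-state, clipped PPO-style update are all stand-ins chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4    # hypothetical number of D2D pairs (assumption)
N_ACTIONS = 5   # discrete transmit-power levels (assumed action space)
POWERS = np.linspace(0.2, 1.0, N_ACTIONS)
NOISE = 0.05    # receiver noise power (arbitrary toy value)
CLIP_EPS = 0.2  # PPO clipping parameter
LR = 0.05

# Random channel gains: G[i, j] = gain from transmitter j to receiver i.
G = rng.uniform(0.1, 1.0, size=(N_AGENTS, N_AGENTS))
np.fill_diagonal(G, 1.0)  # direct D2D links are strongest

def sinr(actions):
    """Per-agent SINR given each agent's chosen power-level index."""
    p = POWERS[actions]
    signal = np.diag(G) * p
    interference = G @ p - signal  # cross-link interference only
    return signal / (NOISE + interference)

def rewards(actions):
    """Shannon-rate reward log2(1 + SINR) for every agent."""
    return np.log2(1.0 + sinr(actions))

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# One stateless softmax policy (a logit vector) per agent.
logits = [np.zeros(N_ACTIONS) for _ in range(N_AGENTS)]

def sample_joint_action():
    """Decentralized execution: each agent acts on its own policy alone,
    with no communication between agents."""
    return np.array([rng.choice(N_ACTIONS, p=softmax(l)) for l in logits])

def ppo_update(i, n_steps=200):
    """Train agent i with a clipped PPO-style step while all other
    agents' policies stay frozen."""
    old_probs = softmax(logits[i]).copy()
    baseline = 0.0
    for t in range(1, n_steps + 1):
        a = sample_joint_action()
        r = rewards(a)[i]
        baseline += (r - baseline) / t   # running-average baseline
        adv = r - baseline
        probs = softmax(logits[i])
        ratio = probs[a[i]] / old_probs[a[i]]
        # Clipped surrogate: skip the step when the ratio is already
        # outside the trust region in the direction of the advantage.
        if (adv > 0 and ratio > 1 + CLIP_EPS) or \
           (adv < 0 and ratio < 1 - CLIP_EPS):
            continue
        # Policy-gradient step on log pi(a_i): grad = onehot(a_i) - probs.
        grad = -probs
        grad[a[i]] += 1.0
        logits[i] += LR * adv * grad

# Staggered training: agents are updated one at a time, round-robin,
# so each agent adapts to the fixed behavior of all the others.
for round_ in range(20):
    for i in range(N_AGENTS):
        ppo_update(i)

print("mean throughput (bit/s/Hz):", rewards(sample_joint_action()).mean())
```

The round-robin loop at the bottom is the "staggered" element: only one agent's policy moves at a time, which keeps each update against a stationary environment, while `sample_joint_action` shows the decentralized-execution side, since every agent draws its action from its own policy with no message passing.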