RSA is one of the most widely used methods in asymmetric cryptosystems. However, RSA implementations can still be improved in terms of architecture, performance, power, and resource consumption. In this research, we propose a low-latency, resource-efficient, and scalable RSA cryptoprocessor architecture that addresses the power and resource consumption issues. The design is obtained through two approaches. First, we optimize Radix-4 Montgomery multiplication, which reduces both resource utilization and latency. Second, we design a scalable architecture based on the optimized Radix-4 Montgomery multiplication. The proposed design is verified on an FPGA through simulation and an image encryption application. Synthesis results show that the proposed design performs well in terms of latency, resource efficiency, and scalability. For 512-bit RSA, it requires only 227k cycles of latency and 13k logic gates.
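
For readers unfamiliar with the underlying arithmetic, the following is a minimal software sketch of textbook Radix-4 Montgomery multiplication and its use in modular exponentiation. It is not the optimized hardware design proposed in this work; the function names `mont_mul_radix4` and `mod_exp` and the toy key values are assumptions chosen for illustration only. The sketch processes the multiplier two bits per iteration, which is what halves the iteration count relative to a radix-2 formulation.

```python
def mont_mul_radix4(a, b, n, k):
    """Return a * b * 4^(-k) mod n (hypothetical helper, textbook algorithm).

    Assumes n is odd and 0 <= a, b < n < 4^k.
    """
    # n_prime = -n^(-1) mod 4, so that s + q*n is divisible by 4 below.
    n_prime = (-pow(n, -1, 4)) % 4
    s = 0
    for i in range(k):
        a_i = (a >> (2 * i)) & 3          # next radix-4 digit of a
        s += a_i * b                      # accumulate partial product
        q = ((s & 3) * n_prime) & 3       # quotient digit
        s = (s + q * n) >> 2              # exact division by 4
    # The loop keeps s < 2n, so one conditional subtraction suffices.
    if s >= n:
        s -= n
    return s


def mod_exp(base, exp, n):
    """Return base^exp mod n using left-to-right square-and-multiply
    in the Montgomery domain (illustrative sketch, n must be odd)."""
    k = (n.bit_length() + 1) // 2         # number of radix-4 digits, 4^k > n
    r2 = pow(4, 2 * k, n)                 # R^2 mod n, with R = 4^k
    x = mont_mul_radix4(base % n, r2, n, k)   # base in Montgomery form
    acc = mont_mul_radix4(1, r2, n, k)        # 1 in Montgomery form (R mod n)
    for bit in bin(exp)[2:]:
        acc = mont_mul_radix4(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul_radix4(acc, x, n, k)
    return mont_mul_radix4(acc, 1, n, k)      # convert back from Montgomery form


if __name__ == "__main__":
    # Toy RSA parameters (assumed for illustration; far too small for real use).
    n, e, d = 3233, 17, 2753
    message = 65
    cipher = mod_exp(message, e, n)
    print(cipher, mod_exp(cipher, d, n))
```

In hardware, the digit selection and the division by 4 reduce to wiring and shifts, so the cost per iteration is dominated by the two additions; the sketch above uses Python's arbitrary-precision integers purely to make the data flow easy to follow.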