Affiliation:
1. College of Electronic Engineering, National University of Defense Technology, Heifei 230037, China
2. Anhui Shenwu Information Technology Co., Ltd., Heifei 241000, China
Abstract
Modern web applications offer various APIs for data interaction. However, as the number of these APIs increases, so does the potential for security threats. Essentially, more APIs in an application can lead to more detectable vulnerabilities. Thus, it is crucial to identify APIs as comprehensively as possible in web applications. However, this task faces challenges due to the increasing complexity of web development techniques and the abundance of similar web pages. In this paper, we propose APIMiner, a framework for identifying APIs in web applications by dynamically traversing web pages based on web page state similarity analysis. APIMiner first builds a web page model based on the HTML elements of the current web page. APIMiner then uses this model to represent the state of the page. Then, APIMiner evaluates each element’s similarity in the page model and determines the page state similarity based on these similarity values. From the different states of the page, APIMiner extracts the data interaction APIs on the page. We conduct extensive experiments to evaluate APIMiner’s effectiveness. In the similarity analysis, our method surpasses state-of-the-art methods like NDD and mNDD in accurately distinguishing similar pages. We compare APIMiner with state-of-the-art tools (e.g., Enemy of the State, Crawlergo, and Wapiti3) for API identification. APIMiner excels in the number of identified APIs (average 1136) and code coverage (average 28,470). Relative to these tools, on average, APIMiner identifies 7.96 times more APIs and increases code coverage by 142.72%.
Funder
The National Key R&D Program of China