Converting unstructured data (data encoded in a format with no predefined structure, such as PDF) into structured data (clearly defined data types organised in a structure) has several advantages. One of the main benefits is that the data becomes easier to search, both for humans and for algorithms. Although many tools pursue this objective, a systematic review of the existing literature makes it possible to determine whether any one piece of software has features that allow it to outperform the others on a specific task in this context. This protocol describes the methodology followed to conduct a systematic review of the literature on software dedicated to extracting and manipulating references from papers in PDF format. The objective of this research, which is reflected in the flow of the literature review methodology, is therefore to identify the most suitable software for the specified purpose, i.e. retrieving and manipulating citations from PDF files.