Affiliations:
1. Harvard Medical School
2. Boston Children's Hospital
Abstract
Author affiliations are essential in bibliometric studies, which require extracting relevant information from free-text affiliation strings. Precisely determining an author's location from their affiliation is crucial for understanding research networks, collaborations, and geographic distribution. Existing geoparsing tools based on regular expressions have limitations because affiliations are unstructured and ambiguous, which leads to erroneous location identification, especially for unconventional variations or misspellings. Moreover, these tools handle large datasets inefficiently, which hampers large-scale bibliometric studies. Although machine learning-based geoparsers exist, they depend on explicit location information, which creates challenges when detailed geographic data are absent. To address these issues, we developed and evaluated a natural language processing (NLP) model that predicts the city, state, and country from an author's free-text affiliation. Our model automates location inference and overcomes these drawbacks of existing methods. Trained and tested on MapAffil, a publicly available geoparsed dataset of PubMed affiliations up to 2018, our model accurately retrieves high-resolution locations, even when the affiliation does not explicitly mention a city, state, or country. Leveraging NLP techniques and the LinearSVC algorithm, our machine learning model achieves superior accuracy on several validation datasets. This research demonstrates a practical application of text classification for inferring specific geographic locations from free-text affiliations, benefiting researchers and institutions that analyze the distribution of research output.
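The abstract names only "NLP techniques and the LinearSVC algorithm", so the following is a minimal, dependency-free sketch of that general kind of affiliation-to-location text classifier, not the authors' pipeline. Several assumptions are ours: features are word-unigram TF-IDF vectors; scikit-learn's LinearSVC (liblinear, squared hinge loss and an intercept by default) is swapped for a small one-vs-rest linear SVM with plain hinge loss and no intercept, trained by Pegasos stochastic subgradient descent; and the class names `AffiliationTfidf` and `OneVsRestLinearSVM`, the toy training affiliations, and the joint "city, state, country" label strings are all hypothetical, not drawn from MapAffil.

```python
import math
import random
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse feature vector: token -> weight


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; \w is Unicode-aware, so "Hôpital" stays one token.
    return re.findall(r"\w+", text.lower())


class AffiliationTfidf:
    """Hypothetical helper: word-unigram TF-IDF with smoothed IDF and L2 normalisation."""

    def fit(self, docs: List[str]) -> "AffiliationTfidf":
        n = len(docs)
        df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
        # Smoothed IDF, log((1 + n) / (1 + df)) + 1, is always positive.
        self.idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
        return self

    def transform(self, doc: str) -> Vector:
        # Tokens never seen during fit() are ignored.
        tf = Counter(tok for tok in tokenize(doc) if tok in self.idf)
        vec = {tok: c * self.idf[tok] for tok, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {tok: v / norm for tok, v in vec.items()}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


class OneVsRestLinearSVM:
    """Hypothetical stand-in for LinearSVC: one hinge-loss + L2 linear SVM per label, fit with Pegasos."""

    def __init__(self, lam: float = 0.01, epochs: int = 50, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, xs: List[Vector], ys: List[str]) -> "OneVsRestLinearSVM":
        self.classes = list(dict.fromkeys(ys))  # labels in first-seen order
        self.weights = {
            c: self._fit_binary(xs, [1.0 if y == c else -1.0 for y in ys])
            for c in self.classes
        }
        return self

    def _fit_binary(self, xs: List[Vector], signs: List[float]) -> Vector:
        rng = random.Random(self.seed)  # fixed seed -> reproducible shuffles
        order = list(range(len(xs)))
        w: Vector = {}
        t = 0
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                margin = signs[i] * dot(w, xs[i])
                # L2 shrink: 1 - eta * lam simplifies to 1 - 1/t.
                shrink = 1.0 - 1.0 / t
                w = {tok: shrink * v for tok, v in w.items()}
                if margin < 1.0:  # hinge loss active: step toward the example
                    for tok, v in xs[i].items():
                        w[tok] = w.get(tok, 0.0) + eta * signs[i] * v
        return w

    def predict(self, x: Vector) -> str:
        # The highest-scoring one-vs-rest classifier wins.
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


# Hypothetical toy data (not from MapAffil); labels are joint "city, state, country" strings.
train = [
    ("Harvard Medical School, Boston, MA, USA", "Boston, MA, USA"),
    ("Boston Children's Hospital, Harvard Medical School, Boston, MA", "Boston, MA, USA"),
    ("Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston", "Boston, MA, USA"),
    ("Institut Pasteur, Paris, France", "Paris, France"),
    ("Institut Curie, PSL Research University, Paris", "Paris, France"),
    ("Institut Imagine, Hôpital Necker, Paris, France", "Paris, France"),
    ("University of Tokyo, Bunkyo, Tokyo, Japan", "Tokyo, Japan"),
    ("Keio University Hospital, Shinjuku, Tokyo", "Tokyo, Japan"),
    ("National Cancer Center, Chuo, Tokyo, Japan", "Tokyo, Japan"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]
vectorizer = AffiliationTfidf().fit(texts)
model = OneVsRestLinearSVM().fit([vectorizer.transform(t) for t in texts], labels)

# No city is named in this query; the prediction rests on institution tokens.
print(model.predict(vectorizer.transform("Harvard Medical School")))  # Boston, MA, USA
```

The query "Harvard Medical School" names no city, yet it is assigned to Boston because "harvard", "medical", and "school" occur only in Boston-labelled training affiliations. This loosely mirrors the abstract's point about inferring locations without explicit mentions. A realistic model would need far more training data and would still face institution names that are shared across cities.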
Publisher
Research Square Platform LLC