What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data

Published:2020-07 Issue:3 Volume:6 Page:205630512094070
ISSN:2056-3051
Container-title:Social Media + Society
language:en
Short-container-title:Social Media + Society

Author:

Mancosu Moreno¹, Vegetti Federico¹²

Affiliation:

1. University of Turin, Italy

2. University of Milan, Italy

Abstract

In reaction to the Cambridge Analytica scandal, Facebook has restricted access to its Application Programming Interface (API). This new policy has limited the ability of independent researchers to study relevant topics in political and social behavior. Yet much of the public information that researchers may be interested in is still available on Facebook and can still be systematically collected through web scraping techniques. The goal of this article is twofold. First, we discuss some ethical and legal issues that researchers should consider as they plan their collection and possible publication of Facebook data. In particular, we discuss what kind of information can be ethically gathered about the users (public information), what published data should look like to comply with privacy regulations (like the GDPR), and what consequences violating Facebook’s terms of service may entail for the researcher. Second, we present a scraping routine for public Facebook posts and discuss some technical adjustments that can be made so that the data are ethically and legally acceptable. The code employs screen scraping to collect the list of reactions to a public Facebook post, and applies a one-way cryptographic hash function to the users’ identifiers to pseudonymize their personal information while still keeping them traceable within the data. This article contributes to the debate around freedom of internet research and the ethical concerns that might arise from scraping data from the social web.
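The pseudonymization step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `pseudonymize`, the sample records, the field names, and the use of salted SHA-256 are all assumptions made for illustration. The point it shows is that the same identifier always maps to the same digest, so a user stays traceable within the dataset while the original identifier is not stored.

```python
import hashlib


def pseudonymize(user_id: str, salt: str) -> str:
    """Return a hex digest that stands in for user_id (hypothetical helper)."""
    # SHA-256 is one-way: the digest cannot be inverted to recover user_id.
    # The salt (an assumption, not taken from the article) makes it harder
    # to guess identifiers by hashing candidate values.
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()


# Hypothetical reaction records, standing in for scraped data.
reactions = [
    {"user_id": "alice.example", "post": "p1", "reaction": "LIKE"},
    {"user_id": "bob.example", "post": "p1", "reaction": "ANGRY"},
    {"user_id": "alice.example", "post": "p2", "reaction": "LOVE"},
]

# Placeholder value; in practice this would be kept secret by the researcher.
SALT = "replace-with-a-secret-value"

# Replace each identifier with its digest, leaving the other fields untouched.
pseudonymized = [
    {**record, "user_id": pseudonymize(record["user_id"], SALT)}
    for record in reactions
]
```

In this sketch the first and third records share a digest because they come from the same user, which is what keeps users traceable across posts. Whether a salted hash of this kind is sufficient under a given privacy regulation is a separate question that the sketch does not settle.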

Publisher

SAGE Publications

Subject

Computer Science Applications,Communication,Cultural Studies

Link

http://journals.sagepub.com/doi/pdf/10.1177/2056305120940703

References: 28 articles.

1. Pseudonymization of patient identifiers for translational research

2. Facebook polls as proto-democratic instruments in the Egyptian revolution: The ‘We Are All Khaled Said’ Facebook page

3. Put in the spotlight or largely ignored? Emphasis on the Spitzenkandidaten by political parties in their online campaigns for European elections

4. After the ‘APIcalypse’: social media platforms and their fight against critical scholarly research

Cited by 42 articles.

1. People incorrectly correcting other people: The pragmatics of (re-)corrections and their negotiation in a Facebook group;Discourse, Context & Media;2024-10

2. A comparative study on public interest considerations in data scraping dispute;International Journal of Law in Context;2024-09-06

3. “Don’t research us”—How Mastodon instance rules connect to research ethics;Publizistik;2024-08

4. The role of online crisis actors in teachers’ work and lives;Critical Studies in Education;2024-07-29

5. Enhancing Social Media Data Collection for Digital Forensic Investigations: A Web Parser Approach;2024 International Conference on Computer, Information and Telecommunication Systems (CITS);2024-07-17