Digital trace data collection through data donation
Author(s): Laura Boeschoten, Niek de Schipper, Adrienne Mendrik, Emiel van der Veen, Bella Struminskaya, Heleen Janssen, Theo Araujo
Friday 16 | 11:20-11:40
Room: TP54
Session: Sociology and Computational Social Science
In our everyday lives, we leave more and more traces behind on digital platforms, for example by liking a post on Instagram or sending a message via WhatsApp. The promise of computational social science is that researchers can use these digital traces to study human behavior and social interaction at an unprecedented level of detail. However, while the volume of digital traces keeps growing, most of them are locked in the proprietary archives of commercial corporations; only a subset is available, either to a small set of researchers or through increasingly restricted APIs. The GDPR’s rights to data access and data portability enable an alternative route to digital traces. Under this legislation, all entities that process personal data are required to provide citizens, upon request, with a copy of their personal data, in electronic form where appropriate. We refer to these copies as Data Download Packages (DDPs). This makes it possible for researchers to invite participants to share their DDPs. A major challenge, however, is that DDPs potentially contain very sensitive data. Moreover, often only part of the data is needed to answer a specific research question.
To tackle these challenges, Boeschoten et al. (2022) developed an alternative workflow. First, the participant requests their personal DDP from the platform of interest. Second, they download it onto their own personal device. Third, by means of local processing, only the features of interest to the researcher are extracted from that DDP. Fourth, the participant inspects the extracted features and chooses which of them to donate, or declines to donate. Only after the participant has selected the data for donation and clicked the "donate" button is the donated data sent to a storage location where the researcher can access it.
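The local extraction and consent steps of this workflow can be illustrated with a minimal Python sketch. It is not Port's actual code or API: the file names (`likes.json`, `messages.json`), the JSON layout, and the function names are hypothetical, chosen only to show how a DDP can be reduced on the participant's device to the features a researcher needs, and how only the participant-approved part would be passed on.

```python
import io
import json
import zipfile
from collections import Counter


def extract_features(ddp_bytes: bytes, member: str = "likes.json") -> dict:
    """Local processing: open the DDP (a zip archive) and read only the
    file of interest, reducing it to a count of records per month.
    The member name and record layout are assumptions for illustration."""
    with zipfile.ZipFile(io.BytesIO(ddp_bytes)) as ddp:
        records = json.loads(ddp.read(member))
    per_month = Counter(record["timestamp"][:7] for record in records)
    return dict(sorted(per_month.items()))


def select_for_donation(features: dict, approved_keys: set) -> dict:
    """Consent step: keep only the rows the participant approved.
    An empty set of approved keys means nothing is donated."""
    return {key: value for key, value in features.items() if key in approved_keys}


def build_demo_ddp() -> bytes:
    """Build a small in-memory stand-in for a DDP (entirely made-up data)."""
    likes = [
        {"timestamp": "2023-01-05T10:00:00"},
        {"timestamp": "2023-01-20T09:30:00"},
        {"timestamp": "2023-02-02T18:15:00"},
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as ddp:
        ddp.writestr("likes.json", json.dumps(likes))
        # Sensitive content that the extraction never opens.
        ddp.writestr("messages.json", json.dumps([{"text": "private"}]))
    return buffer.getvalue()


if __name__ == "__main__":
    features = extract_features(build_demo_ddp())
    print("Extracted for review:", features)
    # The participant approves only January; February is withheld.
    donation = select_for_donation(features, {"2023-01"})
    print("Would be donated:", donation)
```

In this sketch the message file is never read and nothing leaves the device until `select_for_donation` has been applied, which mirrors the order of steps in the workflow described above. Sending the donation to a storage location is deliberately left out.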
Port (Boeschoten et al., 2023) is an open-source software tool that allows researchers to fully configure their own data donation study design using this workflow. Port is a generic tool: researchers decide which digital platform to investigate, which digital traces to collect, how to present the digital traces to the participant, and what to communicate to the participant throughout this process. Furthermore, Port can be used to create custom study flows for participants and can be integrated with other data collection resources such as Qualtrics. Finally, Port is open source, and we facilitate its use either on the Dutch national research infrastructure SURF or as a software-as-a-service solution through Eyra.co. Together, these functionalities make Port a generic and useful tool for any researcher interested in collecting digital traces for research purposes.