BACKGROUND
A large amount of rich qualitative data from interviews and conversations about living with Type 1 Diabetes is available for qualitative analysis. However, especially when such data were not collected to answer one specific research question, data saturation and representativeness need to be assessed. Social media data from the Diabetes Online Community, for example from Twitter, can easily be collected to create a parallel corpus for comparison with the conversations. Because these social media data cover a large number of participants and localities, they can be used to situate the conversational recordings within a larger context. The present study puts forward one way of implementing such a comparison and discusses the findings.
OBJECTIVE
The objective of this study is to show how a collection of tweets from the English-speaking Diabetes Online Community can be used to situate a smaller set of interviews and conversational recordings about living with Type 1 Diabetes within the broader discourse.
METHODS
Two sets of data were collected: one from Twitter, using hashtags common in the Diabetes Online Community, and one consisting of 17 hours of audio-recorded face-to-face conversations and interviews with people living with Type 1 Diabetes in Scotland. Each corpus contains about 200,000 words. Both were analyzed in R using common metrics of word frequency and distinctiveness. The most frequent words were hand-coded for broader topics using a bottom-up, data-driven approach to coding.
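As an illustration of the kind of comparison described above, the following minimal sketch computes relative word frequencies and a simple distinctiveness score for two corpora. It is written in Python rather than R for brevity, and it is not the study's actual code. The tokenizer, the function names, the toy texts, and the choice of a smoothed log2 ratio as the distinctiveness metric are all assumptions made for illustration; the study does not specify which metrics were used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hashtags are kept)."""
    return re.findall(r"[#\w']+", text.lower())


def relative_frequencies(tokens):
    """Return each word's frequency per million tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total * 1_000_000 for word, count in counts.items()}


def log_ratio(tokens_a, tokens_b, smoothing=0.5):
    """Assumed distinctiveness metric: log2 ratio of smoothed relative frequencies.

    Positive scores mark words more typical of corpus A, negative scores
    words more typical of corpus B. The smoothing constant avoids division
    by zero for words that occur in only one corpus.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    vocabulary = set(counts_a) | set(counts_b)
    return {
        word: math.log2(
            ((counts_a[word] + smoothing) / n_a)
            / ((counts_b[word] + smoothing) / n_b)
        )
        for word in vocabulary
    }


# Toy stand-ins for the two corpora (hypothetical text, not study data).
tweet_tokens = tokenize("Insulin pump #T1D insulin")
conversation_tokens = tokenize("Insulin clinic nurse clinic")

scores = log_ratio(tweet_tokens, conversation_tokens)
# Words sorted from most tweet-typical to most conversation-typical.
ranked = sorted(scores, key=scores.get, reverse=True)
```

With corpora of similar size, as here, raw counts would already be roughly comparable; normalizing by corpus length keeps the comparison valid if the sizes diverge.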
RESULTS
The conversations largely mirror the discourse of the global Diabetes Online Community. The small differences are accounted for by the nature of the medium or the geographical context of the conversations. Both sources of data corroborate findings from previous work on the experience of people living with Type 1 Diabetes in terms of key topics and concerns.
CONCLUSIONS
This strategy of comparing small conversational corpora to potentially very large online corpora is presented as a methodology for making non-purpose-built corpora accessible for different types of analysis, situating purpose-built corpora within a wider context, and developing new research questions based on such a textual analysis.
CLINICALTRIAL
No trial registration was needed. Data collection was approved by the Linguistics and English Language Ethics Committee at the University of Edinburgh and an ethics approval waiver was obtained from the Scottish National Health Service. Participant data were anonymized using pseudonyms selected by the participants.