HCIL-2018-01

Mauriello, M., Buntain, C., McNally, B., Bagalkotkar, S., Kushnir, S., Froehlich, J.
Recent qualitative studies have begun using large amounts of Online Social Network (OSN) data to study how users interact with technologies. However, current approaches to dataset generation are manual, time-consuming, and often difficult to reproduce. To address these issues, we introduce SMIDGen: a hybrid manual and computational approach for enhancing the replicability and scalability of data collection from OSNs to support qualitative research. We demonstrate how the SMIDGen approach integrates information retrieval (IR) and machine learning (ML) with human expertise through a case study focused on the collection of YouTube videos. Our findings show that SMIDGen surfaces data that manual searches might otherwise miss, increases the overall proportion of relevant data collected, and is robust to the choice of IR/ML algorithm.