HCIL-2014-17

Monroe, M.
June 2014
Ph.D. Dissertation from the Department of Computer Science

In our burgeoning world of pervasive sensors and affordable data storage, records of timestamped events are being produced across nearly every domain of personal and professional computing. This temporal event data is a fundamental component of electronic health records, process logs, sports analytics, and more. Across all of these domains, however, two overarching needs arise: (1) to understand population-level trends and patterns, and (2) to identify important subsets of individual records.

Visual analytics tools are billed as the solution to both of these needs. A huge volume of work has demonstrated the ability of these tools to facilitate user-guided data exploration and hypothesis generation across a wide range of data types. What is typically ignored, however, is the process that takes place between data collection and this exploration stage, a process frequently referred to as data wrangling. For many data types, wrangling consists mostly of restructuring spreadsheet columns and renaming fields. For temporal event data, though, this wrangling process can extend much further, to the data itself, where event patterns must be transformed to better reflect either the real-world events that generated them or the perspective of a given study. Without this step, population-level trends can be obscured beyond the point of recognition, and important subsets of records are impossible to discern.

Temporal event data wrangling, however, is deceptively difficult and error-prone, even for expert users. Standard, command-based query languages are poorly suited for specifying even the simplest event patterns and, in systems that are not specifically designed for handling temporal constructs, these queries are executed using a series of slow and inefficient self-join operations. Attempts at more accessible query languages frequently omit critical features such as events that occur over a period of time (intervals) or the absence of an event. Perhaps most important, query alone is not enough to get users through a typical temporal event data wrangling process. Event patterns need not only to be found, but also to be transformed and re-represented. Temporal event wrangling is just as much about revisal as it is about retrieval, and given the ubiquity of this data type, an effective solution on this front has the potential to substantially change the way that we use this data to inform future decisions. An improved query and wrangling process would not only benefit database professionals, but also dramatically increase the range of users who can access this type of data, particularly domain-expert medical researchers.
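To make the shape of such a query concrete, the following is a minimal illustrative sketch, not the method developed in this dissertation. It checks one record for the pattern "an A event, later followed by a B event, with no C event in between", which combines a sequence with an absence. A relational formulation would typically join the event table to itself once per pattern element; the sketch instead scans each record's events once in time order. The `Event` type, its fields, the function name, and the category labels are all hypothetical, and timestamps within a record are assumed to be distinct.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One timestamped point event (hypothetical structure for illustration)."""
    record_id: str
    category: str
    time: int  # assumed unit: any monotonically increasing integer clock


def has_a_then_b_without_c(events: Iterable[Event], a: str, b: str, c: str) -> bool:
    """Return True if some `a` is followed by a later `b` with no `c` between them.

    Assumes `a`, `b`, and `c` are three different categories and that the
    events all belong to a single record.
    """
    a_is_open = False  # True while an `a` has been seen with no `c` since
    for event in sorted(events, key=lambda e: e.time):
        if event.category == a:
            a_is_open = True
        elif event.category == c:
            a_is_open = False  # a `c` invalidates every earlier `a`
        elif event.category == b and a_is_open:
            return True
    return False
```

Intervals and transformations (rewriting matched events rather than only finding them) are deliberately left out of this sketch; they are where the difficulty described above grows.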

This dissertation demonstrates the ability of the EventFlow visualization tool to extend beyond the typical bounds of data exploration and serve as a critical aid for both temporal event query and data transformation. I begin by establishing a better understanding of why these two processes are innately error-prone, and introduce a simple set of powerful yet usable mechanisms that can help reduce an initial portion of these errors. I then show that by coupling these mechanisms with interactive visualizations, users are able to both identify remaining errors and leverage those errors to construct more accurate queries and transformations. The direct contributions of this dissertation are (1) graphic-based query capabilities over points, intervals, and absences, (2) an integer programming strategy for processing temporal queries, (3) a Find & Replace system for transforming event sequences, and (4) eight case studies that demonstrate the utility and validity of these approaches. More broadly, however, this work is designed to open new avenues of research in how visualization and visual analytics tools can be leveraged for tasks beyond data exploration.
