This week’s HCIL Brown Bag Lunch (BBL) will feature Sigfried Gold from the University of Maryland, College Park. His talk is titled “Exploratory visualization tools for health records research, and an exciting detour into infrastructural support for health records research at UMD.”


Time: Thursday, October 5th from 12:30PM – 1:30PM
Place: HCIL (2105 Hornbake, South Wing)

Please make sure to bring your own lunch this week (it will be a more conventional BBL).


Important medical research is increasingly based on analysis of data collected during the provision of routine care. Compared to clinical trial data, this “secondary use” data is not amenable to randomized, prospective study protocols; for observational or retrospective methods, it suffers from poor quality and extreme “missingness”; strict privacy and human subjects regulations limit its availability; processing it for analysis is complicated by the diversity of its sources and formats and by the plethora of language and coding systems in which it is recorded; and analyzing it generally requires advanced clinical training and methods for grappling with its extreme multi-variateness, sparsity, and unknown systemic biases. Despite these formidable challenges, this data is orders of magnitude cheaper and more plentiful than clinical trial data. Researchers and analysts within medical provider institutions can have access to data for millions of patients essentially for free, while medical products companies, regulators, and payer institutions can affordably purchase data for hundreds of millions of patients. Further, although analysts’ use cases are diverse and their methods (e.g., advanced statistics or machine learning) are often opaque as well as immature, they share many basic questions and tasks: they almost universally need to characterize their populations on various demographic and clinical dimensions; they generally need to choose study and comparator cohorts; they need to group patients by disease and treatment parameters; they need to evaluate the significance of untold co-morbidities and confounders; and they need to explore and discover temporal patterns obscured by the volume and variability of the data.

The advent of common data models and open-source software is just beginning to drastically streamline research workflows with this data. For analysts with access to data in OHDSI format, for instance, many months of the standard observational study workflow can be skipped entirely. OHDSI’s web-based cohort construction tools and its open and growing R methods library allow researchers not only to define and execute their studies in hours or days rather than months, but also to share their code and aggregate results instantly and precisely in a research network, so that the studies can be immediately replicated on dozens of other databases containing records for hundreds of millions of patients.

What this means for my research is: 1) my visualization tools can be built to a single data model and can be tested with a wide variety of use cases and without requiring my subject matter expert collaborators to perform data collection and transformation just to work with me; and 2) my tools can be built with immediate integration into platforms they are already using, so, for instance, they can take advantage of these experimental visualization tools as they design their study and set parameters; they can feed those parameters into their statistical or machine learning algorithms; and they can then (continuing in the same platform) use these visualization tools to explore and evaluate results.

What it also means for my research, for better or worse, is that my model for developing and evaluating visualization software and working with users and collaborators is very different from what HCI researchers are used to, and, since no one at UMD (as far as I know) is using OHDSI or anything like it, I have been spending more time explaining and evangelizing for my preferred research platform than for my research itself.

At the Brown Bag I will talk about both, but depending on audience interest (some of our visualization researchers will be off at IEEE VIS this week), I may end up focusing more on the infrastructural issues.


With 29 years of experience in developing data management and analysis software on Unix/Linux and web platforms, Gold specializes in designing and implementing innovative, browser-based information visual analytics tools to facilitate the exploration and understanding of complex, multivariate, or temporal data. He has experience in a wide array of industries (cyber security, securities trading, law, public sector administration, fundraising), but has particular expertise in medical informatics and the secondary use of clinical and claims data for pharmacoepidemiology and patient safety research. He works with medical data using a common data model and open-source software as a collaborator in the OHDSI community.