Analysis of Annotated Data Models for Improving Data Quality

Hannes Ulrich, Ann-Kristin Kock-Schoppenhauer, Björn Andersen, Josef Ingenerf

Abstract

The public Medical Data Models (MDM) portal, with more than 9,000 annotated forms from clinical trials and other sources, provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e.g. for the semi-interactive curation of core data records in a specific domain. Furthermore, it can serve as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. All data models in the MDM portal are represented syntactically in CDISC ODM, and UMLS CUIs have been added semi-automatically at several ODM levels, such as ItemGroupDef, ItemDef, and CodeList item. These annotations can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question of how to ensure that semantically similar datasets are also processed and classified similarly. In this work, a (semi-)automatic approach to analysing and assessing items, questions, and data elements in clinical studies is described. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm combines the widely used NLM MetaMap, which provides UMLS support, with corpus-based proposal algorithms that link datasets from the provided CDISC ODM item pool.
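The abstract does not specify how the corpus-based proposal step works, so the following is only a minimal sketch under stated assumptions. It assumes that CUIs are stored in ODM 1.3 `Alias` elements whose `Context` attribute begins with "UMLS CUI", and it uses plain token overlap (Jaccard similarity) between question texts as a stand-in for the paper's proposal algorithm; MetaMap is not called. The function names `load_items` and `propose_cuis`, the 0.2 threshold, and the sample items are hypothetical, and the CUIs are illustrative values only.

```python
import re
import xml.etree.ElementTree as ET

# CDISC ODM 1.3 namespace; the Alias convention below is an assumption.
NS = "http://www.cdisc.org/ns/odm/v1.3"
ODM_NS = {"odm": NS}

# Hypothetical three-item pool; I3 is an under-specified (unannotated) item.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1"><MetaDataVersion OID="V1" Name="v1">
    <ItemDef OID="I1" Name="diabetes" DataType="boolean">
      <Question><TranslatedText>Does the patient have diabetes mellitus?</TranslatedText></Question>
      <Alias Context="UMLS CUI [1]" Name="C0011849"/>
    </ItemDef>
    <ItemDef OID="I2" Name="sbp" DataType="integer">
      <Question><TranslatedText>Systolic blood pressure</TranslatedText></Question>
      <Alias Context="UMLS CUI [1]" Name="C0871470"/>
    </ItemDef>
    <ItemDef OID="I3" Name="dm" DataType="boolean">
      <Question><TranslatedText>Patient has diabetes?</TranslatedText></Question>
    </ItemDef>
  </MetaDataVersion></Study>
</ODM>"""

TOKEN_RE = re.compile(r"[a-z]+")


def tokens(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def load_items(odm_xml):
    """Return {OID: (question_text, [CUIs])} for every ItemDef in the ODM file."""
    root = ET.fromstring(odm_xml)
    items = {}
    for item in root.iter(f"{{{NS}}}ItemDef"):
        q = item.find("odm:Question/odm:TranslatedText", ODM_NS)
        # Fall back to the item name when no question text is present.
        text = q.text if q is not None and q.text else item.get("Name", "")
        cuis = [a.get("Name") for a in item.findall("odm:Alias", ODM_NS)
                if a.get("Context", "").startswith("UMLS CUI")]
        items[item.get("OID")] = (text, cuis)
    return items


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_cuis(target_oid, items, threshold=0.2):
    """Propose CUIs for one item, ranked by the best similarity of an annotated
    pool item that carries them; items below the threshold are ignored."""
    target_tokens = tokens(items[target_oid][0])
    scores = {}
    for oid, (text, cuis) in items.items():
        if oid == target_oid or not cuis:
            continue
        sim = jaccard(target_tokens, tokens(text))
        if sim < threshold:
            continue
        for cui in cuis:
            scores[cui] = max(scores.get(cui, 0.0), sim)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    pool = load_items(SAMPLE_ODM)
    for cui, score in propose_cuis("I3", pool):
        print(cui, round(score, 3))  # prints: C0011849 0.286
```

Ranking each CUI by the best-matching annotated item keeps the proposal list short and explainable; a real pipeline would likely combine such scores with MetaMap candidates before a curator accepts or rejects them.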

Original language: English
Journal: Studies in Health Technology and Informatics
Volume: 243
Pages: 190-194
Number of pages: 5
ISSN: 0926-9630
Publication status: Published, 2017
