Analysis of Annotated Data Models for Improving Data Quality

Hannes Ulrich, Ann-Kristin Kock-Schoppenhauer, Björn Andersen, Josef Ingenerf


The public Medical Data Models (MDM) portal with more than 9.000 annotated forms from clinical trials and other sources provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e. g. the semi-interactive curation of core data records in a special domain. Furthermore, it can be used as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. Using CDISC ODM for syntactically representing all data models in the MDM portal, there are semi-automatically added UMLS CUIs at several ODM levels like ItemGroupDef, ItemDef, or CodeList item. This can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question how to assure that semantically similar datasets are also processed and classified similarly. In this work, a (semi-)automatic approach to analyse and assess items, questions, and data elements in clinical studies is described. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm operates with the commonly used NLM MetaMap to provide UMLS support and corpus-based proposal algorithms to link datasets from the provided CDISC ODM item pool.

Original languageEnglish
JournalStudies in Health Technology and Informatics
Pages (from-to)190-194
Number of pages5
Publication statusPublished - 2017


Dive into the research topics of 'Analysis of Annotated Data Models for Improving Data Quality'. Together they form a unique fingerprint.

Cite this