Automatic detection and accurate localization of landmarks is a crucial task in medical imaging, needed for applications such as diagnosis, surgical planning, and post-operative assessment. A common approach to localizing multiple landmarks is to combine independent localizers for the individual landmarks with a spatial regularizer, e.g., a conditional random field (CRF). The configuration of the regularizer, e.g., the CRF topology and potential functions, often has to be specified manually for each application. In this paper, we present a general framework that automatically learns the optimal configuration of a CRF for localizing multiple landmarks. Furthermore, we introduce a novel “missing” label for each landmark (node in the CRF). The key idea is to define a pool of potentials and to optimize, within a learning framework, both their CRF weights and the potential values assigned to missing landmarks. Potentials with a low weight are removed, which optimizes the graph topology. This makes it easy to transfer our framework to new applications and to integrate different localizers. Further advantages of our algorithm are its low test runtime, the small amount of training data it requires, and its interpretability. We demonstrate its feasibility in a detailed evaluation on three medical datasets featuring high degrees of pathologies and outliers.
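To make the key idea concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes 2D landmark candidates, a pool of pairwise potentials that penalize deviation from an expected offset, hand-set rather than learned weights, and exhaustive inference over a toy problem. All names (`MISSING`, `energy`, `prune`, `infer`) and all numeric values are hypothetical and chosen for illustration only.

```python
import itertools
import numpy as np

MISSING = -1  # hypothetical index for the extra "missing" label of each node


def pair_value(pot, xi, xj):
    """Deviation of the observed offset between two landmarks from the expected one."""
    return float(np.linalg.norm((xj - xi) - pot["offset"]))


def energy(labels, cands, unary, missing_unary, pool, weights, missing_pair):
    """Weighted sum of unary costs and pooled pairwise potentials for one labeling."""
    total = 0.0
    for i, k in enumerate(labels):
        total += missing_unary[i] if k == MISSING else unary[i][k]
    for p, pot in enumerate(pool):
        ki, kj = labels[pot["i"]], labels[pot["j"]]
        if ki == MISSING or kj == MISSING:
            # Potential value for a missing landmark (assumed fixed here, not learned).
            value = missing_pair[p]
        else:
            value = pair_value(pot, cands[pot["i"]][ki], cands[pot["j"]][kj])
        total += weights[p] * value
    return total


def prune(pool, weights, missing_pair, threshold=1e-3):
    """Drop potentials whose weight is below the threshold, which thins the graph."""
    keep = [p for p, w in enumerate(weights) if w >= threshold]
    return ([pool[p] for p in keep],
            [weights[p] for p in keep],
            [missing_pair[p] for p in keep])


def infer(cands, unary, missing_unary, pool, weights, missing_pair):
    """Exhaustive minimum-energy search; only feasible for toy problems."""
    options = [list(range(len(c))) + [MISSING] for c in cands]
    return min(
        itertools.product(*options),
        key=lambda lab: energy(lab, cands, unary, missing_unary,
                               pool, weights, missing_pair),
    )


# Toy problem: three landmarks; the only candidate for landmark 2 is an outlier.
cands = [np.array([[0.0, 0.0], [5.0, 5.0]]),
         np.array([[10.0, 0.0], [0.0, 10.0]]),
         np.array([[50.0, 50.0]])]
unary = [[0.1, 0.9], [0.6, 0.4], [5.0]]   # localizer costs per candidate
missing_unary = [1.0, 1.0, 1.0]           # cost of declaring a landmark missing

full_pool = [{"i": 0, "j": 1, "offset": np.array([10.0, 0.0])},
             {"i": 1, "j": 2, "offset": np.array([0.0, 10.0])},
             {"i": 0, "j": 2, "offset": np.array([10.0, 10.0])}]
full_weights = [1.0, 0.5, 0.0]            # stand-ins for learned CRF weights
full_missing_pair = [2.0, 2.0, 2.0]

pool, weights, missing_pair = prune(full_pool, full_weights, full_missing_pair)
best = infer(cands, unary, missing_unary, pool, weights, missing_pair)
print(len(pool), best)
```

In this toy setting the zero-weight potential is removed from the pool, and the lowest-energy labeling picks the first candidate for landmarks 0 and 1 while assigning the “missing” label to landmark 2, because accepting its outlier candidate costs more than declaring it missing.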