Variations in the shape and appearance of anatomical structures in medical images are often relevant radiological signs of disease. Automatic tools can help automate parts of this manual process. A cloud-based evaluation framework is presented in this paper including results of benchmarking current state-of-the-art medical imaging algorithms for anatomical structure segmentation and landmark detection: the VISCERAL Anatomy benchmarks. The algorithms are implemented in virtual machines in the cloud where participants can only access the training data and can be run privately by the benchmark administrators to objectively compare their performance in an unseen common test set. Overall, 120 computed tomography and magnetic resonance patient volumes were manually annotated to create a standard Gold Corpus containing a total of 1295 structures and 1760 landmarks. Ten participants contributed with automatic algorithms for the organ segmentation task, and three for the landmark localization task. Different algorithms obtained the best scores in the four available imaging modalities and for subsets of anatomical structures. The annotation framework, resulting data set, evaluation setup, results and performance analysis from the three VISCERAL Anatomy benchmarks are presented in this article. Both the VISCERAL data set and Silver Corpus generated with the fusion of the participant algorithms on a larger set of non-manually-annotated medical images are available to the research community.