This work proposes a general multi-layer framework for image labeling that targets the challenging problem of classifying the occluded parts of a 3D scene depicted in a 2D image. Our framework is based on mixed graphical models, which explicitly encode the causal relationships between visible and occluded regions. Unlike other image labeling techniques, which determine a single label for each pixel, our layered model assigns multiple labels to each pixel. We propose a novel “Multi-Layer-CRF” framework that allows sophisticated occlusion potentials to be integrated into the model and enables automatic inference of the layer decomposition. We use a special message-passing algorithm to perform maximum a posteriori (MAP) inference on mixed graphs, and we demonstrate the ability to infer the correct labels of occluded regions on both an aerial near-vertical dataset and an urban street-view dataset. The framework is shown to significantly increase classification accuracy in occluded areas.