Deep networks have set the state of the art in most image analysis tasks by replacing handcrafted features with learned convolution filters within end-to-end trainable architectures. Still, the specification of a convolutional network is subject to much manual design: the shape and size of the receptive field of the convolutional operations are sensitive choices that have to be tuned for each image analysis application. 3D fully-convolutional multi-scale architectures with skip connections, which excel at semantic segmentation and landmark localisation, have huge memory requirements and rely on large annotated datasets, an important limitation for wider adoption in medical image analysis. We propose a novel and effective method based on trainable 3D convolution kernels that learns both filter coefficients and spatial filter offsets in a continuous space, based on the principle of differentiable image interpolation first introduced for spatial transformer networks. A deep network that incorporates this one binary extremely large and inflecting sparse kernel (OBELISK) filter requires fewer trainable parameters and less memory than fully-convolutional U-Net architectures, while achieving high-quality results on two challenging 3D CT multi-organ segmentation tasks. Extensive validation experiments indicate that the performance of sparse deformable convolutions is due to their ability to capture large spatial context with few expressive filter parameters, and that network depth is not always necessary to learn complex shape and appearance features. A combination with conventional CNNs further improves the delineation of small organs with large shape variations, and the fast inference time using flexible image sampling may offer new potential use cases for deep networks in computer-assisted, image-guided interventions.
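
To make the central idea concrete, the following is a minimal sketch of a sparse filter whose taps sit at continuous spatial offsets and are read out by trilinear interpolation. It is an illustration only, not the implementation evaluated in this work: the names `trilinear` and `sparse_response`, the zero padding outside the volume, and the toy ramp volume are all assumptions, and the sketch uses plain Python lists rather than a deep learning framework. Because each interpolated sample is piecewise linear in its offset, a framework with automatic differentiation could in principle update the offsets and the coefficients jointly by gradient descent.

```python
import math


def trilinear(volume, z, y, x):
    """Sample volume[z][y][x] at a continuous location (zero outside the grid)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z0, y0, x0 = math.floor(z), math.floor(y), math.floor(x)
    value = 0.0
    # Blend the eight voxels surrounding the continuous sampling location.
    for zi in (z0, z0 + 1):
        for yi in (y0, y0 + 1):
            for xi in (x0, x0 + 1):
                if 0 <= zi < depth and 0 <= yi < height and 0 <= xi < width:
                    # Each axis contributes a linear weight in [0, 1].
                    w = (1 - abs(z - zi)) * (1 - abs(y - yi)) * (1 - abs(x - xi))
                    value += w * volume[zi][yi][xi]
    return value


def sparse_response(volume, centre, offsets, weights):
    """Weighted sum of samples taken at centre + offset for each filter tap.

    `offsets` (continuous displacements) and `weights` (filter coefficients)
    are the quantities that would be trainable in a learned sparse kernel.
    """
    cz, cy, cx = centre
    return sum(
        w * trilinear(volume, cz + oz, cy + oy, cx + ox)
        for w, (oz, oy, ox) in zip(weights, offsets)
    )


# Toy 4x4x4 volume (hypothetical data): intensity is a linear ramp in z, y, x.
ramp = [[[x + 10 * y + 100 * z for x in range(4)] for y in range(4)] for z in range(4)]

# Two taps with fractional offsets, expressed in voxels as (dz, dy, dx).
offsets = [(0.5, 0.0, 0.0), (0.0, -0.5, 0.25)]
weights = [1.0, -1.0]
response = sparse_response(ramp, (1, 1, 1), offsets, weights)
```

In this sketch the number of parameters grows with the number of taps rather than with the spatial extent of the kernel, so a handful of taps with large offsets can span a wide spatial context.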