This paper presents a generic approach to highly efficient image registration in two and three dimensions. Both monomodal and multimodal registration problems are considered. We focus on the important class of affine-linear transformations in a derivative-based optimization framework. Our main contribution is an explicit formulation of the objective function gradient and Hessian approximation that allows for very efficient, parallel derivative calculation with virtually no memory requirements. The derivative calculations are fully matrix-free and operate directly on the input data, reducing the auxiliary space requirements from O(n) to O(1). The flexible parallelism of our concept allows for direct implementation on various hardware platforms, and the proposed approach is implemented on multicore CPU and GPU. Our GPU code outperforms a conventional matrix-based CPU implementation by more than two orders of magnitude, enabling use in real-time scenarios. An extensive evaluation of the computational properties of our approach demonstrates the performance gain for a variety of real-life medical applications.
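
To make the matrix-free idea concrete, the following minimal sketch contrasts the two strategies for a 2D affine transformation. It is an illustration under stated assumptions, not the implementation evaluated in this paper: it assumes a sum-of-squared-differences distance, bilinear interpolation with zero values outside the image, index coordinates, and a parameter ordering w = (a00, a01, b0, a10, a11, b1). All function names are hypothetical. The matrix-free variant accumulates each pixel's contribution to the gradient and the Gauss-Newton Hessian approximation in fixed-size buffers, whereas the matrix-based variant first stores one residual and one Jacobian row per pixel.

```python
import math

def bilinear(T, y0, y1):
    """Bilinear interpolation of T at (y0, y1) in index coordinates.
    Returns (value, dT/dy0, dT/dy1); all zero outside the image (assumption)."""
    m, n = len(T), len(T[0])
    i, j = math.floor(y0), math.floor(y1)
    if i < 0 or j < 0 or i >= m - 1 or j >= n - 1:
        return 0.0, 0.0, 0.0
    s, t = y0 - i, y1 - j
    t00, t01, t10, t11 = T[i][j], T[i][j + 1], T[i + 1][j], T[i + 1][j + 1]
    val = (1 - s) * (1 - t) * t00 + (1 - s) * t * t01 + s * (1 - t) * t10 + s * t * t11
    d0 = (1 - t) * (t10 - t00) + t * (t11 - t01)
    d1 = (1 - s) * (t01 - t00) + s * (t11 - t10)
    return val, d0, d1

def pixel_terms(R, T, w, x0, x1):
    """Residual and 1x6 Jacobian row contributed by the single pixel (x0, x1)."""
    a00, a01, b0, a10, a11, b1 = w
    y0 = a00 * x0 + a01 * x1 + b0          # affine map y = A x + b
    y1 = a10 * x0 + a11 * x1 + b1
    val, d0, d1 = bilinear(T, y0, y1)
    r = val - R[x0][x1]
    # chain rule: image gradient of T times dy/dw
    jac = (d0 * x0, d0 * x1, d0, d1 * x0, d1 * x1, d1)
    return r, jac

def ssd_matrix_free(R, T, w):
    """SSD value, gradient, and Gauss-Newton Hessian with O(1) auxiliary memory.
    Per-pixel contributions are independent, so the loop is a parallel reduction."""
    value, grad = 0.0, [0.0] * 6
    hess = [[0.0] * 6 for _ in range(6)]
    for x0 in range(len(R)):
        for x1 in range(len(R[0])):
            r, jac = pixel_terms(R, T, w, x0, x1)
            value += 0.5 * r * r
            for p in range(6):
                grad[p] += r * jac[p]
                for q in range(6):
                    hess[p][q] += jac[p] * jac[q]
    return value, grad, hess

def ssd_matrix_based(R, T, w):
    """Reference variant that stores the full residual vector and n x 6 Jacobian."""
    res, J = [], []
    for x0 in range(len(R)):
        for x1 in range(len(R[0])):
            r, jac = pixel_terms(R, T, w, x0, x1)
            res.append(r)
            J.append(jac)
    value = 0.5 * sum(r * r for r in res)
    grad = [sum(J[i][p] * res[i] for i in range(len(res))) for p in range(6)]
    hess = [[sum(row[p] * row[q] for row in J) for q in range(6)] for p in range(6)]
    return value, grad, hess
```

Both functions return the same quantities up to floating-point rounding; only the auxiliary storage differs. Because the per-pixel terms do not depend on each other, the double loop in `ssd_matrix_free` could be distributed over CPU threads or GPU threads and combined with a sum reduction, which is the kind of parallelism the abstract refers to.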