Highly Efficient Image Registration for Embedded Systems Using a Distributed Multicore DSP Architecture

Roelof Berg, Lars König, Jan Rühaak, Ralph Lausen, Bernd Fischer


We present a complete approach to highly efficient image registration for embedded systems, covering all steps from theory to practice. An optimization-based image registration algorithm using a least-squares data term is implemented on an embedded distributed multicore digital signal processor (DSP) architecture. All relevant parts are optimized, ranging from mathematics, algorithmics, and data transfer to hardware architecture and electronic components. The optimization for the rigid alignment of two-dimensional images is performed in a multilevel Gauss–Newton minimization framework. We propose a reformulation of the necessary derivative computations, which eliminates all sparse matrix operations and allows for parallel, memory-efficient computation. The pixelwise parallelism forms an ideal starting point for our implementation on a multicore, multichip DSP architecture. The reduction of data transfer to the individual DSP chips is key for an efficient calculation. By determining worst cases for the subimages needed on each DSP, we can substantially reduce data transfer and memory requirements. This is accompanied by a sophisticated padding mechanism that eliminates pipeline hazards and speeds up the generation of the multilevel pyramid. Finally, we present a reference hardware architecture consisting of four TI C6678 DSPs with eight cores each. We show that it is possible to register high-resolution images within milliseconds on an embedded device. In our example, we register two images with 4096 × 4096 pixels within 93 ms, while off-loading the CPU by a factor of 20 and requiring 3.12 times less electrical energy.
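To make the idea of a pixelwise, matrix-free Gauss–Newton step concrete, the following is a minimal single-level sketch in plain Python. It is not the authors' DSP implementation or their exact reformulation. The analytic `template` function (a stand-in for an interpolated image), the parameter ordering (angle, shift in x, shift in y), the 32 × 32 grid, and all function names are assumptions made for illustration. The sketch shows only that the gradient and the 3 × 3 Gauss–Newton matrix of a least-squares rigid registration can be accumulated one pixel at a time, without assembling any sparse matrix.

```python
import math

def template(y0, y1):
    """Hypothetical analytic template image: returns value and both partial derivatives."""
    v = math.exp(-((y0 - 0.3) ** 2) / 0.8 - ((y1 + 0.2) ** 2) / 0.4)
    return v, v * (-2.0 * (y0 - 0.3) / 0.8), v * (-2.0 * (y1 + 0.2) / 0.4)

def rigid(w, x0, x1):
    """Rotate the point by w[0] radians, then translate it by (w[1], w[2])."""
    c, s = math.cos(w[0]), math.sin(w[0])
    return c * x0 - s * x1 + w[1], s * x0 + c * x1 + w[2]

def accumulate(w, pixels, reference):
    """Sum per-pixel contributions to g = J^T r and H = J^T J (no matrices stored)."""
    c, s = math.cos(w[0]), math.sin(w[0])
    g = [0.0] * 3
    H = [[0.0] * 3 for _ in range(3)]
    for (x0, x1), ref in zip(pixels, reference):
        y0, y1 = rigid(w, x0, x1)
        val, d0, d1 = template(y0, y1)
        r = val - ref  # least-squares residual at this pixel
        # Chain rule: image gradient times derivative of the rigid transform.
        j = (d0 * (-s * x0 - c * x1) + d1 * (c * x0 - s * x1), d0, d1)
        for a in range(3):
            g[a] += j[a] * r
            for b in range(3):
                H[a][b] += j[a] * j[b]
    return g, H

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(3):
        p = max(range(k, 3), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, 3):
            f = M[i][k] / M[k][k]
            for col in range(k, 4):
                M[i][col] -= f * M[k][col]
    x = [0.0] * 3
    for k in (2, 1, 0):
        x[k] = (M[k][3] - sum(M[k][c] * x[c] for c in range(k + 1, 3))) / M[k][k]
    return x

def gauss_newton(pixels, reference, iterations=20):
    """Plain Gauss-Newton iteration without line search, starting from the identity."""
    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        g, H = accumulate(w, pixels, reference)
        step = solve3(H, [-v for v in g])
        w = [wi + si for wi, si in zip(w, step)]
    return w

# Synthetic test problem: the reference is the template under a known rigid motion.
n = 32
pixels = [(-1 + 2 * (i + 0.5) / n, -1 + 2 * (k + 0.5) / n)
          for i in range(n) for k in range(n)]
w_true = [0.05, 0.03, -0.02]
reference = [template(*rigid(w_true, x0, x1))[0] for x0, x1 in pixels]
w_est = gauss_newton(pixels, reference)
```

Because each pixel contributes independently to `g` and `H`, the loop in `accumulate` could in principle be split into image regions that are processed on separate cores and summed afterwards. The multilevel pyramid, image interpolation, padding, and data-transfer aspects described in the abstract are deliberately left out of this sketch.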
Original language: English
Journal: Journal of Real-Time Image Processing
Issue number: 2
Pages (from-to): 341–361
Number of pages: 21
Publication status: Published - 02.11.2014

