In this work we present an optimized version of the Adaptive Radix Tree (ART) index structure for GPUs. We analyze an existing GPU implementation of ART (GRT), identify bottlenecks and present an optimized data structure and layout to improve the lookup and update performance. We show that our implementation outperforms the existing approach by a factor up to 2 times for lookups and up to 10 times for updates using the same GPU. We also show that the sequential memory layout presented here is beneficial for lookup-intensive workloads on the CPU, outperforming the ART by up to 10 times. We analyze the impact of the memory architecture of the GPU, where it becomes visible that traditional GDDR6(X) is beneficial for the index lookups due to the faster clock rates compared to High Bandwidth Memory (HBM).
Original languageEnglish
Publication statusPublished - 2021

Research Areas and Centers

  • Centers: Center for Artificial Intelligence Luebeck (ZKIL)
  • Research Area: Intelligent Systems

DFG Research Classification Scheme

  • 409-04 Operating, Communication, Database and Distributed Systems
  • 409-08 Massively Parallel and Data-Intensive Systems


Dive into the research topics of 'CuART - a CUDA-based, scalable Radix-Tree lookup and update engine.'. Together they form a unique fingerprint.

Cite this