PatTrieSort - External String Sorting based on Patricia Tries

Sven Groppe, Dennis Heinrich, Stefan Werner, Christopher Blochwitz, Thilo Pionteck

Abstract

External merge sort belongs to the most efficient and widely used algorithms to sort big data: As much data as fits inside is sorted in main memory and afterwards swapped to external storage as so called initial run. After sorting all the data in this way block-wise, the initial runs are merged in a merging phase in order to retrieve the final sorted run containing the completely sorted original data. Patricia tries are one of the most space-efficient ways to store strings especially those with common prefixes. Hence, we propose to use patricia tries for initial run generation in an external merge sort variant, such that initial runs can become large compared to traditional external merge sort using the same main memory size. Furthermore, we store the initial runs as patricia tries instead of lists of sorted strings. As we will show in this paper, patricia tries can be efficiently merged having a superior performance in comparison to merging runs of sorted strings. We complete our discussion with a complexity analysis as well as a comprehensive performance evaluation, where our new approach outperforms traditional external merge sort by a factor of 4 for sorting over 4 billion strings of real world data.
OriginalspracheEnglisch
ZeitschriftOpen Journal of Databases (OJDB)
Jahrgang2
Ausgabenummer1
Seiten (von - bis)36-50
Seitenumfang15
ISSN2199-3459
PublikationsstatusVeröffentlicht - 01.10.2015

Strategische Forschungsbereiche und Zentren

  • Querschnittsbereich: Intelligente Systeme
  • Zentren: Zentrum für Künstliche Intelligenz Lübeck (ZKIL)

DFG-Fachsystematik

  • 409-06 Informationssysteme, Prozess- und Wissensmanagement
  • 409-04 Betriebs-, Kommunikations-, Datenbank- und verteilte Systeme

Fingerprint

Untersuchen Sie die Forschungsthemen von „PatTrieSort - External String Sorting based on Patricia Tries“. Zusammen bilden sie einen einzigartigen Fingerprint.

Zitieren