Parallel Execution Approaches on Data and Index Structures in the Context of Semantic Web Database Management Systems

Dennis Heinrich

Abstract

Information is crucial in the development of mankind. Since the dawn of time, humanity has preserved knowledge in different media, like stone tablets, papyrus scrolls or books. Each media has different advantages and disadvantages, like stone tablets do not burn, papyrus scrolls are light-weight and books combine multiple pages in an ergonomic form for the human hand. If these different types of media are stored in a collection, they are sorted and cataloged in a way that a reader can find the needed information faster and easier. These choices of how and where we store information persists if a computer is used to store data. The choice of the underlying data structure and the used index structure both directly impact how efficient and effective the data can be stored and retrieved. This efficiency and effectiveness becomes more important with advanced concepts of structuring data, like the Semantic Web that stores sentence-like statements, consisting of subject, predicate and object, in the billions. For a long period of time, the advancements in the frequency of processing units could counter the growth of the amount of data that is processed. At some point the consumed power for greater processing speed became unbearable and the distribution of tasks into different units of a multi-core processor was the new direction the development was heading. Specialized hardware with parallel execution paths have become the new assistants and/or competitors to the parallel approaches of processors. Be it either the reprogramming of Graphics Processing Unit for general tasks than graphic computation or the use of Field-Programmable Gate Arrays (FPGAs) for a user-specific circuit, the chosen index and data structures have to be adapted to the new parallel execution principle. This leads to a new examination of already existing data and index structures which are advanced and combined under the aspect of parallel execution.

In this work, the index and data structures of a Semantic Web Database Management System are extended and restructured for a highly parallel execution. The new sorting approach PatTrieSort uses patricia tries to not only sort but also compress the input strings at the same time allowing an efficient usage of the main memory. Multiple tries, so called initial runs, can be merged later in the process resulting in one final patricia trie. This new sorting approach can be integrated into the dictionary and six collation orders generation of the chosen Semantic Web Database LUPOSDATE. In the multiple steps of this generation process, each opportunity is used to execute tasks in parallel. At the last part of this work, the B +-tree index structures of the collation orders are extended for parallel searches on an FPGA. This leads to a hybrid system consisting of a CPU-based host system and an FPGA-based accelerator card. Each part of this work is carefully evaluated for its benefits and limitations in context of parallel execution and acceleration of the Semantic Web system.
OriginalspracheEnglisch
QualifikationDoctorate
Gradverleihende Hochschule
Betreuer/-in / Berater/-in
  • Groppe, Sven, Betreuer*in
  • Ingenerf, Josef, Betreuer*in
Datum der Vergabe24.09.2018
PublikationsstatusVeröffentlicht - 19.09.2018

Fingerprint

Untersuchen Sie die Forschungsthemen von „Parallel Execution Approaches on Data and Index Structures in the Context of Semantic Web Database Management Systems“. Zusammen bilden sie einen einzigartigen Fingerprint.

Zitieren