Abstract
Ever-growing volumes of (often unstructured) data are generated, stored, and analyzed in many areas of science and industry, motivated by political and security concerns (surveillance, intelligence agencies), economic interests (advertisement, social media), and medical applications. Besides a flood of machine-generated data resulting from technological advances, e.g., ubiquitous Internet-of-Things devices and increasingly accurate sensors, a large amount of data is produced by humans in various forms. To enable machines to automatically analyze such (possibly incompletely or imprecisely defined) data, the idea of the Semantic Web was created. Besides suitable data structures, optimized hardware is necessary to store and process this vast amount of data. Whereas persistently storing these massive datasets is comparatively straightforward, processing and analyzing them within a reasonable time frame becomes increasingly difficult. To cope with these problems, intensive work has been done over the last decades to optimize database software and data structures. Furthermore, technological advances allowed shrinking feature sizes to increase clock frequencies and thus overall performance. However, these approaches are now reaching their limits (the power wall), and in recent years the trend has shifted towards multi-/many-core systems to increase performance. Moreover, such systems are no longer assembled from homogeneous cores but are composed of heterogeneous, specialized cores that each compute a specific task efficiently. The main issue with such systems is that these specialized cores cannot be used in applications that exhibit a large variety of processing patterns. Widely available Field Programmable Gate Arrays (FPGAs) with the capability of (partial) runtime reconfiguration are able to close the gap between the flexibility of general-purpose CPUs and the performance of specialized hardware accelerators.
In this work, a fully integrated hardware-accelerated query engine for large-scale datasets in the context of Semantic Web databases is presented. The proposed architecture allows the user to retrieve specific information from Semantic Web datasets by writing a query, which is automatically adapted and executed on a runtime-reconfigurable FPGA. As queries are typically unknown at design time, a static approach is neither feasible nor flexible enough to cover a wide range of queries at system runtime. Our proposed approach dynamically generates an optimized hardware accelerator, in the form of an FPGA configuration, for each individual query and transparently retrieves the query result to be displayed to the user. The benefits and limitations are evaluated on large-scale synthetic datasets with up to 260 million records as well as on the widely known Billion Triples Challenge with over one billion records.
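To illustrate the kind of workload the engine targets, the following is a minimal software-only sketch (not the thesis's hardware implementation) of evaluating a SPARQL query over an RDF triple store, here using the rdflib Python library on the CPU; the file name `dataset.ttl` and the FOAF query are hypothetical placeholders. In the proposed system, such a runtime-supplied query would instead be compiled into an FPGA configuration and executed in hardware.

```python
# Software-only sketch: evaluate a runtime-supplied SPARQL query over RDF triples.
# In the proposed architecture, this step is replaced by a dynamically generated
# FPGA accelerator; rdflib is used here only to illustrate the query semantics.
from rdflib import Graph

g = Graph()
# Load a Semantic Web dataset serialized as Turtle (placeholder file name).
g.parse("dataset.ttl", format="turtle")

# Queries are unknown at design time; they arrive at runtime as SPARQL strings.
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
    ?person foaf:name ?name .
    ?person foaf:mbox ?email .
}
"""

# Execute the query and print the resulting variable bindings.
for row in g.query(query):
    print(row.name, row.email)
```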
| Original language | English |
| --- | --- |
| Qualification | Doctorate / PhD |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 06.02.2017 |
| Place of Publication | Lübeck |
| Publication status | Published - 01.10.2016 |