Abstract
In this paper we present and discuss our efforts to accelerate a sample application using the Streaming SIMD Extensions (SSE) to the x64 instruction set. Several approaches to integrating them into the source code are tested and evaluated against each other: assembler intrinsics, the initial source code combined with different compiler flags, and code enhanced for better SSE inference. Their performance is compared to benchmarks from two hybrid computing systems, which use a Field Programmable Gate Array (FPGA) and a Graphics Processing Unit (GPU), respectively. Because the interfaces to the manipulated/accelerated code sections are the same in all cases, comparability is always maintained.
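
To illustrate the kinds of integration the abstract contrasts, the following is a minimal sketch and not code from the paper. It assumes a simple element-wise float addition as a stand-in kernel; the function names `add_scalar` and `add_sse` are hypothetical. The scalar version is the sort of loop a compiler might vectorize on its own (for example with GCC's `-O3 -msse2`; the flags actually used in the paper are not stated here), with `restrict` as one possible hint that could help SSE inference. The second version uses SSE intrinsics explicitly. Both share one interface, mirroring the comparability argument above.

```c
#include <assert.h>
#include <stddef.h>

/* Use SSE intrinsics only where the target supports them (assumption:
   GCC/Clang define __SSE__, MSVC defines _M_X64 or _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar baseline. 'restrict' promises the arrays do not overlap, which
   may make it easier for the compiler to vectorize the loop itself. */
void add_scalar(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Same interface, explicit SSE intrinsics: four floats per iteration. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail for leftover elements (or all of them without SSE). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Because both functions take the same arguments, a benchmark harness could swap one for the other without touching the calling code, which is the property the abstract relies on for comparing variants.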
Original language | English |
---|---|
Pages | 131-140 |
Number of pages | 10 |
Publication status | Published - 2011 |
Event | 24th PARS - Workshop on Parallel Systems and Algorithms, Rüschlikon, Switzerland, 26.05.2011 → 27.05.2011 |
Conference
Conference | 24th PARS - Workshop on Parallel Systems and Algorithms |
---|---|
Country/Territory | Switzerland |
City | Rüschlikon |
Period | 26.05.11 → 27.05.11 |