PUBLICATIONS

Towards Efficient Query Processing over Heterogeneous RDF Interfaces

Paper in DeSemWeb2018. Download
Authors: Gabriela Montoya, Christian Aebeloe and Katja Hose.
Since the proposal of RDF as a standard for representing statements about entities, diverse interfaces to publish and strategies to query RDF data have been proposed. Although some recent proposals are aware of the advantages and disadvantages of state-of-the-art approaches, no work has yet tried to integrate them into a hybrid system that exploits their often complementary strengths to process queries more efficiently than each of these approaches could individually. In this paper, we present an approach that exploits the diverse characteristics of queryable RDF interfaces to efficiently process SPARQL queries. It comprises a brief study of the characteristics of some of the most popular RDF interfaces (brTPF and SPARQL endpoints), a method to estimate the impact of using a particular interface on query evaluation, and a method to use multiple interfaces to efficiently process a query. Our experiments using a well-known benchmark dataset and a large number of queries, with answer sizes varying from 1 up to 1 million, show execution time gains of up to three orders of magnitude and data transfer reductions of up to four orders of magnitude.

EXPERIMENTS

Experiments were performed using a setup of up to 32 clients performing around 200 queries concurrently. We tested three approaches: a plain brTPF client, a SPARQL endpoint, and hybridSE. The queries used can be downloaded in the Downloads section. The parameters varied across experiments, as reflected in the headings below, were the number of clients, the threshold, whether a reverse proxy was used, and whether transferred bytes were counted. We present example results in this section for each experiment. Full results, as PDF files containing plots, can be downloaded in the Downloads section and via the links throughout this section. In some plots, to ease readability, we have divided the values on the Y-axis by 10^x, where x is an appropriate power.
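The Y-axis scaling mentioned above can be illustrated with a minimal sketch. The page does not say how the power x was chosen for each plot, so the rule used here (the order of magnitude of the largest value) and the function name scale_for_axis are assumptions made purely for illustration.

```python
import math


def scale_for_axis(values):
    """Divide values by 10^x and return (scaled_values, x).

    Assumption (not from the source): x is the order of magnitude of the
    largest absolute value, so the largest scaled value is roughly in [1, 10).
    """
    largest = max((abs(v) for v in values), default=0)
    if largest == 0:
        # Nothing to scale; keep the values and report x = 0.
        return list(values), 0
    x = math.floor(math.log10(largest))
    factor = 10 ** x
    return [v / factor for v in values], x


# Hypothetical measurements, not taken from the experiments.
scaled, x = scale_for_axis([1500, 32000, 700])
print(f"Y-axis values (x 10^{x}): {scaled}")
```

When reading a scaled plot, multiply the value shown on the Y-axis by 10^x to recover the original measurement.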

hybridSE, 4 clients, threshold = 50

The following example shows the throughput of all three approaches (queries per minute). The full results for 4c_false_50 can be downloaded here.
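For readers unfamiliar with the metric, throughput in queries per minute can be sketched as the number of completed queries divided by the elapsed time in minutes. How the measurement was actually taken in these experiments is not stated on this page; the function name and the millisecond input below are assumptions for illustration only.

```python
def throughput_per_minute(completed_queries, elapsed_ms):
    """Return completed queries per minute for a run lasting elapsed_ms.

    Assumption (not from the source): elapsed time is wall-clock time in
    milliseconds, and only completed queries are counted.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return completed_queries / (elapsed_ms / 60_000)


# Hypothetical run: 200 queries completed in 2 minutes.
print(throughput_per_minute(200, 120_000))
```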

hybridSE, 4 clients, threshold = 50, reverse proxy

The following example shows the average execution time of all three approaches (ms). The full results for 4c_false_rp_50 can be downloaded here.

hybridSE, 4 clients, count bytes, threshold = 50

The following example shows the number of transferred bytes between the client and server for all three approaches. The full results for 4c_true_50 can be downloaded here.

hybridSE, 16 clients, threshold = 50

The following example shows the average number of calls to the TPF server for brTPF and hybridSE. The full results for 16c_false_50 can be downloaded here.

hybridSE, 16 clients, threshold = 50, reverse proxy

The following example shows the total number of timeouts for all three approaches. The full results for 16c_false_rp_50 can be downloaded here.

hybridSE, 16 clients, count bytes, threshold = 50

The following example shows the number of triples transferred from the TPF server for brTPF and hybridSE. The full results for 16c_true_50 can be downloaded here.

hybridSE, 32 clients, threshold = 50

The following example shows the throughput of all three approaches (queries per minute). The full results for 32c_false_50 can be downloaded here.

Threshold, 4 clients

The following example shows the average number of calls to the endpoint for all three thresholds. The full results for threshold_4c can be downloaded here.

Threshold, 4 clients, reverse proxy

The following example shows the throughput for all three thresholds (queries per minute). The full results for threshold_4c_rp can be downloaded here.

Threshold, 16 clients

The following example shows the average number of triples transferred between the TPF server and client for all three thresholds. The full results for threshold_16c can be downloaded here.

Threshold, 16 clients, reverse proxy

The following example shows the average server time of all three thresholds (ms). The full results for threshold_16c_rp can be downloaded here.

Threshold, 32 clients

The following example shows the average execution time for all three thresholds (ms). The full results for threshold_32c can be downloaded here.

DOWNLOADS