PUBLICATIONS
Towards Efficient Query Processing over Heterogeneous RDF Interfaces
Since RDF was proposed as a standard for representing statements about entities, diverse interfaces for publishing RDF data and strategies for querying it have been proposed. Although some recent proposals are aware of the advantages and disadvantages of state-of-the-art approaches, no work has yet tried to integrate them into a hybrid system that exploits their often complementary strengths to process queries more efficiently than any of these approaches could individually. In this paper, we present an approach that exploits the diverse characteristics of queryable RDF interfaces to process SPARQL queries efficiently. Specifically, we present a brief study of the characteristics of some of the most popular RDF interfaces (brTPF and SPARQL endpoints), a method to estimate the impact of using a particular interface on query evaluation, and a method to use multiple interfaces to process a query efficiently. Our experiments, using a well-known benchmark dataset and a large number of queries with answer sizes varying from 1 up to 1 million, show execution time gains of up to three orders of magnitude and data transfer reductions of up to four orders of magnitude.
EXPERIMENTS
Experiments were performed using a setup of up to 32 clients performing around 200 queries concurrently. We tested three approaches: a brTPF client alone, a SPARQL endpoint alone, and hybridSE. The queries used can be downloaded in the Downloads section. The parameters that were varied during the experiments are given in the configuration headings below.
hybridSE, 4 clients, threshold = 50
The following example shows the throughput of all three approaches (queries per minute). The full results for 4c_false_50 can be downloaded here.
hybridSE, 4 clients, threshold = 50, reverse proxy
The following example shows the average execution time of all three approaches (ms). The full results for 4c_false_rp_50 can be downloaded here.
hybridSE, 4 clients, count bytes, threshold = 50
The following example shows the number of bytes transferred between the client and server for all three approaches. The full results for 4c_true_50 can be downloaded here.
hybridSE, 16 clients, threshold = 50
The following example shows the average number of calls to the TPF server for brTPF and hybridSE. The full results for 16c_false_50 can be downloaded here.
hybridSE, 16 clients, threshold = 50, reverse proxy
The following example shows the total number of timeouts for all three approaches. The full results for 16c_false_rp_50 can be downloaded here.
hybridSE, 16 clients, count bytes, threshold = 50
The following example shows the number of triples transferred from the TPF server for brTPF and hybridSE. The full results for 16c_true_50 can be downloaded here.
hybridSE, 32 clients, threshold = 50
The following example shows the throughput of all three approaches (queries per minute). The full results for 32c_false_50 can be downloaded here.
Threshold, 4 clients
The following example shows the average number of calls to the endpoint for all three thresholds. The full results for threshold_4c can be downloaded here.
Threshold, 4 clients, reverse proxy
The following example shows the throughput for all three thresholds (queries per minute). The full results for threshold_4c_rp can be downloaded here.
Threshold, 16 clients
The following example shows the average number of triples transferred between the TPF server and client for all three thresholds. The full results for threshold_16c can be downloaded here.
Threshold, 16 clients, reverse proxy
The following example shows the average server time for all three thresholds (ms). The full results for threshold_16c_rp can be downloaded here.
Threshold, 32 clients
The following example shows the average execution time for all three thresholds (ms). The full results for threshold_32c can be downloaded here.
DOWNLOADS
HybridTPFEngine.zip file, or view the sources on GitHub.
Queries for the experimental setup.
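The result sets listed in the experiments section carry labels such as 4c_false_rp_50 and threshold_16c_rp. Going by the headings they appear under, these seem to encode the number of clients, whether bytes were counted, whether a reverse proxy was used, and the threshold. The following is a minimal sketch of decoding such labels when post-processing the downloaded results. The naming pattern is inferred from this page only, and `RunConfig` and `parse_label` are hypothetical names that are not part of HybridTPFEngine.

```python
import re
from typing import NamedTuple, Optional


class RunConfig(NamedTuple):
    """Configuration decoded from a result-set label (hypothetical helper type)."""
    clients: int
    count_bytes: Optional[bool]  # None for threshold-comparison runs
    reverse_proxy: bool
    threshold: Optional[int]     # None for threshold-comparison runs


# Assumed pattern for hybridSE runs, e.g. "4c_false_rp_50":
# <clients>c_<count bytes true|false>[_rp]_<threshold>
_HYBRID_RUN = re.compile(r"^(\d+)c_(true|false)(_rp)?_(\d+)$")

# Assumed pattern for threshold-comparison runs, e.g. "threshold_16c_rp":
# threshold_<clients>c[_rp]
_THRESHOLD_RUN = re.compile(r"^threshold_(\d+)c(_rp)?$")


def parse_label(label: str) -> RunConfig:
    """Decode a result-set label into its experiment parameters."""
    match = _HYBRID_RUN.match(label)
    if match:
        return RunConfig(
            clients=int(match.group(1)),
            count_bytes=(match.group(2) == "true"),
            reverse_proxy=match.group(3) is not None,
            threshold=int(match.group(4)),
        )
    match = _THRESHOLD_RUN.match(label)
    if match:
        return RunConfig(
            clients=int(match.group(1)),
            count_bytes=None,
            reverse_proxy=match.group(2) is not None,
            threshold=None,
        )
    raise ValueError(f"unrecognized result-set label: {label!r}")
```

For example, `parse_label("16c_true_50")` yields a configuration with 16 clients, byte counting enabled, no reverse proxy, and a threshold of 50. Labels that do not follow either assumed pattern raise a `ValueError` rather than being guessed at.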