SPARQL Helper
SPARQL Helper: centralized SPARQL query execution with automatic fallback.
This module is a SPARQL client that handles:
- Automatic GET -> POST fallback for endpoints that require POST
- Exponential backoff retry logic for transient failures
- Support for SELECT (JSON) and CONSTRUCT (Turtle/N3) queries
- HTML error detection in responses
- Consistent logging across all SPARQL operations
- Support for pagination (limit and offset usage)
Usage:
    from rdfsolve.sparql_helper import SparqlHelper

    # Create a helper for an endpoint
    helper = SparqlHelper("https://sparql.example.org/")

    # Execute SELECT query (returns dict)
    results = helper.select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")

    # Execute CONSTRUCT query (returns a Turtle string)
    turtle_data = helper.construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }")

    # Execute ASK query (returns bool)
    exists = helper.ask("ASK { ?s a <http://example.org/Class> }")
- class QueryRecord(query: str, query_type: typing.Literal['SELECT', 'CONSTRUCT', 'ASK'], endpoint_url: str, timestamp: str = <factory>, description: str = '', keywords: list[str] = <factory>, success: bool = True)
Bases: object
Record of a SPARQL query execution.
- exception EndpointError
Bases: SparqlHelperError
Raised when the endpoint returns an error.
- exception EndpointTimeoutError
Bases: EndpointError
Raised when the endpoint times out (read / connect).
- exception EndpointUnhealthyError
Bases: EndpointError
Raised when the endpoint returns a 200/400 with a non-SPARQL body.
Typical examples: database in recovery mode, backend proxy errors, maintenance pages returned as text/plain or text/html.
- exception PaginationTruncatedError(msg: str, offset: int = 0)
Bases: EndpointTimeoutError
Raised by select_chunked when pagination is abandoned mid-stream.
This means some rows were already yielded before the error, so the caller received a partial result set. The offset attribute records where pagination stopped.
Initialize a pagination truncation error.
- Parameters:
msg – Error message.
offset – Offset at which pagination stopped.
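Because rows may already have been yielded when this error is raised, a caller will often want to keep the partial result and record the offset. The following is a minimal sketch of that pattern. The exception class and the `fake_chunks` generator are local stand-ins written for illustration (assumptions, not the rdfsolve implementations), so the snippet runs without a network connection.

```python
class PaginationTruncatedError(Exception):
    """Stand-in that mirrors the documented signature (msg, offset=0)."""

    def __init__(self, msg: str, offset: int = 0) -> None:
        super().__init__(msg)
        self.offset = offset


def fake_chunks():
    """Hypothetical generator: yields two pages, then gives up at offset 4."""
    yield [{"s": "a"}, {"s": "b"}]
    yield [{"s": "c"}, {"s": "d"}]
    raise PaginationTruncatedError("pagination abandoned", offset=4)


rows = []
stopped_at = None
try:
    for chunk in fake_chunks():
        rows.extend(chunk)
except PaginationTruncatedError as exc:
    # Rows collected so far are valid but incomplete.
    stopped_at = exc.offset

print(len(rows), stopped_at)
```

With the real helper, the loop body would presumably iterate over `helper.select_chunked(...)` instead of `fake_chunks()`.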
- exception QueryError
Bases: SparqlHelperError
Raised when the query itself is invalid.
- class MimeTypes
Bases: object
Standard MIME types for SPARQL protocol.
- JSON = 'application/sparql-results+json'
- XML = 'application/sparql-results+xml'
- TURTLE = 'text/turtle'
- N3 = 'text/n3'
- NTRIPLES = 'application/n-triples'
- RDFXML = 'application/rdf+xml'
- JSONLD = 'application/ld+json'
- SELECT_ACCEPT = 'application/sparql-results+json, application/sparql-results+xml;q=0.9'
- CONSTRUCT_ACCEPT = 'text/turtle, text/n3;q=0.9, application/n-triples;q=0.8, application/rdf+xml;q=0.7'
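The two `*_ACCEPT` constants are ordinary HTTP Accept headers with quality values: the first type is preferred and each later type carries a lower `q`. As a rough illustration of how such a header is assembled, here is a small sketch. `build_accept` is a hypothetical helper for this example and not part of the module.

```python
def build_accept(preferences):
    """Join (mime_type, q) pairs into an Accept header value.

    A q of 1.0 is the HTTP default, so it is left implicit.
    """
    parts = []
    for mime, q in preferences:
        parts.append(mime if q == 1.0 else f"{mime};q={q:g}")
    return ", ".join(parts)


select_accept = build_accept([
    ("application/sparql-results+json", 1.0),
    ("application/sparql-results+xml", 0.9),
])
construct_accept = build_accept([
    ("text/turtle", 1.0),
    ("text/n3", 0.9),
    ("application/n-triples", 0.8),
    ("application/rdf+xml", 0.7),
])
print(select_accept)
print(construct_accept)
```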
- class SparqlHelper(endpoint_url: str, *, use_post: bool = False, max_retries: int = 10, initial_backoff: float = 1.0, max_backoff: float = 30.0, timeout: float = 10000.0)
Bases: object
Centralized SPARQL query executor with automatic fallback and retry logic.
This class provides:
- Automatic GET/POST method fallback when endpoints return HTML/500 errors
- Configurable retry with exponential backoff for transient failures
- Consistent error handling and logging
- Support for SELECT, CONSTRUCT, and ASK queries
Uses the standard requests library.
- endpoint_url
The SPARQL endpoint URL
- use_post
If True, always use POST method (skip GET attempt)
- max_retries
Maximum number of retry attempts
- initial_backoff
Initial backoff delay in seconds
- max_backoff
Maximum backoff delay in seconds
- timeout
Request timeout in seconds
Example
>>> helper = SparqlHelper("https://sparql.swisslipids.org/")
>>> results = helper.select("SELECT ?g { GRAPH ?g { ?s ?p ?o } }")
>>> for binding in results["results"]["bindings"]:
...     print(binding["g"]["value"])
Initialize the SPARQL helper.
- Parameters:
endpoint_url – SPARQL endpoint URL
use_post – Always use POST (default: False, tries GET first)
max_retries – Maximum retry attempts for transient failures
initial_backoff – Initial delay between retries (seconds)
max_backoff – Maximum delay between retries (seconds)
timeout – Request timeout in seconds (default: 10000.0, per the constructor signature)
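To make the interaction between max_retries, initial_backoff and max_backoff concrete, the sketch below computes a capped exponential delay schedule from the documented defaults. The doubling factor and the absence of jitter are assumptions made for illustration; the documentation does not state the exact formula the helper uses.

```python
def backoff_delays(max_retries: int, initial_backoff: float, max_backoff: float) -> list:
    """Return one delay per retry attempt, doubling each time up to a cap.

    Assumption: delay = initial_backoff * 2**attempt, clamped to max_backoff.
    """
    return [
        min(initial_backoff * (2 ** attempt), max_backoff)
        for attempt in range(max_retries)
    ]


delays = backoff_delays(max_retries=10, initial_backoff=1.0, max_backoff=30.0)
print(delays)
print(sum(delays))  # worst-case total wait under these assumptions
```

Under these assumptions the cap is reached on the sixth retry, so raising max_retries beyond that adds a fixed max_backoff per extra attempt.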
- POST_RETRY_PATTERNS = ('html', '500', 'internal', 'error', 'method not allowed')
- HTML_MARKERS = ('<!DOCTYPE', '<html', '<HTML', '<!doctype')
- RETRY_STATUS_CODES = (500, 502, 503, 504, 429)
- COST_LIMIT_PATTERNS: ClassVar[tuple[str, ...]] = ('estimated execution time', 'exceeds the limit', 'query timed out', 'timeout expired', 'execution time limit', 'statement timeout', 'cost limit exceeded')
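These tuples are plain substring lists, so one plausible way to use them is a simple containment check on the response body. The sketch below shows that idea with copies of the documented values. The function names, the 512-character inspection window, and the case-insensitive matching are assumptions for this example and may differ from what the helper does internally.

```python
HTML_MARKERS = ("<!DOCTYPE", "<html", "<HTML", "<!doctype")
COST_LIMIT_PATTERNS = (
    "estimated execution time",
    "exceeds the limit",
    "query timed out",
    "timeout expired",
    "execution time limit",
    "statement timeout",
    "cost limit exceeded",
)


def looks_like_html(body: str) -> bool:
    """Return True if the start of the body contains an HTML marker."""
    head = body.lstrip()[:512]  # assumed window: markers appear near the top
    return any(marker in head for marker in HTML_MARKERS)


def looks_like_cost_limit(body: str) -> bool:
    """Return True if the body mentions a known cost/time limit message."""
    lowered = body.lower()
    return any(pattern in lowered for pattern in COST_LIMIT_PATTERNS)


print(looks_like_html("<!DOCTYPE html><html><body>Maintenance</body></html>"))
print(looks_like_html('{"head": {"vars": []}, "results": {"bindings": []}}'))
print(looks_like_cost_limit("The estimated execution time 500 (sec) exceeds the limit of 400 (sec)."))
```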
- classmethod get_collected_queries() -> list[QueryRecord]
Get all collected queries.
- classmethod export_queries_as_ttl(output_file: str | None = None, base_uri: str = 'https://example.org/sparql-queries/', dataset_name: str = 'dataset') -> str
Export collected queries as TTL using SHACL SPARQL representation.
- Parameters:
output_file – Optional file path to write TTL
base_uri – Base URI for query IRIs
dataset_name – Name of the dataset for namespacing
- Returns:
TTL string with all collected queries
- select(query: str, purpose: str = '') -> dict[str, Any]
Execute a SELECT query and return JSON results.
- Parameters:
query – SPARQL SELECT query string.
purpose – Caller context for logs, e.g. "mining/typed-object".
- Returns:
Dictionary with SPARQL JSON results format containing "head" and "results" keys.
- Raises:
EndpointError – If the endpoint returns an error after all retries.
QueryError – If the query is malformed.
- construct(query: str) -> str
Execute a CONSTRUCT query and return Turtle RDF data.
- Parameters:
query – SPARQL CONSTRUCT query string
- Returns:
Turtle-formatted RDF string
- Raises:
EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed
- construct_graph(query: str) -> Graph
Execute a CONSTRUCT query and return an RDFLib Graph.
The CONSTRUCT method internally uses _execute which handles GET->POST fallback automatically when HTML is detected in the response string.
- Parameters:
query – SPARQL CONSTRUCT query string
- Returns:
RDFLib Graph containing the constructed triples
- Raises:
EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed
- ask(query: str) -> bool
Execute an ASK query and return boolean result.
- Parameters:
query – SPARQL ASK query string
- Returns:
True if the pattern exists, False otherwise
- Raises:
EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed
- find_classes_for_uri_pattern(uri_prefix: str) -> list[str]
Find all rdf:type classes whose instances match uri_prefix.
Tries an IRI-range filter first (index-friendly on most engines):
    SELECT DISTINCT ?c WHERE { ?s a ?c . FILTER( ?s >= <uri_prefix> && ?s < <uri_prefix_next> ) }
The upper bound uri_prefix_next is derived by incrementing the last character of uri_prefix by one code point (e.g. "https://bioregistry.io/faldo/" -> "https://bioregistry.io/faldo0" because ord('/') + 1 == ord('0')).
If the incremented character would be illegal inside a SPARQL <…> IRI literal (e.g. = -> >, which closes the IRI), falls back to the safer STRSTARTS filter:
    SELECT DISTINCT ?c WHERE { ?s a ?c . FILTER(STRSTARTS(STR(?s), "uri_prefix")) }
- Parameters:
uri_prefix – URI prefix string, e.g. "https://identifiers.org/ensembl/".
- Returns:
Deduplicated list of class URIs (may be empty).
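The upper-bound derivation described above can be sketched in a few lines. The helper names `upper_bound` and `build_query` are hypothetical, and the set of characters treated as illegal follows the SPARQL IRIREF grammar as an assumption; the method's actual check may be narrower or wider.

```python
from typing import Optional

# Characters that cannot appear inside a SPARQL <...> IRI reference.
_ILLEGAL_IN_IRIREF = set('<>"{}|^`\\')


def upper_bound(uri_prefix: str) -> Optional[str]:
    """Increment the last character by one code point, or return None
    when the result could not be written inside <...>."""
    if not uri_prefix:
        return None
    bumped = chr(ord(uri_prefix[-1]) + 1)
    if bumped in _ILLEGAL_IN_IRIREF or ord(bumped) <= 0x20:
        return None
    return uri_prefix[:-1] + bumped


def build_query(uri_prefix: str) -> str:
    """Prefer the IRI-range filter; fall back to STRSTARTS."""
    nxt = upper_bound(uri_prefix)
    if nxt is None:
        return (
            'SELECT DISTINCT ?c WHERE { ?s a ?c . '
            'FILTER(STRSTARTS(STR(?s), "%s")) }' % uri_prefix
        )
    return (
        "SELECT DISTINCT ?c WHERE { ?s a ?c . "
        "FILTER( ?s >= <%s> && ?s < <%s> ) }" % (uri_prefix, nxt)
    )


print(upper_bound("https://bioregistry.io/faldo/"))
print(build_query("http://example.org/?id="))
```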
- get_bindings(query: str, purpose: str = '') -> list[dict[str, str]]
Execute SELECT query and return simplified bindings list.
Convenience method that extracts just the variable values.
- Parameters:
query – SPARQL SELECT query string
purpose – Optional tag for log identification
- Returns:
List of dicts mapping variable names to their values
Example
>>> bindings = helper.get_bindings("SELECT ?s ?p { ?s ?p ?o }")
>>> for row in bindings:
...     print(row["s"], row["p"])
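For readers unfamiliar with the SPARQL JSON results layout, the sketch below shows what "extracting just the variable values" might look like on a hand-written sample response. The `simplify` function and the sample data are illustrative assumptions, not the method's actual code. Note that an unbound variable is simply absent from a row.

```python
sample = {
    "head": {"vars": ["s", "p"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "p": {"type": "uri", "value": "http://example.org/knows"},
            },
            # Second row: ?p is unbound, so the key is missing entirely.
            {"s": {"type": "uri", "value": "http://example.org/b"}},
        ]
    },
}


def simplify(results: dict) -> list:
    """Map each binding row to {variable: value}, dropping type metadata."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


rows = simplify(sample)
for row in rows:
    print(row["s"], row.get("p"))
```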
- select_chunked(query_template: str, chunk_size: int = 100, max_total_results: int | None = None, delay_between_chunks: float = 0.5, purpose: str = '') -> Any
Execute a SELECT query in chunks using OFFSET/LIMIT pagination.
Uses adaptive pagination: when the endpoint times out, the chunk (LIMIT) is reduced by ~15 % and the same offset is retried after a cooldown pause. The chunk size will never shrink below 60 % of the original value (i.e. a maximum cumulative reduction of ~40 %). Up to 3 consecutive shrinks are attempted per offset before giving up on that page.
After a successful fetch with a reduced chunk size, the smaller size is kept for subsequent pages (the endpoint is consistently slow).
- Parameters:
query_template – SPARQL query with {offset} and {limit} placeholders.
chunk_size – Initial number of results per chunk.
max_total_results – Cap on total results (None = all).
delay_between_chunks – Polite pause between pages (seconds).
purpose – Caller context for log messages.
- Yields:
List of bindings (dicts) from each chunk.
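The shrink rule described above (about 15 % per timeout, never below 60 % of the original, up to 3 shrinks per offset) can be checked with a little arithmetic. The sketch below is an assumption about how the numbers combine, using integer rounding down; the helper's exact rounding is not documented here.

```python
def shrink_schedule(chunk_size: int, shrink_percent: int = 15,
                    floor_percent: int = 60, max_shrinks: int = 3) -> list:
    """Return the LIMIT values tried after successive timeouts.

    Each step keeps (100 - shrink_percent) % of the current size,
    clamped so it never drops below floor_percent % of the original.
    """
    floor = chunk_size * floor_percent // 100
    sizes = []
    current = chunk_size
    for _ in range(max_shrinks):
        current = max(floor, current * (100 - shrink_percent) // 100)
        sizes.append(current)
    return sizes


print(shrink_schedule(100))                 # three shrinks from the default size
print(shrink_schedule(100, max_shrinks=5))  # further shrinks hit the floor
```

Three shrinks of 15 % leave roughly 61 % of the original size, which is why the 60 % floor and the limit of 3 shrinks describe nearly the same boundary.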
- static prepare_paginated_query(base_query: str) -> str
Prepare a SPARQL query for use with select_chunked by escaping braces.
SPARQL queries contain curly braces {} which conflict with Python’s str.format() used for pagination placeholders. This method:
1. Escapes all existing braces ({ becomes {{ and } becomes }})
2. Appends OFFSET {offset} and LIMIT {limit} placeholders
- Parameters:
base_query – SPARQL query WITHOUT OFFSET/LIMIT clauses. Should be a complete query ready to execute.
- Returns:
Query template safe for use with str.format(offset=N, limit=M)
Example
>>> query = "SELECT ?s WHERE { ?s a ?class }"
>>> template = SparqlHelper.prepare_paginated_query(query)
>>> # template is now safe for: template.format(offset=0, limit=100)
>>> for bindings in helper.select_chunked(template):
...     process(bindings)
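To show what the two steps amount to, here is a stand-alone sketch that escapes the braces and appends the placeholders, then formats a few pages. `make_paginated_template` is a hypothetical re-implementation for illustration; the exact whitespace and clause order produced by the real method are assumptions.

```python
def make_paginated_template(base_query: str) -> str:
    """Double the SPARQL braces, then append str.format placeholders."""
    escaped = base_query.replace("{", "{{").replace("}", "}}")
    return escaped + " OFFSET {offset} LIMIT {limit}"


template = make_paginated_template("SELECT ?s WHERE { ?s a ?class }")
print(template)

# Formatting restores single braces and fills in the page window.
pages = [template.format(offset=offset, limit=100) for offset in (0, 100, 200)]
for page in pages:
    print(page)
```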
- static escape_sparql_for_format(query: str) -> str
Escape SPARQL braces so the query can be used with str.format().
This is useful when you need to add your own placeholders to a query that contains SPARQL curly braces.
- Parameters:
query – SPARQL query with literal curly braces
- Returns:
Query with braces doubled for .format() compatibility
Example
>>> q = "SELECT ?s WHERE { ?s a <{class_uri}> }"  # Won't work!
>>> # Instead:
>>> q = SparqlHelper.escape_sparql_for_format(
...     "SELECT ?s WHERE { ?s a <CLASS_PLACEHOLDER> }"
... )
>>> q = q.replace("CLASS_PLACEHOLDER", "{class_uri}")
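The reason the unescaped query fails is that str.format() treats the SPARQL group braces as a replacement field. The sketch below demonstrates the failure and then the placeholder approach from the example, using a hypothetical local `escape_braces` function in place of the real static method.

```python
def escape_braces(query: str) -> str:
    """Double every brace so str.format() treats it as a literal."""
    return query.replace("{", "{{").replace("}", "}}")


raw = "SELECT ?s WHERE { ?s a ?c }"
try:
    raw.format(class_uri="http://example.org/Class")
    failed = False
except (KeyError, ValueError, IndexError):
    # The text between the SPARQL braces was parsed as a field name.
    failed = True

# Escape first, then swap the marker for a real format placeholder.
template = escape_braces("SELECT ?s WHERE { ?s a <CLASS_PLACEHOLDER> }")
template = template.replace("CLASS_PLACEHOLDER", "{class_uri}")
final = template.format(class_uri="http://example.org/Class")

print(failed)
print(final)
```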
- sparql_select(endpoint_url: str, query: str, use_post: bool = False, purpose: str = '') -> dict[str, Any]
Execute a one-off SELECT query.
Convenience function when you don’t need to reuse the helper.
- Parameters:
endpoint_url – SPARQL endpoint URL
query – SPARQL SELECT query
use_post – Force POST method
purpose – Optional tag for log identification
- Returns:
SPARQL JSON results
- sparql_construct(endpoint_url: str, query: str, use_post: bool = False) -> Graph
Execute a one-off CONSTRUCT query.
Convenience function when you don’t need to reuse the helper.
- Parameters:
endpoint_url – SPARQL endpoint URL
query – SPARQL CONSTRUCT query
use_post – Force POST method
- Returns:
RDFLib Graph with constructed triples