SPARQL Helper

Centralized SPARQL query execution with automatic fallback.

This module provides a SPARQL client that handles:

- Automatic GET -> POST fallback for endpoints that require POST
- Exponential backoff retry logic for transient failures
- Support for SELECT (JSON), CONSTRUCT (Turtle/N3), and ASK (boolean) queries
- HTML error detection in responses
- Consistent logging across all SPARQL operations
- Support for pagination (LIMIT and OFFSET usage)

Usage:

from rdfsolve.sparql_helper import SparqlHelper

# Create a helper for an endpoint
helper = SparqlHelper("https://sparql.example.org/")

# Execute a SELECT query (returns a dict)
results = helper.select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")

# Execute a CONSTRUCT query (returns a Turtle string)
turtle_data = helper.construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }")

# Execute an ASK query (returns a bool)
exists = helper.ask("ASK { ?s a <http://example.org/Class> }")

class QueryRecord(query: str, query_type: Literal['SELECT', 'CONSTRUCT', 'ASK'], endpoint_url: str, timestamp: str = <factory>, description: str = '', keywords: list[str] = <factory>, success: bool = True)

Bases: object

Record of a SPARQL query execution.

query: str

query_type: Literal['SELECT', 'CONSTRUCT', 'ASK']

endpoint_url: str

timestamp: str = <factory>

description: str = ''

keywords: list[str] = <factory>

success: bool = True

query_id() → str

Generate a unique ID for this query based on a hash of its content.
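The hash function and the fields that feed the ID are not documented here. A minimal sketch of a content-hash ID, assuming a SHA-256 digest of the whitespace-normalized query text truncated to 12 hex characters (the function name, the normalization step, and the length are all hypothetical):

```python
import hashlib


def content_hash_id(query: str, length: int = 12) -> str:
    """Return a short, deterministic ID derived from the query text.

    Assumption: whitespace is collapsed first so that formatting-only
    differences do not change the ID. The real query_id() may differ.
    """
    normalized = " ".join(query.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


a = content_hash_id("SELECT ?s WHERE { ?s ?p ?o }")
b = content_hash_id("SELECT ?s\nWHERE {  ?s ?p ?o  }")
c = content_hash_id("ASK { ?s ?p ?o }")
```

Here a and b are equal because the two queries differ only in whitespace, while c is a different ID.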

exception SparqlHelperError

Bases: Exception

Base exception for SPARQL helper errors.

exception EndpointError

Bases: SparqlHelperError

Raised when the endpoint returns an error.

exception EndpointTimeoutError

Bases: EndpointError

Raised when the endpoint times out (read / connect).

exception EndpointUnhealthyError

Bases: EndpointError

Raised when the endpoint returns an HTTP 200 or 400 response whose body is not a SPARQL response.

Typical examples: database in recovery mode, backend proxy errors, maintenance pages returned as text/plain or text/html.

exception PaginationTruncatedError(msg: str, offset: int = 0)

Bases: EndpointTimeoutError

Raised by select_chunked when pagination is abandoned mid-stream.

This means some rows were already yielded before the error, so the caller received a partial result set. The offset attribute records where pagination stopped.

Initialize a pagination truncation error.

Parameters:

msg – Error message.
offset – Offset at which pagination stopped.
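Because rows have already been yielded when this error is raised, a caller usually wants to keep the partial result and record where it stopped. The sketch below illustrates that pattern with stand-ins only: the exception class and the generator are hypothetical substitutes for the documented PaginationTruncatedError and select_chunked, so the block runs without a network or the library.

```python
from collections.abc import Iterator


class PaginationTruncatedError(Exception):
    """Stand-in for the documented exception (message plus an offset attribute).

    The real class derives from EndpointTimeoutError; Exception is used here
    only to keep the sketch self-contained.
    """

    def __init__(self, msg: str, offset: int = 0) -> None:
        super().__init__(msg)
        self.offset = offset


def fake_select_chunked() -> Iterator[list[dict[str, str]]]:
    """Hypothetical generator: yields two pages, then gives up at offset 200."""
    yield [{"s": "http://example.org/a"}]
    yield [{"s": "http://example.org/b"}]
    raise PaginationTruncatedError("endpoint kept timing out", offset=200)


rows: list[dict[str, str]] = []
stopped_at = None
try:
    for page in fake_select_chunked():
        rows.extend(page)
except PaginationTruncatedError as exc:
    # Rows collected so far are valid but incomplete.
    stopped_at = exc.offset
```

After the loop, rows holds the two pages that arrived and stopped_at records the offset at which pagination was abandoned.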

exception QueryError

Bases: SparqlHelperError

Raised when the query itself is invalid.

class MimeTypes

Bases: object

Standard MIME types for SPARQL protocol.

JSON = 'application/sparql-results+json'

XML = 'application/sparql-results+xml'

TURTLE = 'text/turtle'

N3 = 'text/n3'

NTRIPLES = 'application/n-triples'

RDFXML = 'application/rdf+xml'

JSONLD = 'application/ld+json'

SELECT_ACCEPT = 'application/sparql-results+json, application/sparql-results+xml;q=0.9'

CONSTRUCT_ACCEPT = 'text/turtle, text/n3;q=0.9, application/n-triples;q=0.8, application/rdf+xml;q=0.7'
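The two Accept values rank media types with HTTP quality values: an entry without a q parameter counts as q=1.0, and lower values mark less preferred fallbacks. The following sketch only illustrates how a server would read that ranking; ranked_media_types is a hypothetical helper, not part of this module.

```python
def ranked_media_types(accept: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, best first."""
    ranked = []
    for part in accept.split(","):
        media_type, _, params = part.strip().partition(";")
        q = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q" and value:
                q = float(value)
        ranked.append((media_type.strip(), q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


CONSTRUCT_ACCEPT = (
    "text/turtle, text/n3;q=0.9, "
    "application/n-triples;q=0.8, application/rdf+xml;q=0.7"
)
preferences = ranked_media_types(CONSTRUCT_ACCEPT)
```

For CONSTRUCT_ACCEPT this puts text/turtle first and application/rdf+xml last.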

class SparqlHelper(endpoint_url: str, *, use_post: bool = False, max_retries: int = 10, initial_backoff: float = 1.0, max_backoff: float = 30.0, timeout: float = 10000.0)

Bases: object

Centralized SPARQL query executor with automatic fallback and retry logic.

This class provides:

- Automatic GET/POST method fallback when endpoints return HTML/500 errors
- Configurable retry with exponential backoff for transient failures
- Consistent error handling and logging
- Support for SELECT, CONSTRUCT, and ASK queries

Uses the standard requests library.

endpoint_url: The SPARQL endpoint URL

use_post: If True, always use POST method (skip GET attempt)

max_retries: Maximum number of retry attempts

initial_backoff: Initial backoff delay in seconds

max_backoff: Maximum backoff delay in seconds

timeout: Request timeout in seconds

Example

>>> helper = SparqlHelper("https://sparql.swisslipids.org/")
>>> results = helper.select("SELECT ?g { GRAPH ?g { ?s ?p ?o } }")
>>> for binding in results["results"]["bindings"]:
...     print(binding["g"]["value"])

Initialize the SPARQL helper.

Parameters:

endpoint_url – SPARQL endpoint URL
use_post – Always use POST (default: False, tries GET first)
max_retries – Maximum retry attempts for transient failures
initial_backoff – Initial delay between retries (seconds)
max_backoff – Maximum delay between retries (seconds)
timeout – Request timeout in seconds (default: 10000.0)
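The exact growth factor of the backoff is not stated here. A minimal sketch of the delay schedule implied by the three retry parameters, assuming the delay doubles after each attempt, is capped at max_backoff, and has no jitter (backoff_delays is a hypothetical name):

```python
def backoff_delays(
    max_retries: int = 10,
    initial_backoff: float = 1.0,
    max_backoff: float = 30.0,
) -> list[float]:
    """Delay in seconds before each retry: doubling, capped at max_backoff."""
    return [
        min(initial_backoff * 2**attempt, max_backoff)
        for attempt in range(max_retries)
    ]


delays = backoff_delays()
total_wait = sum(delays)
```

Under these assumptions the defaults give delays of 1, 2, 4, 8, and 16 seconds followed by five delays of 30 seconds, or 181 seconds of waiting in the worst case, not counting request time.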

POST_RETRY_PATTERNS = ('html', '500', 'internal', 'error', 'method not allowed')

HTML_MARKERS = ('<!DOCTYPE', '<html', '<HTML', '<!doctype')
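How the markers are applied is not documented here. One plausible reading is a substring check near the start of the response body, sketched below; looks_like_html and the 512-character probe window are assumptions, not the helper's actual logic.

```python
HTML_MARKERS = ("<!DOCTYPE", "<html", "<HTML", "<!doctype")


def looks_like_html(body: str, probe_chars: int = 512) -> bool:
    """Return True if the start of the body contains any HTML marker.

    Assumption: only the first probe_chars characters are inspected, so a
    literal "<html" deep inside a legitimate result does not trigger it.
    """
    head = body[:probe_chars]
    return any(marker in head for marker in HTML_MARKERS)


error_page = "\n  <html><body>503 Service Unavailable</body></html>"
json_result = '{"head": {"vars": ["s"]}, "results": {"bindings": []}}'
```

A check like this flags error_page and passes json_result through.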

RETRY_STATUS_CODES = (500, 502, 503, 504, 429)

COST_LIMIT_PATTERNS: ClassVar[tuple[str, ...]] = ('estimated execution time', 'exceeds the limit', 'query timed out', 'timeout expired', 'execution time limit', 'statement timeout', 'cost limit exceeded')
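These patterns read like fragments of engine-side messages about query cost or execution time limits. A hedged sketch of how such a tuple could be matched, assuming case-insensitive substring matching against the error text (is_cost_limit_error is a hypothetical name, and the sample message is illustrative):

```python
COST_LIMIT_PATTERNS = (
    "estimated execution time",
    "exceeds the limit",
    "query timed out",
    "timeout expired",
    "execution time limit",
    "statement timeout",
    "cost limit exceeded",
)


def is_cost_limit_error(message: str) -> bool:
    """Return True if the message mentions a cost or time limit."""
    lowered = message.lower()
    return any(pattern in lowered for pattern in COST_LIMIT_PATTERNS)


sample = "The estimated execution time 1200 (sec) exceeds the limit of 400 (sec)."
```

Separating these cases from transient failures matters because repeating the same expensive query is unlikely to succeed, whereas a smaller page size might.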

classmethod enable_query_collection() → None

Enable collection of all executed queries.

classmethod disable_query_collection() → None

Disable query collection.

classmethod get_collected_queries() → list[QueryRecord]

Get all collected queries.

classmethod clear_collected_queries() → None

Clear all collected queries.

classmethod export_queries_as_ttl(output_file: str | None = None, base_uri: str = 'https://example.org/sparql-queries/', dataset_name: str = 'dataset') → str

Export collected queries as TTL using SHACL SPARQL representation.

Parameters:

output_file – Optional file path to write TTL
base_uri – Base URI for query IRIs
dataset_name – Name of the dataset for namespacing

Returns:

TTL string with all collected queries

select(query: str, purpose: str = '') → dict[str, Any]

Execute a SELECT query and return JSON results.

Parameters:

query – SPARQL SELECT query string.
purpose – Caller context for logs, e.g. "mining/typed-object".

Returns:

Dictionary with SPARQL JSON results format containing "head" and "results" keys.

Raises:

EndpointError – If the endpoint returns an error after all retries.
QueryError – If the query is malformed.

construct(query: str) → str

Execute a CONSTRUCT query and return Turtle RDF data.

Parameters:

query – SPARQL CONSTRUCT query string

Returns:

Turtle-formatted RDF string

Raises:

EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed

construct_graph(query: str) → Graph

Execute a CONSTRUCT query and return an RDFLib Graph.

Internally, the construct method uses _execute, which handles GET -> POST fallback automatically when HTML is detected in the response string.

Parameters:

query – SPARQL CONSTRUCT query string

Returns:

RDFLib Graph containing the constructed triples

Raises:

EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed

ask(query: str) → bool

Execute an ASK query and return boolean result.

Parameters:

query – SPARQL ASK query string

Returns:

True if the pattern exists, False otherwise

Raises:

EndpointError – If the endpoint returns an error after all retries
QueryError – If the query is malformed
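In the SPARQL 1.1 JSON results format, an ASK response carries its answer in a top-level "boolean" member instead of "results". The sketch below shows that extraction, assuming the endpoint answers in that format; parse_ask_response is a hypothetical function, and the helper's own parsing may differ.

```python
import json


def parse_ask_response(body: str) -> bool:
    """Extract the boolean from a SPARQL 1.1 JSON ASK response."""
    payload = json.loads(body)
    value = payload.get("boolean")
    if not isinstance(value, bool):
        raise ValueError("response has no boolean member")
    return value


found = parse_ask_response('{"head": {}, "boolean": true}')
missing = parse_ask_response('{"head": {}, "boolean": false}')
```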

find_classes_for_uri_pattern(uri_prefix: str) → list[str]

Find all rdf:type classes whose instances match uri_prefix.

Tries an IRI-range filter first (index-friendly on most engines):

SELECT DISTINCT ?c
WHERE {
  ?s a ?c .
  FILTER(
    ?s >= <uri_prefix> &&
    ?s <  <uri_prefix_next>
  )
}

The upper-bound uri_prefix_next is derived by incrementing the last character of uri_prefix by one code-point (e.g. "https://bioregistry.io/faldo/" -> "https://bioregistry.io/faldo0" because ord('/') + 1 == ord('0')).

If the incremented character would be illegal inside a SPARQL <…> IRI literal (e.g. = -> >, which closes the IRI), falls back to the safer STRSTARTS filter:

SELECT DISTINCT ?c
WHERE {
  ?s a ?c .
  FILTER(STRSTARTS(STR(?s), "uri_prefix"))
}

Parameters:

uri_prefix – URI prefix string, e.g. "https://identifiers.org/ensembl/".

Returns:

Deduplicated list of class URIs (may be empty).
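The upper-bound derivation described above can be sketched as follows. The set of rejected characters follows the SPARQL grammar for IRI references; next_prefix and its None return value (meaning "use STRSTARTS instead") are hypothetical choices for this illustration.

```python
from typing import Optional

# Characters not permitted inside a SPARQL <...> IRI reference.
ILLEGAL_IRI_CHARS = set('<>"{}|^`\\')


def next_prefix(uri_prefix: str) -> Optional[str]:
    """Return the exclusive upper bound for an IRI-range filter, or None.

    None signals that the caller should fall back to STRSTARTS.
    """
    if not uri_prefix:
        return None
    # Increment the last character by one code point.
    bumped = chr(ord(uri_prefix[-1]) + 1)
    if bumped in ILLEGAL_IRI_CHARS or ord(bumped) <= 0x20:
        return None
    return uri_prefix[:-1] + bumped


upper = next_prefix("https://bioregistry.io/faldo/")
unsafe = next_prefix("http://example.org/item?id=")
```

Here upper is "https://bioregistry.io/faldo0", and unsafe is None because incrementing "=" yields ">", which would close the IRI.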

get_bindings(query: str, purpose: str = '') → list[dict[str, str]]

Execute SELECT query and return simplified bindings list.

Convenience method that extracts just the variable values.

Parameters:

query – SPARQL SELECT query string
purpose – Optional tag for log identification

Returns:

List of dicts mapping variable names to their values

Example

>>> bindings = helper.get_bindings("SELECT ?s ?p { ?s ?p ?o }")
>>> for row in bindings:
...     print(row["s"], row["p"])
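To show what "simplified" means relative to the raw output of select, here is a sketch of the flattening step on a hand-written SPARQL JSON result. simplify_bindings is a hypothetical stand-in, and the sample data is illustrative; the real method may treat datatypes or unbound variables differently.

```python
from typing import Any


def simplify_bindings(results: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings to {variable: value} dicts."""
    return [
        {var: term["value"] for var, term in row.items()}
        for row in results.get("results", {}).get("bindings", [])
    ]


sample = {
    "head": {"vars": ["s", "p"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "p": {"type": "uri", "value": "http://example.org/knows"},
            },
            # An unbound variable is simply absent from the row.
            {"s": {"type": "uri", "value": "http://example.org/b"}},
        ]
    },
}
rows = simplify_bindings(sample)
```

Because unbound variables are omitted from a row in SPARQL JSON, row.get("p") is safer than row["p"] when a query uses OPTIONAL.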

select_chunked(query_template: str, chunk_size: int = 100, max_total_results: int | None = None, delay_between_chunks: float = 0.5, purpose: str = '') → Any

Execute a SELECT query in chunks using OFFSET/LIMIT pagination.

Uses adaptive pagination: when the endpoint times out, the chunk (LIMIT) is reduced by ~15 % and the same offset is retried after a cooldown pause. The chunk size will never shrink below 60 % of the original value (i.e. a maximum cumulative reduction of ~40 %). Up to 3 consecutive shrinks are attempted per offset before giving up on that page.

After a successful fetch with a reduced chunk size, the smaller size is kept for subsequent pages (the endpoint is consistently slow).

Parameters:

query_template – SPARQL query with {offset} and {limit} placeholders.
chunk_size – Initial number of results per chunk.
max_total_results – Cap on total results (None = all).
delay_between_chunks – Polite pause between pages (seconds).
purpose – Caller context for log messages.

Yields:

List of bindings (dicts) from each chunk.
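The shrink rule can be worked through numerically. Three successive reductions of 15 % leave 0.85³ ≈ 0.614 of the original size, which is why three shrinks stay just above the 60 % floor. The sketch below assumes integer rounding down at each step; shrink_schedule is a hypothetical name and the rounding used by the helper is not documented here.

```python
def shrink_schedule(chunk_size: int, max_shrinks: int = 3) -> list[int]:
    """LIMIT values tried at one offset: the original, then each shrink."""
    floor = chunk_size * 60 // 100  # never below 60 % of the original
    sizes = [chunk_size]
    for _ in range(max_shrinks):
        # Reduce by ~15 %, using integer arithmetic to avoid float rounding.
        sizes.append(max(floor, sizes[-1] * 85 // 100))
    return sizes


schedule = shrink_schedule(100)
```

For the default chunk_size of 100 this gives 100, 85, 72, and 61 rows per page before the helper gives up on that offset.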

static prepare_paginated_query(base_query: str) → str

Prepare a SPARQL query for use with select_chunked by escaping braces.

SPARQL queries contain curly braces {} which conflict with Python’s str.format() used for pagination placeholders. This method:

1. Escapes all existing braces ({ becomes {{ and } becomes }})
2. Appends OFFSET {offset} and LIMIT {limit} placeholders

Parameters:

base_query – SPARQL query WITHOUT OFFSET/LIMIT clauses. Should be a complete query ready to execute.

Returns:

Query template safe for use with str.format(offset=N, limit=M)

Example

>>> query = "SELECT ?s WHERE { ?s a ?class }"
>>> template = SparqlHelper.prepare_paginated_query(query)
>>> # template is now safe for: template.format(offset=0, limit=100)
>>> for bindings in helper.select_chunked(template):
...     process(bindings)
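The two steps can be reproduced with plain string operations, which makes it easier to see what the template looks like before and after formatting. This is a sketch only: make_paginated_template is a hypothetical function, and the exact whitespace and clause order of the real template are assumptions.

```python
def make_paginated_template(base_query: str) -> str:
    """Double SPARQL braces, then append pagination placeholders."""
    escaped = base_query.replace("{", "{{").replace("}", "}}")
    # Doubled braces in the f-string emit literal {offset} and {limit}.
    return f"{escaped}\nOFFSET {{offset}}\nLIMIT {{limit}}"


template = make_paginated_template("SELECT ?s WHERE { ?s a ?class }")
page = template.format(offset=200, limit=100)
```

The template contains doubled braces around the graph pattern, and formatting it restores the single braces while filling in the page window.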

static escape_sparql_for_format(query: str) → str

Escape SPARQL braces so the query can be used with str.format().

This is useful when you need to add your own placeholders to a query that contains SPARQL curly braces.

Parameters:

query – SPARQL query with literal curly braces

Returns:

Query with braces doubled for .format() compatibility

Example

>>> q = "SELECT ?s WHERE { ?s a <{class_uri}> }"  # Won't work!
>>> # Instead:
>>> q = SparqlHelper.escape_sparql_for_format(
...     "SELECT ?s WHERE { ?s a <CLASS_PLACEHOLDER> }"
... )
>>> q = q.replace("CLASS_PLACEHOLDER", "{class_uri}")

close() → None

Close the underlying requests session.

sparql_select(endpoint_url: str, query: str, use_post: bool = False, purpose: str = '') → dict[str, Any]

Execute a one-off SELECT query.

Convenience function when you don’t need to reuse the helper.

Parameters:

endpoint_url – SPARQL endpoint URL
query – SPARQL SELECT query
use_post – Force POST method
purpose – Optional tag for log identification

Returns:

SPARQL JSON results

sparql_construct(endpoint_url: str, query: str, use_post: bool = False) → Graph

Execute a one-off CONSTRUCT query.

Convenience function when you don’t need to reuse the helper.

Parameters:

endpoint_url – SPARQL endpoint URL
query – SPARQL CONSTRUCT query
use_post – Force POST method

Returns:

RDFLib Graph with constructed triples