Sources
Load data-source definitions from data/sources.yaml.
The canonical source registry is a YAML file containing a flat list of mappings, one per SPARQL data source. Each mapping carries:
name - unique human-readable identifier.
endpoint - SPARQL endpoint URL.
graph_uris - named graphs to query.
use_graph - whether to wrap queries in a GRAPH clause.
two_phase - use two-phase mining (default True).
Optional tuning knobs: chunk_size, class_batch_size, class_chunk_size, timeout, delay, counts, unsafe_paging.
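To make the entry shape concrete, the sketch below writes one registry entry as a Python dict, the form it would plausibly take after parsing. Only the key names come from the field list above. The endpoint URL, graph URI, and chunk_size value are invented placeholders, treating graph_uris as a list is an assumption, and with_defaults is a hypothetical helper, not part of rdfsolve.

```python
from typing import Any

# One registry entry as it might look after parsing.
# All values are placeholders; only the key names come from the documentation.
example_entry: dict[str, Any] = {
    "name": "example-source",
    "endpoint": "https://example.org/sparql",
    "graph_uris": ["https://example.org/graph/main"],  # assumed to be a list
    "use_graph": True,
    "chunk_size": 500,  # optional tuning knob, placeholder value
}


def with_defaults(entry: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill in the documented default for two_phase."""
    merged: dict[str, Any] = {"two_phase": True}  # documented default
    merged.update(entry)  # explicit values in the entry take precedence
    return merged


entry = with_defaults(example_entry)
print(entry["name"], entry["two_phase"])
```

An entry that sets two_phase explicitly keeps its own value, because the entry is merged on top of the default.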
Legacy CSV files (data/sources.csv) and JSON-LD files are still
accepted: the reader auto-detects the format by extension.
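Extension-based detection of this kind can be sketched as below. This is not the rdfsolve implementation: detect_format and the _FORMATS table are hypothetical, and accepting ".yml" as an alias for ".yaml" is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Assumed mapping from file extension to format name; the ".yml" alias
# is an assumption and may not be recognised by the real reader.
_FORMATS = {
    ".yaml": "yaml",
    ".yml": "yaml",
    ".jsonld": "jsonld",
    ".csv": "csv",
}


def detect_format(path: str | Path) -> str:
    """Hypothetical helper: choose a format from the file extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive comparison
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported sources file extension: {suffix!r}")
    return _FORMATS[suffix]


print(detect_format("data/sources.yaml"))
print(detect_format("data/sources.csv"))
```

Raising on an unknown extension, rather than guessing a format, keeps a mistyped path from being parsed with the wrong reader.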
Typical usage:
from rdfsolve.sources import load_sources

for src in load_sources("data/sources.yaml"):
    print(src["name"], src["endpoint"])
- load_sources(path: str | Path | None = None) -> list[SourceEntry]
Load data-source definitions from a YAML, JSON-LD, or CSV file.
- Parameters:
path – Path to the sources file. When None, the default data/sources.yaml (or its .jsonld / .csv fallback) is used.
- Returns:
One dict per data source, keys normalised to snake_case. Sources without an endpoint are included (callers may skip them).
- Return type:
list[SourceEntry]
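Because entries without an endpoint are returned too, a caller that only wants queryable sources has to filter them out. A minimal sketch follows, using plain dicts as a stand-in for the list returned by load_sources(). How a missing endpoint is represented (absent key, None, or empty string) is not stated here, so the hypothetical with_endpoint helper treats all three as missing.

```python
def with_endpoint(entries):
    """Keep only the entries that have a non-empty endpoint."""
    # .get() covers an absent key; the truthiness check covers None and "".
    return [e for e in entries if e.get("endpoint")]


# Stand-in for the list returned by load_sources(); values are placeholders.
entries = [
    {"name": "with-endpoint", "endpoint": "https://example.org/sparql"},
    {"name": "no-endpoint"},
    {"name": "null-endpoint", "endpoint": None},
    {"name": "empty-endpoint", "endpoint": ""},
]

queryable = with_endpoint(entries)
print([e["name"] for e in queryable])
```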
- load_sources_dataframe(path: str | Path | None = None) -> DataFrame
Load sources and return a DataFrame.
The DataFrame has columns compatible with probe_resource(): dataset_name, endpoint_url, graph_uri, use_graph, void_iri.
- Parameters:
path – Path to the sources file. None = auto-detect default.