API
Main RDFSolve functionalities for VoID extraction and conversion.
- compose_query_from_paths(paths: list[dict[str, Any]], prefixes: dict[str, str] | None = None, include_types: bool = False, include_labels: bool = True, limit: int = 100, value_bindings: dict[str, list[str]] | None = None) -> dict[str, Any]
Generate a SPARQL query from diagram paths.
This is a pure-Python function; no Flask is required. It delegates to rdfsolve.compose.compose_query_from_paths().
- Parameters:
paths – List of path dicts, each with an edges list. Each edge has source, target, predicate, and is_forward.
prefixes – Namespace prefix map (e.g. {"wp": "http://..."}).
include_types – Add rdf:type assertions.
include_labels – Add OPTIONAL rdfs:label clauses.
limit – LIMIT for the generated query.
value_bindings – VALUES clause bindings {var: [uri, ...]}.
- Returns:
Dict with query (SPARQL string), variable_map (var -> schema URI), and jsonld (SPARQLExecutable JSON-LD).
Example:
>>> from rdfsolve.api import compose_query_from_paths
>>> result = compose_query_from_paths(
...     paths=[{"edges": [{
...         "source": "http://ex.org/Gene",
...         "target": "http://ex.org/Protein",
...         "predicate": "http://ex.org/encodes",
...         "is_forward": True,
...     }]}],
...     prefixes={"ex": "http://ex.org/"},
... )
>>> print(result["query"])
PREFIX ex: <http://ex.org/>
...
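The nested paths argument is easy to get wrong when written by hand. The sketch below builds it from two small helpers. Note that make_edge and make_path are hypothetical names introduced here for illustration and are not part of rdfsolve; only the dict keys (edges, source, target, predicate, is_forward) come from the parameter description above.

```python
from typing import Any


def make_edge(source: str, target: str, predicate: str,
              is_forward: bool = True) -> dict[str, Any]:
    """Build one edge dict with the four documented keys (hypothetical helper)."""
    return {
        "source": source,
        "target": target,
        "predicate": predicate,
        "is_forward": is_forward,
    }


def make_path(*edges: dict[str, Any]) -> dict[str, Any]:
    """Wrap edges in a path dict under the documented "edges" key."""
    return {"edges": list(edges)}


# One path with a single forward edge, matching the example above.
paths = [
    make_path(
        make_edge(
            "http://ex.org/Gene",
            "http://ex.org/Protein",
            "http://ex.org/encodes",
        ),
    )
]
print(paths)
```

The resulting list should be usable as the paths argument of compose_query_from_paths, together with a prefixes map such as {"ex": "http://ex.org/"}.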
- execute_sparql(query: str, endpoint: str, method: str = 'GET', timeout: int = 30, variable_map: dict[str, str] | None = None) -> dict[str, Any]
Execute a SPARQL query against a remote endpoint.
This is a pure-Python function; no Flask is required. It delegates to rdfsolve.query.execute_sparql(), which uses the robust SparqlHelper under the hood.
- Parameters:
query – Full SPARQL query string.
endpoint – SPARQL endpoint URL.
method – HTTP method ("GET" or "POST").
timeout – Timeout in seconds.
variable_map – Optional mapping of SPARQL ?variable -> schema URI.
- Returns:
Dict with keys query, endpoint, variables, rows, variable_map, row_count, duration_ms, and optionally error.
Example:
>>> from rdfsolve.api import execute_sparql
>>> result = execute_sparql(
...     query="SELECT ?s WHERE { ?s a ?o } LIMIT 5",
...     endpoint="https://sparql.wikipathways.org/sparql/",
... )
>>> result["row_count"]
5
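Because error is listed as an optional key of the result, callers may want to check for it before reading rows. The following is a minimal sketch under two assumptions that the text above does not confirm: that failures are reported through the error key rather than raised, and that rows is a list. rows_or_raise is a hypothetical helper, and the sample dict is hand-written rather than real endpoint output.

```python
from typing import Any


def rows_or_raise(result: dict[str, Any]) -> list[Any]:
    """Return result["rows"], raising if the optional "error" key is set."""
    error = result.get("error")
    if error:
        endpoint = result.get("endpoint")
        raise RuntimeError(f"SPARQL query failed at {endpoint}: {error}")
    return result["rows"]


# Hand-written stand-in for a successful result; all values are made up.
ok = {
    "query": "SELECT ?s WHERE { ?s a ?o } LIMIT 5",
    "endpoint": "https://sparql.wikipathways.org/sparql/",
    "variables": ["s"],
    "rows": [],
    "variable_map": {},
    "row_count": 0,
    "duration_ms": 12,
}
print(rows_or_raise(ok))
```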
- graph_to_jsonld(graph: Graph, graph_uris: str | list[str] | None = None, filter_void_admin_nodes: bool = True, endpoint_url: str | None = None, dataset_name: str | None = None) -> dict[str, Any]
Convert a VoID graph to JSON-LD format.
- Parameters:
graph – RDFLib Graph with VoID data
graph_uris – Graph URIs to filter extraction
filter_void_admin_nodes – Remove VoID and administrative nodes
endpoint_url – SPARQL endpoint URL for the @about section
dataset_name – Dataset name for the @about section
- Returns:
JSON-LD with @context, @graph, and @about
- graph_to_linkml(graph: Graph, graph_uris: str | list[str] | None = None, filter_void_nodes: bool = True, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) -> str
Convert a VoID graph to LinkML YAML schema.
- Parameters:
graph – RDFLib Graph with VoID data
graph_uris – Graph URIs to filter extraction
filter_void_nodes – Remove VoID-specific nodes
schema_name – Name for the schema
schema_description – Description for the schema
schema_base_uri – Base URI for the schema
- Returns:
LinkML YAML schema string
- graph_to_schema(void_graph: Graph, graph_uris: str | list[str] | None = None, filter_void_admin_nodes: bool = True) -> DataFrame
Convert VoID graph to schema DataFrame.
- Parameters:
void_graph – RDFLib graph with VoID data
graph_uris – Graph URIs to extract
filter_void_admin_nodes – Filter VoID or administrative nodes
- Returns:
DataFrame with schema patterns (subject/property/object URIs)
- graph_to_shacl(graph: Graph, graph_uris: str | list[str] | None = None, filter_void_nodes: bool = True, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None, closed: bool = True, suffix: str | None = None, include_annotations: bool = False) -> str
Convert a VoID graph to SHACL shapes.
Generates SHACL (Shapes Constraint Language) shapes from a VoID graph. SHACL shapes define constraints on RDF data and can be used for validation.
- Parameters:
graph – RDFLib Graph with VoID data
graph_uris – Graph URIs to filter extraction
filter_void_nodes – Remove VoID-specific nodes
schema_name – Name for the schema
schema_description – Description for the schema
schema_base_uri – Base URI for the schema
closed – Generate closed shapes (only allow defined properties)
suffix – Optional suffix for shape names (e.g., "Shape")
include_annotations – Include class/slot annotations in shapes
- Returns:
SHACL shapes as Turtle/RDF string
Example
>>> from rdflib import Graph
>>> from rdfsolve.api import graph_to_shacl
>>> void_graph = Graph()
>>> void_graph.parse("dataset_void.ttl", format="turtle")
>>> shacl_ttl = graph_to_shacl(void_graph, schema_name="my_dataset")
- import_semra_source(source: str, keep_prefixes: list[str] | None = None, output_dir: str = 'docker/mappings/semra') -> dict[str, Any]
Import mappings from a SeMRA source and write one JSON-LD per prefix.
Delegates to rdfsolve.semra_converter.import_source().
- Parameters:
source – SeMRA source key (e.g. "biomappings").
keep_prefixes – Optional prefix filter.
output_dir – Directory for output files.
- Returns:
Summary dict {"succeeded", "failed", "skipped"}.
- import_sssom_source(entry: dict[str, Any], output_dir: str = 'docker/mappings/sssom') -> dict[str, Any]
Download and convert one SSSOM source entry to JSON-LD files.
Thin wrapper around rdfsolve.sssom_importer.import_sssom_source().
For each .sssom.tsv file found inside the archive at entry["url"], one JSON-LD file is written to output_dir:
{source_name}__{sssom_file_stem}.jsonld
- Parameters:
entry – Dict with at least "name" and "url" keys, as found in data/sssom_sources.yaml.
output_dir – Directory to write output JSON-LD files.
- Returns:
Summary dict with keys "succeeded", "failed", "skipped".
- infer_mappings(input_paths: list[str], output_path: str, *, inversion: bool = True, transitivity: bool = True, generalisation: bool = False, chain_cutoff: int = 3, dataset_name: str | None = None) -> dict[str, Any]
Run the SeMRA inference pipeline over mapping JSON-LD files.
Thin wrapper around rdfsolve.inference.infer_mappings(). See that function for full documentation.
- Parameters:
input_paths – Paths to input mapping JSON-LD files.
output_path – Path to write the inferenced mapping JSON-LD.
inversion – Apply symmetric inversion.
transitivity – Apply transitive chain inference.
generalisation – Apply generalisation.
chain_cutoff – Max chain length for transitivity.
dataset_name – Override for @about.dataset_name.
- Returns:
Summary dict with "input_edges", "output_edges", "inference_types", "output_path".
- load_mapping_jsonld(path: str) -> dict[str, Any]
Load a mapping JSON-LD file from disk.
- Parameters:
path – Path to a .jsonld file.
- Returns:
Parsed JSON dict.
- load_parser_from_file(void_file_path: str, graph_uris: str | list[str] | None = None, exclude_graphs: bool = True) -> VoidParser
Load a VoID file and return a parser for schema extraction.
- Parameters:
void_file_path – Path to VoID Turtle file
graph_uris – Graph URIs to filter queries
exclude_graphs – Exclude system graphs
- Returns:
VoidParser instance
- load_parser_from_graph(graph: Graph, graph_uris: str | list[str] | None = None, exclude_graphs: bool = True) -> VoidParser
Load a VoID graph and return a parser for schema extraction.
- Parameters:
graph – RDFLib Graph with VoID data
graph_uris – Graph URIs to filter queries
exclude_graphs – Exclude system graphs
- Returns:
VoidParser instance
- load_parser_from_jsonld(jsonld_path: str, graph_uris: str | list[str] | None = None, exclude_graphs: bool = True) -> VoidParser
Load a mined-schema JSON-LD file and return a VoidParser.
Reads the JSON-LD produced by rdfsolve mine, reconstructs a MinedSchema via MinedSchema.from_jsonld(), converts it to an in-memory VoID RDF graph, and wraps it in a VoidParser ready for export to CSV / LinkML / SHACL / RDF-config.
- Parameters:
jsonld_path – Path to a *_schema.jsonld file produced by rdfsolve mine.
graph_uris – Graph URIs to filter (passed through to VoidParser).
exclude_graphs – Exclude system graphs.
- Returns:
VoidParser instance backed by the converted VoID graph.
- mine_all_sources(sources_csv: str | None = None, *, sources: str | None = None, output_dir: str = '.', fmt: str = 'all', chunk_size: int = 10000, class_chunk_size: int | None = None, class_batch_size: int = 15, delay: float = 0.5, timeout: float = 120.0, counts: bool = True, reports: bool = True, filter_service_namespaces: bool = True, untyped_as_classes: bool = False, authors: list[dict[str, str]] | None = None, on_progress: Callable[[str, int, int, str | None], None] | None = None) -> dict[str, Any]
Mine schemas for all sources in a JSON-LD or CSV file.
Reads a sources file (JSON-LD preferred, CSV still accepted) and runs mine_schema() for each entry whose endpoint is non-empty. Results are written to output_dir as {name}_schema.jsonld and/or {name}_void.ttl.
Per-source overrides (chunk_size, class_batch_size, timeout, etc.) in the JSON-LD file take precedence over the function-level defaults.
- Parameters:
sources_csv – Deprecated; use sources instead. Path to a CSV file with data sources. Kept for backwards compatibility; ignored when sources is given.
sources – Path to the sources file (JSON-LD or CSV). When None, the default data/sources.jsonld (or .csv fallback) is used.
output_dir – Directory where outputs are written.
fmt – Export format: "jsonld", "void", or "all".
chunk_size – Pagination page size for SPARQL queries.
class_chunk_size – Page size for Phase-1 class discovery in two-phase mode. None = no pagination. Ignored for rows that are not two-phase.
class_batch_size – Number of classes per VALUES query in Phase-2 of two-phase mining (default 15).
delay – Delay between paginated pages (seconds).
timeout – HTTP timeout per request (seconds).
counts – Whether to fetch triple-count queries.
reports – Write per-source analytics JSON reports.
filter_service_namespaces – Strip service/system namespace patterns from each mined schema (default True).
untyped_as_classes – Treat untyped URI objects as owl:Class references instead of the generic rdfs:Resource sentinel (default False).
on_progress – Optional callback invoked after each source is processed. Signature: (dataset_name, index, total, status_or_error). status_or_error is None on success, or an error message string.
- Returns:
Summary dict with keys "succeeded", "failed", and "skipped" mapping to lists of dataset names.
- mine_schema(endpoint_url: str, graph_uris: str | list[str] | None = None, dataset_name: str | None = None, chunk_size: int = 10000, class_chunk_size: int | None = None, class_batch_size: int = 15, delay: float = 0.5, timeout: float = 120.0, counts: bool = True, two_phase: bool = True, report_path: str | None = None, filter_service_namespaces: bool = True, authors: list[dict[str, str]] | None = None) -> dict[str, Any]
Mine RDF schema from a SPARQL endpoint using SELECT queries.
This is a simpler, faster alternative to generate_void_from_endpoint that avoids heavy CONSTRUCT queries. The mined schema is returned as a JSON-LD dict, which can be converted to a VoID graph.
- Parameters:
endpoint_url – SPARQL endpoint URL
graph_uris – Graph URI(s) to restrict queries
dataset_name – Human-readable dataset name
chunk_size – Pagination page size
class_chunk_size – Page size for Phase-1 class discovery (None = single query, no pagination)
class_batch_size – Number of classes to group into one VALUES query in Phase-2 (default 15)
delay – Delay between pages (seconds)
timeout – HTTP timeout per request
counts – Whether to fetch triple counts
two_phase – Use two-phase mining (default True). Pass False for the legacy single-pass strategy.
report_path – If given, write analytics JSON to this path
filter_service_namespaces – Strip service/system namespace patterns from the result (default True)
- Returns:
JSON-LD dict with @context, @graph, and @about
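To give a feel for what class_batch_size controls, the sketch below groups class URIs into batches and renders each batch as a SPARQL VALUES clause. It only illustrates the general idea of batching classes into one VALUES query; the queries rdfsolve actually issues are not shown in this reference, and batch_classes, values_clause, and the ?class variable name are assumptions made for illustration.

```python
from collections.abc import Iterator


def batch_classes(class_uris: list[str],
                  batch_size: int = 15) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size class URIs."""
    for start in range(0, len(class_uris), batch_size):
        yield class_uris[start:start + batch_size]


def values_clause(class_uris: list[str], var: str = "class") -> str:
    """Render a SPARQL VALUES clause binding ?var to the given URIs."""
    terms = " ".join(f"<{uri}>" for uri in class_uris)
    return f"VALUES ?{var} {{ {terms} }}"


classes = [f"http://ex.org/C{i}" for i in range(32)]
clauses = [values_clause(batch) for batch in batch_classes(classes)]
print(len(clauses))  # 32 classes in batches of 15 -> 3 clauses
print(clauses[-1])   # the last batch holds the remaining 2 classes
```

A smaller batch size means more requests but shorter queries, which may matter for endpoints that limit query length.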
- probe_instance_mapping(prefix: str, sources_csv: str | None = None, *, sources: str | None = None, predicate: str = 'http://www.w3.org/2004/02/skos/core#narrowMatch', dataset_names: list[str] | None = None, timeout: float = 60.0) -> dict[str, Any]
Probe SPARQL endpoints for a bioregistry resource and return JSON-LD.
For every dataset in sources (or the subset in dataset_names), queries the endpoint for RDF classes whose instances match the resource's known URI prefixes. Generates pairwise skos:narrowMatch edges (or edges using the predicate override) between classes across different datasets and returns the result as a JSON-LD mapping document.
The returned dict has the same structure as a mined schema JSON-LD (@context + @graph + @about) and can be saved directly to docker/schemas/ for auto-import on Flask startup.
- Parameters:
prefix – Bioregistry prefix, e.g. "ensembl".
sources_csv – Deprecated; use sources instead.
sources – Path to the sources file (JSON-LD or CSV). When None, auto-detects the default file.
predicate – Mapping predicate URI. Defaults to skos:narrowMatch.
dataset_names – Restrict probing to these dataset names.
timeout – SPARQL request timeout in seconds.
- Returns:
JSON-LD dict with @context, @graph, @about.
- Raises:
ValueError – If prefix is unknown to bioregistry.
- resolve_iris(iris: list[str], endpoints: list[dict[str, Any]], timeout: int = 15) -> dict[str, Any]
Resolve IRIs against SPARQL endpoints to discover their rdf:type.
This is a pure-Python function; no Flask is required. It delegates to rdfsolve.iri.resolve_iris().
- Parameters:
iris – List of IRI strings to resolve.
endpoints – List of endpoint dicts, each with keys name, endpoint, and optionally graph.
timeout – Per-endpoint timeout in seconds.
- Returns:
Dict with keys resolved, not_found, errors.
Example:
>>> from rdfsolve.api import resolve_iris
>>> result = resolve_iris(
...     iris=["http://identifiers.org/ncbigene/1234"],
...     endpoints=[{
...         "name": "wikipathways",
...         "endpoint": "https://sparql.wikipathways.org/sparql/",
...     }],
... )
>>> result["resolved"]
{...}
- seed_inferenced_mappings(input_dir: str = 'docker/mappings', output_dir: str = 'docker/mappings/inferenced', output_name: str = 'inferenced_mappings', inversion: bool = True, transitivity: bool = True, generalisation: bool = False, chain_cutoff: int = 3) -> dict[str, Any]
Infer over all mappings in input_dir and write to output_dir.
Thin wrapper around rdfsolve.inference.seed_inferenced_mappings().
- Parameters:
input_dir – Directory containing mapping subdirs.
output_dir – Directory for output.
output_name – Stem for the output file.
inversion – Apply inversion inference.
transitivity – Apply transitivity inference.
generalisation – Apply generalisation.
chain_cutoff – Max chain length.
- Returns:
Summary dict from infer_mappings().
- seed_instance_mappings(prefixes: list[str], sources_csv: str | None = None, *, sources: str | None = None, output_dir: str = 'docker/mappings/instance_matching', predicate: str = 'http://www.w3.org/2004/02/skos/core#narrowMatch', dataset_names: list[str] | None = None, timeout: float = 60.0, skip_existing: bool = False) -> dict[str, Any]
Probe multiple bioregistry resources and write mapping JSON-LD files.
Iterates over prefixes, runs probe_instance_mapping() for each, and writes the result to {output_dir}/{prefix}_instance_mapping.jsonld.
When a file already exists on disk, the new probe results are merged into it rather than overwriting it:
  - New @graph nodes (source classes not yet in the file) are appended.
  - For existing source nodes, new predicate->target entries are added; duplicates are silently skipped.
  - uri_formats_queried in @about is unioned.
  - pattern_count and generated_at are refreshed.
The default behaviour (skip_existing=False) is to always probe and merge. Pass skip_existing=True only when you explicitly want to skip prefixes whose output file already exists without re-probing.
- Parameters:
prefixes – List of bioregistry prefixes to process.
sources_csv – Deprecated; use sources instead.
sources – Path to the sources file (JSON-LD or CSV). When None, auto-detects the default file.
output_dir – Directory where JSON-LD files are written (created if absent).
predicate – Mapping predicate URI.
dataset_names – Restrict probing to these dataset names.
timeout – Timeout per SPARQL request (seconds).
skip_existing – If True, skip prefixes whose output file already exists without re-probing. Defaults to False (always probe and merge).
- Returns:
Summary dict {"succeeded": [...], "failed": [...]}.
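The merge rules described for seed_instance_mappings can be sketched roughly as follows. This is not the library's implementation. It assumes a document shape that this reference does not spell out: each @graph node is a dict with an "@id" key plus predicate keys whose values are lists of targets. merge_mapping is a hypothetical name, and the refresh of pattern_count and generated_at is left out because their computation is not described.

```python
from typing import Any

NARROW = "http://www.w3.org/2004/02/skos/core#narrowMatch"


def merge_mapping(existing: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Merge new probe results into an existing document, in place."""
    by_id = {node["@id"]: node for node in existing["@graph"]}
    for node in new["@graph"]:
        current = by_id.get(node["@id"])
        if current is None:
            # Source class not yet in the file: append the whole node.
            existing["@graph"].append(node)
            by_id[node["@id"]] = node
            continue
        # Known source node: add unseen predicate -> target entries only.
        for predicate, targets in node.items():
            if predicate.startswith("@"):
                continue  # skip JSON-LD keywords such as "@id"
            known = current.setdefault(predicate, [])
            for target in targets:
                if target not in known:
                    known.append(target)
    # Union uri_formats_queried in @about, keeping first-seen order.
    about = existing.setdefault("@about", {})
    formats = about.setdefault("uri_formats_queried", [])
    for fmt in new.get("@about", {}).get("uri_formats_queried", []):
        if fmt not in formats:
            formats.append(fmt)
    return existing


# Made-up documents in the assumed shape.
old_doc = {
    "@graph": [{"@id": "ex:A", NARROW: [{"@id": "ex:B"}]}],
    "@about": {"uri_formats_queried": ["http://a/"]},
}
new_doc = {
    "@graph": [
        {"@id": "ex:A", NARROW: [{"@id": "ex:B"}, {"@id": "ex:C"}]},
        {"@id": "ex:D", NARROW: [{"@id": "ex:A"}]},
    ],
    "@about": {"uri_formats_queried": ["http://a/", "http://b/"]},
}
merged = merge_mapping(old_doc, new_doc)
print(len(merged["@graph"]))
```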
- seed_semra_mappings(sources: list[str], keep_prefixes: list[str] | None = None, output_dir: str = 'docker/mappings/semra') -> dict[str, Any]
Seed SeMRA mapping files for multiple sources.
Calls import_semra_source() for each entry in sources and aggregates the results.
- Parameters:
sources – List of SeMRA source keys (e.g. ["biomappings", "gilda"]).
keep_prefixes – Optional shared prefix filter applied to all sources.
output_dir – Directory for output files.
- Returns:
Aggregated summary with keys "succeeded", "failed", "skipped".
- seed_sssom_mappings(sssom_sources_yaml: str = 'data/sssom_sources.yaml', output_dir: str = 'docker/mappings/sssom', names: list[str] | None = None) -> dict[str, Any]
Seed SSSOM mapping files for all (or selected) sources.
Thin wrapper around rdfsolve.sssom_importer.seed_sssom_mappings().
Reads sssom_sources_yaml, optionally filters to names, and calls import_sssom_source() for each entry.
- Parameters:
sssom_sources_yaml – Path to the SSSOM sources YAML file (default: data/sssom_sources.yaml).
output_dir – Directory for output JSON-LD files (default: docker/mappings/sssom).
names – Optional list of source names to restrict processing; if None (default), all entries are processed.
- Returns:
Aggregated summary with keys "succeeded", "failed", "skipped".
- to_jsonld_from_file(void_file_path: str, filter_void_admin_nodes: bool = True, endpoint_url: str | None = None, dataset_name: str | None = None, graph_uris: str | list[str] | None = None) -> dict[str, Any]
Convert a VoID file to JSON-LD format.
- Parameters:
void_file_path – Path to VoID file
filter_void_admin_nodes – Remove VoID and administrative nodes
endpoint_url – SPARQL endpoint URL for the @about section
dataset_name – Dataset name for the @about section
graph_uris – Graph URIs for the @about section
- Returns:
JSON-LD with @context, @graph, and @about
- to_linkml_from_file(void_file_path: str, filter_void_nodes: bool = True, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) -> str
Convert a VoID file to LinkML YAML schema.
- Parameters:
void_file_path – Path to VoID file
filter_void_nodes – Remove VoID-specific nodes
schema_name – Name for the schema
schema_description – Description for the schema
schema_base_uri – Base URI for the schema
- Returns:
LinkML YAML schema string
- to_rdfconfig_from_file(void_file_path: str, filter_void_nodes: bool = True, endpoint_url: str | None = None, endpoint_name: str | None = None, graph_uri: str | None = None) -> dict[str, str]
Convert a VoID file to RDF-config YAML files.
RDF-config is a schema standard that describes RDF data models using YAML configuration files. This function generates the contents of three files:
  - model.yml: Class and property structure
  - prefix.yml: Namespace prefix definitions
  - endpoint.yml: SPARQL endpoint configuration
Note: The rdf-config tool requires these files to be named exactly model.yml, prefix.yml, and endpoint.yml, and placed in a directory named {dataset}_config. The CLI automatically creates this structure.
- Parameters:
void_file_path – Path to VoID file
filter_void_nodes – Remove VoID-specific nodes
endpoint_url – SPARQL endpoint URL (optional)
endpoint_name – Name for endpoint (default: "endpoint")
graph_uri – Named graph URI (optional)
- Returns:
Dictionary with "model", "prefix", and "endpoint" keys containing YAML strings
Example
>>> from rdfsolve.api import to_rdfconfig_from_file
>>> rdfconfig = to_rdfconfig_from_file(
...     "dataset_void.ttl",
...     endpoint_url="https://example.org/sparql",
...     graph_uri="http://example.org/graph",
... )
>>> # Save files
>>> with open("model.yml", "w") as f:
...     f.write(rdfconfig["model"])
>>> with open("prefix.yml", "w") as f:
...     f.write(rdfconfig["prefix"])
>>> with open("endpoint.yml", "w") as f:
...     f.write(rdfconfig["endpoint"])
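When calling the function from Python rather than through the CLI, the {dataset}_config layout mentioned in the note has to be created by hand. The sketch below shows one way to do that with pathlib. write_rdfconfig is a hypothetical helper, not part of rdfsolve, and the demo uses placeholder strings in place of real output from to_rdfconfig_from_file.

```python
import tempfile
from pathlib import Path


def write_rdfconfig(rdfconfig: dict[str, str], dataset: str,
                    base_dir: str = ".") -> Path:
    """Write model.yml, prefix.yml and endpoint.yml into {dataset}_config."""
    config_dir = Path(base_dir) / f"{dataset}_config"
    config_dir.mkdir(parents=True, exist_ok=True)
    for key in ("model", "prefix", "endpoint"):
        # File names must match exactly what the rdf-config tool expects.
        (config_dir / f"{key}.yml").write_text(rdfconfig[key], encoding="utf-8")
    return config_dir


# Placeholder YAML strings standing in for the real function output.
demo = {"model": "# model\n", "prefix": "# prefix\n", "endpoint": "# endpoint\n"}
with tempfile.TemporaryDirectory() as tmp:
    out_dir = write_rdfconfig(demo, "my_dataset", tmp)
    written = sorted(p.name for p in out_dir.iterdir())
    print(out_dir.name, written)
```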
- to_shacl_from_file(void_file_path: str, filter_void_nodes: bool = True, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None, closed: bool = True, suffix: str | None = None, include_annotations: bool = False) -> str
Convert a VoID file to SHACL shapes.
Generates SHACL (Shapes Constraint Language) shapes from a VoID description file. SHACL shapes define constraints on RDF data and can be used for validation.
- Parameters:
void_file_path – Path to VoID file
filter_void_nodes – Remove VoID-specific nodes
schema_name – Name for the schema
schema_description – Description for the schema
schema_base_uri – Base URI for the schema
closed – Generate closed shapes (only allow defined properties)
suffix – Optional suffix for shape names (e.g., "Shape")
include_annotations – Include class/slot annotations in shapes
- Returns:
SHACL shapes as Turtle/RDF string
Example
>>> from rdfsolve.api import to_shacl_from_file
>>> shacl_ttl = to_shacl_from_file(
...     "dataset_void.ttl", schema_name="my_dataset", closed=True
... )
>>> with open("schema.shacl.ttl", "w") as f:
...     f.write(shacl_ttl)
- to_void_from_file(jsonld_path: str) -> Graph
Convert a mined-schema JSON-LD file to a VoID RDF graph.
Reads the JSON-LD, reconstructs a MinedSchema, and returns the equivalent VoID graph (rdflib Graph).
- Parameters:
jsonld_path – Path to a *_schema.jsonld file.
- Returns:
rdflib Graph containing the VoID description.