Schema Models
Schema models - public API re-exports.
Core
Core schema models: SchemaPattern, AboutMetadata, MinedSchema.
These are the primary data structures for mined RDF schemas.
- class SchemaPattern(*, subject_class: str, property_uri: str, object_class: str, count: Annotated[int | None, Ge(ge=0)] = None, datatype: str | None = None, subject_label: str | None = None, property_label: str | None = None, object_label: str | None = None)[source]
Bases: BaseModel
A single schema pattern: subject_class -> property -> object.
Captures three kinds of relationships:
typed-object: ?s a ?sc . ?s ?p ?o . ?o a ?oc
literal: ?s a ?sc . ?s ?p ?o . FILTER(isLiteral(?o))
untyped-uri: ?s a ?sc . ?s ?p ?o . FILTER(isURI(?o))
This model is the shared contract between SchemaMiner (direct SPARQL) and VoidParser (VoID-based extraction).
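The three graph patterns listed for SchemaPattern could be wrapped into counting queries roughly as follows. This is only a sketch: the SELECT / GROUP BY framing and the names `PATTERN_BODIES` and `build_pattern_query` are assumptions made for illustration, not the queries SchemaMiner actually sends.

```python
# Graph-pattern bodies copied from the three relationship kinds of SchemaPattern.
PATTERN_BODIES = {
    "typed-object": "?s a ?sc . ?s ?p ?o . ?o a ?oc",
    "literal": "?s a ?sc . ?s ?p ?o . FILTER(isLiteral(?o))",
    "untyped-uri": "?s a ?sc . ?s ?p ?o . FILTER(isURI(?o))",
}


def build_pattern_query(kind):
    """Return a hypothetical counting query for one relationship kind."""
    body = PATTERN_BODIES[kind]
    # Only the typed-object pattern binds an object class (?oc).
    group_vars = "?sc ?p ?oc" if kind == "typed-object" else "?sc ?p"
    return (
        f"SELECT {group_vars} (COUNT(*) AS ?count) "
        f"WHERE {{ {body} }} GROUP BY {group_vars}"
    )


if __name__ == "__main__":
    for kind in PATTERN_BODIES:
        print(kind, "->", build_pattern_query(kind))
```

Each result row of such a query would map naturally onto the subject_class, property_uri, object_class and count fields of one SchemaPattern.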
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class AboutMetadata(*, generated_by: str, generated_at: str, endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, strategy: str = 'unknown', rdfsolve_version: str | None = None, qlever_version: dict[str, str] | None = None, started_at: str | None = None, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, authors: list[dict[str, str]] | None = None, schema_uri: str | None = None, void_uri: str | None = None, report_uri: str | None = None, linkml_uri: str | None = None, **extra_data: Any)[source]
Bases: BaseModel
Provenance metadata attached to every schema export.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config = {'extra': 'allow'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- static build(endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: int = 0, strategy: str = 'unknown', started_at: str | None = None, finished_at: str | None = None, total_duration_s: float | None = None, authors: list[dict[str, str]] | None = None, qlever_version: dict[str, str] | None = None) AboutMetadata[source]
Create metadata with auto-populated version + timestamp.
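As a rough idea of what "auto-populated version + timestamp" could look like, the sketch below fills both fields on a stand-in dataclass. `AboutSketch` and `build_about` are hypothetical names, the `generated_by` value is assumed, and the lookup through `importlib.metadata` is one possible approach rather than what `build()` is known to do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from importlib import metadata
from typing import Optional


@dataclass
class AboutSketch:
    """Stand-in carrying a few AboutMetadata fields (not the real model)."""
    generated_by: str
    generated_at: str
    rdfsolve_version: Optional[str] = None
    strategy: str = "unknown"
    pattern_count: int = 0


def build_about(strategy="unknown", pattern_count=0):
    """Fill in version and timestamp automatically (assumed behaviour)."""
    try:
        version = metadata.version("rdfsolve")
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return AboutSketch(
        generated_by="rdfsolve",  # assumed value
        generated_at=datetime.now(timezone.utc).isoformat(),
        rdfsolve_version=version,
        strategy=strategy,
        pattern_count=pattern_count,
    )
```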
- class MinedSchema(*, patterns: list[SchemaPattern] = <factory>, about: AboutMetadata)[source]
Bases: BaseModel
Complete mined schema: patterns + provenance.
Primary export format is JSON-LD. Can also be converted to a VoID RDF graph for downstream conversion to LinkML / SHACL / RDF-config via VoidParser.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- patterns: list[SchemaPattern]
- about: AboutMetadata
- filter_service_namespaces(extra_prefixes: list[str] | None = None) MinedSchema[source]
Return a copy without service/system patterns.
A pattern is removed when any of its subject_class, property_uri, or object_class starts with a prefix listed in SERVICE_NAMESPACE_PREFIXES (or extra_prefixes).
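The removal rule can be sketched with plain prefix matching. The prefixes in `SERVICE_PREFIXES_SKETCH` are placeholders (the real contents of SERVICE_NAMESPACE_PREFIXES are not shown here), and `PatternSketch` is a stand-in for SchemaPattern.

```python
from dataclasses import dataclass

# Hypothetical prefixes; the real SERVICE_NAMESPACE_PREFIXES may differ.
SERVICE_PREFIXES_SKETCH = ("http://example.org/service/",)


@dataclass(frozen=True)
class PatternSketch:
    """Stand-in with the three URI fields of a SchemaPattern."""
    subject_class: str
    property_uri: str
    object_class: str


def filter_service_patterns(patterns, extra_prefixes=None):
    """Return the patterns in which no field starts with a service prefix."""
    prefixes = SERVICE_PREFIXES_SKETCH + tuple(extra_prefixes or ())
    return [
        p
        for p in patterns
        # str.startswith accepts a tuple and matches any of its entries.
        if not any(
            value.startswith(prefixes)
            for value in (p.subject_class, p.property_uri, p.object_class)
        )
    ]
```

Because the result is a new list, the input patterns stay untouched, which matches the "return a copy" wording of the method.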
- classmethod from_dict(raw: dict[str, Any]) MinedSchema[source]
Reconstruct from a JSON-LD dict (e.g. returned by to_jsonld()).
Inverse of to_jsonld(). Expands CURIEs using the dict’s own @context block.
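CURIE expansion against a context block can be sketched in a few lines. `expand_curie` is a hypothetical helper and the context shown is invented; how from_dict() handles edge cases is not documented here.

```python
def expand_curie(value, context):
    """Expand 'prefix:local' to a full URI when the prefix is in the context."""
    prefix, sep, local = value.partition(":")
    # Leave full URIs such as 'http://...' and unknown prefixes untouched.
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return value


if __name__ == "__main__":
    ctx = {"ex": "http://example.org/"}
    print(expand_curie("ex:Person", ctx))
```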
- classmethod from_jsonld(path: str | Path) MinedSchema[source]
Reconstruct from a *_schema.jsonld file.
Convenience wrapper around from_dict() that reads and parses the file first.
- to_networkx() Any[source]
Export as a typed-object nx.MultiDiGraph.
Nodes are class URIs. Each typed-object pattern becomes a directed edge. Literal/Resource sentinels are excluded.
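The edge selection can be illustrated without networkx by working on plain tuples. The sentinel strings in `SENTINELS` are assumed spellings and `typed_object_edges` is a hypothetical helper, so treat this as a sketch of the idea rather than the actual export.

```python
# Assumed sentinel names for non-class objects; the real values may differ.
SENTINELS = {"Literal", "Resource"}


def typed_object_edges(triples):
    """Return (subject_class, object_class, property_uri) for typed-object patterns."""
    return [(s, o, p) for s, p, o in triples if o not in SENTINELS]


def class_nodes(edges):
    """Collect every class URI that appears at either end of an edge."""
    return {node for s, o, _ in edges for node in (s, o)}
```

With networkx installed, each tuple could then be added to a MultiDiGraph via `add_edge(s, o, key=p)`, which keeps parallel edges between the same two classes apart.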
- to_jsonld() dict[str, Any][source]
Export schema as JSON-LD with @context, @graph, @about.
The @graph groups triples by subject class. Labels are exported in a top-level _labels map keyed by CURIE.
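To make the grouping concrete, here is a sketch that groups (subject, property, object) triples by subject class and assembles the four top-level keys. The node layout (`@id` plus one list per property) and the sample values are assumptions; the real to_jsonld() output may be shaped differently.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Group triples into one node per subject class (assumed node layout)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, prop, obj in triples:
        grouped[subject][prop].append(obj)
    return [{"@id": subject, **dict(props)} for subject, props in grouped.items()]


def build_document(triples, context, about, labels):
    """Assemble the top-level keys named in the to_jsonld() description."""
    return {
        "@context": context,
        "@graph": group_by_subject(triples),
        "@about": about,
        "_labels": labels,
    }
```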
- to_void_graph() Any[source]
Build an rdflib VoID Graph from the mined patterns.
Allows feeding the result into VoidParser for downstream conversion to LinkML, SHACL, RDF-config, etc.
- model_config = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
LinkML Converter
LinkML schema generation from JSON-LD.
Converts an rdfsolve JSON-LD schema dict (@context + @graph) into a LinkML SchemaDefinition or its YAML serialisation.
- make_valid_linkml_name(uri_or_curie: str) str[source]
Convert a URI or CURIE to a valid LinkML identifier.
LinkML identifiers must start with a letter and contain only letters, digits, and underscores.
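The identifier rule can be approximated with a regular expression. `sanitize_identifier` and its `x_` fallback prefix are hypothetical, and the sketch deliberately skips the bioregistry lookup that the real function applies to full URIs.

```python
import re


def sanitize_identifier(value):
    """Replace disallowed characters and make sure the result starts with a letter."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", value)
    if not name or not name[0].isalpha():
        name = "x_" + name  # hypothetical fallback prefix
    return name


if __name__ == "__main__":
    print(sanitize_identifier("aopo:KeyEvent"))
```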
Examples:
"aopo:KeyEvent" -> "aopo_KeyEvent"
"edam.data1025" -> "edam_data_1025"
"http://example.org/Cls" -> prefix_Cls (via bioregistry)
- to_linkml(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) SchemaDefinition[source]
Generate a LinkML SchemaDefinition from a JSON-LD dict.
- Parameters:
jsonld – JSON-LD document with @context, @graph, and optionally _labels.
schema_name – Name for the schema (also used as default prefix).
schema_description – Human-readable description.
schema_base_uri – Base URI; defaults to https://w3id.org/{schema_name}/.
- Return type:
SchemaDefinition
SHACL Converter
SHACL shape generation from JSON-LD.
Converts an rdfsolve JSON-LD schema dict to SHACL Turtle via the LinkML -> ShaclGenerator pipeline.
- to_shacl(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None, closed: bool = True, suffix: str | None = None, include_annotations: bool = False) str[source]
Generate SHACL shapes (Turtle) from a JSON-LD schema dict.
- Parameters:
jsonld – JSON-LD document (@context, @graph, …).
schema_name – Name for the underlying LinkML schema.
schema_description – Human-readable description.
schema_base_uri – Base URI for the schema.
closed – If True, produce closed SHACL shapes (sh:closed true).
suffix – Suffix appended to every shape name (e.g. "Shape" → PersonShape).
include_annotations – If True, carry annotations through to shapes.
- Returns:
SHACL shapes serialised as Turtle.
- Return type:
str
RDF-Config Converter
RDF-config YAML generation from JSON-LD.
Converts an rdfsolve JSON-LD schema dict to the three YAML files expected by the rdf-config tool: model.yaml, prefix.yaml, endpoint.yaml.
- to_rdfconfig(jsonld: dict[str, Any], *, endpoint_url: str | None = None, endpoint_name: str | None = None, graph_uri: str | None = None) dict[str, str][source]
Generate RDF-config YAML files from a JSON-LD schema dict.
- Parameters:
jsonld – JSON-LD document (@context, @graph, …).
endpoint_url – SPARQL endpoint URL for endpoint.yaml.
endpoint_name – Label for the endpoint (defaults to "endpoint").
graph_uri – Optional named-graph URI for endpoint.yaml.
- Returns:
Keys model, prefix, endpoint -> YAML strings.
- Return type:
dict[str, str]
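Since the result maps model, prefix and endpoint to YAML strings, writing the three files that rdf-config expects is a short loop. `write_rdfconfig` is a hypothetical helper and the dict in the demo holds placeholder strings rather than real to_rdfconfig() output.

```python
import tempfile
from pathlib import Path


def write_rdfconfig(files, out_dir):
    """Write model.yaml, prefix.yaml and endpoint.yaml into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key in ("model", "prefix", "endpoint"):
        path = out / f"{key}.yaml"
        path.write_text(files[key], encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Placeholder content standing in for the strings returned by to_rdfconfig().
    placeholder = {key: f"# {key} placeholder\n" for key in ("model", "prefix", "endpoint")}
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_rdfconfig(placeholder, tmp):
            print(path.name)
```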
Report
Mining analytics report models.
- class QueryStats(*, sent: Annotated[int, Ge(ge=0)] = 0, failed: Annotated[int, Ge(ge=0)] = 0, total_time_s: Annotated[float, Ge(ge=0)] = 0.0)[source]
Bases: BaseModel
Cumulative statistics for one query category.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
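"Cumulative" suggests the counters are updated once per query. The sketch below shows one way to do that on a stand-in dataclass; `QueryStatsSketch` and its `record` method are hypothetical and not part of the documented model.

```python
from dataclasses import dataclass


@dataclass
class QueryStatsSketch:
    """Stand-in with the three QueryStats fields."""
    sent: int = 0
    failed: int = 0
    total_time_s: float = 0.0

    def record(self, duration_s, ok):
        """Add one query outcome to the running totals (hypothetical helper)."""
        self.sent += 1
        if not ok:
            self.failed += 1
        self.total_time_s += duration_s
```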
- class OneShotQueryResult(*, query_type: str, success: bool, duration_s: Annotated[float | None, Ge(ge=0)] = None, row_count: Annotated[int | None, Ge(ge=0)] = None, error: str | None = None)[source]
Bases: BaseModel
Outcome of a single unbounded SELECT against a SPARQL endpoint.
Used to record the raw performance of an unguarded one-shot query so it can be compared against the fallback-chain result.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
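Recording such an outcome amounts to timing one call and capturing either the row count or the error. In this sketch `OneShotSketch` mirrors the documented fields, while `run_one_shot` and the `run_query` callable are assumptions standing in for whatever actually talks to the endpoint.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OneShotSketch:
    """Stand-in with the OneShotQueryResult fields."""
    query_type: str
    success: bool
    duration_s: Optional[float] = None
    row_count: Optional[int] = None
    error: Optional[str] = None


def run_one_shot(query_type, run_query):
    """Time a single call to run_query and record how it went."""
    start = time.perf_counter()
    try:
        rows = run_query()
    except Exception as exc:  # record the failure instead of propagating it
        return OneShotSketch(query_type, False, time.perf_counter() - start, error=str(exc))
    return OneShotSketch(query_type, True, time.perf_counter() - start, row_count=len(rows))
```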
- class PhaseReport(*, name: str, started_at: str | None = None, finished_at: str | None = None, duration_s: Annotated[float | None, Ge(ge=0)] = None, items_discovered: Annotated[int, Ge(ge=0)] = 0, error: str | None = None)[source]
Bases: BaseModel
Timing and outcome for one mining phase.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- class MiningReport(*, dataset_name: str | None = None, endpoint_url: str, graph_uris: list[str] | None = None, strategy: str = 'unknown', rdfsolve_version: str, python_version: str, qlever_version: dict[str, str] | None = None, started_at: str, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, query_stats: dict[str, QueryStats] = <factory>, total_queries_sent: Annotated[int, Ge(ge=0)] = 0, total_queries_failed: Annotated[int, Ge(ge=0)] = 0, phases: list[PhaseReport] = <factory>, abort_reason: str | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, class_count: Annotated[int, Ge(ge=0)] = 0, property_count: Annotated[int, Ge(ge=0)] = 0, unique_uris_labelled: Annotated[int, Ge(ge=0)] = 0, config: dict[str, Any] = <factory>, machine: dict[str, Any] | None = None, benchmark: dict[str, Any] | None = None, one_shot_results: list[OneShotQueryResult] | None = None, authors: list[dict[str, str]] | None = None, dataset_metadata: dict[str, Any] | None = None, report_uri: str | None = None, **extra_data: Any)[source]
Bases: BaseModel
Analytical metadata collected during a mining run.
Designed to be written to disk incrementally (after each phase completes) so that partial data is preserved even if mining crashes midway.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- query_stats: dict[str, QueryStats]
- phases: list[PhaseReport]
- one_shot_results: list[OneShotQueryResult] | None
- model_config = {'extra': 'allow'}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
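The incremental-write design of MiningReport can be sketched with a plain dict that is rewritten to disk after every phase. `write_report_atomic` and `run_phases` are hypothetical helpers, and the write-to-temp-then-rename step is one common way to avoid half-written files, not necessarily what rdfsolve does.

```python
import json
import os
import tempfile
from pathlib import Path


def write_report_atomic(report, path):
    """Write the report to a temp file, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)
    os.replace(tmp_name, path)  # replaces any previous version in one step


def run_phases(phases, report_path):
    """Run (name, callable) phases, persisting the report after each one."""
    report = {"phases": []}
    for name, work in phases:
        entry = {"name": name, "items_discovered": 0, "error": None}
        try:
            entry["items_discovered"] = work()
        except Exception as exc:
            entry["error"] = str(exc)
        report["phases"].append(entry)
        write_report_atomic(report, report_path)  # partial data survives a crash
    return report
```

If the process dies during a later phase, the file on disk still holds every phase that finished before it.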