Schema Models

Schema models - public API re-exports.

Core

Core schema models: SchemaPattern, AboutMetadata, MinedSchema.

These are the primary data structures for mined RDF schemas.

class SchemaPattern(*, subject_class: str, property_uri: str, object_class: str, count: Annotated[int | None, Ge(ge=0)] = None, datatype: str | None = None, subject_label: str | None = None, property_label: str | None = None, object_label: str | None = None)

Bases: BaseModel

A single schema pattern: subject_class -> property -> object.

Captures three kinds of relationships:

  • typed-object: ?s a ?sc . ?s ?p ?o . ?o a ?oc

  • literal: ?s a ?sc . ?s ?p ?o . FILTER(isLiteral(?o))

  • untyped-uri: ?s a ?sc . ?s ?p ?o . FILTER(isURI(?o))

This model is the shared contract between SchemaMiner (direct SPARQL) and VoidParser (VoID-based extraction).
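The three graph patterns above can be wrapped into complete SELECT queries. The following is a minimal sketch, not the queries SchemaMiner actually sends: the helper build_query, the projection lists, and the optional LIMIT are assumptions made for illustration.

```python
from __future__ import annotations

# Graph patterns copied from the list above, keyed by pattern kind.
GRAPH_PATTERNS = {
    "typed-object": "?s a ?sc . ?s ?p ?o . ?o a ?oc",
    "literal": "?s a ?sc . ?s ?p ?o . FILTER(isLiteral(?o))",
    "untyped-uri": "?s a ?sc . ?s ?p ?o . FILTER(isURI(?o))",
}


def build_query(kind: str, limit: int | None = None) -> str:
    """Hypothetical helper: wrap one graph pattern in a SELECT DISTINCT."""
    body = GRAPH_PATTERNS[kind]
    # Only the typed-object pattern binds ?oc, so only it projects ?oc.
    projection = "?sc ?p ?oc" if kind == "typed-object" else "?sc ?p"
    query = f"SELECT DISTINCT {projection} WHERE {{ {body} }}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query
```

Each distinct result row of such a query would correspond to one SchemaPattern candidate.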

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

subject_class: str
property_uri: str
object_class: str
count: int | None
datatype: str | None
subject_label: str | None
property_label: str | None
object_label: str | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class AboutMetadata(*, generated_by: str, generated_at: str, endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, strategy: str = 'unknown', rdfsolve_version: str | None = None, qlever_version: dict[str, str] | None = None, started_at: str | None = None, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, authors: list[dict[str, str]] | None = None, schema_uri: str | None = None, void_uri: str | None = None, report_uri: str | None = None, linkml_uri: str | None = None, **extra_data: Any)

Bases: BaseModel

Provenance metadata attached to every schema export.

generated_by: str
generated_at: str
endpoint: str | None
dataset_name: str | None
graph_uris: list[str] | None
pattern_count: int
strategy: str
rdfsolve_version: str | None
qlever_version: dict[str, str] | None
started_at: str | None
finished_at: str | None
total_duration_s: float | None
authors: list[dict[str, str]] | None
schema_uri: str | None
void_uri: str | None
report_uri: str | None
linkml_uri: str | None
model_config = {'extra': 'allow'}

static build(endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: int = 0, strategy: str = 'unknown', started_at: str | None = None, finished_at: str | None = None, total_duration_s: float | None = None, authors: list[dict[str, str]] | None = None, qlever_version: dict[str, str] | None = None) -> AboutMetadata

Create metadata with the version and timestamp populated automatically.
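A rough sketch of what such auto-population could look like, using only the standard library. The function build_about, the plain-dict return value, the UTC ISO 8601 timestamp, and the value written to generated_by are assumptions; the actual build() returns an AboutMetadata and may format these fields differently.

```python
from __future__ import annotations

from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version


def build_about(strategy: str = "unknown", pattern_count: int = 0) -> dict:
    """Hypothetical stand-in for AboutMetadata.build(), returning a dict."""
    try:
        # Look up the installed package version, if the package is present.
        pkg_version = version("rdfsolve")
    except PackageNotFoundError:
        pkg_version = None
    return {
        "generated_by": "rdfsolve",  # assumed value
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "strategy": strategy,
        "pattern_count": pattern_count,
        "rdfsolve_version": pkg_version,
    }
```

The caller supplies only what it knows about the run (strategy, counts, endpoint), while the version and timestamp come from the environment.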

class MinedSchema(*, patterns: list[SchemaPattern] = <factory>, about: AboutMetadata)

Bases: BaseModel

Complete mined schema: patterns + provenance.

The primary export format is JSON-LD. The schema can also be converted to a VoID RDF graph for downstream conversion to LinkML / SHACL / RDF-config via VoidParser.

patterns: list[SchemaPattern]
about: AboutMetadata
filter_service_namespaces(extra_prefixes: list[str] | None = None) -> MinedSchema

Return a copy without service/system patterns.

A pattern is removed when any of its subject_class, property_uri, or object_class starts with a prefix listed in SERVICE_NAMESPACE_PREFIXES (or extra_prefixes).
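The removal rule can be sketched with plain tuples. The Pattern type and the prefixes in SERVICE_PREFIXES below are placeholders chosen for illustration; the real SERVICE_NAMESPACE_PREFIXES may contain different entries.

```python
from __future__ import annotations

from typing import NamedTuple


class Pattern(NamedTuple):
    """Simplified stand-in for SchemaPattern (three URI fields only)."""
    subject_class: str
    property_uri: str
    object_class: str


# Placeholder values, not the library's actual prefix list.
SERVICE_PREFIXES = (
    "http://www.openlinksw.com/schemas/",
    "http://www.w3.org/ns/sparql-service-description#",
)


def filter_service_namespaces(patterns, extra_prefixes=None):
    """Return a new list without patterns that touch a service namespace."""
    prefixes = SERVICE_PREFIXES + tuple(extra_prefixes or ())
    # str.startswith accepts a tuple, so one call checks every prefix.
    return [
        p for p in patterns
        if not any(uri.startswith(prefixes) for uri in p)
    ]
```

Because the function builds a new list, the input patterns are left untouched, which mirrors the "return a copy" behaviour described above.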

get_classes() -> list[str]

Return sorted unique subject/object class URIs.

get_properties() -> list[str]

Return sorted unique property URIs.
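Both accessors amount to a set comprehension followed by a sort. A minimal sketch over (subject_class, property_uri, object_class) tuples follows; the example URIs are invented, and whether the real methods skip literal or untyped-URI objects is not shown here.

```python
# Invented example patterns as (subject_class, property_uri, object_class).
patterns = [
    ("http://example.org/Gene", "http://example.org/encodes", "http://example.org/Protein"),
    ("http://example.org/Protein", "http://example.org/partOf", "http://example.org/Complex"),
    ("http://example.org/Gene", "http://example.org/partOf", "http://example.org/Complex"),
]


def get_classes(patterns):
    """Sorted unique class URIs seen in subject or object position."""
    return sorted({cls for s, _, o in patterns for cls in (s, o)})


def get_properties(patterns):
    """Sorted unique property URIs."""
    return sorted({p for _, p, _ in patterns})
```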

classmethod from_dict(raw: dict[str, Any]) -> MinedSchema

Reconstruct from a JSON-LD dict (e.g. returned by to_jsonld()).

Inverse of to_jsonld(). Expands CURIEs using the dict’s own @context block.
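CURIE expansion against a prefix map can be sketched as below. The helper expand_curie is hypothetical and assumes the @context is a flat mapping from prefix to namespace URI; from_dict() may handle more cases.

```python
from __future__ import annotations


def expand_curie(value: str, context: dict[str, str]) -> str:
    """Expand 'prefix:local' using the context; return other values as-is."""
    prefix, sep, local = value.partition(":")
    # Full URIs such as 'http://...' have a local part starting with '//'.
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return value
```

Values with an unknown prefix, or with no colon at all, pass through unchanged, so the function is safe to apply to every identifier in the document.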

classmethod from_jsonld(path: str | Path) -> MinedSchema

Reconstruct from a *_schema.jsonld file.

Convenience wrapper around from_dict() that reads and parses the file first.

to_networkx() -> Any

Export as a typed-object nx.MultiDiGraph.

Nodes are class URIs. Each typed-object pattern becomes a directed edge. Literal/Resource sentinels are excluded.
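The shape of the resulting graph can be illustrated without networkx as an adjacency mapping. This is only a sketch: the sentinel strings in SENTINELS are assumed values, and the real method returns an nx.MultiDiGraph rather than a dict.

```python
from collections import defaultdict

# Assumed sentinel values marking literal and untyped-URI objects.
SENTINELS = {"Literal", "Resource"}


def to_adjacency(patterns):
    """Map each subject class to a list of (object_class, property_uri) edges."""
    graph = defaultdict(list)
    for subject, prop, obj in patterns:
        if obj in SENTINELS:
            continue  # not a class-to-class edge
        # A list keeps parallel edges, as a multigraph would.
        graph[subject].append((obj, prop))
    return dict(graph)
```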

to_jsonld() -> dict[str, Any]

Export schema as JSON-LD with @context, @graph, @about.

The @graph groups triples by subject class. Labels are exported in a top-level _labels map keyed by CURIE.
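Grouping by subject class might look like the following sketch. The node layout (an @id key plus one list of object references per property) is an assumption for illustration; the exact structure that to_jsonld() emits is not specified here.

```python
def group_by_subject(patterns):
    """Build one @graph node per subject class from (s, p, o) CURIE triples."""
    nodes = {}
    for subject, prop, obj in patterns:
        node = nodes.setdefault(subject, {"@id": subject})
        node.setdefault(prop, []).append({"@id": obj})
    # Dicts preserve insertion order, so nodes appear in first-seen order.
    return list(nodes.values())
```

A full document would then combine this list under @graph with the @context prefix map and the @about metadata.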

to_void_graph() -> Any

Build an rdflib VoID Graph from the mined patterns.

Allows feeding the result into VoidParser for downstream conversion to LinkML, SHACL, RDF-config, etc.

model_config = {}

LinkML Converter

LinkML schema generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict (@context + @graph) into a LinkML SchemaDefinition or its YAML serialisation.

make_valid_linkml_name(uri_or_curie: str) -> str

Convert a URI or CURIE to a valid LinkML identifier.

LinkML identifiers must start with a letter and contain only letters, digits, and underscores.
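The identifier rule can be approximated with a regular expression. This sketch covers only the character constraints: the function name sanitize_identifier and the "c_" fallback prefix are invented, and it does not perform the bioregistry-based URI compression that make_valid_linkml_name applies to full URIs.

```python
import re


def sanitize_identifier(text: str) -> str:
    """Replace disallowed characters and force a leading letter."""
    # Anything other than letters, digits, and underscores becomes "_".
    name = re.sub(r"[^A-Za-z0-9_]", "_", text)
    if not re.match(r"[A-Za-z]", name):
        name = "c_" + name  # hypothetical prefix to satisfy the first-letter rule
    return name
```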

Examples:

"aopo:KeyEvent"           -> "aopo_KeyEvent"
"edam.data1025"           -> "edam_data_1025"
"http://example.org/Cls"  -> prefix_Cls  (via bioregistry)

to_linkml(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) -> SchemaDefinition

Generate a LinkML SchemaDefinition from a JSON-LD dict.

Parameters:
  • jsonld – JSON-LD document with @context, @graph, and optionally _labels.

  • schema_name – Name for the schema (also used as default prefix).

  • schema_description – Human-readable description.

  • schema_base_uri – Base URI; defaults to https://w3id.org/{schema_name}/.

Return type:

SchemaDefinition

to_linkml_yaml(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) -> str

Return the LinkML schema as a YAML string.

Parameters are the same as to_linkml().

SHACL Converter

SHACL shape generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict to SHACL Turtle via the LinkML -> ShaclGenerator pipeline.

to_shacl(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None, closed: bool = True, suffix: str | None = None, include_annotations: bool = False) -> str

Generate SHACL shapes (Turtle) from a JSON-LD schema dict.

Parameters:
  • jsonld – JSON-LD document (@context, @graph, …).

  • schema_name – Name for the underlying LinkML schema.

  • schema_description – Human-readable description.

  • schema_base_uri – Base URI for the schema.

  • closed – If True, produce closed SHACL shapes (sh:closed true).

  • suffix – Suffix appended to every shape name (e.g. "Shape" -> PersonShape).

  • include_annotations – If True, carry annotations through to shapes.

Returns:

SHACL shapes serialised as Turtle.

Return type:

str
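To make the closed and suffix parameters concrete, the sketch below renders a single node shape as Turtle text. It is illustrative only and is not the output of to_shacl(): the helper render_shape, the ex prefix, and the shape layout are assumptions, and prefix declarations are omitted.

```python
def render_shape(class_name, properties, *, closed=True, suffix=None, prefix="ex"):
    """Hypothetical helper: render one sh:NodeShape as a Turtle fragment."""
    shape_name = f"{class_name}{suffix or ''}"
    lines = [
        f"{prefix}:{shape_name} a sh:NodeShape ;",
        f"    sh:targetClass {prefix}:{class_name} ;",
    ]
    for prop in properties:
        lines.append(f"    sh:property [ sh:path {prefix}:{prop} ] ;")
    # A closed shape rejects properties that are not listed in the shape.
    lines.append(f"    sh:closed {'true' if closed else 'false'} .")
    return "\n".join(lines)
```

With suffix="Shape", the class Person yields a shape named PersonShape; with closed=False, nodes may carry additional, undeclared properties.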

RDF-Config Converter

RDF-config YAML generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict to the three YAML files expected by the rdf-config tool: model.yaml, prefix.yaml, endpoint.yaml.

to_rdfconfig(jsonld: dict[str, Any], *, endpoint_url: str | None = None, endpoint_name: str | None = None, graph_uri: str | None = None) -> dict[str, str]

Generate RDF-config YAML files from a JSON-LD schema dict.

Parameters:
  • jsonld – JSON-LD document (@context, @graph, …).

  • endpoint_url – SPARQL endpoint URL for endpoint.yaml.

  • endpoint_name – Label for the endpoint (defaults to "endpoint").

  • graph_uri – Optional named-graph URI for endpoint.yaml.

Returns:

A dict mapping the keys model, prefix, and endpoint to YAML strings.

Return type:

dict[str, str]
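Since the result maps each key to a YAML string, writing the three files expected by rdf-config is a short loop. The helper write_rdfconfig below is hypothetical and assumes the returned dict has exactly the keys model, prefix, and endpoint.

```python
from __future__ import annotations

from pathlib import Path


def write_rdfconfig(files: dict[str, str], out_dir) -> list[Path]:
    """Write model.yaml, prefix.yaml and endpoint.yaml into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key in ("model", "prefix", "endpoint"):
        path = out / f"{key}.yaml"
        path.write_text(files[key], encoding="utf-8")
        written.append(path)
    return written
```

In practice the files argument would be the value returned by to_rdfconfig().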

Report

Mining analytics report models.

class QueryStats(*, sent: Annotated[int, Ge(ge=0)] = 0, failed: Annotated[int, Ge(ge=0)] = 0, total_time_s: Annotated[float, Ge(ge=0)] = 0.0)

Bases: BaseModel

Cumulative statistics for one query category.
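One way such counters could be accumulated is sketched below with a dataclass stand-in. The class QueryStatsSketch and its record() method are hypothetical; the documented QueryStats model exposes only the three fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QueryStatsSketch:
    """Stand-in with the same three fields as QueryStats."""
    sent: int = 0
    failed: int = 0
    total_time_s: float = 0.0

    def record(self, duration_s: float, ok: bool = True) -> None:
        """Fold one finished query into the running totals."""
        self.sent += 1
        if not ok:
            self.failed += 1
        self.total_time_s += duration_s
```

Keeping one such object per query category matches the query_stats mapping on MiningReport, which is keyed by category name.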

sent: int
failed: int
total_time_s: float
model_config = {'extra': 'forbid'}

class OneShotQueryResult(*, query_type: str, success: bool, duration_s: Annotated[float | None, Ge(ge=0)] = None, row_count: Annotated[int | None, Ge(ge=0)] = None, error: str | None = None)

Bases: BaseModel

Outcome of a single unbounded SELECT against a SPARQL endpoint.

Used to record the raw performance of an unguarded one-shot query so it can be compared against the fallback-chain result.

query_type: str
success: bool
duration_s: float | None
row_count: int | None
error: str | None
model_config = {'extra': 'forbid'}

class PhaseReport(*, name: str, started_at: str | None = None, finished_at: str | None = None, duration_s: Annotated[float | None, Ge(ge=0)] = None, items_discovered: Annotated[int, Ge(ge=0)] = 0, error: str | None = None)

Bases: BaseModel

Timing and outcome for one mining phase.

name: str
started_at: str | None
finished_at: str | None
duration_s: float | None
items_discovered: int
error: str | None
model_config = {'extra': 'forbid'}

class MiningReport(*, dataset_name: str | None = None, endpoint_url: str, graph_uris: list[str] | None = None, strategy: str = 'unknown', rdfsolve_version: str, python_version: str, qlever_version: dict[str, str] | None = None, started_at: str, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, query_stats: dict[str, QueryStats] = <factory>, total_queries_sent: Annotated[int, Ge(ge=0)] = 0, total_queries_failed: Annotated[int, Ge(ge=0)] = 0, phases: list[PhaseReport] = <factory>, abort_reason: str | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, class_count: Annotated[int, Ge(ge=0)] = 0, property_count: Annotated[int, Ge(ge=0)] = 0, unique_uris_labelled: Annotated[int, Ge(ge=0)] = 0, config: dict[str, Any] = <factory>, machine: dict[str, Any] | None = None, benchmark: dict[str, Any] | None = None, one_shot_results: list[OneShotQueryResult] | None = None, authors: list[dict[str, str]] | None = None, dataset_metadata: dict[str, Any] | None = None, report_uri: str | None = None, **extra_data: Any)

Bases: BaseModel

Analytical metadata collected during a mining run.

Designed to be written to disk incrementally (after each phase completes) so that partial data is preserved even if mining crashes midway.
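The incremental-write pattern can be sketched with plain dicts and the standard library. The functions write_report and run_phases are hypothetical, as is the choice of an atomic rename; the sketch only shows how a report file can stay valid on disk after every phase, including a failing one.

```python
import json
import os
import tempfile
from pathlib import Path


def write_report(report: dict, path) -> None:
    """Write to a temporary file, then atomically replace the target."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    os.replace(tmp, path)


def run_phases(phases, path) -> dict:
    """Run (name, callable) phases, persisting the report after each one."""
    report = {"phases": [], "abort_reason": None}
    for name, run in phases:
        entry = {"name": name, "items_discovered": 0, "error": None}
        try:
            entry["items_discovered"] = run()
        except Exception as exc:
            entry["error"] = str(exc)
            report["abort_reason"] = f"phase {name} failed"
        report["phases"].append(entry)
        write_report(report, path)  # partial data survives a later crash
        if entry["error"] is not None:
            break
    return report
```

Replacing the file in one step means a reader never sees a half-written report, even if the process dies during a write.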

dataset_name: str | None
endpoint_url: str
graph_uris: list[str] | None
strategy: str
rdfsolve_version: str
python_version: str
qlever_version: dict[str, str] | None
started_at: str
finished_at: str | None
total_duration_s: float | None
query_stats: dict[str, QueryStats]
total_queries_sent: int
total_queries_failed: int
phases: list[PhaseReport]
abort_reason: str | None
pattern_count: int
class_count: int
property_count: int
unique_uris_labelled: int
config: dict[str, Any]
machine: dict[str, Any] | None
benchmark: dict[str, Any] | None
one_shot_results: list[OneShotQueryResult] | None
authors: list[dict[str, str]] | None
dataset_metadata: dict[str, Any] | None
report_uri: str | None
model_config = {'extra': 'allow'}
