Schema Models

Schema models - public API re-exports.

Core

Core schema models: SchemaPattern, AboutMetadata, MinedSchema.

These are the primary data structures for mined RDF schemas.

class SchemaPattern(*, subject_class: str, property_uri: str, object_class: str, count: Annotated[int | None, Ge(ge=0)] = None, datatype: str | None = None, subject_label: str | None = None, property_label: str | None = None, object_label: str | None = None)

Bases: BaseModel

A single schema pattern: subject_class -> property_uri -> object_class.

Captures three kinds of relationships:

typed-object: ?s a ?sc . ?s ?p ?o . ?o a ?oc
literal: ?s a ?sc . ?s ?p ?o . FILTER(isLiteral(?o))
untyped-uri: ?s a ?sc . ?s ?p ?o . FILTER(isURI(?o))

This model is the shared contract between SchemaMiner (direct SPARQL) and VoidParser (VoID-based extraction).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

subject_class: str

property_uri: str

object_class: str

count: int | None

datatype: str | None

subject_label: str | None

property_label: str | None

object_label: str | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class AboutMetadata(*, generated_by: str, generated_at: str, endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, strategy: str = 'unknown', rdfsolve_version: str | None = None, qlever_version: dict[str, str] | None = None, started_at: str | None = None, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, authors: list[dict[str, str]] | None = None, schema_uri: str | None = None, void_uri: str | None = None, report_uri: str | None = None, linkml_uri: str | None = None, **extra_data: Any)

Bases: BaseModel

Provenance metadata attached to every schema export.

generated_by: str

generated_at: str

endpoint: str | None

dataset_name: str | None

graph_uris: list[str] | None

pattern_count: int

strategy: str

rdfsolve_version: str | None

qlever_version: dict[str, str] | None

started_at: str | None

finished_at: str | None

total_duration_s: float | None

authors: list[dict[str, str]] | None

schema_uri: str | None

void_uri: str | None

report_uri: str | None

linkml_uri: str | None

model_config = {'extra': 'allow'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.

static build(endpoint: str | None = None, dataset_name: str | None = None, graph_uris: list[str] | None = None, pattern_count: int = 0, strategy: str = 'unknown', started_at: str | None = None, finished_at: str | None = None, total_duration_s: float | None = None, authors: list[dict[str, str]] | None = None, qlever_version: dict[str, str] | None = None) → AboutMetadata: Create metadata with the version and timestamp populated automatically.
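build() fills in the version and generation timestamp so callers only pass run-specific values. A minimal sketch of the same idea, assuming an ISO-8601 UTC timestamp and using importlib.metadata to look up the installed package version. The function name, the generated_by value, and the timestamp format are hypothetical, not taken from rdfsolve; the trailing **extra mirrors the model's extra="allow" setting.

```python
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version
from typing import Any, Optional


def build_about_sketch(
    endpoint: Optional[str] = None,
    pattern_count: int = 0,
    strategy: str = "unknown",
    **extra: Any,
) -> dict[str, Any]:
    """Hypothetical helper returning AboutMetadata-shaped fields as a plain dict."""
    try:
        pkg_version: Optional[str] = version("rdfsolve")
    except PackageNotFoundError:
        # Package not installed: leave the version unset, as the model allows None.
        pkg_version = None
    return {
        "generated_by": "rdfsolve",  # assumed value
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "pattern_count": pattern_count,
        "strategy": strategy,
        "rdfsolve_version": pkg_version,
        # AboutMetadata uses extra="allow", so unknown keys are kept.
        **extra,
    }


about = build_about_sketch("https://example.org/sparql", pattern_count=3, note="demo")
```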

class MinedSchema(*, patterns: list[SchemaPattern] = <factory>, about: AboutMetadata)

Bases: BaseModel

Complete mined schema: patterns + provenance.

The primary export format is JSON-LD. The schema can also be converted to a VoID RDF graph, which VoidParser can then convert to LinkML, SHACL, or RDF-config.

patterns: list[SchemaPattern]

about: AboutMetadata

filter_service_namespaces(extra_prefixes: list[str] | None = None) → MinedSchema

Return a copy without service/system patterns.

A pattern is removed when any of its subject_class, property_uri, or object_class starts with a prefix listed in SERVICE_NAMESPACE_PREFIXES (or extra_prefixes).
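The filtering rule is a plain prefix test on the three URI fields. A minimal sketch operating on (subject_class, property_uri, object_class) tuples; the actual contents of SERVICE_NAMESPACE_PREFIXES are not listed here, so the two Virtuoso system namespaces below are illustrative assumptions.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]

# Illustrative only: the real SERVICE_NAMESPACE_PREFIXES may differ.
SERVICE_PREFIXES_SKETCH = [
    "http://www.openlinksw.com/schemas/virtrdf#",
    "http://www.openlinksw.com/virtrdf-data-formats#",
]


def filter_service_sketch(
    patterns: Iterable[Triple], extra_prefixes: Optional[list[str]] = None
) -> list[Triple]:
    """Keep only patterns whose three URIs avoid every service prefix."""
    prefixes = tuple(SERVICE_PREFIXES_SKETCH + (extra_prefixes or []))
    # str.startswith accepts a tuple; drop the pattern if ANY URI matches.
    return [p for p in patterns if not any(uri.startswith(prefixes) for uri in p)]
```

Like the documented method, the sketch returns a new list and leaves the input untouched; extra_prefixes extends the built-in list rather than replacing it.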

get_classes() → list[str]: Return sorted unique subject/object class URIs.

get_properties() → list[str]: Return sorted unique property URIs.
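Both accessors reduce to set operations over the pattern list. A sketch over plain (subject_class, property_uri, object_class) tuples; whether the real get_classes() also drops the Literal/Resource sentinels is not stated here, so this version keeps every object_class.

```python
Triple = tuple[str, str, str]


def get_classes_sketch(patterns: list[Triple]) -> list[str]:
    # Union of subject and object classes, de-duplicated, then sorted.
    return sorted({s for s, _, _ in patterns} | {o for _, _, o in patterns})


def get_properties_sketch(patterns: list[Triple]) -> list[str]:
    # Distinct property URIs, sorted.
    return sorted({p for _, p, _ in patterns})
```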

classmethod from_dict(raw: dict[str, Any]) → MinedSchema

Reconstruct from a JSON-LD dict (e.g. returned by to_jsonld()).

Inverse of to_jsonld(). Expands CURIEs using the dict’s own @context block.
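from_dict() expands CURIEs against the document's own @context. A stdlib sketch of that expansion step, assuming @context maps prefixes directly to namespace IRI strings (JSON-LD also allows term definitions given as objects, which this sketch ignores). The function name is hypothetical.

```python
from typing import Any


def expand_curie(value: str, context: dict[str, Any]) -> str:
    """Expand 'prefix:local' using a JSON-LD @context; leave other values unchanged."""
    if "://" in value:
        return value  # already an absolute IRI
    prefix, sep, local = value.partition(":")
    namespace = context.get(prefix)
    if sep and isinstance(namespace, str):
        return namespace + local
    return value  # unknown prefix or no colon at all


doc = {"@context": {"ex": "http://example.org/"}, "@graph": []}
expanded = expand_curie("ex:Cls", doc["@context"])
```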

classmethod from_jsonld(path: str | Path) → MinedSchema

Reconstruct from a *_schema.jsonld file.

Convenience wrapper around from_dict() that reads and parses the file first.

to_networkx() → Any

Export the typed-object patterns as a networkx MultiDiGraph.

Nodes are class URIs. Each typed-object pattern becomes a directed edge. Literal/Resource sentinels are excluded.
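Only typed-object patterns become edges. A stdlib sketch of the edge-selection step, producing (subject_class, object_class, attrs) tuples of the form networkx's MultiDiGraph.add_edges_from accepts. The sentinel strings and the "property" edge-attribute name are assumptions.

```python
Triple = tuple[str, str, str]

# Assumed sentinel values for literal and untyped-URI objects.
SENTINELS = {"Literal", "Resource"}


def typed_object_edges(patterns: list[Triple]) -> list[tuple[str, str, dict[str, str]]]:
    """Return (subject_class, object_class, attrs) edges for typed-object patterns."""
    return [
        (s, o, {"property": p})  # attribute name is hypothetical
        for s, p, o in patterns
        if o not in SENTINELS
    ]
```

A multigraph is the natural fit here because two classes can be linked by several properties, and each one becomes its own parallel edge.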

to_jsonld() → dict[str, Any]

Export the schema as a JSON-LD dict with @context, @graph, and @about keys.

The @graph groups triples by subject class. Labels are exported in a top-level _labels map keyed by CURIE.
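The @graph groups triples by subject class. A sketch of that grouping step only, producing one node per subject class with a list of object classes per property. The exact node layout rdfsolve writes (including how it records counts or datatypes) is not specified here, so everything beyond the standard "@id" key is an assumption.

```python
from collections import defaultdict
from typing import Any

Triple = tuple[str, str, str]


def group_by_subject(patterns: list[Triple]) -> list[dict[str, Any]]:
    """Group (subject, property, object) triples into one node per subject class."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in patterns:
        if o not in grouped[s][p]:  # keep each object class once per property
            grouped[s][p].append(o)
    # "@id" is standard JSON-LD; the property -> objects layout is illustrative.
    return [
        {"@id": s, **{p: objs for p, objs in sorted(props.items())}}
        for s, props in sorted(grouped.items())
    ]
```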

to_void_graph() → Any

Build an rdflib VoID Graph from the mined patterns.

Allows feeding the result into VoidParser for downstream conversion to LinkML, SHACL, RDF-config, etc.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

LinkML Converter

LinkML schema generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict (@context + @graph) into a LinkML SchemaDefinition or its YAML serialisation.

make_valid_linkml_name(uri_or_curie: str) → str

Convert a URI or CURIE to a valid LinkML identifier.

LinkML identifiers must start with a letter and contain only letters, digits, and underscores.

Examples:

"aopo:KeyEvent"           -> "aopo_KeyEvent"
"edam.data1025"           -> "edam_data_1025"
"http://example.org/Cls"  -> prefix_Cls  (via bioregistry)

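The identifier rule above (start with a letter; letters, digits, and underscores only) can be approximated with one regular expression. This is a simplified, hypothetical stand-in, not make_valid_linkml_name itself: it reproduces the CURIE example, but it does not split letters from digits the way the edam.data1025 example shows (it yields edam_data1025), and it performs no bioregistry lookup for full URIs. The "x_" prefix for names that start with a non-letter is also an assumption.

```python
import re


def linkml_name_sketch(curie: str) -> str:
    """Simplified stand-in: handles CURIE-style inputs only (no bioregistry lookup)."""
    # Replace every run of characters outside [A-Za-z0-9_] with a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", curie).strip("_")
    # LinkML identifiers must start with a letter; "x_" is a hypothetical fallback prefix.
    if not name or not name[0].isalpha():
        name = "x_" + name
    return name
```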
to_linkml(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) → SchemaDefinition

Generate a LinkML SchemaDefinition from a JSON-LD dict.

Parameters:

jsonld – JSON-LD document with @context, @graph, and optionally _labels.
schema_name – Name for the schema (also used as default prefix).
schema_description – Human-readable description.
schema_base_uri – Base URI; defaults to https://w3id.org/{schema_name}/.

Return type:

SchemaDefinition

to_linkml_yaml(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None) → str

Return the LinkML schema as a YAML string.

Parameters are the same as to_linkml().

SHACL Converter

SHACL shape generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict to SHACL Turtle via the LinkML -> ShaclGenerator pipeline.

to_shacl(jsonld: dict[str, Any], *, schema_name: str | None = None, schema_description: str | None = None, schema_base_uri: str | None = None, closed: bool = True, suffix: str | None = None, include_annotations: bool = False) → str

Generate SHACL shapes (Turtle) from a JSON-LD schema dict.

Parameters:

jsonld – JSON-LD document (@context, @graph, …).
schema_name – Name for the underlying LinkML schema.
schema_description – Human-readable description.
schema_base_uri – Base URI for the schema.
closed – If True, produce closed SHACL shapes (sh:closed true).
suffix – Suffix appended to every shape name (e.g. "Shape" → PersonShape).
include_annotations – If True, carry annotations through to shapes.

Returns:

SHACL shapes serialised as Turtle.

Return type:

str
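The closed and suffix options shape each generated node shape. The helper below renders one minimal node shape by hand to show what the two options mean in SHACL terms. It illustrates sh:closed and the suffix naming rule only; it is not the output format of LinkML's ShaclGenerator, and the function name is hypothetical.

```python
def node_shape_ttl(cls: str, *, suffix: str = "", closed: bool = True) -> str:
    """Render one minimal SHACL node shape for a prefixed class name such as 'ex:Person'."""
    lines = [
        f"{cls}{suffix} a sh:NodeShape ;",
        f"    sh:targetClass {cls} ;",
    ]
    if closed:
        # A closed shape rejects any property the shape does not declare;
        # rdf:type is commonly whitelisted via sh:ignoredProperties.
        lines.append("    sh:closed true ;")
        lines.append("    sh:ignoredProperties ( rdf:type ) ;")
    # Turn the final ';' into the '.' that terminates the Turtle statement.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)
```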

RDF-Config Converter

RDF-config YAML generation from JSON-LD.

Converts an rdfsolve JSON-LD schema dict to the three YAML files expected by the rdf-config tool: model.yaml, prefix.yaml, and endpoint.yaml.

to_rdfconfig(jsonld: dict[str, Any], *, endpoint_url: str | None = None, endpoint_name: str | None = None, graph_uri: str | None = None) → dict[str, str]

Generate RDF-config YAML files from a JSON-LD schema dict.

Parameters:

jsonld – JSON-LD document (@context, @graph, …).
endpoint_url – SPARQL endpoint URL for endpoint.yaml.
endpoint_name – Label for the endpoint (defaults to "endpoint").
graph_uri – Optional named-graph URI for endpoint.yaml.

Returns:

A dict mapping the keys model, prefix, and endpoint to their YAML strings.

Return type:

dict
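Because the returned dict is keyed by file stem, writing an rdf-config directory is a short loop. A sketch with a hypothetical helper name; the YAML bodies in the demo are placeholders, not real to_rdfconfig() output.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_rdfconfig(files: dict[str, str], out_dir: str | Path) -> list[Path]:
    """Write to_rdfconfig()-style output as model.yaml, prefix.yaml and endpoint.yaml."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for key in ("model", "prefix", "endpoint"):
        path = target / f"{key}.yaml"
        path.write_text(files[key], encoding="utf-8")  # KeyError if a file is missing
        written.append(path)
    return written


# Demo with placeholder YAML strings in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = write_rdfconfig(
        {"model": "# model\n", "prefix": "# prefix\n", "endpoint": "# endpoint\n"}, tmp
    )
    demo_names = [p.name for p in demo_paths]
    demo_model = demo_paths[0].read_text(encoding="utf-8")
```

The resulting directory can then be handed to the rdf-config tool.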

Report

Mining analytics report models.

class QueryStats(*, sent: Annotated[int, Ge(ge=0)] = 0, failed: Annotated[int, Ge(ge=0)] = 0, total_time_s: Annotated[float, Ge(ge=0)] = 0.0)

Bases: BaseModel

Cumulative statistics for one query category.

sent: int

failed: int

total_time_s: float

model_config = {'extra': 'forbid'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.
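MiningReport keeps one QueryStats per query category in its query_stats dict. A stdlib sketch of how such counters can be accumulated. The record() helper and the category names are hypothetical, and counting a failed query's elapsed time toward total_time_s is an assumption.

```python
from dataclasses import dataclass


@dataclass
class QueryStatsSketch:
    """Stdlib mirror of QueryStats' three counters."""

    sent: int = 0
    failed: int = 0
    total_time_s: float = 0.0

    def record(self, elapsed_s: float, ok: bool) -> None:
        # Hypothetical helper: every query counts as sent; failures are also tallied.
        self.sent += 1
        self.failed += 0 if ok else 1
        self.total_time_s += elapsed_s


stats: dict[str, QueryStatsSketch] = {}
for category, elapsed, ok in [("classes", 0.5, True), ("classes", 1.25, False), ("labels", 0.25, True)]:
    stats.setdefault(category, QueryStatsSketch()).record(elapsed, ok)
```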

class OneShotQueryResult(*, query_type: str, success: bool, duration_s: Annotated[float | None, Ge(ge=0)] = None, row_count: Annotated[int | None, Ge(ge=0)] = None, error: str | None = None)

Bases: BaseModel

Outcome of a single unbounded SELECT against a SPARQL endpoint.

Used to record the raw performance of an unguarded one-shot query so it can be compared against the fallback-chain result.

query_type: str

success: bool

duration_s: float | None

row_count: int | None

error: str | None

model_config = {'extra': 'forbid'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.
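Recording a one-shot run means timing the call, then capturing the row count on success or the error message on failure. A sketch with a hypothetical run_one_shot helper that returns OneShotQueryResult-shaped fields; run stands in for whatever executes the SELECT (for example a SPARQL client call), which is not specified here.

```python
import time
from typing import Any, Callable


def run_one_shot(query_type: str, run: Callable[[], list[Any]]) -> dict[str, Any]:
    """Time one unguarded query and return OneShotQueryResult-shaped fields."""
    start = time.perf_counter()
    try:
        rows = run()
    except Exception as exc:  # record the failure instead of propagating it
        return {
            "query_type": query_type,
            "success": False,
            "duration_s": time.perf_counter() - start,
            "row_count": None,
            "error": str(exc),
        }
    return {
        "query_type": query_type,
        "success": True,
        "duration_s": time.perf_counter() - start,
        "row_count": len(rows),
        "error": None,
    }
```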

class PhaseReport(*, name: str, started_at: str | None = None, finished_at: str | None = None, duration_s: Annotated[float | None, Ge(ge=0)] = None, items_discovered: Annotated[int, Ge(ge=0)] = 0, error: str | None = None)

Bases: BaseModel

Timing and outcome for one mining phase.

name: str

started_at: str | None

finished_at: str | None

duration_s: float | None

items_discovered: int

error: str | None

model_config = {'extra': 'forbid'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.
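A context manager is a convenient way to fill a phase's timing and error fields. A sketch with a hypothetical timed_phase helper that appends a PhaseReport-shaped dict to a list; the phase name "classes" and the ISO-8601 UTC timestamp format are assumptions.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Any, Iterator


@contextmanager
def timed_phase(name: str, phases: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Append a PhaseReport-shaped dict to `phases`, filling timing and error fields."""
    phase: dict[str, Any] = {
        "name": name,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "items_discovered": 0,
        "error": None,
    }
    start = time.perf_counter()
    try:
        yield phase  # the caller sets phase["items_discovered"]
    except Exception as exc:
        phase["error"] = str(exc)
        raise
    finally:
        # Runs on success and on failure, so a crashed phase is still recorded.
        phase["finished_at"] = datetime.now(timezone.utc).isoformat()
        phase["duration_s"] = time.perf_counter() - start
        phases.append(phase)


phases: list[dict[str, Any]] = []
with timed_phase("classes", phases) as p:
    p["items_discovered"] = 12
```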

class MiningReport(*, dataset_name: str | None = None, endpoint_url: str, graph_uris: list[str] | None = None, strategy: str = 'unknown', rdfsolve_version: str, python_version: str, qlever_version: dict[str, str] | None = None, started_at: str, finished_at: str | None = None, total_duration_s: Annotated[float | None, Ge(ge=0)] = None, query_stats: dict[str, QueryStats] = <factory>, total_queries_sent: Annotated[int, Ge(ge=0)] = 0, total_queries_failed: Annotated[int, Ge(ge=0)] = 0, phases: list[PhaseReport] = <factory>, abort_reason: str | None = None, pattern_count: Annotated[int, Ge(ge=0)] = 0, class_count: Annotated[int, Ge(ge=0)] = 0, property_count: Annotated[int, Ge(ge=0)] = 0, unique_uris_labelled: Annotated[int, Ge(ge=0)] = 0, config: dict[str, Any] = <factory>, machine: dict[str, Any] | None = None, benchmark: dict[str, Any] | None = None, one_shot_results: list[OneShotQueryResult] | None = None, authors: list[dict[str, str]] | None = None, dataset_metadata: dict[str, Any] | None = None, report_uri: str | None = None, **extra_data: Any)

Bases: BaseModel

Analytical metadata collected during a mining run.

Designed to be written to disk incrementally (after each phase completes) so that partial data is preserved even if mining crashes midway.

dataset_name: str | None

endpoint_url: str

graph_uris: list[str] | None

strategy: str

rdfsolve_version: str

python_version: str

qlever_version: dict[str, str] | None

started_at: str

finished_at: str | None

total_duration_s: float | None

query_stats: dict[str, QueryStats]

total_queries_sent: int

total_queries_failed: int

phases: list[PhaseReport]

abort_reason: str | None

pattern_count: int

class_count: int

property_count: int

unique_uris_labelled: int

config: dict[str, Any]

machine: dict[str, Any] | None

benchmark: dict[str, Any] | None

one_shot_results: list[OneShotQueryResult] | None

authors: list[dict[str, str]] | None

dataset_metadata: dict[str, Any] | None

report_uri: str | None

model_config = {'extra': 'allow'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.
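Writing MiningReport to disk after every phase only preserves partial data if a crash during the write itself cannot leave a truncated file. One common way to get that, sketched here as an assumption rather than a description of rdfsolve's own writer, is to dump to a temporary file in the same directory and swap it in with os.replace. The helper name and the demo_report.json filename are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_report_atomically(report: dict[str, Any], path: Path) -> None:
    """Write JSON so readers see either the previous file or the new one, never half."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so os.replace never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(report, fh, indent=2)
        os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_name)
        raise


# Demo: rewrite the report after each (hypothetical) phase completes.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "demo_report.json"
    report: dict[str, Any] = {"endpoint_url": "https://example.org/sparql", "phases": []}
    for phase in ("classes", "properties"):
        report["phases"].append({"name": phase})
        write_report_atomically(report, out)
    saved = json.loads(out.read_text(encoding="utf-8"))
    leftovers = list(Path(tmp).glob("*.tmp"))
```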