Mapping Models

Mapping models - public API re-exports.

Core

Core mapping models: MappingEdge, InstanceMatchResult, Mapping.

Base class and helpers for all mapping types.

class MappingEdge(*, source_class: str, target_class: str, predicate: str = 'http://www.w3.org/2004/02/skos/core#narrowMatch', source_dataset: str, target_dataset: str, source_endpoint: str | None = None, target_endpoint: str | None = None, confidence: Annotated[float | None, Ge(ge=0), Le(le=1)] = None)

Bases: BaseModel

A single mapping edge between two classes.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

source_class: str
target_class: str
predicate: str
source_dataset: str
target_dataset: str
source_endpoint: str | None
target_endpoint: str | None
confidence: float | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class InstanceMatchResult(*, dataset_name: str, endpoint_url: str, uri_format: str, matched_class: str | None = None)

Bases: BaseModel

Raw result of probing one URI format against one endpoint.

dataset_name: str
endpoint_url: str
uri_format: str
matched_class: str | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class Mapping(*, edges: list[MappingEdge] = <factory>, about: AboutMetadata, mapping_type: str = 'unknown')

Bases: BaseModel

Container for a set of mapping edges with provenance.

Base class for all mapping types.

edges: list[MappingEdge]
about: AboutMetadata
mapping_type: str
classmethod from_jsonld(path: str | Path) -> Mapping

Reconstruct from a mapping JSON-LD file.

Inverse of to_jsonld(). Expands CURIEs using the file’s own @context block.
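
CURIE expansion against a @context block can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name expand_curie and the sample prefix map are assumptions made for this example.

```python
def expand_curie(value: str, context: dict[str, str]) -> str:
    """Expand 'prefix:local' with a JSON-LD style prefix map.

    Hypothetical helper for illustration; strings whose prefix is not
    declared in the context (including full IRIs) are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return value


# Sample prefix map standing in for a file's own @context block (assumed values).
context = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/",
}

expanded = expand_curie("skos:narrowMatch", context)
untouched = expand_curie("http://example.org/Gene", context)
```

Because only prefixes declared in the context are expanded, a value that is already a full IRI passes through as-is.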

to_networkx() -> Any

Export the mapping as an nx.MultiDiGraph.

classmethod dataset_graph(paths: Iterable[str | Path], class_to_datasets: dict[str, set[str]], *, base_graph: Any | None = None, strategies: Collection[str] | None = None) -> Any

Stream mapping files into a weighted dataset-pair graph.

For every mapping edge whose source and target classes both appear in class_to_datasets, increment the weight of the (dataset_a, dataset_b) pair in the output graph.
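
The weighting rule can be sketched with plain dictionaries instead of a graph object. This is a simplified illustration under stated assumptions: the function name dataset_pair_weights is hypothetical, edges are reduced to (source_class, target_class) tuples, dataset pairs are treated as unordered, and pairs within the same dataset are skipped. The actual method may differ on each of these points.

```python
from collections import Counter
from itertools import product


def dataset_pair_weights(edges, class_to_datasets):
    """Count, per dataset pair, the edges whose two classes are both known.

    Hypothetical stand-in for the weighting step; returns a Counter keyed by
    sorted (dataset_a, dataset_b) tuples rather than a graph.
    """
    weights = Counter()
    for source_class, target_class in edges:
        # Skip edges where either class has no known dataset.
        if source_class not in class_to_datasets or target_class not in class_to_datasets:
            continue
        pairs = product(class_to_datasets[source_class], class_to_datasets[target_class])
        for dataset_a, dataset_b in pairs:
            if dataset_a != dataset_b:  # assumption: ignore self-pairs
                weights[tuple(sorted((dataset_a, dataset_b)))] += 1
    return weights


class_to_datasets = {
    "ex:Gene": {"ds1"},
    "ex2:Gene": {"ds2", "ds3"},
    "ex:Protein": {"ds1"},
}
edges = [
    ("ex:Gene", "ex2:Gene"),
    ("ex:Protein", "ex2:Gene"),
    ("ex:Gene", "ex:Missing"),  # ignored: target class is not in class_to_datasets
]
weights = dataset_pair_weights(edges, class_to_datasets)
```

In this sample both retained edges link ds1 to ds2 and to ds3, so each of those pairs ends up with weight 2.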

to_jsonld() -> dict[str, Any]

Export as JSON-LD with @context, @graph, @about.

Edges are grouped by source_class.
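
Grouping by source_class can be pictured as below. The node layout (one node per source class, keyed by @id, with each predicate holding a list of target references) is an assumption made for this sketch; the exact shape of the exported @graph is not specified here, and group_by_source is a hypothetical helper.

```python
def group_by_source(edges):
    """Collect edges into one node per source class (assumed node layout)."""
    grouped = {}
    for edge in edges:
        # One node per distinct source class, created on first sight.
        node = grouped.setdefault(edge["source_class"], {"@id": edge["source_class"]})
        # Each predicate maps to a list of target references.
        node.setdefault(edge["predicate"], []).append({"@id": edge["target_class"]})
    return list(grouped.values())


# Edges reduced to plain dicts for the example.
edges = [
    {"source_class": "ex:Gene", "predicate": "skos:narrowMatch", "target_class": "a:Gene"},
    {"source_class": "ex:Protein", "predicate": "skos:narrowMatch", "target_class": "a:Protein"},
    {"source_class": "ex:Gene", "predicate": "skos:narrowMatch", "target_class": "b:Gene"},
]
nodes = group_by_source(edges)
```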

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Instance Mapping

InstanceMapping - instance-based matching.

class InstanceMapping(*, edges: list[MappingEdge] = <factory>, about: AboutMetadata, mapping_type: str = 'instance_matcher', resource_prefix: str, uri_formats: list[str] = <factory>, match_results: list[InstanceMatchResult] = <factory>)

Bases: Mapping

Mapping generated by instance-based matching.

Probes SPARQL endpoints for instances matching bioregistry URI patterns to discover which classes across different datasets represent the same kind of entity.

mapping_type: str
resource_prefix: str
uri_formats: list[str]
match_results: list[InstanceMatchResult]
to_jsonld() -> dict[str, Any]

Extend base JSON-LD with instance-matcher provenance.

classmethod from_bioregistry_resource(prefix: str, datasources: Any, predicate: str = 'http://www.w3.org/2004/02/skos/core#narrowMatch', dataset_names: list[str] | None = None, timeout: float = 60.0) -> InstanceMapping

Probe all endpoints for a bioregistry resource.

Parameters:
  • prefix – Bioregistry prefix (e.g. "ensembl").

  • datasources – DataFrame with columns [dataset_name, endpoint_url].

  • predicate – Mapping predicate URI.

  • dataset_names – Optional subset of datasets to query.

  • timeout – SPARQL request timeout in seconds.

Returns:

InstanceMapping ready for export.
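
To give a rough idea of what probing an endpoint for one URI format could involve, the sketch below builds a SPARQL query that asks for the classes of instances whose IRI starts with the pattern's fixed part. It is illustrative only: the query text, the build_probe_query helper, and the assumption that a URI format uses a "$1" placeholder for the local identifier are not taken from this module.

```python
def build_probe_query(uri_format: str, limit: int = 10) -> str:
    """Build a SPARQL query for classes of instances matching a URI pattern.

    Hypothetical helper; assumes the format string marks the local
    identifier with a "$1" placeholder.
    """
    # Keep only the fixed part of the pattern, before the placeholder.
    iri_prefix = uri_format.split("$1", 1)[0]
    return (
        "SELECT DISTINCT ?class WHERE { "
        "?s a ?class . "
        f'FILTER(STRSTARTS(STR(?s), "{iri_prefix}")) '
        f"}} LIMIT {limit}"
    )


query = build_probe_query("http://identifiers.org/ensembl/$1")
```

Each class returned by such a query would correspond to the matched_class of one InstanceMatchResult for that endpoint and URI format.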

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

merge_instance_jsonld(existing: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]

Merge new instance-mapping JSON-LD into existing.

Merges:

  • @context - union of all prefix->namespace entries.

  • @graph - nodes keyed by @id; predicate targets are merged (duplicates skipped).

  • @about - uri_formats_queried is unioned; pattern_count recomputed; generated_at refreshed.

Returns:

The mutated existing dict (also returned for convenience).
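
The merge rules above can be sketched as follows. This is a simplified stand-in, not the function's source: merge_sketch is a hypothetical name, predicate values are assumed to be lists of target references, conflicting @context entries are resolved in favour of the new document, and pattern_count is assumed to equal the number of distinct URI formats.

```python
from datetime import datetime, timezone


def merge_sketch(existing: dict, new: dict) -> dict:
    """Merge `new` into `existing` in place, following the rules listed above."""
    # @context: union of prefix -> namespace entries.
    existing.setdefault("@context", {}).update(new.get("@context", {}))

    # @graph: nodes keyed by @id; predicate targets merged, duplicates skipped.
    graph = existing.setdefault("@graph", [])
    by_id = {node["@id"]: node for node in graph}
    for node in new.get("@graph", []):
        current = by_id.get(node["@id"])
        if current is None:
            graph.append(node)
            by_id[node["@id"]] = node
            continue
        for predicate, targets in node.items():
            if predicate == "@id":
                continue
            merged = current.setdefault(predicate, [])
            merged.extend(t for t in targets if t not in merged)

    # @about: union the URI formats, recompute the count, refresh the timestamp.
    about = existing.setdefault("@about", {})
    formats = set(about.get("uri_formats_queried", []))
    formats |= set(new.get("@about", {}).get("uri_formats_queried", []))
    about["uri_formats_queried"] = sorted(formats)
    about["pattern_count"] = len(formats)  # assumption: one pattern per URI format
    about["generated_at"] = datetime.now(timezone.utc).isoformat()
    return existing


existing = {
    "@context": {"ex": "http://example.org/"},
    "@graph": [{"@id": "ex:Gene", "skos:narrowMatch": [{"@id": "a:Gene"}]}],
    "@about": {"uri_formats_queried": ["http://identifiers.org/ensembl/$1"], "pattern_count": 1},
}
new = {
    "@context": {"a": "http://a.example/"},
    "@graph": [
        {"@id": "ex:Gene", "skos:narrowMatch": [{"@id": "a:Gene"}, {"@id": "b:Gene"}]},
        {"@id": "ex:Protein", "skos:narrowMatch": [{"@id": "a:Protein"}]},
    ],
    "@about": {"uri_formats_queried": ["https://identifiers.org/ensembl:$1"]},
}
result = merge_sketch(existing, new)
```

As with the documented function, the first argument is mutated and the same object is returned, so callers that need the original should copy it first.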

SSSOM Mapping

SsomMapping - SSSOM-derived mappings.

class SsomMapping(*, edges: list[MappingEdge] = <factory>, about: AboutMetadata, mapping_type: str = 'sssom_import', source_name: str, sssom_file: str, mapping_set_id: str | None = None, mapping_set_title: str | None = None, license: str | None = None, curie_map: dict[str, str] = <factory>)

Bases: Mapping

Mapping imported from an SSSOM source.

Each instance corresponds to one .sssom.tsv file extracted from an SSSOM bundle (e.g. the EBI OLS SSSOM archive).

mapping_type: str
source_name: str
sssom_file: str
mapping_set_id: str | None
mapping_set_title: str | None
license: str | None
curie_map: dict[str, str]
to_jsonld() -> dict[str, Any]

Extend base JSON-LD with SSSOM provenance.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

SeMRA Mapping

SemraMapping - SeMRA-derived mappings.

class SemraMapping(*, edges: list[MappingEdge] = <factory>, about: AboutMetadata, mapping_type: str = 'semra_import', source_name: str, source_prefix: str | None = None, evidence_chain: list[dict[str, Any]] = <factory>)

Bases: Mapping

Mapping imported from a SeMRA external source.

Carries the semra source key (e.g. "biomappings") and, for per-prefix sources, the bioregistry prefix.

mapping_type: str
source_name: str
source_prefix: str | None
evidence_chain: list[dict[str, Any]]
to_jsonld() -> dict[str, Any]

Extend base JSON-LD with SeMRA provenance.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Inference Mapping

InferencedMapping - inference pipeline mappings.

class InferencedMapping(*, edges: list[MappingEdge] = <factory>, about: AboutMetadata, mapping_type: str = 'inferenced', inference_types: list[str] = <factory>, source_mapping_files: list[str] = <factory>, evidence_chain: list[dict[str, Any]] = <factory>, stats: dict[str, Any] = <factory>)

Bases: Mapping

Mapping produced by the rdfsolve/SeMRA inference pipeline.

Carries the set of inference types applied, source mapping files, evidence chain, and optional aggregate stats.

mapping_type: str
inference_types: list[str]
source_mapping_files: list[str]
evidence_chain: list[dict[str, Any]]
stats: dict[str, Any]
to_jsonld() -> dict[str, Any]

Extend base JSON-LD with inference provenance.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.