Inference

SeMRA-powered inference pipeline for rdfsolve mappings.

Takes one or more mapping JSON-LD files, converts their edges to semra.Mapping objects, applies the requested inference operations (inversion, transitivity/chain, generalisation), deduplicates via semra.api.assemble_evidences, and writes the result as an InferencedMapping JSON-LD file.

Main entry-point

  • infer_mappings() - full pipeline.

  • seed_inferenced_mappings() - convenience wrapper for CLI/scripts.

infer_mappings(input_paths: list[str], output_path: str, *, inversion: bool = True, transitivity: bool = True, generalisation: bool = False, chain_cutoff: int = 3, dataset_name: str | None = None) -> dict[str, Any]

Run the inference pipeline over a set of mapping JSON-LD files.

Loads all mapping edges from input_paths, converts them to semra Mappings, applies the chosen inference operations, deduplicates via semra.api.assemble_evidences, converts back to rdfsolve edges, and writes an InferencedMapping JSON-LD to output_path.

Parameters:
  • input_paths – Paths to input mapping JSON-LD files.

  • output_path – Path to write the inferenced mapping JSON-LD.

  • inversion – Apply symmetric inversion of every mapping.

  • transitivity – Apply transitive chain inference.

  • generalisation – Apply generalisation (broader/narrower).

  • chain_cutoff – Max chain length for transitivity inference.

  • dataset_name – Override for the @about.dataset_name field.

Returns:

Summary dict with keys "input_edges", "output_edges", "inference_types", "output_path".
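The inference steps described above can be illustrated with a small, self-contained sketch. It does not use SeMRA or rdfsolve: the Edge tuple, the EXACT constant, the evidence labels, and the helper names invert, chain, and dedupe are all hypothetical, and the reading of the cutoff as "maximum number of hops in a path" is an assumption made for the sketch rather than a statement about how chain_cutoff is interpreted internally.

```python
from collections import defaultdict
from typing import NamedTuple

EXACT = "skos:exactMatch"  # assumed symmetric and transitive for this sketch


class Edge(NamedTuple):
    subject: str
    predicate: str
    object: str
    evidence: str  # hypothetical provenance label, not an rdfsolve field


def invert(edges: list[Edge]) -> list[Edge]:
    """Return the input edges plus a flipped copy of every EXACT edge."""
    flipped = [
        Edge(e.object, e.predicate, e.subject, f"inversion of {e.evidence}")
        for e in edges
        if e.predicate == EXACT
    ]
    return list(edges) + flipped


def chain(edges: list[Edge], cutoff: int = 3) -> list[Edge]:
    """Infer subject->object edges for simple paths of 2..cutoff EXACT hops."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for e in edges:
        if e.predicate == EXACT:
            adjacency[e.subject].add(e.object)

    inferred: list[Edge] = []

    def walk(start: str, path: list[str]) -> None:
        hops = len(path)  # number of edges needed to reach the next node
        for nxt in sorted(adjacency.get(path[-1], ())):
            if nxt in path:  # keep paths simple: never revisit a node
                continue
            if hops >= 2:
                inferred.append(Edge(start, EXACT, nxt, f"chain of {hops} hops"))
            if hops < cutoff:
                walk(start, path + [nxt])

    for start in sorted(adjacency):
        walk(start, [start])
    return inferred


def dedupe(edges: list[Edge]) -> dict[tuple[str, str, str], list[str]]:
    """Merge edges sharing a (subject, predicate, object) triple, pooling evidence."""
    merged: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for e in edges:
        merged[(e.subject, e.predicate, e.object)].add(e.evidence)
    return {triple: sorted(evidence) for triple, evidence in merged.items()}
```

In this toy version, running invert first and chain second lets chains follow mappings in both directions, and dedupe plays the role that evidence assembly plays in the real pipeline: one record per triple, with every supporting evidence kept.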

seed_inferenced_mappings(input_dir: str = 'docker/mappings', output_dir: str = 'docker/mappings/inferenced', output_name: str = 'inferenced_mappings', inversion: bool = True, transitivity: bool = True, generalisation: bool = False, chain_cutoff: int = 3) -> dict[str, Any]

Infer over all mappings in input_dir and write to output_dir.

Collects all *.jsonld files under input_dir (instance_matching/, semra/, and sssom/ subdirs), runs infer_mappings(), and writes {output_dir}/{output_name}.jsonld.

This is the convenience entry-point for the CLI and seed scripts.

Parameters:
  • input_dir – Directory that contains mapping subdirs.

  • output_dir – Directory to write inferenced output.

  • output_name – Stem for the output file (without .jsonld).

  • inversion – Apply inversion inference.

  • transitivity – Apply transitivity inference.

  • generalisation – Apply generalisation inference.

  • chain_cutoff – Max chain length for transitivity.

Returns:

Summary dict returned by infer_mappings(), with keys "input_edges", "output_edges", "inference_types", "output_path".
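The file collection and output naming described for seed_inferenced_mappings() might look roughly like the following sketch. The helper names collect_jsonld and output_file are hypothetical, and the choice to look only at the top level of each named subdirectory (rather than searching recursively) is an assumption; the actual implementation may differ.

```python
from pathlib import Path

# Subdirectories named in the description above.
SUBDIRS = ("instance_matching", "semra", "sssom")


def collect_jsonld(input_dir: str) -> list[str]:
    """List *.jsonld files in each known subdir of input_dir, in a stable order."""
    root = Path(input_dir)
    paths: list[Path] = []
    for sub in SUBDIRS:
        folder = root / sub
        if folder.is_dir():  # silently skip subdirs that do not exist
            paths.extend(sorted(folder.glob("*.jsonld")))
    return [str(p) for p in paths]


def output_file(output_dir: str, output_name: str) -> str:
    """Build {output_dir}/{output_name}.jsonld from a directory and a stem."""
    return str(Path(output_dir) / f"{output_name}.jsonld")
```

One reason to restrict collection to the named subdirectories: with the default arguments, output_dir sits inside input_dir, so a recursive search over input_dir would pick up a previously written inferenced file as input.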