Command Line Interface

Installing rdfsolve automatically provides the rdfsolve command. Run rdfsolve --help for usage details.

rdfsolve

RDFSolve - RDF Schema Extraction, Export and Analysis Toolkit.

Pipeline commands (schema mining):

rdfsolve pipeline mine         Mine schemas from remote endpoints
rdfsolve pipeline local-mine   Mine from a local QLever endpoint
rdfsolve pipeline qleverfile   Generate Qleverfiles for QLever

Analysis commands:

rdfsolve export    Interconvert schemas between JSON-LD, LinkML, SHACL, CSV, ...

Mapping commands:

rdfsolve instance-match   Cross-dataset class matching
rdfsolve semra            Import external SeMRA mappings
rdfsolve inference        Derive new mappings from existing ones

Usage

rdfsolve [OPTIONS] COMMAND [ARGS]...

Options

--version: Show the version and exit.

-v, --verbose: Enable verbose logging

export

Convert RDF schemas between formats.

All subcommands accept a VoID Turtle (.ttl) or rdfsolve JSON-LD (.jsonld) file as INPUT and produce the requested output format. The input format is auto-detected from the file extension.
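The extension-based auto-detection described above can be pictured with a minimal sketch. This is not rdfsolve's own code: the function name and the format labels "void" and "jsonld" are assumptions made for illustration only.

```python
from pathlib import Path

# Hypothetical mapping from file extension to input format. The labels are
# illustrative and are not identifiers taken from rdfsolve.
FORMATS_BY_EXTENSION = {".ttl": "void", ".jsonld": "jsonld"}


def detect_input_format(path: str) -> str:
    """Return the input format implied by the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"unsupported input extension: {suffix!r}") from None


print(detect_input_format("dataset_void.ttl"))
print(detect_input_format("dataset_schema.jsonld"))
```

Under this assumption, a file with any other extension is rejected before conversion starts.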

Supported conversions (any-to-any via the internal model):

VoID (.ttl)      <->  JSON-LD (.jsonld)
JSON-LD (.jsonld) ->  CSV, LinkML, SHACL, RDF-config, VoID

Subcommands:

csv        Export schema patterns as a CSV table
jsonld     Export schema as JSON-LD
void       Export schema as VoID Turtle
linkml     Export schema as LinkML YAML
shacl      Export schema as SHACL shapes (Turtle)
rdfconfig  Export schema as RDF-config YAML files

Examples:

rdfsolve export csv       dataset_schema.jsonld
rdfsolve export linkml    dataset_void.ttl -o ./out
rdfsolve export shacl     dataset_schema.jsonld --closed
rdfsolve export rdfconfig dataset_void.ttl --endpoint-url http://...
rdfsolve export void      dataset_schema.jsonld

Usage

rdfsolve export [OPTIONS] COMMAND [ARGS]...

csv

Export schema patterns as a CSV table.

Example:

rdfsolve export csv dataset_schema.jsonld -o ./exports

Usage

rdfsolve export csv [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

Arguments

INPUT: Required argument

jsonld

Export schema as JSON-LD.

Useful for converting a VoID Turtle file into the rdfsolve JSON-LD format. If the input is already JSON-LD, it is re-serialised (which can be used to refresh @about metadata).

Example:

rdfsolve export jsonld dataset_void.ttl -o ./exports

Usage

rdfsolve export jsonld [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

--endpoint-url <endpoint_url>: SPARQL endpoint URL for @about.

--graph-uri <graph_uri>: Named graph URI for @about.

Arguments

INPUT: Required argument

linkml

Export schema as LinkML YAML.

Generates a LinkML schema definition for data modelling, validation, and code generation.

Example:

rdfsolve export linkml dataset_schema.jsonld --schema-name myds

Usage

rdfsolve export linkml [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

--schema-name <schema_name>: Schema name (default: from filename).

--schema-description <schema_description>: Schema description.

--schema-uri <schema_uri>: Base URI for the schema (e.g. http://example.org/schemas/myschema).

Arguments

INPUT: Required argument

rdfconfig

Export schema as RDF-config YAML files.

RDF-config is a schema standard for describing RDF data models. Produces a {dataset}_config/ directory with model.yaml, prefix.yaml, and optionally endpoint.yaml.

Example:

rdfsolve export rdfconfig dataset_void.ttl \
    --endpoint-url https://example.org/sparql
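As a rough illustration of the output layout described above, the sketch below lists the files one would expect for a given dataset name. The helper name and the way the dataset name is supplied are assumptions; rdfsolve may derive the name differently.

```python
from pathlib import Path


def expected_rdfconfig_files(output_dir, dataset, endpoint_url=None):
    """List the files expected under {dataset}_config/ (illustrative only)."""
    config_dir = Path(output_dir) / f"{dataset}_config"
    names = ["model.yaml", "prefix.yaml"]
    # endpoint.yaml is only produced when an endpoint URL is provided.
    if endpoint_url:
        names.append("endpoint.yaml")
    return [config_dir / name for name in names]


for path in expected_rdfconfig_files("out", "dataset", "https://example.org/sparql"):
    print(path)
```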

Usage

rdfsolve export rdfconfig [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

--endpoint-url <endpoint_url>: SPARQL endpoint URL (generates endpoint.yaml when provided).

--endpoint-name <endpoint_name>: Endpoint name (default: 'endpoint').

--graph-uri <graph_uri>: Named graph URI.

Arguments

INPUT: Required argument

shacl

Export schema as SHACL shapes (Turtle).

SHACL shapes validate RDF data against the extracted schema. Use --closed (default) for strict validation, where properties not declared in a shape are reported as violations, or --open for more permissive validation.

Examples:

rdfsolve export shacl dataset_schema.jsonld --open
rdfsolve export shacl dataset_schema.jsonld --suffix Shape
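To make the difference between closed and open shapes concrete, here is a minimal sketch that renders a node shape as Turtle text. It is not rdfsolve's generator: the function, the ex: namespace, and the example IRIs are hypothetical. The only SHACL facts relied on are that a closed shape carries sh:closed true and that sh:ignoredProperties lists properties exempt from the closed check.

```python
def node_shape_turtle(class_name, class_iri, property_iris, closed=True, suffix="Shape"):
    """Render one SHACL node shape as Turtle text (illustrative sketch)."""
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix ex: <http://example.org/shapes/> .",  # hypothetical namespace
        "",
        f"ex:{class_name}{suffix} a sh:NodeShape ;",
        f"    sh:targetClass <{class_iri}> ;",
    ]
    for iri in property_iris:
        lines.append(f"    sh:property [ sh:path <{iri}> ] ;")
    if closed:
        # Closed shapes reject properties that are not declared above.
        lines.append("    sh:closed true ;")
        lines.append("    sh:ignoredProperties ( rdf:type ) ;")
    # Terminate the statement: swap the final ';' for '.'.
    lines[-1] = lines[-1].rstrip(" ;") + " ."
    return "\n".join(lines)


print(node_shape_turtle("Person", "http://example.org/Person",
                        ["http://example.org/name"]))
```

Passing closed=False omits the two closing constraints, which mirrors the intent of --open.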

Usage

rdfsolve export shacl [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

--schema-name <schema_name>: Schema name (default: from filename).

--schema-description <schema_description>: Schema description.

--schema-uri <schema_uri>: Base URI for the schema.

--closed, --open: Generate closed shapes (default) or open shapes.

--suffix <suffix>: Suffix for shape names (e.g. 'Shape' -> PersonShape).

Arguments

INPUT: Required argument

void

Export schema as VoID Turtle.

Converts a JSON-LD schema back to VoID RDF (Turtle). Also works with VoID input (round-trip).

Example:

rdfsolve export void dataset_schema.jsonld -o ./exports

Usage

rdfsolve export void [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>: Output directory.

Arguments

INPUT: Required argument

inference

Derive new mappings from existing ones.

Uses SeMRA inference operations (inversion, transitivity, generalisation) to expand a set of mapping JSON-LD files.

Typical workflow:

rdfsolve inference run --input file1.jsonld --input file2.jsonld \
    --output docker/mappings/inferenced/inferred.jsonld
rdfsolve inference seed
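The inversion and transitivity operations can be illustrated with a small, self-contained sketch. It is a simplification under stated assumptions, not SeMRA's or rdfsolve's implementation: mappings are plain (subject, predicate, object) tuples, only skos:exactMatch is handled, and the chain cutoff is read as the maximum number of mappings joined into one chain.

```python
EXACT = "skos:exactMatch"


def infer_inversions(mappings):
    """Add the reverse of every exact match (exact matches are symmetric)."""
    return set(mappings) | {(o, p, s) for s, p, o in mappings if p == EXACT}


def infer_chains(mappings, cutoff=3):
    """Collapse exact-match chains of up to `cutoff` mappings into direct mappings."""
    edges = {(s, o) for s, p, o in mappings if p == EXACT}
    reachable = set(edges)  # pairs linked by a chain seen so far
    frontier = set(edges)   # pairs first reached in the previous round
    for _ in range(cutoff - 1):
        # Extend every frontier pair by one more mapping.
        frontier = {
            (a, d) for a, b in frontier for c, d in edges if b == c and a != d
        } - reachable
        if not frontier:
            break
        reachable |= frontier
    return set(mappings) | {(s, EXACT, o) for s, o in reachable}


chain = {("a:1", EXACT, "b:1"), ("b:1", EXACT, "c:1"), ("c:1", EXACT, "d:1")}
print(sorted(infer_chains(chain, cutoff=2)))
```

A smaller cutoff keeps inferred mappings closer to the curated evidence, at the cost of missing links that need longer chains.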

Usage

rdfsolve inference [OPTIONS] COMMAND [ARGS]...

run

Infer new mappings from the given input files.

Usage

rdfsolve inference run [OPTIONS]

Options

-i, --input <input_paths>: Required. Input mapping JSON-LD file (repeatable).

-o, --output <output_path>: Required. Output JSON-LD file path.

--no-inversion: Disable inversion inference.

--no-transitivity: Disable transitivity (chain) inference.

--generalisation: Enable generalisation inference (off by default).

--chain-cutoff <chain_cutoff>: Maximum chain length for transitivity. Default: 3.

--name <dataset_name>: Override @about.dataset_name in the output.

seed

Infer over all mappings found under INPUT_DIR.

Usage

rdfsolve inference seed [OPTIONS]

Options

--input-dir <input_dir>: Directory containing instance_matching/ and semra/ subdirs. Default: 'docker/mappings'.

--output-dir <output_dir>: Directory for the inferenced output. Default: 'docker/mappings/inferenced'.

--name <output_name>: Output file stem (without .jsonld). Default: 'inferenced_mappings'.

--no-inversion: Disable inversion inference.

--no-transitivity: Disable transitivity inference.

--generalisation: Enable generalisation inference.

--chain-cutoff <chain_cutoff>: Maximum chain length for transitivity. Default: 3.

instance-match

Instance-based matching: discover cross-dataset class links.

Probes SPARQL endpoints for classes whose instances match bioregistry URI patterns and writes skos:narrowMatch mapping JSON-LD files.

Typical workflow:

rdfsolve instance-match probe --prefix ensembl -o ensembl_mapping.jsonld
rdfsolve instance-match seed  -p ensembl -p uniprot -p chebi

Usage

rdfsolve instance-match [OPTIONS] COMMAND [ARGS]...

probe

Probe endpoints for a single bioregistry resource.

Queries every endpoint in SOURCES for RDF classes whose instances match the URI patterns registered in bioregistry for PREFIX and emits a JSON-LD mapping document.
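One way to picture such a probe is as a SPARQL query that keeps only classes whose instance IRIs start with a given URI prefix. The sketch below builds that query text. The query shape, the function name, and the example prefix are assumptions for illustration; the queries rdfsolve actually sends are not documented here.

```python
def build_probe_query(uri_prefix):
    """Build a SPARQL query for classes with instances under `uri_prefix`."""
    # Escape backslashes and quotes so the prefix is a valid SPARQL string literal.
    escaped = uri_prefix.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?class WHERE {\n"
        "  ?instance a ?class .\n"
        f'  FILTER(STRSTARTS(STR(?instance), "{escaped}"))\n'
        "}"
    )


# Hypothetical URI prefix; real patterns come from the registry.
print(build_probe_query("http://example.org/ensembl/"))
```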

Usage

rdfsolve instance-match probe [OPTIONS]

Options

-p, --prefix <prefix>: Required. Bioregistry prefix to probe (e.g. 'ensembl').

--sources <sources>: Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.

--predicate <predicate>: Mapping predicate URI. Default: 'http://www.w3.org/2004/02/skos/core#narrowMatch'.

-d, --dataset <datasets>: Restrict to this dataset name (repeatable).

--timeout <timeout>: SPARQL request timeout in seconds. Default: 60.0.

-o, --output <output>: Write JSON-LD to this file (default: stdout).

seed

Seed mapping files for multiple bioregistry resources.

Writes {PREFIX}_instance_mapping.jsonld to OUTPUT_DIR for each supplied PREFIX. Existing files are skipped unless --no-skip-existing is passed.
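The skip-existing behaviour can be sketched as a filter over the planned output paths. The helper below is hypothetical and only mirrors the file naming stated above; it does no probing itself.

```python
import tempfile
from pathlib import Path


def pending_outputs(prefixes, output_dir, skip_existing=True):
    """Return (prefix, path) pairs that still need to be written."""
    out = Path(output_dir)
    targets = [(p, out / f"{p}_instance_mapping.jsonld") for p in prefixes]
    return [
        (p, path) for p, path in targets
        if not (skip_existing and path.exists())
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Pretend 'ensembl' was already probed in an earlier run.
    (Path(tmp) / "ensembl_instance_mapping.jsonld").write_text("{}")
    todo = pending_outputs(["ensembl", "uniprot"], tmp)
    redo = pending_outputs(["ensembl", "uniprot"], tmp, skip_existing=False)

print([prefix for prefix, _ in todo])
print([prefix for prefix, _ in redo])
```

Setting skip_existing=False corresponds to passing --no-skip-existing.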

Usage

rdfsolve instance-match seed [OPTIONS]

Options

-p, --prefixes <prefix_list>: Required. Bioregistry prefix (repeatable).

--sources <sources>: Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.

--output-dir <output_dir>: Directory to write JSON-LD mapping files. Default: 'docker/mappings/instance_matching'.

--predicate <predicate>: Mapping predicate URI. Default: 'http://www.w3.org/2004/02/skos/core#narrowMatch'.

-d, --dataset <datasets>: Restrict to this dataset name (repeatable).

--timeout <timeout>: SPARQL request timeout in seconds. Default: 60.0.

--no-skip-existing: Re-probe even if the output file already exists.

pipeline

Schema-mining pipeline: mine, local-mine, qleverfile.

These commands replace the old rdfsolve discover, mine, and mine-all top-level commands. Each route can target remote SPARQL endpoints or a local QLever instance.

Quick-start examples:

# Mine schemas from all remote endpoints
rdfsolve pipeline mine

# Generate Qleverfiles then mine locally
rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline local-mine --name drugbank \
    --endpoint http://localhost:7026

All pipeline commands accept --sources, --output-dir, --filter, --timeout, and --benchmark. Use rdfsolve pipeline <cmd> --help for full details.
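As an illustration of how a --filter regex narrows the set of sources, consider the sketch below. Whether rdfsolve matches anywhere in the name or requires a full match, and whether matching is case-sensitive, is not stated here; the sketch assumes a case-sensitive search anywhere in the name.

```python
import re


def select_sources(names, name_filter=None):
    """Keep the source names that match the regex (all names if no filter)."""
    if name_filter is None:
        return list(names)
    pattern = re.compile(name_filter)
    # Assumption: re.search semantics, so the pattern may match mid-name.
    return [name for name in names if pattern.search(name)]


# Hypothetical source names.
print(select_sources(["drugbank", "chebi", "drugbank_v5"], "drugbank"))
```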

Usage

rdfsolve pipeline [OPTIONS] COMMAND [ARGS]...

local-mine

Mine schemas from a local QLever endpoint.

Use after downloading data and running qlever index && qlever start for a dataset. Connects to the local endpoint and runs the full mining pipeline.

Example:

rdfsolve pipeline local-mine \
    --endpoint http://localhost:7026 \
    --name drugbank --discover-first --benchmark

Usage

rdfsolve pipeline local-mine [OPTIONS]

Options

--benchmark: Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>: Regex to select sources by name.

--timeout <timeout>: HTTP timeout per SPARQL request (seconds).

--format <fmt>: Export format. Options: jsonld | void | all

--output-dir <output_dir>: Output directory for schemas/reports.

--sources <sources>: Path to sources YAML/JSON-LD/CSV.

--one-shot: Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.

--author <NAME|ORCID>: Credit an author in provenance metadata. Format: 'Full Name|0000-0000-0000-0000'. ORCID is optional. Repeat for multiple authors.

--untyped-as-classes: Treat untyped URI objects as owl:Class references instead of rdfs:Resource.

--no-counts: Skip triple-count queries (faster).

--class-batch-size <class_batch_size>: Classes per VALUES query in two-phase mining.

--class-chunk-size <class_chunk_size>: Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.

--chunk-size <chunk_size>: SPARQL pagination page size.

--endpoint <endpoint>: Local QLever SPARQL endpoint URL.

--name <name>: Dataset name (required for single-dataset mode).

--discover-first: Run VoID discovery before mining.

--void-uri-base <void_uri_base>: Base URI for generated VoID partition IRIs (default: sources.yaml value or built-in template).

--test: Process only the 3 smallest downloadable sources.

mine

Mine schemas from remote SPARQL endpoints.

Standard mining workflow: iterate over the endpoints, extract schema patterns, and write JSON-LD schemas, VoID Turtle, and analytics reports. Reports are always written.

Examples:

rdfsolve pipeline mine
rdfsolve pipeline mine --filter "drugbank"
rdfsolve pipeline mine --benchmark

Usage

rdfsolve pipeline mine [OPTIONS]

Options

--benchmark: Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>: Regex to select sources by name.

--timeout <timeout>: HTTP timeout per SPARQL request (seconds).

--format <fmt>: Export format. Options: jsonld | void | all

--output-dir <output_dir>: Output directory for schemas/reports.

--sources <sources>: Path to sources YAML/JSON-LD/CSV.

--one-shot: Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.

--author <NAME|ORCID>: Credit an author in provenance metadata. Format: 'Full Name|0000-0000-0000-0000'. ORCID is optional. Repeat for multiple authors.

--untyped-as-classes: Treat untyped URI objects as owl:Class references instead of rdfs:Resource.

--no-counts: Skip triple-count queries (faster).

--class-batch-size <class_batch_size>: Classes per VALUES query in two-phase mining.

--class-chunk-size <class_chunk_size>: Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.

--chunk-size <chunk_size>: SPARQL pagination page size.
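The --author value format listed above ('Full Name|0000-0000-0000-0000', ORCID optional) can be parsed along these lines. The function and the returned dictionary keys are hypothetical; only the input format comes from the option description, and the ORCID pattern reflects the standard four-group iD layout with an optional trailing X.

```python
import re

# ORCID iDs are four groups of four characters; the last may be the checksum 'X'.
ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def parse_author(value):
    """Split 'Full Name|ORCID' into its parts; the ORCID part is optional."""
    name, _, orcid = value.partition("|")
    name, orcid = name.strip(), orcid.strip()
    if not name:
        raise ValueError("author name is required")
    if orcid and not ORCID_RE.fullmatch(orcid):
        raise ValueError(f"not an ORCID iD: {orcid!r}")
    return {"name": name, "orcid": orcid or None}


print(parse_author("Full Name|0000-0000-0000-0000"))
print(parse_author("Full Name"))
```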

qleverfile

Generate Qleverfiles for local QLever mining.

Creates a Qleverfile for each source that has download URLs in the sources registry. Each Qleverfile includes a GET_DATA_CMD that downloads and preprocesses the data.

Examples:

rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline qleverfile --data-dir /data/rdf --test

Usage

rdfsolve pipeline qleverfile [OPTIONS]

Options

--benchmark: Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>: Regex to select sources by name.

--timeout <timeout>: HTTP timeout per SPARQL request (seconds).

--format <fmt>: Export format. Options: jsonld | void | all

--output-dir <output_dir>: Output directory for schemas/reports.

--sources <sources>: Path to sources YAML/JSON-LD/CSV.

--data-dir <data_dir>: Required. Root directory where RDF dumps live.

--base-port <base_port>: First port number for allocation.

--test: Generate Qleverfiles only for the 3 smallest downloadable sources.

--runtime <runtime>: QLever runtime. Options: docker | native
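One plausible reading of --base-port is that each generated Qleverfile receives the next free port counting up from the base. The sketch below shows that reading only; the sequential scheme and the source ordering are assumptions, not documented behaviour.

```python
def allocate_ports(source_names, base_port):
    """Assign consecutive ports starting at base_port (assumed scheme)."""
    return {name: base_port + offset for offset, name in enumerate(source_names)}


# Hypothetical source names; 7026 is the port used in the examples above.
print(allocate_ports(["drugbank", "chebi", "uniprot"], 7026))
```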

semra

SeMRA integration: import external mappings from semra sources.

Downloads mappings from community sources (biomappings, Gilda, etc.) and writes one JSON-LD file per (source, bioregistry-prefix) pair.

Typical workflow:

rdfsolve semra import --source biomappings
rdfsolve semra seed -s biomappings -s gilda

Usage

rdfsolve semra [OPTIONS] COMMAND [ARGS]...

import

Import mappings from a single SeMRA source.

Writes {source}_{prefix}.jsonld for each unique subject prefix found in the downloaded mappings.
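The per-prefix file split can be pictured as grouping mappings by the prefix of their subject. The sketch assumes subjects are CURIEs of the form prefix:identifier and uses a hypothetical helper; the serialisation of each group to JSON-LD is left out.

```python
from collections import defaultdict


def group_by_subject_prefix(source, mappings, keep_prefixes=None):
    """Group (subject, predicate, object) mappings by output file name."""
    groups = defaultdict(list)
    for subject, predicate, obj in mappings:
        # Assumption: the subject is a CURIE such as 'chebi:1234'.
        prefix = subject.split(":", 1)[0]
        if keep_prefixes and prefix not in keep_prefixes:
            continue
        groups[f"{source}_{prefix}.jsonld"].append((subject, predicate, obj))
    return dict(groups)


# Made-up identifiers, for illustration only.
example = [
    ("chebi:1", "skos:exactMatch", "mesh:A"),
    ("chebi:2", "skos:exactMatch", "mesh:B"),
    ("doid:3", "skos:exactMatch", "mesh:C"),
]
print(sorted(group_by_subject_prefix("biomappings", example)))
```

Passing keep_prefixes corresponds to restricting the import with -p, --prefix.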

Usage

rdfsolve semra import [OPTIONS]

Options

-s, --source <source>: Required. SeMRA source key (e.g. 'biomappings', 'gilda').

-p, --prefix <prefixes>: Keep only these bioregistry prefixes (repeatable). Default: keep all.

--output-dir <output_dir>: Directory to write JSON-LD files. Default: 'docker/mappings/semra'.

seed

Seed mapping files from multiple SeMRA sources.

Usage

rdfsolve semra seed [OPTIONS]

Options

-s, --sources <source_list>: Required. SeMRA source key (repeatable).

-p, --prefix <prefixes>: Keep only these bioregistry prefixes (repeatable).

--output-dir <output_dir>: Directory to write JSON-LD files. Default: 'docker/mappings/semra'.