Command Line Interface

Installing rdfsolve also installs the rdfsolve command. Run rdfsolve --help for usage details.

rdfsolve

RDFSolve - RDF Schema Extraction, Export and Analysis Toolkit.

Pipeline commands (schema mining):

rdfsolve pipeline mine         Mine schemas from remote endpoints
rdfsolve pipeline local-mine   Mine from a local QLever endpoint
rdfsolve pipeline qleverfile   Generate Qleverfiles for QLever

Analysis commands:

rdfsolve export    Interconvert schemas between JSON-LD, LinkML, SHACL, CSV, ...

Mapping commands:

rdfsolve instance-match   Cross-dataset class matching
rdfsolve semra            Import external SeMRA mappings
rdfsolve inference        Derive new mappings from existing ones

Usage

rdfsolve [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

-v, --verbose

Enable verbose logging

export

Convert RDF schemas between formats.

All subcommands accept a VoID Turtle (.ttl) or rdfsolve JSON-LD (.jsonld) file as INPUT and produce the requested output format. The input format is auto-detected from the file extension.

Supported conversions (any-to-any via the internal model):

VoID (.ttl)      <->  JSON-LD (.jsonld)
JSON-LD (.jsonld) ->  CSV, LinkML, SHACL, RDF-config, VoID
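The extension-based auto-detection described above can be pictured with a minimal sketch. The function name, the case-insensitive comparison, and the error raised for unknown extensions are assumptions made for illustration; they are not taken from the rdfsolve source.

```python
from pathlib import Path

# Assumed mapping, built only from the two extensions named in the text above.
INPUT_FORMATS = {".ttl": "void", ".jsonld": "jsonld"}


def detect_input_format(path: str) -> str:
    """Return 'void' or 'jsonld' depending on the file extension.

    Hypothetical helper: lowercasing the suffix is an assumption.
    """
    suffix = Path(path).suffix.lower()
    try:
        return INPUT_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported input extension: {suffix!r}") from None
```

Under these assumptions, dataset_void.ttl is treated as VoID Turtle and dataset_schema.jsonld as rdfsolve JSON-LD; any other extension is rejected.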

Subcommands:

csv        Export schema patterns as a CSV table
jsonld     Export schema as JSON-LD
void       Export schema as VoID Turtle
linkml     Export schema as LinkML YAML
shacl      Export schema as SHACL shapes (Turtle)
rdfconfig  Export schema as RDF-config YAML files

Examples:

rdfsolve export csv       dataset_schema.jsonld
rdfsolve export linkml    dataset_void.ttl -o ./out
rdfsolve export shacl     dataset_schema.jsonld --closed
rdfsolve export rdfconfig dataset_void.ttl --endpoint-url http://...
rdfsolve export void      dataset_schema.jsonld

Usage

rdfsolve export [OPTIONS] COMMAND [ARGS]...

csv

Export schema patterns as a CSV table.

Example:

rdfsolve export csv dataset_schema.jsonld -o ./exports

Usage

rdfsolve export csv [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

Arguments

INPUT

Required argument

jsonld

Export schema as JSON-LD.

Useful for converting a VoID Turtle file into the rdfsolve JSON-LD format. If the input is already JSON-LD, it is re-serialised (which can be used to refresh @about metadata).

Example:

rdfsolve export jsonld dataset_void.ttl -o ./exports

Usage

rdfsolve export jsonld [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

--endpoint-url <endpoint_url>

SPARQL endpoint URL for @about.

--graph-uri <graph_uri>

Named graph URI for @about.

Arguments

INPUT

Required argument

linkml

Export schema as LinkML YAML.

Generates a LinkML schema definition for data modelling, validation, and code generation.

Example:

rdfsolve export linkml dataset_schema.jsonld --schema-name myds

Usage

rdfsolve export linkml [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

--schema-name <schema_name>

Schema name (default: from filename).

--schema-description <schema_description>

Schema description.

--schema-uri <schema_uri>

Base URI for the schema (e.g. http://example.org/schemas/myschema).

Arguments

INPUT

Required argument

rdfconfig

Export schema as RDF-config YAML files.

RDF-config is a schema standard for describing RDF data models. Produces a {dataset}_config/ directory with model.yaml, prefix.yaml, and optionally endpoint.yaml.
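As a rough illustration of the output layout described above, the following sketch lists the files one would expect to find after an export. The helper name, the way the dataset name is supplied, and the default output directory are assumptions; only the directory and file names come from the description.

```python
from pathlib import Path
from typing import List, Optional


def expected_rdfconfig_files(
    dataset: str,
    output_dir: str = ".",  # assumed default, not documented here
    endpoint_url: Optional[str] = None,
) -> List[Path]:
    """List the files described for a {dataset}_config/ directory."""
    config_dir = Path(output_dir) / f"{dataset}_config"
    files = [config_dir / "model.yaml", config_dir / "prefix.yaml"]
    if endpoint_url:
        # endpoint.yaml is only described as present when an endpoint URL is given.
        files.append(config_dir / "endpoint.yaml")
    return files
```

A script could use such a list to check that an export finished before passing the directory on to other tooling.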

Example:

rdfsolve export rdfconfig dataset_void.ttl \
    --endpoint-url https://example.org/sparql

Usage

rdfsolve export rdfconfig [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

--endpoint-url <endpoint_url>

SPARQL endpoint URL (generates endpoint.yaml when provided).

--endpoint-name <endpoint_name>

Endpoint name (default: ‘endpoint’).

--graph-uri <graph_uri>

Named graph URI.

Arguments

INPUT

Required argument

shacl

Export schema as SHACL shapes (Turtle).

SHACL shapes validate RDF data against the extracted schema. Use --closed (default) for strict validation or --open for flexible validation.

Examples:

rdfsolve export shacl dataset_schema.jsonld --open
rdfsolve export shacl dataset_schema.jsonld --suffix Shape

Usage

rdfsolve export shacl [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

--schema-name <schema_name>

Schema name (default: from filename).

--schema-description <schema_description>

Schema description.

--schema-uri <schema_uri>

Base URI for the schema.

--closed, --open

Generate closed shapes (default) or open shapes.

--suffix <suffix>

Suffix for shape names (e.g. ‘Shape’ -> PersonShape).

Arguments

INPUT

Required argument

void

Export schema as VoID Turtle.

Converts a JSON-LD schema back to VoID RDF (Turtle). Also works with VoID input (round-trip).

Example:

rdfsolve export void dataset_schema.jsonld -o ./exports

Usage

rdfsolve export void [OPTIONS] INPUT

Options

-o, --output-dir <output_dir>

Output directory.

Arguments

INPUT

Required argument

inference

Derive new mappings from existing ones.

Uses SeMRA inference operations (inversion, transitivity, generalisation) to expand a set of mapping JSON-LD files.
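To give a feel for what inversion and transitivity mean in general, here is a toy sketch over plain (subject, object) pairs. It is not SeMRA's implementation: real mappings also carry predicates and evidence, generalisation is not covered, and reading the chain cutoff as a maximum number of hops is an assumption.

```python
from typing import Dict, Set, Tuple

Mapping = Tuple[str, str]  # (subject, object); predicates are omitted in this toy model


def infer_inversions(mappings: Set[Mapping]) -> Set[Mapping]:
    """Return the reversed pairs that are not already present."""
    return {(o, s) for s, o in mappings} - mappings


def infer_chains(mappings: Set[Mapping], cutoff: int = 3) -> Set[Mapping]:
    """Return (a, c) pairs whose shortest path from a to c is 2..cutoff hops."""
    successors: Dict[str, Set[str]] = {}
    for s, o in mappings:
        successors.setdefault(s, set()).add(o)

    inferred: Set[Mapping] = set()
    for start in successors:
        frontier = {start}
        seen = {start}
        for hop in range(1, cutoff + 1):
            # Advance one hop, ignoring nodes already reached by a shorter path.
            frontier = {n for node in frontier for n in successors.get(node, ())} - seen
            seen |= frontier
            if hop >= 2:
                inferred |= {(start, n) for n in frontier}
    return inferred - mappings
```

With the chain a, b, c, d, e and a cutoff of 3, this sketch links a to c and d but not to e, which is the kind of bound a chain cutoff is meant to impose.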

Typical workflow:

rdfsolve inference run --input file1.jsonld file2.jsonld \
    --output docker/mappings/inferenced/inferred.jsonld
rdfsolve inference seed

Usage

rdfsolve inference [OPTIONS] COMMAND [ARGS]...

run

Infer new mappings from the given input files.

Usage

rdfsolve inference run [OPTIONS]

Options

-i, --input <input_paths>

Required. Input mapping JSON-LD file (repeatable).

-o, --output <output_path>

Required. Output JSON-LD file path.

--no-inversion

Disable inversion inference.

--no-transitivity

Disable transitivity (chain) inference.

--generalisation

Enable generalisation inference (off by default).

--chain-cutoff <chain_cutoff>

Maximum chain length for transitivity.

Default:

3

--name <dataset_name>

Override @about.dataset_name in the output.

seed

Infer over all mappings found under INPUT_DIR.

Usage

rdfsolve inference seed [OPTIONS]

Options

--input-dir <input_dir>

Directory containing instance_matching/ and semra/ subdirs.

Default:

'docker/mappings'

--output-dir <output_dir>

Directory for the inferenced output.

Default:

'docker/mappings/inferenced'

--name <output_name>

Output file stem (without .jsonld).

Default:

'inferenced_mappings'

--no-inversion

Disable inversion inference.

--no-transitivity

Disable transitivity inference.

--generalisation

Enable generalisation inference.

--chain-cutoff <chain_cutoff>

Maximum chain length for transitivity.

Default:

3

instance-match

Instance-based matching: discover cross-dataset class links.

Probes SPARQL endpoints for classes whose instances match bioregistry URI patterns and writes skos:narrowMatch mapping JSON-LD files.

Typical workflow:

rdfsolve instance-match probe --prefix ensembl -o ensembl_mapping.jsonld
rdfsolve instance-match seed  --prefixes ensembl uniprot chebi

Usage

rdfsolve instance-match [OPTIONS] COMMAND [ARGS]...

probe

Probe endpoints for a single bioregistry resource.

Queries every endpoint in SOURCES for RDF classes whose instances match the URI patterns registered in bioregistry for PREFIX and emits a JSON-LD mapping document.
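The matching idea can be emulated locally on a handful of (instance URI, class URI) pairs. This is only a sketch: the real command queries SPARQL endpoints, bioregistry URI patterns may be richer than simple prefixes, and the example.org URIs and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classes_matching_prefix(
    typed_instances: Iterable[Tuple[str, str]],
    uri_prefixes: Iterable[str],
) -> Dict[str, int]:
    """Count, per class, the instances whose URI starts with any given prefix."""
    prefixes = tuple(uri_prefixes)  # str.startswith accepts a tuple of prefixes
    counts = Counter(
        cls for instance, cls in typed_instances if instance.startswith(prefixes)
    )
    return dict(counts)


# Hypothetical data standing in for the result of an endpoint query.
SAMPLE = [
    ("http://example.org/ensembl/ENSG1", "http://example.org/Gene"),
    ("http://example.org/ensembl/ENSG2", "http://example.org/Gene"),
    ("http://example.org/other/1", "http://example.org/Thing"),
]
```

Classes that end up with a non-zero count are the candidates that a mapping document for the probed prefix would list.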

Usage

rdfsolve instance-match probe [OPTIONS]

Options

-p, --prefix <prefix>

Required. Bioregistry prefix to probe (e.g. ‘ensembl’).

--sources <sources>

Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.

--predicate <predicate>

Mapping predicate URI.

Default:

'http://www.w3.org/2004/02/skos/core#narrowMatch'

-d, --dataset <datasets>

Restrict to this dataset name (repeatable).

--timeout <timeout>

SPARQL request timeout in seconds.

Default:

60.0

-o, --output <output>

Write JSON-LD to this file (default: stdout).

seed

Seed mapping files for multiple bioregistry resources.

Writes {PREFIX}_instance_mapping.jsonld to OUTPUT_DIR for each supplied PREFIX. Existing files are skipped unless --no-skip-existing is passed.
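The skip rule described above can be sketched as a small planning step that decides which prefixes still need probing. The function and parameter names are hypothetical; only the file naming pattern and the skip behaviour come from the description.

```python
from pathlib import Path
from typing import Iterable, List


def prefixes_to_probe(
    prefixes: Iterable[str],
    output_dir: Path,
    skip_existing: bool = True,
) -> List[str]:
    """Return the prefixes whose mapping file would be (re)written."""
    todo = []
    for prefix in prefixes:
        target = output_dir / f"{prefix}_instance_mapping.jsonld"
        if skip_existing and target.exists():
            continue  # an earlier run already produced this file
        todo.append(prefix)
    return todo
```

Passing skip_existing=False mirrors the effect described for the flag that disables skipping: every supplied prefix is probed again.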

Usage

rdfsolve instance-match seed [OPTIONS]

Options

-p, --prefixes <prefix_list>

Required. Bioregistry prefix (repeatable).

--sources <sources>

Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.

--output-dir <output_dir>

Directory to write JSON-LD mapping files.

Default:

'docker/mappings/instance_matching'

--predicate <predicate>

Mapping predicate URI.

Default:

'http://www.w3.org/2004/02/skos/core#narrowMatch'

-d, --dataset <datasets>

Restrict to this dataset name (repeatable).

--timeout <timeout>

SPARQL request timeout in seconds.

Default:

60.0

--no-skip-existing

Re-probe even if the output file already exists.

pipeline

Schema-mining pipeline: mine, local-mine, qleverfile.

These commands replace the old rdfsolve discover, mine, and mine-all top-level commands. Each route can target remote SPARQL endpoints or a local QLever instance.

Quick-start examples:

# Mine schemas from all remote endpoints
rdfsolve pipeline mine

# Generate Qleverfiles then mine locally
rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline local-mine --name drugbank \
    --endpoint http://localhost:7026

All pipeline commands accept --sources, --output-dir, --filter, --timeout, and --benchmark. Use rdfsolve pipeline <cmd> --help for full details.
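When the quick-start steps are driven from a script, one option is to build the argument lists in Python and hand them to subprocess. The sketch below only assembles and prints the commands, using flags that appear in the examples above; the helper names are hypothetical, and the QLever indexing and start-up that must happen between the two steps is not covered.

```python
import shlex
from typing import List


def qleverfile_cmd(data_dir: str) -> List[str]:
    """Argument list for generating Qleverfiles under data_dir."""
    return ["rdfsolve", "pipeline", "qleverfile", "--data-dir", data_dir]


def local_mine_cmd(name: str, endpoint: str, benchmark: bool = False) -> List[str]:
    """Argument list for mining one dataset from a local QLever endpoint."""
    cmd = ["rdfsolve", "pipeline", "local-mine", "--name", name, "--endpoint", endpoint]
    if benchmark:
        cmd.append("--benchmark")
    return cmd


if __name__ == "__main__":
    steps = (
        qleverfile_cmd("/data/rdf"),
        local_mine_cmd("drugbank", "http://localhost:7026"),
    )
    for cmd in steps:
        # To execute instead of print: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Keeping the commands as lists avoids shell quoting problems when dataset names or paths contain spaces.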

Usage

rdfsolve pipeline [OPTIONS] COMMAND [ARGS]...

local-mine

Mine schemas from a local QLever endpoint.

Use after downloading data and running qlever index && qlever start for a dataset. Connects to the local endpoint and runs the full mining pipeline.

Example:

rdfsolve pipeline local-mine \
    --endpoint http://localhost:7026 \
    --name drugbank --discover-first --benchmark

Usage

rdfsolve pipeline local-mine [OPTIONS]

Options

--benchmark

Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>

Regex to select sources by name.

--timeout <timeout>

HTTP timeout per SPARQL request (seconds).

--format <fmt>

Export format.

Options:

jsonld | void | all

--output-dir <output_dir>

Output directory for schemas/reports.

--sources <sources>

Path to sources YAML/JSON-LD/CSV.

--one-shot

Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.

--author <NAME|ORCID>

Credit an author in provenance metadata. Format: ‘Full Name|0000-0000-0000-0000’. ORCID is optional. Repeat for multiple authors.

--untyped-as-classes

Treat untyped URI objects as owl:Class references instead of rdfs:Resource.

--no-counts

Skip triple-count queries (faster).

--class-batch-size <class_batch_size>

Classes per VALUES query in two-phase mining.

--class-chunk-size <class_chunk_size>

Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.

--chunk-size <chunk_size>

SPARQL pagination page size.

--endpoint <endpoint>

Local QLever SPARQL endpoint URL.

--name <name>

Dataset name (required for single-dataset mode).

--discover-first

Run VoID discovery before mining.

--void-uri-base <void_uri_base>

Base URI for generated VoID partition IRIs (default: sources.yaml value or built-in template).

--test

Process only the 3 smallest downloadable sources.
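The difference between --one-shot and paginated mining with --chunk-size can be sketched with a generic fetch loop. Everything here is an assumption for illustration: run_query stands for any callable that sends a SPARQL query and returns rows, the stop condition is a common pagination idiom rather than rdfsolve's documented behaviour, and the fallback chain is not modelled.

```python
from typing import Callable, List, Sequence

RunQuery = Callable[[str], Sequence]  # hypothetical: sends a query, returns its rows


def fetch_one_shot(run_query: RunQuery, base_query: str) -> List:
    """Issue a single unbounded SELECT and return every row."""
    return list(run_query(base_query))


def fetch_paginated(run_query: RunQuery, base_query: str, chunk_size: int) -> List:
    """Page through results with LIMIT/OFFSET until a short page arrives."""
    rows: List = []
    offset = 0
    while True:
        page = run_query(f"{base_query} LIMIT {chunk_size} OFFSET {offset}")
        rows.extend(page)
        if len(page) < chunk_size:
            return rows  # a short (or empty) page means there is nothing left
        offset += chunk_size
```

The one-shot form costs a single round trip, which suits a fast local endpoint; the paginated form trades more requests for smaller responses, which tends to be safer against remote endpoints with result-size or time limits.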

mine

Mine schemas from remote SPARQL endpoints.

Standard mining workflow: iterate over endpoints, extract schema patterns, and write JSON-LD schemas, VoID Turtle, and analytics reports. Reports are always written.

Examples:

rdfsolve pipeline mine
rdfsolve pipeline mine --filter "drugbank"
rdfsolve pipeline mine --benchmark
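Since --filter takes a regular expression, a pattern such as "drugbank" may select more than one source. The sketch below shows the idea with Python's re module; whether rdfsolve uses an unanchored, case-sensitive search is an assumption, and the source names are made up.

```python
import re
from typing import Iterable, List


def select_sources(names: Iterable[str], name_filter: str) -> List[str]:
    """Keep the source names in which the regex matches anywhere."""
    pattern = re.compile(name_filter)
    return [name for name in names if pattern.search(name)]


# Hypothetical source names for illustration only.
SOURCES = ["drugbank", "drugbank.archive", "chebi"]
```

Under that assumption, anchoring the pattern (for example "^drugbank$") is the way to select exactly one source.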

Usage

rdfsolve pipeline mine [OPTIONS]

Options

--benchmark

Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>

Regex to select sources by name.

--timeout <timeout>

HTTP timeout per SPARQL request (seconds).

--format <fmt>

Export format.

Options:

jsonld | void | all

--output-dir <output_dir>

Output directory for schemas/reports.

--sources <sources>

Path to sources YAML/JSON-LD/CSV.

--one-shot

Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.

--author <NAME|ORCID>

Credit an author in provenance metadata. Format: ‘Full Name|0000-0000-0000-0000’. ORCID is optional. Repeat for multiple authors.

--untyped-as-classes

Treat untyped URI objects as owl:Class references instead of rdfs:Resource.

--no-counts

Skip triple-count queries (faster).

--class-batch-size <class_batch_size>

Classes per VALUES query in two-phase mining.

--class-chunk-size <class_chunk_size>

Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.

--chunk-size <chunk_size>

SPARQL pagination page size.
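The --author value format listed above ('Full Name|ORCID', with the ORCID part optional) can be parsed with a few lines of Python. This is a sketch of one plausible reading of that format, not rdfsolve's parser: the Author type, the whitespace stripping, and the validation of the ORCID shape are assumptions.

```python
import re
from typing import NamedTuple, Optional

# An ORCID iD is four groups of four characters; the last one may be 'X'.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


class Author(NamedTuple):
    name: str
    orcid: Optional[str]


def parse_author(value: str) -> Author:
    """Split 'Full Name|ORCID' into its parts; the ORCID may be missing."""
    name, _, orcid = value.partition("|")
    name, orcid = name.strip(), orcid.strip()
    if not name:
        raise ValueError("author name is required")
    if orcid and not ORCID_RE.match(orcid):
        raise ValueError(f"not an ORCID iD: {orcid!r}")
    return Author(name, orcid or None)
```

Because the separator is a pipe character, the whole value needs quoting on the command line so that the shell does not treat it as a pipeline.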

qleverfile

Generate Qleverfiles for local QLever mining.

Creates a Qleverfile for each source that has download URLs in the sources registry. Each Qleverfile includes a GET_DATA_CMD that downloads and preprocesses the data.

Examples:

rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline qleverfile --data-dir /data/rdf --test

Usage

rdfsolve pipeline qleverfile [OPTIONS]

Options

--benchmark

Collect per-run benchmarks (timing, memory, CPU).

--filter <name_filter>

Regex to select sources by name.

--timeout <timeout>

HTTP timeout per SPARQL request (seconds).

--format <fmt>

Export format.

Options:

jsonld | void | all

--output-dir <output_dir>

Output directory for schemas/reports.

--sources <sources>

Path to sources YAML/JSON-LD/CSV.

--data-dir <data_dir>

Required. Root directory where RDF dumps live.

--base-port <base_port>

First port number for allocation.

--test

Generate only for the 3 smallest downloadable sources.

--runtime <runtime>

QLever runtime.

Options:

docker | native

semra

SeMRA integration: import external mappings from SeMRA sources.

Downloads mappings from community sources (biomappings, Gilda, etc.) and writes one JSON-LD file per (source, bioregistry-prefix) pair.

Typical workflow:

rdfsolve semra import --source biomappings
rdfsolve semra seed --sources biomappings gilda

Usage

rdfsolve semra [OPTIONS] COMMAND [ARGS]...

import

Import mappings from a single SeMRA source.

Writes {source}_{prefix}.jsonld for each unique subject prefix found in the downloaded mappings.
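The per-prefix split described above amounts to grouping mappings by the prefix of their subject. A minimal sketch follows, assuming subjects are written as CURIEs of the form prefix:identifier; the tuple layout, the function name, and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Mapping = Tuple[str, str, str]  # (subject CURIE, predicate, object CURIE)


def group_by_subject_prefix(
    source: str, mappings: Iterable[Mapping]
) -> Dict[str, List[Mapping]]:
    """Bucket mappings under the {source}_{prefix}.jsonld name they would go to."""
    files: Dict[str, List[Mapping]] = defaultdict(list)
    for mapping in mappings:
        prefix = mapping[0].split(":", 1)[0]  # text before the first colon
        files[f"{source}_{prefix}.jsonld"].append(mapping)
    return dict(files)


# Made-up identifiers, for illustration only.
SAMPLE = [
    ("chebi:0001", "skos:exactMatch", "mesh:X0001"),
    ("chebi:0002", "skos:exactMatch", "mesh:X0002"),
    ("doid:0003", "skos:exactMatch", "mesh:X0003"),
]
```

With the sample data and the source key biomappings, two file names result, one for chebi and one for doid, regardless of the prefixes on the object side.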

Usage

rdfsolve semra import [OPTIONS]

Options

-s, --source <source>

Required. SeMRA source key (e.g. ‘biomappings’, ‘gilda’).

-p, --prefix <prefixes>

Keep only these bioregistry prefixes (repeatable). Default: keep all.

--output-dir <output_dir>

Directory to write JSON-LD files.

Default:

'docker/mappings/semra'

seed

Seed mapping files from multiple SeMRA sources.

Usage

rdfsolve semra seed [OPTIONS]

Options

-s, --sources <source_list>

Required. SeMRA source key (repeatable).

-p, --prefix <prefixes>

Keep only these bioregistry prefixes (repeatable).

--output-dir <output_dir>

Directory to write JSON-LD files.

Default:

'docker/mappings/semra'