Command Line Interface
Installing the rdfsolve package also installs the rdfsolve command. Run
rdfsolve --help for usage details.
rdfsolve
RDFSolve - RDF Schema Extraction, Export and Analysis Toolkit.
Pipeline commands (schema mining):
rdfsolve pipeline mine Mine schemas from remote endpoints
rdfsolve pipeline local-mine Mine from a local QLever endpoint
rdfsolve pipeline qleverfile Generate Qleverfiles for QLever
Analysis commands:
rdfsolve export Interconvert schemas between JSON-LD, LinkML, SHACL, CSV, ...
Mapping commands:
rdfsolve instance-match Cross-dataset class matching
rdfsolve semra Import external SeMRA mappings
rdfsolve inference Derive new mappings from existing ones
Usage
rdfsolve [OPTIONS] COMMAND [ARGS]...
Options
- --version
Show the version and exit.
- -v, --verbose
Enable verbose logging
export
Convert RDF schemas between formats.
All subcommands accept a VoID Turtle (.ttl) or rdfsolve JSON-LD (.jsonld) file as INPUT and produce the requested output format. The input format is auto-detected from the file extension.
Supported conversions (any-to-any via the internal model):
VoID (.ttl) <-> JSON-LD (.jsonld)
JSON-LD (.jsonld) -> CSV, LinkML, SHACL, RDF-config, VoID
Subcommands:
csv Export schema patterns as a CSV table
jsonld Export schema as JSON-LD
void Export schema as VoID Turtle
linkml Export schema as LinkML YAML
shacl Export schema as SHACL shapes (Turtle)
rdfconfig Export schema as RDF-config YAML files
Examples:
rdfsolve export csv dataset_schema.jsonld
rdfsolve export linkml dataset_void.ttl -o ./out
rdfsolve export shacl dataset_schema.jsonld --closed
rdfsolve export rdfconfig dataset_void.ttl --endpoint-url http://...
rdfsolve export void dataset_schema.jsonld
Usage
rdfsolve export [OPTIONS] COMMAND [ARGS]...
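The extension-based auto-detection described above could look roughly like the following. This is a minimal sketch, not rdfsolve's actual code: the function name detect_input_format and the INPUT_FORMATS table are hypothetical, and only the two extensions named in the text are covered.

```python
from pathlib import Path

# Hypothetical lookup table: the two input extensions named in the help text.
INPUT_FORMATS = {".ttl": "void", ".jsonld": "jsonld"}


def detect_input_format(path: str) -> str:
    """Return 'void' or 'jsonld' based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return INPUT_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported input extension: {suffix!r}") from None


print(detect_input_format("dataset_void.ttl"))
print(detect_input_format("dataset_schema.jsonld"))
```

Rejecting unknown extensions early is a design choice of this sketch; how the real command reacts to other extensions is not stated here.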
csv
Export schema patterns as a CSV table.
Example:
rdfsolve export csv dataset_schema.jsonld -o ./exports
Usage
rdfsolve export csv [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
Arguments
- INPUT
Required argument
jsonld
Export schema as JSON-LD.
Useful for converting a VoID Turtle file into the rdfsolve JSON-LD format. If the input is already JSON-LD, it is re-serialised (which can be used to refresh @about metadata).
Example:
rdfsolve export jsonld dataset_void.ttl -o ./exports
Usage
rdfsolve export jsonld [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
- --endpoint-url <endpoint_url>
SPARQL endpoint URL for @about.
- --graph-uri <graph_uri>
Named graph URI for @about.
Arguments
- INPUT
Required argument
linkml
Export schema as LinkML YAML.
Generates a LinkML schema definition for data modelling, validation, and code generation.
Example:
rdfsolve export linkml dataset_schema.jsonld --schema-name myds
Usage
rdfsolve export linkml [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
- --schema-name <schema_name>
Schema name (default: from filename).
- --schema-description <schema_description>
Schema description.
- --schema-uri <schema_uri>
Base URI for the schema (e.g. http://example.org/schemas/myschema).
Arguments
- INPUT
Required argument
rdfconfig
Export schema as RDF-config YAML files.
RDF-config is a schema standard for describing RDF data models. Produces a {dataset}_config/ directory with model.yaml, prefix.yaml, and optionally endpoint.yaml.
Example:
rdfsolve export rdfconfig dataset_void.ttl \
--endpoint-url https://example.org/sparql
Usage
rdfsolve export rdfconfig [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
- --endpoint-url <endpoint_url>
SPARQL endpoint URL (generates endpoint.yaml when provided).
- --endpoint-name <endpoint_name>
Endpoint name (default: 'endpoint').
- --graph-uri <graph_uri>
Named graph URI.
Arguments
- INPUT
Required argument
shacl
Export schema as SHACL shapes (Turtle).
SHACL shapes validate RDF data against the extracted schema. Use --closed (default) for strict validation or --open for flexible validation.
Examples:
rdfsolve export shacl dataset_schema.jsonld --open
rdfsolve export shacl dataset_schema.jsonld --suffix Shape
Usage
rdfsolve export shacl [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
- --schema-name <schema_name>
Schema name (default: from filename).
- --schema-description <schema_description>
Schema description.
- --schema-uri <schema_uri>
Base URI for the schema.
- --closed, --open
Generate closed shapes (default) or open shapes.
- --suffix <suffix>
Suffix for shape names (e.g. 'Shape' -> PersonShape).
Arguments
- INPUT
Required argument
void
Export schema as VoID Turtle.
Converts a JSON-LD schema back to VoID RDF (Turtle). Also works with VoID input (round-trip).
Example:
rdfsolve export void dataset_schema.jsonld -o ./exports
Usage
rdfsolve export void [OPTIONS] INPUT
Options
- -o, --output-dir <output_dir>
Output directory.
Arguments
- INPUT
Required argument
inference
Derive new mappings from existing ones.
Uses SeMRA inference operations (inversion, transitivity, generalisation) to expand a set of mapping JSON-LD files.
Typical workflow:
rdfsolve inference run --input file1.jsonld file2.jsonld \
--output docker/mappings/inferenced/inferred.jsonld
rdfsolve inference seed
Usage
rdfsolve inference [OPTIONS] COMMAND [ARGS]...
run
Infer new mappings from the given input files.
Usage
rdfsolve inference run [OPTIONS]
Options
- -i, --input <input_paths>
Required. Input mapping JSON-LD file (repeatable).
- -o, --output <output_path>
Required. Output JSON-LD file path.
- --no-inversion
Disable inversion inference.
- --no-transitivity
Disable transitivity (chain) inference.
- --generalisation
Enable generalisation inference (off by default).
- --chain-cutoff <chain_cutoff>
Maximum chain length for transitivity.
- Default:
3
- --name <dataset_name>
Override @about.dataset_name in the output.
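To make the effect of --no-inversion, --no-transitivity, and --chain-cutoff concrete, here is a toy sketch of inversion and chain inference over plain (subject, object) pairs. It is an illustration only and does not reproduce SeMRA's implementation: the function names are hypothetical, all mappings are assumed to share one symmetric, transitive predicate, and "chain length" is assumed to mean the number of hops.

```python
from collections import defaultdict

Mapping = tuple[str, str]  # (subject CURIE, object CURIE); toy representation


def infer_inversions(mappings: set[Mapping]) -> set[Mapping]:
    """Add (object, subject) for every (subject, object)."""
    return mappings | {(o, s) for s, o in mappings}


def infer_chains(mappings: set[Mapping], chain_cutoff: int = 3) -> set[Mapping]:
    """Add (a, c) whenever c is reachable from a in at most chain_cutoff hops."""
    successors = defaultdict(set)
    for s, o in mappings:
        successors[s].add(o)

    inferred = set(mappings)
    for start in list(successors):
        frontier, seen = {start}, {start}
        for _ in range(chain_cutoff):
            # Advance one hop, ignoring nodes already reached from `start`.
            frontier = {
                nxt for node in frontier for nxt in successors.get(node, ())
            } - seen
            seen |= frontier
            inferred |= {(start, nxt) for nxt in frontier}
    return inferred


chain = {("a:1", "b:1"), ("b:1", "c:1"), ("c:1", "d:1"), ("d:1", "e:1")}
expanded = infer_chains(chain, chain_cutoff=3)
print(("a:1", "d:1") in expanded)  # three hops: within the cutoff
print(("a:1", "e:1") in expanded)  # four hops: beyond the cutoff
```

Under these assumptions, a lower cutoff yields fewer inferred mappings and bounds how far a single mapping can propagate.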
seed
Infer over all mappings found under INPUT_DIR.
Usage
rdfsolve inference seed [OPTIONS]
Options
- --input-dir <input_dir>
Directory containing instance_matching/ and semra/ subdirs.
- Default:
'docker/mappings'
- --output-dir <output_dir>
Directory for the inferenced output.
- Default:
'docker/mappings/inferenced'
- --name <output_name>
Output file stem (without .jsonld).
- Default:
'inferenced_mappings'
- --no-inversion
Disable inversion inference.
- --no-transitivity
Disable transitivity inference.
- --generalisation
Enable generalisation inference.
- --chain-cutoff <chain_cutoff>
Maximum chain length for transitivity.
- Default:
3
instance-match
Instance-based matching: discover cross-dataset class links.
Probes SPARQL endpoints for classes whose instances match bioregistry URI patterns and writes skos:narrowMatch mapping JSON-LD files.
Typical workflow:
rdfsolve instance-match probe --prefix ensembl -o ensembl_mapping.jsonld
rdfsolve instance-match seed --prefixes ensembl uniprot chebi
Usage
rdfsolve instance-match [OPTIONS] COMMAND [ARGS]...
probe
Probe endpoints for a single bioregistry resource.
Queries every endpoint in SOURCES for RDF classes whose instances match the URI patterns registered in bioregistry for PREFIX and emits a JSON-LD mapping document.
Usage
rdfsolve instance-match probe [OPTIONS]
Options
- -p, --prefix <prefix>
Required. Bioregistry prefix to probe (e.g. 'ensembl').
- --sources <sources>
Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.
- --predicate <predicate>
Mapping predicate URI.
- Default:
'http://www.w3.org/2004/02/skos/core#narrowMatch'
- -d, --dataset <datasets>
Restrict to this dataset name (repeatable).
- --timeout <timeout>
SPARQL request timeout in seconds.
- Default:
60.0
- -o, --output <output>
Write JSON-LD to this file (default: stdout).
seed
Seed mapping files for multiple bioregistry resources.
Writes {PREFIX}_instance_mapping.jsonld to OUTPUT_DIR for each supplied PREFIX. Existing files are skipped unless --no-skip-existing is passed.
Usage
rdfsolve instance-match seed [OPTIONS]
Options
- -p, --prefixes <prefix_list>
Required. Bioregistry prefix (repeatable).
- --sources <sources>
Path to sources file (JSON-LD or CSV). Default: auto-detect data/sources.jsonld.
- --output-dir <output_dir>
Directory to write JSON-LD mapping files.
- Default:
'docker/mappings/instance_matching'
- --predicate <predicate>
Mapping predicate URI.
- Default:
'http://www.w3.org/2004/02/skos/core#narrowMatch'
- -d, --dataset <datasets>
Restrict to this dataset name (repeatable).
- --timeout <timeout>
SPARQL request timeout in seconds.
- Default:
60.0
- --no-skip-existing
Re-probe even if the output file already exists.
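The skip-existing behaviour of seed can be pictured with a short sketch. It only plans which output files would be written. The helper name plan_seed is hypothetical, and the real command may decide differently in edge cases such as empty or partial files.

```python
from pathlib import Path


def plan_seed(prefixes, output_dir, skip_existing=True):
    """Return (to_probe, skipped): output paths to write and paths left alone."""
    out = Path(output_dir)
    to_probe, skipped = [], []
    for prefix in prefixes:
        # File name pattern taken from the help text above.
        target = out / f"{prefix}_instance_mapping.jsonld"
        if skip_existing and target.exists():
            skipped.append(target)
        else:
            to_probe.append(target)
    return to_probe, skipped


to_probe, skipped = plan_seed(
    ["ensembl", "uniprot", "chebi"], "docker/mappings/instance_matching"
)
print([p.name for p in to_probe])
```

Passing skip_existing=False corresponds to --no-skip-existing: every prefix is probed again regardless of what is already on disk.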
pipeline
Schema-mining pipeline: mine, local-mine, qleverfile.
These commands replace the old rdfsolve discover, mine, and
mine-all top-level commands. Each route can target remote SPARQL
endpoints or a local QLever instance.
Quick-start examples:
# Mine schemas from all remote endpoints
rdfsolve pipeline mine
# Generate Qleverfiles then mine locally
rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline local-mine --name drugbank \
--endpoint http://localhost:7026
All pipeline commands accept --sources, --output-dir,
--filter, --timeout, and --benchmark.
Use rdfsolve pipeline <cmd> --help for full details.
Usage
rdfsolve pipeline [OPTIONS] COMMAND [ARGS]...
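Since --filter takes a regular expression, the selection step shared by the pipeline commands might be sketched as below. The helper select_sources is hypothetical, and the sketch assumes an unanchored, case-sensitive search against the source name; the actual matching rules are not documented here.

```python
import re


def select_sources(names, name_filter=None):
    """Keep source names matching the regex; keep everything if no filter is set."""
    if name_filter is None:
        return list(names)
    pattern = re.compile(name_filter)
    # Assumption: unanchored search, so "drugbank" also matches "drugbank.open".
    return [name for name in names if pattern.search(name)]


sources = ["drugbank", "drugbank.open", "chebi"]
print(select_sources(sources, "drugbank"))
print(select_sources(sources, "^drugbank$"))
```

If the search is indeed unanchored, anchoring the pattern with ^ and $ is the way to select exactly one source.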
local-mine
Mine schemas from a local QLever endpoint.
Use after downloading data and running qlever index && qlever
start for a dataset. Connects to the local endpoint and runs
the full mining pipeline.
Example:
rdfsolve pipeline local-mine \
--endpoint http://localhost:7026 \
--name drugbank --discover-first --benchmark
Usage
rdfsolve pipeline local-mine [OPTIONS]
Options
- --benchmark
Collect per-run benchmarks (timing, memory, CPU).
- --filter <name_filter>
Regex to select sources by name.
- --timeout <timeout>
HTTP timeout per SPARQL request (seconds).
- --format <fmt>
Export format.
- Options:
jsonld | void | all
- --output-dir <output_dir>
Output directory for schemas/reports.
- --sources <sources>
Path to sources YAML/JSON-LD/CSV.
- --one-shot
Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.
- --author <NAME|ORCID>
Credit an author in provenance metadata. Format: 'Full Name|0000-0000-0000-0000'. ORCID is optional. Repeat for multiple authors.
- --untyped-as-classes
Treat untyped URI objects as owl:Class references instead of rdfs:Resource.
- --no-counts
Skip triple-count queries (faster).
- --class-batch-size <class_batch_size>
Classes per VALUES query in two-phase mining.
- --class-chunk-size <class_chunk_size>
Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.
- --chunk-size <chunk_size>
SPARQL pagination page size.
- --endpoint <endpoint>
Local QLever SPARQL endpoint URL.
- --name <name>
Dataset name (required for single-dataset mode).
- --discover-first
Run VoID discovery before mining.
- --void-uri-base <void_uri_base>
Base URI for generated VoID partition IRIs (default: sources.yaml value or built-in template).
- --test
Process only the 3 smallest downloadable sources.
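The --author value format listed above ('Full Name|0000-0000-0000-0000', with the ORCID part optional) can be parsed as in the following sketch. The function parse_author and the returned dictionary keys are hypothetical, and the ORCID check only validates the shape of the identifier, not its checksum.

```python
import re

# Four groups of four characters; the final character may be the check digit X.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def parse_author(value: str) -> dict:
    """Split 'Full Name|ORCID' into name and ORCID; the ORCID part is optional."""
    name, _, orcid = value.partition("|")
    name, orcid = name.strip(), orcid.strip()
    if not name:
        raise ValueError("author name must not be empty")
    if orcid and not ORCID_RE.match(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid!r}")
    return {"name": name, "orcid": orcid or None}


print(parse_author("Jane Doe|0000-0000-0000-0000"))
print(parse_author("Jane Doe"))
```

On the command line the value needs quoting, because an unquoted | would be interpreted by the shell as a pipe.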
mine
Mine schemas from remote SPARQL endpoints.
Standard mining workflow: iterate over endpoints, extract schema patterns, and write JSON-LD schemas, VoID Turtle, and analytics reports. Reports are always written.
Examples:
rdfsolve pipeline mine
rdfsolve pipeline mine --filter "drugbank"
rdfsolve pipeline mine --benchmark
Usage
rdfsolve pipeline mine [OPTIONS]
Options
- --benchmark
Collect per-run benchmarks (timing, memory, CPU).
- --filter <name_filter>
Regex to select sources by name.
- --timeout <timeout>
HTTP timeout per SPARQL request (seconds).
- --format <fmt>
Export format.
- Options:
jsonld | void | all
- --output-dir <output_dir>
Output directory for schemas/reports.
- --sources <sources>
Path to sources YAML/JSON-LD/CSV.
- --one-shot
Mine using a single unbounded SELECT per pattern type (no LIMIT/OFFSET, no fallback chain). Recommended for local QLever endpoints. Records per-query timing and row count in the report for comparison with the fallback-chain run.
- --author <NAME|ORCID>
Credit an author in provenance metadata. Format: 'Full Name|0000-0000-0000-0000'. ORCID is optional. Repeat for multiple authors.
- --untyped-as-classes
Treat untyped URI objects as owl:Class references instead of rdfs:Resource.
- --no-counts
Skip triple-count queries (faster).
- --class-batch-size <class_batch_size>
Classes per VALUES query in two-phase mining.
- --class-chunk-size <class_chunk_size>
Page size for Phase-1 class discovery (two-phase mode only). Default: no pagination.
- --chunk-size <chunk_size>
SPARQL pagination page size.
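The difference between --one-shot and paginated mining with --chunk-size can be sketched generically. This is not rdfsolve's query code: fetch_all and run_query are hypothetical names, run_query stands for any callable that sends a SPARQL string and returns rows, and the fallback chain mentioned above is not modelled.

```python
def fetch_all(run_query, base_query, chunk_size=None):
    """Collect all rows, either with one unbounded query or page by page."""
    if chunk_size is None:
        # One-shot mode: a single SELECT without LIMIT/OFFSET.
        return list(run_query(base_query))

    rows, offset = [], 0
    while True:
        page = list(run_query(f"{base_query} LIMIT {chunk_size} OFFSET {offset}"))
        rows.extend(page)
        if len(page) < chunk_size:
            # A short (or empty) page means the result set is exhausted.
            return rows
        offset += chunk_size


# Stand-in for an endpoint: returns the same two rows for any unbounded query.
def demo_endpoint(query):
    return [("ex:Drug", "ex:targets", "ex:Protein"), ("ex:Drug", "rdfs:label", "literal")]


print(len(fetch_all(demo_endpoint, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")))
```

Paginated queries generally need a stable ORDER BY in base_query to avoid missing or duplicated rows; the one-shot form avoids that cost, which is presumably part of why it is recommended for local endpoints.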
qleverfile
Generate Qleverfiles for local QLever mining.
Creates a Qleverfile for each source that has download URLs in the sources registry. Each Qleverfile includes a GET_DATA_CMD that downloads and preprocesses the data.
Examples:
rdfsolve pipeline qleverfile --data-dir /data/rdf
rdfsolve pipeline qleverfile --data-dir /data/rdf --test
Usage
rdfsolve pipeline qleverfile [OPTIONS]
Options
- --benchmark
Collect per-run benchmarks (timing, memory, CPU).
- --filter <name_filter>
Regex to select sources by name.
- --timeout <timeout>
HTTP timeout per SPARQL request (seconds).
- --format <fmt>
Export format.
- Options:
jsonld | void | all
- --output-dir <output_dir>
Output directory for schemas/reports.
- --sources <sources>
Path to sources YAML/JSON-LD/CSV.
- --data-dir <data_dir>
Required. Root directory where RDF dumps live.
- --base-port <base_port>
First port number for allocation.
- --test
Generate only for the 3 smallest downloadable sources.
- --runtime <runtime>
QLever runtime.
- Options:
docker | native
semra
SeMRA integration: import external mappings from semra sources.
Downloads mappings from community sources (biomappings, Gilda, etc.) and writes one JSON-LD file per (source, bioregistry-prefix) pair.
Typical workflow:
rdfsolve semra import --source biomappings
rdfsolve semra seed --sources biomappings gilda
Usage
rdfsolve semra [OPTIONS] COMMAND [ARGS]...
import
Import mappings from a single SeMRA source.
Writes {source}_{prefix}.jsonld for each unique subject prefix found in the downloaded mappings.
Usage
rdfsolve semra import [OPTIONS]
Options
- -s, --source <source>
Required. SeMRA source key (e.g. 'biomappings', 'gilda').
- -p, --prefix <prefixes>
Keep only these bioregistry prefixes (repeatable). Default: keep all.
- --output-dir <output_dir>
Directory to write JSON-LD files.
- Default:
'docker/mappings/semra'
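The "one file per subject prefix" rule of import, together with the --prefix filter, can be illustrated with a small grouping sketch. The function group_by_subject_prefix is hypothetical, and mappings are modelled as plain (subject, predicate, object) CURIE triples rather than SeMRA objects.

```python
from collections import defaultdict


def group_by_subject_prefix(source, mappings, keep_prefixes=None):
    """Map each output file name to the mappings whose subject has that prefix."""
    files = defaultdict(list)
    for subject, predicate, obj in mappings:
        prefix = subject.split(":", 1)[0]  # CURIE prefix of the subject
        if keep_prefixes and prefix not in keep_prefixes:
            continue
        # File name pattern taken from the help text above.
        files[f"{source}_{prefix}.jsonld"].append((subject, predicate, obj))
    return dict(files)


example = [
    ("chebi:1", "skos:exactMatch", "mesh:A"),
    ("chebi:2", "skos:exactMatch", "mesh:B"),
    ("doid:3", "skos:exactMatch", "mesh:C"),
]
print(sorted(group_by_subject_prefix("biomappings", example)))
```

With keep_prefixes={"chebi"}, which mirrors -p chebi, only biomappings_chebi.jsonld would be produced in this sketch.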
seed
Seed mapping files from multiple SeMRA sources.
Usage
rdfsolve semra seed [OPTIONS]
Options
- -s, --sources <source_list>
Required. SeMRA source key (repeatable).
- -p, --prefix <prefixes>
Keep only these bioregistry prefixes (repeatable).
- --output-dir <output_dir>
Directory to write JSON-LD files.
- Default:
'docker/mappings/semra'