Ontology Extraction
Ontology extraction analyzes existing RDFS/OWL ontologies and extracts structured metadata that PyGraft-gen uses to generate synthetic Knowledge Graphs.
On this page:
- Overview - Understanding the extraction process
- Design Philosophy - Core principles guiding extraction
- What Gets Extracted - The three metadata files
- Technical Details - Implementation and supported formats
Overview
Extraction converts an ontology file into three JSON metadata files:
flowchart LR
O[Ontology File]
subgraph E[Extraction Pipeline]
N[Namespace<br/>Extraction]
C[Class<br/>Extraction]
R[Relation<br/>Extraction]
end
NI[namespaces_info.json]
CI[class_info.json]
RI[relation_info.json]
O --> N
O --> C
O --> R
N --> NI
C --> CI
R --> RI
style O fill:#f8c9c9,stroke:#c55,stroke-width:2px
style N fill:#f8c9c9,stroke:#c55,stroke-width:2px
style C fill:#f8c9c9,stroke:#c55,stroke-width:2px
style R fill:#f8c9c9,stroke:#c55,stroke-width:2px
style NI fill:#eee,stroke:#666,stroke-width:2px
style CI fill:#eee,stroke:#666,stroke-width:2px
style RI fill:#eee,stroke:#666,stroke-width:2px
These files capture the ontology structure in a format the KG generator can use. Extraction is deterministic, read-only, and operates purely on explicit axioms without reasoning.
Extraction Scope
Extraction captures only OWL constructs that PyGraft-gen can enforce during generation.
See What's Supported
Design Philosophy
Extraction is built on three core principles that ensure predictable, debuggable results:
Explicit-First, No Inference
The extractor answers one question: "What does this ontology explicitly declare?"
It operates purely on axioms present in the ontology file. No OWL reasoning, RDFS entailment, or semantic expansion is performed.
What this means:
- Declared relationships like foaf:Person rdfs:subClassOf foaf:Agent are captured
- Implied relationships are only captured if explicitly stated
- Transitive relationships (like rdfs:subClassOf*) are computed via SPARQL property paths over explicit axioms, not semantic inference
- External ontologies (FOAF, Dublin Core, etc.) are not loaded or traversed
Read-Only and Schema-Focused
Extraction is read-only. The original ontology is never modified.
The focus is purely structural: class hierarchies, property characteristics, and explicit constraints that can be enforced during generation.
Deterministic and Reproducible
Running extraction on the same ontology always produces identical output. All derived structures are computed using fixed SPARQL queries that execute in the same order every time.
This matters for debugging extraction issues, reproducing generation runs, and understanding exactly what the generator sees.
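One way to check reproducibility in practice is to fingerprint the three metadata files from two runs and compare them. The sketch below assumes each run wrote its files into its own folder; the folder names, the helper functions, and the placeholder file contents are hypothetical and are not part of PyGraft-gen.

```python
import hashlib
import tempfile
from pathlib import Path

METADATA_FILES = ("namespaces_info.json", "class_info.json", "relation_info.json")


def fingerprint(output_dir: Path) -> dict[str, str]:
    """Map each metadata file name to the SHA-256 hex digest of its bytes."""
    return {
        name: hashlib.sha256((output_dir / name).read_bytes()).hexdigest()
        for name in METADATA_FILES
    }


def changed_files(run_a: Path, run_b: Path) -> list[str]:
    """List the metadata files whose contents differ between two runs."""
    a, b = fingerprint(run_a), fingerprint(run_b)
    return [name for name in METADATA_FILES if a[name] != b[name]]


# Demo with stand-in files; real usage would point at two extraction output folders.
with tempfile.TemporaryDirectory() as tmp:
    run_a, run_b = Path(tmp, "run_a"), Path(tmp, "run_b")
    for run in (run_a, run_b):
        run.mkdir()
        for name in METADATA_FILES:
            (run / name).write_text('{"placeholder": true}', encoding="utf-8")
    identical = changed_files(run_a, run_b)

    # Simulate drift in one file to show what a mismatch looks like.
    (run_b / "class_info.json").write_text('{"placeholder": false}', encoding="utf-8")
    drifted = changed_files(run_a, run_b)

print(identical)
print(drifted)
```

An empty list means the two runs are byte-for-byte identical, which is what deterministic extraction should produce for the same input ontology.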
What Gets Extracted
Extraction produces three JSON files, each capturing a different aspect of your ontology's structure.
Namespaces → namespaces_info.json
Prefix-to-namespace mappings and ontology metadata. All IRIs are normalized to prefix:LocalName format.
Requirements: Your ontology must include an owl:Ontology declaration. VANN annotations are recommended but optional.
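The prefix:LocalName normalization can be pictured with a small helper. This is a minimal sketch, assuming a longest-namespace-wins rule and leaving unknown IRIs unchanged; the function name and both of those behaviors are assumptions, not the documented behavior of the extractor.

```python
def to_prefixed(iri: str, prefixes: dict[str, str]) -> str:
    """Rewrite a full IRI as prefix:LocalName using the longest matching namespace."""
    best_prefix, best_ns = None, ""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace) and len(namespace) > len(best_ns):
            best_prefix, best_ns = prefix, namespace
    if best_prefix is None:
        return iri  # no known namespace: leave the IRI untouched (assumed fallback)
    return f"{best_prefix}:{iri[len(best_ns):]}"


# Prefix table of the kind namespaces_info.json is described as holding.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

person = to_prefixed("http://xmlns.com/foaf/0.1/Person", PREFIXES)
unknown = to_prefixed("http://example.org/Unknown", PREFIXES)

print(person)
print(unknown)
```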
Learn more → Reference - Namespaces Info
Classes → class_info.json
Class hierarchy and constraints, including named classes, direct and transitive rdfs:subClassOf relationships, owl:disjointWith declarations, hierarchy layers, and statistics.
What counts as a class? Any named IRI appearing as an rdf:type target, rdfs:subClassOf participant, or property domain/range. Blank nodes are filtered out.
Learn more → Reference - Class Info
Relations → relation_info.json
Object property characteristics and constraints, including OWL characteristics (symmetric, transitive, functional, reflexive, irreflexive, asymmetric, inverse functional), domain/range constraints, inverse relationships, subproperty hierarchies, and disjointness.
Note: Datatype properties are excluded.
Learn more → Reference - Relation Info
Technical Details
Implementation & Supported Formats
SPARQL-based extraction:
Extraction uses SPARQL queries executed through Python's rdflib library. Queries are organized by domain and stored as .rq files.
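Loading queries from .rq files in a fixed order might look like the sketch below. The folder layout (one subfolder per domain), the file names, and the load_queries helper are assumptions made for illustration; the actual layout in PyGraft-gen may differ.

```python
import tempfile
from pathlib import Path


def load_queries(root: Path) -> dict[str, str]:
    """Read every .rq file under root, keyed by 'domain/name', in sorted path order."""
    queries = {}
    for path in sorted(root.rglob("*.rq")):
        key = path.relative_to(root).with_suffix("").as_posix()
        queries[key] = path.read_text(encoding="utf-8")
    return queries


# Demo with a throwaway directory standing in for the query folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "relations").mkdir()
    (root / "classes").mkdir()
    (root / "relations" / "domains.rq").write_text(
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?prop ?domain WHERE { ?prop rdfs:domain ?domain }\n",
        encoding="utf-8",
    )
    (root / "classes" / "direct_subclasses.rq").write_text(
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?sub ?sup WHERE { ?sub rdfs:subClassOf ?sup }\n",
        encoding="utf-8",
    )
    queries = load_queries(root)

print(list(queries))
```

Each loaded string can then be passed to an rdflib Graph's query method. Sorting the paths keeps the execution order stable across runs, whatever order the filesystem lists them in.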
No reasoning:
Extraction captures only explicit assertions. Transitive closures are computed via SPARQL property paths over explicit axioms, not OWL reasoning.
Supported formats:
N-Triples and other RDF serializations are not supported.
What's Next
- KG Generation - How instances are created from extracted metadata
- Consistency Checking - Validating generated KGs
Reference: