Skip to content

Quickstart#

Generate your first Knowledge Graph in under 5 minutes.

Before You Start

Make sure you've installed PyGraft-gen

Choose Your Approach#

Ontology Extraction + KG Generation Full Synthetic Generation


Ontology Extraction + KG Generation#

You have an existing ontology and want to create synthetic instance data that follows its schema.

1. Extract the ontology structure

pygraft extract path/to/your-ontology.ttl
This creates pygraft.config.json in your current working directory with extracted metadata.

2. Configure KG generation

Open pygraft.config.json and edit the kg section:

{
  "kg": {
    "num_entities": 3000,               // Number of entity instances to generate
    "num_triples": 30000,               // Target number of triples

    "enable_fast_generation": false,    // Speed optimization: creates smaller base KG then replicates it

    "relation_usage_uniformity": 0.9,   // Controls distribution of relations across triples
    "prop_untyped_entities": 0.0,       // Proportion of entities that remain untyped

    "avg_specific_class_depth": 2.0,    // Average depth of the most specific class assigned to each entity

    "multityping": false,               // Whether entities may receive multiple most-specific types
    "avg_types_per_entity": 1.0,        // Average number of most-specific classes per typed entity

    "check_kg_consistency": true        // Run post-generation HermiT reasoning to verify consistency
  }
}

3. Generate the Knowledge Graph

pygraft kg pygraft.config.json

4. Verify the output

You should see:

output_pygraft/your-ontology-name/
├── schema.ttl              # Copy of your ontology
├── kg.ttl                  # Generated Knowledge Graph ✨
├── class_info.json         # Extracted class metadata
├── relation_info.json      # Extracted relation metadata
├── namespaces_info.json    # Namespace mappings
└── kg_info.json            # Generation statistics

5. Debug inconsistencies (if needed)

If HermiT reports your KG as inconsistent, use the explain command to identify problematic axioms:

pygraft explain output_pygraft/your-ontology-name/kg.ttl

This runs the Pellet reasoner to provide detailed explanations of which specific triples or constraints are causing logical contradictions.

Learn More

See Consistency Checking.

You're Done!

Your Knowledge Graph is in kg.ttl. Open it to see the generated triples, or use kg_info.json to review statistics.

Full Synthetic#

Generate both the ontology schema and instance data from statistical parameters.

1. Create a configuration template

pygraft init json
This creates pygraft.config.json in your current working directory.

2. Configure generation parameters

Open pygraft.config.json and edit the essential parameters:

{
  "general": {
    "project_name": "auto",           // Auto-generate timestamped folder name
    "rdf_format": "ttl"                // Output format (ttl, rdf, or nt)
  },

  "schema": {
    "classes": {
      "num_classes": 50,               // Number of classes in ontology
      "max_hierarchy_depth": 4,        // How deep the class tree goes
      ...
    },
    "relations": {
      "num_relations": 50,             // Number of object properties
      ...
    }
  },

  "kg": {
    "num_entities": 3000,              // Number of entity instances
    "num_triples": 30000,              // Target number of triples
    "check_kg_consistency": true       // Run HermiT reasoning to verify consistency
  }
}

Learn More

See Configuration Reference for all available parameters.

3. Generate schema and KG

pygraft build pygraft.config.json
pygraft schema pygraft.config.json
pygraft kg pygraft.config.json

4. Verify the output

You should see:

output_pygraft/2026-01-16_14-30-22/
├── schema.ttl              # Generated ontology ✨
├── kg.ttl                  # Generated Knowledge Graph ✨
├── class_info.json         # Class hierarchy metadata
├── relation_info.json      # Relation constraints
└── kg_info.json            # Generation statistics

You're Done!

Both your synthetic schema and KG are ready. Review schema.ttl to see the ontology structure and kg.ttl for the instance data.


What's Supported#

Current Focus: Object Properties

PyGraft-gen currently focuses on object properties (entity-to-entity relations). We're working to complete full object property support before moving to datatype properties. Future versions will add:

  • Blank-node class expressionsowl:Restriction, owl:unionOf, owl:intersectionOf, owl:complementOf, etc.
  • Value restrictionsowl:someValuesFrom, owl:allValuesFrom, owl:hasValue, etc.
  • Compound domain/range – Complex class expressions in property constraints
  • Higher-level disjointnessowl:AllDisjointClasses, owl:disjointUnionOf, etc.

Once object properties are complete, we'll add datatype properties (literal-valued attributes like strings, integers, dates, etc.).

These additions require defining how they should be modeled from the extracted ontology, enforced during generation, and integrated with existing constraints. These are design questions we're actively exploring.

FAQ#

Consistency checking takes forever?

Consistency checking uses the HermiT reasoner. Runtime depends on:

  • Schema complexity (constraints, disjointness, property characteristics)
  • KG size (entities and triples)

Typical runtime:

  • Small KGs (10K entities): Seconds to minutes
  • Medium KGs (100K entities): Minutes to tens of minutes
  • Large KGs (1M+ entities): Hours or may not complete

For large KGs:

  1. Test on small KG (1K-100K entities) with checking enabled first
  2. Once validated, disable for production: "check_kg_consistency": false

Learn more about consistency checking


What's Next?#

Based on your goal:

Technical Reference: