
PyGraft-gen

Generate synthetic RDFS/OWL ontologies and RDF Knowledge Graphs at scale.

PyGraft-gen creates synthetic Knowledge Graphs with realistic structure and constraint-aware generation, making it ideal for testing AI systems, benchmarking graph algorithms, and advancing research in scenarios where real data is sensitive or unavailable.



Installation

With pip:

# From PyPI (recommended)
pip install pygraft-gen

# From GitHub (latest from main branch)
pip install git+https://github.com/Orange-OpenSource/pygraft-gen.git

With uv:

# From PyPI (recommended)
uv add pygraft-gen

# From GitHub (latest from main branch)
uv pip install git+https://github.com/Orange-OpenSource/pygraft-gen.git

With Poetry:

# From PyPI (recommended)
poetry add pygraft-gen

# From GitHub (latest from main branch)
poetry add git+https://github.com/Orange-OpenSource/pygraft-gen.git

Requirements: Python 3.10+, Java (optional)
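
To check an environment against these requirements, a minimal sketch using only the Python standard library might look like the following. The helper name check_environment is hypothetical and is not part of PyGraft-gen; the sketch assumes the distribution is installed under the PyPI name pygraft-gen and that Java, when present, is reachable as java on the PATH.

```python
import shutil
import sys
from importlib import metadata


def check_environment(dist_name: str = "pygraft-gen") -> dict:
    """Report whether the stated requirements appear to be met.

    Hypothetical helper for illustration; not part of PyGraft-gen.
    """
    try:
        installed_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        installed_version = None  # distribution not installed in this environment

    return {
        "python_ok": sys.version_info >= (3, 10),  # Python 3.10+ is required
        "package_version": installed_version,
        "java_path": shutil.which("java"),  # None if Java is not on PATH (optional)
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A java_path of None is not an error here, since Java is listed as optional.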

Learn more

See Installation for detailed setup instructions and Java configuration.


  • New to Knowledge Graphs?


    Learn about ontologies, RDF, OWL, and the Semantic Web standards that power PyGraft-gen.

    Start with the basics →

  • Two Flexible Workflows


    Generate from scratch using statistical parameters or extract structure from real ontologies to create synthetic instances.

    See both workflows →

  • Constraint-Aware Generation


    Enforces OWL constraints during generation and validates results with the HermiT and Pellet reasoners.

    Learn about constraints →

  • Production-Scale Performance


    Built to handle millions of entities and tens of millions of triples, with an optimized generation architecture and a fast sampling mode.

    Explore generation details →

  • Stochastic by Design


    Generates diverse, randomized graphs by default. Optionally set a random seed for reproducible results in testing and research.

    Configure generation →
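
To make the ideas of constraint-aware and seeded generation concrete, here is a small standard-library sketch. It does not use or reproduce PyGraft-gen's API or algorithm: the toy schema, the entity names, and the generate_triples function are all hypothetical. It only illustrates two general points from the list above: triples can be sampled so that each relation's domain and range are respected by construction, and a fixed seed makes the output repeatable.

```python
import random

# Hypothetical toy schema: relation -> (domain class, range class).
SCHEMA = {
    "worksFor": ("Person", "Company"),
    "locatedIn": ("Company", "City"),
}

# Hypothetical typed entities; each entity belongs to exactly one class.
ENTITIES = {
    "Person": ["alice", "bob", "carol"],
    "Company": ["acme", "globex"],
    "City": ["paris", "rennes"],
}


def generate_triples(n, seed=None):
    """Sample up to n distinct triples that respect domain and range.

    With seed=None the result varies between runs; with a fixed seed
    the same triples are produced every time.
    """
    rng = random.Random(seed)
    triples = set()
    attempts = 0
    max_attempts = 100 * n  # guard against asking for more triples than exist
    while len(triples) < n and attempts < max_attempts:
        attempts += 1
        relation = rng.choice(sorted(SCHEMA))
        domain, range_ = SCHEMA[relation]
        head = rng.choice(ENTITIES[domain])  # head drawn only from the domain class
        tail = rng.choice(ENTITIES[range_])  # tail drawn only from the range class
        triples.add((head, relation, tail))
    return sorted(triples)


if __name__ == "__main__":
    for triple in generate_triples(5, seed=42):
        print(triple)
```

Calling generate_triples(5, seed=42) twice returns identical lists, while omitting the seed gives a different sample on most runs. Checking results with a reasoner, as PyGraft-gen does, is a separate step that this sketch does not attempt.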


Research & Current Focus

🏆 Built on Award-Winning Research

PyGraft-gen is built on PyGraft, which received the Best Resource Paper Award at ESWC 2024.

Read the paper →

Current Focus: Object Properties

PyGraft-gen currently focuses on object properties (entity-to-entity relations). We're working to complete full object property support before moving to datatype properties. Future versions will add:

  • Blank-node class expressions – owl:Restriction, owl:unionOf, owl:intersectionOf, owl:complementOf, etc.
  • Value restrictions – owl:someValuesFrom, owl:allValuesFrom, owl:hasValue, etc.
  • Compound domain/range – Complex class expressions in property constraints
  • Higher-level disjointness – owl:AllDisjointClasses, owl:disjointUnionOf, etc.

Once object properties are complete, we'll add datatype properties (literal-valued attributes such as strings, integers, and dates).

These additions require defining how they should be modeled from the extracted ontology, enforced during generation, and integrated with existing constraints. These are design questions we're actively exploring.

Community & Support