PyGraft-gen
Generate synthetic RDFS/OWL ontologies and RDF Knowledge Graphs at scale.
PyGraft-gen creates synthetic Knowledge Graphs with realistic structure and constraint-aware generation, making it ideal for testing AI systems, benchmarking graph algorithms, and advancing research in scenarios where real data is sensitive or unavailable.
Installation
With pip:
# From PyPI (recommended)
pip install pygraft-gen
# From GitHub (latest from main branch)
pip install git+https://github.com/Orange-OpenSource/pygraft-gen.git
With uv:
# From PyPI (recommended)
uv add pygraft-gen
# From GitHub (latest from main branch)
uv pip install git+https://github.com/Orange-OpenSource/pygraft-gen.git
With Poetry:
# From PyPI (recommended)
poetry add pygraft-gen
# From GitHub (latest from main branch)
poetry add git+https://github.com/Orange-OpenSource/pygraft-gen.git
Requirements: Python 3.10+, Java (optional)
Learn more
See Installation for detailed setup instructions and Java configuration.
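Before installing, it can help to confirm that the environment meets the stated requirements. The snippet below is a minimal sketch using only the Python standard library. It is not part of PyGraft-gen, and `check_environment` is a hypothetical helper name. It assumes that Java, when present, is reachable as `java` on the `PATH`.

```python
import shutil
import sys

# Taken from the stated requirement "Python 3.10+".
MIN_PYTHON = (3, 10)


def check_environment() -> dict:
    """Report whether Python is recent enough and whether `java` is on PATH."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # Java is listed as optional, so a missing binary is not an error.
        "java_found": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    status = check_environment()
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}: {'ok' if status['python_ok'] else 'too old, 3.10+ required'}")
    print(f"Java on PATH: {'yes' if status['java_found'] else 'no (optional)'}")
```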
- New to Knowledge Graphs?
  Learn about ontologies, RDF, OWL, and the Semantic Web standards that power PyGraft-gen.
- Two Flexible Workflows
  Generate from scratch using statistical parameters, or extract structure from real ontologies to create synthetic instances.
- Constraint-Aware Generation
  Enforces OWL constraints during generation and validates results with the HermiT and Pellet reasoners.
- Production-Scale Performance
  Built to handle millions of entities and tens of millions of triples, with an optimized generation architecture and a fast sampling mode.
- Stochastic by Design
  Generates diverse, randomized graphs by default. Optionally set a random seed for reproducible results in testing and research.
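To make the ideas of constraint-aware and seeded generation concrete, here is a toy sketch in plain Python. It does not use or reproduce PyGraft-gen's API. The schema, the property names, and the `generate` function are all hypothetical. It only illustrates the general technique: sample candidate triples at random, keep those that satisfy domain, range, and irreflexivity constraints, and draw all randomness from one generator so that a fixed seed reproduces the same graph.

```python
import random

# Hypothetical toy schema: each object property has a domain and a range class.
PROPERTIES = {
    "worksFor": {"domain": "Person", "range": "Company"},
    "locatedIn": {"domain": "Company", "range": "City"},
    "knows": {"domain": "Person", "range": "Person", "irreflexive": True},
}
CLASSES = ["City", "Company", "Person"]


def generate(num_entities, num_triples, seed=None):
    """Sample up to `num_triples` triples that respect the toy constraints."""
    rng = random.Random(seed)  # seed=None gives a different graph on each run

    # Give every entity exactly one class, then index entities by class.
    entity_class = {f"e{i}": rng.choice(CLASSES) for i in range(num_entities)}
    by_class = {c: [e for e, k in entity_class.items() if k == c] for c in CLASSES}

    triples = set()
    names = sorted(PROPERTIES)
    attempts = 0
    # Cap the attempts so the loop ends even if the constraints are hard to meet.
    while len(triples) < num_triples and attempts < 50 * num_triples:
        attempts += 1
        prop = rng.choice(names)
        spec = PROPERTIES[prop]
        subjects = by_class[spec["domain"]]
        objects = by_class[spec["range"]]
        if not subjects or not objects:
            continue  # no entity can satisfy the domain or the range
        s, o = rng.choice(subjects), rng.choice(objects)
        if spec.get("irreflexive") and s == o:
            continue  # reject self-loops for irreflexive properties
        triples.add((s, prop, o))
    return entity_class, sorted(triples)


if __name__ == "__main__":
    _, graph = generate(num_entities=30, num_triples=40, seed=42)
    for triple in graph[:5]:
        print(triple)
```

Calling `generate` twice with the same seed returns identical output, while omitting the seed yields a different graph on most runs. Sampling subjects and objects directly from the domain and range classes means those two constraints hold by construction, so only irreflexivity needs a rejection step.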
Research & Current Focus
Built on Award-Winning Research
PyGraft-gen is built on PyGraft, which received the Best Resource Paper Award at ESWC 2024.
Current Focus: Object Properties
PyGraft-gen currently focuses on object properties (entity-to-entity relations). We're working to complete full object property support before moving to datatype properties. Future versions will add:
- Blank-node class expressions: owl:Restriction, owl:unionOf, owl:intersectionOf, owl:complementOf, etc.
- Value restrictions: owl:someValuesFrom, owl:allValuesFrom, owl:hasValue, etc.
- Compound domain/range: complex class expressions in property constraints
- Higher-level disjointness: owl:AllDisjointClasses, owl:disjointUnionOf, etc.
Once object properties are complete, we'll add datatype properties (literal-valued attributes like strings, integers, dates, etc.).
Each of these additions requires deciding how it should be modeled from the extracted ontology, enforced during generation, and integrated with the existing constraints. These are design questions we're actively exploring.
Community & Support
- Discussions — Questions, ideas, and support
- Report Issues — Found a bug?
- Publications — Read the research
- Contributing — Join development