Configuration File#
This page documents the structure of pygraft.config.{json/yml}, the main configuration file for PyGraft-gen.
This file controls all aspects of schema and Knowledge Graph generation, including class hierarchies, relation characteristics, instance populations, and output formats.
On this page:
- Structure - The three main configuration sections
- Formats - JSON vs YAML
- Configuration for Extracted Ontologies - Extraction workflow specifics
- FAQ - Common questions and edge cases
Configuration Scope
- general: Used by all commands
- schema (contains classes and relations): Only used during schema generation
- kg: Only used during KG generation
Critical: If you run pygraft kg, the schema section is completely ignored.
Structure#
The configuration file is organized into three main sections, each controlling different aspects of generation.
general#
Project-wide settings that apply to all generation tasks.
| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| project_name | Output folder name. Use "auto" for automatic timestamped folders (schema) or to reuse existing schema folders (KG). | Any string or "auto" |
| rdf_format | RDF serialization format for schema and KG output. | "xml", "ttl", "nt" |
| rng_seed | Random seed for reproducibility. When null, generation is stochastic; when an integer, all outputs become deterministic. | null (default) or any integer |
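To make the project_name behaviour concrete, here is a minimal Python sketch of how folder names could be resolved. It is not PyGraft-gen's code: the function names, the slug rule, and the folder-scanning logic are all hypothetical, and the timestamp layout is inferred only from the example folder name 2025-12-05_13-22-44 given in the FAQ on this page.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the example folder name 2025-12-05_13-22-44.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def schema_folder_name(project_name, now=None):
    """Hypothetical: folder name used when generating a schema."""
    if project_name == "auto":
        return (now or datetime.now()).strftime(TIMESTAMP_FORMAT)
    # Assumed slug rule: lowercase, runs of other characters become "-".
    return re.sub(r"[^a-z0-9]+", "-", project_name.lower()).strip("-")


def latest_timestamped_folder(folder_names):
    """Hypothetical: folder picked for KG generation when project_name is "auto"."""
    stamped = []
    for name in folder_names:
        try:
            stamped.append((datetime.strptime(name, TIMESTAMP_FORMAT), name))
        except ValueError:
            continue  # named folders such as "foaf" are not timestamped
    return max(stamped)[1] if stamped else None
```

With these assumptions, a named folder such as "foaf" is never selected by "auto" during KG generation, which is consistent with extracted ontologies requiring an explicit project_name.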
schema.classes#
Controls synthetic class hierarchy generation.
| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_classes | Total number of classes to generate. | Positive integer |
| max_hierarchy_depth | Maximum depth of the class hierarchy under owl:Thing. | >= 1 |
| avg_class_depth | Target average depth for classes in the hierarchy. | > 0 and < max_hierarchy_depth |
| avg_children_per_parent | Average number of direct subclasses per parent (controls branching/tree shape). | > 0.0 |
| avg_disjointness | Target proportion of class pairs marked as disjoint. Higher values create more logically separated class clusters. | 0.0-1.0 |
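The depth and branching parameters are easier to read against a concrete tree. The Python sketch below measures them on a hand-written toy hierarchy. It assumes that classes directly under owl:Thing have depth 1 and that owl:Thing itself is not counted as a parent; the generator's exact conventions may differ.

```python
from statistics import mean

# Toy hierarchy (child -> parent); "owl:Thing" is the implicit root.
parent_of = {
    "C1": "owl:Thing",
    "C2": "owl:Thing",
    "C3": "C1",
    "C4": "C1",
    "C5": "C3",
}


def depth(cls):
    """Number of subclass steps from cls up to owl:Thing."""
    steps = 0
    while cls != "owl:Thing":
        cls = parent_of[cls]
        steps += 1
    return steps


depths = {cls: depth(cls) for cls in parent_of}
max_depth = max(depths.values())   # compare with max_hierarchy_depth
avg_depth = mean(depths.values())  # compare with avg_class_depth

# Direct subclasses per parent, ignoring owl:Thing (an assumption).
children = {}
for child, parent in parent_of.items():
    if parent != "owl:Thing":
        children.setdefault(parent, []).append(child)
avg_children = mean(len(kids) for kids in children.values())

print(max_depth, avg_depth, avg_children)
```

For this toy tree the deepest class (C5) sits three levels below owl:Thing, which is why avg_class_depth has to stay below max_hierarchy_depth.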
schema.relations#
Controls object property generation and OWL/RDFS characteristics.
| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_relations | Number of object properties to generate. | Positive integer |
| relation_specificity | Target average depth of domain/range class assignments. Higher values encourage more specific class constraints. | 0.0-max_hierarchy_depth |
| prop_profiled_relations | Proportion of relations that receive rdfs:domain and/or rdfs:range constraints. | 0.0-1.0 |
| profile_side | Whether profiled relations have both domain and range or at least one. | "both", "partial" |
| prop_symmetric_relations | Proportion marked as owl:SymmetricProperty. | 0.0-1.0 |
| prop_inverse_relations | Proportion that participate in owl:inverseOf pairs. | 0.0-1.0 |
| prop_transitive_relations | Proportion declared as owl:TransitiveProperty. | 0.0-1.0 |
| prop_asymmetric_relations | Proportion declared as owl:AsymmetricProperty. | 0.0-1.0 |
| prop_reflexive_relations | Proportion declared as owl:ReflexiveProperty. Never receive domain/range constraints. | 0.0-1.0 |
| prop_irreflexive_relations | Proportion declared as owl:IrreflexiveProperty. | 0.0-1.0 |
| prop_functional_relations | Proportion declared as owl:FunctionalProperty (each subject has at most one object). | 0.0-1.0 |
| prop_inverse_functional_relations | Proportion declared as owl:InverseFunctionalProperty (each object has at most one subject). | 0.0-1.0 |
| prop_subproperties | Proportion assigned as subproperties in rdfs:subPropertyOf hierarchies. | 0.0-1.0 |
kg#
Controls Knowledge Graph instance generation.
| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_entities | Number of entity instances to generate. | > 0 |
| num_triples | Target number of triples before inference-driven expansion. | > 0 |
| enable_fast_generation | Speed optimization: creates smaller base KG then replicates it. Trades diversity for faster generation on large graphs. | true, false |
| relation_usage_uniformity | Controls distribution of relations across triples. Higher values produce more balanced usage. | 0.0-1.0 |
| prop_untyped_entities | Proportion of entities that remain untyped (no rdf:type assertion). | 0.0-1.0 |
| avg_specific_class_depth | Average depth of the most specific class assigned to each typed entity. | > 0.0 and <= max_hierarchy_depth |
| multityping | Whether entities may receive multiple most-specific types. | true, false |
| avg_types_per_entity | Average number of most-specific classes per typed entity. Must be 1.0 when multityping=false; >= 1.0 otherwise. | >= 1.0 |
| check_kg_consistency | Whether to run post-generation HermiT reasoning to verify schema+KG consistency. | true, false |
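The ranges in the table above can be checked before starting a long generation run. The following Python sketch is one way to do that. The function check_kg is hypothetical and is not part of PyGraft-gen; max_hierarchy_depth is passed in explicitly because it belongs to the schema, not to the kg section.

```python
def check_kg(kg: dict, max_hierarchy_depth: int) -> list[str]:
    """Return the problems found in a kg section (empty list if none)."""
    problems = []
    for key in ("num_entities", "num_triples"):
        if kg[key] <= 0:
            problems.append(f"{key} must be > 0")
    for key in ("relation_usage_uniformity", "prop_untyped_entities"):
        if not 0.0 <= kg[key] <= 1.0:
            problems.append(f"{key} must be between 0.0 and 1.0")
    if not 0.0 < kg["avg_specific_class_depth"] <= max_hierarchy_depth:
        problems.append(
            "avg_specific_class_depth must be > 0.0 and <= max_hierarchy_depth"
        )
    if kg["multityping"]:
        if kg["avg_types_per_entity"] < 1.0:
            problems.append("avg_types_per_entity must be >= 1.0")
    elif kg["avg_types_per_entity"] != 1.0:
        problems.append("avg_types_per_entity must be 1.0 when multityping is false")
    return problems


example_kg = {
    "num_entities": 3000,
    "num_triples": 30000,
    "enable_fast_generation": True,
    "relation_usage_uniformity": 0.9,
    "prop_untyped_entities": 0.0,
    "avg_specific_class_depth": 2.0,
    "multityping": False,
    "avg_types_per_entity": 1.0,
    "check_kg_consistency": True,
}
print(check_kg(example_kg, max_hierarchy_depth=4))
```

Returning a list instead of raising on the first failure lets a caller report every out-of-range value at once.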
Formats#
PyGraft-gen supports both JSON and YAML configuration formats.
pygraft init --format json # Creates pygraft.config.json
pygraft init --format yaml # Creates pygraft.config.yml
Both formats are functionally equivalent - use whichever you prefer.
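If you need to read the configuration from your own scripts, a loader can dispatch on the file extension. This Python sketch is an illustration only. How PyGraft-gen parses the file internally is not shown here; parse_config and load_config are hypothetical helper names, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path


def parse_config(text: str, suffix: str) -> dict:
    """Parse config text according to its file extension."""
    if suffix == ".json":
        return json.loads(text)
    if suffix in (".yml", ".yaml"):
        # Imported lazily so JSON-only users do not need PyYAML.
        import yaml
        return yaml.safe_load(text)
    raise ValueError(f"Unsupported config extension: {suffix!r}")


def load_config(path) -> dict:
    """Read a config file and return it as a plain dictionary."""
    path = Path(path)
    return parse_config(path.read_text(encoding="utf-8"), path.suffix.lower())
```

Both branches return the same dictionary shape, so code that reads config["kg"] or config["general"] does not need to know which format was used.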
Example Configuration#
{
  "general": {
    "project_name": "auto",
    "rdf_format": "ttl",
    "rng_seed": null
  },
  "schema": {
    "classes": {
      "num_classes": 50,
      "max_hierarchy_depth": 4,
      "avg_class_depth": 2.5,
      "avg_children_per_parent": 2.0,
      "avg_disjointness": 0.3
    },
    "relations": {
      "num_relations": 50,
      "relation_specificity": 2.5,
      "prop_profiled_relations": 0.9,
      "profile_side": "both",
      "prop_symmetric_relations": 0.3,
      "prop_inverse_relations": 0.3,
      "prop_transitive_relations": 0.1,
      "prop_asymmetric_relations": 0.0,
      "prop_reflexive_relations": 0.3,
      "prop_irreflexive_relations": 0.0,
      "prop_functional_relations": 0.0,
      "prop_inverse_functional_relations": 0.0,
      "prop_subproperties": 0.3
    }
  },
  "kg": {
    "num_entities": 3000,
    "num_triples": 30000,
    "enable_fast_generation": true,
    "relation_usage_uniformity": 0.9,
    "prop_untyped_entities": 0.0,
    "avg_specific_class_depth": 2.0,
    "multityping": false,
    "avg_types_per_entity": 1.0,
    "check_kg_consistency": true
  }
}
Configuration for Extracted Ontologies#
When using the ontology extraction workflow (pygraft extract), the configuration file is partially auto-generated.
What gets auto-generated:
- Run extraction: pygraft extract ontology.ttl
- PyGraft creates or updates a config file with:
  - general.project_name: Set to the extraction output folder name
  - general.rdf_format: Matches the ontology format
  - schema section: Auto-filled with extraction statistics (informational only)
  - kg section: Left untouched if the file exists, otherwise populated with default template values
What you configure:
Edit the generated config to set KG generation parameters in the kg section, then run:
pygraft kg pygraft.config.json
Schema Section is Read-Only
After extraction, the schema section shows what was found but does not control generation. Only the kg section matters.
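When the kg section is edited from a script instead of by hand, it helps to change only that section and leave the extracted schema statistics as they are. A minimal Python sketch, assuming the config has already been loaded into a dictionary; with_kg_overrides is a hypothetical helper, not a PyGraft-gen function.

```python
import copy


def with_kg_overrides(config: dict, **overrides) -> dict:
    """Return a copy of config with selected kg parameters replaced."""
    unknown = set(overrides) - set(config["kg"])
    if unknown:
        # Guard against typos such as "num_entites".
        raise KeyError(f"Not a kg parameter in this config: {sorted(unknown)}")
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    updated["kg"].update(overrides)
    return updated


extracted = {
    "general": {"project_name": "foaf", "rdf_format": "ttl", "rng_seed": None},
    "schema": {"classes": {}, "relations": {}},  # informational after extraction
    "kg": {"num_entities": 3000, "num_triples": 30000},
}
smaller = with_kg_overrides(extracted, num_entities=500, num_triples=5000)
print(smaller["kg"])
```

The result could then be written back with json.dump before running pygraft kg on the file.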
Learn More
See Ontology Extraction for details on this workflow.
FAQ#
How does project_name work differently across commands?
The project_name parameter behaves differently depending on the command:
Schema generation (pygraft schema or pygraft build):
- "auto": Creates a timestamped folder (e.g., 2025-12-05_13-22-44)
- Custom name: Creates/reuses a named folder (normalized and slugified)
KG generation (pygraft kg):
- "auto": Reuses the most recent synthetic (timestamped) schema folder
- Custom name: Reuses an existing folder (required for extracted ontologies like "noria", "foaf")
What are prop_* parameters?
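All prop_* parameters are proportions between 0.0 and 1.0. In schema.relations they set the fraction of relations that receive a given characteristic (for example, prop_symmetric_relations: 0.3 targets roughly 30% of relations); kg.prop_untyped_entities sets the fraction of entities left untyped.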
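Because each relation-level prop_* value is a proportion, multiplying it by num_relations gives a rough idea of how many relations are targeted. The Python sketch below does only that arithmetic. The helper approx_counts is hypothetical, and the counts the generator actually produces may differ from these estimates.

```python
def approx_counts(relations: dict) -> dict:
    """Estimate how many relations each prop_* setting targets."""
    total = relations["num_relations"]
    return {
        key: round(value * total)
        for key, value in relations.items()
        if key.startswith("prop_")
    }


relations = {
    "num_relations": 50,
    "prop_profiled_relations": 0.9,
    "prop_symmetric_relations": 0.3,
    "prop_transitive_relations": 0.1,
}
print(approx_counts(relations))
```

With the values from the example configuration, this estimates 45 profiled, 15 symmetric, and 5 transitive relations out of 50.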
How does rng_seed affect generation?
The rng_seed parameter affects all random decisions throughout generation. Set it to an integer for fully reproducible results, or leave it as null for stochastic generation.
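The effect of a fixed seed can be illustrated with Python's standard random module. This is a generic sketch of seeded versus unseeded generation, not PyGraft-gen's internal code; sample_edges is a hypothetical stand-in for any random decision made during generation.

```python
import random


def sample_edges(seed, count=5):
    """Draw a few random (subject, object) index pairs."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return [(rng.randrange(100), rng.randrange(100)) for _ in range(count)]


# The same integer seed always reproduces the same sequence.
print(sample_edges(42) == sample_edges(42))
# With None, two runs are very unlikely to match.
print(sample_edges(None) == sample_edges(None))
```

An integer rng_seed plays the role of 42 here, and null corresponds to None.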
What happens when multityping=false?
When multityping=false, avg_types_per_entity must be exactly 1.0 (automatically enforced if not specified).
Why don't reflexive relations get domain/range constraints?
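Reflexive relations never receive domain/range constraints because owl:ReflexiveProperty relates every individual to itself. A domain or range on such a relation would therefore apply to every entity in the graph, forcing all of them into that class.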
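A toy inference makes the problem visible. The Python sketch below hand-rolls the two entailments involved: reflexivity yields (e, p, e) for every individual, and rdfs:domain types the subject of each such triple. It is a simplified illustration, not a reasoner; the entity names, class names, and the function are hypothetical.

```python
entities = ["e1", "e2", "e3"]
types = {"e1": {"Person"}, "e2": {"City"}, "e3": set()}


def apply_reflexive_with_domain(entities, types, domain_class):
    """Types entailed if a reflexive property had rdfs:domain domain_class."""
    inferred = {entity: set(classes) for entity, classes in types.items()}
    for entity in entities:
        # Reflexivity entails (entity, p, entity) for every individual,
        # and the domain axiom then types that subject as domain_class.
        inferred[entity].add(domain_class)
    return inferred


inferred = apply_reflexive_with_domain(entities, types, "Person")
print(inferred)
```

Every entity ends up typed as Person, including the City instance and the previously untyped one. If Person and City were declared disjoint, the graph would become inconsistent.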