
Configuration File

This page documents the structure of pygraft.config.{json/yml}, the main configuration file for PyGraft-gen.

This file controls all aspects of schema and Knowledge Graph generation, including class hierarchies, relation characteristics, instance populations, and output formats.


Configuration Scope

  • general: Used by all commands
  • schema (contains classes and relations): Only used during schema generation
  • kg: Only used during KG generation

Critical: When you run pygraft kg, the schema section is ignored entirely.


Structure

The configuration file is organized into three main sections, each controlling different aspects of generation.

general

Project-wide settings that apply to all generation tasks.

| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| project_name | Output folder name. Use "auto" to create a timestamped folder (schema generation) or to reuse the most recent synthetic schema folder (KG generation). | Any string or "auto" |
| rdf_format | RDF serialization format for schema and KG output. | "xml", "ttl", "nt" |
| rng_seed | Random seed for reproducibility. When null, generation is stochastic; when an integer, all outputs are deterministic. | null (default) or any integer |
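As a minimal sketch of the constraints in this table, the hypothetical validate_general helper below (not part of PyGraft-gen's API) checks a parsed general section and returns a list of problems. It assumes the section has already been loaded into a Python dict, so JSON null appears as None.

```python
# Hypothetical helper, not part of PyGraft-gen: checks a "general" section
# against the allowed values documented for it.
ALLOWED_RDF_FORMATS = {"xml", "ttl", "nt"}


def validate_general(general: dict) -> list[str]:
    """Return a list of problems in a "general" section (empty if it looks valid)."""
    errors = []
    name = general.get("project_name")
    if not isinstance(name, str) or not name:
        errors.append('project_name must be a non-empty string or "auto"')
    if general.get("rdf_format") not in ALLOWED_RDF_FORMATS:
        errors.append(f"rdf_format must be one of {sorted(ALLOWED_RDF_FORMATS)}")
    seed = general.get("rng_seed")
    # JSON null arrives as None; bool is a subclass of int in Python, so reject it explicitly.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        errors.append("rng_seed must be null or an integer")
    return errors


print(validate_general({"project_name": "auto", "rdf_format": "ttl", "rng_seed": None}))  # []
```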

schema.classes

Controls synthetic class hierarchy generation.

| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_classes | Total number of classes to generate. | Positive integer |
| max_hierarchy_depth | Maximum depth of the class hierarchy under owl:Thing. | >= 1 |
| avg_class_depth | Target average depth for classes in the hierarchy. | > 0 and < max_hierarchy_depth |
| avg_children_per_parent | Average number of direct subclasses per parent (controls branching/tree shape). | > 0.0 |
| avg_disjointness | Target proportion of class pairs marked as disjoint. Higher values create more logically separated class clusters. | 0.0-1.0 |
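The ranges in this table can be checked before running pygraft schema. The following is a hypothetical sketch (validate_classes is an illustrative name, not a PyGraft-gen function) that reports values outside the documented ranges, applied to the class settings from the example configuration on this page.

```python
def validate_classes(classes: dict) -> list[str]:
    """Hypothetical check of a "schema.classes" section; raises KeyError if a key is missing."""
    errors = []
    num = classes["num_classes"]
    if isinstance(num, bool) or not isinstance(num, int) or num <= 0:
        errors.append("num_classes must be a positive integer")
    max_depth = classes["max_hierarchy_depth"]
    if max_depth < 1:
        errors.append("max_hierarchy_depth must be >= 1")
    # avg_class_depth is bounded strictly by max_hierarchy_depth.
    if not 0 < classes["avg_class_depth"] < max_depth:
        errors.append("avg_class_depth must be > 0 and < max_hierarchy_depth")
    if classes["avg_children_per_parent"] <= 0.0:
        errors.append("avg_children_per_parent must be > 0.0")
    if not 0.0 <= classes["avg_disjointness"] <= 1.0:
        errors.append("avg_disjointness must be between 0.0 and 1.0")
    return errors


example = {"num_classes": 50, "max_hierarchy_depth": 4, "avg_class_depth": 2.5,
           "avg_children_per_parent": 2.0, "avg_disjointness": 0.3}
print(validate_classes(example))  # []
print(validate_classes({**example, "avg_class_depth": 4}))
# ['avg_class_depth must be > 0 and < max_hierarchy_depth']
```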

schema.relations

Controls object property generation and OWL/RDFS characteristics.

| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_relations | Number of object properties to generate. | Positive integer |
| relation_specificity | Target average depth of the classes used in domain/range assignments. Higher values encourage more specific class constraints. | 0.0 to max_hierarchy_depth |
| prop_profiled_relations | Proportion of relations that receive rdfs:domain and/or rdfs:range constraints. | 0.0-1.0 |
| profile_side | Whether each profiled relation receives both a domain and a range ("both") or at least one of the two ("partial"). | "both", "partial" |
| prop_symmetric_relations | Proportion of relations declared as owl:SymmetricProperty. | 0.0-1.0 |
| prop_inverse_relations | Proportion of relations that participate in owl:inverseOf pairs. | 0.0-1.0 |
| prop_transitive_relations | Proportion of relations declared as owl:TransitiveProperty. | 0.0-1.0 |
| prop_asymmetric_relations | Proportion of relations declared as owl:AsymmetricProperty. | 0.0-1.0 |
| prop_reflexive_relations | Proportion of relations declared as owl:ReflexiveProperty. These relations never receive domain/range constraints. | 0.0-1.0 |
| prop_irreflexive_relations | Proportion of relations declared as owl:IrreflexiveProperty. | 0.0-1.0 |
| prop_functional_relations | Proportion of relations declared as owl:FunctionalProperty (each subject has at most one object). | 0.0-1.0 |
| prop_inverse_functional_relations | Proportion of relations declared as owl:InverseFunctionalProperty (each object has at most one subject). | 0.0-1.0 |
| prop_subproperties | Proportion of relations placed in rdfs:subPropertyOf hierarchies. | 0.0-1.0 |
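Because relation_specificity is bounded by max_hierarchy_depth from the classes section, a check of this section needs that value as well. A hypothetical sketch (validate_relations is illustrative, not PyGraft-gen's API), which treats every key starting with prop_ as a proportion and uses the relation settings from the example configuration:

```python
def validate_relations(relations: dict, max_hierarchy_depth: int) -> list[str]:
    """Hypothetical check of a "schema.relations" section against the documented ranges."""
    errors = []
    num = relations["num_relations"]
    if isinstance(num, bool) or not isinstance(num, int) or num <= 0:
        errors.append("num_relations must be a positive integer")
    if not 0.0 <= relations["relation_specificity"] <= max_hierarchy_depth:
        errors.append("relation_specificity must be between 0.0 and max_hierarchy_depth")
    if relations["profile_side"] not in ("both", "partial"):
        errors.append('profile_side must be "both" or "partial"')
    # Every prop_* key in this section is a proportion of num_relations.
    for key, value in relations.items():
        if key.startswith("prop_") and not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be between 0.0 and 1.0")
    return errors


example_relations = {
    "num_relations": 50, "relation_specificity": 2.5,
    "prop_profiled_relations": 0.9, "profile_side": "both",
    "prop_symmetric_relations": 0.3, "prop_inverse_relations": 0.3,
    "prop_transitive_relations": 0.1, "prop_asymmetric_relations": 0.0,
    "prop_reflexive_relations": 0.3, "prop_irreflexive_relations": 0.0,
    "prop_functional_relations": 0.0, "prop_inverse_functional_relations": 0.0,
    "prop_subproperties": 0.3,
}
print(validate_relations(example_relations, max_hierarchy_depth=4))  # []
```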

kg

Controls Knowledge Graph instance generation.

| Parameter | Description | Allowed / Typical Values |
|---|---|---|
| num_entities | Number of entity instances to generate. | > 0 |
| num_triples | Target number of triples before inference-driven expansion. | > 0 |
| enable_fast_generation | Speed optimization: generates a smaller base KG, then replicates it. Trades diversity for faster generation on large graphs. | true, false |
| relation_usage_uniformity | Controls how relations are distributed across triples. Higher values produce more balanced usage. | 0.0-1.0 |
| prop_untyped_entities | Proportion of entities that remain untyped (no rdf:type assertion). | 0.0-1.0 |
| avg_specific_class_depth | Average depth of the most specific class assigned to each typed entity. | > 0.0 and <= max_hierarchy_depth |
| multityping | Whether entities may receive multiple most-specific types. | true, false |
| avg_types_per_entity | Average number of most-specific classes per typed entity. Must be exactly 1.0 when multityping=false; >= 1.0 otherwise. | >= 1.0 |
| check_kg_consistency | Whether to run HermiT reasoning after generation to verify that the schema and KG are consistent together. | true, false |
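A matching hypothetical sketch for the kg section (validate_kg is illustrative, not part of PyGraft-gen). Since pygraft kg ignores the schema section, the sketch takes max_hierarchy_depth as an argument, assumed to come from the schema the KG is generated against. The multityping rule for avg_types_per_entity is left out here.

```python
def validate_kg(kg: dict, max_hierarchy_depth: int) -> list[str]:
    """Hypothetical check of a "kg" section against the documented ranges."""
    errors = []
    for key in ("num_entities", "num_triples"):
        if kg[key] <= 0:
            errors.append(f"{key} must be > 0")
    for key in ("relation_usage_uniformity", "prop_untyped_entities"):
        if not 0.0 <= kg[key] <= 1.0:
            errors.append(f"{key} must be between 0.0 and 1.0")
    if not 0.0 < kg["avg_specific_class_depth"] <= max_hierarchy_depth:
        errors.append("avg_specific_class_depth must be > 0.0 and <= max_hierarchy_depth")
    # JSON true/false become Python True/False; anything else is rejected.
    for key in ("enable_fast_generation", "multityping", "check_kg_consistency"):
        if not isinstance(kg[key], bool):
            errors.append(f"{key} must be true or false")
    return errors


example_kg = {
    "num_entities": 3000, "num_triples": 30000, "enable_fast_generation": True,
    "relation_usage_uniformity": 0.9, "prop_untyped_entities": 0.0,
    "avg_specific_class_depth": 2.0, "multityping": False,
    "avg_types_per_entity": 1.0, "check_kg_consistency": True,
}
print(validate_kg(example_kg, max_hierarchy_depth=4))  # []
```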

Formats

PyGraft-gen supports both JSON and YAML configuration formats.

pygraft init --format json  # Creates pygraft.config.json
pygraft init --format yaml  # Creates pygraft.config.yml

Both formats are functionally equivalent; use whichever you prefer.
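If you read the configuration from your own Python scripts, the file extension is enough to pick a parser. This is a sketch, not PyGraft-gen's loader: parse_config and load_config are hypothetical names, JSON uses the standard library, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path


def parse_config(text: str, suffix: str) -> dict:
    """Parse configuration text according to its file extension."""
    suffix = suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in (".yml", ".yaml"):
        import yaml  # third-party PyYAML; assumed to be installed
        return yaml.safe_load(text)
    raise ValueError(f"unsupported config extension: {suffix!r}")


def load_config(path: str) -> dict:
    """Read pygraft.config.json or pygraft.config.yml into a plain dict."""
    p = Path(path)
    # Path.suffix is the last extension only, e.g. ".json" for pygraft.config.json.
    return parse_config(p.read_text(encoding="utf-8"), p.suffix)
```

Either file yields the same nested dict, which is what makes the two formats interchangeable.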

Example Configuration

{
  "general": {
    "project_name": "auto",
    "rdf_format": "ttl",
    "rng_seed": null
  },

  "schema": {
    "classes": {
      "num_classes": 50,
      "max_hierarchy_depth": 4,
      "avg_class_depth": 2.5,
      "avg_children_per_parent": 2.0,
      "avg_disjointness": 0.3
    },

    "relations": {
      "num_relations": 50,
      "relation_specificity": 2.5,
      "prop_profiled_relations": 0.9,
      "profile_side": "both",

      "prop_symmetric_relations": 0.3,
      "prop_inverse_relations": 0.3,
      "prop_transitive_relations": 0.1,
      "prop_asymmetric_relations": 0.0,
      "prop_reflexive_relations": 0.3,
      "prop_irreflexive_relations": 0.0,
      "prop_functional_relations": 0.0,
      "prop_inverse_functional_relations": 0.0,
      "prop_subproperties": 0.3
    }
  },

  "kg": {
    "num_entities": 3000,
    "num_triples": 30000,

    "enable_fast_generation": true,

    "relation_usage_uniformity": 0.9,
    "prop_untyped_entities": 0.0,

    "avg_specific_class_depth": 2.0,

    "multityping": false,
    "avg_types_per_entity": 1.0,

    "check_kg_consistency": true
  }
}

Configuration for Extracted Ontologies

When using the ontology extraction workflow (pygraft extract), the configuration file is partially auto-generated.

What gets auto-generated:

  1. Run extraction: pygraft extract ontology.ttl
  2. PyGraft creates or updates a config file with:
      • general.project_name: Set to the extraction output folder name
      • general.rdf_format: Matches the ontology's format
      • schema section: Auto-filled with extraction statistics (informational only)
      • kg section: Left untouched if the file already exists; otherwise populated with default template values
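The update rules in this list can be sketched as a single merge step. This is a hypothetical illustration, not PyGraft-gen code: merge_extracted_config and its parameters are invented names, and existing is the parsed config file, or None when no file exists yet.

```python
import copy
from typing import Optional


def merge_extracted_config(existing: Optional[dict], folder_name: str, rdf_format: str,
                           schema_stats: dict, default_kg: dict) -> dict:
    """Hypothetical sketch of how extraction updates the config file contents."""
    config = copy.deepcopy(existing) if existing is not None else {}
    general = config.setdefault("general", {})
    general["project_name"] = folder_name  # extraction output folder name
    general["rdf_format"] = rdf_format  # same serialization as the input ontology
    config["schema"] = copy.deepcopy(schema_stats)  # informational only
    if existing is None:
        config["kg"] = copy.deepcopy(default_kg)  # new file: template defaults
    # Existing file: whatever kg section it had is left untouched.
    return config
```

Working on a deep copy keeps the caller's original dict unchanged, so other general settings such as rng_seed survive the update.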

What you configure:

Edit the generated config to set KG generation parameters in the kg section, then run:

pygraft kg pygraft.config.json

Schema Section is Read-Only

After extraction, the schema section shows what was found but does not control generation. Only the kg section matters.

Learn More

See Ontology Extraction for details on this workflow.


FAQ

How does project_name work differently across commands?

The project_name parameter behaves differently depending on the command:

Schema generation (pygraft schema or pygraft build):

  • "auto": Creates a timestamped folder (e.g., 2025-12-05_13-22-44)
  • Custom name: Creates or reuses a folder with that name (normalized and slugified)

KG generation (pygraft kg):

  • "auto": Reuses the most recent synthetic (timestamped) schema folder
  • Custom name: Reuses the existing folder with that name (required for extracted ontologies, e.g., "noria" or "foaf")
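These rules can be sketched as two hypothetical functions. The timestamp format is inferred from the example folder name, and slugify is an assumed normalization (lowercase, runs of other characters collapsed to "-"); PyGraft-gen's actual normalization may differ. For KG generation, the sketch takes a list of existing folder names instead of scanning the disk.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed from the documented example folder name 2025-12-05_13-22-44.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}")


def slugify(name: str) -> str:
    """Hypothetical normalization: lowercase, collapse other characters into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def schema_folder_name(project_name: str, now: Optional[datetime] = None) -> str:
    """Folder used by schema generation: a fresh timestamp for "auto", else the slug."""
    if project_name == "auto":
        return (now or datetime.now()).strftime(TIMESTAMP_FORMAT)
    return slugify(project_name)


def kg_folder_name(project_name: str, existing_folders: list[str]) -> str:
    """Folder reused by KG generation; it must already exist."""
    if project_name == "auto":
        stamped = [f for f in existing_folders if TIMESTAMP_RE.fullmatch(f)]
        if not stamped:
            raise FileNotFoundError("no timestamped schema folder to reuse")
        # Zero-padded, year-first timestamps sort chronologically as plain strings.
        return max(stamped)
    folder = slugify(project_name)
    if folder not in existing_folders:
        raise FileNotFoundError(f"schema folder {folder!r} does not exist")
    return folder
```

Only timestamped folders are candidates for "auto", which is why an extracted ontology folder such as "foaf" must be named explicitly.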

What are prop_* parameters?

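All prop_* parameters are proportions between 0.0 and 1.0. In schema.relations, each one sets the fraction of relations (out of num_relations) that receive a given characteristic; prop_untyped_entities in the kg section is instead a fraction of num_entities.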
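To see what a proportion means in practice, the hypothetical proportion_to_count helper below converts a target proportion into an approximate count. Plain rounding is an assumption, and since these values are targets, the counts PyGraft-gen actually produces may differ (for instance, reflexive relations can never be profiled).

```python
def proportion_to_count(proportion: float, total: int) -> int:
    """Hypothetical: turn a prop_* target into an approximate item count."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError(f"proportion must be in [0.0, 1.0], got {proportion}")
    # Assumption: plain rounding; Python's round() sends exact .5 ties to the even integer.
    return round(proportion * total)


# Values from the example configuration (num_relations = 50, num_entities = 3000).
relation_props = {"prop_profiled_relations": 0.9, "prop_symmetric_relations": 0.3,
                  "prop_transitive_relations": 0.1}
print({key: proportion_to_count(p, 50) for key, p in relation_props.items()})
# {'prop_profiled_relations': 45, 'prop_symmetric_relations': 15, 'prop_transitive_relations': 5}
print(proportion_to_count(0.0, 3000))  # prop_untyped_entities = 0.0 -> 0 untyped entities
```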

How does rng_seed affect generation?

The rng_seed parameter affects all random decisions throughout generation. Set it to an integer for fully reproducible results, or leave it as null for stochastic generation.
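The effect of a seed can be illustrated with Python's standard random.Random. This is a general illustration of seeded pseudo-random generation, not PyGraft-gen's code; its internal generator may be different.

```python
import random
from typing import Optional


def sample_decisions(seed: Optional[int], n: int = 5) -> list[float]:
    """Draw n pseudo-random numbers, standing in for generation decisions."""
    # seed=None mirrors rng_seed: null; Random() then seeds itself from OS entropy or the clock.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


print(sample_decisions(42) == sample_decisions(42))  # True: same seed, same sequence
print(sample_decisions(None))  # differs from run to run
```

Using a dedicated Random instance rather than the module-level functions keeps the seeded sequence isolated from any other code that draws random numbers.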

What happens when multityping=false?

When multityping=false, avg_types_per_entity must be exactly 1.0 (automatically enforced if not specified).
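The rule can be expressed as a small normalization step. normalize_typing is a hypothetical helper, not PyGraft-gen code; filling in 1.0 when avg_types_per_entity is omitted follows the text above, while using the same default when multityping=true is an assumption.

```python
def normalize_typing(kg: dict) -> dict:
    """Return a copy of the kg section with the multityping rule applied."""
    kg = dict(kg)
    # Omitted value is filled in as 1.0 (assumed to apply for multityping=true as well).
    avg = kg.setdefault("avg_types_per_entity", 1.0)
    if not kg["multityping"] and avg != 1.0:
        raise ValueError("avg_types_per_entity must be exactly 1.0 when multityping=false")
    if kg["multityping"] and avg < 1.0:
        raise ValueError("avg_types_per_entity must be >= 1.0 when multityping=true")
    return kg


print(normalize_typing({"multityping": False}))
# {'multityping': False, 'avg_types_per_entity': 1.0}
```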

Why don't reflexive relations get domain/range constraints?

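In OWL, owl:ReflexiveProperty is global: every individual is related to itself through the property. If a reflexive relation had an rdfs:domain or rdfs:range, a reasoner would infer that every individual belongs to that class, which is rarely intended and can conflict with disjointness axioms. This is why reflexive relations never receive domain/range constraints.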
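The inference can be reproduced with the RDFS domain rule alone: if p has rdfs:domain C and (s, p, o) holds, then s is of type C. In this sketch the property ex:relatedTo, the class ex:Person and the individuals are invented for illustration, and a plain Python set stands in for a reasoner.

```python
def infer_domain_types(triples, domains):
    """RDFS domain rule: (s, p, o) with p rdfs:domain C entails (s, rdf:type, C)."""
    return {(s, "rdf:type", domains[p]) for s, p, _o in triples if p in domains}


individuals = ["ex:alice", "ex:acme", "ex:paris"]
# owl:ReflexiveProperty: every individual x is related to itself, so (x, p, x) holds for all x.
reflexive_triples = [(x, "ex:relatedTo", x) for x in individuals]
inferred = infer_domain_types(reflexive_triples, {"ex:relatedTo": "ex:Person"})
print(sorted(inferred))
# [('ex:acme', 'rdf:type', 'ex:Person'), ('ex:alice', 'rdf:type', 'ex:Person'), ('ex:paris', 'rdf:type', 'ex:Person')]
```

If ex:acme were also typed with an organization class declared disjoint with ex:Person, the resulting KG would be inconsistent.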