Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project tries to adhere to Semantic Versioning.
[Unreleased]
Changes committed but not yet released will appear here.
Roadmap
Planned features are tracked in Contributing, not here. Unreleased is for completed work awaiting release.
v0.0.11 (2026-01-22)
Documentation Overhaul & CLI Restructuring
Migrated documentation from Sphinx/Furo to MkDocs Material, restructured CLI into modular architecture, and introduced standalone consistency explanation
Breaking Changes
- Configuration files must now nest `classes` and `relations` under a new `schema` key
- Removed `explain_inconsistency` parameter from `generate_kg()`
- Removed `--explain` flag from `kg` and `build` commands
- Removed `reasoner()` orchestrator from `pygraft.utils.reasoning`
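For configs written before this release, the first breaking change amounts to moving two top-level sections under one new key. The sketch below shows that move, assuming the old layout keeps `classes` and `relations` at the top level. `nest_schema_sections` is a hypothetical helper, not part of the PyGraft API, and the keys inside each section are placeholders.

```python
from copy import deepcopy


def nest_schema_sections(old_config: dict) -> dict:
    """Return a copy of old_config with classes/relations moved under "schema".

    Hypothetical helper for illustration; not shipped with the project.
    """
    new_config = deepcopy(old_config)
    schema = new_config.setdefault("schema", {})
    for section in ("classes", "relations"):
        if section in new_config:
            schema[section] = new_config.pop(section)
    return new_config


# Placeholder values: only the section names come from this changelog.
old = {
    "general": {"project_name": "auto"},
    "classes": {"num_classes": 50},
    "relations": {"num_relations": 20},
    "kg": {"num_entities": 1000},
}
new = nest_schema_sections(old)
print(sorted(new))            # top-level sections after migration
print(sorted(new["schema"]))  # sections now nested under "schema"
```

The input is copied rather than modified, so the old config can be kept around for comparison.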
Added
- Standalone explain command: New `pygraft explain` CLI command and `explain_kg()` API for analyzing KG inconsistencies with flexible reasoner selection (`--reasoner hermit|pellet|both`)
- Separated reasoner functions: Split `reasoner()` into `reasoner_hermit()` and `reasoner_pellet()` for clear separation of concerns
Changed
- Migrated documentation to MkDocs: Replaced Sphinx/Furo with MkDocs Material theme, rewriting all documentation from scratch
- Restructured CLI into modular subpackage: One file per command (`init`, `schema`, `kg`, `build`, `extract`, `explain`) with shared utilities in `formatting.py`, `validators.py`, and `extract_helper.py`
- Nested schema configuration: Grouped `classes` and `relations` under new `schema` section to better reflect logical separation from KG generation parameters
- Modern CLI type hints: Migrated all CLI parameters to `Annotated` style with improved docstrings
Fixed
- YAML config support in KG generation: Replaced direct `json.load()` with `load_config()` to properly handle both JSON and YAML configuration files
- Auto project_name for KG mode: Now only selects synthetic schemas (timestamped folders) when using `auto`, preventing accidental KG generation against extracted ontologies
- Config cleanup on extraction: `pygraft extract` now produces canonical config structure, removing extra keys and restoring missing defaults while preserving KG section values
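The YAML fix above comes down to choosing a parser by file type instead of always calling `json.load()`. The sketch below shows that idea only; it is not the project's `load_config()`, whose signature and behavior are not described here. `load_config_sketch` is a hypothetical name, and the YAML branch assumes the third-party PyYAML package is installed.

```python
import json
from pathlib import Path


def load_config_sketch(path: str) -> dict:
    """Load a JSON or YAML config, choosing the parser by file extension."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    text = file_path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yml", ".yaml"}:
        # PyYAML is imported lazily so JSON-only use does not require it.
        import yaml

        return yaml.safe_load(text)
    raise ValueError(f"Unsupported config format: {suffix!r}")
```

Rejecting unknown extensions explicitly avoids silently parsing a file with the wrong loader.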
v0.0.10 (2026-01-08)
KG Generation Optimization
Major performance overhaul enabling practical large-scale generation of Knowledge Graphs with millions of entities and tens of millions of triples
Changed
- Restructured KG generator into specialized modules with clear responsibilities: types, structures, config, schema loading, entity creation, batch generation, and output serialization
- Switched to integer-based internal processing: All operations use lightweight integer IDs instead of strings, dramatically reducing memory overhead. String identifiers only used during final RDF serialization
- Moved from post-generation cleanup to inline validation: Triples validated as generated using pre-computed constraint data, eliminating expensive multi-pass graph scans
- Implemented batch sampling with vectorized operations: Generate and validate thousands of triples per iteration using NumPy arrays with pre-computed constraint caches
- Introduced two-phase constraint filtering: Fast vectorized phase eliminates invalid triples, deep semantic phase only processes remaining candidates
- Simplified state management: Generation state organized as structured attributes rather than nested function parameters
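The integer-based processing described above can be pictured as interning: each string identifier gets a small integer once, triples are stored as integer tuples, and strings come back only at serialization. The sketch below illustrates that technique in plain Python. `IdInterner` and the `ex:` identifiers are hypothetical and do not reflect the generator's actual data structures.

```python
class IdInterner:
    """Assign a dense integer ID to each distinct string identifier."""

    def __init__(self) -> None:
        self._id_of: dict[str, int] = {}
        self._label_of: list[str] = []

    def intern(self, label: str) -> int:
        """Return the existing ID for label, or allocate the next one."""
        existing = self._id_of.get(label)
        if existing is not None:
            return existing
        new_id = len(self._label_of)
        self._id_of[label] = new_id
        self._label_of.append(label)
        return new_id

    def label(self, identifier: int) -> str:
        """Map an ID back to its string, e.g. at serialization time."""
        return self._label_of[identifier]


entities = IdInterner()
relations = IdInterner()

# Triples are kept as small integer tuples while generating.
triples = {
    (entities.intern("ex:alice"), relations.intern("ex:knows"), entities.intern("ex:bob")),
    (entities.intern("ex:bob"), relations.intern("ex:knows"), entities.intern("ex:alice")),
}

# Strings reappear only when writing output.
for head, rel, tail in sorted(triples):
    print(entities.label(head), relations.label(rel), entities.label(tail))
```

Each string is stored once in the interner, so repeated identifiers cost one integer per occurrence instead of one string.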
Fixed
- Eliminated symmetric relation bottleneck: Replaced expensive data structure rebuilds with incremental constant-time tracking
- Resolved generation stalls and infinite loops: Added stall detection (drops unproductive relations), timeout protection, and adaptive oversampling for constrained properties
- Improved memory efficiency: Compact indexed structures, sparse arrays for entity pools, explicit cleanup after serialization, single-pass domain/range computation
- Optimized functional property validation: Constant-time set lookups replace full triple scans
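One way to get the constant-time functional check mentioned above is to remember which (relation, head) pairs are already used, so a candidate triple needs one set lookup rather than a scan over all triples. This is a sketch under the assumption that triples are integer tuples. `FunctionalIndex` is a hypothetical name, not the project's implementation.

```python
class FunctionalIndex:
    """Track (relation, head) pairs already used by functional relations."""

    def __init__(self, functional_relations: set[int]) -> None:
        self._functional = functional_relations
        self._used: set[tuple[int, int]] = set()

    def try_add(self, head: int, rel: int, tail: int) -> bool:
        """Accept the triple unless head already has a value for a functional rel.

        A repeat of an already accepted functional triple is rejected as well.
        """
        if rel not in self._functional:
            return True
        key = (rel, head)
        if key in self._used:
            return False
        self._used.add(key)
        return True


index = FunctionalIndex(functional_relations={7})
print(index.try_add(1, 7, 2))  # first value for head 1 under relation 7
print(index.try_add(1, 7, 3))  # second value for the same head is refused
print(index.try_add(1, 8, 3))  # relation 8 is not functional
```

An inverse-functional check could use the same structure keyed on (relation, tail).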
Added
- Intelligent generation heuristics: Weighted relation sampling, entity freshness bias, and fast generation mode (seed + replication for very large targets)
- Comprehensive constraint caching: All schema constraints analyzed once at startup for instant lookup during generation
- Structured data containers: Clear separation of schema metadata, constraint caches, entity state, and generation progress
Performance Impact
| Aspect | Before | After | Improvement |
|---|---|---|---|
| Memory usage | Heavy string duplication | Integer IDs | ~60% reduction |
| Domain/range lookup | Per-sample recompute | Pre-cached | ~1000x faster |
| Functional checks | Scan all triples | Set lookup | ~10000x faster |
| Validation | Multiple post-gen passes | Inline | 5x fewer scans |
| Reliability | Could hang/stall | Robust termination | No infinite loops |
| Scale | Often impractical | Reliable | Million+ entities now |
Actual speedup varies by ontology complexity. Highly constrained schemas (many functional properties, extensive disjointness) see different gains than simpler ontologies, but all scenarios are substantially faster and complete reliably.
v0.0.9 (2026-01-08)
Performance Restored
This release fixes the critical performance regression introduced in v0.0.8
Ontology Extraction (Performance Fix)
Fixed single-source-of-truth refactor that caused 20-30x performance regression in ontology extraction
Changed
- Introduced `relations_seed.rq` as canonical source of truth for object property universe
- Implemented `@RELATIONS_SEED` marker injection mechanism across all relation SPARQL queries
- Removed working-graph materialization approach that caused slowdown
Fixed
- Restored extraction performance from ~10 minutes back to ~20-40 seconds
- Ensured all relation extractors (patterns, inverses, hierarchy, disjointness, domain/range) use consistent property universe
- Completed implementation with missing `relations.py` module updates for marker injection
Technical context: The earlier refactor attempted to centralize object properties via materialized working graph (membership triples + graph copy), which proved impractical. The new approach uses query-time injection for true single-source semantics without performance penalty.
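Query-time injection of this kind can be sketched as plain text substitution: every relation query carries the `@RELATIONS_SEED` marker, and the shared seed pattern replaces it just before execution. The seed pattern and query template below are illustrative assumptions, since the contents of `relations_seed.rq` are not shown in this changelog. `inject_relations_seed` is a hypothetical helper.

```python
RELATIONS_SEED_MARKER = "@RELATIONS_SEED"

# Illustrative seed pattern; the real relations_seed.rq content is not shown here.
seed_pattern = "?p a owl:ObjectProperty ."

# Illustrative query template containing the marker.
query_template = """\
SELECT ?p ?domain WHERE {
  @RELATIONS_SEED
  ?p rdfs:domain ?domain .
}"""


def inject_relations_seed(template: str, seed: str) -> str:
    """Replace the marker with the seed pattern, failing loudly if it is absent."""
    if RELATIONS_SEED_MARKER not in template:
        raise ValueError("query template has no @RELATIONS_SEED marker")
    return template.replace(RELATIONS_SEED_MARKER, seed.strip())


print(inject_relations_seed(query_template, seed_pattern))
```

Because every query is expanded from the same seed text, the extractors cannot drift apart on what counts as an object property.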
v0.0.8 (2025-12-15)
Critical Performance Issue - Do Not Use
This release contains a severe performance regression (20-30x slowdown) in ontology extraction that makes it impractical for use. Please use v0.0.9 or later instead. This entry is preserved for historical reference only.
Ontology Extraction
Introduced full ontology extraction pipeline, enabling PyGraft-gen to generate KGs from real-world ontologies
Breaking Changes
- Removed `create_json_template()` and `create_yaml_template()` in favor of unified `create_config()` API
- Configuration template filenames now fixed as `pygraft_config.json` or `pygraft_config.yml` (no longer customizable)
Added
- Complete ontology extraction pipeline with dedicated modules:
  - `namespaces.py` for prefix and base IRI extraction
  - `classes.py` for class hierarchy extraction (generates `class_info.json`)
  - `relations.py` for property extraction (generates `relation_info.json`)
  - `extraction.py` as the main pipeline entry point
  - `queries.py` for centralized SPARQL query loading
- SPARQL query resources for extraction (class disjointness, hierarchy, relation patterns, inverses, subproperties, domain/range, property disjointness)
Changed
- Updated CLI `init` command to use new `create_config()` API
- class_info.json: `direct_class2superclasses` now uses list structure for OWL multi-inheritance support
- relation_info.json: `rel2superrel` now uses list structure for multi-parent property hierarchies
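With list-valued parents, code that walks the hierarchy has to follow every branch rather than a single chain. The sketch below computes all ancestors of a class, assuming `direct_class2superclasses` maps each class name to a list of direct superclasses. The exact JSON layout and the example class names are assumptions.

```python
def transitive_superclasses(
    direct_class2superclasses: dict[str, list[str]], cls: str
) -> set[str]:
    """Collect every ancestor of cls by walking all parent lists."""
    seen: set[str] = set()
    stack = list(direct_class2superclasses.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # shared ancestors (or accidental cycles) are visited once
        seen.add(parent)
        stack.extend(direct_class2superclasses.get(parent, []))
    return seen


# Illustrative hierarchy with one class that has two direct parents.
hierarchy = {
    "AmphibiousVehicle": ["Car", "Boat"],
    "Car": ["Vehicle"],
    "Boat": ["Vehicle"],
}
print(sorted(transitive_superclasses(hierarchy, "AmphibiousVehicle")))
# → ['Boat', 'Car', 'Vehicle']
```

The same traversal applies to `rel2superrel` for multi-parent property hierarchies.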
Fixed
- Corrected KG serialization where CURIEs (e.g., `bot:Site`) produced broken IRIs; now properly expands identifiers via `namespaces_info.json`
- Fixed near-zero triple generation for ontologies with conjunctive domain/range constraints; now correctly samples entities satisfying all required classes
- Eliminated excessive rejection sampling by detecting and disabling relations with empty candidate pools after entity typing
- Corrected inverse domain/range disjointness filtering to work with list-based constraint structure
- Fixed inference oversampling to support multi-parent subproperty hierarchies
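The CURIE fix in the list above relies on expanding `prefix:local` identifiers against a prefix map before serialization. The sketch below shows that expansion with a plain dictionary. The layout of `namespaces_info.json` is not described here, so the mapping is an assumption, and `expand_curie` is a hypothetical helper.

```python
def expand_curie(identifier: str, prefixes: dict[str, str]) -> str:
    """Expand prefix:local to a full IRI; return other identifiers unchanged."""
    prefix, sep, local = identifier.partition(":")
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return identifier


# Assumed prefix map; a real one would be read from the extracted namespaces.
prefixes = {"bot": "https://w3id.org/bot#"}
print(expand_curie("bot:Site", prefixes))
print(expand_curie("https://example.org/Thing", prefixes))
```

Identifiers whose prefix is unknown, including full IRIs, pass through untouched instead of being mangled.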
v0.0.7 (2025-12-09)
CLI Modernization
Migrated from argparse to Typer for improved ergonomics and maintainability
Added
- Typer-based CLI with structured subcommands: `help`, `init`, `schema`, `kg`, `build`
- User-facing output messages independent of logging levels (logging now optional via `-l/--log-level`)
Changed
- Template creation functions (`create_json_template`, `create_yaml_template`) now return file paths for easier API composition
- `reasoner()` returns explicit boolean consistency flag instead of exception-based control flow
- `generate_schema()` returns `(schema_path, is_consistent)` tuple for direct access to both outputs
- Replaced text2art banner with clean Rich-based rule in CLI header
- Improved `-l/--log-level` flag with clearer help text
Removed
- Legacy argparse CLI implementation
v0.0.6 (2025-12-08)
Subgraph matching patterns and tools
v0.0.5 (2025-12-08)
Legacy Code Modernization
Major architectural refactor improving maintainability, reproducibility, and standards compliance
Added
- Unified RNG strategy across all generators, enabling deterministic reproduction when seeded while keeping default runs stochastic
- Type system via `types.py` providing centralized TypedDict definitions for all configuration files and JSON artifacts
- CLI enhancements: `-V/--version` flag and `--log-level` option for controlling output verbosity
- Comprehensive configuration validation pipeline with structural checks, strict type validation, and semantic constraints
- Centralized builder functions for `class_info`, `relation_info`, and `kg_info` JSON outputs
Changed
- Migrated from flat layout to modern `src/` directory structure with organized packages (`generators/`, `utils/`, `resources/`)
- Renamed `template.{json/yml}` to `pygraft_config.{json/yml}` for clearer purpose
- Reorganized configuration format into explicit `general`, `classes`, `relations`, and `kg` sections
- Refactored core generators with explicit configuration dataclasses, improved invariants, and well-defined entry points
- Improved CLI implementation with clearer help text and more robust validation
- Separated schema/KG generation from HermiT reasoning; KG files now contain only instance triples
- Standardized output handling with `pygraft_output/` as default root and optional custom paths
- Enhanced logging with consistent INFO milestones, DEBUG internals, and absolute paths
- Improved internal naming across constraint validation and triple generation helpers
Fixed
- Corrected inverse range-disjointness filtering that previously applied head validation but removed triples based on tail
- Fixed phantom-layer sampling in class assignment that could sample beyond actual hierarchy depth
- Replaced order-dependent inverse mapping with canonical symmetric reconstruction
- Restored and corrected oversampling logic for inference-based triple augmentation
- Fixed functional and inverse-functional constraint checks with proper tuple indexing
- Unified domain/range disjointness validation to consistently use transitive superclass expansion
- Ensured HermiT reasoning works across all RDF formats via automatic RDF/XML conversion
Removed
- Split `utils.py` into focused modules: `reasoning.py`, `cli.py`, `templates.py`, `paths.py`, `config.py`
- Removed redundant `generate()` API; combined workflow now handled explicitly via CLI
v0.0.4 (2025-11-27)
PEP 621 Migration & Tooling Update
Added
- Modern development tooling stack:
- Ruff for linting and formatting
- Pyright/Basedpyright for static type checking
- Codespell for spell-checking
- Project-wide configuration via EditorConfig and `.python-version`
- Initial `CHANGELOG.md` following Keep a Changelog format
- Updated `CONTRIBUTING.md` with clearer development workflow
Changed
- Migrated to PEP 621 build system using Hatchling
- Switched to dynamic versioning via git tags using `hatch-vcs` (e.g., `v0.0.4` → `0.0.4`, dev installs show `0.0.5.dev0+...`)
- Renamed `pygraft/main.py` to `pygraft/cli.py` and updated console entrypoint accordingly
- Raised minimum Python version to 3.10 (Python 3.8 reached EOL, 3.9 approaching EOL)
Removed
- Legacy `setup.py` and `setup.cfg` build configuration files
v0.0.3 (2023-09-08)
Derived from PyGraft v0.0.3 PyPI release
Added
- Public PyPI release as `pygraft==0.0.3`
- Core generation pipeline with three execution modes: schema-only, KG-only, or combined schema + KG
- Extended RDFS and OWL construct support for standards-compliant modeling with fine-grained control
- Consistency checking of generated schemas and KGs via HermiT DL reasoner
- YAML-based configuration with `create_yaml_template()` function to generate template config files
- High-level Python API with `generate_schema()`, `generate_kg()`, and `generate()` functions (exposed via `__all__`)
- CLI support for running generation pipeline from command line
- Sphinx-based documentation with Read the Docs integration covering installation, parameters, and quickstart
Changed
- Improved README and documentation with better feature descriptions and usage examples
v0.0.2 (2023-09-07)
Derived from PyGraft v0.0.2 PyPI release
Fixed
- Packaging metadata and README formatting issues from v0.0.1
v0.0.1 (2023-09-07)
Derived from PyGraft v0.0.1 PyPI release
Added
- Initial PyPI release of PyGraft
- Configurable schema and KG generator with schema-only, KG-only, and combined pipeline modes with consistency checking
- Initial documentation and README with project goals and basic usage