Contributing to PyGraft-gen

We would love for you to contribute to PyGraft-gen and help make it even better than it is today! Please contribute to this repository if any of the following is true:

  • You have expertise in Knowledge Graphs, synthetic datasets, or graph generation,
  • You have expertise in stochastic generation, rule-based generation, or subgraph matching techniques,
  • You want to challenge AI pipelines with domain-related Knowledge Graphs and domain-specific graph patterns.

For Developers

See the Development Guide for setup instructions, code standards, and tooling.

How to Contribute

We welcome community contributions, whether documentation, refactoring, tests, or new features.

You can contribute to PyGraft-gen in the following ways:

Desirable Features

Want to contribute but not sure where to start? Here are features we're working toward, organized by priority.

Current Focus

After ontology extraction (v0.0.8) and KG optimization (v0.0.10), we're prioritizing infrastructure (testing, CI/CD, code quality) before expanding object property support.

High Priority

  • Support for any input ontology (inherited from PyGraft, v0.0.8) - Generate KGs from real-world ontologies, not just PyGraft-generated schemas
  • Large-scale KG generation (v0.0.10) - Optimized KG generator architecture enabling millions of entities and tens of millions of triples
  • Unit test suite - Comprehensive tests for core generation modules (schema, entities, triples)
  • Pre-commit hooks - Automated code quality checks
  • CI/CD pipeline - GitHub Actions for testing and deployment
  • Docstring standardization - Clean, consistent Google-style docstrings

Medium Priority

  • Conflict resolution (inherited from PyGraft) - Fix conflicts between rdfs:subPropertyOf, owl:FunctionalProperty, and owl:InverseFunctionalProperty
  • Inconsistency explanations (inherited from PyGraft) - Parse HermiT/Pellet output to identify and remove problematic triples without regenerating
  • Blank-node class expressions - Support owl:Restriction, owl:unionOf, owl:intersectionOf, owl:complementOf
  • Value restrictions - Support owl:someValuesFrom, owl:allValuesFrom, owl:hasValue
  • Compound domain/range - Complex class expressions in property constraints
  • Higher-level disjointness - Support owl:AllDisjointClasses, owl:disjointUnionOf
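
To make the conflict-resolution item above more concrete, here is a minimal sketch of how a clash between rdfs:subPropertyOf and owl:FunctionalProperty could be detected in a set of generated triples. It is not PyGraft-gen's implementation: the function name `functional_violations`, the plain-tuple triple representation, and the `ex:` identifiers are all hypothetical, and the sketch assumes that differently named entities denote distinct individuals.

```python
from collections import defaultdict


def functional_violations(triples, sub_property_of, functional):
    """Find subjects that end up with several objects for a functional property.

    triples:         iterable of (subject, property, object) strings
    sub_property_of: dict mapping a property to its direct super-properties
    functional:      set of properties declared owl:FunctionalProperty

    Hypothetical helper for illustration only; assumes distinct names
    denote distinct individuals.
    """

    def with_supers(prop):
        # Collect prop plus every super-property reachable via subPropertyOf.
        seen, stack = set(), [prop]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(sub_property_of.get(current, ()))
        return seen

    objects = defaultdict(set)
    for s, p, o in triples:
        # A triple on a sub-property also counts for each super-property.
        for q in with_supers(p):
            if q in functional:
                objects[(s, q)].add(o)

    return sorted(
        (s, q, sorted(objs)) for (s, q), objs in objects.items() if len(objs) > 1
    )


# Hypothetical schema: ex:p1 is a sub-property of the functional ex:p2.
triples = [("ex:e1", "ex:p1", "ex:e2"), ("ex:e1", "ex:p2", "ex:e3")]
conflicts = functional_violations(
    triples, sub_property_of={"ex:p1": ["ex:p2"]}, functional={"ex:p2"}
)
print(conflicts)
```

Neither triple breaks the functional constraint on its own here; the conflict only appears once the sub-property triple is propagated to its super-property, which is why this interaction needs dedicated handling.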

Low Priority

  • JSON Schema validation - Validate user configurations against a formal schema

Not Currently Prioritized

Datatype properties (literal-valued attributes like strings, integers, dates) will be addressed after object property support is complete.

Interested in tackling one of these? Start a Discussion to share your approach!


Communication

GitHub is our primary communication platform. Use Discussions for questions, ideas, and general support. Reserve Issues for bug reports and feature requests, and Pull Requests for code contributions.

You may also contact the maintainers by email for more specific questions.

We value respectful and constructive communication. Keep discussions focused on practical problems and solutions. All interactions follow GitHub's Code of Conduct.