Fundamentals#
This guide introduces the core concepts behind PyGraft-gen: ontologies, Knowledge Graphs, and why you'd want to generate synthetic versions of them.
On this page, you will find:
- What is a Knowledge Graph?
- Industry Impact
- What is an Ontology?
- RDF, OWL & Semantic Web Standards
- What Does "Synthetic" Mean?
- How PyGraft-gen Fits In
- Ready to Generate?
What is a Knowledge Graph?#
A Knowledge Graph stores information as a network of connected entities and their relationships. Each piece of information is represented as a triple: (subject, predicate, object).
The triple is the atomic unit of Knowledge Graphs – it's the smallest, indivisible piece of structured information. You can't break it down further while preserving meaning.
Example:
(Neo, knows, Morpheus)
(Neo, escapes, Matrix)
(Matrix, createdBy, Machines)
This says: "Neo knows Morpheus. Neo escapes the Matrix. The Matrix was created by Machines."
Knowledge Graphs represent facts as interconnected data, making it easy to traverse relationships and discover patterns.
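To make the idea of traversal concrete, here is a minimal sketch in plain Python. It stores the three example triples as tuples and follows them outward from one entity. The `outgoing` index and the `reachable` helper are illustrative names, not part of PyGraft-gen.

```python
from collections import defaultdict

# The three example triples from above, as (subject, predicate, object) tuples.
triples = [
    ("Neo", "knows", "Morpheus"),
    ("Neo", "escapes", "Matrix"),
    ("Matrix", "createdBy", "Machines"),
]

# Index outgoing edges by subject so relationships can be followed quickly.
outgoing = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))


def reachable(start):
    """Return every entity reachable from `start` by following triples."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _predicate, neighbour in outgoing.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


print(sorted(reachable("Neo")))  # ['Machines', 'Matrix', 'Morpheus']
```

Neo has no direct triple to Machines, yet the traversal finds it through the Matrix. Following such indirect connections is what a graph structure makes easy.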
Industry Impact#
Beyond the technical definition, Knowledge Graphs are powering critical systems across industries. They were recognized as a top enabler technology by Gartner's Emerging Tech Impact Radar (2024), driving adoption across:
- Search (2012) – Knowledge Graphs power Google's search results. Google's graph grew from 570 million entities in 2012 to 500 billion facts on 5 billion entities by 2020, answering roughly one-third of Google's 100 billion monthly searches.
- Automotive (2012) – Renault uses KGs to validate car configurations, encoding constraints between features to automatically filter $10^{20}$ valid configurations from $10^{25}$ possible combinations (only 1 in 100,000 random combinations is valid).
- Manufacturing (2019) – Volvo Cars developed the Insight Lab, an integrated graph service using Neo4j to manage increasingly complex vehicle configurations, customizations, and dependencies between features and functions. It turns complex manufacturing data into actionable insights for cross-team collaboration.
- Healthcare (2020) – Biomedical KGs enable drug discovery and precision medicine by integrating genomic, pharmaceutical, and clinical data to identify drug-target interactions and predict disease treatments.
- Telecom (2023) – Nokia leverages KGs for network automation, identifying $9 billion in potential energy savings and preventing 280,000 acres of deforestation across mobile network infrastructures.
- Consumer Electronics (2024) – Samsung acquired Oxford Semantic Technologies (following collaboration since 2018), integrating personal Knowledge Graphs with on-device AI to provide hyper-personalized experiences across mobile devices, TVs, and home appliances.
Source Attribution
Industry examples adapted from course materials provided by Inria Academy, authored by Fabien Gandon (Université Côte d'Azur, Inria, CNRS, I3S).
What is an Ontology?#
To build Knowledge Graphs at scale, you need structure. That's where ontologies come in – they're the schemas that define what can exist in your graph and how things can relate to each other.
An ontology specifies:
- Classes – Types of entities (Person, Company, Location)
- Properties – Relationships between entities (worksFor, hasRole, locatedIn)
- Constraints – Rules that must be followed (domain, range, disjointness)
Example ontology:
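# Prefixes (the example.org namespace is a placeholder)
@prefix : <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Classes
:Person a owl:Class .
:Company a owl:Class .
:Location a owl:Class .

# Properties
:worksFor a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Company .
:locatedIn a owl:ObjectProperty ;
    rdfs:domain :Company ;
    rdfs:range :Location .
:hasRole a owl:ObjectProperty ;
    rdfs:domain :Person .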
This ontology says:
- People work for Companies (via worksFor)
- Companies are located in Locations (via locatedIn)
- People have roles (via hasRole)
Domain/Range constraints:
- worksFor can only have a Person as its subject (domain)
- worksFor can only have a Company as its object (range)
Schema vs Data
Ontology = Schema (structure and rules)
Knowledge Graph = Data (actual instances following those rules)
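The schema/data split can be sketched in a few lines of Python: a dictionary plays the role of the ontology, and a function checks whether a single triple respects it. This is only an illustration under stated assumptions. The instance names (`alice`, `acme`, `paris`) and the `conforms` helper are hypothetical. It also treats domain and range as validation rules, as this page does, whereas OWL reasoners use them to infer types rather than to reject triples.

```python
# Schema (ontology): each property maps to its (domain, range) classes.
# None means the ontology leaves that side unconstrained, as with hasRole's range.
SCHEMA = {
    "worksFor": ("Person", "Company"),
    "locatedIn": ("Company", "Location"),
    "hasRole": ("Person", None),
}

# Data (knowledge graph): hypothetical instances and the class of each one.
TYPES = {"alice": "Person", "acme": "Company", "paris": "Location"}


def conforms(subject, predicate, obj):
    """Check one triple against the domain and range declared for its property."""
    if predicate not in SCHEMA:
        return False
    domain, range_ = SCHEMA[predicate]
    domain_ok = domain is None or TYPES.get(subject) == domain
    range_ok = range_ is None or TYPES.get(obj) == range_
    return domain_ok and range_ok


print(conforms("alice", "worksFor", "acme"))  # True
print(conforms("acme", "worksFor", "alice"))  # False
```

The second call fails because a Company appears where the domain requires a Person.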
RDF, OWL & Semantic Web Standards#
Knowledge Graphs and ontologies aren't just informal concepts – they're built on rigorous semantic web standards developed by the World Wide Web Consortium (W3C):
RDF (Resource Description Framework)
The foundation. Everything is expressed as triples: (subject, predicate, object).
RDFS (RDF Schema)
Adds basic vocabulary for defining classes and properties:
- rdfs:subClassOf – Class hierarchies
- rdfs:domain – What can be a subject
- rdfs:range – What can be an object
OWL (Web Ontology Language)
Adds rich constraints and logical rules:
- owl:disjointWith – Classes that can't share instances
- owl:FunctionalProperty – Properties with one value max
- owl:SymmetricProperty – Bidirectional relationships
- owl:TransitiveProperty – Relationships that chain (if A relates to B and B relates to C, then A relates to C)
Turtle
A human-friendly syntax for writing RDF/OWL. All examples in this documentation use Turtle syntax.
Other formats exist (RDF/XML, N-Triples), but Turtle is the most readable.
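The OWL property characteristics listed above each have a simple operational reading. The sketch below shows one in plain Python, with a property represented as a set of (subject, object) pairs. The function names and the sample properties (`marriedTo`, `partOf`, `hasBirthYear`) are hypothetical. The functional check assumes that different names denote different things, which OWL itself does not assume.

```python
def symmetric_closure(pairs):
    """owl:SymmetricProperty: if (a, b) holds, then (b, a) holds too."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """owl:TransitiveProperty: if (a, b) and (b, c) hold, then (a, c) holds."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:  # nothing new was inferred, so we are done
            return closure
        closure |= derived


def violates_functional(pairs):
    """owl:FunctionalProperty: return the subjects that have more than one value."""
    values = {}
    for subject, value in pairs:
        values.setdefault(subject, set()).add(value)
    return {s for s, vs in values.items() if len(vs) > 1}


married_to = {("Ann", "Bob")}
part_of = {("Bolt", "Wheel"), ("Wheel", "Car")}
has_birth_year = {("Ann", 1990), ("Ann", 1991), ("Bob", 1985)}

print(sorted(symmetric_closure(married_to)))  # [('Ann', 'Bob'), ('Bob', 'Ann')]
print(("Bolt", "Car") in transitive_closure(part_of))  # True
print(violates_functional(has_birth_year))  # {'Ann'}
```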
What Does "Synthetic" Mean?#
Now that you understand Knowledge Graphs and ontologies, let's talk about why you'd want to generate fake versions of them.
Synthetic data is artificially generated data that mimics real data's structure and characteristics without containing actual sensitive information.
Real data:
:JohnDoe a :Patient ;
    :hasCondition :Diabetes ;
    :treatedBy :DrSmith .
:DrSmith a :Doctor ;
    :worksAt :CityHospital .
Synthetic data:
:E1 a :Patient ;
    :hasCondition :Diabetes ;
    :treatedBy :E2 .
:E2 a :Doctor ;
    :worksAt :E3 .
The synthetic version follows the same ontology (Patient, Doctor, hasCondition, treatedBy, worksAt) but uses generated identifiers (E1, E2, E3) instead of real people and places. The structure, relationships, and constraints are preserved, but the actual data is fake.
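One way to check the claim that structure is preserved is to compare the two graphs while ignoring the concrete identifiers. The sketch below does this in plain Python, counting instances per class and triples per property. The `shape` helper is a hypothetical name for this illustration, and the summary it computes is deliberately coarse.

```python
def shape(triples):
    """Summarise a graph's structure, ignoring the concrete identifiers.

    Returns the number of instances per class and the number of triples
    per non-typing predicate.
    """
    class_counts = {}
    predicate_counts = {}
    for _subject, predicate, obj in triples:
        if predicate == "a":  # Turtle shorthand for rdf:type
            class_counts[obj] = class_counts.get(obj, 0) + 1
        else:
            predicate_counts[predicate] = predicate_counts.get(predicate, 0) + 1
    return class_counts, predicate_counts


real = [
    ("JohnDoe", "a", "Patient"),
    ("JohnDoe", "hasCondition", "Diabetes"),
    ("JohnDoe", "treatedBy", "DrSmith"),
    ("DrSmith", "a", "Doctor"),
    ("DrSmith", "worksAt", "CityHospital"),
]
synthetic = [
    ("E1", "a", "Patient"),
    ("E1", "hasCondition", "Diabetes"),
    ("E1", "treatedBy", "E2"),
    ("E2", "a", "Doctor"),
    ("E2", "worksAt", "E3"),
]

print(shape(real) == shape(synthetic))  # True
```

Both graphs contain one Patient, one Doctor, and one triple for each property, even though the two graphs use different subject identifiers.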
How PyGraft-gen Fits In#
PyGraft-gen brings all these concepts together: it generates synthetic Knowledge Graphs that follow ontology constraints using W3C semantic web standards.
You can either:
- Start from an existing ontology – Extract its structure and generate synthetic instance data that respects all constraints
- Generate everything from scratch – Create both the ontology schema and instance data using statistical parameters
Both approaches produce valid RDF/OWL outputs with constraint-aware generation, ensuring every triple is logically consistent.
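To give a feel for what constraint-aware generation means, here is a toy generator in plain Python. It is not PyGraft-gen's API or algorithm. The `generate` function, its parameters, and the small schema are all assumptions made for this illustration, and only domain and range are handled.

```python
import random

# Toy schema: property -> (domain class, range class). All names are illustrative.
SCHEMA = {
    "treatedBy": ("Patient", "Doctor"),
    "worksAt": ("Doctor", "Hospital"),
}


def generate(schema, entities_per_class=3, triples_per_property=4, seed=0):
    """Generate typed entities and random triples that respect domain and range."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    classes = sorted({cls for pair in schema.values() for cls in pair})

    # Assign generated identifiers E1, E2, ... to each class in turn.
    members = {}
    counter = 0
    for cls in classes:
        members[cls] = []
        for _ in range(entities_per_class):
            counter += 1
            members[cls].append(f"E{counter}")

    triples = [(entity, "a", cls) for cls in classes for entity in members[cls]]
    for prop, (domain, range_) in schema.items():
        for _ in range(triples_per_property):
            # Sample the subject from the domain and the object from the range,
            # so no triple can violate the property's constraints.
            subject = rng.choice(members[domain])
            obj = rng.choice(members[range_])
            triples.append((subject, prop, obj))

    # Drop repeated triples while keeping the original order.
    return list(dict.fromkeys(triples))


for triple in generate(SCHEMA):
    print(triple)
```

Because subjects and objects are drawn only from the allowed classes, consistency holds by construction rather than being checked afterwards. Constraints such as disjointness or functional properties would need additional bookkeeping.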
Ready to Generate?#
You now understand the fundamentals of Knowledge Graphs, ontologies, and synthetic data generation.
Next steps:
- Install PyGraft-gen – Set up in minutes
- Quickstart – Generate your first KG
- Core Concepts – Learn how the generation algorithms work