PhD Thesis | François Amat

Relational databases form the backbone of modern information systems, yet significant business logic remains hidden within the data itself—implicit regularities that schemas fail to capture. This thesis introduces a deterministic framework to mine these patterns.

We present MATILDA and MAHILDA, two systems capable of discovering expressive Tuple-Generating Dependencies and Horn Rules directly from relational tables. Prioritizing Explainable AI, we generate verifiable, rule-shaped artifacts that support auditing, data quality, and logical imputation without the opacity of black-box models.

MATILDA: Expressive TGD Mining

Chapter 4

A scalable algorithm to discover first-order dependencies with multi-atom heads and existential witnesses, capturing complex implications across multiple tables.

∀x,y: MARRIAGE(x,y) ⇒ ∃l,s: RESIDENCE(x,l,s) ∧ RESIDENCE(y,l,s)

Handles multi-table joins naturally
Discovers new existential variables (e.g., shared location)
Uses Constraint Graphs for efficient pruning
Auditable closed-world support & confidence

MAHILDA: Horn Rules with Disjoint Semantics

Chapter 5

A specialized system for mining recursive Horn rules. It introduces disjoint semantics to strictly prevent vacuous self-loops, serving as a rigorous baseline for ILP.

LINEAGE(p, c1) ∧ MARRIAGE(p, s) ⇒ LINEAGE(p, c2) (Where c1 ≠ c2)

Disjoint semantics prevents vacuous self-loops
Optimized for recursive patterns (e.g. siblings)
High-performance pruning strategies

Defense Committee

Fabian SuchanekDirecteur de thèseTélécom Paris

Pierre-Henri ParisCo-directeur de thèseUniversité Paris-Saclay

Fatiha SaisRapporteurUniversité Paris-Saclay

Pierre SenellartRapporteurENS Paris

Bernd AmannExaminateurSorbonne Université

Nathalie PernelleExaminatriceUniversité Sorbonne Paris Nord

MATILDA: Expressive TGD Mining

MAHILDA: Horn Rules with Disjoint Semantics

Defense Committee

MATILDA Mining Pipeline

MAHILDA Mining Pipeline