Mining Patterns on Tabular Data

PhD Dissertation by François Amat • Dec 2025
Institut Polytechnique de Paris | Télécom Paris
↓ Download PDF (1.2MB)

Relational databases form the backbone of modern information systems, yet significant business logic remains hidden within the data itself—implicit regularities that schemas fail to capture. This thesis introduces a deterministic framework to mine these patterns.

We present MATILDA and MAHILDA, two systems capable of discovering expressive Tuple-Generating Dependencies and Horn Rules directly from relational tables. Prioritizing Explainable AI, we generate verifiable, rule-shaped artifacts that support auditing, data quality, and logical imputation without the opacity of black-box models.

MATILDA: Expressive TGD Mining

Chapter 4

A scalable algorithm to discover first-order dependencies with multi-atom heads and existential witnesses, capturing complex implications across multiple tables.

Body Head 1 Head 2
MARRIAGE(x,y) (Elvis, Priscilla) RESIDENCE(x) Memphis, TN RESIDENCE(y) Memphis, TN
∀x,y: MARRIAGE(x,y) ⇒ ∃l,s: RESIDENCE(x,l,s) ∧ RESIDENCE(y,l,s)

MAHILDA: Horn Rules with Disjoint Semantics

Chapter 5

A specialized system for mining recursive Horn rules. It introduces disjoint semantics to strictly prevent vacuous self-loops, serving as a rigorous baseline for ILP.

Table A Table A' Disjoint Check
Parent: Lisa Riley ≠ Ben Riley = Riley
LINEAGE(p, c1) ∧ MARRIAGE(p, s) ⇒ LINEAGE(p, c2) (Where c1 ≠ c2)

Defense Committee

Fabian SuchanekDirecteur de thèseTélécom Paris
Pierre-Henri ParisCo-directeur de thèseUniversité Paris-Saclay
Fatiha SaisRapporteurUniversité Paris-Saclay
Pierre SenellartRapporteurENS Paris
Bernd AmannExaminateurSorbonne Université
Nathalie PernelleExaminatriceUniversité Sorbonne Paris Nord