Luca Engel

AI and Knowledge Graphs for Invoice Verification

Sep 7, 2025

Cover image generated with OpenAI's GPT-5 model.

TL;DR

At a glance

Problem

Swiss healthcare invoice verification is a complex and multi-faceted task. Auditors face three recurring challenges:

  1. Incomplete or placeholder tariff codes — invoices often include code “999” as a placeholder, requiring manual lookup of the correct tariff.
  2. Complex treatment limitation rules — the official Tariff Information System (TIS) encodes these rules in natural language, which must be consistently interpreted and applied.
  3. Justifying rejections — when invoices are non-compliant, insurers must provide transparent and reproducible rejection reasons, which today often depend on past experience and handwritten notes.

Together, these challenges make invoice verification slow, costly, and inconsistent. The core question is whether AI systems can reduce manual workload while keeping decisions accurate, explainable, and ethically sound across all three use cases.

Solution overview

I designed three pipelines:

  1. Tariff Prediction: models suggest plausible tariff codes for invoices where the code is missing or a placeholder. A tariff here is an overarching grouping of medical treatments.
  2. Text2Cypher Translation: natural-language treatment limitation rules are automatically converted into Cypher queries that run on a Neo4j graph.
  3. Invoice Verification (GraphRAG): combines historical rejection reasons with the Text2Cypher outputs to deliver explainable rejection decisions.
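To make the Text2Cypher step more concrete, here is a minimal sketch of what a translated limitation rule could look like. It is an illustration under stated assumptions, not the project's actual output: the rule ("a tariff may be billed at most N times per patient per day"), the node labels `Invoice`, `Treatment`, and `Tariff`, the relationship types, the property names, and the tariff code `A1` are all hypothetical. The pure-Python function mirrors the query so the rule's semantics can be checked without a running Neo4j instance.

```python
from collections import Counter

# Hypothetical schema (an assumption, not taken from the project):
# (:Invoice)-[:CONTAINS]->(:Treatment)-[:BILLED_UNDER]->(:Tariff)
CYPHER_TEMPLATE = """
MATCH (i:Invoice)-[:CONTAINS]->(t:Treatment)-[:BILLED_UNDER]->(:Tariff {code: $tariff_code})
WITH i.patient_id AS patient, t.date AS day, count(t) AS n
WHERE n > $max_per_day
RETURN patient, day, n
"""


def rule_to_cypher(tariff_code, max_per_day):
    """Return a parameterized Cypher query and its parameters for the rule."""
    params = {"tariff_code": tariff_code, "max_per_day": max_per_day}
    return CYPHER_TEMPLATE.strip(), params


def find_violations(treatments, tariff_code, max_per_day):
    """Pure-Python equivalent of the query above, for checking the semantics."""
    counts = Counter(
        (t["patient_id"], t["date"])
        for t in treatments
        if t["tariff_code"] == tariff_code
    )
    return sorted((p, d, n) for (p, d), n in counts.items() if n > max_per_day)


# Made-up sample data: patient p1 receives tariff A1 three times on one day.
treatments = [
    {"patient_id": "p1", "date": "2025-01-03", "tariff_code": "A1"},
    {"patient_id": "p1", "date": "2025-01-03", "tariff_code": "A1"},
    {"patient_id": "p1", "date": "2025-01-03", "tariff_code": "A1"},
    {"patient_id": "p2", "date": "2025-01-03", "tariff_code": "A1"},
    {"patient_id": "p1", "date": "2025-01-04", "tariff_code": "B2"},
]

query, params = rule_to_cypher("A1", max_per_day=2)
violations = find_violations(treatments, "A1", max_per_day=2)
print(violations)
```

Keeping the rule values in query parameters rather than in the query string means one template can serve every rule of the same shape, and each flagged row carries the patient, day, and count needed to explain a rejection.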

An accompanying ethical analysis addressed privacy, fairness, transparency, and sustainability.

Architecture

The system is built around a central knowledge graph storing invoices, treatments, tariffs, and rejection reasons. Three pipelines operate on top of this graph: tariff prediction, Text2Cypher translation, and invoice verification.
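As a rough picture of the entities the graph holds, the sketch below models invoices, treatments, tariff codes, and rejection reasons as plain Python records. The field names and the helper function are assumptions made for illustration; the real graph lives in Neo4j and its schema is not reproduced here. Only the placeholder code "999" comes from the problem description above.

```python
from dataclasses import dataclass, field

PLACEHOLDER_CODE = "999"  # placeholder tariff code described in the Problem section


@dataclass
class Treatment:
    description: str
    tariff_code: str  # hypothetical field: code of the tariff the treatment is billed under


@dataclass
class Invoice:
    invoice_id: str
    treatments: list = field(default_factory=list)
    rejection_reasons: list = field(default_factory=list)  # free-text reasons


def invoices_needing_tariff_lookup(invoices):
    """Return IDs of invoices with at least one placeholder tariff code."""
    return [
        inv.invoice_id
        for inv in invoices
        if any(t.tariff_code == PLACEHOLDER_CODE for t in inv.treatments)
    ]


# Made-up sample data; "A1" is a hypothetical tariff code.
invoices = [
    Invoice("inv-1", [Treatment("consultation", "999")]),
    Invoice("inv-2", [Treatment("x-ray", "A1")], ["limit exceeded"]),
]
flagged = invoices_needing_tariff_lookup(invoices)
print(flagged)
```

Invoices flagged this way are the natural input to the tariff prediction pipeline, while the stored rejection reasons feed the verification pipeline.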

Data

Method

1. Tariff Prediction

2. Text2Cypher Translation

3. Automatic Invoice Verification

Experiments & Results

Benchmarks

Task                     Metric               Score (%)
Tariff Prediction        Macro-F1             83.0
Text2Cypher              Execution Acc.       91.0
Rejection Verification   Accuracy (conf.)     96.5

Evaluation protocol. Tariff prediction used an 80/10/10 train/validation/test split, rule translation was evaluated on a held-out custom dataset, and invoice verification was tested as a prototype on real rejection scenarios.
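For readers unfamiliar with the tariff prediction setup, the following is a minimal, standard-library sketch of an 80/10/10 split and of macro-F1, the unweighted mean of per-class F1 scores. The function names, the fixed seed, and the toy labels are assumptions for illustration; this is not the evaluation code behind the reported numbers.

```python
import random
from collections import Counter


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/val/test at an 80/10/10 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    labels = set(y_true) | set(y_pred)
    # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero for every label seen.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


train, val, test = split_80_10_10(range(100))
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(len(train), len(val), len(test), round(score, 4))
```

Because macro-F1 weights every class equally, rarely used tariff codes count as much as frequent ones, which makes it a stricter measure than plain accuracy when the class distribution is skewed.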

Error analysis


Impact

What I learned

Future Work