Record Linkage | Minimalist Innovation LLC

An infographic showing a secure data flow from three sources: 'Hospital: Disease' (blue), 'Pharmacy: Medication' (green), and 'Lab: Test Results' (purple). The streams of binary data and points, linked to 'Name and DOB' with padlocks, converge into a central 'Honest Broker' server rack. A unified data stream then flows from the server to a 'Unified Analysis' digital dashboard displaying charts and graphs, viewed by a glowing human silhouette on the right.

Privacy-Preserving Record Linkage: Cryptography, Unicode, and Matching In the Dark

Privacy-Preserving Record Linkage (PPRL) lets two organizations determine which records refer to the same person — without either party seeing the other's data. Healthcare networks, judicial agencies, and government registries link records using cryptographic Bloom filters and HMAC-keyed hashing. But if strings are not Unicode-normalized before encoding, the hash diverges and matches fail silently. This post shows the full pipeline: q-grams, Bloom filters, CLKs, and normaliza

Gandhinath Swaminathan

Mar 24 · 9 min read

Illustration comparing a neat prototype entity resolution model with a complex, messy production data graph.

Benchmarking & Datasets for Entity Resolution

A practical guide to benchmarking entity resolution (ER) systems. It covers commonly used public datasets, explains which evaluation metrics are informative in ER (and why accuracy can mislead), and outlines how to design a domain-specific test set so results are meaningful for production decisions.

Gandhinath Swaminathan

Jan 26 · 8 min read

Feature illustration of a Sony PS‑LX350H turntable with SPLADE token weights on the left and a token‑to‑token attention graph on the right, showing sparse retrieval turning into an entity-resolution decision.

From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions

False entity merges don’t just dirty data. They distort inventory, pricing, and forecasts, then every model and report built on top. Learned sparse retrieval improves recall, but it can still treat records like unordered tokens. This post adds token-to-token attention as a structural check so near-duplicates pass and lookalikes fail, with a trail you can audit.

Gandhinath Swaminathan

Jan 213 min read

Privacy-Preserving Record Linkage: Cryptography, Unicode, and Matching In the Dark

Benchmarking & Datasets for Entity Resolution

From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions

Domain Modeling for Agentic AI: Customer 360 as a Semantic Problem

Sustainable Entity Resolution: Profiling and Energy Measurement