

Benchmarking & Datasets for Entity Resolution
A practical guide to benchmarking entity resolution (ER) systems. It covers commonly used public datasets, explains which evaluation metrics are informative in ER (and why accuracy can mislead), and outlines how to design a domain-specific test set so results are meaningful for production decisions.

Gandhinath Swaminathan
Jan 26 · 8 min read


From Inverted Index to Attention Graph: Turning SPLADE Tokens Into ER Decisions
False entity merges don’t just dirty data. They distort inventory, pricing, and forecasts, and then every model and report built on top of them. Learned sparse retrieval improves recall, but it can still treat records as bags of unordered tokens. This post adds token-to-token attention as a structural check so near-duplicates pass and lookalikes fail, with a trail you can audit.

Gandhinath Swaminathan
Jan 213 min read