
Sustainable Entity Resolution: Profiling and Energy Measurement

  • Writer: Gandhinath Swaminathan
  • 3 days ago
  • 8 min read

This Post At a Glance

The thread: 
Part 1 illustrates where matching breaks at the byte layer. 
Part 2 explains how ICU4X enforces those rules with zero-copy internationalization. 
Part 3 builds a privacy-preserving linkage pipeline from Bloom filters and keyed HMAC. 
Part 4, this post, concludes the Unicode, I18N, and Entity Resolution series by closing the loop on sustainability.

The argument: Rare earth minerals go into GPUs. GPUs consume power to serve large language model inference. Using LLM inference for entity resolution wastes those minerals. A deterministic NFKC normalizer, a few bitwise operations, and a Jaccard comparison over a 1,000-bit vector can handle every fuzzy match that a 70-billion-parameter model attempts. The energy difference exceeds a factor of 1,400.
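As a minimal, dependency-free sketch (not the series codebase) of what that bitwise comparison looks like, here is Jaccard similarity over fixed-width bit vectors, using only AND/OR and hardware popcounts:

```rust
/// Jaccard similarity between two fixed-width bit vectors,
/// computed with bitwise AND/OR and popcount instructions.
/// A 1,000-bit CLK fits comfortably in sixteen 64-bit words.
fn jaccard(a: &[u64; 16], b: &[u64; 16]) -> f64 {
    let mut intersection = 0u32;
    let mut union = 0u32;
    for (x, y) in a.iter().zip(b.iter()) {
        intersection += (x & y).count_ones();
        union += (x | y).count_ones();
    }
    // Convention: two empty sets are treated as identical.
    if union == 0 { 1.0 } else { intersection as f64 / union as f64 }
}

fn main() {
    let mut a = [0u64; 16];
    let mut b = [0u64; 16];
    a[0] = 0b1111; // bits {0, 1, 2, 3}
    b[0] = 0b0110; // bits {1, 2}
    // |A ∩ B| = 2, |A ∪ B| = 4
    println!("{}", jaccard(&a, &b)); // prints 0.5
}
```

Each comparison touches 128 bytes of memory and a handful of instructions, which is the entire point of the energy argument that follows.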

The tools: Profiling tools reveal where code burns cycles. Energy measurement tools quantify the joules consumed. Both disciplines together define what sustainable entity resolution looks like in production.

The plea: Habitat destruction takes many forms—from palm oil plantations displacing Bornean orangutans to rare earth mines being carved out for our GPUs. Scarce resources demand conscious stewardship. While charities like Borneo Orangutan Survival and many others in similar capacity do the heavy lifting to undo the damage on the ground, we have a duty at our keyboards. Before we default to generating solutions with massive models, we must weigh the cost of our convenience. Let this post serve as a quiet reminder to internalize the ecological toll of every line of code we ship.
Bar chart comparing energy consumption: the deterministic PPRL pipeline at 11.11 Joules for 10,000 records, a single LLM query at roughly 1,548 Joules (0.43 Wh), and LLM pairwise matching at 15,480 Joules for the same workload. The generative approach uses about 1,400 times more energy.
Energy per workload: the deterministic PPRL pipeline uses far less energy than LLM-based matching.

Why Generative AI Should Not Be The Default Solution for Entity Resolution

The temptation to reach for a large language model makes sense on the surface. LLMs handle noisy text and language variation without explicit rules, and the true cost stays hidden behind a simple API call. But when we pull back the curtain on energy consumption, the reality is stark.


Research comparing general-purpose AI to task-specific models reveals massive inefficiencies for equal tasks:

  • The Baseline Penalty: General-purpose AI consumes 20 to 30 times more energy than task-specific models.

  • The Per-Query Cost: A single, short GPT-4o query burns through 0.43 watt-hours of energy.

  • The Scale Multiplier: Models with 70 billion or more parameters consume 100 times more energy per token than smaller, focused models.


Measurements of code quality reinforce this waste. When researchers compared the energy consumption of LLM-generated code against human-written canonical solutions, the human solutions consistently won. On average, human-written code was:

  • 1.17x more efficient than DeepSeek-v3.

  • 1.21x more efficient than GPT-4o.

  • 2x more efficient than Gemini 1.5 Pro.


When looking at specific algorithmic problem categories, the disparity becomes catastrophic. GPT-4o generated solutions that consumed up to 46 times more energy than the canonical solution, while LLaMA-3.3-70B generated code that consumed up to 149 times more energy.

The code an LLM writes to solve an entity resolution problem might itself run with deep inefficiency.

The Hardware Toll: Rare Earth Minerals and The Geopolitics of Compute

Software efficiency cannot be discussed apart from hardware manufacturing. High-performance data center GPUs rely on rare earth elements for their magnetic, conductive, and luminescent properties. Every time an unoptimized script invokes a neural network for a string comparison, it draws on hardware born from intensive mining operations.


The extraction and refining of these elements leads to deforestation, soil erosion, and water contamination.


According to the USGS 2026 Mineral Commodity Summaries, the geopolitical centralization of these resources has created a critical dependency:

  • In 2025 alone, U.S. imports of rare-earth compounds and metals increased by 169%.

  • The U.S. remains 100 percent reliant on imports for gallium and natural graphite, both essential to modern semiconductor and battery production.


Software efficiency acts as the primary determinant of hardware longevity. Heavy workloads induce thermal throttling, and repeated thermal cycling degrades semiconductor pathways.


Empirical studies document CPU slowdowns of 12 to 40 percent from software updates that introduce unoptimized code paths or heavy security mitigations that block efficient execution.


As software becomes bloated—a phenomenon termed algorithmic obsolescence—older yet physically sound hardware gets rendered artificially obsolete. Writing performant code directly delays hardware refresh cycles, mitigates the need for new mineral extraction, and stems the accumulation of toxic waste.

The E-Waste Reality: According to the Global E-waste Monitor, worldwide e-waste generation reached a record 62.1 million metric tons in 2022 and is on track to hit 82 million tons by 2030. Under a business-as-usual scenario, this figure will reach 111 million tons by 2050. Most critically, only 1% of the global demand for rare earth elements is currently met through recycling.

The Inference Energy Dominance

A dangerous misconception holds that model training forms the primary environmental burden. Training represents a discrete, one-time event; inference—the ongoing querying of the model in production—is the true engine of waste.


Current data shows that inference accounts for more than 90 percent of total LLM power consumption over the model lifecycle. With services answering billions of queries daily, the cumulative energy consumption of inference scales far beyond the initial training cost.


To visualize the impact of a single "efficient" model:

  • The Electricity Footprint: A single short GPT-4o query consumes 0.43 watt-hours. Scaled to 700 million daily queries, this creates an annual electricity demand comparable to 35,000 U.S. homes.

  • The Water Cost: The freshwater evaporation required to cool the servers for those same queries matches the annual drinking needs of 1.2 million people.


By 2028, over 80 percent of data center accelerators will be dedicated exclusively to inference. These metrics highlight why engineers must view computational cycles on GPUs not as "free" API calls, but as finite, high-cost resources.


The Profiling Toolchain

Knowing that deterministic code performs better proves insufficient. The code must actually run fast and lean. Profiling quantifies where cycles go. The discipline connects to sustainability because clock cycles equal joules. A loop that runs 10 times longer than it needs to burns 10 times more energy.


Criterion.rs - statistical benchmarking for Rust

Criterion.rs provides statistics-driven microbenchmarking. It collects measurements over many iterations and applies statistical analysis to distinguish real performance changes from noise. The library produces HTML reports with plots for visualizing distributions.
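A harness for such benchmarks might be sketched as follows. This is a harness fragment rather than the codebase's actual bench file: `normalize` is a hypothetical stand-in for the pipeline's functions, and it assumes `criterion` is declared as a dev-dependency with this file registered as a `[[bench]]` target.

```rust
// Criterion.rs benchmark harness sketch (requires the `criterion`
// crate as a dev-dependency; not runnable from a plain main.rs).
use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical stand-in for a pipeline function under test.
fn normalize(input: &str) -> String {
    input.trim().to_lowercase()
}

fn bench_normalize(c: &mut Criterion) {
    // black_box keeps the optimizer from deleting the work.
    c.bench_function("normalize", |b| {
        b.iter(|| normalize(std::hint::black_box("  José GARCÍA  ")))
    });
}

criterion_group!(benches, bench_normalize);
criterion_main!(benches);
```

Criterion runs warm-up iterations, collects samples, and flags statistically significant regressions against the previous run, which is what makes microsecond-level claims like the ones below defensible.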


For the PPRL pipeline built in part 3 of this series, Criterion benchmarks belong at the function level. The following results come from actual runs against the codebase:

Horizontal bar chart showing per-operation benchmark times from Criterion.rs. CLK encoding at 70.72 microseconds dominates. Normalization, tokenization, and similarity each measure under 2 microseconds.
Encoding 100 records from the 100,000-record HIPAA dataset, generated with the GECO data generator, holds steady at 7.25 ms. The linear scaling confirms no memory leaks or degraded throughput.

Memory profiling with DHAT

The codebase uses the dhat crate for heap allocation analysis. The memory profiler instruments the global allocator and reports:

  • Every allocation: Its total size and frequency.

  • The call site: Exactly which line of code triggered the heap usage.

  • The "Total Bytes" vs. "Max Bytes": Helping us distinguish between temporary spikes and long-term leaks.
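A minimal setup sketch, following dhat's documented global-allocator pattern (the `dhat` crate is a dependency, and the actual codebase may gate this behind a feature flag):

```rust
// Heap-profiling setup sketch using the dhat crate (a dependency,
// not part of std; shown as a configuration fragment).
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Profiling lasts for the lifetime of this guard; on drop it
    // writes dhat-heap.json for inspection in the DHAT viewer.
    let _profiler = dhat::Profiler::new_heap();

    // ... encoding workload would run here ...
    let buffer: Vec<u8> = Vec::with_capacity(1024); // example allocation
    drop(buffer);
}
```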


Measuring energy consumption

Profiling reveals where time goes. Energy measurement reveals where power goes. The two correlate but differ: a loop that stalls on cache misses burns power without making progress.

To bridge this gap, we use Intel's Running Average Power Limit (RAPL) interface. RAPL provides hardware energy counters for:

  • The CPU package and cores

  • The integrated GPU

  • DRAM (Memory)


On Linux, we read these energy counters through the perf command. Because the counters update roughly every millisecond and wrap around in about 60 seconds, long-running production jobs must sample more often than once per minute so no overflow goes unnoticed.
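The overflow handling matters in practice. Below is a dependency-free sketch of a wraparound-safe energy delta, plus a reader for the standard Linux powercap sysfs path (the exact path varies by package and domain, and reading it typically requires elevated permissions):

```rust
use std::fs;

/// Wraparound-safe difference between two RAPL energy readings (µJ).
/// Counters wrap at `max_range_uj` (exposed in sysfs as
/// max_energy_range_uj), often within about a minute under load.
fn energy_delta_uj(before: u64, after: u64, max_range_uj: u64) -> u64 {
    if after >= before {
        after - before
    } else {
        // The counter wrapped once between the two samples.
        (max_range_uj - before) + after
    }
}

/// Reads the package-0 energy counter from the powercap interface.
fn read_package_energy_uj() -> std::io::Result<u64> {
    let s = fs::read_to_string("/sys/class/powercap/intel-rapl:0/energy_uj")?;
    Ok(s.trim().parse().unwrap_or(0))
}

fn main() {
    // Pure-arithmetic demonstration of the wrap handling: a counter
    // at 90 that wraps past 100 to 10 has advanced by 20 units.
    println!("{}", energy_delta_uj(90, 10, 100)); // prints 20
    let _ = read_package_energy_uj(); // Ok(..) only on RAPL-capable Linux
}
```

Sampling both counters faster than the wrap period and summing the deltas gives total joules for a run.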


The PPRL codebase energy measurement

To move beyond estimates, the codebase integrates energy measurement directly into the benchmark harness. Our methodology adapts to the environment:

  • On macOS: It reads CPU power directly via SMC sensors.

  • On Linux: When RAPL counters are unavailable, it falls back to a time-based estimate to feed the Software Carbon Intensity calculation.


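A sketch of the underlying arithmetic in Rust, using the 15.0 W TDP estimate and 740.6 ms duration reported in this post, and assuming a grid intensity of about 475 gCO₂e/kWh (the value implied by the 0.204 gCO₂e per 0.43 Wh figure used later):

```rust
/// Energy (J) from average power (W) and duration (s): E = P × t.
fn energy_joules(avg_power_w: f64, duration_s: f64) -> f64 {
    avg_power_w * duration_s
}

/// Software Carbon Intensity per functional unit of 1,000 records:
/// SCI = (E × I) / R, with E converted to kWh and I in gCO₂e/kWh.
fn sci_per_1000_records(energy_j: f64, grid_gco2e_per_kwh: f64, records: f64) -> f64 {
    let kwh = energy_j / 3_600_000.0; // 1 kWh = 3.6 MJ
    kwh * grid_gco2e_per_kwh / (records / 1000.0)
}

fn main() {
    // Figures from this post: 15.0 W (TDP estimate) for 740.6 ms
    // over 10,000 records; 475 gCO₂e/kWh is an assumed grid mix.
    let e = energy_joules(15.0, 0.7406);
    let sci = sci_per_1000_records(e, 475.0, 10_000.0);
    println!("E = {:.2} J, SCI = {:.6} gCO2e per 1,000 records", e, sci);
}
```

Run as written, this reproduces the 11.11 J and 0.000147 gCO₂e figures in the table below to rounding precision.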

These measurements feed directly into the Software Carbon Intensity (SCI) calculation, and the results are definitive:

  • Average power: 15.0 W (TDP estimate)

  • Total energy: 11.11 J

  • Duration: 740.6 ms

  • SCI: 0.000147 gCO₂e per 1,000 records
The Software Carbon Intensity for the deterministic PPRL pipeline measures at 0.000147 gCO₂e per 1,000 records matched. For comparison, a single LLM query at 0.43 watt-hours contributes 0.204 gCO₂e. When we scale this to a standard entity resolution task of matching 10,000 records pairwise, the disparity is impossible to ignore: The LLM produces 1,400 times more carbon than the deterministic pipeline.


Pipeline Hot Path Analysis

Chart showing CPU time distribution. CLK encoding with HMAC-SHA256 consumes 92.9 percent. Preprocessing, tokenization, similarity, and blocking share the remaining 7.1 percent.

The CLK encoding step consumes 93 percent of per-record CPU time. This concentration in HMAC-SHA256 is expected: the cryptographic hash provides the security guarantee that prevents frequency attacks. Consequently, optimization should focus on tuning the number of hash functions (k) per attribute or using hardware-accelerated SHA extensions where available.
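To make the tuning knob concrete, here is a dependency-free sketch of deriving k bit positions per token via double hashing. The real pipeline keys each hash with HMAC-SHA256; `DefaultHasher` stands in here only to keep the sketch self-contained and is not cryptographically safe.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives k bit positions for a token via double hashing:
/// index_i = (h1 + i·h2) mod m. Production code would key each
/// hash with HMAC-SHA256; DefaultHasher is an illustrative,
/// NON-cryptographic stand-in.
fn clk_indices(token: &str, k: u64, m: u64) -> Vec<u64> {
    let h = |seed: u64| {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        token.hash(&mut hasher);
        hasher.finish()
    };
    let (h1, h2) = (h(0x5eed_0001), h(0x5eed_0002));
    (0..k)
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % m)
        .collect()
}

fn main() {
    // k is the knob: halving it roughly halves time spent in the
    // encoding hot path, at the cost of match sensitivity.
    let idx = clk_indices("jo", 10, 1000);
    println!("{:?}", idx);
}
```

Because the cost of encoding scales with k × tokens per record, this single parameter dominates the 93 percent slice above.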


The minimal slices for normalization (1.7%) and tokenization (2.2%) confirm that ICU4X zero-copy processing adds negligible load. The preprocessing pipeline from parts 1 and 2 of this series (NFKC, then case folding, then diacritic stripping) runs in 1.31 µs per record.
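The real pipeline performs full NFKC through ICU4X. As a simplified, dependency-free illustration of the fold-and-strip steps only, assuming input that is already in decomposed form:

```rust
/// Simplified case fold + diacritic strip for already-decomposed
/// text. The real pipeline runs full NFKC via ICU4X first; this
/// sketch only drops combining diacritical marks (U+0300..=U+036F)
/// and lowercases, to show the shape of the per-record transform.
fn fold_and_strip(decomposed: &str) -> String {
    decomposed
        .chars()
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        .flat_map(char::to_lowercase)
        .collect()
}

fn main() {
    // "José" written as a decomposed "e" + combining acute accent.
    println!("{}", fold_and_strip("Jose\u{301}")); // prints "jose"
}
```

Each step is a single pass over the string, which is why the combined transform stays in the low-microsecond range per record.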


Scaling

Log-log chart showing CLK encoding time scaling linearly with dataset size. 100 records at 7.25 ms, 1,000 at 72.5 ms, 10,000 at 725 ms, 100,000 at 7.25 seconds. Linear O(n) scaling.

The `PPRL` pipeline scales linearly (Note: this naive implementation did not have any lock or I/O constraints for demonstration purposes). Encoding 100,000 HIPAA records takes about 7.25 seconds.


The Ecological Imperative

Software often feels weightless, but its footprint is physically carved into the landscapes where our hardware is born. The 50% decline of Bornean orangutans over the last sixty years is a sentinel metric—a proxy for the cascading loss of flora and fauna across entire ecosystems. It is a lagging indicator of the industrial expansion required to sustain our silicon supply chain.


When we waste GPU cycles on an LLM query that a hash function solves better, we trigger a direct, destructive pipeline:


Inefficient Code → Hardware Obsolescence → Increased Mining & E-waste → Ecological Erasure.


This is a resource conflict. Our inefficient AI workloads compete for the finite energy budget required by critical infrastructure—healthcare, finance, and emergency services. In this context, architectural optimization isn't just a performance win; it is resource conservation. A saved joule is a mineral left in the ground, a device kept out of a landfill, and a habitat left intact.


Let’s write every line of code as if paying for it in joules. Because we are.


References and credits

Series on Unicode, I18N, and Entity Resolution

  1. Minimalist Innovation. The hidden complexity of text: A close look at Unicode normalization for entity resolution https://www.minimalistinnovation.com/post/unicode-normalization-entity-resolution

  2. Minimalist Innovation. The infrastructure behind global text: I18N, ICU, and why Rust does it differently. https://www.minimalistinnovation.com/post/global-text-infrastructure-i18n-icu-rust

  3. Minimalist Innovation. Privacy-Preserving Record Linkage: Cryptography, Unicode, and Matching In the Dark. https://www.minimalistinnovation.com/post/pprl-bloom-filters-entity-resolution

  4. Minimalist Innovation. Sustainable Entity Resolution: Profiling and Energy Measurement (this post)


Energy and sustainability research

  1. Proof News. General Purpose AI Uses 20 to 30 Times More Energy than Task-Specific AI. https://www.proofnews.org/general-purpose-ai-uses-20-to-30-times-more-energy-than-task-specific-ai/

  2. John Snow Labs. Tokens per Joule: How to Quantify and Reduce the Energy Footprint of Clinical LLM Inference. https://www.johnsnowlabs.com/tokens-per-joule-how-to-quantify-and-reduce-the-energy-footprint-of-clinical-llm-inference/

  3. Profiling Energy Use in Large Language Models Inference. arXiv. https://arxiv.org/html/2407.16893v2

  4. Evaluating the Energy-Efficiency of the Code Generated by LLMs. arXiv. https://arxiv.org/pdf/2505.20324.pdf

  5. State of the Apes: Extractive Industries and Ape Conservation. Cambridge University Press / Arcus Foundation.

Software carbon intensity

  1. Green Software Foundation. SCI Specification Achieves ISO Standard Status. https://greensoftware.foundation/articles/sci-specification-achieves-iso-standard-status/

  2. Green Software Foundation. SCI Specification. https://github.com/Green-Software-Foundation/sci

  3. Enhancing the Software Carbon Intensity Specification. https://www.greensort.org/GSF_SCI.html

Profiling and measurement tools

  1. Criterion.rs: Statistics-driven benchmarking for Rust. https://github.com/bheisler/criterion.rs

  2. Divan: Fast and Simple Benchmarking for Rust. https://nikolaivazquez.com/blog/divan/

  3. Profiling Rust programs the easy way. https://www.ntietz.com/blog/profiling-rust-programs-the-easy-way/

  4. How to Profile Rust Applications with perf, flamegraph, and samply. https://oneuptime.com/blog/post/2026-01-07-rust-profiling-perf-flamegraph/view

  5. Running Average Power Limit (RAPL). https://projectexigence.eu/green-ict-digest/running-average-power-limit-rapl/

  6. Reading RAPL energy measurements from Linux. https://web.eece.maine.edu/~vweaver/projects/rapl/

SIMD and optimization

  1. Ultra-Fast Bloom Filters using SIMD techniques. https://wany16.github.io/files/SIMD-IWQoS.pdf

  2. Bloom Overtakes Cuckoo at High Throughput. VLDB. https://www.vldb.org/pvldb/vol12/p502-lang.pdf

Entity resolution

  1. Christen, P., Ranbaduge, T., and Schnell, R., 2020. Linking Sensitive Data: Methods and Techniques for Practical Privacy-Preserving Information Sharing. Springer.

  2. Christen, P., and Pudjijono, A., 2009. GeCo: An online personal data generator and corruptor. https://dmm.anu.edu.au/geco/

  3. Blocking and Filtering Techniques for Entity Resolution: A Survey. https://helios2.mi.parisdescartes.fr/~themisp/publications/csur20-blockingfiltering.pdf

  4. Practical Guide to Entity Resolution. Towards Data Science. https://towardsdatascience.com/practical-guide-to-entity-resolution-part-4-299ac89b9415/

ICU4X and internationalization

  1. ICU4X Documentation. https://docs.rs/icu

  2. ICU4X 2.0 Release. The Unicode Blog. http://blog.unicode.org/2025/05/icu4x-20-released.html

  3. icu_normalizer crate documentation. https://docs.rs/icu_normalizer
