Case Study Details:
The Challenge
For the university’s information systems division, maintaining a single, accurate, institution-wide student identity had become increasingly difficult. Over 1.6 million student records were spread across multiple academic, enrollment, and administrative platforms. As integrations expanded, so did duplicate student profiles, often caused by inconsistent data entry, format variations, or incomplete records.
Traditional deduplication relied heavily on deterministic rules—exact match on name, email, phone—an approach that struggled with real‑world data variability and failed to detect multi-record duplicate clusters.
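The limitation can be shown with a minimal sketch (hypothetical records and field names, not the university's actual schema): a single formatting variation in any attribute is enough to defeat an exact-match rule.

```python
# Hypothetical records: the same student entered twice with ordinary
# data-entry variation (middle initial, dotted email, unformatted phone).
a = {"name": "Jane Doe",    "email": "jdoe@uni.edu",  "phone": "555-1234"}
b = {"name": "Jane A. Doe", "email": "j.doe@uni.edu", "phone": "5551234"}

def exact_match(x, y):
    # Deterministic rule: flag a duplicate only when every attribute
    # matches exactly -- the approach described above.
    return all(x[k] == y[k] for k in ("name", "email", "phone"))

print(exact_match(a, b))  # -> False: the same student, yet no rule fires
```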
This led to:
- Fragmented student identity data
- Administrative reporting inconsistencies
- Increased manual effort for data cleanup
- Operational inefficiencies across student services
The growing volume of records made manual review unsustainable, and no incremental improvement would solve it. The deduplication methodology itself needed modernization.
Evoke’s Approach
Evoke Technologies developed and deployed a hybrid AI-assisted Entity Resolution System that combines SQL-based candidate selection with LLM-driven similarity analysis to automatically detect duplicate student records at scale.
- SQL-Based Candidate Generation
Configurable SQL pipelines generate potential duplicate clusters using flexible attribute matching rules, enabling preliminary grouping before deeper analysis.
- LLM-Driven Similarity Analysis
Large Language Models evaluate names, emails, phone numbers, and addresses to identify semantic similarity beyond exact matches, resolving ambiguity caused by data variations or partial inputs.
- Hybrid Scoring Framework
Deterministic match signals and LLM similarity outputs feed into a unified scoring engine, producing confidence scores to rank potential duplicate records.
- Scalable Azure Pipeline
An Azure-based architecture—ADF, Logic Apps, Azure Functions, Cosmos DB—enables incremental, automated, and scalable processing of millions of student records.
- Human-in-the-Loop Validation
High-confidence matches are auto-classified, while borderline clusters are routed to administrative reviewers for precise and controlled merging.
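The steps above can be sketched end to end in a few dozen lines. Everything here is illustrative: `blocking_key` stands in for the SQL candidate-generation step, `llm_similarity` substitutes a simple string ratio for the actual LLM call, and the weights and thresholds are assumed values, not those of the production system.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

@dataclass
class StudentRecord:
    record_id: str
    name: str
    email: str
    phone: str

def blocking_key(rec: StudentRecord) -> str:
    # Stand-in for SQL-based candidate generation: group records that
    # share a coarse key (here, last name) before pairwise comparison,
    # so millions of records are never compared all-against-all.
    return rec.name.split()[-1].lower()

def llm_similarity(a: StudentRecord, b: StudentRecord) -> float:
    # Placeholder for the LLM-driven semantic comparison; a plain
    # string ratio stands in for the model's similarity judgment.
    return SequenceMatcher(None, f"{a.name} {a.email}".lower(),
                           f"{b.name} {b.email}".lower()).ratio()

def confidence(a: StudentRecord, b: StudentRecord) -> float:
    # Hybrid scoring: deterministic match signals plus semantic
    # similarity feed one confidence score (weights are illustrative).
    deterministic = 0.0
    if a.email and a.email.lower() == b.email.lower():
        deterministic += 0.5
    if a.phone and a.phone == b.phone:
        deterministic += 0.3
    return min(1.0, 0.5 * deterministic + 0.5 * llm_similarity(a, b))

def classify(records, auto_threshold=0.8, review_threshold=0.55):
    # Route each candidate pair: high-confidence pairs are
    # auto-classified, borderline pairs go to human reviewers.
    auto, review = [], []
    blocks = {}
    for r in records:
        blocks.setdefault(blocking_key(r), []).append(r)
    for group in blocks.values():
        for a, b in combinations(group, 2):
            score = confidence(a, b)
            if score >= auto_threshold:
                auto.append((a.record_id, b.record_id, round(score, 2)))
            elif score >= review_threshold:
                review.append((a.record_id, b.record_id, round(score, 2)))
    return auto, review
```

High-scoring pairs land in the auto-classified list while borderline pairs are queued for review, mirroring the human-in-the-loop validation step; in the deployed system this routing runs incrementally inside the Azure pipeline rather than in memory.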
The Outcomes
| Metric | Before | After |
|---|---|---|
| Duplicate detection accuracy | Limited by exact-match rules | Significantly improved through LLM semantic analysis |
| Undetected duplicate records | 30,000+ | All identified and flagged for cleanup |
| Administrative effort | High manual review dependency | Reduced through automated scoring & clustering |
| Scalability | Batch processes with bottlenecks | Fully scalable Azure-based pipeline |
The university gained a repeatable, scalable, AI-powered entity resolution framework that continuously improves data quality, enhances operational efficiency, and supports long-term institutional data governance.
Strategic Value Delivered
Beyond operational gains, the engagement created a sustainable and intelligent data-quality foundation, ensuring:
- A unified, accurate student identity across systems
- Reduction of downstream administrative errors
- Scalable support for future system integrations
- Lower long-term cost and effort for data quality management