Anirudh Chauhan

MY TOP 3 WORK

Winner — National Hackathon 2025

Ministry of Education, Government of India | IIT Gandhinagar

The Problem I solved

In India, household consumption data is primarily collected through the Household Consumption Expenditure Survey (HCES), which is conducted roughly once every five years. While this survey is extremely valuable, the large gaps between data collection cycles make it difficult for policymakers to understand current consumption patterns.

For a country with over one billion people, conducting large-scale expenditure surveys frequently (quarterly or annually) is not practically feasible due to cost, logistics, and time constraints. However, the government needs up-to-date insights on Monthly Per Capita Expenditure (MPCE) to design welfare schemes, adjust economic policies, and release quarterly or annual budgets.

The challenge was to build a reliable, scalable predictive system that could estimate MPCE across households and regions in India without relying on frequent nationwide surveys, while still capturing the socioeconomic diversity of the country.

How did I solve it?

I framed MPCE estimation as a hierarchical prediction problem rather than a single-step regression task, to better reflect the economic stratification present in Indian households.

The solution was designed as a two-stage modeling pipeline: In the first stage, households are classified into economic segments such as lower rural, upper rural, lower urban, and upper urban. This classification uses household characteristics and socio-economic indicators without directly relying on MPCE values, allowing the model to learn broad consumption patterns. In the second stage, segment-specific regression models are trained to predict MPCE. Instead of using a single global model, each segment has its own specialized model, enabling the system to capture distinct expenditure behaviors across socioeconomic groups.

I experimented with multiple modeling approaches, including clustering-based methods, XGBoost, and ensemble stacking. While baseline models achieved moderate results, the hierarchical approach significantly improved performance by aligning machine learning with real-world economic structure.

What did I use?

Machine Learning Models: XGBoost, Random Forest, LightGBM
Architecture: Two-stage hierarchical modeling (classification + regression)
Ensemble Techniques: Model stacking and segment-specific regression
Feature Engineering: Household characteristics, Asset ownership indicators, Digital engagement and socio-economic metrics
Evaluation Metrics: R² Score, Mean Absolute Percentage Error (MAPE)
Programming Language: Python

Challenges I faced

A major challenge was modeling the heterogeneity of Indian households using limited and infrequently collected survey data. Early attempts using a single global regression model failed to generalize across regions and income groups.

Feature engineering was another critical challenge. Household expenditure depends on several indirect and proxy variables, requiring careful selection and iterative refinement to ensure robustness across rural and urban settings.

Balancing predictive performance with interpretability was especially important since the solution was intended for policy-level decision-making. The final hierarchical design enabled segment-wise feature importance analysis, improving transparency and trust in the model.

To know more about this work or discuss the modeling approach, DM me here →

Automating Customer Query Identification & Resolution for millions of Customer

Policybazaar (India's largest online insurance company)

The Problem I solved

At Policybazaar, India's largest online insurance aggregator, millions of customer interactions take place daily across multiple channels such as phone calls, WhatsApp chats, live website chats, and web application interactions. These conversations often contain multiple customer concerns related to claims, policy renewals, cancellations, health checkups, and other services.

Previously, identifying and routing these customer queries relied heavily on manual effort by service and sales agents. Agents either went through call transcripts or manually wrote summaries to capture customer concerns. This process was time-consuming, error-prone, and not scalable at Policybazaar's operational scale. Important issues were sometimes missed, and a significant amount of agent time was spent on documentation rather than resolution.

The organization needed a fully automated and scalable system that could understand noisy, multi-channel customer conversations, identify customer queries accurately, and route them to the appropriate departments for faster resolution.

How did I solve it?

I designed and deployed an end-to-end machine learning pipeline to automate customer query identification and resolution. The system consumes raw, noisy conversational data from Kafka streams. Since the data originated from multiple communication channels, extensive preprocessing was required. This included handling embedded bot-generated JSON menus, inconsistent formats, and fragmented conversational text. I implemented custom logic to convert these structures into clean and intuitive natural-language input suitable for downstream models.

To generate actionable insights from conversations, I built a concern-centric summarization system. I fine-tuned a LLaMA 1B model using LoRA to produce concise summaries that focused strictly on customer issues rather than generic conversation flow. Multiple hyperparameters, including rank, alpha, dropout, learning rate, batch size, and number of epochs, were tuned to achieve optimal performance while keeping the model resource-efficient.

A key real-world challenge was handling follow-up conversations for the same customer. To address this, I introduced a case_summary field that maintained the latest consolidated summary for each lead. When a follow-up interaction occurred, the existing summary and the new summary were combined and passed through another trained model to generate an updated case summary, ensuring that the system always reflected the customer's most recent concerns. From these summaries, customer queries were extracted and passed to a BERT-based multi-class classification model. This model categorized each query into one of 15 predefined departments, such as claims, policy renewal, or cancellation. The predictions were stored in MongoDB and automatically routed to the relevant departments for resolution.

What did I use?

Models: BERT (multi-class classification), LLaMA 1B (LoRA fine-tuning)
Frameworks: PyTorch, HuggingFace Transformers
Streaming & Infrastructure: Kafka, AWS S3
Databases: MongoDB
Programming Language: Python

Challenges I faced

Building this system involved several technical and practical challenges. Cleaning and normalizing noisy, multi-channel conversational data required extensive custom preprocessing logic. Handling embedded structured data such as bot-generated JSON menus within free-form text was particularly complex.

Model experimentation also presented challenges. Initial attempts with Phi 3.5 Mini were unsuccessful due to dependency issues and suboptimal performance. Larger LLaMA models produced better results but required significantly more GPU memory, leading to the need for a more efficient solution. Achieving high accuracy in multi-class classification was difficult due to the large number of categories. I iteratively scaled the problem from 5 to 10 and finally 15 classes, experimenting with custom BERT fine-tuned on internal company data, averaging CLS tokens from the last 4 transformer layers, progressive reduction of dense layer sizes (512 → 256 → 128 → 64) to avoid information loss.

Additionally, dataset creation was highly labor-intensive. Thousands of query samples were manually labeled, and hundreds of call summaries were reviewed to ensure correctness and consistency. Despite these challenges, the system was successfully taken from initial data preparation to full production deployment, achieving over 94% classification accuracy and significantly improving operational efficiency.

To know more about this work, DM me here →

Building the World's First Reading & Speaking Proficiency Analysis Tool

The Problem I solved

If you search online for tools related to reading proficiency analysis or speaking proficiency assessment, you will mostly find research papers, academic blogs, or theoretical frameworks — but not any practical, accessible tools that tell individuals what exactly they need to improve and how.

At my university, the communication department works closely with students to improve presentation, interview, and public speaking skills. One of their most effective methods involved students reading aloud from a text while focusing on pitch, expression, phrasing, and fluency. Research-backed observations suggested that students who read well aloud also tend to speak better in public settings.

While the method worked, it did not scale. Students often knew they were not good at reading or speaking but had no clarity on what was wrong or how to fix it. This required personal mentorship for each student, which became impractical as the university grew. The challenge was clear: How do you automatically assess reading and speaking quality, provide actionable feedback, and do it in a way that scales beyond one-on-one human evaluation?

How did I solve it?

I approached this problem by first defining what "good reading" actually means in measurable terms. Instead of treating reading as a binary correct/incorrect task, I designed a four-dimensional proficiency scale, rating each aspect from 1 to 5: Expression and Volume, Phrasing and Intonation, Smoothness, and Pace.

We were given the LibriSpeech ASR corpus, an unlabelled dataset consisting of read audiobooks. Since the dataset had no proficiency labels, the first major step was manual dataset creation. We painstakingly labeled audio samples based on the defined scale, which required hours of listening and human judgment.

Once the dataset was ready, I extracted detailed prosodic and spectral features, including MFCCs, pitch statistics, jitter, shimmer, voice breaks, and temporal features. Each audio sample was converted into a high-dimensional representation capturing both speech quality and delivery patterns. After reviewing existing literature, I chose Support Vector Machines (SVMs) as the core modeling approach. SVMs performed consistently well in prior research and were particularly effective for high-dimensional audio features. The model was trained to predict proficiency scores across the four defined dimensions.

The end result was not just a model, but the foundation of a tool that provides clear, actionable feedback. Instead of vague scores, the system tells users where they need improvement — whether it's pace, expression, or smoothness — and does so quickly enough to be useful right before presentations or interviews.

What did I use?

Dataset: LibriSpeech ASR Corpus + custom-collected audio samples
Machine Learning Model: Support Vector Machines (SVM)
Feature Extraction: MFCCs (primary contributors), Pitch-based prosodic features, Temporal and fluency-related metrics
Audio Processing Libraries: librosa, PRAAT, parselmouth
Evaluation Metrics: Accuracy, F1-score, confusion matrices
Programming Language: Python

Challenges I faced

The biggest challenge was the absence of labeled data. The LibriSpeech dataset contained only audio and transcripts, requiring us to design a labeling framework from scratch and manually annotate hundreds of samples. Ensuring consistency and reliability in human labeling was both time-consuming and mentally exhausting.

Feature selection was another major challenge. Audio data produces an overwhelming number of features, many of which contribute little to predictive performance. Identifying that MFCCs were the most significant contributors required extensive experimentation and correlation analysis.

There was also limited existing research focused specifically on reading proficiency assessment using audio, especially in a way that translates into real-world tools. This meant much of the work involved bridging the gap between academic research and practical usability.

Despite these challenges, the project resulted in a scalable system that can assess reading and speaking proficiency objectively — something that previously required human experts. The tool is now positioned to be deployed as a web application, offering quick, no-nonsense feedback to help users improve their communication skills efficiently.

To know more about this work or try the tool once it launches, DM me here →

EXPERIENCE

MY TOP 3 WORK

Winner — National Hackathon 2025

Automating Customer Query Identification & Resolution for millions of Customer

Building the World's First Reading & Speaking Proficiency Analysis Tool

EDUCATION

TOP ACHIEVEMENTS

CONTACT