Projects

2022

Denoised smoothing as an adversarial defense for ASR [Project Presentation]

Experimented with Denoising Diffusion Probabilistic Models (DDPMs) to denoise smoothed audio input and recover the original audio, improving ASR transcription quality while providing robustness to adversarial attacks

Evaluated the effectiveness of DDPMs as a defense against adversarial attacks on ASR systems
Achieved a 40% improvement in WER and better robustness than sequential randomised smoothing on the LibriSpeech dataset

A novel dataset that empowers embodied, goal-oriented conversational agents to develop context-awareness of physically situated dialogue and the ability to adapt to language variation [In progress]

Developed a Flask-based web app to crowdsource audio annotations for the ALFRED benchmark on Amazon Mechanical Turk, creating a first-of-its-kind dataset that enables currently “deaf” embodied agents to learn to navigate by hearing and seeing
Awarded Best Solution Capstone Project; the app is currently enabling the collection of 25K+ audio annotations

Face Classification & Verification using Convolutional Neural Networks

CNN-based architectures for Face Classification & Verification

Implemented and trained ResNet-34 and ConvNeXt-T models for face classification on a subset of the VGGFace2 dataset using image augmentation techniques and Stochastic Depth, achieving an accuracy of 94%
Fine-tuned the ConvNeXt-T model with center loss for face verification, achieving an accuracy of 66%
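
The center loss used for verification can be illustrated with a minimal sketch. It assumes the common formulation (half the squared Euclidean distance between each embedding and its class center, averaged over the batch); the function name, the toy embeddings, and the batch averaging are illustrative assumptions, not the project's actual code.

```python
def center_loss(embeddings, labels, centers):
    """Half the squared distance from each embedding to its class center,
    averaged over the batch (the averaging is an assumption)."""
    total = 0.0
    for x, y in zip(embeddings, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total / len(embeddings)


# Toy batch: two 2-D embeddings, one per identity (hypothetical values).
batch = [[1.0, 0.0], [0.0, 1.0]]
batch_labels = [0, 1]
class_centers = {0: [0.0, 0.0], 1: [0.0, 1.0]}
loss = center_loss(batch, batch_labels, class_centers)
```

In training, a term like this is typically added to the classification loss with a small weight so that embeddings of the same identity cluster together.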

Iterative Back-Translation-Style Data Augmentation for Low Resource ASR and TTS [Project Poster]

Novel iterative back-translation-style approach to augment data for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) for Malayalam, a truly low-resource language

Adapted the back-translation-style data augmentation technique to speech processing by leveraging ASR and TTS outputs to improve each other’s performance iteratively
Used ASR output to create a pseudo-parallel corpus for fine-tuning TTS, and vice versa
Implemented a Conformer-based ASR model with linear fusion of HuBERT and spectrum-based features; the TTS model combined Glow-TTS and HiFi-GAN
Achieved up to 6.91% and 10.87% reduction in Word Error Rate (WER) and Character Error Rate (CER) respectively for ASR and a 2.91% improvement in Mel-cepstral distortion (MCD) for TTS
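
For reference, WER and CER are both edit-distance ratios. The sketch below is a minimal, dependency-free illustration, assuming whitespace tokenization for words and counting spaces as characters for CER; the function names are hypothetical and this is not the scoring script used in the project.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.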

Multilingual Speech Recognition [Code]

Automatic Speech Recognition for Malayalam on ESPnet

Contributed to the ESPnet open-source toolkit by implementing Malayalam ASR with only around 6 hours of parallel speech-text data
Achieved WER and CER of 39.2 and 10.4 respectively for a Conformer base model
Improved the WER and CER of the base model by 9.2% and 13.4% respectively by implementing a learnable linear fusion of spectrum-based and HuBERT self-supervised learning features
Merged the ASR recipe into the ESPnet open-source toolkit
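
One plausible reading of "learnable linear fusion" is that each feature stream passes through its own linear projection and the projected vectors are concatenated per frame. The sketch below illustrates only that idea in plain Python; the function names, dimensions, and fixed weights are assumptions for illustration (in a real model the weights would be learned), and it is not the recipe's actual implementation.

```python
def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse(spec_frame, ssl_frame, spec_proj, ssl_proj):
    """Project each feature stream to a common size, then concatenate.

    spec_proj and ssl_proj are (weights, bias) pairs standing in for
    learnable parameters.
    """
    spec_out = linear(spec_proj[0], spec_proj[1], spec_frame)
    ssl_out = linear(ssl_proj[0], ssl_proj[1], ssl_frame)
    return spec_out + ssl_out  # list concatenation along the feature axis


# Toy frame: 2 spectral features and 1 self-supervised feature (hypothetical sizes).
fused = fuse(
    [1.0, 2.0], [3.0],
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[2.0], [0.5]], [1.0, 0.0]),
)
```

Projecting before concatenating lets streams with very different dimensionalities contribute comparably sized inputs to the encoder.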

Multilingual Translation

NMT for the low-resource languages Azerbaijani and Belarusian, to and from English

Improved baseline bilingual and multilingual models for low-resource NMT with fairseq as the MT framework on top of PyTorch
Implemented multiple methods, such as data augmentation via back-translation and cross-lingual transfer learning, to improve multilingual transfer
Achieved a 3-point increase in BLEU score
Studied the effect of vocabulary size and tokenization algorithms on NMT performance

Multilingual Sequence Labeling

Bi-LSTM-based POS Tagging with mBERT and Conditional Random Fields

Enhanced the performance of a baseline bi-LSTM model written in PyTorch for the task of POS tagging by utilizing pre-trained multilingual BERT embeddings
Gained 20.6% and 3% in accuracy for Tamil and English respectively
Performed extensive analysis to understand variation in performance across language families, typology and hyperparameters

Power Plant Machine Learning Pipeline

End-to-end pipeline performing ETL, EDA and ML model tuning and evaluation to accurately predict power output given a set of environmental readings

Performed ETL using PySpark and PySparkSQL
Built ML pipeline for Linear Regression and Random Forest learners using Spark ML pipeline API
Tuned and evaluated models using the CrossValidator and ParamGridBuilder APIs; the best model reduced RMSE from the base model's 4.56 to 3.39, with a coefficient of determination of 0.96
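
The two reported metrics, RMSE and the coefficient of determination, can be computed as in the following sketch. It is a plain-Python illustration with hypothetical function names and toy values, not the Spark evaluator code used in the pipeline.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values only; not the power plant data.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.0, 3.0, 6.0]
```

RMSE is in the units of the target (here, power output), while the coefficient of determination is unitless, which is why the two are usually reported together.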

COVID-19 Data Dashboard [Code]

Streamlit web-app powered by Altair and Python

An interactive web application that analyzes how different states in the US approached the COVID-19 pandemic
Visualizations showed that once ICU bed utilization crosses the ~75% mark, the number of deaths rises sharply
The dashboard empowers the viewer with insights and answers to questions related to the impact of COVID-19 on existing medical infrastructure and whether a strict government policy response entails lower morbidity

2021

Twitter Analytics Web Service

Fully managed, high-performance multi-tier web service with Amazon EKS and RDS. Performed ETL using Spark to reduce 1 TB of Twitter data to 60 GB

Worked in a team of 3 to build a Vert.x-based web application that recommends similar Twitter users
Designed an efficient and fault-tolerant web tier consisting of 3 microservices using Amazon EKS with managed node groups to handle high loads (~tens of thousands of RPS) under a constrained budget
Performed ETL on a large Twitter data set (~1 TB) using Apache Spark on the Azure Databricks platform and deployed storage tier on an AWS RDS MySQL instance
Automated service deployment using eksctl, Terraform and Helm charts
Ranked 5th in terms of performance/cost ratio in a live test spanning ~3 hrs

Machine Learning on the Cloud

Trained and deployed XGBoost on Google AI Platform

Trained and deployed a machine learning model (XGBoost) on the Google AI Platform to predict cab fares in NYC, and performed hyperparameter tuning using HyperTune to improve the model's accuracy
Processed ride requests in the form of audio and images leveraging a pipeline of cloud ML APIs such as Cloud Text-to-Speech, Cloud Speech-to-Text, Cloud NLP, Directions and AutoML Vision offered by GCP
Deployed an end-to-end solution on Google App Engine to predict cab fares by combining the input pipeline and the trained model

Question Answering [Project Video]

Syntactic rule-based QA system

Collaborated with a team of 3 to build a rule-based Question Answering system for Wikipedia articles
Developed a hybrid answer generation pipeline consisting of question type identification, top candidate sentence extraction and syntactic rule-based answer formation using dependency parsing and POS tagging
Performed question-to-declarative-sentence conversion, coreference resolution, sentence vector similarity, named-entity recognition and lexical analysis to enhance fluency and conciseness of generated answers

Question Answering System on SQuAD

QA using Machine Learning algorithms

Developed an NLP pipeline to train and evaluate multiple machine learning models, using the NLTK library for cleaning and feature extraction on 100,000 questions and context paragraphs
Fine-tuned a pre-trained BERT model using PyTorch and deployed the final model to a public endpoint through Microsoft Azure Machine Learning Studio

Rating Prediction for Amazon’s Product Reviews

Logistic regression to predict product ratings

Built a multi-class logistic regression model to predict product ratings from 100,000 reviews
Cleaned the data, performed exploratory data analysis, constructed features with a TF-IDF vectorizer, and applied oversampling to deal with class imbalance
Final model achieved an accuracy of 71% and was deployed to a public endpoint on Microsoft Azure
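
The oversampling step can be sketched as follows. The source does not say which oversampling method was used, so this assumes simple random oversampling with replacement up to the size of the largest class; the function name and the toy reviews are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy reviews: three 5-star ratings and a single 1-star rating.
reviews = ["great", "love it", "works well", "broke in a day"]
ratings = [5, 5, 5, 1]
balanced_reviews, balanced_ratings = random_oversample(reviews, ratings)
class_counts = Counter(balanced_ratings)
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the evaluation data.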

2020

Neural Machine Translation (NMT) from English to Hindi

Seq2Seq model for NMT

Employed a supervised Encoder-Decoder architecture with an enhanced version of Bahdanau’s attention mechanism, Word2Vec and VecMap
Final model attained a BLEU score of 35

Rideshare - A Cloud Based Application

Uber-like cab search application

Developed the backend for a cloud-based car-pooling application with REST APIs and a MySQL database
Implemented load balancing for the containerized application deployed on an AWS EC2 instance

Sports analytics with Hadoop

PageRank on Hadoop

Performed analysis on an Indian Premier League dataset using MapReduce
Devised an algorithm that ranks players to identify the most prolific batsman at each venue based on impact, using PageRank with the Spark and Spark Streaming libraries
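
The ranking idea rests on PageRank, which can be sketched with a small power-iteration loop. This is a single-machine illustration assuming a damping factor of 0.85 and a plain adjacency-list graph; the function name and node labels are placeholders, and how the player graph was actually built is not described here.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over an adjacency list {node: [targets]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank over every node.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Placeholder graph: "a" and "b" both point to "c", and "c" points back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

A distributed version expresses the same update as a join between the rank table and the edge list, repeated for a fixed number of iterations.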