Experimented with Denoising Diffusion Probabilistic Models (DDPMs) to denoise smoothed audio input and recover the original audio, improving the quality of ASR transcriptions while providing robustness to adversarial attacks
Evaluated the effectiveness of DDPMs as a defense against adversarial attacks on ASR systems
Achieved a 40% improvement in WER and better robustness than sequential randomised smoothing on the LibriSpeech dataset
A novel dataset that empowers embodied goal-oriented conversational agents to develop context awareness of physically situated dialogue and the ability to adapt to language variation [In progress]
Developed a Flask-based web app to crowdsource audio annotations for the ALFRED benchmark on Amazon Mechanical Turk, creating a first-of-its-kind dataset that enables currently “deaf” embodied agents to learn to navigate by hearing and seeing
Awarded Best Solution Capstone Project; the web app is currently enabling the collection of 25K+ audio annotations
Face Classification & Verification using Convolutional Neural Networks
CNN-based architectures for Face Classification & Verification
Implemented and trained ResNet-34 and ConvNeXt-T models for face classification on a subset of the VGGFace2 dataset using image augmentation techniques and Stochastic Depth, achieving an accuracy of 94%
Fine-tuned a ConvNeXt-T model with center loss for face verification, achieving an accuracy of 66%
Iterative Back-Translation-Style Data Augmentation for Low Resource ASR and TTS [Project Poster]
Novel iterative back-translation-style approach to augment data for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) for Malayalam, a truly low-resource language
Adapted the back-translation-style data augmentation technique to speech processing, using ASR and TTS outputs to improve each other’s performance iteratively
Used ASR output to create a pseudo-parallel corpus for fine-tuning TTS, and vice versa
Implemented a Conformer-based ASR model with linear fusion of HuBERT and spectrum-based features; the TTS model combined Glow-TTS and HiFi-GAN
Achieved up to 6.91% and 10.87% reduction in Word Error Rate (WER) and Character Error Rate (CER) respectively for ASR and a 2.91% improvement in Mel-cepstral distortion (MCD) for TTS
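The iterative loop described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the four callables (`transcribe`, `synthesize`, `finetune_asr`, `finetune_tts`), the use of separate unpaired audio and text pools, and the number of rounds are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

# (audio identifier, transcript); plain strings stand in for real data here.
Pair = Tuple[str, str]


def iterative_augmentation(
    paired: List[Pair],
    unpaired_audio: List[str],
    unpaired_text: List[str],
    transcribe: Callable[[str], str],            # hypothetical ASR inference
    synthesize: Callable[[str], str],            # hypothetical TTS inference
    finetune_asr: Callable[[List[Pair]], None],  # hypothetical ASR fine-tuning step
    finetune_tts: Callable[[List[Pair]], None],  # hypothetical TTS fine-tuning step
    rounds: int = 2,                             # assumed; not stated in the source
) -> None:
    """Alternate between the two models, each one labelling data for the other."""
    for _ in range(rounds):
        # ASR transcripts of unpaired audio form a pseudo-parallel corpus for TTS.
        pseudo_for_tts = [(audio, transcribe(audio)) for audio in unpaired_audio]
        finetune_tts(paired + pseudo_for_tts)

        # TTS audio for unpaired text forms a pseudo-parallel corpus for ASR.
        pseudo_for_asr = [(synthesize(text), text) for text in unpaired_text]
        finetune_asr(paired + pseudo_for_asr)
```

Because each round re-runs inference with the freshly fine-tuned models, the pseudo-labels can improve from one round to the next, which is the intuition behind the iterative setup.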
Automatic Speech Recognition for Malayalam on ESPnet
Contributed to the ESPnet open-source toolkit by implementing Malayalam ASR with only around 6 hours of parallel speech-text data
Achieved WER and CER of 39.2 and 10.4 respectively for a Conformer base model
Improved the WER and CER of the base model by 9.2% and 13.4% respectively by implementing a learnable linear fusion of spectrum-based and HuBERT self-supervised learning features
Successfully merged the ASR recipe into the ESPnet open-source toolkit
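For reference, WER and CER are both edit-distance ratios, computed over words and characters respectively. The sketch below is a plain-Python illustration of the metric definitions, not the scoring code used in the recipe. It assumes a non-empty reference and counts spaces as characters for CER, which scoring tools may handle differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (reference must be non-empty)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; spaces count as characters here."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Both metrics can exceed 100 when the hypothesis contains many insertions, since the denominator is the reference length only.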
Multilingual Translation
NMT for the low-resource languages Azerbaijani and Belarusian, to and from English
Improved baseline bilingual and multilingual models for low-resource NMT with fairseq as the MT framework on top of PyTorch
Implemented multiple methods such as data augmentation via back-translation and cross-lingual transfer learning to improve multilingual transfer
Achieved a 3-point increase in BLEU score
Studied the effect of vocabulary size and tokenization algorithms on the performance of NMT
Multilingual Sequence Labeling
Bi-LSTM-based POS Tagging with mBERT and Conditional Random Fields
Enhanced the performance of a baseline bi-LSTM model written in PyTorch for the task of POS tagging by utilizing pre-trained multilingual BERT embeddings
Gained 20.6% and 3% in accuracy for Tamil and English respectively
Performed extensive analysis to understand variation in performance across language families, typologies and hyperparameters
Power Plant Machine Learning Pipeline
End-to-end pipeline performing ETL, EDA and ML model tuning and evaluation to accurately predict power output given a set of environmental readings
Performed ETL using PySpark and PySpark SQL
Built ML pipeline for Linear Regression and Random Forest learners using Spark ML pipeline API
Tuned and evaluated models using the CrossValidator and ParamGridBuilder APIs; the best model improved the base model RMSE from 4.56 to 3.39, with a coefficient of determination of 0.96
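The two reported metrics, RMSE and the coefficient of determination (R²), follow directly from the residuals. The sketch below is a plain-Python illustration of those definitions, not the Spark evaluator used in the pipeline. The function names are illustrative and the inputs are assumed to be equal-length, non-constant numeric sequences.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # zero if y_true is constant
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the units of the target, so lower is better, while R² is unitless and approaches 1 as the model explains more of the variance.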
An interactive web application that analyzes how different states in the US approached the COVID-19 pandemic
Visualizations showed that once ICU bed utilization crosses the ~75% threshold, the number of deaths rises sharply
The dashboard empowers the viewer with insights and answers to questions related to the impact of COVID-19 on existing medical infrastructure and whether a strict government policy response entails lower morbidity
2021
Twitter Analytics Web Service
Fully managed high-performance multi-tier web service with Amazon EKS and RDS.
Performed ETL using Spark to reduce 1 TB of Twitter data to 60 GB
Worked in a team of 3 to build a Vert.x-based web application that recommends similar Twitter users
Designed an efficient and fault-tolerant web tier consisting of 3 microservices using Amazon EKS with managed node groups to handle high loads (~tens of thousands of RPS) under a constrained budget
Performed ETL on a large Twitter data set (~1 TB) using Apache Spark on the Azure Databricks platform and deployed the storage tier on an AWS RDS MySQL instance
Automated service deployment using eksctl, Terraform and Helm charts
Ranked 5th in terms of performance/cost ratio in a live test spanning ~3 hrs
Machine Learning on the Cloud
Trained and deployed XGBoost on Google AI Platform
Trained and deployed a machine learning model (XGBoost) on the Google AI Platform to predict cab fares in NYC, and performed hyperparameter tuning using HyperTune to improve model accuracy
Processed ride requests in the form of audio and images leveraging a pipeline of cloud ML APIs such as Cloud Text-to-Speech, Cloud Speech-to-Text, Cloud NLP, Directions and AutoML Vision offered by GCP
Deployed an end-to-end solution on Google App Engine to predict cab fares by combining the input pipeline and the trained model
Collaborated with a team of 3 to build a rule-based Question Answering system for Wikipedia articles
Developed a hybrid answer generation pipeline consisting of question type identification, top candidate sentence extraction and syntactic rule-based answer formation using dependency parsing and POS tagging
Performed question-to-declarative-sentence conversion, coreference resolution, sentence vector similarity, named-entity recognition and lexical analysis to enhance the fluency and conciseness of generated answers
Question Answering System on SQuAD
QA using Machine Learning algorithms
Developed an NLP pipeline to train and evaluate multiple machine learning models, using the NLTK library for cleaning and feature extraction of 100,000 questions and context paragraphs
Fine-tuned a pre-trained BERT model using PyTorch and deployed the final model to a public endpoint through Microsoft Azure Machine Learning Studio
Rating Prediction for Amazon’s Product Reviews
Logistic regression to predict product ratings
Built a multi-class logistic regression model to predict product ratings from 100,000 reviews
Cleaned the data, performed exploratory data analysis, constructed features using TfidfVectorizer, and oversampled to address class imbalance
Final model achieved an accuracy of 71% and was deployed to a public endpoint on Microsoft Azure
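The source does not say which oversampling method was used. As one common option, the sketch below shows simple random oversampling in plain Python, which duplicates minority-class samples until every class matches the largest one. The function name, the seed parameter and the choice of random duplication are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence[str],
    labels: Sequence[Hashable],
    seed: int = 0,
) -> Tuple[List[str], List[Hashable]]:
    """Duplicate minority-class samples until all classes are the same size."""
    rng = random.Random(seed)

    # Group samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    out_samples: List[str] = []
    out_labels: List[Hashable] = []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the evaluation data.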
2020
Neural Machine Translation (NMT) from English to Hindi
Seq2Seq model for NMT
Employed a supervised Encoder-Decoder architecture with an enhanced version of Bahdanau’s attention mechanism, Word2Vec and VecMap
Final model attained a BLEU score of 35
Rideshare - A Cloud Based Application
Uber-like cab search application
Developed the backend for a cloud-based car-pooling application with REST APIs and a MySQL database
Implemented load balancing for a containerized application deployed on an AWS EC2 instance
Sports analytics with Hadoop
PageRank on Hadoop
Performed analysis on an Indian Premier League dataset using MapReduce
Devised an algorithm for ranking players to identify the most prolific batsman at each venue based on impact, using PageRank, Spark and Spark Streaming libraries
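As an illustration of the ranking idea, here is a minimal single-machine PageRank by power iteration. It is not the Hadoop/Spark implementation from the project, and how the player graph was built is not stated in the source. The damping factor, the iteration count and the dict-of-lists graph format are assumptions.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,   # assumed conventional value
    iterations: int = 50,    # assumed; enough for small graphs to settle
) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph {node: [targets]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly across its outgoing edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

With a hypothetical graph such as `{"a": ["c"], "b": ["c"], "c": ["a"]}`, node `c` ends up with the highest score because it receives rank from two nodes, while `b`, which has no incoming edges, ends up lowest.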