2022

Denoised smoothing as an adversarial defense for ASR [Project Presentation]

Experimented with Denoising Diffusion Probabilistic Models (DDPMs) to denoise smoothed audio input and recover the original audio, improving ASR transcription quality while providing robustness to adversarial attacks
  • Evaluated the effectiveness of DDPMs as a defense against adversarial attacks on ASR systems
  • Achieved a 40% improvement in WER and better robustness than sequential randomised smoothing on the LibriSpeech dataset

ALFRED-Speech: An Embodied Vision-Audio-Navigation Task [Project Presentation]

A novel dataset that enables embodied, goal-oriented conversational agents to develop context-awareness of physically situated dialogue and the ability to adapt to language variation [In progress]
  • Developed a Flask-based web app to crowdsource audio annotations for the ALFRED benchmark on Amazon Mechanical Turk, creating a first-of-its-kind dataset that lets currently “deaf” embodied agents learn to navigate by hearing and seeing
  • Awarded Best Solution Capstone Project; the app is currently enabling the collection of 25K+ audio annotations

Face Classification & Verification using Convolutional Neural Networks

CNN-based architectures for Face Classification & Verification
  • Implemented and trained ResNet-34 and ConvNeXt-T models for face classification on a subset of the VGGFace2 dataset using image augmentation techniques and Stochastic Depth, achieving an accuracy of 94%
  • Fine-tuned the ConvNeXt-T model with center loss for face verification, achieving an accuracy of 66%

Iterative Back-Translation-Style Data Augmentation for Low Resource ASR and TTS [Project Poster]

Novel iterative back-translation-style approach to augment data for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) for Malayalam, a truly low-resource language
  • Adapted the back-translation-style data augmentation technique to speech processing by leveraging ASR and TTS outputs to improve each other’s performance iteratively
  • Used ASR output to create a pseudo-parallel corpus for fine-tuning TTS, and vice versa
  • Implemented a Conformer-based ASR model with linear fusion of HuBERT and spectrum-based features. The TTS model was a combination of Glow-TTS and HiFi-GAN
  • Achieved up to 6.91% and 10.87% reduction in Word Error Rate (WER) and Character Error Rate (CER) respectively for ASR and a 2.91% improvement in Mel-cepstral distortion (MCD) for TTS
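
The WER and CER figures above are edit-distance metrics. A minimal sketch of how they are typically computed is given here; the function names are illustrative, and dropping spaces before computing CER is an assumption (toolkits differ on this), not a detail taken from the project.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate with spaces removed (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.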

Multilingual Speech Recognition [Code]

Automatic Speech Recognition for Malayalam on ESPnet
  • Contributed to the ESPnet open-source toolkit by implementing Malayalam ASR with only around 6 hours of parallel speech-text data
  • Achieved WER and CER of 39.2 and 10.4 respectively for a Conformer base model
  • Improved the WER and CER of the base model by 9.2% and 13.4% respectively by implementing a learnable linear fusion of spectrum-based and HuBERT self-supervised learning features
  • Successfully merged the ASR recipe into the ESPnet open-source toolkit
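
One common form of linear feature fusion is a weighted sum of feature streams, with the weights obtained by a softmax over learnable scalars. The sketch below shows only that forward computation in plain Python. It assumes both streams have already been projected to the same dimension, and the names `fuse_frame` and `fusion_logits` are hypothetical; it is not the ESPnet implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frame(streams, fusion_logits):
    """Fuse one frame from several feature streams by a softmax-weighted sum.

    streams:        list of equal-length feature vectors, one per stream
                    (e.g. spectrum-based features and HuBERT features).
    fusion_logits:  one scalar per stream; in training these would be
                    learnable parameters updated by backpropagation.
    """
    if len(streams) != len(fusion_logits):
        raise ValueError("need exactly one logit per feature stream")
    dim = len(streams[0])
    if any(len(s) != dim for s in streams):
        raise ValueError("all streams must share the same dimension")
    weights = softmax(fusion_logits)
    return [sum(w * s[d] for w, s in zip(weights, streams)) for d in range(dim)]
```

With equal logits the fused frame is the plain average of the streams; as one logit grows, the output moves toward that stream.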

Multilingual Translation

NMT for the low-resource languages Azerbaijani and Belarusian, to and from English
  • Improved baseline bilingual and multilingual models for low-resource NMT with fairseq as the MT framework on top of PyTorch
  • Implemented multiple methods such as data augmentation via back-translation and cross-lingual transfer learning to improve multilingual transfer
  • Achieved a 3-point increase in BLEU score
  • Studied the effect of vocabulary size and tokenization algorithms on the performance of NMT

Multilingual Sequence Labeling

Bi-LSTM-based POS Tagging with mBERT and Conditional Random Fields
  • Enhanced the performance of a baseline bi-LSTM model written in PyTorch for the task of POS tagging by utilizing pre-trained multilingual BERT embeddings
  • Gained 20.6% and 3% in accuracy for Tamil and English respectively
  • Performed extensive analysis to understand variation in performance across language families, typology and hyperparameters

Power Plant Machine Learning Pipeline

End-to-end pipeline performing ETL, EDA and ML model tuning and evaluation to accurately predict power output given a set of environmental readings
  • Performed ETL using PySpark and Spark SQL
  • Built ML pipeline for Linear Regression and Random Forest learners using Spark ML pipeline API
  • Tuned and evaluated models using the CrossValidator and ParamGridBuilder APIs; the best model improved on the base model's RMSE from 4.56 to 3.39, with a coefficient of determination of 0.96
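
For reference, the two reported metrics can be written out in a few lines. This is a plain-Python sketch of the standard definitions, not the Spark ML evaluator used in the project; `relative_reduction` is a hypothetical helper added to express the RMSE change as a fraction.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(before, after):
    """Fractional reduction of an error metric, e.g. 0.25 means 25% lower."""
    return (before - after) / before
```

Applied to the figures above, `relative_reduction(4.56, 3.39)` is about 0.257, that is, roughly a 25.7% lower RMSE than the base model.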

COVID-19 Data Dashboard [Code]

Streamlit web app powered by Altair and Python
  • An interactive web application that analyzes how different states in the US approached the COVID-19 pandemic
  • Visualizations showed that once ICU bed utilization crosses the ~75% mark, the number of deaths rises sharply
  • The dashboard empowers the viewer with insights and answers to questions related to the impact of COVID-19 on existing medical infrastructure and whether a strict government policy response entails lower morbidity

2021

Twitter Analytics Web Service

Fully managed, high-performance multi-tier web service with Amazon EKS and RDS. Performed ETL using Spark to reduce 1 TB of Twitter data to 60 GB
  • Worked in a team of 3 to build a Vert.x-based web application that recommends similar Twitter users
  • Designed an efficient and fault-tolerant web tier consisting of 3 microservices using Amazon EKS with managed node groups to handle high loads (~tens of thousands of RPS) under a constrained budget
  • Performed ETL on a large Twitter data set (~1 TB) using Apache Spark on the Azure Databricks platform and deployed storage tier on an AWS RDS MySQL instance
  • Automated service deployment using eksctl, Terraform and Helm charts
  • Ranked 5th in terms of performance/cost ratio in a live test spanning ~3 hrs

Machine Learning on the Cloud

Trained and deployed XGBoost on Google AI Platform
  • Trained and deployed a machine learning model (XGBoost) on the Google AI Platform to predict cab fares in NYC and performed hyperparameter tuning using HyperTune to improve accuracy of model
  • Processed ride requests in the form of audio and images leveraging a pipeline of cloud ML APIs such as Cloud Text-to-Speech, Cloud Speech-to-Text, Cloud NLP, Directions and AutoML Vision offered by GCP
  • Deployed an end-to-end solution on Google App Engine to predict cab fare by combining input pipeline and trained model

Question Answering [Project Video]

Syntactic rule-based QA system
  • Collaborated with a team of 3 to build a rule-based Question Answering system for Wikipedia articles
  • Developed a hybrid answer generation pipeline consisting of question type identification, extraction of top candidate sentences and syntactic rule-based answer formation using dependency parsing and POS tagging
  • Performed question-to-declarative-sentence conversion, coreference resolution, sentence vector similarity, named-entity recognition and lexical analysis to enhance the fluency and conciseness of generated answers

Question Answering System on SQuAD

QA using Machine Learning algorithms
  • Developed an NLP pipeline to train and evaluate multiple machine learning models, using the NLTK library for cleaning and feature extraction on 100,000 questions and context paragraphs
  • Fine-tuned a pre-trained BERT model using PyTorch and deployed the final model to a public endpoint through Microsoft Azure Machine Learning Studio

Rating Prediction for Amazon’s Product Reviews

Logistic regression to predict product ratings
  • Built a multi-class logistic regression model to predict product ratings from 100,000 reviews
  • Cleaned the data, performed exploratory data analysis, constructed features using TfidfVectorizer and applied oversampling to deal with class imbalance
  • Final model achieved an accuracy of 71% and was deployed to a public endpoint on Microsoft Azure
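
The oversampling step can be illustrated with simple random oversampling, where minority classes are resampled with replacement until every class matches the largest one. Which oversampling method the project used is not stated, so this choice and the function name `random_oversample` are assumptions of the sketch.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Balance classes by resampling minority classes with replacement."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the evaluation data.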

2020

Neural Machine Translation (NMT) from English to Hindi

Seq2Seq model for NMT
  • Employed a supervised Encoder-Decoder architecture with an enhanced version of Bahdanau’s attention mechanism, Word2Vec and VecMap
  • Final model attained a BLEU score of 35

Rideshare - A Cloud Based Application

Uber-like cab search application
  • Developed backend for a cloud-based car-pooling application with REST APIs and MySQL database
  • Implemented load balancing for the containerized application deployed on an AWS EC2 instance

Sports analytics with Hadoop

PageRank on Hadoop
  • Performed analysis on an Indian Premier League dataset using MapReduce
  • Devised an algorithm for ranking players to identify the most prolific batsman at each venue based on impact, using PageRank with the Spark and Spark Streaming libraries
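
PageRank itself is a short power iteration over a directed graph. The sketch below is a single-machine, plain-Python version for illustration only; the project ran on Hadoop and Spark, and how the player graph was constructed is not described, so the graph in the usage note is hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to the list of nodes it links to.
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With a hypothetical graph such as `{"a": ["c"], "b": ["c"], "c": ["a"]}`, node `c` receives links from both other nodes and ends up with the highest rank.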