Projects

dénommé : Multilingual Name Detection using spaCy v3

A person-name detection pipeline built on spaCy v3 that focuses on multilingual person names. It is trained with XLM-RoBERTa on custom data covering English names, Romanised Hindi names, Romanised Arabic names and Arabic names.

denomme : Multilingual Name Detector · spaCy Universe

Hinglish Twitter Sentiment Detection | SemEval2020

HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Sentiment analysis for code-mixed social media text continues to be an under-explored area. This work adds two common approaches: fine-tuning large transformer models and sample-efficient methods like ULMFiT. Prior work demonstrates the efficacy of classical ML methods for polarity detection. Fine-tuned general-purpose language representation models, such as those of the BERT family, are benchmarked alongside classical machine learning and ensemble methods. We show that NB-SVM beats RoBERTa by 6.2% (relative) F1. The best-performing model is a majority-vote ensemble, which achieves an F1 of 0.707.
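
As a rough illustration of the majority-vote idea only (this is not the code used in the paper), here is a minimal Python sketch. It assumes each model emits one label per tweet; the function name `majority_vote`, the tie-breaking rule and the example labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` is a list of equal-length label lists, one per model.
    Ties are broken in favour of the earliest model (in list order)
    whose vote is among the tied labels. This rule is an assumption.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must label the same number of examples")

    combined = []
    for votes in zip(*predictions):  # votes = one label per model
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of three classifiers on four tweets.
model_a = ["positive", "negative", "neutral", "positive"]
model_b = ["positive", "neutral", "neutral", "negative"]
model_c = ["negative", "negative", "neutral", "neutral"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

On the last tweet all three models disagree, so the sketch falls back to the first model's label. A real ensemble might instead break ties by model confidence or validation F1.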

Work/Internships

Verloop.io  

Machine Learning Engineer | July 2020 - Current

Verloop.io is a conversational AI startup based out of Bangalore, India.

  • Working on the homegrown intent recognition service, using sentence transformers to improve performance and accuracy. The current F1 outperforms the previous one by 1.4x
  • Worked on the person-name detection system and upgraded it to spaCy v3
  • Upgraded a legacy Django service and integrated its tests with CI/CD, reducing tech debt and improving development efficiency
  • Prototyped GPT-3 for data generation, data annotation and intent recognition, and benchmarked it against the existing services. The resulting data-annotation pipeline is used to fast-track the client-specific annotation process
Machine Learning Intern | May 2019 - August 2019
  • Created a person-name extractor customised for multilingual conversations. Tweaked Flair (an open-source natural language processing library) to work on chatbot-specific use cases in English, Spanish and French
  • Evaluated the performance of language models such as ULMFiT and VAMPIRE in low-resource language contexts
  • Deployed the multilingual name extractor to production; the final model achieves a 47% improvement in F1 over the previously deployed FastText model

Zero Lab

Research Intern | May 2018 - July 2018
  • Prototyped a machine-learning-powered marketplace that streamlines transactions between buyers and sellers by estimating negotiated prices from the behaviour observed in the first few manual negotiations.

Education

Degree | Year | Institute
B.Tech, Electronics and Telecommunication | 2016-2020 | MKSSS's Cummins College of Engineering

Talks

Creating {insert artist name} Lyrics Generator

Extracting Names from multi-lingual conversation at PyData Bangalore