Hi, I'm Shreayan. I was born and brought up in Mumbai. I am 27 years old. I am a fun-loving person who likes to travel, eat delicious food, meet new people, and listen to all types of music. I graduated with a Master's in Computer Science from Johns Hopkins University. I love coding, and my areas of interest include Machine Learning, NLP, and Data Science. I started coding in 2012 when my class teacher introduced me to it. My love for coding has only grown over the years. Call me a fanatic, but I'm crazy about all things related to Space and Astronomy. I love researching random topics in Astronomy and Space, which often gives me an existential crisis. My hobbies include playing video games, chess, football, and cooking. In my free time, I also play various songs on the keyboard. Do check out my Instagram if you want to see how I play the keyboard.
Shreayan Chaudhary
+1 (551) 339-7724
shreayan98c@gmail.com
linkedin.com/in/shreayan98c/
Minneapolis, Minnesota
My goal is to pursue a career in Machine Learning and to work in a technological field where I can keep learning while organizing, managing, and analyzing massive amounts of structured and unstructured data, building machine learning algorithms that meet specific business needs and goals and contribute to the growth of the organization.
A Machine Learning Engineer with 3 years of experience in tech, working on end-to-end ML projects, from data ingestion to deploying ML models in production. I hold a Master's in Computer Science from Johns Hopkins University with a specialization in Machine Learning, and I am highly skilled in multimodal ML (NLP and Computer Vision). I have two publications at established international conferences as well as one patent. I specialize in distributed multi-agent architectures, RAG pipelines for enterprise analytics, and vision-NLP integrations.
Machine Learning Engineer II • June 2023 - Present
• Architected Seagate Context Protocol (SCP) to unify structured + unstructured data (databases, text, video, sheets, PDFs), enabling RAG-based Q&A and Text2SQL agents that reduced reliance on Tableau (~85% cost reduction)
• Reduced development overhead by 60% by designing distributed multi-agent systems for Seagate Copilot Hub, scaling to 40k+ users using A2A and LangGraph
• Decreased LLM inference costs by 67% by fine-tuning Llama 3 with PEFT/LoRA for Text2SQL agents
• Developed a vision-NLP agent to extract structured data from PDFs and images, automating ingestion workflows with ~96% effort reduction via SME-reviewed validation loop
• Technologies Used: PyTorch, JAX, PEFT, LoRA, vLLM, Streamlit, Azure, LLaMA, Falcon, Agent2Agent, AutoGen, Docker, Kubernetes
ML Research Assistant • April 2023 - July 2024
• Engineered a token-efficient LLM protocol for real-time voice control of surgical robots, boosting command precision and reducing operation time by ~20%
• Fine-tuned Petals and Bloomz LLMs across distributed nodes using Ray, and accelerated inference with vLLM to support low-latency robotic task execution
Machine Learning Engineer • January 2021 - July 2022
• Boosted large-scale document processing accuracy and speed by ~85% through a hybrid OCR pipeline combining rule-based methods and deep learning models
• Built reusable TensorFlow/PyTorch APIs for a no-code ML platform adopted by 40+ enterprise clients
• Architected a 1.5+ GB/day scalable pipeline for ETL, feature engineering, and ML model ingestion
• Technologies Used: PyTorch, TensorFlow, NLTK, spaCy, CoreNLP, Gensim, GCP, OpenCV, Tesseract, Google Vision, Fuzzy ML Algorithms
Machine Learning Research Intern • December 2019 - January 2021
• Cut manual translation effort by 80% for the Indian Navy via a fine-tuned Bi-LSTM Russian-English translator
• Boosted complaint resolution by 94% across 3.7M+ records by creating a BERT-based feedback summarizer for NGOs
• Led 7 members across 3 teams (ML, Dev, Mgmt) to develop a Sanskrit OCR and post-editing tool using CNN, Tesseract, and GoogleOCR to digitize ancient Sanskrit manuscripts
• Technologies Used: Bi-LSTM Transformers, CNN, RNN, Tesseract, BERT, OpenCV, Hugging Face, Keras, TensorFlow, NLTK, spaCy, Django, GCP, AWS
Intern Data Scientist • June 2019 - July 2019
• Segmented potential defaulters by analyzing bank loan data using SVMs, reducing manual review time by ~70%
• Technologies Used: Flask, NumPy, Scikit-Learn, AWS EC2, PostgreSQL, SVM, Neural Nets
Intern Software Engineer • June 2018 - July 2018
• Engineered a scalable JSP- and Servlet-based web app for A/B testing of emails, SMS, and notifications sent to 1.2M+ users, helping the company’s customer relations team communicate effectively with customers
• Technologies Used: JSP, Servlets, jQuery, Datatables.js, HTML, CSS: MaterializeCSS
Intern Web Developer • January 2018 - June 2018
• Built analytics dashboard for web.wisopt.com used by SRM University serving 600+ profs and 15k+ students for all official communications
• Technologies Used: jQuery, Vue.js, HTML, CSS: Bootstrap
Social Media Head • June 2017 - June 2020
• Organized national-level technical events such as Hackathons & Coding Competitions to promote various applications of coding and collaboration among CS undergrad students.
• Handled the Facebook, Twitter, and Instagram accounts with over 6,000 followers.
• Increased user engagement by 30% (4.6k to 6.1k) across all social handles: Facebook, Instagram, and Twitter.
MS in Computer Science • 2022 - 2024
• CGPA: 3.97/4
• Research Assistant at ARCADE (Advanced Robotics and Computationally AugmenteD Environments) Lab advised by Prof. Unberath - Spring '23, Fall '23, Spring '24
• Research Assistant at CLSP (Center for Language and Speech Processing) Lab advised by Prof. Yarowsky - Fall '22
• Teaching Assistant for the graduate level course Information Retrieval by Prof. Yarowsky - Spring '24
• Teaching Assistant for the graduate level course Software Engineering by Prof. Darvish - Spring '23
• Teaching Assistant for the graduate level course Databases by Prof. Yarowsky - Fall '22
B.Tech in Software Engineering • 2016 - 2020
• Grade: 89%
• First Class with Distinction: Ranked in the top 10% of the department
XIIth grade in CBSE Board • 2014 - 2016
• Obtained a Silver medal in the International Informatics Olympiad.
Xth grade in ICSE Board • 2004 - 2014
• Certificate of excellence in Computer Science and Mathematics.
I started my coding journey in 2012, when my class teacher introduced me to it, beginning with Java. My love for coding has only grown over the years. After Java, I learned the basics of C and C++. Then I moved on to HTML, CSS, and JavaScript to create fluid, mobile-responsive, and lightweight web applications. When I was introduced to Python by a friend, I immediately fell in love with the language, and it has been my favourite ever since. I have dived deep into deep learning (pun intended) and started learning machine learning (also, pun intended) in Python. My areas of research are Recommender Systems, OCR, and NLP.
Shreayan Chaudhary on ResearchGate
Chaudhary, Shreayan; Killeen, Benjamin; Osgood, Greg; Unberath, Mathias (2024). Take a Shot! Natural Language Control of Robotic X-ray Systems for Image-guided Surgery, International Conference on Information Processing in Computer-Assisted Interventions.
This paper proposes a natural language based communication protocol to control C-Arm robotic devices using voice commands.
Chaudhary, Shreayan; Anupama, C. (2020). Ensemble Recommendation System using a hybrid decision level fusion of Popularity Model and Collaborative Filtering, International Conference for Artificial Intelligence and Evolutionary Computations in Engineering Systems, pp.551-559 DOI: 10.1007/978-981-15-0199-9_47.
This paper proposes a hybrid recommendation algorithm that combines content-based and collaborative filtering to improve performance metrics and address the cold-start problem, overcoming the drawbacks of each algorithm on its own.
Chaudhary, Shreayan; Ferni, U. (2020). Recommendation System for Establishing New Businesses using Geospatial Clustering for Multiple Reference Points, National Conference on Artificial Intelligence and Intelligent Information Processing; Patented under SRM University.
Created custom clustering algorithms and a recommender system to find the optimal location in any given city or place to help set up a business for entrepreneurs, thus saving time, money, and risk.
Engineered prototype glasses with camera and mic, and fine-tuned OpenFlamingo on real-time video to answer users’ audio-based questions (Whisper for ASR) conditioned on the image (CLIP ViT for images and LLaMA for text)
Data Pipeline: Video Stream from phone camera -> Capture Frames -> Record User Question -> Transcribe using Whisper -> Model Inference conditioned on image frames -> Return Answer
Technologies: DroidCam, OpenCV, OpenAI Whisper, OpenFlamingo (LLaMA 7B + CLIP ViT/L-14) Vision-Language Model.
Created an end-to-end project to help entrepreneurs and business owners find the optimal location to set up their business. Built custom clustering algorithms on geospatial data, together with a recommender system, to identify the best location in any given city or place, thus saving time, money, and risk.
Data Pipeline: Data Integration -> Data Inspection -> Data Visualizations -> Data Modeling -> Model Evaluation -> Django Web Application -> Cluster Analysis -> RESTful API creation -> Deployment on Heroku
Technologies: ML, Custom Clustering Algorithm, Folium, Heroku, Django.
Data Integrated from: Census of India 2011, Geopy Nominatim, Foursquare API and Google Places API.
Venter is a response categorization and document summarization tool. The goal is to create a platform that autonomously analyses, processes, and summarizes large corpora of user feedback, inputs, and responses.
Clients: Civis, ICMyC (I Change My City), SpeakUp.
Technologies: ML, Deep Learning, GCP, NLP, Django.
Created an end-to-end project to help students find prospective grad schools. It predicts a student's chance of being selected for a grad program abroad and finds the most suitable universities for them.
Data Pipeline: Data Collection -> Data Inspection -> Data Visualizations -> Data Modeling -> Model Evaluation -> Flask Web Application -> RESTful API creation -> Deployment on Heroku
Technologies: ML, Linear Regression, XGBoost, Heroku, Flask, AWS.
Created an automated machine learning model that predicts whether a borrower will repay the loan taken from the bank and identifies the most probable form of repayment. Also scheduled it to run daily at 6 pm and deployed it on an AWS EC2 server for production.
Created a smart headband, connected to headphones, that guides visually impaired users with audio instructions about the objects and people in their path. This project was awarded a special mention at Conception, an IoT project expo.
Created a recommender system by applying collaborative filtering and a simple popularity method to a movie dataset consisting of 27,000 movies and 20 million ratings given by 138,000 users.
Dataset: MovieLens 20 Million Dataset
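The combination of a popularity model and collaborative filtering described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy ratings, the function names (`popularity_ranking`, `cosine_similarity`, `recommend`), and the choice of user-based cosine similarity are all assumptions made for the example. The popularity model serves as a fallback for users with no rating history.

```python
from collections import defaultdict
from math import sqrt

# Toy ratings invented for illustration: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "D": 2},
    "u3": {"C": 5, "D": 4},
    "u4": {"A": 5, "D": 1},
}


def popularity_ranking(ratings):
    """Rank movies by number of ratings, breaking ties by mean rating."""
    counts, totals = defaultdict(int), defaultdict(float)
    for user_ratings in ratings.values():
        for movie, score in user_ratings.items():
            counts[movie] += 1
            totals[movie] += score
    return sorted(
        counts, key=lambda m: (counts[m], totals[m] / counts[m]), reverse=True
    )


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=2):
    """User-based collaborative filtering with a popularity fallback."""
    seen = ratings.get(user, {})
    if not seen:
        # Cold start: no history for this user, so fall back to popularity.
        return popularity_ranking(ratings)[:k]
    scores, weights = defaultdict(float), defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue
        for movie, score in other_ratings.items():
            if movie not in seen:
                scores[movie] += sim * score
                weights[movie] += sim
    # Similarity-weighted average rating for each unseen movie.
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:k]


if __name__ == "__main__":
    print(recommend("u4", RATINGS))        # personalised suggestions
    print(recommend("new_user", RATINGS))  # popularity fallback
```

On a dataset the size of MovieLens 20M, the pairwise loop would need to be replaced with sparse matrix operations; the sketch only shows the logic.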
Created a smart vehicle-to-vehicle system using IoT, with GPS and ultrasonic sensors, to decrease commute times and enable multi-vehicle control by a single driver.
Technologies Used: Keras, Arduino, NodeMCU, Bluetooth, Flask
Languages Used: Python, HTML, JavaScript, CSS, C
A local library application built with Django to manage books systematically, deployed on a Heroku server. Project on GitHub
Used SMOTE (Synthetic Minority Oversampling TEchnique), under-sampling, and over-sampling to classify businesses into multiple categories on an imbalanced dataset, achieving a top-10-percentile F1 score across India.
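The core idea behind SMOTE, generating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration under stated assumptions, not the code used in the project: the function name `smote`, its parameters, and the toy points are hypothetical, and a real project would more likely rely on a maintained implementation.

```python
import random
from math import dist


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours, excluding the base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    # Toy minority class invented for illustration.
    minority = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
    for point in smote(minority, n_new=3):
        print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies instead of duplicating existing rows.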
A news blog built with Django that fetches the latest news via a news blog API, deployed on a Heroku server. Project on GitHub
Created a deep neural network using Keras to identify the type of clothing or footwear in the given images.
Uses the OpenCV library to detect the user's face in the frames of video captured through the webcam.
A web application that can send emails, SMS, and Android push notifications to all (or a selected few) employees in an organization. It uses HTML, CSS, and JS for the webpage frontend, and Java with MSSQL Server as the backend, which stores the employee details.
Software Requirements: Netbeans IDE 8.2, Java 8 EE, Microsoft SQL Server 2012
Languages Used: HTML, JavaScript, CSS, Java
Frameworks Used: Bootstrap, DataTables, jQuery
Predicts the price of a house in Boston using scikit-learn’s Boston dataset. Given a set of features that describe a house in Boston, the machine learning model must predict the house price.
Languages Used: Python
Dataset: Boston Houses Dataset
Created a model using machine learning algorithms that can predict whether the cancer is benign or malignant.
Dataset: Breast Cancer Wisconsin Dataset
The objective was to understand the factors that influence attrition and to predict which employees are likely to leave the company in the future.
Dataset: IBM Employees Dataset
A very simple to-do list that lets you keep track of the activities and tasks you need to complete, so you don't forget them.
Check it out!