My projects

You can check out my personal projects on my GitHub. My capstone projects are covered by an NDA, so their code cannot be shared.

Paint Defect Prediction

This project, part of the CMSE 495 Data Science Capstone at Michigan State University, focuses on predicting and mitigating paint defects in automotive manufacturing for Ford Motor Company.

By analyzing production data, the project identifies factors contributing to defects such as scratches and uneven coatings. The goal is to optimize the painting process, reduce waste, and improve operational efficiency while enhancing vehicle quality and customer satisfaction....

Machine Learning for Carbon Optimization

Originally developed as part of a CSE capstone at Michigan State University, this project is a machine learning tool for optimizing carbon removal.

The website displays interactive heatmaps for three major carbon capture technologies: kelp farms, reforestation, and direct air capture. Built with React, Python machine learning libraries, and the Mapbox JavaScript API, the tool helps users, including government agencies, companies, and investors, make informed decisions by visualizing cost-effectiveness and other metrics across U.S. counties. It aims to support sustainable investment and reduce carbon footprints.

Bird Call Classification

This project, part of the CSE 404: Introduction to Machine Learning course at Michigan State University, uses machine learning to classify bird calls from two taxonomic families, Sylviidae and Timaliidae. Using recordings from MSU's Avian Vocalizations Center and Cornell's Macaulay Library, the project aims to support conservation through bioacoustics.

Several machine learning models, including CNN, RNN, ResNet, ResNet with GRU, and Transformer architectures, were evaluated on how accurately they distinguish between the two families. The ResNet+GRU model achieved the highest test accuracy, 62.5%, on unseen species within these families. The project illustrates the potential of machine learning to advance conservation efforts, with future applications in real-time species identification that would be valuable to both birdwatchers and conservationists.

Sentiment Analysis

This project focused on sentiment analysis of Russian troll tweets from the 2016 U.S. Presidential election. Using a dataset of over 200,000 tweets, sentiment analysis techniques were applied to classify tweets as either "Right Troll" (pro-Russia) or "Left Troll" (anti-Russia). A model was developed using logistic regression, achieving an accuracy of around 90%. Additionally, a custom-built classifier was tested, though it was less effective. Future improvements could include expanding the dataset, exploring new machine learning techniques, and refining sentiment scoring to enhance accuracy.
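To illustrate the logistic regression approach described above, here is a minimal sketch of a bag-of-words logistic regression classifier written with only the Python standard library. It is not the project's actual code: the function names, the toy training texts, and the label encoding (1 for "Right Troll", 0 for "Left Troll") are all assumptions made for illustration, and the real project may well have used a library implementation and different features.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocab(texts):
    """Map each distinct token to a feature index."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words vector: {feature index: count}; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Probability that the sparse vector x belongs to class 1."""
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))


def train(texts, labels, epochs=200, lr=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    vocab = build_vocab(texts)
    weights = [0.0] * len(vocab)
    bias = 0.0
    rows = [vectorize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y  # gradient of log loss w.r.t. z
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return vocab, weights, bias


def classify(text, model):
    """Return 1 or 0 using a 0.5 probability threshold."""
    vocab, weights, bias = model
    return 1 if predict_proba(vectorize(text, vocab), weights, bias) >= 0.5 else 0


# Hypothetical placeholder data: neutral tokens standing in for tweet text.
TOY_TEXTS = ["alpha beta gamma", "alpha gamma delta", "omega sigma tau", "sigma tau kappa"]
TOY_LABELS = [1, 1, 0, 0]  # assumed encoding: 1 = "Right Troll", 0 = "Left Troll"

model = train(TOY_TEXTS, TOY_LABELS)
print(classify("alpha gamma", model), classify("sigma tau", model))
```

On a real dataset, accuracy would be measured on tweets held out from training rather than on the training texts themselves.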

Database with Python

The project focuses on developing a custom SQL interpreter for managing and querying a simple in-memory database. It is designed to parse and execute SQL commands, allowing users to interact with the database using standard SQL syntax. The system includes functionalities for executing various SQL statements, managing database state through JSON files, and performing data manipulation operations such as appending and removing rows. Additionally, it supports querying and filtering data, offering a foundational framework for understanding and implementing SQL operations in a controlled environment.
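As a rough illustration of the idea, the sketch below shows one way a tiny SQL interpreter over an in-memory database could be structured, with state saved to and loaded from a JSON file. It is not the project's code: the `MiniDB` class, its method names, and the small subset of SQL it accepts (CREATE TABLE, INSERT, SELECT with an optional single equality filter, and DELETE) are assumptions chosen to keep the example short.

```python
import json
import re


class MiniDB:
    """Tiny in-memory database. Each table is {"columns": [...], "rows": [[...], ...]}."""

    def __init__(self):
        self.tables = {}

    def save(self, path):
        """Write the whole database state to a JSON file."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.tables, f)

    def load(self, path):
        """Replace the database state with the contents of a JSON file."""
        with open(path, encoding="utf-8") as f:
            self.tables = json.load(f)

    def execute(self, sql):
        """Match the statement against each supported pattern and run its handler."""
        sql = sql.strip().rstrip(";")
        patterns = [
            (r"CREATE\s+TABLE\s+(\w+)\s*\((.+)\)", self._create),
            (r"INSERT\s+INTO\s+(\w+)\s+VALUES\s*\((.+)\)", self._insert),
            (r"SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(\w+)\s*=\s*(.+))?", self._select),
            (r"DELETE\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*(.+)", self._delete),
        ]
        for pattern, handler in patterns:
            match = re.fullmatch(pattern, sql, flags=re.IGNORECASE)
            if match:
                return handler(*match.groups())
        raise ValueError(f"Unsupported statement: {sql}")

    @staticmethod
    def _parse_value(token):
        """Turn a literal into a str (single-quoted), int, or float."""
        token = token.strip()
        if len(token) >= 2 and token.startswith("'") and token.endswith("'"):
            return token[1:-1]
        try:
            return int(token)
        except ValueError:
            return float(token)

    def _create(self, name, columns):
        # Keep only the column name; any declared type after it is ignored.
        names = [c.split()[0] for c in columns.split(",")]
        self.tables[name] = {"columns": names, "rows": []}

    def _insert(self, name, values):
        table = self.tables[name]
        # Naive comma split: string literals containing commas are not supported.
        row = [self._parse_value(v) for v in values.split(",")]
        if len(row) != len(table["columns"]):
            raise ValueError("Column count mismatch")
        table["rows"].append(row)

    def _filter(self, table, column, value):
        """Build a predicate that keeps rows where column == value."""
        idx = table["columns"].index(column)
        target = self._parse_value(value)
        return lambda row: row[idx] == target

    def _select(self, columns, name, where_col, where_val):
        table = self.tables[name]
        if columns.strip() == "*":
            names = table["columns"]
        else:
            names = [c.strip() for c in columns.split(",")]
        idxs = [table["columns"].index(c) for c in names]
        keep = self._filter(table, where_col, where_val) if where_col else (lambda row: True)
        return [tuple(row[i] for i in idxs) for row in table["rows"] if keep(row)]

    def _delete(self, name, where_col, where_val):
        table = self.tables[name]
        doomed = self._filter(table, where_col, where_val)
        table["rows"] = [row for row in table["rows"] if not doomed(row)]
```

A session might look like `db = MiniDB()`, then `db.execute("CREATE TABLE users (id INTEGER, name TEXT);")`, a few `INSERT` statements, and `db.execute("SELECT name FROM users WHERE id = 2;")`. Regular expressions keep the sketch short; an interpreter that handles nested expressions or joins would more likely need a proper tokenizer and parser.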

Web Interface to Translate a Deep Learning Model for Rib Fracture Detection

Detecting rib fractures in pediatric radiographs is challenging and can lead to missed diagnoses by radiologists. Rib fractures are critical indicators of child abuse, making accurate detection essential. Previous research has employed deep learning models like RetinaNet and YOLOv5 to develop a custom framework for detecting and localizing these fractures in pediatric chest radiographs.

To enhance accessibility, the machine learning model was deployed as a publicly available tool. The deployment involved creating an interactive web application using Flask with a Python backend, integrating scripts to execute the model. The application is hosted on Jetstream, a cloud computing platform, and utilizes the Gunicorn server for efficient HTTP request handling. The website is secured with HTTPS encryption to ensure data protection.

RNA to Protein Prediction

This project aimed to predict protein abundance from RNA data using machine learning. The data consists of gene expression (GEX) and protein (ADT) measurements, both pre-processed and stored in AnnData files. Missing values in the GEX data were handled with a KNN imputer, and dimensionality was reduced with PCA, retaining 50 components.

A multi-layer perceptron (MLP) regression model with two hidden layers was implemented, achieving a root mean square error (RMSE) of 0.35. This outperforms both the linear and baseline models, highlighting the model's ability to predict protein abundance accurately from RNA expression and contributing to advancements in single-cell genomics.
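For readers unfamiliar with the metric, the sketch below shows how RMSE is computed and how a model can be compared against a trivial baseline. The source does not say which baseline the project used, so the mean predictor here is an assumption, as are the function names `rmse` and `mean_baseline`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("Inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_baseline(y_train, n_predictions):
    """Hypothetical baseline: always predict the mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n_predictions
```

A model is useful under this metric only if its RMSE on held-out data is lower than that of the baseline, for example `rmse(y_test, model_predictions) < rmse(y_test, mean_baseline(y_train, len(y_test)))`, where `y_test`, `y_train`, and `model_predictions` are placeholder names.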