PROJECTS

MoneyTree

March 2025
Python, TensorFlow, Scikit-learn, Streamlit, Pandas, BeautifulSoup, NumPy, Requests, VADER, Gemini API, YFinance API, Git/GitHub

Conceptualized and built an end-to-end fintech product for beginner investors: an AI-powered personal finance assistant that delivers personalized investment recommendations and 2-year asset price predictions.
Conducted user research and market analysis to identify user needs (risk tolerance, financial goals, investing knowledge), then designed a matching algorithm that connects each user with their top 3 investment opportunities from 1,600+ curated assets.
Led product development from ideation to launch, defining product requirements, user experience flows, and technical specifications, and collaborated with the development team to deliver an award-winning solution.
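
A matching step like the one described above could be sketched as a weighted scoring function over user-profile fields. This is a minimal illustration, not MoneyTree's actual algorithm: the profile fields, asset attributes, 0-to-1 scales, weights, and tickers are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class UserProfile:
    risk_tolerance: float  # hypothetical scale: 0.0 (very cautious) to 1.0 (very aggressive)
    horizon_years: float   # how long the user plans to stay invested
    knowledge: float       # hypothetical scale: 0.0 (beginner) to 1.0 (experienced)


@dataclass
class Asset:
    ticker: str
    risk: float               # normalized risk score, 0.0 to 1.0 (assumed precomputed)
    min_horizon_years: float  # shortest sensible holding period (assumed)
    complexity: float         # 0.0 (simple) to 1.0 (complex), assumed


def match_score(user: UserProfile, asset: Asset) -> float:
    """Weighted fit score in [0, 1]; the weights are illustrative assumptions."""
    risk_fit = 1.0 - abs(user.risk_tolerance - asset.risk)
    horizon_fit = 1.0 if user.horizon_years >= asset.min_horizon_years else 0.0
    # Penalize only when the asset is more complex than the user is ready for.
    knowledge_fit = 1.0 - max(0.0, asset.complexity - user.knowledge)
    return 0.5 * risk_fit + 0.3 * horizon_fit + 0.2 * knowledge_fit


def top_matches(user: UserProfile, assets: list[Asset], k: int = 3) -> list[Asset]:
    """Return the k best-scoring assets, highest score first."""
    return heapq.nlargest(k, assets, key=lambda a: match_score(user, a))


# Usage with made-up assets:
assets = [
    Asset("AAA", risk=0.2, min_horizon_years=1, complexity=0.1),
    Asset("BBB", risk=0.8, min_horizon_years=5, complexity=0.9),
    Asset("CCC", risk=0.4, min_horizon_years=2, complexity=0.3),
    Asset("DDD", risk=0.5, min_horizon_years=10, complexity=0.2),
]
user = UserProfile(risk_tolerance=0.4, horizon_years=3, knowledge=0.2)
print([a.ticker for a in top_matches(user, assets)])  # ['CCC', 'AAA', 'DDD']
```

Using `heapq.nlargest` keeps selection at O(n log k), which stays cheap even when scoring a catalog of 1,600+ assets per request.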

Cinephile

May 2025
Python, SQL (MySQL), NoSQL (MongoDB), NLP (Gemini API), TMDb API, Data Wrangling, Prompt Engineering, Schema Design, Git/GitHub

Developed a natural language query engine that translates user requests into SQL and MongoDB queries, enabling structured search across movie and TV datasets.
Engineered relational and NoSQL schemas optimized for media data (e.g., titles, genres, ratings, cast, streaming platforms), supporting efficient execution of complex queries.
Collected and wrangled large-scale datasets from the TMDb API, transforming unstructured JSON responses into structured SQL tables and NoSQL documents.
Applied prompt engineering with the Gemini LLM to improve the accuracy of automatically generated queries, overcoming challenges with joins, aggregations, and nested lookups.
Strengthened expertise in database design, data pipelines, and NLP-driven query automation.
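
Flattening a nested TMDb-style JSON record into relational tables might look like the sketch below. It assumes Python's built-in `sqlite3` as a stand-in for MySQL, a simplified three-table schema, and an abridged, hypothetical sample record; the final join query is the kind of SQL the natural language engine would need to generate, not actual Cinephile output.

```python
import sqlite3

# Assumed, simplified schema: a many-to-many link table connects titles and genres.
SCHEMA = """
CREATE TABLE IF NOT EXISTS titles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    release_date TEXT,
    vote_average REAL
);
CREATE TABLE IF NOT EXISTS genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS title_genres (
    title_id INTEGER REFERENCES titles(id),
    genre_id INTEGER REFERENCES genres(id),
    PRIMARY KEY (title_id, genre_id)
);
"""


def load_movie(conn: sqlite3.Connection, movie: dict) -> None:
    """Split one nested movie record into rows for the three tables (idempotent)."""
    conn.execute(
        "INSERT OR REPLACE INTO titles (id, title, release_date, vote_average) "
        "VALUES (?, ?, ?, ?)",
        (movie["id"], movie["title"], movie.get("release_date"), movie.get("vote_average")),
    )
    for genre in movie.get("genres", []):
        conn.execute("INSERT OR IGNORE INTO genres (id, name) VALUES (?, ?)",
                     (genre["id"], genre["name"]))
        conn.execute("INSERT OR IGNORE INTO title_genres (title_id, genre_id) VALUES (?, ?)",
                     (movie["id"], genre["id"]))


def titles_in_genre(conn: sqlite3.Connection, genre_name: str, min_rating: float) -> list[str]:
    """A join-heavy query of the sort an LLM would translate 'highly rated dramas' into."""
    rows = conn.execute(
        """
        SELECT t.title
        FROM titles AS t
        JOIN title_genres AS tg ON tg.title_id = t.id
        JOIN genres AS g ON g.id = tg.genre_id
        WHERE g.name = ? AND t.vote_average >= ?
        ORDER BY t.vote_average DESC
        """,
        (genre_name, min_rating),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sample = {  # hypothetical, abridged record
    "id": 1, "title": "Example Movie", "release_date": "2020-01-01", "vote_average": 7.5,
    "genres": [{"id": 18, "name": "Drama"}, {"id": 35, "name": "Comedy"}],
}
load_movie(conn, sample)
conn.commit()
print(titles_in_genre(conn, "Drama", 7.0))  # ['Example Movie']
```

Parameterized placeholders (`?`) keep user-derived values out of the SQL string, which matters when part of the query text comes from a language model.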

Housing Price Predictor

May 2025
Python, Pandas, Scikit-learn, Matplotlib, NumPy, Linear Regression, Data Visualization, Feature Engineering

Built an interactive ML application that filters housing data by user preferences and predicts prices using linear regression, reporting model fit as R-squared.
Implemented a data preprocessing pipeline with one-hot encoding for categorical variables and multi-criteria filtering based on square footage, bedrooms, bathrooms, year built, and neighborhood.
Developed a predictive model in scikit-learn using a train-test split, evaluating performance on the held-out test set with statistical metrics such as R-squared.
Created data visualizations comparing actual and predicted housing prices, using scatter plots with rolling-average trend lines to analyze model performance.
Designed an end-to-end data science workflow, from user input validation to automated visualization generation, for real estate price analytics.
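
The filter, encode, split, fit, and score steps above can be sketched with pandas and scikit-learn. The data here is synthetic and the column names (`sqft`, `bedrooms`, `bathrooms`, `year_built`, `neighborhood`, `price`) are assumptions, so the printed R-squared says nothing about the real project's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_housing(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in for the real housing dataset (all values are made up)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "sqft": rng.integers(600, 4000, n),
        "bedrooms": rng.integers(1, 6, n),
        "bathrooms": rng.integers(1, 4, n),
        "year_built": rng.integers(1950, 2023, n),
        "neighborhood": rng.choice(["North", "South", "East"], n),
    })
    premium = df["neighborhood"].map({"North": 40_000, "South": 0, "East": 20_000})
    df["price"] = (150 * df["sqft"] + 10_000 * df["bedrooms"]
                   + 500 * (df["year_built"] - 1950) + premium
                   + rng.normal(0, 20_000, n))
    return df


def filter_homes(df, min_sqft=0, min_bedrooms=0, min_bathrooms=0,
                 built_after=0, neighborhoods=None):
    """Multi-criteria filter on the user's preferences."""
    mask = ((df["sqft"] >= min_sqft) & (df["bedrooms"] >= min_bedrooms)
            & (df["bathrooms"] >= min_bathrooms) & (df["year_built"] >= built_after))
    if neighborhoods is not None:
        mask &= df["neighborhood"].isin(neighborhoods)
    return df[mask]


def fit_price_model(df: pd.DataFrame, seed: int = 42):
    """One-hot encode, split, fit, and score on the held-out rows."""
    # drop_first avoids a redundant dummy column (the dummy-variable trap).
    X = pd.get_dummies(df.drop(columns="price"), columns=["neighborhood"],
                       drop_first=True, dtype=float)
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, r2_score(y_test, y_pred), y_test.to_numpy(), y_pred


def rolling_trend(y_true: np.ndarray, y_pred: np.ndarray, window: int = 20):
    """Predictions ordered by actual price, smoothed with a rolling mean for a trend line."""
    order = np.argsort(y_true)
    x_sorted = pd.Series(y_true[order])
    trend = pd.Series(y_pred[order]).rolling(window, min_periods=1).mean()
    return x_sorted, trend


homes = filter_homes(make_synthetic_housing(), min_sqft=1000, min_bedrooms=2)
model, r2, y_true, y_pred = fit_price_model(homes)
x_sorted, trend = rolling_trend(y_true, y_pred)
print(f"R-squared on held-out homes: {r2:.3f}")
```

For the actual-vs-predicted chart, `plt.scatter(y_true, y_pred)` followed by `plt.plot(x_sorted, trend)` in Matplotlib would overlay the rolling-average line on the scatter plot.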

Determinants of Adult Income: A Longitudinal Analysis

December 2024
Python, STATA, Econometrics, Multiple Linear Regression, Longitudinal Data Analysis, Statistical Modeling, Data Visualization, Hypothesis Testing

Analyzed longitudinal data on 8,984 respondents spanning 24 years, using multiple linear regression to identify childhood predictors of adult income and reaching an adjusted R-squared of 0.218.
Applied backward selection to refine the model, systematically removing variables to examine the relationships between family factors and adult earnings.
Conducted statistical analysis revealing significant income gaps by gender ($9,975) and race ($5,120), both at p < 0.001.
Engineered quadratic terms for parental education variables to capture diminishing returns and improve the model's explanatory power.
Translated complex statistical findings into actionable insights, communicating data-driven results for policy and business applications.
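
Backward selection with a quadratic term can be sketched in Python with NumPy and SciPy (the original analysis also used STATA). This is an illustration under stated assumptions: the p-value threshold of 0.05 as the removal criterion, the variable names, and the synthetic data with made-up coefficients are all hypothetical, not the study's specification or results.

```python
import numpy as np
import pandas as pd
from scipy import stats


def ols_fit(X: pd.DataFrame, y: np.ndarray):
    """OLS with an intercept; returns (coefficients, p-values, adjusted R^2)."""
    X1 = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n, k = X1.shape                                   # k counts the intercept
    sigma2 = resid @ resid / (n - k)                  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    p = 2 * stats.t.sf(np.abs(beta / se), df=n - k)   # two-sided t-tests
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return (pd.Series(beta[1:], index=X.columns),
            pd.Series(p[1:], index=X.columns), adj_r2)


def backward_select(X: pd.DataFrame, y: np.ndarray, alpha: float = 0.05, keep=()):
    """Repeatedly drop the least significant predictor until every p <= alpha.
    Columns in `keep` are never dropped (e.g. the linear term of a quadratic)."""
    cols = list(X.columns)
    while cols:
        _, p, _ = ols_fit(X[cols], y)
        candidates = p.drop(labels=[c for c in keep if c in p.index])
        if candidates.empty or candidates.max() <= alpha:
            break
        cols.remove(candidates.idxmax())
    return cols


# Synthetic example with made-up coefficients (not the study's data).
rng = np.random.default_rng(1)
n = 2000
parent_educ = rng.integers(8, 21, n).astype(float)    # years of schooling
df = pd.DataFrame({
    "parent_educ": parent_educ,
    "parent_educ_sq": parent_educ ** 2,                # quadratic term: diminishing returns
    "female": rng.integers(0, 2, n).astype(float),
    "irrelevant": rng.normal(size=n),                  # pure-noise predictor
})
income = (5_000 + 4_000 * parent_educ - 90 * parent_educ ** 2
          - 10_000 * df["female"].to_numpy() + rng.normal(0, 8_000, n))

selected = backward_select(df, income, keep=("parent_educ",))
coef, pvals, adj_r2 = ols_fit(df[selected], income)
print(selected, round(adj_r2, 3))
print(coef.round(1))
```

A negative coefficient on the squared term is what encodes diminishing returns: the marginal effect of one more year of parental education, `b1 + 2 * b2 * educ`, shrinks as education rises.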

Exoplanet Candidate Classification

May 2025
Python, Scikit-learn, Pandas, Matplotlib, PCA, Cross-Validation, Classification Models, Feature Engineering, Model Optimization

Designed an end-to-end ML workflow, including data preprocessing, dimensionality reduction analysis, automated hyperparameter tuning, and model performance visualization, for classifying astronomical data.
Built a multi-algorithm classification system comparing Logistic Regression, KNN, Decision Tree, and SVM models to predict exoplanet candidates from NASA Kepler mission data with orbital and stellar parameters.
Implemented an automated model selection pipeline using RandomizedSearchCV and GridSearchCV for hyperparameter optimization, comparing PCA and non-PCA feature sets to maximize classification accuracy.
Performed comprehensive model evaluation using confusion matrices, classification reports, and cross-validation techniques to assess precision, recall, and F1-scores across different exoplanet classification categories.
Conducted feature correlation analysis to identify the most influential predictors, finding that orbital inclination was the variable most strongly correlated with exoplanet detection probability.
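
A PCA vs. non-PCA model comparison of the kind described above can be sketched with scikit-learn pipelines and `GridSearchCV`. The sketch assumes a synthetic dataset from `make_classification` in place of the Kepler table, placeholder feature names, small illustrative grids, a PCA step that keeps 95% of the variance, and macro F1 as the scoring metric; none of these are the project's actual settings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Candidate models and small, illustrative hyperparameter grids (assumed values).
CANDIDATES = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 11]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [3, 5, None]}),
    "svm": (SVC(), {"clf__C": [0.1, 1, 10]}),
}


def compare_models(X_train, y_train, use_pca: bool, cv: int = 5) -> dict:
    """Grid-search every candidate, with or without a PCA step, scored by macro F1."""
    results = {}
    for name, (clf, grid) in CANDIDATES.items():
        steps = [("scale", StandardScaler())]
        if use_pca:
            steps.append(("pca", PCA(n_components=0.95)))  # keep 95% of variance
        steps.append(("clf", clf))
        search = GridSearchCV(Pipeline(steps), grid, cv=cv, scoring="f1_macro")
        results[name] = search.fit(X_train, y_train)  # fit() returns the fitted search
    return results


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Absolute Pearson correlation of each feature with the target, strongest first."""
    return df.drop(columns=target).corrwith(df[target]).abs().sort_values(ascending=False)


# Synthetic stand-in for the Kepler table (feature names are placeholders).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

all_results = {use_pca: compare_models(X_train, y_train, use_pca) for use_pca in (False, True)}
best_pca, best_name = max(
    ((p, name) for p in all_results for name in all_results[p]),
    key=lambda key: all_results[key[0]][key[1]].best_score_)
best = all_results[best_pca][best_name]
print(f"best: {best_name} (PCA={best_pca}), CV macro F1 = {best.best_score_:.3f}")
print(classification_report(y_test, best.predict(X_test)))

features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
features["label"] = y
print(rank_by_correlation(features, "label").head(3))
```

Putting the scaler and PCA inside the `Pipeline` means they are refit on each training fold, so cross-validation scores are not inflated by information leaking from validation folds. Swapping in `RandomizedSearchCV` with `param_distributions` and `n_iter` follows the same pattern for larger search spaces.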

Rachana Kadikar
