Data Science and Machine Learning Projects

This portfolio showcases end-to-end machine learning pipelines applied to real-world datasets. Projects cover regression and classification tasks, data cleaning, feature engineering, model tuning, and interpretability.

In the King County housing price project, exploratory analysis and regression modeling were applied to predict property values across King County, including Seattle. The Lending Club loan project focused on credit risk assessment using logistic regression, random forests, and boosting algorithms, with emphasis on class imbalance and model explainability.

King County House Prices – Geographic Heatmap

King County House Prices: Regression Modeling

This project uses a dataset of over 21,000 residential sales in King County, WA, to build predictive models of house prices. Techniques include exploratory data analysis, multiple linear regression, and tree-based ensemble models such as Random Forest and XGBoost.
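The project's code is not shown here. As a hedged illustration of what a multiple linear regression estimates, the minimal sketch below fits ordinary least squares by solving the normal equations (X^T X) b = X^T y in pure Python. The function names `fit_ols` and `predict` and the toy square-footage/bedroom data are hypothetical and are not taken from the project. A real pipeline would more likely use a library estimator such as scikit-learn's `LinearRegression`.

```python
def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimizing squared error for y ~ rows.

    Solves the normal equations (X^T X) b = X^T y with Gaussian elimination
    and partial pivoting. Hypothetical helper, for illustration only.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, k = len(X), len(X[0])
    # Build X^T X (k x k) and X^T y (length k)
    xtx = [[sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[m][i] * y[m] for m in range(n)) for i in range(k)]
    # Augmented matrix [X^T X | X^T y]
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (a[i][k] - s) / a[i][i]
    return coef


def predict(coef, row):
    """Linear prediction: intercept plus weighted sum of features."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


# Toy data (hypothetical): living area in thousands of sq ft and bedrooms -> price in $1000s.
# Prices were generated from 100 + 150*sqft + 20*bedrooms, so the fit should recover those weights.
rows = [(1.0, 2), (1.5, 3), (2.0, 3), (2.5, 4), (3.0, 5)]
prices = [290, 385, 460, 555, 650]
coef = fit_ols(rows, prices)
```

Because the toy prices are an exact linear function of the features, the recovered coefficients match the generating formula up to floating-point error; real sales data would leave residuals, which is what RMSE and R² summarize.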

Feature engineering was used to transform spatial coordinates, create interaction terms, and derive features from date and categorical fields. The final models were evaluated with RMSE and R², and SHAP plots were used for interpretability.
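A minimal sketch of the kinds of transformations and metrics described above, assuming a sale record with hypothetical field names (`date`, `lat`, `long`, `yr_built`, `sqft_living`, `grade`) and an assumed reference point near downtown Seattle. It converts coordinates into a haversine distance, extracts date parts, builds one interaction term, and defines RMSE and R². These are illustrative choices, not the project's actual feature set.

```python
import math
from datetime import date

# Assumed reference point: approximate coordinates of downtown Seattle (hypothetical choice).
REF_LAT, REF_LON = 47.6062, -122.3321


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def engineer(rec):
    """Derive model features from one raw sale record (field names are hypothetical)."""
    sold = date.fromisoformat(rec["date"])  # assumes ISO "YYYY-MM-DD" dates
    return {
        "dist_km": haversine_km(rec["lat"], rec["long"], REF_LAT, REF_LON),
        "sale_year": sold.year,
        "sale_month": sold.month,
        "age_at_sale": sold.year - rec["yr_built"],
        "sqft_x_grade": rec["sqft_living"] * rec["grade"],  # interaction term
    }


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. dollars)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up example record; values are illustrative only.
rec = {"date": "2014-10-13", "lat": 47.51, "long": -122.26,
       "yr_built": 1955, "sqft_living": 1180, "grade": 7}
features = engineer(rec)
```

A distance-to-center feature lets a linear model use location without treating raw latitude and longitude as if price changed linearly along each axis; tree ensembles can often exploit raw coordinates directly, so the benefit is typically larger for the linear baseline.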

Training vs Validation Loss – Lending Club Loan Model

Lending Club: Loan Default Classification

This project analyzes peer-to-peer lending data from Lending Club to predict loan default risk. Techniques include logistic regression, random forests, and neural networks, with preprocessing to handle missing values, encode categorical variables, and balance the class distribution using SMOTE.
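To show how SMOTE balances classes, here is a minimal pure-Python sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority example and one of its k nearest minority-class neighbours. The function name `smote` and its parameters are hypothetical; a real pipeline would more typically use a library implementation such as imbalanced-learn's `SMOTE`, which also handles details this sketch omits.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from minority-class feature tuples.

    Each sample interpolates a randomly chosen minority point toward one of
    its k nearest minority neighbours (Euclidean distance). Hypothetical
    helper; requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Other minority points, sorted by squared Euclidean distance to x
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        nb = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (hypothetical, already-scaled features such as DTI and utilization).
defaults = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(defaults, 10, k=2)
```

Synthetic samples should be generated only from the training split (or inside each cross-validation fold), after the train/test split; oversampling before splitting leaks information about test rows into training and inflates AUC.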

The workflow included feature selection and hyperparameter tuning, with models evaluated using AUC-ROC and log loss. The final neural network model trained stably, and SHAP value analysis was used to improve its interpretability.
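The two evaluation metrics can be sketched directly from their definitions. The minimal pure-Python version below computes AUC-ROC as the probability that a randomly chosen defaulted loan receives a higher predicted risk than a randomly chosen repaid loan (ties count as one half), and log loss as mean binary cross-entropy with probabilities clipped to avoid log(0). The function names are hypothetical; library versions such as scikit-learn's `roc_auc_score` and `log_loss` would normally be used.

```python
import math


def roc_auc(y_true, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    Ties count as 0.5. The O(n_pos * n_neg) loop is chosen for clarity,
    not speed.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Toy labels (1 = default) and predicted default probabilities (hypothetical).
labels = [0, 0, 1, 1]
preds = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, preds)
loss = log_loss(labels, preds)
```

AUC depends only on how the scores rank positives relative to negatives, while log loss also penalizes confident but wrong probability estimates, so reporting both gives a view of ranking quality and calibration.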