Cardiovascular Disease Risk Prediction: Modeling Health From Habits

This project is a machine learning pipeline that predicts cardiovascular disease risk using data from the CDC's Behavioral Risk Factor Surveillance System (BRFSS) survey. It tackles class imbalance with SMOTEENN resampling and applies classic feature engineering techniques such as one-hot encoding, K-means clustering, and correlation pruning to improve performance and interpretability. Models including Random Forests and Neural Networks are trained with class-weighted loss, hyperparameter tuning, and threshold calibration, raising the F1 score from a baseline of 0.11 to 0.37.
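
To illustrate the threshold calibration step, here is a minimal sketch in plain Python. It assumes calibration means sweeping candidate decision thresholds on held-out predictions and keeping the one with the highest F1 for the positive class; the project may do this differently. The function names `f1_score_at` and `best_threshold` and the toy labels and probabilities are hypothetical and are not taken from the project's code.

```python
from typing import Optional, Sequence, Tuple


def f1_score_at(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """F1 for the positive class when predicting 1 wherever prob >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # With no true or predicted positives, F1 is undefined; report 0.0.
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate threshold with the highest F1."""
    if candidates is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99.
        candidates = [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        score = f1_score_at(y_true, y_prob, t)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical validation data: 2 positives out of 10, mimicking class imbalance.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.41, 0.35, 0.45]

default_f1 = f1_score_at(y_true, y_prob, 0.5)
tuned_t, tuned_f1 = best_threshold(y_true, y_prob)
print(f"F1 at 0.5: {default_f1:.2f}; tuned threshold {tuned_t:.2f} gives F1 {tuned_f1:.2f}")
```

On imbalanced data a model's predicted probabilities for the rare class often sit below 0.5, so the default cutoff can miss every positive case. Choosing the threshold on a validation split, rather than on the test set, keeps the reported F1 honest.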
