Trees, Ensembles & Real Mental Health Data
daily_survey.csv: 52K daily entries
demographics.csv: age, sex
round1_assessment.csv: Big Five, GAD-7
Data prep is done for you! The starter notebook loads, aggregates, and merges the data in cells 1–8. You start modelling from cell 9.
Key features: PANAS negative affect, GAD-7 anxiety, stress coping, sadness, isolation, sleep latency, personality (Big Five)
Week 2: Prompting · Week 4: Debugging · Week 6: Refactoring
Weak
“Clean up my code.”
Strong
“Refactor to: (1) separate loading from modelling, (2) add shape assertions after merges, (3) create a reusable evaluate_model() function, (4) add docstrings. Keep logic identical.”
Refactoring = making code cleaner without changing what it does. Aim for code someone else could read and understand.
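As a rough illustration of items (2) to (4) in the strong prompt, a reusable evaluate_model() might look like the sketch below. It is not the starter notebook's code: MajorityClassModel and the toy data are hypothetical stand-ins, and the function simply assumes the model exposes fit(X, y) and predict(X), as scikit-learn estimators do.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        # Remember the most frequent label seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return the remembered label once per input row.
        return [self.majority_ for _ in X]


def evaluate_model(model, X_train, y_train, X_test, y_test):
    """Fit `model` on the training split and return its accuracy on the test split.

    Accepts any object with fit(X, y) and predict(X) methods.
    """
    # Shape checks: every split needs exactly one label per row.
    assert len(X_train) == len(y_train), "train features and labels are misaligned"
    assert len(X_test) == len(y_test), "test features and labels are misaligned"

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test)


# Toy data invented for illustration; not drawn from the course CSVs.
X_train = [[0.1], [0.4], [0.8], [0.9], [0.3]]
y_train = [0, 0, 1, 1, 0]
X_test = [[0.2], [0.7], [0.5], [0.6]]
y_test = [0, 1, 0, 1]

accuracy = evaluate_model(MajorityClassModel(), X_train, y_train, X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the function only depends on fit and predict, the same call should work unchanged when the stand-in is swapped for a decision tree or an ensemble.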
starter.ipynb or starter.py
Baseline accuracy: ~54.5%
Good model accuracy: ~80%
Good model AUC: ~0.90
Top predictors: PANAS negative affect, GAD-7, stress coping
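To make the two headline metrics concrete, here is a small sketch of how a baseline accuracy and an AUC can be computed by hand. It assumes the baseline means "always predict the most common class", which the slide does not state explicitly, and it uses invented labels and scores rather than the course data. The helper names majority_baseline_accuracy and roc_auc are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    example receives the higher score; tied pairs count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 6 positive and 4 negative cases with model scores.
labels = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95, 0.1, 0.5]

baseline = majority_baseline_accuracy(labels)
auc = roc_auc(labels, scores)
print(f"Baseline accuracy: {baseline:.2f}")
print(f"AUC: {auc:.3f}")
```

Read this way, an AUC near 0.90 means the model ranks a randomly chosen positive case above a randomly chosen negative case about 90% of the time, while accuracy only counts as good relative to the baseline it has to beat.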