PSYC4411

Build a Defensible Classifier

Week 6 Challenge Lab

Trees, Ensembles & Real Mental Health Data

Today's Challenge

  • Build classifiers for elevated depression (PHQ-9 ≥ 5)
    • Baseline → Logistic Regression → Decision Tree → Random Forest
  • Report proper metrics — not just accuracy
    • Precision, recall, F1, AUC, confusion matrix
  • Justify your classification threshold: why 0.5?
    • Try at least one alternative and compare
  • Present your results (1 slide, ~3 min)
    • Best model + metrics, threshold justification, feature importance, ethical consideration, refactoring win
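Here is a minimal sketch of the metric report and a threshold comparison. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the real merged dataset. The alternative cut-off of 0.3 and the helper name `metrics_at` are illustrative choices, not part of the starter.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged data (assumption): 836 rows, 21 features,
# roughly 54.5% in the positive ("elevated") class.
X, y = make_classification(n_samples=836, n_features=21, n_informative=6,
                           weights=[0.455], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted P(elevated)


def metrics_at(threshold):
    """Apply a decision threshold to the probabilities and score the result."""
    pred = (proba >= threshold).astype(int)
    return {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred, zero_division=0),
        "confusion": confusion_matrix(y_test, pred),  # rows: true, cols: predicted
    }


auc = roc_auc_score(y_test, proba)  # uses probabilities, so threshold-free
for t in (0.5, 0.3):  # default vs. one illustrative alternative
    r = metrics_at(t)
    print(f"threshold={t}: precision={r['precision']:.2f} "
          f"recall={r['recall']:.2f} f1={r['f1']:.2f}")
    print(r["confusion"])
print(f"AUC={auc:.2f}")
```

Lowering the threshold can only add positive predictions. Recall therefore rises or stays flat, while precision usually falls. In a screening setting, missing an elevated case may cost more than a false alarm, which is one way to argue for a lower cut-off. AUC is the same at every threshold.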

The Dataset: BC COVID Sleep & Well-Being

  • 3 CSV files to merge:
    • daily_survey.csv — 52K daily entries
    • demographics.csv — age, sex
    • round1_assessment.csv — Big Five, GAD-7
  • After merging: ~836 participants
  • Target: PHQ-9 ≥ 5 (elevated depression)
    • ~54.5% elevated / ~45.5% minimal — nearly balanced
  • 21 features: sleep, mood, personality, demographics, anxiety
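Here is one way to derive the binary target and check the class balance, sketched with pandas on seven toy rows. The column names `participant_id` and `phq9_total` are hypothetical, and the starter may use different ones.

```python
import pandas as pd

# Toy PHQ-9 totals (possible range 0-27). Column names are hypothetical.
df = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 5, 6, 7],
    "phq9_total": [2, 5, 11, 4, 0, 19, 7],
})

# Binary target: 1 = elevated (PHQ-9 >= 5), 0 = minimal (PHQ-9 < 5).
df["elevated"] = (df["phq9_total"] >= 5).astype(int)

# Class balance as the proportion of rows in each class.
balance = df["elevated"].value_counts(normalize=True).sort_index()
print(balance)
```

Note the boundary case: because the rule is ≥ 5, a score of exactly 5 counts as elevated.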

Data prep is done for you! The starter notebook loads, aggregates, and merges the data in cells 1–8. You start modelling from cell 9.
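Cells 1–8 already handle this step. The pattern is still worth recognising, because merges are where rows silently disappear. The sketch below makes three assumptions: toy tables, hypothetical column names, and mean aggregation of the daily entries. The starter's aggregation may differ.

```python
import pandas as pd

# Toy stand-ins for the three CSVs; all column names are hypothetical.
daily = pd.DataFrame({
    "participant_id": [1, 1, 2, 2, 2, 3],
    "sleep_latency": [10, 20, 30, 30, 60, 15],
    "sadness": [1, 3, 2, 2, 5, 0],
})
demographics = pd.DataFrame({"participant_id": [1, 2, 3], "age": [34, 51, 27]})
round1 = pd.DataFrame({"participant_id": [1, 2, 4], "gad7": [3, 12, 7]})

# Collapse many daily rows to one row per participant (mean here).
daily_agg = daily.groupby("participant_id", as_index=False).mean()
assert daily_agg["participant_id"].is_unique

# Inner joins keep only participants present in every table;
# validate="one_to_one" raises if a key is duplicated on either side.
merged = (daily_agg
          .merge(demographics, on="participant_id", how="inner", validate="one_to_one")
          .merge(round1, on="participant_id", how="inner", validate="one_to_one"))

# Participant 3 has no round-1 row and participant 4 has no daily rows,
# so only participants 1 and 2 survive.
assert merged.shape == (2, 5), merged.shape
print(merged)
```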

Key features: PANAS negative affect, GAD-7 anxiety, stress coping, sadness, isolation, sleep latency, personality (Big Five)

New LLM Skill: Refactoring

Week 2: Prompting · Week 4: Debugging · Week 6: Refactoring

Weak

“Clean up my code.”

Strong

“Refactor to: (1) separate loading from modelling, (2) add shape assertions after merges, (3) create a reusable evaluate_model() function, (4) add docstrings. Keep logic identical.”

Refactoring = making code cleaner without changing what it does: same inputs, same outputs. Aim for code someone else could read and understand.
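Here is a small before/after sketch of step (3) from the strong prompt. The repeated fit-and-score code moves into `evaluate_model()`, and an assertion checks that the results are unchanged. The function's signature and its two returned metrics are assumptions for illustration, and synthetic data stands in for the lab dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_model(model, X_train, X_test, y_train, y_test):
    """Fit ``model`` on the training split and return test accuracy and ROC AUC.

    Replaces copy-pasted fit/predict/score blocks; the computation is unchanged.
    """
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }


X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Before: the same scoring lines pasted once per model.
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
before = {
    "accuracy": accuracy_score(y_test, tree.predict(X_test)),
    "auc": roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]),
}

# After: one call per model.
after = evaluate_model(DecisionTreeClassifier(max_depth=3, random_state=1),
                       X_train, X_test, y_train, y_test)
assert before == after  # refactor check: identical results
```

With the function in place, comparing several models reduces to a loop over a dict of estimators.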

Getting Started

Steps

  1. Open starter.ipynb or starter.py
  2. Run cells 1–8 (data is loaded & merged)
  3. Ask your AI assistant to plan the approach before writing code
  4. Build: Baseline → LogReg → Tree → Forest
  5. Evaluate, analyse, refactor, present
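Here is a sketch of step 4 under three assumptions: scikit-learn, 5-fold stratified cross-validation, and synthetic data shaped like the lab data (836 rows, 21 features, about 54.5% positive). Hyperparameters such as `max_depth=4` and `n_estimators=200` are placeholders, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 836 rows, 21 features, ~54.5% positive.
X, y = make_classification(n_samples=836, n_features=21, n_informative=6,
                           weights=[0.455], random_state=0)

models = {
    # Always predicts the majority class; its predicted probabilities are constant.
    "baseline": DummyClassifier(strategy="most_frequent"),
    # The L2 penalty makes logistic regression scale-sensitive; trees are not.
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "auc": scores["test_roc_auc"].mean(),
    }
    print(f"{name:8s} acc={results[name]['accuracy']:.3f} "
          f"auc={results[name]['auc']:.3f}")
```

The `DummyClassifier` row is the baseline. Predicting the majority class everywhere gives accuracy equal to the majority share and an AUC of 0.5, so that is the bar every other model has to clear.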

Target Numbers

Baseline accuracy: ~54.5%

Good model accuracy: ~80%

Good model AUC: ~0.90

Top predictors: PANAS negative affect, GAD-7, stress coping
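To back up a top-predictors claim, one option is to compute permutation importance on held-out data and compare it with the forest's built-in impurity importances. The sketch below assumes scikit-learn and synthetic data with generic names (`feature_0` to `feature_20`). When running on the lab data, substitute the real columns, such as PANAS negative affect and GAD-7.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in with generic names; with shuffle=False the 3
# informative features are the first 3 columns.
X, y = make_classification(n_samples=836, n_features=21, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]
X = pd.DataFrame(X, columns=names)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importances come free with the fit, but they are computed
# on training data and can favour high-cardinality features.
impurity = pd.Series(forest.feature_importances_, index=names)

# Permutation importance: mean drop in test AUC when one column is shuffled.
perm = permutation_importance(forest, X_test, y_test, scoring="roc_auc",
                              n_repeats=5, random_state=0)
ranking = pd.Series(perm.importances_mean, index=names).sort_values(ascending=False)

print(pd.DataFrame({"permutation": ranking, "impurity": impurity})
      .sort_values("permutation", ascending=False).head(5))
```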