PCA, UMAP & Clustering on Real Mental Health Data
Monday 27 April is a public holiday — complete this lab in your assigned groups in your own time
Data prep is done for you! The starter notebook loads, cleans, and scores the data in cells 1–5. You start the analysis from cell 6.
New this week: the VCL fake-word filter removes careless responders, i.e. participants who claim to know words that don’t exist.
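As a rough illustration of what such a filter does, here is a minimal sketch on toy data. The column names (VCL6, VCL9, VCL12 as the fake words), the helper name drop_careless_responders, and the zero-tolerance threshold are assumptions for this example; check the starter notebook for the columns and cut-off it actually uses.

```python
import pandas as pd

# ASSUMPTION: these three checklist columns hold the fake words (1 = "I know this word").
# The starter notebook may use different names or a different rule.
FAKE_WORD_COLS = ["VCL6", "VCL9", "VCL12"]


def drop_careless_responders(df, fake_cols=FAKE_WORD_COLS, max_fake_claims=0):
    """Keep rows that claim to know at most `max_fake_claims` fake words."""
    fake_claims = df[fake_cols].sum(axis=1)  # number of fake words each person endorsed
    return df[fake_claims <= max_fake_claims].copy()


# Toy data only, not the real dataset.
toy = pd.DataFrame(
    {
        "VCL1": [1, 1, 0, 1],   # a real word, ignored by the filter
        "VCL6": [0, 1, 0, 0],
        "VCL9": [0, 0, 0, 0],
        "VCL12": [0, 1, 0, 1],
    }
)

clean = drop_careless_responders(toy)
print(f"kept {len(clean)} of {len(toy)} participants")
```

Reporting how many participants the filter removed is worth doing in your write-up, because it changes the sample size you cite.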
Week 2: Prompting · Week 4: Debugging · Week 6: Refactoring · Week 8: Documentation
Weak prompt: “Explain my code.”
Strong prompt: “I ran PCA on 42 DASS items from 34,500 participants. Write a methods paragraph for a psychology journal (APA style). Include: sample size, measures, preprocessing, number of components, variance explained, clustering details, evaluation metrics. Be specific about software.”
The AI’s draft is a starting point. You MUST verify that every number and method name matches your actual analysis. Documentation that doesn’t match the analysis is worse than no documentation.
starter.ipynb or starter.py
PCA: 3 clear components, ~55% variance
PC1: ~45% — general distress factor
Best silhouette: k=2 (~0.24)
Stability: very high for k=2 (<1% change)
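The quantities in this summary (variance explained, silhouette per k, stability for k=2) can be computed along the following lines. This is a sketch on synthetic data, so it will not reproduce the numbers above. The stability measure used here, the share of participants whose cluster label changes when k-means is re-run with a different seed, is an assumption; the lab may define stability differently (for example via subsampling).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the item matrix: 600 "participants" x 42 "items",
# one shared factor plus noise. This is NOT the real DASS data.
general = rng.normal(size=(600, 1))
loadings = rng.uniform(0.5, 1.0, size=(1, 42))
X = general @ loadings + rng.normal(size=(600, 42))

# Standardise items, then keep the first three principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3, random_state=0)
scores = pca.fit_transform(X_std)
print("variance explained per PC:", pca.explained_variance_ratio_.round(3))
print("total for 3 PCs:", round(float(pca.explained_variance_ratio_.sum()), 3))

# Silhouette score for several candidate k (higher is better).
sil = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scores)
    sil[k] = silhouette_score(scores, labels)
best_k = max(sil, key=sil.get)
print("silhouette by k:", {k: round(float(v), 3) for k, v in sil.items()})
print("best k:", best_k)


def label_change_rate(a, b):
    """Share of points assigned differently in two 2-cluster solutions.

    Cluster IDs are arbitrary, so the two labelings are compared both as-is
    and with IDs swapped, and the smaller disagreement is returned.
    """
    disagree = float(np.mean(a != b))
    return min(disagree, 1.0 - disagree)


# ASSUMED stability check: re-run k=2 with other seeds, compare to a reference run.
reference = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
changes = [
    label_change_rate(
        reference,
        KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(scores),
    )
    for seed in range(1, 6)
]
print("max label change across seeds:", round(max(changes), 4))
```

Whatever your own run prints is what belongs in your documentation; copy the values from your output rather than from this summary.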