What's New
New updates and improvements to College Aid Index
Production Incident - site unavailable for approximately 52 hours.
Fix
Incident Summary
On Wed 11/26/2025 I made a classic mistake: I rushed a code change before leaving for Thanksgiving break, and it broke the site. Monitoring wasn’t in place, so I only learned of the outage from users. I’m sorry for the downtime and frustration.
Impact
- Site unavailable for roughly two days (11/26–11/28).
- Users, including those who reported the outage directly, couldn’t access reports.
Timeline
- Wed 11/26: Last-minute change deployed; no alerts raised.
- Thu 11/27: Outage ongoing; discovered via user reports.
- Fri 11/28: Returned, found the bad change, deployed fix; site restored.
Root Cause
- Rushed, unreviewed code change introduced a breaking error.
- No monitoring to surface the outage quickly.
Resolution
- Fixed the faulty change and redeployed; site is serving traffic now.
Action Items
- No more last-minute deploys without review or a rollback plan.
- Add basic monitoring/alerts so outages are caught immediately.
- Improve test coverage to catch issues before deployment.
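
A minimal sketch of what the basic monitoring item above could look like, assuming a plain HTTP check run on a schedule from a machine outside the site's own hosting. This is illustrative, not the actual setup: the function names `check_site` and `should_alert`, the 10-second timeout, and the three-failure threshold are all assumptions.

```python
import urllib.error
import urllib.request


def check_site(url, timeout=10.0, opener=urllib.request.urlopen):
    """Perform one HTTP GET and return (ok, detail).

    `opener` is injectable so the check can be tested without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except (urllib.error.URLError, OSError) as exc:
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses,
        # and OSError subclasses for timeouts and connection failures.
        return False, f"request failed: {exc}"
    if 200 <= status < 400:
        return True, f"HTTP {status}"
    return False, f"HTTP {status}"


def should_alert(results, threshold=3):
    """Return True when the most recent `threshold` checks all failed.

    `results` is a list of booleans, oldest first. Requiring several
    consecutive failures avoids alerting on a single network blip.
    """
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

Run from a scheduler such as cron every few minutes, with the results appended to a small history file, a check like this would have flagged the outage within minutes instead of relying on user reports. How the alert is delivered (email, SMS, chat webhook) is left open here.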
Context
This is an early, part-time project, but availability comes first. I’ll prioritize stability over new features when time is limited.