
The Tortoise, the Hare, & the Data Scientist:
Predicting Runner Times in the Chicago Marathon
Marathons are notoriously unpredictable. Trust me - I know. I ran the Vancouver Marathon 4 years in a row, 2013-2016, and saw a wild 75-minute swing in finishing times despite being in the same shape all 4 years. Each race seemed a mystery, but could I have predicted these wild fluctuations? And, more to the point, could I have improved my time? This project sought to answer both questions.
Goal: To predict final marathon times as a function of runners’ features (e.g., age group) and how they pace themselves (e.g., does starting slow lead to faster finishes?)
Dates: September 25 - October 6, 2017
Tools: Ridge and lasso regression; Python (pandas, NumPy, selenium, BeautifulSoup, matplotlib, scikit-learn)
#1 Lesson: Pacing counts for a lot – more than you might think. But no pacing strategy will turn me into Usain Bolt. And web scraping is more fun than I ever imagined!
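To give a flavor of the modeling step, here is a minimal sketch of ridge and lasso regression on synthetic runners. It is not the project's actual code or data: the feature names (age, early_pace, pace_drift), the made-up formula that generates finish times, and the alpha values are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic runners (assumed for illustration, NOT Chicago Marathon results).
age = rng.integers(20, 70, size=n)            # years
early_pace = rng.normal(5.5, 0.6, size=n)     # min/km over the first 5 km
pace_drift = rng.normal(0.08, 0.05, size=n)   # fractional slowdown later in the race

# Made-up linear relationship plus noise, so the models have something to recover.
finish_min = (
    240
    + 30 * (early_pace - 5.5)
    + 120 * pace_drift
    + 0.4 * (age - 40)
    + rng.normal(0, 5, size=n)
)

X = np.column_stack([age, early_pace, pace_drift])
y = finish_min
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize features so the regularization penalty treats them on a common scale.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)

# R^2 on held-out runners.
ridge_r2 = ridge.score(X_test, y_test)
lasso_r2 = lasso.score(X_test, y_test)

# Coefficients are per standard deviation of each feature, in minutes.
ridge_coefs = ridge.named_steps["ridge"].coef_
lasso_coefs = lasso.named_steps["lasso"].coef_

print(f"ridge R^2: {ridge_r2:.3f}, lasso R^2: {lasso_r2:.3f}")
for name, r, l in zip(["age", "early_pace", "pace_drift"], ridge_coefs, lasso_coefs):
    print(f"{name:>10}: ridge {r:+.2f}, lasso {l:+.2f}")
```

Because the features are standardized, the coefficient sizes can be compared directly, which is one way to ask how much pacing matters relative to something like age. In practice the alpha values would be tuned by cross-validation rather than fixed by hand.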