The Tortoise, the Hare, & the Data Scientist:
Predicting Runner Times in the Chicago Marathon

Marathons are notoriously unpredictable. Trust me: I know. I ran the Vancouver Marathon four years in a row (2013-2016) and saw a 75-minute swing in my finishing times, despite being in the same shape all four years. Each race seemed a mystery, but could I have predicted these fluctuations? More to the point, could I have improved my time? This project sought to answer both questions.

Goal: To predict final marathon times as a function of runners’ features (e.g., age group) and how they pace themselves (e.g., does starting slow lead to faster finishes?)

Dates: September 25 - October 6, 2017

Tools: Ridge and lasso regression; Python (pandas, NumPy, Selenium, Beautiful Soup, matplotlib, scikit-learn)

#1 Lesson: Pacing counts for a lot, more than you might think. But no pacing strategy will turn me into Usain Bolt. And web scraping is more fun than I ever imagined!

Where you can learn more: GitHub or my blog
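The project's actual features and modelling code aren't shown here, so the following is only a minimal sketch, assuming scikit-learn and entirely synthetic data. It illustrates the goal and tools described above. It defines a hypothetical pacing feature, the split ratio (second-half time divided by first-half time; below 1 is a negative split, i.e. starting slow and finishing faster), then compares ridge and lasso regression using that ratio alongside age and first-half time. The helper names, column names, and the made-up data-generating formula are illustrative assumptions, not the project's pipeline or results.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (e.g. a scraped split time) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def split_ratio(half_split: str, finish: str) -> float:
    """Hypothetical pacing feature: second-half time / first-half time.

    Below 1.0 is a negative split (started slow, finished faster);
    above 1.0 is a positive split (slowed down in the second half).
    """
    first = to_seconds(half_split)
    second = to_seconds(finish) - first
    return second / first


# --- Synthetic stand-in data (NOT the scraped Chicago results) ---
rng = np.random.default_rng(0)
n = 2000
age = rng.integers(18, 70, size=n).astype(float)
first_half_min = rng.normal(110, 15, size=n)  # first-half time, minutes
ratio = rng.normal(1.05, 0.06, size=n)        # split ratio as defined above
noise_col = rng.normal(size=n)                # feature with no real effect

# Made-up linear generating process with illustrative coefficients;
# noise_col deliberately does not appear in it.
finish_min = (
    2.05 * first_half_min
    + 110 * (ratio - 1)
    + 0.3 * (age - 40)
    + rng.normal(0, 5, size=n)
)

X = np.column_stack([age, first_half_min, ratio, noise_col])
feature_names = ["age", "first_half_min", "split_ratio", "noise"]
X_train, X_test, y_train, y_test = train_test_split(
    X, finish_min, test_size=0.25, random_state=0
)

# Standardize first so the L2 (ridge) and L1 (lasso) penalties treat all
# features on the same scale; the penalty strength alpha is chosen by CV.
alphas = np.logspace(-3, 2, 30)
models = {
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=alphas)),
    "lasso": make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    coefs = dict(zip(feature_names, np.round(model[-1].coef_, 2)))
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.3f}, coefs = {coefs}")
```

For example, `split_ratio("1:50:00", "3:35:00")` is about 0.95, a negative split. The design choice between the two models comes down to the penalty: lasso's L1 penalty can shrink the coefficient of an uninformative feature (here the `noise` column) all the way to zero, while ridge's L2 penalty only shrinks coefficients toward zero, which is one common reason to fit both.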
