Using Principal Component Analysis to Understand the College Football AP Poll

Every week the Associated Press releases a poll of who it thinks are the top 25 college football teams in the country. I wanted to understand how these rankings correlate with teams' win/loss records and their offensive and defensive metrics.

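Principal Component Analysis (PCA) is an unsupervised dimensionality-reduction technique. It finds combinations of metrics, called principal components, that capture as much of the variation in the data as possible. PCA does not take a target variable, so to study the AP poll we treat AP votes as one more metric alongside basic team stats such as yards per game and points per game. Metrics that load strongly on the same component as AP votes are the ones that move together with AP votes. This tells us which team metrics correlate best with a team's standing in the AP Poll.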
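A minimal sketch of that setup follows. It assumes the stats sit in a NumPy array with one row per team and one column per metric, with AP votes included as an ordinary column. The column names and the randomly generated numbers are hypothetical placeholders, not the ESPN week 6 data. PCA is done by hand: standardize each column, then take a singular value decomposition.

```python
import numpy as np

# Hypothetical column names; the real analysis used ESPN team stats plus AP votes.
COLUMNS = ["ap_votes", "points_per_game", "yards_per_game",
           "points_allowed_per_game", "yards_allowed_per_game"]


def pca(X):
    """Return (loadings, explained_variance_ratio, scores) for data matrix X.

    Rows are teams, columns are metrics. Each column is standardized first
    so that metrics in different units (votes, yards, points) contribute
    on an equal footing.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes (loadings), sorted by variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s**2 / (Z.shape[0] - 1)
    ratio = variances / variances.sum()
    scores = Z @ Vt.T  # each team's coordinates on the principal components
    return Vt, ratio, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_stats = rng.normal(size=(25, len(COLUMNS)))  # placeholder values only
    loadings, ratio, scores = pca(fake_stats)
    for name, weight in zip(COLUMNS, loadings[0]):
        print(f"PC1 loading for {name}: {weight:+.2f}")
```

Standardizing matters here: without it, a metric measured in hundreds of yards would dominate one measured in tens of points purely because of its scale.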


PCA looks for co-occurring metrics that account for the largest share of the variance in the data, and its results are expressed as principal components (PCs). The first two PCs accounted for 70% of the total variance. This means these two components alone capture most of the variation across teams.
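The "share of variance" figure can be computed from the eigenvalues of the correlation matrix, which is equivalent to running PCA on standardized columns. The sketch below is illustrative. The function names and the 0.70 default threshold are assumptions chosen to mirror the figure in the text, not part of the original analysis.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative share of total variance captured by the first k components."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to largest first.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return np.cumsum(eigenvalues) / eigenvalues.sum()


def components_needed(X, threshold=0.70):
    """Smallest number of components whose cumulative share reaches threshold."""
    cumulative = cumulative_explained_variance(X)
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For the AP poll data, the second entry of `cumulative_explained_variance(stats)` would correspond to the 70% reported for the first two PCs.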

The PCA results show that the top 10 teams separate themselves from the rest of the top 25 by having elite defenses. They also show that highly ranked teams tend to have explosive offenses. These results confirm what can be seen in the table above. It is no surprise that the top 10 teams in the country have great stats across the board. What is surprising is that defensive metrics explain AP rank better than offensive metrics do.
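One way to check whether the top 10 "separate themselves" is to project each team onto the principal components and compare the average score of teams ranked 1 to 10 with the average score of the rest. The sketch below assumes a hypothetical `ranks` array holding each team's AP rank in the same row order as the stats. A large gap on a component suggests the top group separates along it, and that component's loadings then show which metrics drive the gap.

```python
import numpy as np


def pc_scores(X, n_components=2):
    """Project standardized rows of X onto the first principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def top_vs_rest(scores, ranks, top=10):
    """Mean PC score of teams ranked 1..top minus the mean of all other teams."""
    ranks = np.asarray(ranks)
    is_top = ranks <= top
    return scores[is_top].mean(axis=0) - scores[~is_top].mean(axis=0)
```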

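On the first principal component (PC1), AP votes and the defensive metrics load in opposite directions. AP votes go up as the defensive metrics go down, and for defensive stats a lower value means the team gives up less. This is an interesting insight.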
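Reading PC1 this way relies on the relative signs of the loadings. The overall sign of a principal component is arbitrary, so only the relative signs carry meaning. The sketch below lists the metrics whose PC1 loading has the opposite sign to AP votes. The column names and the synthetic data are hypothetical. The data is built so that AP votes fall as points and yards allowed rise, which makes the opposite-sign pattern visible.

```python
import numpy as np


def pc1_loadings(X):
    """PC1 loadings of the standardized data, computed via SVD."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[0]


def opposite_to_target(loadings, names, target="ap_votes"):
    """Metrics whose PC1 loading has the opposite sign to the target's loading."""
    target_sign = np.sign(loadings[names.index(target)])
    return [n for n, w in zip(names, loadings) if np.sign(w) == -target_sign]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    allowed = rng.normal(size=40)  # synthetic "how much a defense gives up"
    names = ["ap_votes", "points_allowed", "yards_allowed", "points_per_game"]
    X = np.column_stack([
        -2.0 * allowed + 0.1 * rng.normal(size=40),  # fewer allowed -> more votes
        allowed + 0.1 * rng.normal(size=40),
        allowed + 0.1 * rng.normal(size=40),
        rng.normal(size=40),                         # unrelated offensive stat
    ])
    print(opposite_to_target(pc1_loadings(X), names))
```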

Relevant Material:

AP Top 25, Week 6 (data source: ESPN)
