Every week the Associated Press releases a poll of who they think are the top 25 college football teams in the country. I wanted to understand how these rankings correlate with teams' win/loss records and their offensive and defensive metrics.
Principal Component Analysis (PCA) is a data science technique that finds groups of metrics that vary together. In our case the quantity of interest is AP votes, and the inputs are basic team metrics such as yards per game and points per game. Using PCA, we can see which team metrics line up best with AP votes in the AP Poll.
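As a rough illustration of the idea, here is a minimal sketch in Python using NumPy. It is not the code behind this analysis: the numbers are made up, the metric names are placeholders, and comparing the first component's scores against AP votes is just one assumed way to connect PCA output to the poll.

```python
import numpy as np

# Hypothetical toy data, NOT real AP Poll numbers.
# Rows are teams; columns are the metrics named below.
metric_names = ["points_per_game", "yards_per_game", "points_allowed_per_game"]
X = np.array([
    [45.0, 520.0, 12.0],
    [41.0, 495.0, 15.0],
    [38.0, 470.0, 14.0],
    [35.0, 450.0, 21.0],
    [33.0, 430.0, 24.0],
    [30.0, 410.0, 27.0],
])
ap_votes = np.array([1500.0, 1420.0, 1350.0, 900.0, 600.0, 300.0])


def standardize(X):
    """Rescale each metric to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_scores(X, n_components=2):
    """Return per-team PC scores and the component directions (loadings)."""
    Z = standardize(X)
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    return scores, components


scores, components = pca_scores(X)

# Correlation between each team's PC1 score and its AP votes.
# The sign of a principal component is arbitrary, so read the magnitude.
pc1_vs_votes = np.corrcoef(scores[:, 0], ap_votes)[0, 1]

for name, weight in zip(metric_names, components[0]):
    print(f"{name}: PC1 loading {weight:+.2f}")
print(f"|corr(PC1, AP votes)| = {abs(pc1_vs_votes):.2f}")
```

The loadings show how strongly each metric contributes to a component, which is how one would read off whether a component is mostly about offense or mostly about defense.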
PCA looks for combinations of co-occurring metrics that account for the largest share of the variance in the data. The results are expressed as principal components (PCs). The first two PCs accounted for 70% of the variance in the data, which means that together they capture most of the variation between teams.
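The share of variance attributed to each PC can be computed directly. The sketch below runs on synthetic data (random numbers standing in for 25 teams and four metrics, not the ESPN data), and the helper name `explained_variance_ratio` is my own choice for this example.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    # Standardize so metrics on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Squared singular values are proportional to each PC's variance.
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in data: 25 "teams", 4 "metrics".
# Three metrics share a hidden "team strength" factor; one is pure noise.
rng = np.random.default_rng(0)
n_teams = 25
strength = rng.normal(size=n_teams)
X = np.column_stack([
    strength + 0.3 * rng.normal(size=n_teams),
    strength + 0.3 * rng.normal(size=n_teams),
    -strength + 0.3 * rng.normal(size=n_teams),
    rng.normal(size=n_teams),
])

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)

for i, (r, c) in enumerate(zip(ratio, cumulative), start=1):
    print(f"PC{i}: {r:.1%} of variance, {c:.1%} cumulative")
```

The cumulative value for PC2 is the kind of figure quoted above: the fraction of total variance captured by the first two components together.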
The PCA results show that the top 10 teams separate themselves from the rest of the top 25 by having elite defenses. They also show that highly ranked teams have explosive offenses. The PCA results confirm what can be seen in the table above. Although it is not surprising that the top 10 teams in the country have great stats across the board, it is surprising that defensive metrics explain AP ranks better than offensive metrics.
AP Top 25 Week 6 Data Source: ESPN