College sports conference realignment has been a major part of college sports news recently. The historic Pac-12 conference is likely to disappear after 10 of its 12 teams accepted invites to other conferences. The Big Ten will add some of the Pac-12’s top brands in USC, UCLA, Oregon, and Washington. The Big 12 expanded after losing Texas and Oklahoma to the SEC, adding Cincinnati, BYU, Houston, and UCF from smaller conferences before adding four Pac-12 schools in Arizona, Arizona State, Colorado, and Utah. And finally, the ACC made news by picking up Stanford, California (Berkeley), and SMU, turning the ACC into a nation-wide conference.
Amid all the realignment news, many have realized that regionality means less and less for college conferences. We will be seeing Stanford and UC-Berkeley, two teams on the Pacific coast, playing in the Atlantic Coast Conference! The Big Ten, which was previously centered in the Midwest, now stretches coast to coast as well, with matchups such as USC and UCLA playing Rutgers and Maryland.

Personally, I do not like the changes coming to college conferences. I prefer conferences to value regionality over money and brands, similar to many other fans of college sports. Therefore, I decided that I wanted to see what the landscape of college sports would look like if we just created conferences based on regionality, without considering money or TV deals. What would the conferences look like if we wanted truly regional conferences?
Clustering
Teams
The first thing to decide in this exercise was how many teams to use. I did not want to include all college teams, because that is a lot of teams (over 360 teams play D1 college basketball). Therefore, I decided to limit the teams to Power 5 conference teams in addition to a few others. Specifically, I included teams that are currently, or will soon be, basketball members of the ACC, Big 12, Big Ten, Big East, Pac-12, or SEC. That means teams like Cincinnati, BYU, UCF, Houston, and SMU are included. I also included Washington State and Oregon State because they are currently members of the Pac-12, even though their future conference is uncertain. Finally, I added a few big basketball brands because I was curious: Gonzaga, Memphis, and San Diego State (all of whom I believe are deserving power conference teams).
Process
I created conferences based solely on regionality using a k-means clustering algorithm. This algorithm ensures that each school ends up in the cluster whose center is closest to it. The “k” in k-means stands for the number of clusters (or in this case, conferences). I ended up using k = 5, so I clustered the teams into 5 made-up conferences. I chose this number because it gave results that I liked, but in reality there is no “correct” number of clusters (it just depends on preference).
The k-means clustering algorithm creates clusters by first randomly assigning k teams to their own clusters. Each cluster center is therefore just the coordinates of the one team in that cluster. For example, if k = 5, then we randomly choose 5 teams to initialize the 5 cluster centers. Then, we iterate through all the teams and assign each team to the cluster with the closest center. Next, we iterate through each cluster and update its center by taking the average of each coordinate. Because in this situation I am using latitude-longitude coordinates for location, the cluster center will be the average latitude and average longitude of the teams in the cluster. After that, we keep reassigning teams to the cluster with the closest center and updating the cluster centers until no teams switch clusters. This produces the clusters for a single run. Because the process relies on randomness for initialization, it is possible to get slightly different results for different random seeds. To get the best results, we repeat the clustering process several times so that there are many different initializations, and we take the best result as the final one. We decide which clustering output is best by seeing which one minimizes the sum of the distances from each team’s location to its cluster center’s location.
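The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the repository linked below: the function names `kmeans_once` and `kmeans`, the `n_init` and `seed` parameters, and the toy points are all assumptions made for the example. It treats (latitude, longitude) pairs as flat coordinates, as described above, and picks the best run by the sum of distances to the cluster centers.

```python
import math
import random

def kmeans_once(points, k, rng):
    """One k-means run: returns (labels, centers, total_distance)."""
    # Initialize the k centers from k randomly chosen points.
    centers = rng.sample(points, k)
    labels = None
    while True:
        # Assign every point to the cluster with the closest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no point switched clusters: converged
            break
        labels = new_labels
        # Move each center to the coordinate-wise mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    total = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, total

def kmeans(points, k, n_init=20, seed=0):
    """Repeat k-means n_init times and keep the run with the smallest total distance."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_init))
    return min(runs, key=lambda run: run[2])

# Toy (latitude, longitude) points forming two obvious groups (hypothetical data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, total = kmeans(toy_points, k=2)
print(labels, centers, round(total, 2))
```

Cluster numbers are arbitrary, so two runs can produce the same grouping with the labels swapped.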
If you are interested, you can see the code that I wrote for this article here: https://github.com/AyushBatra15/Cluster-Conferences
Analysis
Reality
First, before I show the clustering results, I want to display what the future of the college sports conference landscape will look like. Assuming that this round of realignment is over (not a safe assumption, but I can’t see the future), this is what the top conferences will look like in the coming years. Each school is represented as a point on the map, and the color of each point corresponds to their future conference. The “Other” category includes teams that will not be in the ACC, Big 12, Big East, Big Ten, or SEC.

From the map above, it is possible to see some regionality in the conferences, but not a ton. First off, the West Coast looks like a mess as the former Pac-12 divides its teams among 3 other conferences (ACC, Big 12, and Big Ten). The Big 12 looks like it’s pretty concentrated around Texas and the states north and west of Texas, with the notable exceptions of West Virginia, UCF, and Cincinnati (you can’t really see Cincinnati on the map because it’s directly covered by Xavier since they’re so close together). The Big Ten used to be pretty concentrated around the Great Lakes with a few East Coast teams, but the additions of USC, UCLA, Washington, and Oregon really spread the conference out. The Big East (the only conference in this exercise that didn’t actually change) is pretty spread out across the northern portion of the United States between Maine and the Great Lakes, but at least all these teams are bound together by the fact that they’re clearly basketball schools (unfortunately, the clustering algorithm does not take this into account as I only used distance, as we will see later). The ACC is similar to the Big Ten in that it used to be pretty regional, only consisting of East Coast schools (or at least schools in the eastern half of the United States), but its odd expansion spread it all the way out to California. Finally, the SEC seems to be one of the more regional conferences, with its most northward school being Missouri and most westward being Texas, which is kind of ironic considering it kicked off this whole round of realignment by getting Texas and Oklahoma to join.
Clustered Conferences
Now, we finally get to the conferences generated by the k-means clustering algorithm. Again, I used 5 clusters, so the teams are divided into 5 different regional made-up conferences. The cluster output just gives us cluster numbers, but to make the results more interpretable, I gave the clusters names: the Great Plains, Midwest, Northeast, Southeast, and West (boring, but accurate).

I don’t know what you think of these theoretical conferences, but they do have some interesting qualities. Looking at the Northeast, we see many basketball-centric schools in Villanova, Virginia, Duke, UNC, Syracuse, UConn, and Maryland. This would be one of the best college basketball conferences in the nation, although its football would be a bit lacking as its best team would probably be Penn State. The Southeast includes many huge football brands that were in the SEC, like Alabama, Florida, LSU, Auburn, and Georgia, but adds ACC powers Clemson, Florida State, and Miami into the mix of a conference that would be much more football oriented than basketball oriented. The Midwest cluster includes many current Big Ten teams, along with Kentucky, Louisville, Notre Dame, and a few other additions in a conference that would be strong in both football and basketball. The Great Plains seems to include many current Big 12 teams or schools that have previously been members of the Big 12, returning to a more regional form. Lastly, the West consists of almost all of the former Pac-12 schools (except for Colorado), but also adds Gonzaga, BYU, and San Diego State for an extra punch.

Just to give an idea of how regional these clusters / conferences are, I’ll show the figure without the school names cluttering it up. You can easily draw boundaries around each cluster that do not intersect. It’s almost as if every state has a clear cluster it’s associated with. East Coast states north of South Carolina are clearly associated with the Northeast cluster, the states near the Great Lakes are in the Midwest cluster, the southeast states are in the Southeast (duh), the states near Texas are in the Great Plains, and the West Coast states are in the West. It just makes sense.

To show just how divided (in a regional sense) the future conferences will be, I broke down each conference (as it will look after this round of realignment) by how many teams are in each of the 5 clustered regions. It is easy to see that all of the conferences are very spread out. Both the ACC and Big 12 have at least one team from every single region, while the Big Ten has teams from 4 of the 5 regions!
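A breakdown like this is just a count of schools per (conference, region) pair. Here is a small sketch of how it could be tabulated. The function name `region_breakdown` and the two lookup dictionaries are hypothetical, and the example covers only a handful of schools whose future conference and cluster are mentioned in this article.

```python
from collections import Counter

def region_breakdown(conference_of, region_of):
    """Count how many schools each conference has in each clustered region."""
    counts = Counter((conference_of[s], region_of[s]) for s in conference_of)
    table = {}
    for (conference, region), n in counts.items():
        table.setdefault(conference, {})[region] = n
    return table

# A few schools only, for illustration (not the full 84-school dataset).
conference_of = {
    "Stanford": "ACC", "Duke": "ACC", "Miami": "ACC",
    "USC": "Big Ten", "Maryland": "Big Ten",
}
region_of = {
    "Stanford": "West", "Duke": "Northeast", "Miami": "Southeast",
    "USC": "West", "Maryland": "Northeast",
}
for conference, regions in region_breakdown(conference_of, region_of).items():
    print(conference, regions)
```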

Just how much more regional are the clustered conferences? We can see the differences by measuring the average distance of teams to their cluster centers in each different scenario: the conferences in reality before realignment (labeled “Conference”), the conferences in reality after realignment (labeled “New Conf”), and the not-real clustered conferences (labeled “Cluster”). The average distance of teams to their conference centers is almost double for the conferences after realignment compared to the hypothetical cluster conferences. Even the conferences before realignment were at a respectable level in terms of distance to the conference center, but it shot up so much after realignment.
Note: The visual does not include the distances to cluster center for teams that were in the “Other” category for both the “Conference” bar and the “New Conf” bar.
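For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the metric. Two things in it are assumptions rather than details from this article: distances are measured in miles with the haversine (great-circle) formula, and the names `haversine_miles` and `average_distance_to_center` are made up for the example. Each group's center is the average latitude and average longitude of its members, matching the clustering description earlier.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed unit: miles)

def haversine_miles(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def average_distance_to_center(locations, group_of):
    """Average distance from each school to the center of its group.

    locations: {school: (lat, lon)}; group_of: {school: conference or cluster}.
    """
    members = {}
    for school, loc in locations.items():
        members.setdefault(group_of[school], []).append(loc)
    # Center of a group = average latitude and average longitude of its members.
    centers = {
        group: (sum(lat for lat, _ in locs) / len(locs),
                sum(lon for _, lon in locs) / len(locs))
        for group, locs in members.items()
    }
    distances = [
        haversine_miles(loc, centers[group_of[school]])
        for school, loc in locations.items()
    ]
    return sum(distances) / len(distances)
```

Calling the function three times with the same `locations` but different `group_of` mappings (old conference, new conference, cluster) would give the three values being compared.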

Concessions
Of course, the clustered conferences are not even close to being perfect. First, they have some funky sizes. For example, the Northeast cluster has 19 schools, which is a really weird number. Most conferences have an even number of teams just to make scheduling and stuff easier. In addition, the clustered conferences result in some lost rivalries. Penn State being apart from Ohio State and Michigan is weird, and they don’t have many natural rivals among the Northeast teams. Another example of this is Missouri in the Midwest cluster, where they don’t really have any historic rivals (in my opinion, Missouri seems more naturally suited for the Great Plains cluster when taking this into account). The same goes with DePaul being in the Midwest (although honestly I don’t even get why DePaul is a power conference team).

Another element that is forgotten by creating conferences solely off locations is school similarity. Certain conferences tend to have schools with similar qualities. For example, the ACC contains many impressive academic institutions like Duke, North Carolina, and Virginia (and Stanford and Berkeley in the near future), and I’m pretty sure they value having schools with strong academics. The Big East is unique because the conference consists of 11 schools that value basketball over football, as most of the Big East schools’ football programs are either in the FCS or don’t exist (I think the only exception is UConn, who has a bad football team). In the clustered conferences, the Big East teams are put into different clusters, so several clusters would have a different number of teams for football and basketball, which would be a bit weird. Conferences like the SEC and Big Ten tend to contain mostly public universities. In fact, only Vanderbilt in the SEC and Northwestern in the Big Ten are private schools in those two conferences, while all the other members are public schools. We can contrast this with the ACC, which has 6 private schools (and 2 more to join in Stanford and SMU). I think an interesting project for the future would be to create conferences based on a clustering algorithm that not only looks at location, but also other qualities like enrollment size, academic ranking, whether a school is private / public, and athletic success. Still, despite all the imperfections with creating conferences by clustering based on location, I think the clustered conferences are far more logical than the current college sports landscape, where regionality matters less and less while money dominates more and more.
Extra Stuff
I just wanted to include a small section to show what some other versions of the clustered conferences may look like. I wrote the code so that the main clustering function had two important arguments: the number of clusters and whether the clusters should have the same number of teams (or as close to having the same number as possible). I added the second argument because (as you may notice below) the clusters may be very unequal as the number of clusters increases. Here are some of the results by changing these parameters:
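For the evenly divided option, one simple way to enforce equal sizes is to cap each cluster at ceil(n / k) teams during the assignment step. The sketch below shows that idea with a greedy rule. It is only one possible approach and not necessarily how my code does it, and the name `balanced_assign` is made up for this example.

```python
import math

def balanced_assign(points, centers):
    """Greedily assign points to centers, with at most ceil(n / k) points per center."""
    capacity = math.ceil(len(points) / len(centers))
    # Consider every (point, center) pair, closest pairs first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    labels = [None] * len(points)
    sizes = [0] * len(centers)
    for _, i, j in pairs:
        # Take the pair only if the point is still free and the cluster has room.
        if labels[i] is None and sizes[j] < capacity:
            labels[i] = j
            sizes[j] += 1
    return labels

# Three points sit near the first center, but each cluster may hold only two.
toy_points = [(0, 0), (0, 1), (0, 2), (10, 0)]
toy_centers = [(0, 1), (10, 0)]
print(balanced_assign(toy_points, toy_centers))
```

Swapping this in for the plain "closest center" step keeps the cluster sizes even, at the cost of pushing some teams into a cluster that is not their nearest one.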
6 clusters

8 clusters

6 clusters, evenly divided teams
Note: 84 total schools, so 14 schools per cluster

7 clusters, evenly divided teams
Note: 84 total schools, so 12 schools per cluster

