For some NBA players, it takes time to reach their full potential, and every season brings unexpected breakouts. Take Andrew Wiggins, for example. Wiggins never turned into the player that many scouts expected him to be after he was drafted first overall in 2014 and began his career with the Minnesota Timberwolves, who eventually traded him to the Golden State Warriors. However, with the change of scenery, Wiggins rose to new heights. He made his first all-star game last season and helped the Warriors win an NBA championship.

Using statistics from previous seasons, though, maybe there is a way to predict these seemingly unexpected breakouts. Almost nobody expected Andrew Wiggins, Dejounte Murray, and Darius Garland to transform into all-star players last season, but they did. By looking into the stats before a breakout season, we can estimate how likely each player is to make their first all-star game.

## Sample

The first step in creating a model to estimate the chance of being a first-time all-star is to gather a sample. To get a reasonably large sample while still keeping only recent years, I used player statistics from the 2016-17 through 2020-21 NBA seasons. I filtered out all players who had made an all-star game before the given season in order to keep only non-all-stars. Then, I filtered out all players older than 26 and those with fewer than 600 minutes in a season. I filtered on age because NBA players usually reach their peak at about 26-28 years of age, and it is extremely rare to see a player make their first all-star appearance after this age range. I used 600 minutes as a requirement to ensure players had enough playing time and to avoid making predictions from small sample sizes. After I gathered the sample, I labeled the data by marking those who made their first all-star game in the following season. For example, the 2020-21 statistics for Andrew Wiggins were marked as a breakout because in the following season (2021-22), he made his first all-star game.
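The filtering and labeling steps above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the field names (`player`, `season`, `age`, `minutes`), the `first_all_star_season` lookup, and the choice to identify a season by its ending year are all hypothetical, and the original analysis may have been organized differently.

```python
# Hypothetical row layout; the field names are assumptions, not the author's schema.
MAX_AGE = 26        # players older than this are dropped
MIN_MINUTES = 600   # minimum minutes played in the season


def build_sample(rows, first_all_star_season):
    """Filter player-seasons and label those followed by a first all-star game.

    rows: list of dicts with keys "player", "season" (ending year),
        "age", and "minutes".
    first_all_star_season: dict mapping player -> ending year of the season
        in which they first made an all-star game (missing if never).
    """
    sample = []
    for row in rows:
        first = first_all_star_season.get(row["player"])
        # Assumption: drop anyone who was already an all-star in or before this season.
        if first is not None and first <= row["season"]:
            continue
        if row["age"] > MAX_AGE or row["minutes"] < MIN_MINUTES:
            continue
        labeled = dict(row)
        # Breakout = the first all-star appearance comes in the very next season.
        labeled["breakout"] = first == row["season"] + 1
        sample.append(labeled)
    return sample
```

With this layout, a 2020-21 row (season `2021`) for a player whose first all-star season is `2022` would be labeled `breakout=True`, which mirrors the Wiggins example.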

Overall, this sample included 784 observations, 28 (3.6%) of which ended up being a first-time all-star in the following season.

## Variable Selection

Once I gathered the sample, the next step was to explore which variables had the greatest impact on predicting future breakouts. Just thinking about recent first-time all-stars, some common qualities come to mind. Oftentimes, these players have high usage on offense, generating lots of points and assists. Additionally, first-time all-stars are commonly high draft picks, usually in their 2nd or 3rd season, such as LaMelo Ball in 2022 and Zion Williamson in 2021. They are also usually able to create their own shot, whether by driving, using pick and rolls, pulling up, or creating from the post.

First, let’s explore the importance of being a high draft pick in predicting future breakout players. Just by looking at all-star appearances since 2018, we can see that draft pick matters greatly as nearly half (16/33 or 48%) of first-time all-stars were top 5 draft picks. The reason for this is obvious: better players are picked earlier in the draft. But we can get a better idea of the magnitude of the impact by looking at the sample. In the graph below, the breakout rate is shown by groups of draft picks. The breakout rate is simply the number of players who made their first all-star appearance in the following year divided by all players in that category. From this graph, we can see that an astounding 17% of top 5 draft picks had a breakout season in the next year. However, after the top 5, this rate drops off dramatically. Every other group had a breakout rate of less than 5%.
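As a rough sketch of the breakout-rate calculation described above, the snippet below groups player-seasons by draft position and divides breakouts by the group total. Only the top 5 group is named in the text, so the other bucket edges, the `pick` and `breakout` field names, and the handling of undrafted players are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed bucket edges; the article only names the "top 5" group explicitly.
PICK_GROUPS = [(1, 5, "1-5"), (6, 14, "6-14"), (15, 30, "15-30"), (31, 60, "31-60")]


def pick_group(pick):
    """Return the label of the draft-pick bucket, or "undrafted" for None."""
    if pick is None:
        return "undrafted"
    for low, high, label in PICK_GROUPS:
        if low <= pick <= high:
            return label
    raise ValueError(f"unexpected pick number: {pick}")


def breakout_rate_by_group(sample):
    """Breakouts divided by all player-seasons, per draft-pick bucket."""
    totals = defaultdict(int)
    breakouts = defaultdict(int)
    for row in sample:
        group = pick_group(row["pick"])
        totals[group] += 1
        breakouts[group] += int(row["breakout"])
    return {group: breakouts[group] / totals[group] for group in totals}
```

Because the rate is a simple ratio, small groups can swing a lot from one or two breakouts, so the bucket sizes are worth checking alongside the rates.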

In addition to draft pick, another important factor is usage. There are many different ways to measure usage, including usage rate, points, turnovers, and field goal attempts. While each of these would likely work, turnovers provided the best fit in the multiple logistic regression model (explained further in the following section). Using any measure of usage, we can see that the chance of breaking out greatly increases if a player has high usage. By plotting turnovers per 100 possessions against breakout rate, we discover that turnovers are extremely good at predicting future all-stars. Although turnovers are often considered a bad thing, they can be good predictors of future success, especially for young players.

We have now seen that usage is an extremely good predictor of future breakout seasons, but what about efficiency? Unlike usage and draft pick, which were obvious signs of potential, the importance of efficiency is less clear. In general, many young players are inefficient in their first few years in the league, often shooting a low percentage and turning the ball over frequently. Therefore, it is plausible that efficiency does not matter. However, looking at the data gives a different insight.

In order to determine if efficiency matters, I plotted turnovers per 100 possessions (usage) against true shooting percentage (efficiency). I chose true shooting percentage as the measure of efficiency because we have already seen that turnovers are actually a good predictor of future success, so I wanted to leave them out. Therefore, I chose a statistic that only measures shooting efficiency. Looking at the graph, shooting efficiency does seem to be important. Players that had a high true shooting percentage and lots of turnovers were the most likely to become first-time all-stars in the next season. While it was common to see breakout players with below average efficiency, none of the breakout players had a really bad true shooting percentage (below 50%). One example where it was important to look at both usage and efficiency was Killian Hayes in 2021. Hayes had the 2nd highest usage (5.9 TOV/100) in the entire sample, but his efficiency was atrocious with a 42% true shooting percentage. If we only looked at usage, we would have expected Hayes (who was the 7th overall pick in 2020) to be a likely breakout candidate. However, when we take efficiency into account, it is clear why he has not made an all-star game.
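For reference, the two statistics compared above can be computed as follows. True shooting percentage uses its standard definition, PTS / (2 × (FGA + 0.44 × FTA)), and the per-100 rate is a simple rescaling. The function names are made up for this sketch, and the text does not say whether these numbers were computed by hand or taken directly from a stats site.

```python
def true_shooting_pct(points, fga, fta):
    """True shooting percentage: PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta  # 0.44 is the conventional free-throw weight
    if attempts == 0:
        raise ValueError("player has no shooting attempts")
    return points / (2 * attempts)


def per_100_possessions(count, possessions):
    """Rescale a counting stat (e.g. turnovers) to a per-100-possession rate."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100 * count / possessions
```

Since turnovers do not appear in the true shooting formula, the two axes measure separate things, which is the reason given above for choosing it.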

Lastly, I wanted to incorporate a player’s ability to create his own shot. Usually, all-star caliber players are able to make their own shot when they need to. The most common ways to do this are to isolate against a defender and either get to the rim or take a pull-up shot, or to post up against a bigger defender. Using NBA.com’s tracking stats, I explored how pull-up efficiency and post-up efficiency impacted the likelihood of becoming a first-time all-star.

The graph below shows pull-up efficiency against post-up efficiency, with the color of the points denoting whether the player had a breakout season the following year. The efficiencies are expressed as points per shot attempt, but I did not use raw efficiency numbers because they are misleading for players with few shot attempts. For example, a player who made 1/1 pull-up shots would have an efficiency of 2.0 despite probably not being a good pull-up shooter. Therefore, I used a sigmoid function to take volume into account. In summary, the efficiencies in the graph reflect both volume and efficiency, so players with very few attempts have an efficiency close to 0.
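The exact form of the sigmoid adjustment is not given, so the following is only one plausible way to do it: multiply raw points per shot by a logistic weight that rises from near 0 to near 1 as attempts increase. The `midpoint` and `steepness` values are placeholders chosen for illustration, not the values used in the analysis.

```python
import math


def volume_weight(attempts, midpoint=50.0, steepness=0.1):
    """Logistic weight in (0, 1): near 0 for few attempts, near 1 for many.

    midpoint and steepness are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (attempts - midpoint)))


def adjusted_efficiency(points, attempts, midpoint=50.0, steepness=0.1):
    """Points per shot, shrunk toward 0 when the shot volume is low."""
    if attempts == 0:
        return 0.0
    raw = points / attempts
    return raw * volume_weight(attempts, midpoint, steepness)
```

Under these placeholder settings, the 1/1 shooter from the example drops from a raw 2.0 to nearly 0, while a high-volume shooter keeps essentially their raw points per shot.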

The relationship between breakout status and creating one’s own shot is clear. All of the players who had a breakout season were at least decent either in the post or when pulling up, and most of them were really good at one or the other. Only 3 of the 28 breakout players in the sample had both post-up efficiency and pull-up efficiency below 0.75 points per shot (Ben Simmons, Jarrett Allen, Bam Adebayo). Additionally, it is easy to see that many of the green dots lie near the line for a pull-up efficiency of 0.9 points per shot, indicating that elite pull-up shooting is a good predictor of being a first-time all-star in the next season. The same is true for great post-up players, although to a slightly lesser extent.

## Logistic Regression Model

The final step in predicting the likelihood of a player turning into a first-time all-star is to put all the variables together. To do this, I used a logistic regression with 6 input variables: draft pick (numberPickOverall), turnovers per 100 possessions, steals per 100 possessions, true shooting percentage, pull-up efficiency, and post-up efficiency. I log-transformed the overall draft pick variable to account for the fact that draft position only really matters among high picks, and after that the impact drops off. To incorporate usage, I tried several variables like usage rate, points, assists, and field goal attempts, but the variable that gave the best fit was actually turnovers. In addition, I included steals as a variable because it incorporates some element of defense. The results of the regression are shown below, but the most important part is the significance (the column that says Pr(>|z|)). All of the variables except steals are highly significant (p-values below 0.01), which is a sign that this model is a good fit.
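To make the model's mechanics concrete, here is a minimal sketch of how a fitted logistic regression turns the six inputs into a breakout probability. The feature keys are hypothetical names, the natural log is assumed for the draft pick transform, and the intercept and coefficients are passed in as arguments because the fitted values are not reproduced here.

```python
import math

# Hypothetical feature keys for the six inputs described above.
FEATURES = ["log_pick", "tov_per100", "stl_per100", "ts_pct", "pullup_eff", "postup_eff"]


def breakout_probability(player, intercept, coefs):
    """Logistic prediction: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))).

    player: dict with "pick" (overall draft pick, a positive integer) and the
        other five feature keys.
    intercept, coefs: fitted parameters supplied by the caller.
    """
    x = dict(player)
    # Assumption: natural log of the overall pick number.
    x["log_pick"] = math.log(player["pick"])
    z = intercept + sum(coefs[name] * x[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))
```

The log transform means the gap between picks 1 and 5 moves the prediction as much as the gap between picks 10 and 50, which matches the idea that draft position matters most near the top.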

To further explore how good the model is, we can look at the predictions it made for the sample. The most likely breakout candidates according to the model for the past 5 seasons are shown below. Some notable hits included Joel Embiid in 2018, Ben Simmons in 2020, Luka Doncic in 2020, and LaMelo Ball in 2022. However, a weakness of the model is that it seems to believe in top 5 picks a little too much. For example, it gave Jabari Parker a solid chance to break out despite the fact that he was bouncing between teams, and it also gave De’Aaron Fox and Deandre Ayton high chances of becoming first-time all-stars for several years.

It is important to acknowledge that this model has several shortcomings. I did not split the data into a train and test set since the sample was not very large (there were only 28 “successes” in the entire sample), so it is possible that there is some overfitting occurring. Additionally, the model doesn’t give good predictions for rim-protecting big men like Jarrett Allen and Bam Adebayo. Both of these players had low predicted breakout chances because none of the variables in the regression capture what good rim-protecting big men do (except for maybe true shooting percentage). Blocks were not a significant variable in the regression, so it is likely that the model doesn’t do well with these types of players because there weren’t many of them in the sample. However, the regression still gives us a guide to the most likely breakout players.

## Predicted Breakouts in 2022-23

Finally, we can get an idea of the most likely breakout candidates for this season by applying the model to last year’s stats. The results for the players with the greatest chances to break out are shown below. The most likely breakout player according to the regression is Cade Cunningham (chance of breaking out: 61%). Right behind him are two other former 1st overall picks in Deandre Ayton (54%) and Anthony Edwards (42%). Cunningham and Edwards both seem to be the future of their teams, so it makes sense that they could get their first all-star appearance this year. The same is true for Tyrese Haliburton (35%), who is (as of November 12) averaging almost 10 assists and 21 points while hitting over 45% of his 3-point attempts. Those seem like all-star numbers to me. However, I am less confident that Ayton will break out since he is the 3rd best player on his team and he hasn’t had a great start to the season.

It is also unlikely that we will see Lonzo Ball (33%) and Aaron Gordon (20%) turn into all-stars this season. Ball will be out until mid-season, so he won’t be able to make a case for the all-star game this year. Aaron Gordon had a high projection because he is a former top 5 pick that averages over 2.5 turnovers per 100 possessions while shooting efficiently. In addition, he has a decent pull-up efficiency and a good post-up efficiency, so the model gives him a better chance of being a first-time all-star than one may expect. While I don’t think this will happen, it does show us that Gordon is a key piece for the Nuggets despite being the 4th option behind Jokic, Murray, and Porter Jr.

Lastly, I wanted to point out that Alperen Sengun (11%) has a better chance of breaking out than his Rockets teammate Jalen Green. Sengun has a surprisingly high usage (4.6 TOV/100 and 5.9 AST/100) and was one of the most efficient post-up players in the league, scoring 1.11 points per post shot.

Being able to predict future all-stars is important for many reasons. It can tell bettors which teams may be on the rise as their players develop, it can give general managers an idea of which players to acquire before they become stars, and it can tell fans which players to keep an eye on. Turnovers, draft pick, and efficiency are the best predictors of future first-time all-stars based on the past 5 seasons. Using a regression with these predictors, Cade Cunningham, Deandre Ayton, and Anthony Edwards are the most likely breakout players in 2023.