Create Account

Analysis of SMJHL preseason team rankings, player upgrades, and scoring stats
#1
(This post was last modified: 02-05-2020, 01:50 PM by juke.)

First media bonus
This says 1635 words but Word says 1255 + some pictures
R code word count: An additional 1306 words (I don't know what you want to do with that information but including it just in case)

Edit: Sorry to any goalies, this entire report negates and ignores your existence :(

Hey guys, I realize this piece will end up a week too late since we’re 17 games into the regular season, but I put together a few pictures highlighting how each of the SMJHL teams and players performed in the full preseason, based on their builds and attributes. It’s kind of dry material, and I haven’t put too much effort into organizing it into a piece that reads well and has a good flow, so buckle up. I started because I wanted to see which attributes might be the best to upgrade and the data was easy to scrape and format from the website, and I figured that for the time being it was still the biggest collection of games to analyze. I’m know that some of the analyses may not be entirely accurate, since I think some GMs may have been adjusting their preseason lines to get rookies more ice time, so I plan on just re-running the code once the regular season has progressed a little further into the season to see how well the preseason data holds up.

Like I said earlier, it all started with me wanting to see what attributes I should be putting points into most. The first thing I did was pull everyone’s build, basically what they spend their TPE on, and then correlated each attribute with how good they had been playing. Which is already flawed, because there’s no overall rating for player’s performance, so I just went with their points/60 (PP60). This is especially confounding for defenseman, since the best shut down defender might not necessarily score at a high rate. That being said, it’s clear that for both forwards and defenseman that defense, scoring, puck handling, passing, and skating are the 5 attributes that contribute the most to scoring. Which I’m sure is a shock to no one and it was kind of unnecessary, since I’d wager about 90% of players I’ve seen don’t really upgrade anything else, as you can also see buy the vertical lines on the graph.

[Image: forward_scoring_cor.png][Image: defense_scoring_cor.png]

I also took a look at the attributes compared to +/-, but didn’t find this to be as helpful. A lot of the correlations dropped, and as a lot of hockey fans already know it’s not the best metric to determine how good a player is, and largely depends on the talent of the opposition and teammates. It would be nice to use the stats that the simulation generates to somehow make a metric for defensive success (maybe using some sort of combination between +/-, blocked shots, hits, takeaways etc), but unfortunately for now everyone is valued by their scoring.


[Image: forward_plus_cor.png][Image: defense_plus_cor.png]


Putting it all together, I wanted to see which attributes had the highest spearman ‘r’ value. Again, not really too helpful in the end (pay close attention cause that’s the theme of this entire post) because I assume there’s a lot of covariance between the different attributes, and for some of the attributes correlation is kind of pointless caused 95% of the values are the same, but here you have it.  

[Image: spearman_corrs.png]
 
 
This next picture I believe is a little more telling. For the most part it’s simply a heatmap of all the players in the SMJHL attributes, and the players are sorted by their PP60. You can see that despite scoring having the 4th best correlation for scoring in both forwards and defense, the top scorers in the preseason seemed to upgrade scoring more than passing, puck handling, and skating, while defense remains highly relevant to scoring (Andrei pay attention).


[Image: heatmap.png]
 
  
Since it’s clear that these 5 core attributes are the most relevant to how good players are, I wanted to see if the team standings had anything to do with the team’s average values for each of these attributes. One thing that’s important to note, most team averages ranged from about 63-77 ish, which made differences hard to see in the plot. So I rescaled all the values to a 0-20 scale. This means that the difference in attribute values are now over-exaggerated, just for the sake of the plot, which is ordered from top left to bottom right of where teams placed in the standings. It’s tough to see any clear trends, but I think it is pretty clear that defense wins championships preseason games. Other than that it looks like a teams average puck handling and maybe passing have some loose correlations with their place in the standings. This plot would also have you believe that Detroit is like 3x less talented than all the other teams (just like the real NHL), but again these differences in values are not proportional to the actual differences in team's ratings. And interestingly enough, they have 4 of the top 10 PP60 players from the preseason, so they were quite the top heavy team.

[Image: team_corrs.png?width=676&height=676]
 
 
The team standings over time turned out to be a little disappointing. In the sense that it didn’t make for an exciting graph. Most teams were pretty consistent, with no huge slumping periods. I would say Carolina was the streakiest team. Starting around game 20, the last 30 games of their season were pretty much two giant win streaks sandwiching basically a winless streak. On a personal level, our beloved Berserkers were sitting in 2nd place 80% through the preseason, then fell flat during the last 10 games. Conditioning coach has got to go.
 
[Image: team_records.png]

 
The next plot turned out to be so useless that I debated even putting it in here. I wanted to see the scoring breakdown of the teams (ordered again by their place in the standings), maybe hoping we’d see the better teams scored at a higher even strength rate or something. But scoring is super consistent across all teams, the entire league is within a few percentage points, so I learned nothing here.

[Image: scoring_breakdown.png]
 
This last plot shows the shot differentials for each team, but it’s a little unclear at first. At this point, I was getting kind of sick of making graphs, so I combined 3 different visual scales into one graph. Obviously the x and y coordinates of the points are the team’s shots for/against. The color of the dots represents how many points each team got in the standings, and I put the legend in for reference. The size of the dots represents the ranked goal differential for each team. So Vancouver for instance, despite being dead on the league average for both shots for and against, were last in the preseason for goal differential. The last scale is harder to see, but the transparency of the dots represents what percent of the team’s shots hit the net, as opposed to blocked or missed. More solid means a higher percent. I think all the teams were within 1 or 2% of each other, so I wouldn’t really worry if you can’t see the differences too well. This chart shows how well Anaheim really dominated. Ton of shots for, not many shots against, higher percent of their shots hit the net, and they were rewarded with the best goal differential and points in the league.

[Image: team_shot_breakdown.png]
 
That’s all I got for now. This ended up being more work than it was worth I think, especially the parts regarding player upgrades, because most people already knew what the data said anyways. But I think that the team graphs will be fun enough to keep updating and following along during the season. If you made it all the way to the end of this huge post, I doubt you got anything out of it but thanks for reading!


I've posted the code here below, but be warned because I don't annotate/comment my codes well


Code:
---
title: "smalinowski7, SMJHL preseason"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```




```{r}
### Load packages
library(tidyverse)
library(rvest)
```



```{r}
###Scrape preseason values from the URL


###Name the list of teams in alphabetic order
team_list <- c("Anaheim Outlaws", "Anchorage Armada", "Carolina Kraken",
               "Colorado Raptors", "Detroit Falcons", "Halifax Raiders",
               "Kelowna Knights", "Newfoundland Berserkers", "St. Louis Scarecrows", "Vancouver Whalers")


###scrape player stats data
url_stats <- "https://simulationhockey.com/games/smjhl/S52/Preseason/SMJHL-PRE-ProTeamScoring.html"
website_stats_1 <- read_html(url_stats) %>%
  html_nodes(".STHSScoring_PlayersTable1") %>%
  html_table()
website_stats_2 <- read_html(url_stats) %>%
  html_nodes(".STHSScoring_PlayersTable2") %>%
  html_table()




###Iterate through all 10 teams and add them to the same dataframe
preseason_stats <- data.frame()
for (i in 1:10) {
  team_name <- team_list[i]
  temp_df <- data.frame(website_stats_1[[i]]) %>%
    mutate(Team = team_name) %>%
    left_join(data.frame(website_stats_2[[i]]))
  preseason_stats <- rbind(preseason_stats, temp_df)

}



###scrape player attribute data
url_attributes <- "https://simulationhockey.com/games/smjhl/S52/Preseason/SMJHL-PRE-ProTeamRoster.html"
website_attributes <- read_html(url_attributes) %>%
  html_nodes(".STHSRoster_PlayersTable") %>%
  html_table()


###Iterate through all 10 teams and combine
preseason_attributes <- data.frame()
for (i in 1:10) {
  team_name <- team_list[i]
  temp_df <- data.frame(website_attributes[[i]]) %>%
    mutate(Team = team_name)
  preseason_attributes <- rbind(preseason_attributes, temp_df)

}



###Scrape schedule and results data
url_schedule <- "https://simulationhockey.com/games/smjhl/S52/Preseason/SMJHL-PRE-ProTeamSchedule.html"
website_schedule <- read_html(url_schedule) %>%
  html_nodes(".basictablesorter") %>%
  html_table()


###Iterate through all 10 teams schedule and combine
preseason_schedule <- data.frame()
for (i in 1:10) {
  team_name <- team_list[i]
  temp_df <- data.frame(website_schedule[[i]]) %>%
    mutate(Team = team_name)
  preseason_schedule <- rbind(preseason_schedule, temp_df)

}


###Add points, cumulative points, and game number to schedule
preseason_schedule <- preseason_schedule %>%
  mutate(Points = case_when(ST == "W" ~ 2,
                            ST == "L" & OT =="" & SO == "" ~ 0,
                            ST == "L" & (OT == "*" | SO == "*") ~ 1)) %>%
  group_by(Team) %>%
  mutate(game_num = seq(1:50),
         cum_points = cumsum(Points))

```



```{r}
###Tidy and format the data


###Fix inconsistencies between attribute and stat player names
preseason_attributes$Player.Name <- gsub("(R)", " (R)", preseason_attributes$Player.Name, fixed = TRUE)
preseason_attributes$Player.Name <- gsub("(C)", "", preseason_attributes$Player.Name, fixed = TRUE)
preseason_attributes$Player.Name <- gsub("(A)", "", preseason_attributes$Player.Name, fixed = TRUE)


###join the two dataframes, specifiy position variable, and filter by minutes played
preseason_combined <- inner_join(preseason_stats, preseason_attributes, by = "Player.Name") %>%
  mutate(Position = case_when(F == "X" ~ "Forward",
                              TRUE ~ "Defense"),
         PP60 = P/(MP/60)) %>%
  filter(AMG >= 9) %>%
  select(Player.Name, GP:S3,CK:PS.y, Position, PP60)
```





```{r}
###Team stats polar chart

preseason_combined %>% group_by(Team.x) %>%
  summarise(Skating = mean(SK),
            Scoring = mean(SC),
            Passing = mean(PA),
            `Puck Handling` = mean(PH),
            Defense = mean(DF)) %>%
  gather(key = Stat, value = Average, Skating:Defense) %>%
  mutate(Average = 20*((Average - min(Average))/(max(Average) - min(Average))),
         Team.x = factor(Team.x,
                         levels = c("Anaheim Outlaws", "Carolina Kraken", "Anchorage Armada", "Kelowna Knights", "St. Louis Scarecrows",
                                    "Newfoundland Berserkers", "Detroit Falcons", "Colorado Raptors", "Halifax Raiders", "Vancouver Whalers"))) %>%
 
  ggplot(aes(x = Stat, y = Average, fill = Stat)) +
    geom_col(alpha = .8) +
    facet_wrap(.~Team.x) +
  theme(panel.background = element_blank(),
        panel.border = element_rect(color = "black", fill = NA),
        strip.background = element_blank(),
        axis.text.x = element_blank()) +
  coord_polar() +
  labs(x = NULL, y = NULL) +
  scale_fill_manual(values = c(Defense = "#54457F",
                               Passing = "#05A8AA",
                               `Puck Handling` = "#7FC6A4",
                               Scoring = "#E8DAB2",
                               Skating = "#DD6E42"))
 
```




```{r}
### Correlate player attribute values to PP/60 and +/- ratings


forwards <- preseason_combined %>%
  filter(Position == "Forward") %>%
  select(PP60, X..., CK:PS.y)
 

defense <- preseason_combined %>%
  filter(Position == "Defense") %>%
  select(PP60, X..., CK:PS.y)


forwards %>% gather(key = Attribute, value = Value, CK:PS.y) %>%
  ggplot(aes(x = Value, y = PP60)) +
  geom_point() +
  facet_wrap(.~ Attribute) +
  geom_smooth(se = F, col = "red", alpha = .75, size = 1) +
  theme_bw() +
  theme(strip.background = element_blank()) +
  labs(x = "Attribute Value", y = "PP/60\n", title = "Forwards Scoring Correlations")

defense %>% gather(key = Attribute, value = Value, CK:PS.y) %>%
  ggplot(aes(x = Value, y = PP60)) +
  geom_point() +
  facet_wrap(.~ Attribute) +
  geom_smooth(se = F, col = "red", alpha = .75, size = 1) +
  theme_bw() +
  theme(strip.background = element_blank()) +
  labs(x = "Attribute Value", y = "PP/60\n", title = "Defenseman Scoring Correlations")

forwards %>% gather(key = Attribute, value = Value, CK:PS.y) %>%
  ggplot(aes(x = Value, y = X...)) +
  geom_point() +
  facet_wrap(.~ Attribute) +
  geom_smooth(se = F, col = "red", alpha = .75, size = 1) +
  theme_bw() +
  theme(strip.background = element_blank()) +
  labs(x = "Attribute Value", y = "+/-\n", title = "Forwards +/- Correlations")

defense %>% gather(key = Attribute, value = Value, CK:PS.y) %>%
  ggplot(aes(x = Value, y = X...)) +
  geom_point() +
  facet_wrap(.~ Attribute) +
  geom_smooth(se = F, col = "red", alpha = .75, size = 1) +
  theme_bw() +
  theme(strip.background = element_blank()) +
  labs(x = "Attribute Value", y = "+/-\n", title = "Defenseman +/- Correlations")

forward_cor <- data.frame(cor(forwards$PP60, select(forwards, -c(PP60, X...)), method = "spearman")) %>%
  gather(key = Attribute, value = correlation, CK:PS.y) %>%
  mutate(Position = "Forwards")
defense_cor <- data.frame(cor(defense$PP60, select(defense, -c(PP60, X...)), method = "spearman")) %>%
  gather(key = Attribute, value = correlation, CK:PS.y) %>%
  mutate(Position = "Defenseman")

correlations <- rbind(forward_cor, defense_cor) %>%
  drop_na() %>%
  arrange(Position, desc(correlation)) %>%
  mutate(Stat = paste(Attribute, " (", Position, ")", sep = ""),
         Stat = factor(Stat, levels = Stat))

ggplot(correlations, aes(y = correlation, x = fct_rev(Stat), fill = correlation)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_fill_gradient2(low = "navyblue", mid = "grey", high = "red", midpoint = 0) +
  theme_bw() +
  labs(x = NULL, y = "\nSpearman Correlation")
```




```{r}
###heatmap of player attribues
heatmap <- preseason_combined %>%
  select(Player.Name, PP60, CK:PS.y) %>%
  arrange(desc(PP60))

rownames(heatmap) <- NULL

heatmap <- select(heatmap, -c(Player.Name, PP60))

player_names <- heatmap$Player.Name

Heatmap(heatmap, row_order = player_names, width = unit(10, "cm"), name = "Attribute Value")
```



```{r}
###track team points over time


###

ggplot(preseason_schedule, aes(x = game_num, y = cum_points, col = Team, label = Team)) +
  geom_line(size = 1) +
  theme_bw() +
  theme(legend.position = "bottom",
        legend.title = element_blank()) +
  labs(x = "Game Number", y = "Cumulative Points\n") +
  scale_color_manual(values = c(`Anaheim Outlaws` = "#820c00",
                                `Detroit Falcons` = "#EAC117",
                                `Vancouver Whalers` = "#54C571",
                                `St. Louis Scarecrows` = "#B572B3",
                                `Kelowna Knights` = "#254117",
                                `Anchorage Armada` = "#848482",
                                `Halifax Raiders` = "black",
                                `Colorado Raptors` = "#000080",
                                `Carolina Kraken` = "#FF9B42",
                                `Newfoundland Berserkers` = "red"))
```



```{r}
###breakdown of team scoring

preseason_combined %>%
  group_by(Team.x) %>%
  summarize(goals = sum(G),
            PPG = sum(PPG),
            PKG = sum(PKG),
            EG = sum(EG),
            ES = goals - PPG - PKG - EG) %>%
  select(-goals) %>%
  gather(key = Type, value = Total, PPG:ES) %>%
  mutate(Team.x = factor(Team.x, levels = c("Anaheim Outlaws", "Carolina Kraken", "Anchorage Armada", "Kelowna Knights", "St. Louis Scarecrows",
                                    "Newfoundland Berserkers", "Detroit Falcons", "Colorado Raptors", "Halifax Raiders", "Vancouver Whalers")),
         Type = factor(Type, levels = c("ES", "PPG", "PKG", "EG"),
                       labels = c("Even Strength", "Power Play", "Short Handed", "Empty Net"))) %>%
   ggplot(aes(x = Team.x, y = Total, fill = fct_rev(Type))) +
  geom_col(position = "fill", alpha = .75) +
  scale_x_discrete(expand = c(0,0)) +
  scale_y_continuous(expand = c(0,0)) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        axis.ticks.x = element_blank(),
        panel.grid = element_blank(),
        panel.background = element_rect(fill = NULL)) +
  labs(fill = "Goal Type", y = "Frequency\n") +
  scale_fill_manual(values = c(`Even Strength` = "#052F5F",
                               `Power Play` = "#06A77D",
                               `Short Handed` = "#D5C67A",
                               `Empty Net` = "#F1A208"))
 
```



```{r}
###team shot charts

###pull the goalie stats
url_goalie <- "https://simulationhockey.com/games/smjhl/S52/Preseason/SMJHL-PRE-ProTeamScoring.html"
goalie_stats_1 <- read_html(url_goalie) %>%
  html_nodes(".STHSScoring_GoaliesTable") %>%
  html_table()



###Iterate through all 10 teams and add them to the same dataframe
goalie_stats <- data.frame()
for (i in 1:10) {
  team_name <- team_list[i]
  temp_df <- data.frame(goalie_stats_1[[i]]) %>%
    mutate(Team = team_name)
  goalie_stats <- rbind(goalie_stats, temp_df)

}


goalie_sum <- goalie_stats %>%
  group_by(Team) %>%
  summarise(shots_against = sum(SA),
            goals_against = sum(GA))

team_sum <- preseason_combined %>%
  mutate(Team =Team.x) %>%
  group_by(Team.x) %>%
  summarise(shots_for = sum(SHT),
            blocked_shots = sum(OSB) + sum(OSM),
            goals_for = sum(G)) %>%
  mutate(Team = Team.x,
         percent_through = shots_for/(shots_for + blocked_shots)) %>%
  select(-Team.x) %>%
  left_join(goalie_sum, by = "Team") %>%
  mutate(goal_diff = goals_for - goals_against,
         points = c(69,62,62,50,51,48,61,54,54,43))

ggplot(team_sum, aes(x = shots_for, y = shots_against, label = Team)) +
  geom_point(aes(size = goal_diff, alpha = percent_through, col = points)) +
  theme_bw() +
  geom_vline(xintercept = mean(team_sum$shots_for), linetype = "dashed") +
  geom_hline(yintercept = mean(team_sum$shots_against), linetype = "dashed") +
  geom_text_repel() +
  scale_color_gradient2(low = "blue", mid = "black", high = "red", midpoint = 55) +
  scale_alpha_continuous(range = c(.33,1)) +
  guides(alpha = FALSE) + guides(size = FALSE) +
  labs(col = "Team Points") +
  labs(x = "Shots For", y = "Shots Against\n")
```

[Image: smalinowski7.gif]
Sigs: Thanks JNH, Lime, Carpy, and ckroyal92 
Reply
#2

Awesome analysis!

[Image: 8cjeXrB.png]
[Image: XigYVPM.png]
[Image: VGl3CB4.png]
Reply
#3

Oh fuck, this is legit work. What did you use to pull the data, Python? And then did you use the libraries in python or R to get the graphs

[Image: 0XJkcN5.png]
Czechoslovakia PROFILE || UPDATE || RAGE. Rage 
[Image: luketd.gif]




Reply
#4

yay Data!!!!!!

[Image: QsZ1vOb.jpeg]
[Image: lqfXIpe.jpeg]
Reply
#5

Berserker stonks

[Image: arTbD7O.png][Image: fQjjgGe.png]

Germany Berserkers Stampede Stars Barracuda syndicate Blizzard
[Image: PuANRuu.png]
Reply
#6

02-03-2020, 02:46 PMluketd Wrote: Oh fuck, this is legit work. What did you use to pull the data, Python? And then did you use the libraries in python or R to get the graphs

Obviously answered you in the discord, but in case anyone else was wondering: rvest R package for scraping, and ggplot R for the graphs

[Image: smalinowski7.gif]
Sigs: Thanks JNH, Lime, Carpy, and ckroyal92 
Reply
#7

New team logo

The Newfoundland MostlyDefense

[Image: newlog10.png]

[Image: cleanandrei.png]
Reply
#8

Thanks for the stats! I love me some stats.

[Image: image.png]
[Image: v2ZHYxx.png]
Reply
#9

This is awesome work! Quick question though... Does Halifax have that low of scoring? Maybe that's our problem haha

[Image: sfOMki5.png]
Reply
#10

02-03-2020, 04:29 PMmxman991 Wrote: This is awesome work! Quick question though... Does Halifax have that low of scoring? Maybe that's our problem haha

Halifax ended up with the least goals scored in the preseason out of the 10 teams, and the second lowest differential (Vancouver was lowest)

[Image: smalinowski7.gif]
Sigs: Thanks JNH, Lime, Carpy, and ckroyal92 
Reply
#11

This is amazing work - keep it coming!

[Image: thiefofcheese.gif]


[Image: Yztckjo.png] 


Sig credit: Ragnar, Carpy48, High Stick King

Reply
#12

Good stuff!

[Image: JvI8fTp.png]

[Image: 9tINabI.png] [Image: c97iD9R.png]

[Image: uDjThoa.png]




Reply
#13

Excited for your next data article

[Image: 336.jpg]
Reply




Users browsing this thread:
1 Guest(s)




Navigation

 

Extra Menu

 

About us

The Simulation Hockey League is a free online forums based sim league where you create your own fantasy hockey player. Join today and create your player, become a GM, get drafted, sign contracts, make trades and compete against hundreds of players from around the world.