Simulation Hockey League (https://simulationhockey.com) - Thread: Response to Deeper look into steals/turnovers (/showthread.php?tid=95946)
Response to Deeper look into steals/turnovers - Fordyford - 05-04-2019

I looked at @aaronwilson's post the other day about steals/turnovers, and thought that the data looked fairly spread out. Lots of the lines seemed to show correlation, but I wasn't fully convinced. With that in mind, I decided to look a bit further into the data (thanks to him kindly providing me with his spreadsheet), to see whether the correlations are statistically significant. For the purposes of this article, I am using a 95% confidence level as my standard confidence level (as is the standard in academia).

The first part of the article was a graph of Corsi For % vs EVPTS/60 (even strength points/60). The graph appeared to show a positive correlation. With 247 data points, only a very small value of a statistic called the Product Moment Correlation Coefficient (PMCC) is needed to provide significant evidence of a correlation. Because I can't be bothered to calculate the critical value for 247 exactly, I will be using the critical value for 200 data points (fewer data points means a greater value is required for significance, so I may decide something is insignificant when, using the correct value, it would be significant).

The value calculated (using Excel's CORREL function) for this relationship was 0.4050, which is much greater than the value required (.138), so we can conclude that it is likely these stats are positively correlated (in fact, the probability of seeing a value this large if they weren't correlated is much lower than 1%).

Next, we come to STL/TO vs CF%. This is an interesting one because, as Wilson commented, he would expect a positive correlation, but the graph appeared to show little correlation. This gives a PMCC of 0.039, much lower than the required value. So the data provides NO significant evidence that these values are correlated, despite what one might expect, which is interesting.

Next up, the combo deal of Turnovers/60 vs SC-PA and Steals/60 vs SC-PA.
The values were, respectively, -.3983 and -.4019, which are both statistically VERY significant negative values, indicating that pass-first players certainly seem to get both more turnovers and more steals. If anyone can work out some reason for this, I'd be interested to hear (specifically for steals, not turnovers, which is fairly obvious).

The final graph was STL/TO against SC-PA, which gave a correlation coefficient of -0.2207. This is still statistically significant, but less so, providing evidence (still at over a 99% confidence level) that these are negatively correlated.

There are some important takeaways from this:

1) All of aawil's conclusions are correct, in terms of whether the data is correlated.
2) Correlation is not NECESSARILY causation, although it may be. These could all be coincidental, or caused by a 3rd factor. The only thing this article provides is evidence that they are likely correlated (this isn't the most random of samples, as it only comes from one season. More seasons' worth of data may provide more detailed analysis of whether STHS has consistent correlation in these areas).
3) These stats are not as correlated as they might be. A pair of stats that should obviously be very correlated, and where one indeed has a causal effect on the other, even strength time on ice vs even strength points, has a value of .6644, which is higher than any of these. High values of CF% are far from a guarantee of high numbers of even strength points.
4) I have far too much time on my hands.

(I am expanding this article because I'm still hella bored)

I was then interested to use some of the stats and see if we can fit them to any distribution (which may be useful for future analysis). I will be attempting to perform tests known as "Chi Square Goodness of fit tests" to test for the normality of this data, by assuming that the data are appropriately modelled with a normal distribution and testing whether this is a reasonable assumption.
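As a rough illustration of the kind of test described here, the following is a minimal Python sketch, not what was done for the article (that work was in Excel). It fits a normal distribution to a sample, groups the values using cut points, merges neighbouring groups whose expected count is below 5 (the usual rule of thumb), and returns the chi-square statistic with its degrees of freedom. The function name `chi_square_normal_gof`, the cut points, and the synthetic sample are all assumptions made for illustration; none of them come from the SHL data set.

```python
from statistics import NormalDist, fmean, stdev


def chi_square_normal_gof(data, cuts, min_expected=5.0):
    """Chi-square goodness-of-fit statistic against a fitted normal.

    `cuts` are interior cut points, giving the groups
    (-inf, c0], (c0, c1], ..., (ck, +inf).
    Returns (statistic, degrees_of_freedom).
    """
    n = len(data)
    # Fit the normal model using the sample mean and standard deviation.
    model = NormalDist(fmean(data), stdev(data))
    cuts = sorted(cuts)

    # Observed count in each group.
    observed = [0] * (len(cuts) + 1)
    for x in data:
        observed[sum(1 for c in cuts if x > c)] += 1

    # Expected count in each group under the fitted normal.
    cdf = [0.0] + [model.cdf(c) for c in cuts] + [1.0]
    expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

    # Merge neighbouring groups until every expected count is large enough.
    groups = []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            groups.append((acc_o, acc_e))
            acc_o = acc_e = 0.0
    if acc_e > 0:
        if groups:
            # Fold a too-small final remainder into the last group.
            last_o, last_e = groups.pop()
            groups.append((last_o + acc_o, last_e + acc_e))
        else:
            groups.append((acc_o, acc_e))

    # Sum of (observed - expected)^2 / expected over the groups.
    statistic = sum((o - e) ** 2 / e for o, e in groups)
    # One degree of freedom is lost to the total, two to the fitted parameters.
    dof = len(groups) - 1 - 2
    return statistic, dof


# Synthetic stand-in sample (NOT the real CF% values).
sample = NormalDist(50, 5).samples(247, seed=1)
stat, dof = chi_square_normal_gof(sample, cuts=[45, 50, 55])
print(f"chi-square = {stat:.3f} on {dof} degree(s) of freedom")
```

The statistic would then be compared against a chi-square table at the 5% level for the given degrees of freedom. A handful of extreme values can inflate the statistic a great deal, which is worth keeping in mind when choosing the groups.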
For those who care, a chi-square goodness of fit test takes, for each group of the data, the difference between the expected count provided by the distribution and the observed count, squares it, and divides by the expected count. Adding these up over all the groups provides a useful measure of whether the data fits the suggested distribution. For a certain reason (the test relies on assuming approximate normality for the counts; you may be aware of the Central Limit Theorem, whereby the mean of a random sample from a population approximately follows a normal distribution as long as n>25 or so), the expected count in each group must be at least 5. I will be grouping the data in order to ensure this is the case.

I will be working on this overnight, and uploading my findings in the morning.

I failed to perform the chi-square goodness of fit tests, because almost all of the data sets have too many outliers. However, I then spotted something interesting, looking at CF%. I decided to plot the DF stat against CF%, and observe the correlation. The graph is as follows:

[Graph: DF vs CF%]

The correlation coefficient is 0.521522. This is incredibly large and suggests there may well be a large causal effect here. The conclusion to draw from my attempts to do these goodness of fit tests is that Y'ALL NEED TO STOP ICING 65 DEFENSE PLAYERS SO I CAN ACTUALLY DO THE GODDAMN TEST OK. Observe the histogram of CF% generated in Excel:

[Histogram: CF%]

This has a distribution which looks approximately normal on the right, and then these gross outliers, who are:

XAVIER CROSS: DF 62
JACK PARKER: DF 63
SAMUEL MCVAY: DF 64
NOLAN SNIPEZ: DF 65
VLADIMIR VASKOV: DF 70
ANDREW HAWKINS: DF 70

And the player who had the single most unlucky season in the SHL last year:

CHASE BYRON: DF 80

So there is only one outlier with a defense above 70 (likely anomalous due to Simon), and yet of the 9 players with DF 70 or lower, 6 are outliers. Of the 5 with DF lower than 70, 4 are outliers. STOP ICING THESE GUYS THEY'RE GARBAGE AND THEY RUIN MY STATS. That is all.
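For anyone who wants to repeat the significance checks on the correlation coefficients quoted in this article without a table of critical values, here is a minimal Python sketch. It is not the method used above (that was Excel's CORREL plus a lookup at 200 data points). Instead it approximates the two-tailed critical value with the Fisher z-transformation, which should give roughly 0.139 for 200 data points and roughly 0.125 for 247, close to the .138 table value. The function names `pearson_r`, `critical_r` and `is_significant` are made up for illustration.

```python
from math import sqrt, tanh
from statistics import NormalDist, fmean


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| for n data points.

    Uses the Fisher z-transformation (atanh(r) is roughly normal with
    standard error 1/sqrt(n - 3)), so it is an approximation rather
    than the exact t-based table value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z / sqrt(n - 3))


def is_significant(r, n, alpha=0.05):
    """True if |r| exceeds the approximate critical value."""
    return abs(r) > critical_r(n, alpha)


# Coefficients quoted in the article, all from 247 data points.
quoted = {
    "CF% vs EVPTS/60": 0.4050,
    "STL/TO vs CF%": 0.039,
    "Turnovers/60 vs SC-PA": -0.3983,
    "Steals/60 vs SC-PA": -0.4019,
    "STL/TO vs SC-PA": -0.2207,
    "DF vs CF%": 0.521522,
}
for name, r in quoted.items():
    verdict = "significant" if is_significant(r, 247) else "not significant"
    print(f"{name}: r = {r:+.4f} -> {verdict} at the 5% level")
```

Using the critical value for the real sample size rather than 200 only makes the test slightly less conservative; it does not change any of the verdicts reported above.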
Pls give @aaronwilson 30% of payout for providing data set

RE: Response to Deeper look into steals/turnovers - awils13 - 05-04-2019

I really should've paid more attention in my statistics class, interesting stuff

Quote: Next up the combo deal of Turnovers/60 vs SC-PA and Steals/60 vs SC-PA.

You probably meant more

Quote: If anyone can work out some reason for this, I'd be interested to hear (specifically steals, not turnovers, which is fairly obvious)

Yep, really curious why it works this way

Quote: High values of CF% are far from a guarantee of high numbers of even strength points.

This is just my guess after doing some test sims, but I feel like if someone has a high Corsi but not a lot of points, they're probably getting carried by their linemates

RE: Response to Deeper look into steals/turnovers - Fordyford - 05-04-2019

05-04-2019, 05:24 PM aaronwilson Wrote: I really should've paid more attention in my statistics class, interesting stuff

Fixed. And I agree on the CF% thing.

RE: Response to Deeper look into steals/turnovers - DeletedAtUserRequest - 05-05-2019

This post is very Luketd-esque... and that's a good thing.

RE: Response to Deeper look into steals/turnovers - Daco - 05-05-2019

05-05-2019, 08:32 AM Mike Izzy Wrote: This post is very Luketd-esque... and that's a good thing.

He's actually his test tube son. Luke's been trying to produce the perfect data God and may have finally attained it