
***Canadice*** · 11-24-2020, 03:47 PM

The following page contains a visualization and explanation of the similarities between the SHL player ratings taken from the SHL Index. Website with visualization. The below text is extracted from the Explanation tab of the website.

ELI5 or tldr
I see what you look like and place you in the sandbox close to your lookalike. The closer you are, the more you look alike.

Introduction
As every player is defined by more than 20 different attributes, it is difficult to compare and visualize players against each other in a simple manner. Fortunately, a method called multidimensional scaling allows us to reduce the number of dimensions (or attributes) to make this easier.

Reducing the number of dimensions too much risks summarizing the data so heavily that part of the information it holds is hidden. On the other hand, not reducing the number of dimensions enough still causes problems with visualization and interpretation.

If we focus on visualization, a 2- or 3-dimensional plot is easy to create; however, interpreting the plot might still prove difficult.

The setup to multidimensional scaling
In order to reduce the number of dimensions, we must start by defining how similar each player is to every other player, as this relationship is what we want to keep. As the player ratings are numerical values, this is relatively easy: we calculate a pairwise distance value between the players.

Distance metrics
We can look at two different types of distance metrics that can be used to calculate the distance values.

First we have the Euclidean distance, which calculates the shortest (straight-line) distance between two points, as seen in the figure below. Hope you remember your high school geometry, because the distance is calculated as the hypotenuse of the right-angled triangle that is formed from the points. The distance in the figure is thereby calculated as √(1^2+1^2) = √2 ≈ 1.41.

[Image: p4MMnOs.png]
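
As a minimal sketch of the calculation above, the Euclidean distance can be written in a few lines of Python. The function name `euclidean_distance` is only chosen for illustration and is not taken from the website.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The two points from the figure: (0, 0) and (1, 1)
d = euclidean_distance((0, 0), (1, 1))
print(round(d, 2))  # 1.41
```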

This measure assumes that a point can take any value between the integers (whole numbers), which is not the case if the variable is discrete. We must then restrict the distance between two points so that it follows the whole numbers. A practical example of this issue is calculating the distance you would need to walk between two intersections in Manhattan, for instance from the corner of 50th Street and 3rd Avenue to the corner of 51st Street and 2nd Avenue. The Euclidean distance assumes that you are able to walk directly between the corners, but if you've ever been to Manhattan, you will have noticed that there are some rather large buildings in the way. You must walk along the Street to another corner before walking along the Avenue to your destination.

[Image: J9oCNeB.png]

This practical example shows how the Manhattan distance is calculated: by counting the blocks you have to walk between your current position and your destination. The distance is then the sum of all the lengths walked between the two points, in the case of the figure below: 1+1=2.

[Image: bWraUO3.png]

We can also calculate the distance as the sum of the absolute differences in each dimension, in this case the difference in x from 0 to 1, and the difference in y from 0 to 1: |1−0|+|1−0|=2
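
A minimal sketch of the same sum in Python, assuming points are given as tuples of numbers. The function name `manhattan_distance` and the (street, avenue) encoding of the corners are illustrative choices, not something taken from the website.

```python
def manhattan_distance(p, q):
    """Sum of the absolute differences in each dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


# The two points from the figure: (0, 0) and (1, 1)
print(manhattan_distance((0, 0), (1, 1)))  # 2

# The street corners from the example, written as (street, avenue)
print(manhattan_distance((50, 3), (51, 2)))  # 2
```

The absolute value matters: without it, a step of +1 in one dimension and −1 in another would cancel out to a distance of 0.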

Calculating the distances
In the case of player ratings, they are all integers between 5 and 20, with some limitations for specific ratings. We now want to calculate how similar the players are, so that players with similar ratings end up closer together in the final plot. A similarity can also be considered as an inverse distance, where a high similarity corresponds to a small distance and vice versa. As shown above, the Manhattan distance is to be used when we have discrete variables, so this calculation can be done with the following example data:

[Image: 7o52FlO.png]

The last row contains the absolute differences between the two players, i.e. how many “Manhattan intersections” lie between them for each rating. If we tally that row of differences we get the total distance (or inverse similarity) between the two players: 39.

This calculation is then done for every pair of players to produce a distance matrix that contains all pairwise distances between the players in the data. The diagonal of the matrix will have a distance of 0, as each of those cells compares a player with themselves.

[Image: 7mg79j0.png]
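
To illustrate how such a matrix can be built, here is a small sketch in Python. The player names and ratings are made up for the example (they are not the values from the images above), and only four ratings per player are used to keep it short.

```python
# Hypothetical players with made-up ratings (integers between 5 and 20)
ratings = {
    "Player A": [15, 12, 9, 17],
    "Player B": [13, 14, 9, 11],
    "Player C": [6, 8, 18, 10],
}


def manhattan_distance(p, q):
    """Sum of the absolute differences in each rating."""
    return sum(abs(a - b) for a, b in zip(p, q))


names = list(ratings)

# One row and one column per player, every pair compared once in each direction
matrix = [
    [manhattan_distance(ratings[a], ratings[b]) for b in names]
    for a in names
]

for name, row in zip(names, matrix):
    print(name, row)
# Player A [0, 10, 29]
# Player B [10, 0, 23]
# Player C [29, 23, 0]
```

The matrix is symmetric, since the distance from Player A to Player B is the same as from Player B to Player A.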

Reducing the dimensions
After the distances between the players have been calculated, we know the relationship we want to visualize and interpret. Multidimensional scaling tries to find a set of points in k dimensions whose distances to one another represent, as closely as possible, the relationships seen in the n (number of attributes) dimensions. We don't need to get into the details of how this is performed, but the algorithm usually performs some form of optimization to reduce the error between the observed distance matrix and the distances between the points in the new, projected dimensions.
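
For the curious, the optimization idea can be sketched in plain Python. This is not the implementation used on the website, only one simple variant: it starts from random 2-dimensional points and uses gradient descent to shrink the squared error between the target distances and the Euclidean distances in the plot. The function names, the step size `rate`, the number of `steps` and the made-up 3-player distance matrix are all assumptions for illustration.

```python
import math
import random


def pairwise_euclidean(points):
    """Distances between all pairs of plotted points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(target, points):
    """Sum of squared errors between target and plotted distances."""
    current = pairwise_euclidean(points)
    n = len(points)
    return sum(
        (current[i][j] - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def mds(target, k=2, steps=2000, rate=0.01, seed=1):
    """Move random starting points until their distances resemble the target."""
    rng = random.Random(seed)
    n = len(target)
    points = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        current = pairwise_euclidean(points)
        grads = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or current[i][j] == 0:
                    continue
                # Derivative of (current - target)^2 with respect to point i
                scale = 2 * (current[i][j] - target[i][j]) / current[i][j]
                for d in range(k):
                    grads[i][d] += scale * (points[i][d] - points[j][d])
        # Take a small step against the gradient for every point
        for i in range(n):
            for d in range(k):
                points[i][d] -= rate * grads[i][d]
    return points


# Made-up distance matrix for three hypothetical players
target = [[0, 10, 29], [10, 0, 23], [29, 23, 0]]

start = mds(target, steps=0)  # the random starting positions
final = mds(target)           # positions after the optimization

print("error before:", stress(target, start))
print("error after: ", stress(target, final))
```

Because only distances are used, the resulting plot can come out rotated or mirrored between runs with different starting points without the error changing.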

A classic example of this is how, given only the direct distances between different US cities, the method can produce a map that somewhat corresponds to the real world, without being given any coordinates or similar information.

[Image: KedrBFF.png]

The dimensions of the map do not exactly correspond to longitude and latitude, but they are somewhat representative of those measures. The map produced isn't perfect: for instance it is upside down, and the cities located in the corners of the map might not directly correspond to their geographic location. However, the map was produced with only the calculated direct distances between the different cities and nothing else, which shows the value of the method.

In the case of the map, the two dimensions can be interpreted relatively easily, as a representation of the geographical coordinates. However, in the case of the players, the reduced dimensions are not as easily interpreted as something that relates to the data. This makes it difficult to determine what distinguishes players in one area of the plot from players in other areas, but the visual representation still shows how similar players in the league are to one another.

[Image: LXBtAhg.png]

Code:
Word Count: 1045

SlashACM · 11-24-2020, 04:56 PM

Cool article, but nikolaj muller? Hmm

***hotdog*** · 11-24-2020, 05:10 PM

this is sick

dankoa · 11-24-2020, 05:13 PM

yo this is cool as shit, unreal work

***Canadice*** · 11-25-2020, 04:41 PM

Thank you for the kind words. The website is currently being updated with more interactive features in the visualizations, and I hope to show some more individual data through the plots by the end of the weekend.

PremierBromanov · 11-25-2020, 04:45 PM

dank

Rotti · 11-25-2020, 04:54 PM

I like your funny words magic man

nyumbayangu · 11-25-2020, 04:59 PM

This is some wonderful stuff that we teach infants on Sardang. Well done!
