A decade after the Fancy Stat Era began (at least to judge by where most databases start), hockey analytics are officially mainstream. While some players, managers, and the Edmonton media may try to fight it, there is a growing acceptance that shot attempts and other data are relevant information.
In fact, we’ve begun to build past the initial stage of Corsi, and we’re seeing more sophisticated analysis around concepts like expected goals, which try to account for both the quality and the quantity of shots. It’s an exciting time, and sites like Sportsnet have sought to cater to this new market.
Unfortunately, this has led to a problem.
Sportlogiq And Its Stats
The analytics company Sportlogiq, according to its very sleek website, “offers the most comprehensive hockey data in the world.” By their own account, the good people of Sportlogiq track over 2,500 events per NHL game. For a community of sports nerds who are mostly used to settling for shot attempts and location, filtered through sometimes-crashing databases, this sounds like manna from Heaven.
Of course, said nerds mostly aren’t going to get that data, because Sportlogiq is a business, and they quite reasonably want to make money. According to a June 2016 piece, “nearly half” the teams in the NHL were receiving information from Sportlogiq at that time. Sportlogiq is secretive about its client list, and according to that article, many of its employees don’t even know for whom their company is working. This is, if a bit cloak-and-dagger, understandable in a company trying to exploit an information disparity on multiple sides of a competitive environment.
If Sportlogiq went about their business selling whatever it is they’re selling to whoever wishes to buy it, that would be that. But that’s not quite the end of it.
Elegant Graphs For Mysterious Data
Andrew Berkshire, formerly of SBN’s Montreal Canadiens site, Habs Eyes On The Prize, is (as per his LinkedIn) Managing Editor and Social Media Coordinator at Sportlogiq. While for a time in 2015 he wrote pieces on the company’s website, he now seems to do most of his writing over at Sportsnet.ca, although he occasionally also writes similar pieces for The Sporting News. In his pieces, Berkshire brings a taste of Sportlogiq’s specific event-tracking to the larger public, with some pretty sleek-looking graphs.
For example, he recently took a look at Leaf defender Morgan Rielly, and the year he’s having. The article prominently featured the following graph:
Rielly seems to have improved as an offensive defender, as per this graph. That sounds plausible to us; Rielly has looked better this season, and some of his data seems to bear that out. That said, we have a question.
How is a scoring chance defined?
We all think we know what a scoring chance is, but efforts to settle on a unified definition have failed repeatedly because no one can agree on a firm standard. The same goes for the graph’s other categories. What qualifies as an outlet pass? I have an idea, but what are the limits, and is it distinct from a stretch pass, or are stretch passes contained within the category?
More importantly, do any of these things correlate with winning hockey games? How strongly? How much should we care? It’s already been shown that SCF% isn’t as good at prediction as good ol’ CF%, because it has many of the same conceptual issues (largely binning) and a necessarily smaller sample size to boot. New isn’t always better.
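To make the sample-size point concrete, here is a minimal Python sketch that treats each on-ice event as an independent coin flip (a simplifying assumption, not necessarily how anyone’s model works) and computes the binomial standard error of a share like CF% or SCF%. The function name and event counts are hypothetical, chosen only to show that a stat built on a subset of shot attempts carries a wider margin of error for the same player.

```python
import math

def share_standard_error(share: float, n_events: int) -> float:
    """Binomial standard error of a share (e.g. CF% or SCF%) estimated from n_events.

    Assumes every event is an independent trial, which is a simplification.
    """
    return math.sqrt(share * (1.0 - share) / n_events)

# Hypothetical season totals for one player: scoring chances are a subset of
# shot attempts, so there are necessarily fewer of them to work with.
shot_attempts = 1000   # hypothetical on-ice shot attempts (for + against)
scoring_chances = 400  # hypothetical on-ice scoring chances (for + against)

for label, n in [("CF%", shot_attempts), ("SCF%", scoring_chances)]:
    se = share_standard_error(0.52, n)
    # 1.96 standard errors gives an approximate 95% interval.
    print(f"{label}: 52.0% +/- {1.96 * se * 100:.1f} points")
```

With these made-up totals, the SCF% interval is about sqrt(1000 / 400), or roughly 1.58 times, as wide as the CF% interval. The sketch only illustrates the sample-size half of the argument; it says nothing about binning.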
This is the key point that makes it very hard to dig into any of this work. We have no idea whatsoever what any of these terms mean. We can absolutely believe that the people at Sportlogiq have concrete definitions for these metrics, that they’re tracked accurately, and that they’ve done work internally to establish that they are repeatable skills and are associated with another skill of importance (for example, zone entries explain a notable portion of a player’s offensive shot results). But they haven’t shown the audience that, and the result is that a reader has no idea what exactly these metrics mean, and how much they matter.
Related to that: context is not established. Rielly has increased his outlet passes this year. Where does he stand in the league? Has he gone from bad to average? Average to good? Good to great? Who knows?
Another issue is that there appears to be no accounting for variance, in other words the normal ups and downs that happen over the course of small samples. When do these metrics stabilize enough for us to be comfortable saying that a departure from a previously established average represents a true shift in the underlying play, rather than variance? By no means is this issue limited to Sportlogiq, but without knowing more about the distribution of these statistics, it’s hard to tell whether they describe a change in what Rielly happened to do or a change in the player Rielly is.
In this piece, Berkshire points out that Rielly’s carry-outs have decreased and his offensive zone passes have increased. But when we look at the graph, we see that these two figures have barely changed year over year! It’s not at all clear that these changes reflect anything besides statistical noise, or that they represent a meaningful difference in the way Rielly has played.
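One rough way to ask whether a year-over-year change like this is bigger than noise is to treat each season’s event count as Poisson distributed and compute a z-score for the difference in per-game rates. This is a minimal Python sketch under that assumption; the function name and all the numbers are hypothetical, since Sportlogiq’s underlying counts aren’t public, and real event data may be more variable than a Poisson model allows. It also doesn’t answer the stabilization question, which would need multiple seasons of data to estimate how repeatable a stat is.

```python
import math

def rate_change_z(count_a: int, games_a: int, count_b: int, games_b: int) -> float:
    """z-score for the change in a per-game rate from season A to season B.

    Assumes each season's event count is Poisson distributed, so the
    variance of (count / games) is estimated as count / games**2.
    """
    rate_a = count_a / games_a
    rate_b = count_b / games_b
    se = math.sqrt(count_a / games_a**2 + count_b / games_b**2)
    return (rate_b - rate_a) / se

# Hypothetical numbers, not Rielly's actual totals: a stat drops from
# 2.0 per game over 82 games to 1.8 per game over 50 games.
z = rate_change_z(count_a=164, games_a=82, count_b=90, games_b=50)
print(f"z = {z:.2f}")
```

Here the rate drops by 10%, but the z-score comes out around -0.8, well inside the roughly ±2 range that ordinary noise produces, so a change of that size would tell us very little on its own.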
Maybe you think this is nitpicking about believable-enough results; after all, Rielly’s CF% has also improved this year, and he’s produced points. But that leads to another issue: is there a specific reason we’re seeing these data points and not others? Are there some that don’t fit the narrative and were left out of the graph? We have no idea, and Sportlogiq certainly isn’t going to tell us. One of us (Fulemin) has asked before, for example in response to this tweet:
Possession-Driving Plays per game: @MapleLeafs are 28th in the NHL (183.0). @NYIslanders are 3rd in the NHL (209.3).
— SPORTLOGiQ (@SPORTLOGiQ) February 6, 2017
NHL median is 197.2
At the time of that tweet, the Leafs were 9th in the league in adjusted CF%. We asked what constituted a “possession-driving play” in the interests of trying to understand why there was such a disparity between Sportlogiq’s stat and the stat most commonly used as a proxy for possession. Sportlogiq’s Twitter person didn’t reply, presumably for the reasons already discussed.
What Is This Data For?
This should not be taken as a slight on Sportlogiq’s data. Not in the slightest; the whole point is we have no way of knowing whether Sportlogiq’s analysis is good or bad. It’s quite possible that Sportlogiq really is the best hockey analytics company in the world right now and that the work they’re doing is in another stratosphere from the work publicly available. There are private analytics companies who, in our opinion, are selling data that seems flawed on its face. That isn’t our issue here.
But for a hockey fan trying to be as educated as possible about analytics, and to stay up to date, the question is what to do with all those cute graphs. At present, there’s no reason to rely on them other than that an apparently smart and well-funded group of people put them together. As for why this particular information is made public out of the reams of data the company reportedly tracks, the answer is that the published work is corporate advertising. Again, that doesn’t mean it’s wrong. But that is what it’s for.
The temptation is to run with the impressive and detailed data that is made public, because it’s presented as the best in the world. Without any evidence of what it means, or what it’s worth, though, we have to be cautious about relying on it. Ultimately, from the standpoint of people like us, about all you can do with those nice Sportsnet stats is shrug, say “that’s interesting” and read something else.