Advanced Stats Demystified

I can’t stand unsupported arguments. I’m a pretty logical guy, but when someone makes an argument based on their own opinion, fairy dust and unicorns, I flip my lid. You’ll notice a trend here at the Seahawks Asylum, that I’m crazy (ha! ha! these puns get me every time) about supporting arguments. With real data.

AdvancedNFLStats is a kick ass website. For about ten years, they’ve been translating advanced stats used in baseball (commonly referred to as Sabermetrics) to football. These metrics are complex, deep and most importantly: contextual. Context is exactly what current NFL metrics lack. For a great example, read my post about legendary Batman villain Two Face, played by Seneca Wallace.

Below are some of the metrics that I’ll commonly refer to within my posts. But don’t worry about having to come to this page every time I quote; I’ll always summarize what the stats mean in the context of the post.

I know this looks like a lot of text, but I guarantee you that it’s easy to follow and simple.

Expected Points (EP) and Expected Points Added (EPA)

These two put a numerical value on the likelihood of affecting the score of the game. And these values are tied to real football scoring. If you see 7.0 for example, that’s 7 real scored points.

Expected Points (EP). If the Seahawks are pinned on their own 1 yard line, how many points do you expect them to score on the very next play? Not many. Hell, zero. EP is that value, based on historical data. AdvancedNFLStats uses this historical data and looks at ALL plays that have started on the 1 yard line and what they led to on the very next play. We don’t see it often, but we’ve all seen a team score from their 1 yard line on the next play. EP is the numerical value, in points likely to be scored, on the very next play. Since teams HAVE scored from their 1, but VERY rarely, the EP value isn’t zero points. We’ll call the EP for that drive 0.10 at that moment in time.

Now, put the Seahawks on the opponent’s 1 yard line and it’s the opposite. The Seahawks are damned likely to score a TD, aren’t they? The EP for that situation would be something like 6.9 (because again, it’s a weighted average of historical plays on the opponent’s 1 yard line; not every 1 yard goal line offense has resulted in a TD, but most have). Make sense?
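If it helps to see the “weighted average of historical plays” idea in code, here’s a minimal Python sketch. Big caveat: the play counts are invented purely so the averages land on the 0.10 and 6.9 I used above. They are not real AdvancedNFLStats data, and expected_points is just a name I made up for the example.

```python
from statistics import mean

# Invented toy history (NOT real NFL data): points that followed past plays
# starting from each spot. 70 plays per spot, chosen to match the post's numbers.
history = {
    "own 1 yard line": [7] * 1 + [0] * 69,       # one score in 70 tries
    "opponent's 1 yard line": [7] * 69 + [0] * 1,  # 69 scores in 70 tries
}

def expected_points(outcomes):
    """Average the points seen in past plays from the same situation."""
    return mean(outcomes)

for spot, outcomes in history.items():
    print(f"{spot}: EP = {expected_points(outcomes):.2f}")
```

The takeaway is that EP is nothing fancier than an average over what has actually happened from that spot before.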

Therefore, Expected Points Added (EPA) is the contribution of one play to the EP. If Marshawn Lynch rushes for 30 yards from midfield, he just put the Seahawks in position to score, right? So if midfield has an EP of 2.0 (outside realistic range of a field goal, thus being less than 3.0, the value of a field goal) and the opponent’s 20 yard line has an EP of 3.5, (because they’re in range to score at least a field goal, but also realistic range for a TD) the EPA was +1.5, right? Damned fine running there, Marshawn.

But if instead he’d run for a 5 yard loss, putting the Seahawks on their own 45 yard line and thus hurting the team’s chance to score, the EPA would be negative, something like -0.2, bringing the Expected Points from 2.0 to 1.8.
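The arithmetic behind both Marshawn examples is just “EP after the play minus EP before the play.” Here’s a tiny Python sketch using the made-up EP values from the two paragraphs above (2.0 at midfield, 3.5 at the opponent’s 20, 1.8 at the Seahawks’ own 45). The function name epa is my own, not anything from AdvancedNFLStats.

```python
def epa(ep_before, ep_after):
    """Expected Points Added: how much one play changed the drive's EP."""
    return ep_after - ep_before

# Illustrative EP values from the post, not real model output.
EP_MIDFIELD = 2.0
EP_OPP_20 = 3.5
EP_OWN_45 = 1.8

print(round(epa(EP_MIDFIELD, EP_OPP_20), 2))  # 30 yard gain
print(round(epa(EP_MIDFIELD, EP_OWN_45), 2))  # 5 yard loss
```

A positive number means the play helped the offense, a negative number means it hurt.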

Visit AdvancedNFLStats to read a full explanation of Expected Points Added (scroll down the page).

Win Probability (WP) and Win Probability Added (WPA)

Win Probability (WP) is very similar to Expected Points, except instead of assigning an amount of points to a game situation, it determines how likely a team is to win (or lose) in that very same situation. To determine this, AdvancedNFLStats has put together a Win Probability model that takes into account not only down and distance, but also other contextual details: score of the game, time left on the clock, etc., based on historical data.

Win Probability uses percentages. A WP of 99% (or .99) means a team has a 99% chance of winning the game (a WP this high likely means they’re leading, they have the ball, it’s 1st and 10 and there are 4 seconds on the clock; teams in this position in the past have won 99% of the time).

Or, a game tied at halftime would be something like .48/.52 (it’s not fifty fifty because the team receiving the ball after halftime will have a slight edge).

So, similar to EPA, Win Probability Added (WPA) determines a play’s likely effect on the outcome of the game. Let’s layer some context to Marshawn’s 30 yard rush, from above: the game is tied, it’s the 4th quarter, Seahawks have possession on the 50 yard line, it’s 1st and 10 and only 1 minute remains. Seahawks have 3 downs and don’t need a lot of yards to get within field goal range, so we’re gonna say they’re pretty highly likely to win, right? Let’s call their Win Probability .75 (or 75% for those of you skimming, not reading).

Marshawn’s run puts the Seahawks at the 20 yard line, thus in easy range for a field goal. That run elevated the Seahawks’ WP to something like .85 (because how many kickers miss 37 yard field goals?), thus meaning his rush was worth .10 WPA (that’s a TON for a single play). Seahawks, while on the 20 yard line, then kneel on the ball twice (each kneel-down is likely worth .02 WPA, not necessarily because it means they’re more likely to score, but because the opponent is less likely to score with less and less time remaining) and then the kicker nails the field goal with only seconds remaining to finish the game and put WP up to 1.00 (technically when the clock expired).
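Here’s that whole sequence as a quick Python sketch. The WP values are the made-up ones from this example (.75, then .85 after the run, plus .02 for each kneel-down, then 1.00 at the end); the .11 for the field goal is simply what’s left over, so treat it as my own arithmetic rather than a model output. wpa_per_play is a hypothetical helper name.

```python
# WP before the run, then after each of the four plays (illustrative values).
wp_states = [0.75, 0.85, 0.87, 0.89, 1.00]
plays = ["30 yard run", "kneel-down 1", "kneel-down 2", "field goal"]

def wpa_per_play(wp):
    """WPA for each play: WP after the play minus WP before it."""
    return [round(after - before, 2) for before, after in zip(wp, wp[1:])]

for play, wpa in zip(plays, wpa_per_play(wp_states)):
    print(f"{play}: {wpa:+.2f} WPA")

# The per-play values add up to the total swing from first snap to final whistle.
total = round(wp_states[-1] - wp_states[0], 2)
print(f"total: {total:+.2f} WPA")
```

Notice that the individual WPAs always sum to the overall change in WP, which is why you can add them up across a drive, a game or a season.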

Visit AdvancedNFLStats to read a full explanation of Win Probability Added (scroll down the page).

+EPA/+WPA

+WPA and +EPA are a bit sticky. These numbers attempt to measure a DEFENSIVE player’s contributions to Expected Points and Win Probability. Now, understand that it is much more difficult to put a number on a defensive play, because it’s often unclear who deserves the credit or the blame. If a pass is completed, on which defensive player do you lay the blame? The cornerback playing zone that didn’t make the play? The safety that didn’t come to support? Or was it the lack of pass rush?

Instead, +EPA and +WPA attempt to put numerical values on things that CAN be measured that prevent the opposing team from scoring: tackles, tackles for a loss, interceptions, sacks, passes defensed, etc. Think of it like a player’s actual salary: we know how much he’s earning (+WPA), but we don’t know how much he’s spending (-WPA). Now here’s where it gets tricky, so follow closely.

Now since there are a set number of snaps within a game, we can extrapolate negative contributions, right? If a player made six positive plays out of ten snaps in a game, we’ll assume the other four plays were neutral or negative. His overall contribution was positive, right? Now you might be shaking your head thinking this stat is incomplete, and is assuming too much. But again, think of the bell curve. With enough data, it will complete itself.
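To picture the “six positive plays out of ten snaps” idea, here’s a rough Python sketch. Everything in it is hypothetical: the per-play values are invented, they’re written from the defense’s point of view, and plus_epa is just a name I picked. The only point is that the stat adds up the plays a defender gets credited for and stays silent on the rest.

```python
# Ten invented snaps for one defender. A number is the value of a play he was
# credited on (tackle, sack, interception, ...); None means the stat sheet
# credits him with nothing on that snap, so we can't measure it.
snaps = [0.4, None, 0.9, None, 0.2, 1.5, None, 0.3, None, 0.6]

def plus_epa(snaps):
    """Sum only the credited, positive plays and count how many there were."""
    credited = [value for value in snaps if value is not None and value > 0]
    return round(sum(credited), 2), len(credited)

total, positive_plays = plus_epa(snaps)
print(f"+EPA: {total} on {positive_plays} of {len(snaps)} snaps")
```

The four None snaps are the “spending” we can’t see, which is exactly why this stat needs a lot of snaps before it tells you much.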

If you look at the 2009 season +WPA/+EPA stats for defensive players, you’ll see guys like Darrelle Revis ranked #1 among cornerbacks, Jonathan Vilma ranked #1 among linebackers, Jared Allen #3 among defensive ends, etc. Enough data points (both number of players and number of snaps) will build an eventual bell curve that is accurate and robust.

So, if a cornerback picks a pass and returns it for a touchdown with a successful PAT, what is his +EPA? This is an easy one, because it led to actual points: 7.0! If Chris Clemons sacks Alex Smith for a loss of 10 yards, what’s his +EPA? If you answered “it depends” you win! The value will change based on down and distance from the goal. If it was fourth down, and the 49ers turned the ball over with one minute to go in the game, with the Hawks up, the Win Probability (WP) probably went from something like .85 to .95, awarding Clemons a +WPA of 0.10. Huge!

Visit AdvancedNFLStats to read a full explanation of +WPA/+EPA (scroll down the page).


11 Responses to Advanced Stats Demystified

  1. Pingback: New Pages for Your Perusal | The Seahawks Asylum

  2. Jesse Gunderson says:

    I just hate NFL stats and a few of these advanced stats because at the end of the day, much of it is actually WORSE than worthless. Almost anything to do with team based historical stats is actually worse than worthless, the conclusions drawn or parallels made are straight wrong. Even player historical data can’t accurately be used to imply future results, the best you can do with it is a basis to gauge how good a player was previously.

    An example:
    If player Y historically averages 5 yards per carry over 5 games on team X playing with a consistent set of players, you could somewhat reasonably expect this stat to be based on that player’s skill and less on the variable of players on his own team as well as the opposing team’s defense. With this limited set of variables it’s not crazy to make a statement that you expect this rate to continue for production with all other variables being the same (which they aren’t).

    If Team X historically averages scoring 80% of the time from the OPP 30 yard line over the last 5 years, this stat is almost completely worthless. In that 5 year historical span a huge amount of variables has been entered to the point where basically the only constant between year 1 and year 5 is the name of the team.

    So many stats are stated when watching an NFL game about the history of a team and a team’s success in doing X in a similar situation over the lifetime of a team’s performance data. It’s useless. It’s pointless. People eat it up and get all hyped up about ‘Our team has beaten team x 85% of the time over 30 years’ as if what happened in 1978 – 1989 has any bearing on the game currently being played. Past performance cannot predict future results, and all of these historical stats are basically utter bullshit for doing anything other than talking about who was better and when.

    • Nick says:

      Wow. Next time, don’t hold back. I’d like to know what you REALLY think 😉

      First, I mostly use statistics to support statements and analysis. Not necessarily to predict future outcomes. If I say “the defensive line has been playing well” – I will support with statistics that prove it.

      Second, I use Advanced Statistics because they do a better job quantifying performance based upon the result of the play AND important contextual information, as explained above. Context is important, and most NFL metrics lack context.

      Third, if I’m attempting to predict an outcome, I will obviously use statistics. What are we supposed to do? Throw our hands up because they ‘only represent the past’? If the Seahawks, over the span of 10 games this season, have absolutely stymied opponents’ running backs, I’m going to apply that data to my prediction of the coming week’s opponent’s likelihood of succeeding running the ball. Pretty simple.

      Fourth, football is a fairly stable, predictable sport. It does not evolve over the course of weeks or even seasons, but history has shown it takes decades. Thus using historical data is quite smart and effective. If the data shows that 1% of offenses have scored on a drive in which a team has started on their own 1 yard line, you can pretty safely say that the Seahawks will have a 1% chance of scoring if put in that position this Sunday. You obviously disagree, but hey, that is your right =)

  3. Pingback: What Rhymes with “Live and Shoe”? | The Seahawks Asylum

  4. Pingback: What Rhymes with “Live and Shoe”? | The Seahawks Asylum

  5. Ian says:

    I agree on Jesse’s point about the “we’ve beaten them X times in the last 30 years”. Those kinds of stats are pointless because they do only focus on the team name.

    On the other hand, if I know that home teams with a 7 point lead and possession late in the game win on average Y% of the time then that is a useful stat because although it doesn’t apply directly to my team, I can use it as a benchmark and adjust in my head for any other factors I think apply to the situation.

    As for past data not predicting future, I’ll take a simple case. From 2002-2007, home teams won 884, lost 651 and tied 1 of all regular season games. This means home teams win approx 57.6% of the time (the 95% CI would say they win between 55-60% of the time).

    There have been 602 games played from 2008-so far in 2010. Of those, 343 were won by the home team and 1 was tied, for a WP of 57.1%. If I used my 57.6% as a guess, I’d have gone for 347 home wins. So using only past results, and no insider knowledge about the teams other than where they were playing, I was able to predict to within 3.5 games the total number of home wins. Could you have got that close without the stats? Obviously venue is only one variable, but as you build up your model to take into account more and more game situations, you start to get a pretty accurate picture of how likely various outcomes are.

    • Nick says:

      Ian, thanks for commenting. Very well put and that’s a very cool (and very telling) statistic. Thanks for coming by – I hope to see you commenting more!

  6. Pingback: On Seahawks Roster Holes, Gaping or Otherwise | The Seahawks Asylum

  7. Pingback: Oakland’s Biggest Strength is the Seattle Seahawks’ Biggest Weakness | The Seahawks Asylum

  8. Pingback: Key to Beating the Oakland Raiders: Score More Points | The Seahawks Asylum

  9. Pingback: Mo-Data. Data made EasyModeling the Probability of Winning an NFL Game #sportsanalytics #bigdata - Mo-Data. Data made Easy
