If you clicked through to read this post, you’ve probably visited the ABS Challenge Leaderboard on Baseball Savant at some point this season. While you were there, you may have sorted by Won% to see which players have been the most successful with their challenges. And if you, like me, are a bit of a hater, you also reverse sorted to see which players are now considered a fire hazard because of how rapidly they burn through challenges. In that case, you know that James Wood has won just 20% of his 15 challenges, that Josh Naylor owns a 25% success rate on 12 challenges, and that Jazz Chisholm Jr. has a 27% hit rate on his 15 challenges. Players this bad at picking their spots probably shouldn’t be allowed to challenge at all, right? Well, the truth is, those samples probably aren’t large enough to definitively signal an inability to consistently win ABS challenges. Or maybe they are large enough, but it’s tough to say for sure because the ABS challenge system hasn’t been in place long enough to generate the volume of data needed to determine an appropriate sample size.
But even if there were absolute certainty about which players lack the eye for challenging ball/strike calls, sitting a player down and telling him he’s not allowed to challenge anymore because he sucks at it isn’t exactly the best strategy. It runs the risk of damaging the relationship between the player and the team and it shuts down the opportunity to improve with additional reps. And let’s say that player is in the box for a pitch that absolutely should be challenged — given the short window to challenge following a call, a batter paralyzed by self-doubt or concern over potential reprimand is set up to fail. It’s also much easier to communicate and get buy-in on a single, team-wide philosophy than it is to devise a bunch of player-specific exceptions to the rule.
The good news is that there’s a straightforward method for eliminating many of the most infuriating failed challenges, a method independent of any given player’s ability to judge whether a pitch was in the zone. Because there’s more to challenging than assessing whether a ball/strike call is correct and then assigning a level of certainty to that assessment. If you’ve ever watched a batter on your favorite team spend a challenge on an 0-0 count in the first inning, you know that it’s vexing on multiple levels. Even a successful challenge in that scenario doesn’t offer a significant swing in advantage, since it’s just flipping an 0-1 count to a 1-0 count (a swing in run expectancy of about a tenth of a run, depending on the base-out state). And to make things even more maddening, it also tightens the calculus around future challenges, since an additional failed challenge risks leaving the team unable to act on a potential missed call in a late-and-close situation.
Every failed challenge stems from a fundamental skill issue in reading the location of the pitch and judging the likelihood of an overturned call, but in some cases, the pain of failure is compounded by a lack of situational awareness. Before the ball leaves the pitcher’s hand, players should consider whether even a successful challenge is likely to have a meaningful impact given the current context of the game. Implementing an overarching strategy around which pitches are worth challenging from a situational perspective would limit the likelihood of a failed challenge in the early innings having a detrimental effect later on. Instead, those unsuccessful challenges would be concentrated in the high risk/high reward scenarios where failure is more acceptable because of the increased benefit associated with success.
But before we can determine which pitches merit the use of a challenge, we need to know the situations where a flipped ball/strike call is the most likely to influence the outcome of the game. Fortunately, we have a few ways to measure the magnitude of the impact of a successful challenge, such as Leverage Index, Win Probability Added, and run expectancy. When Ben Clemens wrote up his initial takeaways on the challenge system at the start of April, he advocated for using RE288 (the version of run expectancy that includes the pre-pitch count in its calculation) when evaluating the optimal usage of a team’s allotted challenges. I mostly agree with the logic he presented to justify that decision, but I do think there’s one component of Leverage Index that shouldn’t be completely cast aside.
RE288 does exactly what its name suggests, which is to use historical averages to estimate the number of runs expected to score following each of the 288 distinct combinations of count, outs, and runners on base. Leverage Index works similarly, except its unit of output isn’t runs but rather a rating of each situation’s importance relative to the outcome of the game. Leverage Index can be adapted to include the current count, but the standard version is calculated based on outs, runners on base, inning, and score differential, and it’s score differential and inning that distinguish Leverage Index from RE288.
Score differential we can ignore. Ben made a solid argument for discarding the score as a variable in the context of ABS challenges, noting that challenges are a finite resource within a single game. They don’t roll over from one game to the next, so there’s no benefit to sitting on them even if the game is a blowout.
Inning, on the other hand, is still worth keeping an eye on. Since teams are allotted only two incorrect challenges, the amount of game still left to play is a relevant component of the risk calculus around challenging a call. The primary consequence of a failed challenge is the potential inability to challenge at an important moment later in the game, but the likelihood of being caught without a challenge diminishes as the game progresses.
I’ll get to how exactly I want to incorporate inning into challenge strategy in a bit, but first let’s return to the notion of setting up situational guardrails to ensure challenges are consistently used on the pitches that matter most. We can isolate those pitches by using RE288 to compare the expected run scoring if the umpire’s call stands against the expected run scoring if the call is overturned.
As an example, consider the challenge initiated by Orioles catcher Samuel Basallo in the top of the ninth inning of a road game against the Mariners last Thursday night. Batting with two outs, a runner on second, and one team challenge remaining, Basallo tapped his helmet following a called strike in a 3-0 count. A called strike in that situation has an RE288 of 0.39, whereas a ball and the resulting walk would have increased the RE288 for the remainder of the inning to 0.44, an improvement of 0.05 runs (a relatively modest shift within the scope of possible increases in run expectancy). The call was confirmed, leaving the Orioles without challenges for the remainder of the game. But given that there was just one out remaining and Baltimore trailed by three runs, the Orioles were unlikely to see another borderline call. Then again, this game featured nine total ABS challenges and seven overturned calls, so another missed call by home plate umpire Tyler Jones wasn’t completely out of the question.
With a framework for estimating the potential run-scoring benefit associated with challenging any given pitch in hand, we can now bucket pitches based on the magnitude of the change in RE288. In his piece, Ben referred to the bump in expected runs scored as run leverage, so for the sake of consistency and because I find the term pretty apt, I’ll do the same here. I also used a similar methodology to his when defining the upper and lower bounds for the run leverage buckets. The low-leverage bucket encompasses all changes in RE288 less than 0.13, the medium-leverage bucket contains the values from 0.13 to 0.33 (inclusive), and the high-leverage bucket holds the pitches with a change in run expectancy over 0.33.
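The comparison described above boils down to a subtraction followed by a lookup against the bucket thresholds. Below is a minimal Python sketch, assuming the two RE288 values (call stands vs. call overturned) are already in hand; `run_leverage` and `leverage_bucket` are hypothetical helper names, not part of Baseball Savant or any published tool. It reuses the RE288 values from the Basallo example.

```python
def run_leverage(re_if_stands: float, re_if_overturned: float) -> float:
    """Magnitude of the RE288 change if the call is flipped.

    abs() puts catcher challenges (which lower run expectancy) and batter
    challenges (which raise it) on the same scale. Rounding to two decimals
    is an assumption: it matches the precision of the RE288 values quoted
    in the text and keeps float noise away from the bucket boundaries.
    """
    return round(abs(re_if_overturned - re_if_stands), 2)


def leverage_bucket(leverage: float) -> str:
    """Low: under 0.13; medium: 0.13 to 0.33 inclusive; high: over 0.33."""
    if leverage < 0.13:
        return "low"
    if leverage <= 0.33:
        return "medium"
    return "high"


# Basallo's 3-0 pitch: 0.39 runs if the called strike stands, 0.44 if it's ball four.
basallo = run_leverage(0.39, 0.44)
print(basallo, leverage_bucket(basallo))  # 0.05 low
```

By this sketch's thresholds, Basallo's challenge falls in the low bucket, consistent with the text's description of it as a modest shift.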
The table below shows the distribution of pitches thrown in low, medium, and high run leverage situations this season, the distribution of pitches challenged in each of those situations, and the challenge and success rates for each classification:
Challenges By Run Leverage
| Run Leverage | Opp. | Challenges | Challenge Rate | Success Rate | % of Opp. | % of Challenges | RV Per Challenge | RV Per Overturn |
|---|---|---|---|---|---|---|---|---|
| Low | 111,240 | 2,579 | 2.3% | 56.8% | 65.7% | 54.4% | 0.04 | 0.07 |
| Med | 42,674 | 1,449 | 3.4% | 50.7% | 25.2% | 30.6% | 0.11 | 0.21 |
| High | 15,012 | 710 | 4.6% | 45.9% | 9.1% | 15.0% | 0.23 | 0.49 |
Data Through June 20
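The last two columns of the table appear to be linked: the numbers line up with RV per challenge being the success rate times RV per overturn, which is what you would expect if both columns divide the same total run value gained, one by challenges and the other by overturns. The Python sketch below checks that reading and recomputes each bucket’s share of challenges from the table’s own counts. The identity is an inference from the figures, not a definition stated by the source.

```python
# Counts and rates copied from the "Challenges By Run Leverage" table
# (data through June 20). Each tuple: (challenges, success rate, RV per overturn).
rows = {
    "Low": (2579, 0.568, 0.07),
    "Med": (1449, 0.507, 0.21),
    "High": (710, 0.459, 0.49),
}


def summarize(rows):
    """Return {bucket: (share of all challenges, implied RV per challenge)}."""
    total = sum(challenges for challenges, _, _ in rows.values())
    summary = {}
    for bucket, (challenges, success_rate, rv_per_overturn) in rows.items():
        # Only overturned calls generate value in this accounting, so the
        # per-challenge figure is the per-overturn figure scaled by success rate.
        summary[bucket] = (challenges / total, success_rate * rv_per_overturn)
    return summary


for bucket, (share, rv) in summarize(rows).items():
    print(f"{bucket}: {share:.1%} of challenges, ~{rv:.2f} RV per challenge")
```

The table’s inputs are already rounded, so the reconstructed RV per challenge can miss by about 0.01. The high-leverage row, for example, comes out near 0.22 rather than 0.23.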
Though high run leverage pitches make up less than 10% of all pitches thrown, they comprise 15% of all pitches challenged, which is to be expected given the higher payout associated with a successful challenge on those pitches. On the flip side, low run leverage pitches make up less than 55% of challenged pitches despite representing just over 65% of overall pitches. Again, this makes logical sense, given the diminished reward for successful challenges in this bucket.
When measuring across the total population of players, it seems the general strategy of challenging fewer low run leverage pitches and more high run leverage pitches is already in effect. However, zooming in reveals a wide swath of individual players who have yet to get with the program. Below is a rundown of the worst offenders, which is to say, the batters with at least five challenges on low run leverage pitches and a success rate on those challenges that sits at or below 40%:
Most Prolific and Least Successful Low Leverage Challengers
Batter challenges only, data through June 20
This pattern of behavior is most apparent among batters. Catchers who challenge low run leverage pitches at an above-average clip do so with enough success that it isn’t a glaring issue. And of course, pitchers aren’t challenging frequently enough in any situation to really play a role in this conversation.
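The cutoff for that list (at least five challenges on low run leverage pitches and a success rate at or below 40%) works as a simple filter over per-batter records. Here is a minimal Python sketch. The record layout, the sort order, and the placeholder batter names are all hypothetical and do not come from the underlying data.

```python
from dataclasses import dataclass


@dataclass
class LowLeverageRecord:
    batter: str      # placeholder name, not a real player
    challenges: int  # challenges on low run leverage pitches
    overturns: int   # successful challenges among them

    @property
    def success_rate(self) -> float:
        return self.overturns / self.challenges if self.challenges else 0.0


def worst_offenders(records, min_challenges=5, max_success=0.40):
    """Keep batters with >= min_challenges low-leverage challenges and a
    success rate <= max_success. Sorting by volume, then by success rate,
    is an assumption about how to order the list."""
    hits = [
        r for r in records
        if r.challenges >= min_challenges and r.success_rate <= max_success
    ]
    return sorted(hits, key=lambda r: (-r.challenges, r.success_rate))


records = [
    LowLeverageRecord("Batter A", challenges=8, overturns=2),
    LowLeverageRecord("Batter B", challenges=5, overturns=2),
    LowLeverageRecord("Batter C", challenges=4, overturns=0),  # one short of qualifying
    LowLeverageRecord("Batter D", challenges=6, overturns=3),
]
print([r.batter for r in worst_offenders(records)])  # ['Batter A', 'Batter B']
```

Batter C reflects the kind of case mentioned with Randy Arozarena: a 0% success rate still misses the list when the batter sits one challenge below the five-challenge minimum.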
Going back to the Savant leaderboard referenced above, nearly all of the players in the bottom 10 of Won% appear in the table above. (Only two of Matt Chapman’s 10 challenges have come in low run leverage situations, while Randy Arozarena fell one low leverage challenge shy of qualifying, though he has a 0.0% success rate in those situations.) Removing these players’ low run leverage challenges doesn’t necessarily improve their overall success rate, but it does reduce the damage done by burning challenges in minimally impactful situations.
More broadly speaking, if players stopped challenging in low run leverage situations, how much value would be lost? Could that value be made up by re-allocating lost challenges to medium and high run leverage situations? Does it make sense to cut out all low run leverage challenges, or is there a way to be more purposeful about it?
Starting with the last question, there is something to be said for making sure there are challenges available for borderline calls in high run leverage situations. But teams wouldn’t need to behave all that conservatively with their challenges to stay in the clear on that front. High run leverage pitches make up under 10% of all offerings, and fewer than 5% of all high run leverage pitches have been challenged so far this year. Based on some quick back-of-the-envelope math, that’s maybe two pitches per game. Moreover, high run leverage pitches are fairly evenly distributed throughout the game, as are missed calls by umpires, so there’s no need to play it safe in anticipation of a run on challenges in the eighth and ninth innings.
Adding another useful data point for pacing challenges, Baseball Savant classifies each taken pitch as either “reasonable” to challenge or not. Reasonable is defined as meeting at least one of the following criteria:
- The umpire’s call on the pitch was actually incorrect.
- The pitch was located within three inches of the edge of the zone and an overturned call would lead to a swing in run expectancy of at least 0.3 runs. That is, the potential benefit of getting the call overturned justifies challenging, even if the pitch only meets the broadest definition of “borderline.”
- The expected challenge rate of the pitch is at least 20%.
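Savant’s three criteria combine with a logical OR, so a pitch needs to meet only one of them. Below is a Python sketch of that rule. The function and parameter names are hypothetical (Savant does not publish this as an API), and treating “within three inches of the edge” as an absolute distance on either side of the zone boundary is an assumption.

```python
def is_reasonable_challenge(call_was_incorrect: bool,
                            inches_from_zone_edge: float,
                            re_swing_if_overturned: float,
                            expected_challenge_rate: float) -> bool:
    """Hypothetical restatement of the three criteria; any one suffices."""
    # 1. The umpire's call was actually wrong.
    if call_was_incorrect:
        return True
    # 2. Borderline location (absolute distance, assumed) plus a big enough payoff.
    if inches_from_zone_edge <= 3.0 and re_swing_if_overturned >= 0.3:
        return True
    # 3. Pitches like this one are challenged at least 20% of the time.
    return expected_challenge_rate >= 0.20
```

For example, a correct call 2.5 inches off the edge with a 0.35-run swing qualifies, while the same pitch with a 0.2-run swing and a low expected challenge rate does not.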
Through Saturday’s games, the league is averaging just over three reasonably challengeable pitches per team-game. Assuming those pitches, like missed calls, are distributed evenly throughout the game, teams wanting to make sure they have challenges available for such pitches need to either get one of their first two challenges right or pass on low run leverage challenge opportunities early in the game. And this is where the current inning becomes a useful variable to challenge strategy. Because fully opting out of all low run leverage challenges would be leaving runs on the table. There aren’t enough challenge-worthy medium and high run leverage pitches to use up a team’s challenges. Too many challenges would go unused, and not using a valuable resource is just silly.
To determine the most reasonable approach for trimming low run leverage challenges, I divided the nine regulation innings of a baseball game into three-inning chunks and measured the run value gained on successful challenges in each block of innings, broken down into high, medium and low run leverage buckets. I did not attempt to quantify value lost on unsuccessful challenges; that’s a topic for another day. Here’s what I found:
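The bucketing described above can be sketched in Python as a group-by over challenge records. Each record here is a hypothetical `(inning, bucket, overturned, run_value)` tuple, not the author’s actual data format. Extra innings are dropped because the analysis covers only the nine regulation innings. Failed challenges count toward the challenge total but add no value, reflecting the choice not to quantify losses.

```python
from collections import defaultdict


def inning_block(inning: int):
    """Map regulation innings to '1-3', '4-6', or '7-9'; extra innings return None."""
    if 1 <= inning <= 9:
        start = 3 * ((inning - 1) // 3) + 1
        return f"{start}-{start + 2}"
    return None


def run_value_by_block(challenges):
    """Tally challenges, overturns, and run value gained per (block, bucket)."""
    totals = defaultdict(lambda: {"challenges": 0, "overturns": 0, "run_value": 0.0})
    for inning, bucket, overturned, run_value in challenges:
        block = inning_block(inning)
        if block is None:
            continue  # outside the nine regulation innings
        cell = totals[(block, bucket)]
        cell["challenges"] += 1
        if overturned:  # only successful challenges add value here
            cell["overturns"] += 1
            cell["run_value"] += run_value
    return dict(totals)
```

Each cell’s success rate is `overturns / challenges`, and its per-challenge value is `run_value / challenges`.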
Challenges by Inning and Run Leverage
| Innings | 1 – 3 | 4 – 6 | 7 – 9 |
|---|---|---|---|
| # of Low-Leverage Challenges | 708 | 841 | 992 |
| Total Low Leverage Run Value | 32.7 | 36.7 | 35.2 |
| Low Leverage Success Rate | 65.4% | 59.3% | 49.0% |
| # of Med-Leverage Challenges | 392 | 470 | 563 |
| Total Med Leverage Run Value | 43.9 | 52.2 | 57.6 |
| Med Leverage Success Rate | 53.3% | 52.5% | 47.8% |
| # of High-Leverage Challenges | 196 | 226 | 264 |
| Total High Leverage Run Value | 47.6 | 51.3 | 56.7 |
| High Leverage Success Rate | 49.8% | 45.1% | 44.7% |
Data through June 20
Unsurprisingly, low run leverage challenges in the first three innings yielded the lowest total run value, and though low run leverage challenges offered slightly less value on a per-challenge basis during innings four through nine, the difference was nominal. If teams were to sacrifice low run leverage challenges during the first three innings of play, would they be able to make up that value elsewhere? Mathematically, yes. League-wide, they would regain the 245 challenges that failed in those situations. Use those in high run leverage situations instead, and that value could be recouped and then some. But that requires finding an additional 140ish high run leverage pitches worth challenging, which would be a 20% increase in total challenges for high run leverage situations. That’s a pretty big ask.
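The arithmetic behind those figures can be reproduced from the innings table. The Python sketch below assumes that each redirected challenge earns the average high run leverage value per challenge across innings one through nine. That is a simplifying assumption, since an extra challenge on a marginal pitch may well be less productive than the average one.

```python
# Figures from the "Challenges by Inning and Run Leverage" table (data through June 20).
low_challenges = [708, 841, 992]       # innings 1-3, 4-6, 7-9
low_run_value = [32.7, 36.7, 35.2]
low_early_success = 0.654              # innings 1-3 low-leverage success rate

high_challenges = [196, 226, 264]
high_run_value = [47.6, 51.3, 56.7]

# Low-leverage run value per challenge, by inning block.
low_per_challenge = [rv / n for rv, n in zip(low_run_value, low_challenges)]

# Failed low-leverage challenges in innings 1-3 that would be handed back.
failed_regained = round(low_challenges[0] * (1 - low_early_success))

# Assumption: a redirected challenge earns the average high-leverage value.
high_rv_per_challenge = sum(high_run_value) / sum(high_challenges)
needed = low_run_value[0] / high_rv_per_challenge   # extra high-leverage challenges
increase = needed / sum(high_challenges)            # relative to current volume

print([round(x, 3) for x in low_per_challenge], failed_regained,
      round(needed), f"{increase:.0%}")
```

Under that assumption, recouping the 32.7 runs takes roughly 144 additional high-leverage challenges, about a 21% bump over the 686 high-leverage challenges in regulation innings. Both numbers are in line with the “140ish” and “20%” figures in the text.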
But as alluded to above, it’s probably unnecessary to swear off all low run leverage challenges in those first three innings. Maybe just the lowest of the low run leverage situations would do. How would that work in practice, though? It’s not as though players (or anyone else!) would be willing to memorize the RE288 table. Thankfully, it turns out several of the lowest-leverage situations have characteristics that are easy enough to distill down into digestible chunks one could actually commit to memory.
Here’s the simple summary of situations that basically never merit using a challenge during the first three innings of a game:
- Any 2-1 count. It doesn’t matter how many outs there are, and it doesn’t matter if there are runners on base. Batters in a 2-1 count should keep their hands as far away from their helmets as possible.
- With the bases empty, any count that’s made up entirely of zeros and ones (0-0, 0-1, 1-0, 1-1), regardless of the number of outs.
- Any 0-0 count with fewer than two baserunners. Again, regardless of the number of outs.
- With two outs, no RISP, and fewer than two strikes.
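The four rules above are simple enough to encode as a single check. Below is a Python sketch, assuming the count is the pre-pitch count (as in RE288) and that runners are passed as three booleans. The function name and signature are hypothetical.

```python
def skip_challenge_early(inning: int, balls: int, strikes: int, outs: int,
                         on_first: bool, on_second: bool, on_third: bool) -> bool:
    """True if a pitch in innings 1-3 falls into one of the four
    'basically never challenge' situations. The count is the pre-pitch count."""
    if inning > 3:
        return False
    runners = on_first + on_second + on_third  # booleans sum as integers
    risp = on_second or on_third
    if (balls, strikes) == (2, 1):                        # any 2-1 count
        return True
    if runners == 0 and balls <= 1 and strikes <= 1:      # empty bases, 0-0/0-1/1-0/1-1
        return True
    if (balls, strikes) == (0, 0) and runners < 2:        # 0-0 with fewer than two on
        return True
    if outs == 2 and not risp and strikes < 2:            # two outs, no RISP, < 2 strikes
        return True
    return False
```

For example, a 2-0 pitch with two outs and a runner on first is a pass, but the same count with a runner on second is not covered by any of the four rules.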
Stop challenging in these scenarios and teams are all but guaranteed (depending on specific personnel) to lose fewer challenges, increase their odds of still having a challenge for a critical moment late in the game, and see better overall returns in the instances where they do opt to challenge.
Convincing players to challenge less because their failed challenges are hurting the team is difficult and uncomfortable. But this approach is about convincing players to challenge less because eliminating challenges in these four very specific situations provides an extra layer of cunning gamesmanship. It’s a message some need to hear more than others, but it’s also one that will appeal to anyone with an unrelenting need to exploit whatever competitive edge they can find. And last time I checked, that’s most professional baseball players.