Prevent+: My Expected Run Value Metric

Jun 4, 2023

I’ve written too many formal reports at the end of the semester, so this will be a more conversational piece.

One of the things that made a big impression on me when I first delved into baseball analytics during my Orioles co-op was their pitch grader. It was really cool to see all the different metrics and characteristics of a pitch combined into a single number to assess its quality. And, most importantly, it was just fun to click on random pitcher pages and understand their strengths, weaknesses, and value (I should probably consider letting it go, but I’m still a Michael Lorenzen believer).

Of course, once I became technically sound enough, I had to create my own pitch grader. I’ll walk through data collection and cleaning, feature selection and creation, modeling, and some analysis and applications of my model.

Finding a Response Variable

When evaluating the quality of a pitch, a wide variety of results can be considered, such as swings and misses, called strikes, and contact allowed. To capture a diverse range of outcomes, I decided to model run values. A run value represents the average change in a team’s expected runs that results from a specific event in a given run-scoring environment. For instance, during the 2022 MLB season, a home run had a run value of 1.39, meaning that, on average, each home run added approximately 1.39 runs to the batting team’s expected total. On the other hand, a strikeout was valued at -0.21 runs.
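To make the definition concrete: one common way to compute an event’s run value (the RE24 approach) is to take, for each occurrence, the run expectancy of the base-out state after the play, subtract the run expectancy before it, add any runs that scored, and then average over all occurrences of that event. Here’s a minimal sketch, assuming you already have run expectancy values attached to each play. The `PlayResult` structure is hypothetical and only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlayResult:
    """One plate-appearance outcome (hypothetical structure for illustration)."""
    event: str          # e.g. "home_run", "strikeout"
    re_before: float    # run expectancy of the base-out state before the play
    re_after: float     # run expectancy of the base-out state after the play
    runs_scored: int    # runs that scored on the play


def run_values(plays: list[PlayResult]) -> dict[str, float]:
    """Average change in run expectancy (plus runs scored) for each event type."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for p in plays:
        # Runs this play added relative to what was expected before it
        totals[p.event] += p.re_after - p.re_before + p.runs_scored
        counts[p.event] += 1
    return {event: totals[event] / counts[event] for event in totals}
```

A positive result means the event helped the batting team on average, and a negative one (like a strikeout) means it hurt.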

To minimize the influence of other variables, such as defense, I categorized balls in play by batted ball type (e.g., ground ball, fly ball) rather than by the specific play result (single, double, groundout). I took the same approach for pitches that were not put in play, then finalized the set of events and their corresponding run values.
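A sketch of how pitches could be bucketed this way from Savant’s pitch-level data, using its `description` and `bb_type` columns. The exact buckets (and which descriptions go into each) are assumptions for illustration, and the run value table is passed in rather than hard-coded, so it can hold whatever values you calculate for each category.

```python
from typing import Optional

# Collapse Savant pitch descriptions into a few non-contact outcome buckets.
# The bucket names and groupings are illustrative assumptions.
NON_BIP_BUCKETS = {
    "ball": "ball",
    "blocked_ball": "ball",
    "hit_by_pitch": "hit_by_pitch",
    "called_strike": "called_strike",
    "swinging_strike": "swinging_strike",
    "swinging_strike_blocked": "swinging_strike",
    "foul": "foul",
}


def outcome_category(description: str, bb_type: Optional[str]) -> Optional[str]:
    """Outcome bucket for one pitch. Balls in play use the batted ball type,
    not the play result, so defense doesn't leak into the label."""
    if description == "hit_into_play":
        return bb_type  # e.g. "ground_ball", "line_drive", "fly_ball", "popup"
    return NON_BIP_BUCKETS.get(description)  # None for anything unmapped


def pitch_run_value(description: str, bb_type: Optional[str],
                    run_value_table: dict[str, float]) -> Optional[float]:
    """Look up the run value for a pitch's outcome bucket (None if unknown)."""
    category = outcome_category(description, bb_type)
    return None if category is None else run_value_table.get(category)
```

Pitches that come back as `None` (pitchouts, bunts, and so on) would simply be dropped before training.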

Feature Selection and Creation

My dataset was collected from Baseball Savant, which provides a plethora of metrics for each pitch, including velocity, movement, and acceleration. While around 20 features may not seem excessive for a data science project, I was skeptical that all of them truly mattered in determining the outcome of a pitch. To check, I ran the Boruta feature selection algorithm on my feature set, and interestingly, it deemed all of them significant. However, I still wanted to reduce noise and complexity in my models, so I narrowed down my final feature set through trial and error.
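Boruta’s core idea is to compare each real feature against "shadow" copies whose values have been shuffled, so they can’t carry any real signal; a feature is only worth keeping if it beats the best shadow. This is a single-pass simplification using scikit-learn’s `RandomForestRegressor`. It isn’t the full algorithm (which repeats this many times and applies a statistical test before accepting or rejecting a feature), and it isn’t the `boruta` package itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def shadow_feature_check(X: np.ndarray, y: np.ndarray, seed: int = 0) -> np.ndarray:
    """Single-pass simplification of Boruta: keep features whose importance
    beats the best shadow (column-shuffled) copy. Returns a boolean mask."""
    rng = np.random.default_rng(seed)
    # Shuffle each column independently, breaking any link between it and y
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(np.hstack([X, shadows]), y)
    n = X.shape[1]
    real_importance = forest.feature_importances_[:n]
    best_shadow = forest.feature_importances_[n:].max()
    return real_importance > best_shadow
```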

During this process, I kept coming back to a guiding question: ‘Does this factor matter to a hitter perceiving a pitch?’ This question helped me evaluate the relevance of each feature. After some iterations, I arrived at a final feature set of velocity, horizontal and vertical movement, spin axis, and plate location. Additionally, for breaking and off-speed pitches, I calculated the velocity and movement differences between the pitch and the pitcher’s fastball (or its variants, like cutters and sinkers).
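The fastball differentials can be computed by first averaging each pitcher’s fastball and then subtracting. Here’s a sketch using Savant’s `pitcher`, `pitch_type`, `release_speed`, `pfx_x`, and `pfx_z` columns on plain dicts. Treating each pitcher’s most-thrown FF/SI/FC as "the fastball" is an assumption, and the output field names (`velo_diff`, `hmov_diff`, `vmov_diff`) are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

FASTBALL_TYPES = {"FF", "SI", "FC"}


def primary_fastball_profile(pitches: list[dict]) -> dict:
    """Average speed and movement of each pitcher's most-thrown fastball variant."""
    by_pitcher = defaultdict(list)
    for p in pitches:
        if p["pitch_type"] in FASTBALL_TYPES:
            by_pitcher[p["pitcher"]].append(p)
    profiles = {}
    for pitcher, fastballs in by_pitcher.items():
        main_type = Counter(p["pitch_type"] for p in fastballs).most_common(1)[0][0]
        main = [p for p in fastballs if p["pitch_type"] == main_type]
        profiles[pitcher] = {
            "release_speed": mean(p["release_speed"] for p in main),
            "pfx_x": mean(p["pfx_x"] for p in main),
            "pfx_z": mean(p["pfx_z"] for p in main),
        }
    return profiles


def add_fastball_differentials(pitches: list[dict]) -> list[dict]:
    """Attach (pitch minus primary fastball) speed and movement differences
    to every non-fastball pitch. Modifies the dicts in place."""
    profiles = primary_fastball_profile(pitches)
    for p in pitches:
        fb = profiles.get(p["pitcher"])
        if p["pitch_type"] in FASTBALL_TYPES or fb is None:
            continue  # fastballs, or pitchers with no fastball in the data
        p["velo_diff"] = p["release_speed"] - fb["release_speed"]
        p["hmov_diff"] = p["pfx_x"] - fb["pfx_x"]
        p["vmov_diff"] = p["pfx_z"] - fb["pfx_z"]
    return pitches
```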

I chose not to include spin rate since two pitches can spin differently but still exhibit the same movement. However, I did include spin axis because it correlates with seam-shifted wake, and the differences between expected and actual movement contribute to the quality of a pitch. I’ll discuss this a bit more at the end.
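The gap between expected and actual movement boils down to comparing two directions. Here’s a minimal sketch, assuming both the spin-based direction and the observed movement direction have already been expressed as angles in the same frame. Converting Savant’s `spin_axis` into the `pfx_x`/`pfx_z` frame involves viewpoint and handedness conventions; that step is deliberately left out here and should be checked before using this.

```python
import math


def movement_direction_deg(pfx_x: float, pfx_z: float) -> float:
    """Direction of the movement vector in degrees, counterclockwise from
    the +pfx_x axis, in [0, 360). This is NOT Savant's spin_axis convention."""
    return math.degrees(math.atan2(pfx_z, pfx_x)) % 360.0


def axis_deviation_deg(spin_based_deg: float, observed_deg: float) -> float:
    """Signed smallest difference (observed minus spin-based) in [-180, 180).
    Both angles must already be in the same frame and convention."""
    return (observed_deg - spin_based_deg + 180.0) % 360.0 - 180.0
```

The wrap-around matters: 350° and 10° are only 20° apart, not 340°.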

Regarding pitch groups, I worked with fastballs (FF), cutters (FC), sinkers (SI), sliders (SL), curveballs (CU), changeups (CH), and splitters (FS). At the time of training, slurves (SV) and sweepers (ST) were not yet tagged in Savant. I have since retagged sweepers as sliders and classified slurves as either curveballs or sliders with a classifier (approximately 90% accuracy under cross-validation).
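The pitch-type cleanup and grouping could look something like this. The group labels are hypothetical, and `classify_slurve` stands in for a trained slurve classifier (any callable that returns "CU" or "SL" for a pitch).

```python
from typing import Callable, Optional

# Savant pitch_type codes mapped to the three model groups (labels are illustrative)
PITCH_GROUPS = {
    "FF": "fastball", "SI": "fastball", "FC": "fastball",
    "SL": "breaking", "CU": "breaking",
    "CH": "offspeed", "FS": "offspeed",
}


def pitch_group(pitch: dict, classify_slurve: Callable[[dict], str]) -> Optional[str]:
    """Retag sweepers (ST) as sliders, send slurves (SV) through a classifier,
    then map the pitch type to its model group (None if not modeled)."""
    pitch_type = pitch["pitch_type"]
    if pitch_type == "ST":
        pitch_type = "SL"
    elif pitch_type == "SV":
        pitch_type = classify_slurve(pitch)  # hypothetical: returns "CU" or "SL"
    return PITCH_GROUPS.get(pitch_type)
```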

Modeling

As hinted at in the previous section (Boruta is built on random forests), I used random forest regression to develop my expected run value (xRV) models. I created a model for each combination of three pitch groups (fastball variants, breaking balls, and off-speed pitches) and four pitcher/hitter handedness matchups (RvR, LvR, etc.), for a total of 12 models. Each model was trained in a pipeline that included a StandardScaler step to standardize the inputs, and the pipelines were cross-validated to select the best hyperparameters, namely max_depth and n_estimators.
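Here’s a sketch of one such pipeline in scikit-learn: a `StandardScaler` feeding a `RandomForestRegressor`, tuned with `GridSearchCV` over `max_depth` and `n_estimators`, plus the 3 × 4 grid of model keys. The values in `PARAM_GRID`, the scoring choice, and the function and key names are assumptions for illustration.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

GROUPS = ("fastball", "breaking", "offspeed")
MATCHUPS = ("RvR", "RvL", "LvR", "LvL")  # pitcher hand v batter hand
MODEL_KEYS = list(product(GROUPS, MATCHUPS))  # 3 x 4 = 12 models

# Hypothetical search space; step names prefix the parameter names
PARAM_GRID = {
    "forest__max_depth": [4, 8, 12],
    "forest__n_estimators": [100, 300],
}


def fit_xrv_model(X: np.ndarray, y: np.ndarray, cv: int = 5) -> GridSearchCV:
    """Scaler + random forest pipeline, tuned by cross-validated grid search.
    The returned search object is refit on all of X, y with the best params."""
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("forest", RandomForestRegressor(random_state=0)),
    ])
    search = GridSearchCV(pipeline, PARAM_GRID, cv=cv,
                          scoring="neg_mean_squared_error")
    return search.fit(X, y)
```

With a hypothetical dict `data` mapping each key to its `(X, y)` arrays, all twelve models would be `{key: fit_xrv_model(*data[key]) for key in MODEL_KEYS}`.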

Results and Next Steps

After aggregating all of my training and testing data and re-running it through my models, I obtained an R-squared value of 0.14 across all pitches. In other words, the pitch itself explains roughly 14% of the variance in outcomes, with the rest left to factors the model deliberately excludes, such as the hitter, defense, and ballpark. The mean squared error (MSE) was 0.0065, which gives a root mean squared error (RMSE) of about 0.08. These metrics were satisfactory to me: my models could predict run values with a typical error of around 0.08 runs, which seemed acceptable. I called this my ‘Prevent+’ metric (xRV).
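For reference, here’s how those numbers relate. R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares (the share of variance explained), and RMSE is the square root of MSE, which puts the error back in runs: √0.0065 ≈ 0.081. A small pure-Python sketch:

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """R-squared, MSE, and RMSE for paired observed and predicted run values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variance
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}
```

If a model’s predictions are worse than simply predicting the mean, the residual sum of squares exceeds the total and R-squared goes negative, which is what the shape-only models ran into.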

Next, I examined feature importance to determine which features were truly impactful, as I had done in previous iterations of my models during feature selection. While different pitch groups placed emphasis on various pitch metrics, one consistent finding was that pitch location mattered the most.

Feature importance breakdown for one of my models

Based on this insight, I decided to try isolating command and pitch shape from each other. I trained separate models using only plate coordinates and only pitch movement metrics. The plate-only models had an R-squared value of 0.102 and an RMSE of 0.08. These results were promising, and I labeled this metric ‘Plate+’.

However, modeling pitch shape proved to be challenging. Many models yielded negative R-squared values, particularly for breaking balls, and it was difficult to establish a direct correlation between the features and run values.

Creating Stuff+

After contemplating it for a couple of days, I decided to define the ‘Stuff’ of a pitch as the additional value gained (or lost) due to its shape compared with other pitches thrown to similar locations. I defined my ‘Stuff+’ metric as the difference between xRV (expected run value) and the expected location run value (Prevent minus Plate). This approach let me represent the quality of pitch shape without relying on regressors trained solely on non-location features. As with Prevent+ and Plate+, Stuff+ values became more stable as the sample size increased.
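In code, the Stuff component is a per-pitch subtraction of two model outputs. This is a sketch, assuming both lists hold predictions for the same pitches in the same order, with run values in the batting team’s terms (positive means runs allowed), so a negative result means the shape saved runs relative to location alone.

```python
def stuff_run_values(xrv: list[float], plate_xrv: list[float]) -> list[float]:
    """Per-pitch stuff component: full-model xRV minus location-only xRV.
    Negative values mean the pitch's shape prevented runs beyond what its
    location alone would predict (runs-allowed sign convention)."""
    if len(xrv) != len(plate_xrv):
        raise ValueError("xrv and plate_xrv must be the same length")
    return [full - location for full, location in zip(xrv, plate_xrv)]
```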

Well, except some of these…

Exploring xRV

There’s so much to explore with these numbers, so I’ll provide the links to my pitch grades for 2022 and 2023 here.

Now that we have functional models and satisfactory expected run values, it’s time to delve into the analysis! I rescaled the run values to a + scale, with 100 representing the league average. Additionally, I trained a linear regression to convert run value to ERA, making the predicted values more easily understandable. I have also been running the 2023 season through my models and plan to retrain them with the data from this season sometime this summer.
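Both steps are simple to sketch. The rescaling formula here (10 points per league standard deviation, with the sign flipped so higher is better for the pitcher) is an assumed convention, not necessarily the one behind these grades, and `fit_line` is a plain least-squares fit of the kind that could map run value to ERA.

```python
from statistics import mean, pstdev


def to_plus_scale(values: list[float], league_values: list[float],
                  points_per_sd: float = 10.0) -> list[float]:
    """Rescale run values so 100 is league average and higher is better.
    The points-per-standard-deviation spread is an assumed convention."""
    mu = mean(league_values)
    sd = pstdev(league_values)
    # Fewer runs allowed is better for the pitcher, so use (mu - v), not (v - mu)
    return [100 + points_per_sd * (mu - v) / sd for v in values]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope
```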

It’s been fun sharing these numbers online and doing my own analysis of players, especially when evaluating whether a pitcher is over- or underperforming. Anyways, here are some pitch grades!

Here’s what a report for a pitcher would look like; this is Kyle Gibson’s 2022 season:

This suggests that Kyle Gibson has an arsenal of a couple of average pitches, and that he uses his better ones more often than the rest, which is good. He’s expected to allow around half a run per 100 fastballs he throws, but to prevent runs per 100 sinkers, changeups, and sliders.
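"Runs per 100 pitches" is just the average xRV per pitch scaled up by 100, so half a run per 100 fastballs works out to about 0.005 runs allowed per fastball. A small sketch of that aggregation, assuming per-pitch xRV values with positive meaning runs allowed:

```python
from collections import defaultdict


def rv_per_100(pitch_types: list[str], xrv: list[float]) -> dict[str, float]:
    """Average expected run value per 100 pitches for each pitch type.
    Positive = runs allowed, negative = runs prevented."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for pitch_type, rv in zip(pitch_types, xrv):
        totals[pitch_type] += rv
        counts[pitch_type] += 1
    return {pt: 100 * totals[pt] / counts[pt] for pt in totals}
```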

I can also create leaderboards of the top ten fastballs by Prevent+, Stuff+, and Plate+. These seem to pass the eye test.
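A leaderboard like this is a filter plus a sort. Since the grades stabilize with larger samples, it makes sense to apply a minimum pitch count before ranking. In this sketch, the row fields (`pitches` and metric keys such as `stuff_plus`) and the 300-pitch cutoff are illustrative assumptions.

```python
def leaderboard(rows: list[dict], metric: str, min_pitches: int = 300,
                top_n: int = 10) -> list[dict]:
    """Top rows by a + metric (higher is better), skipping small samples.
    Field names and the default cutoff are hypothetical."""
    eligible = [r for r in rows if r["pitches"] >= min_pitches]
    return sorted(eligible, key=lambda r: r[metric], reverse=True)[:top_n]
```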

We can also see how location factors into a pitch’s effectiveness; here are the top three curveballs from last year.

Continuing to explore the importance of pitch location, I looked at how Plate+ and Stuff+ correlated with Prevent+.
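The correlation here is Pearson’s r. Here’s a dependency-free sketch. For a single predictor, r² equals the R-squared of a simple linear fit, which is a handy way to read how much of Prevent+ each component accounts for. Because r is unaffected by linear rescaling, it gives the same answer on the + scale as on raw run values.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```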

Future Work

Throughout my experimentation and research process, I have accumulated a list of aspects that I intend to investigate further or incorporate into future versions of the model. Here are a few of them, some of which are already in progress:

Dealing with outliers: Regression modeling can produce extreme run values for outlier pitches because the response variable is continuous. To address this, I have started working on probabilistic models that predict the probability of each event occurring on a pitch and weight those probabilities by their corresponding run values. This approach may also provide a way to model Stuff+ directly.
Sneaky movement: Seam-shifted wake, which shows up as a difference between the movement implied by a pitch’s spin and its actual movement, has been extensively researched and optimized in modern baseball. I aim to incorporate this feature into my next model, as it would provide further insight into how seams affect movement, particularly for changeups and sinkers. I just need to research and understand this phenomenon more before I do so.
Batted ball events: Currently, I categorize balls in play based solely on their batted ball category. However, I am considering the possibility of further distinguishing them, perhaps by including factors such as exit velocity and launch angle. This additional breakdown could enhance the accuracy of modeling the expected run value (xRV) for pitches that are likely to be put into play.
Make it look cool: Less of a data science thing, but I’d love to design a front end to host my pitch grades. I don’t know how I would go about this since web development isn’t my forte, but I’m sure something could be done if I find the time for it.
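The probabilistic idea from the outliers item can be sketched as a weighted sum: a classifier predicts a probability for each outcome, and the pitch’s expected run value is the sum of each probability times that outcome’s run value. Because the probabilities sum to 1, the result is a weighted average of the category run values and can never land outside their range, which is what tames the outliers. The outcome names and whatever model produces the probabilities are assumptions here.

```python
def expected_run_value(event_probs: dict[str, float],
                       run_values: dict[str, float]) -> float:
    """Probability-weighted run value for one pitch:
    sum over outcomes of P(outcome) * RV(outcome)."""
    total_p = sum(event_probs.values())
    if abs(total_p - 1.0) > 1e-6:
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(p * run_values[event] for event, p in event_probs.items())
```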

These are just a few of the areas I plan to explore and improve upon in future iterations of my model.

I’ll be revamping my pitch grader this summer, so hopefully I get to share that in the coming months! In the meantime, thanks for reading and feel free to DM me on Twitter @jstinchen with any questions! I also discuss my pitch grades often there!