This is a multipart series:

The source code is available on GitHub.

MLB Stats API: An Introduction

The MLB Stats API is a publicly available (see license information below) REST API that you can query to get back almost any information about a baseball game, past or present. From my armchair research, it seems to have a history [0, 1, 2] of providing statistical information to the curious baseball fan. MLB doesn't collect this data for fans, though; its primary use is powering MLB's own website and offshoots such as Baseball Savant.

If you've ever watched a baseball game on TV, you've seen the data the API passes around in action. Two prime examples are the pitch/strike zone overlay and home run stats (like distance and launch angle). These are produced by MLB's Statcast system, a sophisticated vision tracking system installed in every Major League ballpark. I highly recommend reading this blog post from MLB [3], as it goes into more detail, but in short the system is incredibly accurate at tracking pitches, as this quote highlights:

The measured location of the center of the baseball at the front of home plate has been found in repeated testing to be accurate within 0.25 inches on average

Somewhat surprisingly, at least to me, all this data they collect (well, probably not all of it) is available through their API. The biggest issue with the API is that there is no documentation for public consumption. The documentation sits behind a login screen, here, which does allow you to register. Alas, after many months without a response to my registration request, I don't think they let the general public in. (Which raises the question of why allow registration in the first place. I remain cautiously hopeful.)

Luckily, though, there is a decent number of open source repositories that provide API clients of varying completeness. Here are a few of my favorites:

MLB-StatsAPI has pretty comprehensive documentation for the available endpoints and query parameters.

A general theme I've noticed in these clients is that they focus on statistical data and analysis. The only one I've seen that focuses on game data is mlbgame.

Gameday

I am less interested in stats and more interested in live baseball. MLB has a free web app, called Gameday, that displays a live baseball game in an informative manner. Similar apps exist for other sports from the likes of ESPN and are often called something similar, e.g. "Gamecast". I personally enjoy following baseball this way much more than I would, say, basketball, and my guess is that it largely comes down to the sport's pace of play. Baseball has a natural cadence that lets you check in every now and then without feeling like you're missing anything.

Screenshot taken from MLB Gameday.

Regardless, this data is exposed through the MLB API; in fact, it is the exact same data used to power Gameday. I first realized this while watching the Network tab of Chrome's developer tools as I used Gameday and seeing the statsapi URL show up. After a bit of digging to figure out the endpoint, I ran it in my current favorite API client, Insomnia, and was surprised by the amount of data contained in one API call: it had all of the information for a single baseball game! Seeing this Gameday data got me thinking that I could probably reverse engineer this thing.
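To make "one API call per game" a bit more concrete, here is a minimal Rust sketch that only builds the request URLs. It assumes the game feed lives at `https://statsapi.mlb.com/api/v1.1/game/{gamePk}/feed/live` and that a game's `gamePk` can be looked up through the `/api/v1/schedule` endpoint with `sportId=1` (MLB) and a `YYYY-MM-DD` date. These paths come from the community clients rather than official documentation, and the helper names `schedule_url` and `live_feed_url` are hypothetical. Actually sending the request would need an HTTP client crate (for example `reqwest`), which is left out to keep the sketch dependency-free.

```rust
/// Base URL of the MLB Stats API (assumed from community clients, not official docs).
const BASE_URL: &str = "https://statsapi.mlb.com";

/// Hypothetical helper: URL for the schedule of MLB games on a single date.
/// `sportId=1` is assumed to select Major League Baseball.
fn schedule_url(year: u16, month: u8, day: u8) -> String {
    // Zero-pad month and day so the date is formatted as YYYY-MM-DD.
    format!(
        "{}/api/v1/schedule?sportId=1&date={:04}-{:02}-{:02}",
        BASE_URL, year, month, day
    )
}

/// Hypothetical helper: URL for the live feed of one game, the single call
/// that (per the text above) returns everything Gameday shows for that game.
fn live_feed_url(game_pk: u64) -> String {
    format!("{}/api/v1.1/game/{}/feed/live", BASE_URL, game_pk)
}

fn main() {
    // An example date early in the 2021 season.
    println!("{}", schedule_url(2021, 4, 1));
    // 123456 is a made-up gamePk; real ones come from the schedule response.
    println!("{}", live_feed_url(123456));
}
```

The two-step shape (look up the day's games, then request one game's feed) is the assumed workflow here; pasting either URL into an API client like Insomnia is a quick way to inspect the JSON that comes back.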

A Rust TUI is born

Around the same time, I was also looking for a project to do in Rust. Full disclosure: it was for a Rust class I was taking as part of my Master's degree in computer science. At the time I was considering making a terminal user interface (TUI) for MQTT, building something with WASM, or writing a simple-as-possible unikernel.

I decided that I wanted a fun and light project (so, not a unikernel), and I'd been wanting to write a TUI application for a long time (originally I was going to use Go, since Go is awesome). The 2021 baseball season had recently started, so I thought: I'll swap out MQTT for MLB!

I got approval from my professor, and bam, I got to watch baseball for school. Just kidding: I was mostly looking at JSON, but that's a story for Part 1.

License

The data from the API is subject to the license posted here.


[0] - https://old.reddit.com/r/Sabermetrics/comments/81u527/mlb_stats_api/

[1] - https://old.reddit.com/r/Sabermetrics/comments/c1aoqj/future_of_free_mlb_data_feeds_gameday_xml/

[2] - https://old.reddit.com/r/baseball/comments/bjovz3/new_python_wrapper_for_mlb_stats_api/

[3] - https://technology.mlblogs.com/introducing-statcast-2020-hawk-eye-and-google-cloud-a5f5c20321b8