My friend Benny Rubanov and I (link to my homepage 😁) wrote code to calculate various chess stats for all 5,000,000,000+ games in the Lichess Database. We haven't run that code on all 5 billion games yet (asking for AWS credits for Christmas), but we did run it on 13,948,545 games and got some interesting results!*

So why do this? The idea came from Benny and me trying to guess which piece would have the highest “kill-to-death” ratio (i.e., the number of times a piece captures divided by the number of times it is captured). I’m a casual chess fan, but I’m much more passionate about math, statistics, and software engineering, so I decided to actually answer this and other chess questions no one has quantified before!

<aside> 🧑‍💻 All code is open source. See ‣

</aside>

<aside> 📊 Raw data (aggregated) is available here!

</aside>


<aside> ℹ️ Here are the results for the “kill-to-death” ratio question. The black king leads all pieces! Note that a checkmate counts as a “capture” of the mated king.

</aside>
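To make the ratio concrete, here is a minimal TypeScript sketch of how per-piece kill and death counts could be tallied and turned into kill-to-death ratios. The `CaptureEvent` record, the `"w-q"`-style keys, and the helper names are hypothetical, chosen for illustration only. Following the note above, a checkmate is recorded as the mating piece “capturing” the opposing king.

```typescript
// Piece and color codes use chess.js-style single letters (an assumption for this sketch).
type PieceType = "p" | "n" | "b" | "r" | "q" | "k";
type Color = "w" | "b";

interface Piece {
  color: Color;
  type: PieceType;
}

// Hypothetical record of one capture: which piece took which.
interface CaptureEvent {
  attacker: Piece;
  victim: Piece;
}

interface Tally {
  kills: number;
  deaths: number;
}

// Per-piece key, e.g. "b-k" for the black king.
const pieceKey = (p: Piece): string => `${p.color}-${p.type}`;

// Treat a checkmate as the mating piece "capturing" the opponent's king.
function checkmateEvent(matingPiece: Piece): CaptureEvent {
  const loser: Color = matingPiece.color === "w" ? "b" : "w";
  return { attacker: matingPiece, victim: { color: loser, type: "k" } };
}

// Add each capture to the attacker's kills and the victim's deaths.
function tallyCaptures(events: CaptureEvent[]): Record<string, Tally> {
  const tallies: Record<string, Tally> = {};
  const bump = (p: Piece, field: keyof Tally): void => {
    const k = pieceKey(p);
    const t = tallies[k] ?? (tallies[k] = { kills: 0, deaths: 0 });
    t[field] += 1;
  };
  for (const e of events) {
    bump(e.attacker, "kills");
    bump(e.victim, "deaths");
  }
  return tallies;
}

// Kill-to-death ratio; a piece that was never captured gets Infinity.
function killToDeath(t: Tally): number {
  return t.deaths === 0 ? Infinity : t.kills / t.deaths;
}

// Usage with three made-up events (not real data).
const wq: Piece = { color: "w", type: "q" };
const sampleEvents: CaptureEvent[] = [
  { attacker: wq, victim: { color: "b", type: "p" } },
  { attacker: { color: "b", type: "n" }, victim: wq },
  checkmateEvent(wq),
];
const sampleTallies = tallyCaptures(sampleEvents);
console.log(killToDeath(sampleTallies["w-q"]!)); // 2 (2 kills, 1 death)
```

Keeping raw kill and death counts, rather than ratios, makes it easy to merge tallies from many games (or many worker processes) by simply adding them, and dividing only at the end.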

From that first question, we went on to write code for calculating and visualizing chess statistics at scale, starting with some fun statistics no one has bothered to calculate before. See below for how we did this, and for more results!

A preview of the results (so far)

I’ll cover methodology and major challenges before diving deep into the numbers, but here’s a quick teaser:

Methodology 📖

We owe a big thanks to the developers of chess.js (another great open source project). We relied on some of the library’s helpful methods to save ourselves development time.
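Part of why a library like chess.js saves time: Standard Algebraic Notation (SAN), the move format used in PGN game records, tells you which piece moved and whether it captured, but not which piece it captured. Recovering the victim requires replaying the game on a board, which chess.js handles (its verbose move history includes a `captured` field). The TypeScript sketch below uses a hypothetical `parseSan` helper to show how little SAN gives you on its own; it covers common cases and is not a full SAN parser.

```typescript
// Single-letter piece codes (an assumption for this sketch, mirroring chess.js style).
type PieceType = "p" | "n" | "b" | "r" | "q" | "k";

interface SanInfo {
  piece: PieceType;   // the piece that moved (the capturer, if this is a capture)
  isCapture: boolean; // SAN marks captures with "x"
  isMate: boolean;    // SAN marks checkmate with "#"
}

// Extract what SAN alone can tell us. It never says WHICH piece was captured.
function parseSan(san: string): SanInfo {
  const isMate = /#$/.test(san);
  // Castling ("O-O" or "O-O-O") is a king move and never a capture.
  if (/^O-O/.test(san)) {
    return { piece: "k", isCapture: false, isMate };
  }
  // Piece moves start with an uppercase letter; pawn moves start with a file (a-h).
  const m = /^[NBRQK]/.exec(san);
  const piece: PieceType = m ? (m[0].toLowerCase() as PieceType) : "p";
  return { piece, isCapture: san.indexOf("x") !== -1, isMate };
}

const knightTakes = parseSan("Nxe5");      // knight, capture, no mate
const pawnPromoMate = parseSan("exd8=Q#"); // pawn, capture, checkmate
```

Note the case sensitivity: `bxc3` is a pawn capturing from the b-file, while `Bxc3` is a bishop capture.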

We also owe thanks to the maintainers of Lichess. All data used in this analysis comes from the Lichess open games database.

Approach

The analysis can be roughly broken into these steps:

  1. Fetch & decompress game data