Histogram with Adjustable Bin Size Slider

Constructing a histogram carries a significant responsibility: selecting the appropriate bin size. It's crucial to experiment with different values, as an incorrect choice can obscure the underlying narrative.

Many runners aim to finish a marathon in under four hours, which produces a pronounced gap in the histogram around the 4-hour mark. With a bin size that is too large, this important detail is easy to miss!

Data

Countless marathons are held around the globe each year. This article examines the completion times for male participants across various events.

I downloaded the data and used R to keep only the first 100,000 rows for male finishers.

The cleaned dataset is simply an array of numbers, stored in a variable called data.

Plot and code

The goal of this post is to create a histogram that illustrates the distribution of finishing times for a marathon.

Specifically, it demonstrates how to use D3 to compute bins from the original dataset and how to use React to render the rectangles that form the histogram.
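To make the binning step concrete, here is a minimal sketch of equal-width binning in plain TypeScript. It is a stand-in for what D3's bin generator (d3.bin()) does, not the code used for the chart: the computeBins function, the Bin type and the sample values are all hypothetical. Note that D3 treats a numeric bin count as a target and may pick slightly different "nice" thresholds, whereas this sketch uses exactly the requested number of bins.

```typescript
// Hypothetical shape for one bin: the interval [x0, x1) and how many values fall in it.
type Bin = { x0: number; x1: number; count: number };

// Split `domain` into `binCount` equal-width bins and count the values in each.
function computeBins(
  values: number[],
  domain: [number, number],
  binCount: number
): Bin[] {
  const [min, max] = domain;
  const width = (max - min) / binCount;

  // Create the empty bins first.
  const bins: Bin[] = Array.from({ length: binCount }, (_, i) => ({
    x0: min + i * width,
    x1: min + (i + 1) * width,
    count: 0,
  }));

  for (const v of values) {
    if (v < min || v > max) continue; // ignore values outside the domain
    // The last bin is closed on the right so that v === max is not dropped.
    const i = Math.min(Math.floor((v - min) / width), binCount - 1);
    bins[i].count += 1;
  }
  return bins;
}

// Made-up finishing times in minutes, for illustration only.
const sample = [185, 200, 210, 225, 238, 239, 239, 241, 260];
const bins = computeBins(sample, [180, 270], 3);
console.log(bins.map((b) => b.count)); // [2, 5, 2]
```

Each resulting bin carries its lower bound (x0), upper bound (x1) and a count, which is all that is needed to draw one rectangle per bin.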

Additionally, it features a slider that allows users to adjust the bin size, dynamically recalculating and rendering the bar sizes in real time.
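The recalculation triggered by the slider can be sketched as a pure function that turns bins into rectangle geometry. I'm assuming here that the slider value lives in React state and that something like this runs on every render; the binsToRects function, the CountedBin and Rect types, and the chart dimensions are hypothetical names for illustration, and the hand-written linear scaling stands in for D3 scales.

```typescript
// Hypothetical input: one entry per bin, with its bounds and count.
type CountedBin = { x0: number; x1: number; count: number };
// Hypothetical output: the attributes of one SVG <rect>.
type Rect = { x: number; y: number; width: number; height: number };

// Map bins to bar geometry for a chart of the given pixel size.
function binsToRects(
  bins: CountedBin[],
  chartWidth: number,
  chartHeight: number,
  gap = 1 // horizontal space left between neighbouring bars, in pixels
): Rect[] {
  if (bins.length === 0) return [];

  const xMin = bins[0].x0;
  const xMax = bins[bins.length - 1].x1;
  // Fall back to 1 so that empty data does not cause a division by zero.
  const maxCount = Math.max(...bins.map((b) => b.count), 1);

  // Linear scale from data values to horizontal pixels.
  const xScale = (v: number) => ((v - xMin) / (xMax - xMin)) * chartWidth;

  return bins.map((b) => {
    const height = (b.count * chartHeight) / maxCount;
    return {
      x: xScale(b.x0),
      y: chartHeight - height, // the SVG y axis points down
      width: Math.max(xScale(b.x1) - xScale(b.x0) - gap, 0),
      height,
    };
  });
}

// Made-up bins, for illustration only.
const rects = binsToRects(
  [
    { x0: 180, x1: 210, count: 2 },
    { x0: 210, x1: 240, count: 4 },
    { x0: 240, x1: 270, count: 2 },
    { x0: 270, x1: 300, count: 1 },
  ],
  400,
  100
);
```

Because the function has no side effects, moving the slider only requires recomputing the bins with the new count and calling it again; React then updates the rendered rectangles.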

More importantly, this post highlights the significance of experimenting with bin sizes in a histogram.

With a lower bin count (around 30), readers will observe a nearly normal distribution. However, increasing the number of bins reveals a significant gap around the 4-hour mark, showcasing the intense effort many runners put in to finish under this threshold!
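The effect described above can be reproduced on a toy example. The sketch below uses made-up finishing times (not the marathon dataset) where many runners finish just under 240 minutes and few just over, then counts them with wide and narrow bins. The countPerBin helper is hypothetical and only meant for illustration.

```typescript
// Count values per equal-width bin over [min, max); values outside are ignored.
function countPerBin(
  values: number[],
  min: number,
  max: number,
  binCount: number
): number[] {
  const counts = new Array<number>(binCount).fill(0);
  const width = (max - min) / binCount;
  for (const v of values) {
    if (v < min || v >= max) continue;
    counts[Math.floor((v - min) / width)] += 1;
  }
  return counts;
}

// Made-up finishing times in minutes: a pile-up just under 240 (4 hours).
const times = [
  205, 215, 222, 228, 231, 233, 235, 236, 237, 238,
  239, 239, 241, 252, 255, 258, 262, 266, 268, 275,
];

const coarse = countPerBin(times, 200, 280, 2); // 40-minute bins
const fine = countPerBin(times, 200, 280, 8); // 10-minute bins

console.log(coarse); // [12, 8]
console.log(fine); // [1, 1, 2, 8, 1, 3, 3, 1]
```

With two bins, the counts only suggest a gentle decline. With eight bins, the jump to 8 just before the 4-hour mark and the drop to 1 right after it become visible.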

Interactive histogram of finishing times, with x-axis ticks at 3:00, 3:30 and 4:00 and a slider for the target number of bins (set to 300).

A histogram with a slider that controls the bin size.

Dataviz caveat

A great way to improve your data visualization skills is by learning the most common pitfalls when creating charts.

I’ve compiled a large collection of these mistakes and created flashcards that summarize each one. For example, here’s the card that explains the bin size issue.

Explore the full collection when you can!

A dataviz flashcard explaining that one must always check the bin size when making a histogram.

Contact

👋 Hey, I'm Yan and I'm currently working on this project!

Feedback is welcome ❤️. You can file an issue on GitHub, drop me a message on Twitter, or even send me an email by joining yan.holtz.data with gmail.com. You can also subscribe to the newsletter to know when I publish more content!