Stats in MySQL Pt. II: Histograms

Missing Image
By Joe Izenman

(Author's note: not necessarily actually a practical idea. But fun!)

So pictured here is a histogram of a moderately large set of random integers. Each vertical line represents the total number of entries at each particular integer. Since each number is made up of multiple random factors (10 different random numbers, each between 0 and 100, added together), the distribution tends toward a bell curve.

So how did I build the graph? Excel? PHP? Nope. Just a MySQL query.

Admittedly I cheated a bit. That image is zoomed out and rotated 90 degrees. Here's a snippet of the actual query output:

Histogram Snippet

Each "bucket" has one asterisk for each entry that falls inside it. Using the same single-column database as in pt. 1, and populating the data in a similar fashion with 30,000 rows ranging from 0 to 1000, it actually takes a remarkably simple query to generate a histogram like this:
mysql> SELECT `value`, RPAD('', COUNT(`value`), '*') AS bar
    -> FROM `randomNumbers`
    -> GROUP BY `value`;
The RPAD statement starts with an empty string and adds as many '*' characters as there are matches of the given value in the database.

In practical terms, would you ever really want to use your database to build graphical presentation like this? Probably not. Logically, you really just want to run the following query and handle your display on the application side.
mysql> SELECT `value`, COUNT(`value`) AS bar
    -> FROM `randomNumbers`
    -> GROUP BY `value`;
But it can be a quick and dirty way to get an overview of your data's distribution. There are a couple kinks to iron out: this query doesn't show buckets with zero entries. This shouldn't be a huge concern, since again, this method of graphical output should only be used for a glancing overview, and not a precision statistical analysis. With enough data, these empty buckets should only occur out at the very fringe anyway (surrounding the outliers mentioned in the last entry).


Back To Feed