Aug27

(Author's note: not necessarily actually a practical idea. But fun!)

So pictured here is a histogram of a moderately large set of random integers. Each vertical line represents the total number of entries at each particular integer. Since each number is made up of multiple random factors (10 different random numbers, each between 0 and 100, added together), the distribution tends toward a bell curve.

So how did I build the graph? Excel? PHP? Nope. Just a MySQL query.

Admittedly I cheated a bit. That image is zoomed out and rotated 90 degrees. Here's a snippet of the actual query output:

Each "bucket" has one asterisk for each entry that falls inside it. Using the same single-column database as in pt. 1, and populating the data in a similar fashion with 30,000 rows ranging from 0 to 1000, it actually takes a remarkably simple query to generate a histogram like this:

mysql> SELECT `value`, RPAD('', COUNT(`value`), '*') AS barThe RPAD statement starts with an empty string and adds as many '*' characters as there are matches of the given value in the database.

-> FROM `randomNumbers`

-> GROUP BY `value`;

In practical terms, would you ever really want to use your database to build graphical presentation like this? Probably not. Logically, you really just want to run the following query and handle your display on the application side.

mysql> SELECT `value`, COUNT(`value`) AS barBut it can be a quick and dirty way to get an overview of your data's distribution. There are a couple kinks to iron out: this query doesn't show buckets with zero entries. This shouldn't be a huge concern, since again, this method of graphical output should only be used for a glancing overview, and not a precision statistical analysis. With enough data, these empty buckets should only occur out at the very fringe anyway (surrounding the outliers mentioned in the last entry).

-> FROM `randomNumbers`

-> GROUP BY `value`;

Dev