At various points in my career, I have created or managed systems which need to perform calculations on enormous amounts of numeric data in real-time. Whether I was running a Varnish cluster that served 30,000 requests per second or writing financial analytics which needed to calculate aggregates on terabytes of price and return data in a few seconds, a common need emerged: how can I calcalate aggregates on large amounts of numeric data in a single pass?

The answer, of course, is *streaming analytics*: the ability to constantly
calculate statistical analytics while moving within the stream of data.
A streaming analytics system can calculate statistics in a single pass,
in real time, and without necessarily having all the raw data (i.e.
calculation as-you-go)

I decided to explore the family of *approximate percentile* algorithms
such as Greenwald-Khanna and Cormode, Korn, Muthukrishnan, and Srivastava.
This exploration turned into the following: