I have decided to move my link aggregation efforts, such as they are, to Bitly, and to leave this blog for original content. Follow me on Twitter if you’re interested in tracking these links in the future.
Playing board games is a major hobby of mine. Every year I look forward to attending Gen Con, where I aggressively shop the dings and dents section of CoolStuffInc and play as many new games as I can. We enjoy playing board games as a family. I’ve been playing board games with my daughter since she was four; she’s eight now, and quite the gamer. One of the things we’ve learned is that the recommended age range on a board game is merely a recommendation: the only real challenge for my daughter in playing advanced games was how well she was able to read.
This is part 7 of my series on calculating percentiles on streaming data. In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05]. This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes: Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile $\phi$ and a maximum allowable error $\epsilon$.
This is part 5 of my series on calculating percentiles on streaming data. I have created a reusable C++ library which contains my implementation of the streaming percentile algorithms found within this blog post and published it to GitHub. Here’s what using it looks like: #include <stmpct/gk.hpp> using namespace stmpct; double epsilon = 0.1; gk g(epsilon); for (int i = 0; i < 1000; ++i) g.insert(rand()); double p50 = g.quantile(0.5); // Approx.
This is part 3 of my series on calculating percentiles on streaming data. In an effort to better understand the Greenwald-Khanna [GK01] algorithm, I created a series of visualizations of the cumulative distribution functions of a randomly-generated, normally-distributed data set with $\mu$ = 0 and $\sigma$ = 1, as the number of random numbers $n$ increases from 1 to 1,000. The way to read these visualizations is to find the percentile you are looking for on the y-axis, then trace horizontally to find the vertical line on the chart which intersects this location, then read the value from the x-axis.