Posts

Moved Link Aggregation to Bitly

Moved Link Aggregation to Bitly

I have decided to move my link aggregation efforts, such as they are, to Bitly, and to leave this blog for original content. Follow me on Twitter if you’re interested in tracking these links in the future.
Unit Testing Emscripten Library in Browser Using CMake and Nightwatch.JS

Unit Testing Emscripten Library in Browser Using CMake and Nightwatch.JS

In a previous blog post, I described how I took Emscripten-created JS and turned it into an UMD module. One of the reasons I did this was because I wanted more control over the generated JavaScript and for it to be usable in more contexts, such as with the RequireJS module loader. As I am a responsible developer, I desired to create a number of automated unit tests to ensure that the client-visible API for my Emscripten module works as I intended.
Creating UMD Module from Emscripten using CMake

Creating UMD Module from Emscripten using CMake

By default, Emscripten creates a module which can be used from both Node.JS and the browser, but it has the following issues: The module pollutes the global namespace The module is created with the name Module (in my case, I require streamingPercentiles) The module cannot be loaded by some module loaders such as require.js While the above issues can (mostly) be corrected by using –s MODULARIZE=1, it changes the semantics of the resulting JavaScript file, as the module now returns a function rather than an object.
Board Games for Family Night

Board Games for Family Night

Playing board games is a major hobby of mine. Every year I look forward to attending Gen Con, where I aggressively shop the dings and dents section of CoolStuffInc and play as many new games as I can. We enjoy playing board games as a family. I’ve been playing board games with my daughter since she was four; she’s eight now, and quite the gamer. One of the things we’ve learned is that the recommended age range on a board game is merely a recommendation: the only real challenge for my daughter in playing advanced games was how well she was able to read.
Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava

Calculating Percentiles on Streaming Data Part 7: Cormode-Korn-Muthukrishnan-Srivastava

This is part 7 of my series on calculating percentiles on streaming data. In 2005, Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava published a paper called Effective Computation of Biased Quantiles over Data Streams [CKMS05]. This paper took the Greenwald-Khanna algorithm [GK01] and made the following notable changes: Generalized the algorithm to work with arbitrary targeted quantiles, where a targeted quantile is a combination of a quantile $\phi$ and a maximum allowable error $\epsilon$.
Calculating Percentiles on Streaming Data Part 6: Building a C++ and JavaScript Library from a Single Codebase

Calculating Percentiles on Streaming Data Part 6: Building a C++ and JavaScript Library from a Single Codebase

This is part 6 of my series on calculating percentiles on streaming data. For the past 10 days or so, I’ve been working on the build process of my C++ and JavaScript streaming analytics libraries. Using the magic of Emscripten, I have been able to combine both libraries into a single, C++ codebase, from which I can compile both the C++ and JavaScript versions of the library. Furthermore, I was able to do this without breaking backwards compatibility of the JavaScript library.
Calculating Percentiles on Streaming Data Part 5: C++ Library

Calculating Percentiles on Streaming Data Part 5: C++ Library

This is part 5 of my series on calculating percentiles on streaming data. I have created a reusable C++ library which contains my implementation of the streaming percentile algorithms found within this blog post and published it to GitHub. Here’s what using it looks like: #include <stmpct/gk.hpp> using namespace stmpct; double epsilon = 0.1; gk g(epsilon); for (int i = 0; i < 1000; ++i) g.insert(rand()); double p50 = g.quantile(0.5); // Approx.
Calculating Percentiles on Streaming Data Part 4: JavaScript Library

Calculating Percentiles on Streaming Data Part 4: JavaScript Library

This is part 4 of my series on calculating percentiles on streaming data. I have created a reusable JavaScript library which contains my implementation of the streaming percentile algorithms found within this blog post and published it to GitHub and NPM. Here’s what using it looks like: var sp = require("streaming-percentiles'); // epsilon is allowable error. As epsilon becomes smaller, the // accuracy of the approximations improves, but the class consumes // more memory.
Calculating Percentiles on Streaming Data Part 3: Visualizing Greenwald-Khanna

Calculating Percentiles on Streaming Data Part 3: Visualizing Greenwald-Khanna

This is part 3 of my series on calculating percentiles on streaming data. In an effort to better understand the Greenwald-Khanna [GK01] algorithm, I created a series of visualizations of the cumulative distribution functions of a randomly-generated, normally-distributed data set with $\mu$ = 0 and $\sigma$ = 1, as the number of random numbers $n$ increases from 1 to 1,000. The way to read these visualizations is to find the percentile you are looking for on the y-axis, then trace horizontally to find the vertical line on the chart which intersects this location, then read the value from the x-axis.
Calculating Percentiles on Streaming Data Part 2: Notes on Implementing Greenwald-Khanna

Calculating Percentiles on Streaming Data Part 2: Notes on Implementing Greenwald-Khanna

This is part 2 of my series on calculating percentiles on streaming data. The most famous algorithm for calculating percentiles on streaming data appears to be Greenwald-Khanna [GK01]. I spent a few days implementing the Greenwald-Khanna algorithm from the paper and I discovered a few things I wanted to share. Insert Operation The insert operation is defined in [GK01] as follows: INSERT($v$) Find the smallest $i$, such that $v_{i-1} \leq v < v_i$, and insert the tuple $ t_x = (v_x, g_x, \Deltax) = (v, 1, \lfloor 2 \epsilon n \rfloor) $, between $t{i-1}$ and $t_i$.