Calculating Percentiles on Streaming Data Part 8: Parameterizing Algorithms on Measurement Type
Calculating Percentiles on Streaming Data c++ ckms gk javascript percentiles streaming
Published: 2018-12-21

This is part 8/8 of my Calculating Percentiles on Streaming Data series.

As mentioned in part 6 of this series, I have published a C++ and JavaScript library which implements a number of streaming percentile algorithms on GitHub at https://github.com/sengelha/streaming-percentiles.

Versions 1.x and 2.x of the C++ library required all measurements to use the type double, and usage of the algorithms looked something like this:

 1 2 3 4 5 6 7 8  #include double epsilon = 0.1; stmpct::gk g(epsilon); for (int i = 0; i < 1000; ++i) g.insert(rand()); double p50 = g.quantile(0.5); // Approx. median double p95 = g.quantile(0.95); // Approx. 95th percentile 

For version 3.x of the library, I have modified the algorithms to be parameterized by type so that they are not limited to double. The algorithms now may be used by type which implements a few simple requirements: it must be default constructible, copy constructible, assignable, and implement operator<. This means that they can now be used with float, int, long double, a custom number type, or even std::string.

Usage has slightly changed:

 1 2 3 4 5 6 7 8  #include double epsilon = 0.1; stmpct::gk g(epsilon); for (int i = 0; i < 1000; ++i) g.insert(rand()); double p50 = g.quantile(0.5); // Approx. median double p95 = g.quantile(0.95); // Approx. 95th percentile 

In order to implement this change, I was required to change the library from being a combination of pre-built binaries and headers to a header-only library. Fortunately, CMake makes this quite easy with its notion of an INTERFACE library:

 1 2 3 4 5 6 7 8 9  # Old add_library(stmpct_cpp_static STATIC ${LIBSTMPCT_SRC}) if (BUILD_SHARED_LIBRARY) add_library(stmpct_cpp_shared SHARED${LIBSTMPCT_SRC}) endif() # New add_library(stmpct_cpp INTERFACE) target_include_directories(stmpct_cpp INTERFACE include) 

The JavaScript interface to the algorithms was unchanged.

On the downside, as new versions of the library are released, users of the library will be forced to recompile the library against the latest source code; the library has lost the ability to fix bugs by replacing a shared library binary file without recompilation. Furthermore, I now have no choice but to distribute the full source code of the library to all users. This is not a big deal for this toy open source project, but for a commercial offering, that may be quite undesirable.