Calculating Percentiles on Streaming Data Part 8: Parameterizing Algorithms on Measurement Type

Calculating Percentiles on Streaming Data Part 8: Parameterizing Algorithms on Measurement Type

This is part 8 of my series on calculating percentiles on streaming data.

As mentioned in part 6 of this series, I have published a C++ and JavaScript library which implements a number of streaming percentile algorithms on GitHub at https://github.com/sengelha/streaming-percentiles.

Versions 1.x and 2.x of the C++ library required all measurements to use the type double, and usage of the algorithms looked something like this:

#include <stmpct/gk.hpp>

double epsilon = 0.1;
stmpct::gk g(epsilon);
for (int i = 0; i < 1000; ++i)
    g.insert(rand());
double p50 = g.quantile(0.5); // Approx. median
double p95 = g.quantile(0.95); // Approx. 95th percentile

For version 3.x of the library, I have modified the algorithms to be parameterized by type so that they are not limited to double. The algorithms now may be used by type which implements a few simple requirements: it must be default constructible, copy constructible, assignable, and implement operator<. This means that they can now be used with float, int, long double, a custom number type, or even std::string.

Usage has slightly changed:

#include <stmpct/gk.hpp>

double epsilon = 0.1;
stmpct::gk<double> g(epsilon);
for (int i = 0; i < 1000; ++i)
    g.insert(rand());
double p50 = g.quantile(0.5); // Approx. median
double p95 = g.quantile(0.95); // Approx. 95th percentile

In order to implement this change, I was required to change the library from being a combination of pre-built binaries and headers to a header-only library. Fortunately, CMake makes this quite easy with its notion of an INTERFACE library:

# Old
add_library(stmpct_cpp_static STATIC ${LIBSTMPCT_SRC})
if (BUILD_SHARED_LIBRARY)
  add_library(stmpct_cpp_shared SHARED ${LIBSTMPCT_SRC})
endif()

# New
add_library(stmpct_cpp INTERFACE)
target_include_directories(stmpct_cpp INTERFACE include)

The JavaScript interface to the algorithms was unchanged.

On the downside, as new versions of the library are released, users of the library will be forced to recompile the library against the latest source code; the library has lost the ability to fix bugs by replacing a shared library binary file without recompilation. Furthermore, I now have no choice but to distribute the full source code of the library to all users. This is not a big deal for this toy open source project, but for a commercial offering, that may be quite undesirable.

To download the latest version of the streaming percentiles library (including documentation), visit https://github.com/sengelha/streaming-percentiles.