### Calculating Percentiles on Streaming Data Part 3: Visualizing Greenwald-Khanna

This is part 3 of my series on calculating percentiles on streaming data. In an effort to better understand the Greenwald-Khanna [GK01] algorithm, I created a series of visualizations of the cumulative distribution functions of a randomly generated, normally distributed data set with $\mu = 0$ and $\sigma = 1$, as the number of random numbers $n$ increases from 1 to 1,000. To read these visualizations, find the percentile you are looking for on the y-axis, trace horizontally to the vertical line on the chart that intersects this location, then read the value from the x-axis.
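The chart-reading procedure above can be sketched in code as an exact percentile lookup on the raw samples. This is a minimal illustration, not the Greenwald-Khanna algorithm itself: the function name `percentile_from_samples` and the nearest-rank definition of a percentile are assumptions made for this example.

```python
import math
import random


def percentile_from_samples(samples, p):
    """Return the nearest-rank p-th percentile (0 < p <= 100) of samples.

    This mirrors reading an empirical CDF: pick p on the y-axis and
    return the x value where the cumulative fraction first reaches it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least p% of samples <= it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Same setup as the visualizations: 1,000 draws from a normal
# distribution with mean 0 and standard deviation 1.
random.seed(0)
samples = [random.gauss(0, 1) for _ in range(1000)]
median = percentile_from_samples(samples, 50)
```

Because this keeps and sorts every sample, it is only useful as a reference point for checking an approximate streaming summary.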

### Calculating Percentiles on Streaming Data Part 2: Notes on Implementing Greenwald-Khanna

This is part 2 of my series on calculating percentiles on streaming data. The most famous algorithm for calculating percentiles on streaming data appears to be Greenwald-Khanna [GK01]. I spent a few days implementing the Greenwald-Khanna algorithm from the paper and I discovered a few things I wanted to share.

Insert Operation

The insert operation is defined in [GK01] as follows:

INSERT($v$): Find the smallest $i$ such that $v_{i-1} \leq v < v_i$, and insert the tuple $t_x = (v_x, g_x, \Delta_x) = (v, 1, \lfloor 2 \epsilon n \rfloor)$ between $t_{i-1}$ and $t_i$.
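As a rough sketch of the INSERT step, the following Python keeps the summary as a list of `(v, g, delta)` tuples sorted by `v`. The function name `gk_insert`, the list representation, and the choice to pass `n` as the number of observations seen before this insert are assumptions for illustration. Setting $\Delta = 0$ for a new minimum or maximum follows the special case described in [GK01]. The COMPRESS step is not shown.

```python
import bisect
import math


def gk_insert(summary, v, n, epsilon):
    """Insert observation v into a GK summary (a list sorted by value).

    summary: list of (v, g, delta) tuples, ordered by v.
    n: observations seen so far (assumed here to exclude v itself).
    epsilon: the target rank error, e.g. 0.01.
    """
    values = [t[0] for t in summary]
    # bisect_right gives the smallest i with v < v_i, so v_{i-1} <= v.
    i = bisect.bisect_right(values, v)
    if i == 0 or i == len(summary):
        # A new minimum or maximum is known exactly, so delta is 0.
        delta = 0
    else:
        delta = math.floor(2 * epsilon * n)
    summary.insert(i, (v, 1, delta))
    return summary


summary = []
gk_insert(summary, 5, n=0, epsilon=0.25)   # first value: new min and max
gk_insert(summary, 10, n=1, epsilon=0.25)  # new maximum
gk_insert(summary, 7, n=2, epsilon=0.25)   # interior value
```

Rebuilding `values` on every call costs linear time and is done here only to keep the sketch short.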

### Calculating Percentiles on Streaming Data Part 1: Introduction

This is part 1 of my series on calculating percentiles on streaming data. Suppose that you are dealing with a system which processes one million requests per second, and you’d like to calculate the median response time over the last 24 hours. The naive approach would be to store every response time, sort them all, and then return the value in the middle. Unfortunately, this approach would require manipulating 1,000,000 * 60 * 60 * 24 = 86.

### Visualizing Latency Part 4: Official D3 Latency Heatmap Page

This post is part 4 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. Allow me to wrap up my visualizing latency post series by noting that my official D3 latency heatmap repository is at https://github.com/sengelha/d3-latency-heatmap/. Monitor this repository for future developments to the D3 latency heatmap chart.

### Visualizing Latency Part 3: Rendering Event Data

This post is part 3 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. Now that I have introduced the D3 latency heatmap chart component and explained what binning is, I can discuss the primary use case of the chart: rendering event data.

What is event data

First, I must explain what I mean by event data. For a fuller treatment, please read Analytics For Hackers: How To Think About Event Data, but allow me to summarize: Event data describes actions performed by entities.

### Visualizing Latency Part 2: What is Binning?

This post is part 2 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. As mentioned in Brendan Gregg’s Latency Heat Maps page, a latency heat map is a visualization where each column of data is a histogram of the observations for that time interval. Using Brendan Gregg’s visualization: As with histograms, the key decision that needs to be made when using a latency heat map is how to bin the data.
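To make the idea of binning concrete, here is a minimal sketch that groups `(timestamp, latency)` observations into heat map cells. The function name `bin_latencies` and the use of fixed-width bins on both axes are assumptions for illustration; other binning strategies are possible.

```python
from collections import Counter


def bin_latencies(events, time_bin_width, latency_bin_width):
    """Count observations per (time column, latency row) cell.

    events: iterable of (timestamp, latency) pairs in consistent units.
    Each column is effectively a histogram of the latencies observed
    during that time interval.
    """
    counts = Counter()
    for timestamp, latency in events:
        column = int(timestamp // time_bin_width)
        row = int(latency // latency_bin_width)
        counts[(column, row)] += 1
    return counts


# Three hypothetical observations: seconds since start, latency in ms.
events = [(0.5, 12), (0.9, 18), (1.2, 35)]
cells = bin_latencies(events, time_bin_width=1, latency_bin_width=10)
```

The bin widths control the trade-off: bins that are too wide hide structure, while bins that are too narrow leave most cells with zero or one observation.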

### Visualizing Latency Part 1: Introduction

This post is part 1 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. A latency heatmap is a particularly useful tool for visualizing latency. For a great treatment of latency heatmaps, please read Brendan Gregg’s Latency Heat Maps page and the ACM Queue article Visualizing System Latency. On the right, you can see a latency heatmap generated from a job queueing system which shows a number of interesting properties, not least of which is that the system appears to be getting slower over time.

### Data-Driven Code Generation of Unit Tests Part 5: Closing Thoughts

This post is part 5 of my series about data-driven code generation of unit tests. In the previous posts in this series, I walked through the idea of performing data-driven code generation for unit tests, as well as how I implemented it in three different programming languages and build systems. This post contains some final thoughts about the effort. Was it worth it? Almost certainly. Although it required substantial up-front effort to set up the unit test generators, this approach found numerous previously undetected bugs, both in my implementation of the calculation library and in legacy implementations.

### Data-Driven Code Generation of Unit Tests Part 4: C#, MSBuild, T4, MS Unit Test

This post is part 4 of my series about data-driven code generation of unit tests. This blog post explains how I used C#, MSBuild, T4 Text Templates, and the Microsoft Unit Test Framework for Managed Code to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background. As mentioned in Part 2: C++, CMake, Jinja2, Boost, all performance analytics metadata is stored in a single file called metadata.

### Data-Driven Code Generation of Unit Tests Part 3: Java, Maven, StringTemplate, JUnit

This post is part 3 of my series about data-driven code generation of unit tests. This blog post explains how I used Java, Apache Maven, StringTemplate, and JUnit to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background. As mentioned in Part 2: C++, CMake, Jinja2, Boost, all performance analytics metadata is stored in a single file called metadata.