Calculating Percentiles on Streaming Data Part 1: Introduction

Calculating Percentiles on Streaming Data Part 1: Introduction

This is part 1 of my series on calculating percentiles on streaming data. Suppose that you are dealing with a system which processes one million requests per second, and you’d like to calculate the median percentile response time over the last 24 hours. The naive approach would be to store every response time, sort them all, and then return the value in the middle. Unfortunately, this approach would require manipulating 1,000,000 * 60 * 60 * 24 = 86.
Visualizing Latency Part 4: Official D3 Latency Heatmap Page

Visualizing Latency Part 4: Official D3 Latency Heatmap Page

This post is part 4 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. Allow me to wrap up my visualizing latency post series by noting that my official D3 latency heatmap repository is at https://github.com/sengelha/d3-latency-heatmap/. Monitor this repository for future developments to the D3 latency heatmap chart.
Visualizing Latency Part 3: Rendering Event Data

Visualizing Latency Part 3: Rendering Event Data

This post is part 3 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. Now that I have introduced the D3 latency heatmap chart component and explained what binning is, I can discuss the primary use case of the chart: rendering event data. What is event data First, I must explain what I mean by event data. For a fuller treatment, please read Analytics For Hackers: How To Think About Event Data, but allow me to summarize: Event data describes actions performed by entities.
Visualizing Latency Part 2: What is Binning?

Visualizing Latency Part 2: What is Binning?

This post is part 2 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. As mentioned in Brendan Gregg’s Latency Heat Maps page, a latency heat map is a visualization where each column of data is a histogram of the observations for that time interval. Using Brendan Gregg’s visualization: As with histograms, the key decision that needs to be made when using a latency heat map is how to bin the data.
Visualizing Latency Part 1: Introduction

Visualizing Latency Part 1: Introduction

This post is part 1 of my series about visualizing latency, which is very useful for debugging certain classes of performance problems. A latency heatmap is a particularly useful tool for visualizing latency. For a great treatment of latency heatmaps, please read Brendan Gregg’s Latency Heat Maps page and the ACM Queue article Visualizing System Latency. On the right, you can see a latency heatmap generated from a job queueing system which shows a number of interesting properties, not least of which is that the system appears to be getting slower over time.
Data-Driven Code Generation of Unit Tests Part 5: Closing Thoughts

Data-Driven Code Generation of Unit Tests Part 5: Closing Thoughts

This post is part 5 of my series about data-driven code generation of unit tests. In the previous posts in this series, I walked through the idea of performing data-driven code generation for unit tests, as well as how I implemented it in three different programming languages and build systems. This post contains some final thoughts about the effort. Was it worth it? Almost certainly. Although it required substantial up-front effort to set up the unit test generators, this approach found numerous, previously-undetected bugs both within my implementation of the calculation library as well as with legacy implementations.
Data-Driven Code Generation of Unit Tests Part 4: C#, MSBuild, T4, MS Unit Test

Data-Driven Code Generation of Unit Tests Part 4: C#, MSBuild, T4, MS Unit Test

This post is part 4 of my series about data-driven code generation of unit tests. This blog post explains how I used C#, MSBuild, T4 Text Templates, and the Microsoft Unit Test Framework for Managed Code to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background. As mentioned in Part 2: C++, CMake, Jinja2, Boost, all performance analytics metadata is stored in a single file called metadata.
Data-Driven Code Generation of Unit Tests Part 3: Java, Maven, StringTemplate, JUnit

Data-Driven Code Generation of Unit Tests Part 3: Java, Maven, StringTemplate, JUnit

This post is part 3 of my series about data-driven code generation of unit tests. This blog post explains how I used Java, Apache Maven, StringTemplate, and JUnit to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background. As mentioned in Part 2: C++, CMake, Jinja2, Boost, all performance analytics metadata is stored in a single file called metadata.
Data-Driven Code Generation of Unit Tests Part 2: C++, CMake, Jinja2, Boost

Data-Driven Code Generation of Unit Tests Part 2: C++, CMake, Jinja2, Boost

This post is part 2 of my series about data-driven code generation of unit tests. This blog post explains how I used CMake, Jinja2, and the Boost Unit Test framework to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background. All performance analytics metadata is stored in a single metadata file called metadata.csv. This file contains the complete list of calculations, and for each calculation, its settings (i.
Data-Driven Code Generation of Unit Tests Part 1: Background

Data-Driven Code Generation of Unit Tests Part 1: Background

This post is part 1 of my series about data-driven code generation of unit tests. At Morningstar, I created a multi-language, cross-platform performance analytics library which implements both online and offline implementations of a number of common financial analytics such as Alpha, Beta, R-Squared, Sharpe Ratio, Sortino Ratio, and Treynor Ratio (more on this library later). The library relies almost exclusively on a comprehensive suite of automated unit tests to validate its correctness.