This post is part 2/5 of my Data-Driven Code Generation of Unit Tests series.
This blog post explains how I used CMake, Jinja2, and the Boost Unit Test framework to perform data-driven code generation of unit tests for a financial performance analytics library. If you haven’t read it already, I recommend starting with Part 1: Background.
All performance analytics metadata is stored in a single metadata file called metadata.csv. This file contains the complete list of calculations, and for each calculation, its settings (i.e. how it differs from other calculations), including properties like:
- How many parameters does the calculation take (1, 2, or 3)?
- Does the calculation have an online (streaming) implementation?
- Does the calculation support annualization?
- What is the default annualization mode?
- Given a predefined set of inputs, what are the expected values of the calculation for various combinations of time period, annualization, etc.?
The file looks something like:
algorithm_type,function_name,num_parameters,minimum_arr_size,supports_streaming,supports_annualization,default_annualization,expected_value_unannualized,expected_value_annualized_daily,expected_value_annualized_weekly,expected_value_annualized_monthly,expected_value_annualized_quarterly,expected_value_annualized_semiannually,expected_value_annualized_daily_200_day_year
absolute_statistics,calculation1,1,1,true,false,never,7.283238516,-999,-999,-999,-999,-999,-999
...
relative_statistics,calculation2,3,1,true,true,always,0.189846006,69.34125385,9.871992334,2.278152077,0.759384026,0.379692013,37.96920129
...
I use CSV rather than JSON or YAML because it can be easily read by CMake during the build process (more below).
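As a rough illustration of how one row of this file maps onto typed settings, the Python sketch below parses the header and the two sample rows shown above using the standard csv module. The helper names `parse_row` and `load_metadata` are hypothetical, and the conversion rules (true/false to booleans, expected values to floats) are assumptions made for the example. The post does not say what the -999 entries mean, so the sketch parses them as ordinary numbers.

```python
import csv
import io

# Header and the two sample rows from the post, inlined so the sketch is self-contained.
SAMPLE_CSV = """\
algorithm_type,function_name,num_parameters,minimum_arr_size,supports_streaming,supports_annualization,default_annualization,expected_value_unannualized,expected_value_annualized_daily,expected_value_annualized_weekly,expected_value_annualized_monthly,expected_value_annualized_quarterly,expected_value_annualized_semiannually,expected_value_annualized_daily_200_day_year
absolute_statistics,calculation1,1,1,true,false,never,7.283238516,-999,-999,-999,-999,-999,-999
relative_statistics,calculation2,3,1,true,true,always,0.189846006,69.34125385,9.871992334,2.278152077,0.759384026,0.379692013,37.96920129
"""


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values (hypothetical helper)."""
    parsed = dict(row)
    for key in ("num_parameters", "minimum_arr_size"):
        parsed[key] = int(row[key])
    for key in ("supports_streaming", "supports_annualization"):
        parsed[key] = row[key].strip().lower() == "true"
    for key in row:
        if key.startswith("expected_value_"):
            parsed[key] = float(row[key])
    return parsed


def load_metadata(text):
    """Return a dict keyed by function_name, one entry per calculation."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["function_name"]: parse_row(row) for row in reader}


metadata = load_metadata(SAMPLE_CSV)
print(sorted(metadata))
```

Keying the result by function_name makes it easy to look up a single calculation, which is the access pattern a per-calculation code generator needs.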
A Jinja2 template defines all unit tests for a given calculation. It uses the attributes found in metadata.csv to determine how to generate the appropriate source code. For example, if the calculation does not support annualization per the supports_annualization flag, the Jinja2 template will ignore (not generate) the unit tests which test annualization support.
Each calculation has a number of possible combinations to test for, such as:
- Test the online vs. offline versions of the calculation
- Test the various annualization settings (always, never, calculation default)
- Test the various pre-defined annualization periods (daily, weekly, monthly, etc.)
- etc.
The Jinja2 template uses for loops extensively to make sure that it tests all possible combinations of all of the above parameters.
As you can imagine, the resulting code coverage of the unit tests is excellent.
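The original template is not reproduced here, so the following is only a minimal sketch of what a loop-driven template of this kind might look like, wrapped in Python so it can be rendered directly. The Boost.Test macros are real, but the test names, the variable names passed to the template (`implementations`, `periods`, `expected`), and the placeholder test bodies are all assumptions for illustration.

```python
from jinja2 import Template

# Hypothetical template: names and test bodies are illustrative only.
TEMPLATE_TEXT = """\
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_SUITE({{ function_name }}_tests)
{% for impl in implementations %}
BOOST_AUTO_TEST_CASE({{ function_name }}_{{ impl }}_unannualized)
{
    // ... call the {{ impl }} version and compare with {{ expected.unannualized }}
}
{% if supports_annualization %}
{% for period in periods %}
BOOST_AUTO_TEST_CASE({{ function_name }}_{{ impl }}_annualized_{{ period }})
{
    // ... compare with {{ expected[period] }}
}
{% endfor %}
{% endif %}
{% endfor %}
BOOST_AUTO_TEST_SUITE_END()
"""


def render_tests(function_name, supports_annualization, expected):
    """Render the sketch template for one calculation and return C++ source text."""
    template = Template(TEMPLATE_TEXT, trim_blocks=True, lstrip_blocks=True)
    return template.render(
        function_name=function_name,
        implementations=["offline", "online"],
        periods=["daily", "weekly", "monthly"],
        supports_annualization=supports_annualization,
        expected=expected,
    )


print(render_tests("calculation1", False, {"unannualized": 7.283238516}))
```

The `{% if supports_annualization %}` guard is what lets a single template serve every calculation: the annualization test cases simply disappear from the output when the flag is false.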
A Python script, render_jinja.py, knows how to read metadata.csv and pass the appropriate values to Jinja2 in order to generate the unit tests for a given function.
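The actual render_jinja.py is not shown here. As a hedged sketch of the general shape such a script could take, the example below looks up one calculation in the CSV data and renders a template with its row as the context. The inline CSV, the template name `unit_test.cpp.j2`, and the helpers `find_calculation` and `render_unit_tests` are all hypothetical stand-ins. A real script would presumably read metadata.csv and the template from disk and write the result to a .cpp file.

```python
import csv
import io

from jinja2 import DictLoader, Environment, StrictUndefined

# Stand-ins so the sketch runs on its own (a trimmed-down metadata file and template).
METADATA_CSV = """\
function_name,num_parameters,supports_annualization
calculation1,1,false
calculation2,3,true
"""
TEMPLATES = {
    "unit_test.cpp.j2": (
        "// Generated tests for {{ function_name }} "
        "({{ num_parameters }} parameter(s))\n"
        "{% if supports_annualization %}// annualization tests go here\n{% endif %}"
    )
}


def find_calculation(csv_text, function_name):
    """Return the typed metadata row for function_name, or raise KeyError."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["function_name"] == function_name:
            row["num_parameters"] = int(row["num_parameters"])
            row["supports_annualization"] = row["supports_annualization"] == "true"
            return row
    raise KeyError(f"{function_name} not found in metadata")


def render_unit_tests(csv_text, function_name):
    """Render the unit test source for a single calculation."""
    # StrictUndefined turns a misspelled column name into an error instead of empty output.
    env = Environment(loader=DictLoader(TEMPLATES), undefined=StrictUndefined)
    template = env.get_template("unit_test.cpp.j2")
    return template.render(**find_calculation(csv_text, function_name))


print(render_unit_tests(METADATA_CSV, "calculation2"))
```

Passing the whole row as keyword arguments means a new column in metadata.csv becomes available to the template without touching the script.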
The build system uses CMake. It too reads metadata.csv to get a list of calculations, calls render_jinja.py on each calculation to generate the C++ unit test source file, and then compiles and executes the unit tests.
A single script, build.sh, ties everything together. The full build.sh supports a number of command-line options (e.g. -c, --clean for a clean build; -d, --debug for a debug build; -r, --release for a release build).
Windows uses an equivalent script called build.cmd.
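The build scripts themselves are shell and batch files and are not reproduced here. Purely to illustrate the flow they describe (optional clean, configure, build, run tests), here is a Python sketch that parses the same three options and assembles the corresponding cmake and ctest command lines without running them. The build directory name, the Debug default, and the exact commands are assumptions; `ctest --test-dir` requires CMake 3.20 or newer.

```python
import argparse


def parse_args(argv):
    """Parse the options mentioned in the post: -c/--clean, -d/--debug, -r/--release."""
    parser = argparse.ArgumentParser(description="Configure, build, and run the unit tests.")
    parser.add_argument("-c", "--clean", action="store_true",
                        help="remove the build directory first")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-d", "--debug", dest="build_type",
                      action="store_const", const="Debug")
    mode.add_argument("-r", "--release", dest="build_type",
                      action="store_const", const="Release")
    parser.set_defaults(build_type="Debug")  # assumed default
    return parser.parse_args(argv)


def build_commands(args, build_dir="build"):
    """Return the command lines, in order, that a build driver would execute."""
    commands = []
    if args.clean:
        commands.append(["cmake", "-E", "remove_directory", build_dir])
    commands.append(["cmake", "-S", ".", "-B", build_dir,
                     f"-DCMAKE_BUILD_TYPE={args.build_type}"])
    commands.append(["cmake", "--build", build_dir])
    commands.append(["ctest", "--test-dir", build_dir, "--output-on-failure"])
    return commands


for command in build_commands(parse_args(["--clean", "--release"])):
    print(" ".join(command))
```

Each list could be handed to subprocess.run(command, check=True) to execute the steps and stop at the first failure.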
I am quite happy with the results. Adding a new calculation is almost as simple as writing the implementation of the calculation and adding a single line to metadata.csv. The unit tests are comprehensive and provide great code coverage. New test patterns (e.g. what should happen if you pass in NULL to a calculation?) can be added to all calculations at once, simply by editing the Jinja2 template file. Everything works across Windows, Mac OS, and Linux.
The only remaining frustration that I have is that the build system will often re-generate the unit test source code, and recompile the unit tests, even though nothing has changed. This notably slows down build times. I’m hopeful this can be solved with some further work on the CMake build file, but I’ll leave that for another time.
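The cause of the unnecessary rebuilds is not stated, so the following is only one possible mitigation, assuming the generator rewrites identical files and thereby updates their timestamps. In that case the generator can skip the write when the rendered text matches what is already on disk. The helper `write_if_changed` is hypothetical. If CMake instead re-runs the generation step unconditionally, the fix would have to be made in the CMake dependencies.

```python
import tempfile
from pathlib import Path


def write_if_changed(path, new_text):
    """Write new_text to path only if it differs; return True if the file was written."""
    path = Path(path)
    if path.exists() and path.read_text(encoding="utf-8") == new_text:
        return False  # leave the file and its modification time untouched
    path.write_text(new_text, encoding="utf-8")
    return True


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "calculation1_tests.cpp"
    print(write_if_changed(target, "// generated\n"))  # first write
    print(write_if_changed(target, "// generated\n"))  # identical content, skipped
```

Because an unchanged file keeps its old modification time, a timestamp-based build tool has no reason to recompile it.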