Practical Bazel: Comparing Collections for Equivalence in sh_test()
Published: 2023-02-28

When writing Bazel tests using sh_test(), I often find myself needing to compare two collections for equivalence. For example, I might compare a directory listing against a set of expected files or directories, or the list of files and directories in a .tar file against a set of expected items. This blog post provides some tips and tricks on how to do so.

The basic approach to solving this problem is simple: put the expected and the actual sets into Bash arrays and then compare the contents. The trick is in the details.

Defining Expected Values

The trivial case is defining your set of expected values, which you can do with an inline Bash array:

EXPECTED_VALUES=(
  file1.txt
  file2.txt
  dir1/file3.txt
)
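
One detail to watch: array elements are split on whitespace, and unquoted glob characters can expand against files in the working directory. A small sketch with made-up file names showing how quoting keeps each entry as a single literal element:

```shell
#!/usr/bin/env bash
# Made-up names for illustration. Quote entries that contain spaces or
# glob characters such as [ ] * ? so each one stays a single literal element.
EXPECTED_VALUES=(
  file1.txt
  "release notes.txt"
  'data[1].csv'
)

echo "number of expected values: ${#EXPECTED_VALUES[@]}"
```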

Defining Actual Values

To read in the actual values, use the Bash builtin readarray with the stdout from a program. Be sure to trim off any extraneous or unwanted output. For example:

To read in a set of values from a separate text file:

readarray -t ACTUAL_VALUES < source_file.txt

To read in a set of files in a directory recursively:

readarray -t ACTUAL_VALUES < <(cd "$DIR" && find . -type f | sed -e 's#^\./##')

To read in a set of files from a ZIP archive:

readarray -t ACTUAL_VALUES < <(unzip -Z1 "$ZIP_FILE")

To read in a set of files from a TAR archive:

readarray -t ACTUAL_VALUES < <(tar -tf "$TAR_FILE")
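
Archive listings often include entries the test does not care about. Depending on how the archive was created, tar -tf may list directories as separate entries with a trailing slash, and paths may carry a leading "./". Here is a minimal sketch of trimming both before filling the array. The helper name strip_dir_entries and the sample listing are my own inventions for this example:

```shell
#!/usr/bin/env bash
# strip_dir_entries is a hypothetical helper, not part of tar or Bazel.
# It drops lines ending in "/" (directory entries) and removes a leading "./".
strip_dir_entries() {
  grep -v '/$' | sed -e 's#^\./##'
}

# Made-up listing standing in for the output of: tar -tf "$TAR_FILE"
sample_listing() {
  printf '%s\n' './' './dir1/' './dir1/file3.txt' './file1.txt'
}

readarray -t ACTUAL_VALUES < <(sample_listing | strip_dir_entries)

printf '%s\n' "${ACTUAL_VALUES[@]}"
```

In a real test, sample_listing would be replaced by the tar or unzip command from above.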

Comparing Actual to Expected

The comm program can compare the actual and expected arrays. The main thing to remember is that both inputs to comm must be sorted, which is trivial to handle using sort.
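
To see what each comm column holds, here is a small worked example with made-up values. Setting LC_ALL=C is my own addition, not a requirement: it keeps sort and comm on the same byte-wise collation, which avoids ordering surprises in some locales.

```shell
#!/usr/bin/env bash
# Illustrative values only; the array names are made up for this example.
export LC_ALL=C  # assumption: force one collation for both sort and comm

ACTUAL=(b.txt a.txt d.txt)
EXPECTED=(a.txt c.txt b.txt)

# Print the arguments one per line, sorted.
sorted() { printf '%s\n' "$@" | sort; }

# -13 suppresses columns 1 and 3, leaving lines found only in the second input.
ONLY_EXPECTED=$(comm -13 <(sorted "${ACTUAL[@]}") <(sorted "${EXPECTED[@]}"))
# -23 suppresses columns 2 and 3, leaving lines found only in the first input.
ONLY_ACTUAL=$(comm -23 <(sorted "${ACTUAL[@]}") <(sorted "${EXPECTED[@]}"))
# -12 suppresses columns 1 and 2, leaving lines common to both inputs.
IN_BOTH=$(comm -12 <(sorted "${ACTUAL[@]}") <(sorted "${EXPECTED[@]}"))

echo "only in expected: $ONLY_EXPECTED"
echo "only in actual:   $ONLY_ACTUAL"
echo "in both:          $IN_BOTH"
```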

Typically I’ll report “missing” and “extra” values separately as follows:

ERR=0
MISSING_VALUES=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$MISSING_VALUES" ]]; then
  echo "ERROR: Following expected files are missing: $MISSING_VALUES" 1>&2
  ERR=1
fi

EXTRA_VALUES=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$EXTRA_VALUES" ]]; then
  echo "ERROR: Following actual files are not in expected list: $EXTRA_VALUES" 1>&2
  ERR=1
fi

if [[ $ERR -ne 0 ]]; then
  exit $ERR
fi
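
If several tests need this check, the two comm calls can be folded into a function. The following is one possible sketch, not the only way to do it. The name assert_same_set is hypothetical, and the function reads the global arrays EXPECTED_VALUES and ACTUAL_VALUES rather than taking arguments, which keeps it usable on older Bash versions without namerefs.

```shell
#!/usr/bin/env bash
# assert_same_set is a hypothetical helper name. It compares the global arrays
# EXPECTED_VALUES and ACTUAL_VALUES and returns non-zero if their members differ.
assert_same_set() {
  local missing extra err=0
  missing=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) \
                     <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
  extra=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) \
                   <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
  if [[ -n "$missing" ]]; then
    printf 'ERROR: missing expected values:\n%s\n' "$missing" 1>&2
    err=1
  fi
  if [[ -n "$extra" ]]; then
    printf 'ERROR: unexpected actual values:\n%s\n' "$extra" 1>&2
    err=1
  fi
  return "$err"
}

# Example with made-up values: same members, different order.
EXPECTED_VALUES=(file1.txt file2.txt dir1/file3.txt)
ACTUAL_VALUES=(dir1/file3.txt file1.txt file2.txt)
assert_same_set && echo "collections match"
```

Printing each list on its own lines makes long failure output easier to scan than a single line of space-separated names.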

Dealing With Bash Shells That Don’t Support readarray

Bash versions older than 4.0, like the 3.2 release shipped with macOS (formerly OS X), don’t include the readarray builtin. For these systems, I use the following script:

# //tools/bash/readarray:readarray.bash
#
# Provide a simplified implementation of readarray for Bash shells that don't
# have the readarray builtin.
if ! type -t readarray >/dev/null; then
  # Very minimal readarray implementation using read. Each line is appended
  # through an escaped "$line" reference, so quotes, backticks, and dollar
  # signs in the input are stored literally rather than evaluated.
  # Unlike the builtin, backslashes are interpreted unless -r is passed.
  readarray() {
    local cmd
    local line
    local opt=""
    local t
    local v=MAPFILE
    while [ $# -gt 0 ]; do
      case "$1" in
      -h|--help) echo "minimal substitute readarray for older bash"; return 0 ;;
      -r) shift; opt="$opt -r" ;;
      -t) shift; t=1 ;;  # accepted for compatibility; read already drops the newline
      -u)
          shift
          if [ -n "${1:-}" ]; then
            opt="$opt -u $1"
            shift
          fi
          ;;
      *)
          # Accept any valid shell variable name as the target array
          if [[ "$1" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
            v="$1"
            shift
          else
            echo "Error: Unknown option: '$1'" 1>&2
            return 1
          fi
          ;;
      esac
    done
    cmd="read $opt"
    eval "$v=()"
    while IFS= eval "$cmd line"; do
      # \$line reaches eval unexpanded, so the appended value is never re-parsed
      eval "${v}+=(\"\$line\")"
    done
  }
  }
fi
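
When only one or two call sites need it, a plain while read loop is an alternative that avoids eval entirely and also works on Bash 3.2. A minimal sketch, where the printf with made-up file names stands in for whatever command produces the list:

```shell
#!/usr/bin/env bash
# Fill an array line by line without readarray.
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
ACTUAL_VALUES=()
while IFS= read -r line; do
  ACTUAL_VALUES+=("$line")
done < <(printf '%s\n' 'file1.txt' 'dir with space/file2.txt' 'cost_$5.txt')

echo "read ${#ACTUAL_VALUES[@]} values"
```

The trade-off is repetition: the loop has to be written out at each call site, whereas the shim lets the test scripts keep using readarray unchanged.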

I then wrap this script in a sh_library() which can be used as a dep from my sh_test()s:

# //tools/bash/readarray:BUILD.bazel
sh_library(
    name = "readarray",
    srcs = ["readarray.bash"],
    visibility = ["//visibility:public"],
)

Putting It All Together

# BUILD.bazel
sh_test(
    name = "my_test",
    srcs = ["my_test.sh"],
    ...
    deps = ["//tools/bash/readarray"],
)

The test script sources the readarray shim and then runs the comparison:

#!/bin/bash
#
# my_test.sh: Implement the test case

set -euo pipefail

# Pull in readarray script to handle Bash shells that don't have the readarray builtin
source ./tools/bash/readarray/readarray.bash

# Populate expected values
EXPECTED_VALUES=(
  file1.txt
  file2.txt
  dir1/file3.txt
)

# Populate actual values
readarray -t ACTUAL_VALUES < <(cd "$DIR" && find . -type f | sed -e 's#^\./##')

# Compare expected to actual, exiting with a non-zero code if they are different
ERR=0
MISSING_VALUES=$(comm -13 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$MISSING_VALUES" ]]; then
  echo "ERROR: Following expected files are missing: $MISSING_VALUES" 1>&2
  ERR=1
fi

EXTRA_VALUES=$(comm -23 <(printf '%s\n' "${ACTUAL_VALUES[@]}" | sort) <(printf '%s\n' "${EXPECTED_VALUES[@]}" | sort))
if [[ ! -z "$EXTRA_VALUES" ]]; then
  echo "ERROR: Following actual files are not in expected list: $EXTRA_VALUES" 1>&2
  ERR=1
fi

if [[ $ERR -ne 0 ]]; then
  exit $ERR
fi