Property-based testing with ScalaCheck - Markus Günther IT-Beratung

Property-based testing is a powerful test methodology that is a useful addition to a developer's portfolio. ScalaCheck brings property-based testing to the Java platform. In this article, we explore what property-based testing is, how it differs from traditional unit testing and how it can be integrated into your own test base.

A simple example

Let's talk about a rather artificial scenario to get the idea behind property-based testing across. Imagine that we receive a request from some department of our organization. They request a function that can add two integers together. Nothing could be easier, and we dive straight into implementing the necessary code. The function is written quickly and example-based unit tests ensure that everything works as it should.

class AdderTest extends FlatSpec with Matchers {
  "Adder" should "yield 4 when I add 1 and 3" in {
    add(1, 3) shouldBe 4
  }
  "Adder" should "yield 4 when I add 4 and 0" in {
    add(4, 0) shouldBe 4
  }
}

But shortly after we rolled out our solution, the department got in touch: apparently there are some problems with the adder!

Our analysis shows that the implemented function does not quite correspond to the definition of adding two integers together.

object Adder {
  def add(a: Int, b: Int) = 4
}

Obviously, our example-based test cases do not adequately cover the correct behavior of the adder. You scratch your head, thinking, "But I've done everything that test-driven development suggests: I wrote minimal test cases and derived the implementation from there. Where did I go wrong?" And indeed, the test cases provide sufficient line coverage, but what they lack is a test for the essential properties that every adder must satisfy.

Admittedly, this example is exaggerated, but this is primarily due to the simplicity of the function to be implemented. In practice, such phenomena occur when the specification is unclear and the developer relies on his or her own interpretation of what should be done, without talking to the requester. Of course, this also happens when example-based tests are simply inadequate and check neither edge cases nor the defining characteristics of the implemented logic.

Coming back to the above example: Let us first reason about the specific requirements for the adder. What properties must an adding function satisfy?

We can approach the solution by looking at what distinguishes an adder from other functions, such as a subtractor. Well, an adder should give the same result regardless of the order of the input parameters. We can check for this property with ScalaCheck.

forAll {
  (a: Int, b: Int) => {
    add(a, b) == add(b, a)
  }
}

Let us first concentrate on the forAll method. forAll consumes an anonymous function. In contrast to example-based unit tests, however, we do not pass any concrete numerical values, but rather define two Ints, call them a and b, via the function arguments. Using implicitly defined generators, ScalaCheck can provide randomly generated values for the Int type and inject them into the property without us having to specify how ScalaCheck should generate these values. In the definition of the property, we then call the add function and check for the expected behavior.
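To make the role of the implicit generators a bit more tangible, here is a minimal sketch that states the same property twice: once relying on implicit lookup and once passing the default Int generators explicitly. The object name ExplicitGenerators and the stand-in add implementation are assumptions made only so that the snippet compiles on its own.

```scala
import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object ExplicitGenerators {
  // Stand-in implementation (assumption) so that the sketch is self-contained
  def add(a: Int, b: Int): Int = a + b

  // Implicit form: ScalaCheck looks up the generators for Int by itself
  val implicitForm = forAll { (a: Int, b: Int) =>
    add(a, b) == add(b, a)
  }

  // Explicit form: we pass the default Int generators ourselves
  val explicitForm = forAll(arbitrary[Int], arbitrary[Int]) { (a, b) =>
    add(a, b) == add(b, a)
  }

  def main(args: Array[String]): Unit = {
    // Test.check runs a property and reports whether it held for all inputs
    println(Test.check(Test.Parameters.default, implicitForm).passed)
    println(Test.check(Test.Parameters.default, explicitForm).passed)
  }
}
```

Both forms should behave the same way. The explicit form becomes useful as soon as the default generator for a type produces values that make no sense for the property at hand.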

This property does not yet fully describe the behavior of the adding function, so let us continue. If we add the number 1 twice to a, we expect the result to be the same as if we were to add the number 2 to a only once. This can easily be expressed with a ScalaCheck property.

forAll {
  (a: Int) => {
    add(add(a, 1), 1) == add(a, 2)
  }
}

Last but not least, the addition must handle the neutral element, the number 0, correctly. This property actually checks for two things:

Adding 0 to a number does not change that number.
The adder actually uses its input parameters, because it has to return the other given number in that case.

forAll {
  (a: Int) => {
    add(a, 0) == a
  }
}

These three properties adequately describe the specification of a correct adding function and thus limit the scope for interpretation to such an extent that there is effectively only one single implementation for the adder.

object Adder {
  def add(a: Int, b: Int) = a + b
}
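As a quick sanity check on why all three properties are needed, the following sketch runs them against the faulty constant implementation from the beginning of the article. The helper names (faultyAdd, holds and the three property builders) are made up for this illustration. Note that a function that always returns 4 is trivially commutative and also satisfies the "add 1 twice" property, so only the property about the neutral element is expected to expose the defect.

```scala
import org.scalacheck.Prop
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object FaultyAdderCheck {
  // The faulty implementation from the beginning of the article
  def faultyAdd(a: Int, b: Int): Int = 4

  // The three properties, parameterized over the implementation under test
  def commutative(add: (Int, Int) => Int): Prop =
    forAll { (a: Int, b: Int) => add(a, b) == add(b, a) }

  def addOneTwice(add: (Int, Int) => Int): Prop =
    forAll { (a: Int) => add(add(a, 1), 1) == add(a, 2) }

  def zeroIsNeutral(add: (Int, Int) => Int): Prop =
    forAll { (a: Int) => add(a, 0) == a }

  // Runs a property with default parameters and reports whether it held
  def holds(p: Prop): Boolean =
    Test.check(Test.Parameters.default, p).passed

  def main(args: Array[String]): Unit = {
    println(s"commutative:   ${holds(commutative(faultyAdd))}")
    println(s"addOneTwice:   ${holds(addOneTwice(faultyAdd))}")
    println(s"zeroIsNeutral: ${holds(zeroIsNeutral(faultyAdd))}")
  }
}
```

Passing `_ + _` instead of faultyAdd to the three builders should make all of them hold.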

The complete specification is shown in the following listing.

import org.scalacheck.Prop.forAll
import org.scalacheck.Properties
import Adder.add

object AdderSpec extends Properties("Adder") {
  property("adding two numbers should not depend on parameter order") = forAll {
    (a: Int, b: Int) => {
      add(a, b) == add(b, a)
    }
  }

  property("adding 1 twice is the same as adding 2 once") = forAll {
    (a: Int) => {
      add(add(a, 1), 1) == add(a, 2)
    }
  }

  property("adding zero is the same as doing nothing") = forAll {
    (a: Int) => {
      add(a, 0) == a
    }
  }
}

This specification is not only quite readable, but also executable and testable against our implementation. Nice!

After running the tests in AdderSpec, we see the following message on the console.

[info] + Adder.adding zero is the same as doing nothing: OK, passed 100 tests.
[info] + Adder.adding two numbers should not depend on parameter order: OK, passed 100 tests.
[info] + Adder.adding 1 twice is the same as adding 2 once: OK, passed 100 tests.

The output shows that ScalaCheck has generated one hundred random parameterizations - i.e. one hundred specific test cases - for each property defined in AdderSpec. Using these randomly generated test instances, ScalaCheck tries to falsify each property, thereby uncovering potential defects in our code.
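One hundred is merely the default number of successful test cases that ScalaCheck requires per property. The following sketch raises that number to 500 using Test.Parameters. The object name MoreTestCases and the chosen value are arbitrary, and the property uses plain Int addition as a stand-in for the adder.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object MoreTestCases {
  // Stand-in property: plain Int addition instead of the article's adder
  val zeroIsNeutral = forAll { (a: Int) => a + 0 == a }

  // The default requires 100 successful tests; here we ask for 500
  val params: Test.Parameters =
    Test.Parameters.default.withMinSuccessfulTests(500)

  val result: Test.Result = Test.check(params, zeroIsNeutral)

  def main(args: Array[String]): Unit =
    println(s"passed=${result.passed}, succeeded=${result.succeeded}")
}
```

More test cases increase the chance of hitting a rare counterexample, at the cost of a longer test run.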

Property-based testing can dramatically increase the test coverage of your code base through its generative approach. While this is certainly also possible with well-formulated unit tests, ScalaCheck is a helpful addition to the mix, especially when it comes to testing equivalence classes of your input parameters. ScalaCheck does not generate data purely at random, but tries to exercise your implementation on edge cases for known data types as well. These edge cases include, for example, minimum and maximum values for Ints or empty sets for collections. Edge cases are often overlooked in unit tests that use specific examples.

When writing unit tests, developers tend to tie the test too closely to the specific implementation, which increases the maintenance effort on the test base when changes are made to the implementation. Compared to unit tests, properties are usually formulated at a higher level of abstraction. This makes maintaining our test base a little bit less tedious, as we do not have to change details of the test data input if some business logic changes or the function gets refactored.

Figure 1: One or more properties define the characteristics of the subject under test (SUT). A property-based testing library such as ScalaCheck uses these properties and generates test instances using generators supplied by the developer. Using these test instances, the library attempts to falsify a property of the SUT. If this is successful, the library attempts to find the smallest possible test instance that falsifies the property using the supplied shrinkers (if available).

Generators

Generators are at the heart of ScalaCheck. ScalaCheck uses generators to create test cases for properties. A generator is nothing more than a function that can generate a random value for a specific type. Let us look at the definition of the oneOf generator for example.

def oneOf[T](xs: Seq[T]): Gen[T]

oneOf consumes a list of possible values and returns an object of type Gen[T] as the result. ScalaCheck also provides an overload that accepts the values directly as arguments, which the following example uses. Gen.sample generates a value that satisfies the specification of the generator and wraps it in an Option.

scala> import org.scalacheck.Gen.oneOf
scala> val g = oneOf(-1, 0, 1)
scala> g.sample
res0: Option[Int] = Some(-1)

Calling g.sample ensures that the generator produces a randomly selected value from the value set defined by the generator.
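Outside the REPL, the same idea can be sketched as a small program that draws several samples. The object name SampleDemo is made up for this illustration. The snippet flattens the Option results, since sample returns an Option and a generator that uses filtering may fail to produce a value.

```scala
import org.scalacheck.Gen

object SampleDemo {
  val g: Gen[Int] = Gen.oneOf(-1, 0, 1)

  // Draw 20 samples and keep only those for which a value was produced
  val drawn: List[Int] = List.fill(20)(g.sample).flatten

  def main(args: Array[String]): Unit =
    println(drawn) // the concrete values differ from run to run
}
```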

ScalaCheck already comes with a rich repertoire of generator functions that can generate random data or data subject to certain conditions, from primitive to collection-based types. An overview of frequently used generators is shown in the following table.

Generator	Semantics	Example
`alphaNumStr`	Generates a string of alphanumeric characters of random length.	`alphaNumStr`
`choose`	Selects a random number from a given interval.	`choose(1, 10)`
`oneOf`	Randomly selects a sample instance from a list of object instances.	`oneOf(-1, 0, 1)`
`listOf`	Generates a list of objects based on another given generator. The generated list can be empty.	`listOf(choose(1, 10))`
`listOfN`	Generates a list with exactly `n` elements based on the given generator.	`listOfN(5, choose(1, 10))`
`nonEmptyListOf`	Has the same behavior as `listOf`, but excludes the empty list as a possible result.	`nonEmptyListOf(choose(1, 10))`

This table is by no means exhaustive. ScalaCheck offers many ready-to-use generators. Be sure to check out the Gen object to get the full picture.
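To get a feeling for what these generators produce, one can pass them explicitly to forAll and state the guarantee each of them gives as a property. This is only a sketch. The object name TableGenerators and the names of the individual properties are chosen for illustration.

```scala
import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object TableGenerators {
  val oneToTen: Gen[Int] = Gen.choose(1, 10)

  // alphaNumStr only yields letters and digits
  val alphaNum = forAll(Gen.alphaNumStr) { s => s.forall(_.isLetterOrDigit) }

  // choose respects the (inclusive) bounds of the interval
  val inInterval = forAll(oneToTen) { n => n >= 1 && n <= 10 }

  // listOfN takes the length first, then the element generator
  val exactlyFive = forAll(Gen.listOfN(5, oneToTen)) { xs => xs.size == 5 }

  // nonEmptyListOf never yields the empty list
  val neverEmpty = forAll(Gen.nonEmptyListOf(oneToTen)) { xs => xs.nonEmpty }

  // Combine all four properties into a single one
  val all = alphaNum && inInterval && exactlyFive && neverEmpty

  def main(args: Array[String]): Unit =
    println(Test.check(Test.Parameters.default, all).passed)
}
```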

The generator type Gen uses functional combinators to create complex generators from simple generator functions through concatenation and composition. The most prominent representatives of these functions are map, flatMap, and filter. Using these methods, it is possible to use Gen in for-yield expressions and create more powerful generators from basic building blocks.

def capitalizedString: Gen[String] = for {
  c <- Gen.alphaUpperChar
  s <- Gen.listOf(Gen.alphaLowerChar)
} yield (c :: s).mkString

In this example, we generate a capitalized character string whose characters are generated randomly using the alphaUpperChar, listOf, and alphaLowerChar generators. listOf is used to ensure that a sequence of random length is generated via the value set as generated by alphaLowerChar.
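A composed generator is itself worth a quick check. The following sketch wraps the generator from above in an object (the name CapitalizedStrings is an assumption) and states as a property what we expect from every generated value: a non-empty string that starts with an upper-case letter, followed by lower-case letters only.

```scala
import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object CapitalizedStrings {
  // Same generator as in the article
  val capitalizedString: Gen[String] = for {
    c <- Gen.alphaUpperChar
    s <- Gen.listOf(Gen.alphaLowerChar)
  } yield (c :: s).mkString

  // Every generated string starts upper-case and continues lower-case
  val wellFormed = forAll(capitalizedString) { s =>
    s.nonEmpty && s.head.isUpper && s.tail.forall(_.isLower)
  }

  def main(args: Array[String]): Unit = {
    println(capitalizedString.sample) // differs from run to run
    println(Test.check(Test.Parameters.default, wellFormed).passed)
  }
}
```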

Migrating example-based unit tests to properties

Let us look at an existing unit test that works with a specific test data instance. We would like to rewrite this unit test as a property-based test.

"Repository" should "retain attributes of previously saved user" in {
  val user = User("John Doe", 64823)
  val userId = db.insert(user)
  db.load(userId) shouldBe Some(user)
}

We reduce the example values used in the unit test to the underlying types and pass them to forAll. This yields:

forAll {
  (name: String, postalCode: Int) => {
    val user = User(name, postalCode)
    val userId = db.insert(user)
    db.load(userId) == Some(user)
  }
}

Using implicit generators for types String and Int, ScalaCheck already knows how to generate random values for this property. However, this leads to undesirable behavior in this case, because these generators also produce empty Strings for the name or invalid postal codes. At this point, we would like a more precise definition of what the generators produce. Hence, we define a Username type for the name and represent the postal code as type PostalCode. Both of these types restrict the set of admissible values.
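The article does not show the two types themselves. One possible shape, given purely as an assumption, uses a private constructor together with a validating factory method, so that invalid values cannot be constructed at all. The factory names fromString and fromInt and the value accessor follow the way the types are used in the listings. The five-digit range mirrors the interval 10000 to 99999 used for postal codes in this article, and the enclosing object DomainTypes is made up.

```scala
object DomainTypes {

  // A user name that is guaranteed to be non-empty (hypothetical definition)
  final class Username private (val value: String)

  object Username {
    def fromString(s: String): Username = {
      require(s.nonEmpty, "user name must not be empty")
      new Username(s)
    }
  }

  // A postal code that is guaranteed to have five digits (hypothetical definition)
  final class PostalCode private (val value: Int)

  object PostalCode {
    def fromInt(i: Int): PostalCode = {
      require(i >= 10000 && i <= 99999, "postal code must have five digits")
      new PostalCode(i)
    }
  }

  def main(args: Array[String]): Unit = {
    println(Username.fromString("jdoe").value)
    println(PostalCode.fromInt(64823).value)
  }
}
```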

ScalaCheck does not know how to generate instances of Username or PostalCode unless we implement generators for them. Because forAll resolves its inputs through implicit Arbitrary instances, we wrap each generator in an Arbitrary and make it implicit.

implicit val arbUsername: Arbitrary[Username] = Arbitrary(
  Gen.nonEmptyListOf(Gen.alphaChar)
     .map(chars => Username.fromString(chars.mkString)))

implicit val arbPostalCode: Arbitrary[PostalCode] = Arbitrary(
  Gen.choose(10000, 99999)
     .map(PostalCode.fromInt))

With these implicitly defined generators, ScalaCheck can generate the corresponding random values for the property.

forAll {
  (u: Username, p: PostalCode) => {
    val user = User(u.value, p.value)
    val userId = db.insert(user)
    db.load(userId) == Some(user)
  }
}

Writing such generators can be time-consuming if your types implement nested hierarchies. From my own experience, I can say this time is well spent. Invest in domain-specific generators for your types. Also note that the use of generators is not limited to properties. Traditional unit tests can also benefit from a generative approach. Generators increase the readability of a test immensely, as we can concentrate on the actual testing logic, rather than constructing exhaustive test fixtures for each and every test method.
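For nested types, generators compose in the same way as the types do. The following sketch uses two hypothetical case classes, Address and Customer, that do not appear in the article. The generator for the outer type simply reuses the generator for the inner type.

```scala
import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object NestedGenerators {
  // Hypothetical domain types, for illustration only
  final case class Address(street: String, postalCode: Int)
  final case class Customer(name: String, addresses: List[Address])

  // Reusable building block: a non-empty string of letters
  val nonEmptyAlpha: Gen[String] =
    Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)

  val addressGen: Gen[Address] = for {
    street <- nonEmptyAlpha
    code   <- Gen.choose(10000, 99999)
  } yield Address(street, code)

  // The generator for the outer type builds on the one for the inner type
  val customerGen: Gen[Customer] = for {
    name      <- nonEmptyAlpha
    addresses <- Gen.nonEmptyListOf(addressGen)
  } yield Customer(name, addresses)

  // Every generated customer satisfies the constraints of the domain
  val validCustomers = forAll(customerGen) { c =>
    c.name.nonEmpty &&
    c.addresses.nonEmpty &&
    c.addresses.forall(a => a.postalCode >= 10000 && a.postalCode <= 99999)
  }

  def main(args: Array[String]): Unit =
    println(Test.check(Test.Parameters.default, validCustomers).passed)
}
```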

Reduction

If ScalaCheck has successfully falsified a property, it attempts to find the smallest possible test case that leads to falsification. The best way to illustrate ScalaCheck's behavior is with a simple example. Suppose we have implemented a function that checks whether a passed value is less than 80.

def isLowerThan80(a: Int) = a < 80

For demonstration purposes, we intentionally let the property fail and do not limit the range of Int values generated.

forAll {
  (a: Int) => {
    isLowerThan80(a)
  }
}

It is obvious that ScalaCheck will falsify this property for all inputs greater than or equal to 80. Let's try it out: After running the test, we get a console output similar to the following:

[info] ! Shrinker.isLowerThan80 should yield true for all integers < 80: Falsified after 2 passed tests.
[info] > ARG_0: 80
[info] > ARG_0_ORIGINAL: 100
...
[error] Failed tests:
[error] net.mguenther.pbt.ShrinkingSpec

This output shows that ScalaCheck was able to perform two successful tests until it was finally able to refute the property. ScalaCheck identified the contradiction by entering the value 100 (cf. ARG_0_ORIGINAL) and returned the smallest possible, failing test case with value 80 (cf. ARG_0). When performing the test, ScalaCheck proceeds iteratively and starts, for example, with the following sequence of input parameters: [0, 50, 75, 100, 217, 3048, ...]. ScalaCheck now tries to find counterexamples and goes through the sequence element by element.

isLowerThan80(0)   // evaluates to true
isLowerThan80(50)  // evaluates to true
isLowerThan80(75)  // evaluates to true
isLowerThan80(100) // evaluates to false

With test case 100 (cf. ARG_0_ORIGINAL), ScalaCheck has found the first example that can be used to contradict the property. At this point, ScalaCheck stops generating new random inputs. Instead, it derives smaller candidate values from the failing input 100 and re-runs the property against them. This process is repeated with each smaller failing value until ScalaCheck is not able to find a smaller test input that falsifies the property.
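The reduction step can be observed directly. The sketch below runs the failing property programmatically and additionally asks the built-in shrinker for Int which candidate values it would propose for the input 100. The object name ShrinkingDemo is made up. The snippet assumes the Stream-based Shrink API of ScalaCheck 1.x, and the exact candidates depend on the ScalaCheck version, which is why no expected output is given.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.{Shrink, Test}

object ShrinkingDemo {
  def isLowerThan80(a: Int): Boolean = a < 80

  // Intentionally wrong: claims that every Int is lower than 80
  val prop = forAll { (a: Int) => isLowerThan80(a) }

  val result: Test.Result = Test.check(Test.Parameters.default, prop)

  // The first few candidates the Int shrinker proposes for the value 100
  val candidates: List[Int] = Shrink.shrink(100).take(10).toList

  def main(args: Array[String]): Unit = {
    println(result.status) // contains the shrunk and the original argument
    println(candidates)
  }
}
```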

Test case reduction is a powerful feature of ScalaCheck and can help to identify off-by-one errors at an early stage or at least limit the problem to the smallest possible test input so that the developer can carry out the error analysis in a targeted manner.

Shrinkers perform this reduction. ScalaCheck comes with shrinkers for many data types from the Scala standard library (e.g. number types such as Int or Scala collections). Analogous to the generators, we can implement shrinkers for our domain classes, which ScalaCheck can use for test case reduction. However, whether this is worthwhile depends very much on the subject matter.
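A domain-specific shrinker could look like the following sketch. It reuses the Int shrinker but discards all candidates that are not valid postal codes, so that the reduction never leaves the admissible value set. PostalCode is simplified to a plain case class here, and the names PostalCodeShrinking and isValid are assumptions. The snippet targets the Stream-based Shrink API of ScalaCheck 1.x, which may emit deprecation warnings on newer Scala versions.

```scala
import org.scalacheck.Shrink

object PostalCodeShrinking {
  // Simplified, hypothetical domain type
  final case class PostalCode(value: Int)

  def isValid(i: Int): Boolean = i >= 10000 && i <= 99999

  // Shrink the underlying Int, but keep only valid postal codes
  implicit val shrinkPostalCode: Shrink[PostalCode] =
    Shrink[PostalCode] { pc =>
      Shrink.shrink(pc.value).filter(isValid).map(PostalCode(_))
    }

  // The first few candidates proposed for a given postal code
  val candidates: List[PostalCode] =
    Shrink.shrink(PostalCode(54321)).take(5).toList

  def main(args: Array[String]): Unit =
    println(candidates)
}
```

With such a shrinker in implicit scope, a property over PostalCode values would report a reduced counterexample that is still a valid postal code.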

How do you find suitable properties?

Identifying the right properties for classes and components is sometimes not easy and requires detailed consideration of the requirements of the software solution to be implemented. Nevertheless, there are patterns that can help us in our search for properties. Scott Wlaschin has summarized a whole series of these in a very detailed blog post and illustrated them with code examples in F#. The article is highly recommended!

Figure 2: The inverse operation to an operation x leads us back to the original state.

Figure 2 shows that after applying some operation x to the string "ABC", we obtain a binary-coded value. After applying the inverse operation, we obtain the original string "ABC" again. This can be easily expressed as a ScalaCheck property.

property("there and back again") = forAll {
  (a: String) => {
    decode(encode(a)) == a
  }
}
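To make this property executable, we need a concrete pair of operations. The following sketch uses a deliberately simple, hypothetical encode and decode that turn each character into 16 binary digits and back. The article does not specify the actual operations, so these are stand-ins, as is the object name RoundTrip.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object RoundTrip {
  // Hypothetical encoding: every character becomes 16 binary digits
  def encode(s: String): String =
    s.toList
      .map(c => ("0" * 16 + c.toInt.toBinaryString).takeRight(16))
      .mkString

  // Inverse operation: read 16 digits at a time and turn them into a character
  def decode(bits: String): String =
    bits.grouped(16).map(chunk => Integer.parseInt(chunk, 2).toChar).mkString

  val thereAndBackAgain = forAll { (a: String) =>
    decode(encode(a)) == a
  }

  def main(args: Array[String]): Unit = {
    println(encode("A")) // 0000000001000001
    println(Test.check(Test.Parameters.default, thereAndBackAgain).passed)
  }
}
```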

In practice, this pattern can also be used where we cross system boundaries when executing an operation, for example in relation to database round trips: If we retrieve an entity that was saved immediately beforehand from the database, we expect the attributes of this entity to be completely retained and not to have changed.

The same applies to the integration of external interfaces, whether from a consuming perspective or for transferring data to a neighbouring system. We use self-defined generators to cover the value set of the payload and then use the corresponding properties to test whether the interface specification is met.

You can often also use the intrinsic properties of a class or component as a guide. For example, if an implemented data structure supports commutativity, simple properties can be found that check for compliance with this property (cf. figure 3).

Figure 3: Starting from the initial state, we apply two operations in differing order, but reach the same target state.

Figure 3 shows that, starting from a list structure with values 2, 3, and 1, we arrive at the same target state 2, 3, and 4, regardless of whether we first sort all elements and then increase each element by the value 1, or vice versa.
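Expressed as a property, the scenario from figure 3 might look like the sketch below. The object and value names are chosen for illustration. One detail is worth pointing out: the generator deliberately restricts the elements to a small interval. With unrestricted Ints, incrementing Int.MaxValue overflows to a negative number, which changes the sort order, so the two operations no longer commute.

```scala
import org.scalacheck.Gen
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object SortAndIncrement {
  // Bounded values avoid Int overflow when incrementing
  val smallInts: Gen[List[Int]] = Gen.listOf(Gen.choose(-1000, 1000))

  // Sorting and incrementing can be applied in either order
  val commutes = forAll(smallInts) { xs =>
    xs.sorted.map(_ + 1) == xs.map(_ + 1).sorted
  }

  def main(args: Array[String]): Unit = {
    println(List(2, 3, 1).sorted.map(_ + 1)) // List(2, 3, 4)
    println(List(2, 3, 1).map(_ + 1).sorted) // List(2, 3, 4)
    println(Test.check(Test.Parameters.default, commutes).passed)
  }
}
```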

Of course, this does not end with commutativity. Especially in the context of distributed systems, we are often faced with the challenge of designing interfaces in such a way that they work idempotently. At this point, idempotency is a suitable means of moving from an at-least-once processing semantics to an effectively-once processing semantics. In this case, too, properties can help us to check for compliance with this property.

Figure 4: Regardless of whether we perform an operation once or several times, we remain in the same target state (idempotency).
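Idempotency translates into a property just as directly: applying the operation twice must yield the same result as applying it once. The sketch below uses a hypothetical normalize function that removes duplicates and sorts a list. Any other idempotent operation would serve equally well.

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.Test

object Idempotency {
  // Hypothetical idempotent operation: remove duplicates, then sort
  def normalize(xs: List[Int]): List[Int] = xs.distinct.sorted

  // Applying the operation a second time does not change the result
  val idempotent = forAll { (xs: List[Int]) =>
    normalize(normalize(xs)) == normalize(xs)
  }

  def main(args: Array[String]): Unit =
    println(Test.check(Test.Parameters.default, idempotent).passed)
}
```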

Conclusion

With ScalaCheck, property-based testing has found its way into Scala-based development and adds a powerful test methodology to the developer's toolbox. The property-based approach is by no means to be understood as a replacement for example-based testing using unit tests, but rather as a supplement, especially since property-based tests are formulated more generally and usually identify other classes of errors.

It must be noted, however, that suitable properties are generally more difficult to identify than sample test instances for unit tests. In addition to unclear technical requirements and specifications, one reason for this is that it is quite simply more demanding to express criteria in the form of invariants than to consider a bunch of sample instances. Nevertheless, the effort is worth it, because it helps us as developers to scrutinize the requirements and better understand what we are ultimately implementing.

Despite all the praise, criticism of property-based testing should not be ignored. Some people may be bothered by randomly generated values that have nothing to do with reality (character strings such as *%JSfdsl as input for a user name or similar). The recommendation here is: Don't work with nonsensical values, but invest the time in meaningful generators that fit your specialized domain.

Some people may be bothered by the lack of regression capability: ScalaCheck may be able to falsify a property on one test run, but not on the next. At this point, a sensible combination with an example-based test is essential: from the input of the falsified property, we derive a concrete unit test that makes our test base regression-proof. This is by no means to be understood as an argument against property-based testing. It is questionable whether we would have found a suitable unit test for such a special input right from the start. It rather suggests that both test methodologies complement each other wonderfully.

References

[1] Nilsson, R., ScalaCheck: Property-based Testing for Scala, https://www.scalacheck.org
[2] Nilsson, R., ScalaCheck - The Definitive Guide, Artima Press, 2014
[3] Wlaschin, S., Choosing Properties for Property-Based Testing, https://fsharpforfunandprofit.com/posts/property-based-testing-2/, 2014