Introduction to Snapshot Testing in R

Indrajeet Patil

Unit testing

The goal of a unit test is to capture the expected output of a function in code, and to make sure that the actual output after any changes matches that expected output.

{testthat} is a popular framework for writing unit tests in R.

Benefits of unit testing

  • insures against unintentionally changing function behaviour
  • prevents re-introducing already fixed bugs
  • acts as the most basic form of developer-focused documentation
  • catches breaking changes coming from upstream dependencies
  • etc.

Test output

Tests pass only when the actual function behaviour matches the expected behaviour.


Unit testing with {testthat}: A recap

Test organization

The testing infrastructure for an R package has the following hierarchy:

Component     Role

Test file     Tests for code situated in R/foo.R will typically be in the file tests/testthat/test-foo.R.
Tests         A single file can contain multiple tests.
Expectations  A single test can have multiple expectations that formalize what you expect the code to do.

Example test file

  • Every test is a call to the testthat::test_that() function.

  • Every expectation is represented by a testthat::expect_*() function.

  • You can generate a test file using the usethis::use_test() function, as shown below.
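For instance, the example test file shown below could be generated with this one-liner (a usage sketch):

usethis::use_test("op") # creates and opens tests/testthat/test-op.R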

# File: tests/testthat/test-op.R

# test-1
test_that("multiplication works", {
  expect_equal(2 * 2, 4) # expectation-1
  expect_equal(-2 * 2, -4) # expectation-2
})

# test-2
test_that("addition works", {
  expect_equal(2 + 2, 4) # expectation-1
  expect_equal(-2 + 2, 0) # expectation-2
})

...
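During development, you would run these tests with the usual testthat tooling; a minimal sketch:

# run the package's entire test suite
devtools::test()

# or run a single test file
testthat::test_file("tests/testthat/test-op.R")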

What is different about snapshot testing?

A unit test records the code to describe expected output.




A snapshot test records expected output in a separate, human-readable file.



Why do you need snapshot testing?

If you develop R packages and have struggled to

  • test that text output prints as expected
  • test that an entire file is as expected
  • test that generated graphical output looks as expected
  • update such tests en masse

then you should be excited to know more about snapshot tests (aka golden tests)! 🤩

Prerequisites

Familiarity with writing unit tests using {testthat}.

If not, have a look at this chapter from the R Packages book.


Nota bene

In the following slides, in all snapshot tests, I include the following line of code:

local_edition(3)

You don’t need to do this in your package tests!

It’s sufficient to update the DESCRIPTION file to use testthat 3rd edition:

Config/testthat/edition: 3
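Alternatively, usethis provides a helper that writes this setting for you; a sketch:

usethis::use_testthat(3) # sets up testthat and opts into the 3rd edition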

For more, see this article.

Testing text outputs

Snapshot tests can be used to test that text output prints as expected.

Important for testing functions that pretty-print R objects to the console, create elegant and informative exceptions, etc.

Example function

Let’s say you want to write a unit test for the following function:

Source code

print_movies <- function(keys, values) {
  paste0(
    "Movie: \n",
    paste0("  ", keys, ": ", values, collapse = "\n")
  )
}

Output

cat(print_movies(
  c("Title", "Director"),
  c("Salaam Bombay!", "Mira Nair")
))
Movie: 
  Title: Salaam Bombay!
  Director: Mira Nair


Note that you want to test that the printed output looks as expected.

Therefore, you need to check for all the little bells and whistles in the printed output.

Example test

Even testing this simple function is a bit painful because you need to keep track of every escape character, every space, etc.

test_that("`print_movies()` prints as expected", {
  expect_equal(
    print_movies(
      c("Title", "Director"),
      c("Salaam Bombay!", "Mira Nair")
    ),
    "Movie: \n  Title: Salaam Bombay!\n  Director: Mira Nair"
  )
})
Test passed 🎊

With more complex code, it’d be nearly impossible for a human to reason about what the output is supposed to look like.

Important

If this is a utility function used by many other functions, changing its behaviour would entail manually changing expected outputs for many tests.

This is not maintainable! 😩

Alternative: Snapshot test

Instead, you can use expect_snapshot(), which, when run for the first time, generates a Markdown file with expected/reference output.

test_that("`print_movies()` prints as expected", {
  local_edition(3)
  expect_snapshot(cat(print_movies(
    c("Title", "Director"),
    c("Salaam Bombay!", "Mira Nair")
  )))
})
── Warning: `print_movies()` prints as expected ────────────────────────────────
Adding new snapshot:
Code
  cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
Output
  Movie: 
    Title: Salaam Bombay!
    Director: Mira Nair

Warning

The first time a snapshot is created, it becomes the truth against which future function behaviour will be compared.

Thus, it is crucial that you carefully check that the output is indeed as expected. 🔎

Human-readable Markdown file

Compared to your unit test code representing the expected output

"Movie: \n  Title: Salaam Bombay!\n  Director: Mira Nair"

notice how much more human-friendly the Markdown output is!

Code
  cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
Output
  Movie: 
    Title: Salaam Bombay!
    Director: Mira Nair

It is easy to see what the printed text output is supposed to look like. In other words, snapshot tests are useful when the intent of the code can only be verified by a human.

More about snapshot Markdown files

  • If the test file is called test-foo.R, the snapshot will be saved to tests/testthat/_snaps/foo.md.

  • If there are multiple snapshot tests in a single file, corresponding snapshots will also share the same .md file.

  • By default, expect_snapshot() will capture the code, the object values, and any side-effects.
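For instance, printed values and message side-effects are both captured in the same snapshot; a small sketch:

test_that("side-effects are captured", {
  local_edition(3)
  expect_snapshot({
    x <- c(1, 2, 3)
    print(mean(x)) # printed value is recorded
    message("computed the mean") # message side-effect is recorded too
  })
})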

What test success looks like

If you run the test again, it’ll succeed:

test_that("`print_movies()` prints as expected", {
  local_edition(3)
  expect_snapshot(cat(print_movies(
    c("Title", "Director"),
    c("Salaam Bombay!", "Mira Nair")
  )))
})
Test passed 😸


Why does my test fail on a re-run?

If a snapshot test you just generated fails when re-run, this is most likely because your test is not deterministic, e.g. if your function relies on random number generation.

In such cases, setting a seed (e.g. set.seed(42)) should help.
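A minimal sketch of pinning down the randomness before snapshotting:

test_that("random output is stable", {
  local_edition(3)
  set.seed(42) # fix the RNG state so the snapshot is reproducible
  expect_snapshot(rnorm(3))
})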

What test failure looks like

When the function changes, the snapshot no longer matches the reference, and the test fails:

Changes to function

print_movies <- function(keys, values) {
  paste0(
    "Movie: \n",
    paste0(
      "  ", keys, "- ", values,
      collapse = "\n"
    )
  )
}


The failure message provides a diff of expected (-) vs observed (+) output.

Test failure

test_that("`print_movies()` prints as expected", {
  expect_snapshot(cat(print_movies(
    c("Title", "Director"),
    c("Salaam Bombay!", "Mira Nair")
  )))
})
── Failure: `print_movies()` prints as expected ────────────────────────────────
Snapshot of code has changed:
old[2:6] vs new[2:6]
    cat(print_movies(c("Title", "Director"), c("Salaam Bombay!", "Mira Nair")))
  Output
    Movie: 
-     Title: Salaam Bombay!
+     Title- Salaam Bombay!
-     Director: Mira Nair
+     Director- Mira Nair

* Run `testthat::snapshot_accept('slides.qmd')` to accept the change.
* Run `testthat::snapshot_review('slides.qmd')` to interactively review the change.
Error:
! Test failed

Fixing tests

The message accompanying a failed test makes it explicit how to fix it.

  • If the change was deliberate, you can accept the new snapshot as the current truth.
* Run `snapshot_accept('foo.md')` to accept the change
  • If this was unexpected, you can review the changes, and decide whether to change the snapshot or to correct the function behaviour instead.
* Run `snapshot_review('foo.md')` to interactively review the change


Fixing multiple snapshot tests

If this is a utility function used by many other functions, changing its behaviour would lead to failure of many tests.

You can update all new snapshots with snapshot_accept(). And, of course, check the diffs to make sure that the changes are expected.
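A usage sketch (the optional argument filters to snapshots from a particular test file):

# accept every pending snapshot change in the package
testthat::snapshot_accept()

# or accept only the changes for snapshots from test-foo.R
testthat::snapshot_accept("foo")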

Capturing messages and warnings

So far you have tested text output printed to the console, but you can also use snapshots to capture messages, warnings, and errors.

message

f <- function() message("Some info for you.")

test_that("f() messages", {
  local_edition(3)
  expect_snapshot(f())
})
── Warning: f() messages ───────────────────────────────────────────────────────
Adding new snapshot:
Code
  f()
Message
  Some info for you.

warning

g <- function() warning("Managed to recover.")

test_that("g() warns", {
  local_edition(3)
  expect_snapshot(g())
})
── Warning: g() warns ──────────────────────────────────────────────────────────
Adding new snapshot:
Code
  g()
Condition
  Warning in `g()`:
  Managed to recover.

Tip

The snapshot records both the condition and the corresponding message.

You can now rest assured that the users are getting informed the way you want! 😌

Capturing errors

In case of an error, the function expect_snapshot() itself will produce an error. You have two ways around this:

Option-1 (recommended)

test_that("`log()` errors", {
  local_edition(3)
  expect_snapshot(log("x"), error = TRUE)
})
── Warning: `log()` errors ─────────────────────────────────────────────────────
Adding new snapshot:
Code
  log("x")
Condition
  Error in `log()`:
  ! non-numeric argument to mathematical function

Option-2

test_that("`log()` errors", {
  local_edition(3)
  expect_snapshot_error(log("x"))
})
── Warning: `log()` errors ─────────────────────────────────────────────────────
Adding new snapshot:
non-numeric argument to mathematical function

Which option should I use?

  • If you want to capture both the code and the error message, use expect_snapshot(..., error = TRUE).

  • If you want to capture only the error message, use expect_snapshot_error().


Testing graphical outputs

To create graphical expectations, you will use the testthat extension package {vdiffr}.

How does {vdiffr} work?

vdiffr introduces expect_doppelganger() to generate testthat expectations for graphics. It does this by writing SVG snapshot files of the outputs!

The figure to test can be a ggplot object, a recorded plot, or more generally any object with a print method.

Note

  • If the test file is called test-foo.R, the snapshots will be saved to the tests/testthat/_snaps/foo folder.

  • In this folder, there will be one .svg file for every test in test-foo.R.

  • The name of the .svg file will be a sanitized version of the title argument to expect_doppelganger().

Example function

Let’s say you want to write a unit test for the following function:

Source code

library(ggplot2)

create_scatter <- function() {
  ggplot(mtcars, aes(wt, mpg)) +
    geom_point(size = 3, alpha = 0.75) +
    geom_smooth(method = "lm")
}

Output

create_scatter()

Note that you want to test that the graphical output looks as expected, and this expectation is difficult to capture with a unit test.

Graphical snapshot test

You can use expect_doppelganger() from vdiffr to test this!

The first time you run the test, it’d generate an .svg file with expected output.

test_that("`create_scatter()` plots as expected", {
  local_edition(3)
  expect_doppelganger(
    title = "create scatter",
    fig = create_scatter(),
  )
})
── Warning: `create_scatter()` plots as expected ───────────────────────────────
Adding new file snapshot: 'tests/testthat/_snaps/create-scatter.svg'

Warning

The first time a snapshot is created, it becomes the truth against which future function behaviour will be compared.

Thus, it is crucial that you carefully check that the output is indeed as expected. 🔎

You can open .svg snapshot files in a web browser for closer inspection.
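For example, with the snapshot created above (path taken from the test output):

# open the SVG snapshot in the default browser for a closer look
utils::browseURL("tests/testthat/_snaps/create-scatter.svg")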

What test success looks like

If you run the test again, it’ll succeed:

test_that("`create_scatter()` plots as expected", {
  local_edition(3)
  expect_doppelganger(
    title = "create scatter",
    fig = create_scatter(),
  )
})
Test passed 🥇

What test failure looks like

When the function changes, the snapshot no longer matches the reference, and the test fails:

Changes to function

create_scatter <- function() {
  ggplot(mtcars, aes(wt, mpg)) +
    geom_point(size = 2, alpha = 0.85) +
    geom_smooth(method = "lm")
}

Test failure

test_that("`create_scatter()` plots as expected", {
  local_edition(3)
  expect_doppelganger(
    title = "create scatter",
    fig = create_scatter(),
  )
})

── Failure ('<text>:3'): `create_scatter()` plots as expected ──────────────────
Snapshot of `testcase` to 'slides.qmd/create-scatter.svg' has changed
Run `testthat::snapshot_review('slides.qmd/')` to review changes
Backtrace:
 1. vdiffr::expect_doppelganger(...)
 3. testthat::expect_snapshot_file(...)
Error in `reporter$stop_if_needed()`:
! Test failed

Fixing tests

Running snapshot_review() launches a Shiny app which can be used to either accept or reject the new output(s).

Why are my snapshots for plots failing?! 😔

If tests fail even though the function didn’t change, it can be due to any of the following reasons:

  • R’s graphics engine changed
  • ggplot2 itself changed
  • non-deterministic behaviour
  • changes in system libraries

For these reasons, snapshot tests for plots tend to be fragile and are not run on CRAN machines by default.


Testing entire files

Whole file snapshot testing makes sure that media, data frames, text files, etc. are as expected.

Writing test

Let’s say you want to test JSON files generated by jsonlite::write_json().

Test

# File: tests/testthat/test-write-json.R
test_that("json writer works", {
  local_edition(3)
  
  r_to_json <- function(x) {
    path <- tempfile(fileext = ".json")
    jsonlite::write_json(x, path)
    path
  }

  x <- list(1, list("x" = "a"))
  expect_snapshot_file(r_to_json(x), "demo.json")
})
── Warning: json writer works ──────────────────────────────────────────────────
Adding new file snapshot: 'tests/testthat/_snaps/demo.json'

Snapshot

[[1],{"x":["a"]}]

Note

  • To snapshot a file, you need to write a helper function that provides its path.

  • If a test file is called test-foo.R, the snapshot will be saved to the tests/testthat/_snaps/foo folder.

  • In this folder, there will be one file (e.g. .json) for every expect_snapshot_file() expectation in test-foo.R.

  • The name of the snapshot file is taken from the name argument to expect_snapshot_file().

What test success looks like

If you run the test again, it’ll succeed:

# File: tests/testthat/test-write-json.R
test_that("json writer works", {
  local_edition(3)
  
  r_to_json <- function(x) {
    path <- tempfile(fileext = ".json")
    jsonlite::write_json(x, path)
    path
  }

  x <- list(1, list("x" = "a"))
  expect_snapshot_file(r_to_json(x), "demo.json")
})
Test passed 🎉

What test failure looks like

If the new output doesn’t match the expected one, the test will fail:

# File: tests/testthat/test-write-json.R
test_that("json writer works", {
  local_edition(3)
  
  r_to_json <- function(x) {
    path <- tempfile(fileext = ".json")
    jsonlite::write_json(x, path)
    path
  }

  x <- list(1, list("x" = "b"))
  expect_snapshot_file(r_to_json(x), "demo.json")
})
── Failure: json writer works ──────────────────────────────────────────────────
Snapshot of `r_to_json(x)` to 'slides.qmd/demo.json' has changed
Run `testthat::snapshot_review('slides.qmd/')` to review changes
Error:
! Test failed

Fixing tests

Running snapshot_review() launches a Shiny app which can be used to either accept or reject the new output(s).

Further reading

Documentation for expect_snapshot_file()

Testing Shiny applications

To write formal tests for Shiny applications, you will use the testthat extension package {shinytest2}.

How does {shinytest2} work?

shinytest2 uses a Shiny app (how meta! 😅) to record user interactions with the app and generate snapshots of the application’s state. Future behaviour of the app will be compared against these snapshots to check for any changes.

Exactly how tests for Shiny apps in an R package are written depends on how the app is stored. There are two possibilities, and they are discussed separately below.


Stored in /inst folder

├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       └── app.R

Returned by a function

├── DESCRIPTION
├── R
│   └── app-function.R

Shiny app in subdirectory


├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       └── app.R

Example app

Let’s say this app resides in the inst/unitConverter/app.R file.

App

Code

library(shiny)

ui <- fluidPage(
  titlePanel("Convert kilograms to grams"),
  numericInput("kg", "Weight (in kg)", value = 0),
  textOutput("g")
)

server <- function(input, output, session) {
  output$g <- renderText(
    paste0("Weight (in g): ", input$kg * 1000)
  )
}

shinyApp(ui, server)

Generating a test

To create a snapshot test, go to the app directory and run record_test().
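A usage sketch (assuming the app directory from above):

# launches a recorder app; your interactions are written out as a test script
shinytest2::record_test("inst/unitConverter")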

Auto-generated artifacts

Test

library(shinytest2)

test_that("{shinytest2} recording: unitConverter", {
  app <- AppDriver$new(
    name = "unitConverter", height = 543, width = 426
  )
  app$set_inputs(kg = 1)
  app$set_inputs(kg = 10)
  app$expect_values()
})

Snapshot

Note

  • record_test() will auto-generate a test file in the app directory. The test script will be saved in a subdirectory of the app (inst/my-app/tests/testthat/test-shinytest2.R).

  • There will be one /tests folder inside every app folder.

  • The snapshots are saved as .json and .png files in tests/testthat/_snaps/{.variant}/shinytest2. The {.variant} here corresponds to the operating system and R version used to record the tests. For example, _snaps/windows-4.1/shinytest2.

Creating a driver script

Note that currently your test scripts and results are in the /inst folder, but you’d also want to run these tests automatically using testthat.

For this, you will need to write a driver script like the following:

library(shinytest2)

test_that("`unitConverter` app works", {
  appdir <- system.file(package = "package_name", "unitConverter")
  test_app(appdir)
})

Now the Shiny apps will be tested with the rest of the source code in the package! 🎊

Tip

Save the driver test in the /tests folder (tests/testthat/test-inst-apps.R), alongside other tests.

What test failure looks like

Let’s say, while updating the app, you make a mistake, which leads to a failed test.

Changed code with mistake

ui <- fluidPage(
  titlePanel("Convert kilograms to grams"),
  numericInput("kg", "Weight (in kg)", value = 0),
  textOutput("g")
)

server <- function(input, output, session) {
  output$g <- renderText(
    paste0("Weight (in kg): ", input$kg * 1000) # should be `"Weight (in g): "`
  )
}

shinyApp(ui, server)

Test failure JSON diff

Diff in snapshot file `shinytest2/unitConverter-001.json`
< before                            > after                           
@@ 4,5 @@                           @@ 4,5 @@                         
    },                                  },                            
    "output": {                         "output": {                   
<     "g": "Weight (in g): 10000"   >     "g": "Weight (in kg): 10000"
    },                                  },                            
    "export": {                         "export": {    

Updating snapshots

Fixing this test will be similar to fixing any other snapshot test you’ve seen thus far.

{shinytest2} provides a Shiny app for comparing the old and new snapshots.

Function returns Shiny app


├── DESCRIPTION
├── R
│   └── app-function.R

Example app and test

The only difference in the testing workflow when Shiny app objects are created by functions is that you write the test yourself, instead of shinytest2 auto-generating it.

Source code

# File: R/unit-converter.R
unitConverter <- function() {
  ui <- fluidPage(
    titlePanel("Convert kilograms to grams"),
    numericInput("kg", "Weight (in kg)", value = 0),
    textOutput("g")
  )

  server <- function(input, output, session) {
    output$g <- renderText(
      paste0("Weight (in g): ", input$kg * 1000)
    )
  }

  shinyApp(ui, server)
}

Test file to modify

# File: tests/testthat/test-unit-converter.R
test_that("unitConverter app works", {
  shiny_app <- unitConverter()
  app <- AppDriver$new(shiny_app)
})

Generating test and snapshots

You call record_test() directly on a Shiny app object, copy the recorded commands into the test script, and run devtools::test_active_file() to generate snapshots.
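After recording, the completed test might look something like this (a sketch; the exact commands come from your recording):

# File: tests/testthat/test-unit-converter.R
test_that("unitConverter app works", {
  shiny_app <- unitConverter()
  app <- AppDriver$new(shiny_app)
  app$set_inputs(kg = 10) # simulate the user entering a weight
  app$expect_values() # snapshot the app's input/output state
})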

Testing apps from frameworks

This testing workflow is also relevant for app frameworks (e.g. {golem}, {rhino}, etc.).

golem

Function in run_app.R returns app.

├── DESCRIPTION
├── NAMESPACE
├── R
│   ├── app_config.R
│   ├── app_server.R
│   ├── app_ui.R
│   └── run_app.R

rhino

Function in app.R returns app.

├── app
│   ├── js
│   │   └── index.js
│   ├── logic
│   │   └── __init__.R
│   ├── static
│   │   └── favicon.ico
│   ├── styles
│   │   └── main.scss
│   ├── view
│   │   └── __init__.R
│   └── main.R
├── tests
│   └── ...
├── app.R
├── RhinoApplication.Rproj
├── dependencies.R
├── renv.lock
└── rhino.yml

Final directory structure

The final location of the tests and snapshots should look like the following for the two possible ways Shiny apps are included in R packages.


Stored in /inst folder

├── DESCRIPTION
├── R
├── inst
│   └── sample_app
│       ├── app.R
│       └── tests
│           ├── testthat
│           │   ├── _snaps
│           │   │   └── shinytest2
│           │   │       └── 001.json
│           │   └── test-shinytest2.R
│           └── testthat.R
└── tests
    ├── testthat
    │   └── test-inst-apps.R
    └── testthat.R

Returned by a function

├── DESCRIPTION
├── R
│   └── app-function.R
└── tests
    ├── testthat
    │   ├── _snaps
    │   │   └── app-function
    │   │       └── 001.json
    │   └── test-app-function.R
    └── testthat.R

Testing multiple apps

For the sake of completeness, here is what the test directory structure would look like when there are multiple apps in a single package.

Stored in /inst folder

├── DESCRIPTION
├── R
├── inst
│   ├── sample_app1
│   │   ├── app.R
│   │   └── tests
│   │       ├── testthat
│   │       │   ├── _snaps
│   │       │   │   └── shinytest2
│   │       │   │       └── 001.json
│   │       │   └── test-shinytest2.R
│   │       └── testthat.R
│   └── sample_app2
│       ├── app.R
│       └── tests
│           ├── testthat
│           │   ├── _snaps
│           │   │   └── shinytest2
│           │   │       └── 001.json
│           │   └── test-shinytest2.R
│           └── testthat.R
└── tests
    ├── testthat
    │   └── test-inst-apps.R
    └── testthat.R

Returned by a function

├── DESCRIPTION
├── R
│   ├── app-function1.R
│   └── app-function2.R
└── tests
    ├── testthat
    │   ├── _snaps
    │   │   ├── app-function1
    │   │   │   └── 001.json
    │   │   └── app-function2
    │   │       └── 001.json
    │   ├── test-app-function1.R
    │   └── test-app-function2.R
    └── testthat.R

Advanced topics

The following are some advanced topics that are beyond the scope of the current presentation, but you may wish to know more about them.

Extra

  • If you want to test Shiny apps with continuous integration using shinytest2, read this article.

  • shinytest2 is a successor to the shinytest package. If you want to migrate from the latter to the former, have a look at this.


Headaches

It’s not all kittens and roses when it comes to snapshot testing.

Let’s see some issues you might run into while using them. 🤕

Testing behavior that you don’t own

Let’s say you write a graphical snapshot test for a function that produces a ggplot object. If ggplot2 authors make some modifications to this object, your tests will fail, even though your function works as expected!

In other words, your tests are now at the mercy of other package authors because snapshots are capturing things beyond your package’s control.

Caution

Tests that fail for reasons other than what they are testing for are problematic. Thus, be careful about what you snapshot and keep in mind the maintenance burden that comes with dependencies with volatile APIs.

Note

A way to reduce the burden of keeping snapshots up-to-date is to automate this process. But there is no free lunch in this universe, and now you need to maintain this automation! 🤷

Failures in non-interactive environments

If snapshots fail locally, you can just run snapshot_review(), but what if they fail in non-interactive environments (on CI/CD platforms, during R CMD Check, etc.)?

The easiest solution is to copy the new snapshots to the local folder and run snapshot_review().

Tip

If the expected snapshot is called (e.g.) foo.svg, there will be a new snapshot file foo.new.svg in the same folder when the test fails.

snapshot_review() compares these files to reveal how the outputs have changed.
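A usage sketch (the optional argument narrows the review to snapshots from one test file):

# interactively compare old vs new snapshots in a Shiny app
testthat::snapshot_review("foo")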

But where can you find the new snapshots?

Accessing new snapshots

In a local R CMD check, you can find new snapshots in the .Rcheck folder:

package_name.Rcheck/tests/testthat/_snaps/

On CI/CD platforms, you can find new snapshots among the build artifacts, provided you upload them:

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true

Code review with snapshot tests

Despite snapshot tests making the expected outputs more human-readable, reviewing snapshot changes can still be challenging when the change is big enough and the output complex enough.

How do you review pull requests with complex snapshot changes?

Use the code-review tools provided by hosting platforms (like GitHub), which can render diffs for files such as SVG snapshots.

Alternatively, check out the PR branch locally and play around with the changes to see if the new behaviour makes sense. 🤷

Danger of silent failures

Given their fragile nature, snapshot tests are skipped on CRAN by default.

Although this makes sense, it means that you will miss anything short of a breaking change in an upstream dependency. E.g., if ggplot2 (hypothetically) changes how the points look, you won’t know about this change until you happen to run your snapshot tests again locally or on CI/CD.

Unit tests run on CRAN, on the other hand, will fail and you will be immediately informed about it.

Tip

A way to insure against such silent failures is to run tests daily on CI/CD platforms (e.g. AppVeyor nightly builds).

Parting wisdom

What not to do

Don’t use snapshot tests for everything

It is tempting to use them everywhere out of laziness. But they are sometimes inappropriate (e.g. when testing requires external benchmarking).

Let’s say you write a function to extract estimates from a regression model.

extract_est <- function(m) m$coefficients

Its test should compare results against an external benchmark, and not a snapshot.

Good

test_that("`extract_est()` works", {
  m <- lm(wt ~ mpg, mtcars)
  expect_equal(
    extract_est(m)[[1]],
    m$coefficients[[1]]
  )
})

Bad

test_that("`extract_est()` works", {
  m <- lm(wt ~ mpg, mtcars)
  expect_snapshot(extract_est(m))
})

Snapshot for humans, not machines

Snapshot testing is appropriate when the human needs to be in the loop to make sure that things are working as expected. Therefore, the snapshots should be human readable.

E.g. if you write a function that plots something:

point_plotter <- function() ggplot(mtcars, aes(wt, mpg)) + geom_point()

To test it, you should snapshot the plot, and not the underlying data, which is hard to make sense of for a human.

Good

test_that("`point_plotter()` works", {
  expect_doppelganger(
    "point-plotter-mtcars",
    point_plotter()
  )
})

Bad

test_that("`point_plotter()` works", {
  p <- point_plotter()
  pb <- ggplot_build(p)
  expect_snapshot(pb$data)
})

Don’t blindly accept snapshot changes

Resist formation of such a habit.

testthat provides tools to make it very easy to review changes, so no excuses!

Self-study

In this presentation, the examples and the tests were deliberately kept simple.

To see a more realistic usage of snapshot tests, you can study open-source test suites.

Suggested repositories

  • {cli} (for testing command line interfaces)

  • {pkgdown} (for testing generated HTML documents)

  • {dbplyr} (for testing printing of generated SQL queries)

  • {gt} (for testing table printing)


Activities

If you feel confident enough to contribute to open-source projects to practice these skills, here are some options.

Practice makes perfect

These are only suggestions. Feel free to contribute to any project you like! 🤝

Suggestions

  • See if ggplot2 extensions you like use snapshot tests for graphics. If not, you can add them for key functions.

  • Check out hard reverse dependencies of {shiny}, and add snapshot tests using shinytest2 to an app of your liking.

  • Add more vdiffr snapshot tests to plotting functions in {see}, a library for statistical visualizations (I can chaperone your PRs here).

  • shinytest2 is the successor to the shinytest package. Check out which packages currently use shinytest for testing Shiny apps, and see if you can port them to shinytest2 instead (see the how-to here).

General reading

Although the current presentation focused only on snapshot testing, here is some reading material on automated testing in general.

  • McConnell, S. (2004). Code Complete. Microsoft Press. (pp. 499-533)

  • Boswell, D., & Foucher, T. (2011). The Art of Readable Code. O’Reilly Media, Inc. (pp. 149-162)

  • Riccomini, C., & Ryaboy D. (2021). The Missing Readme. No Starch Press. (pp. 89-108)

  • Martin, R. C. (2009). Clean Code. Pearson Education. (pp. 121-133)

  • Fowler, M. (2018). Refactoring. Addison-Wesley Professional. (pp. 85-100)

  • Beck, K. (2003). Test-Driven Development. Addison-Wesley Professional.

Additional resources

For a comprehensive collection of packages for unit testing in R, see this page.

For more

If you are interested in good programming and software development practices, check out my other slide decks.

Find me at…

Twitter

LinkedIn

GitHub

Website

E-mail

Thank You

And Happy Snapshotting! 😊

Session information

quarto::quarto_version()
[1] '1.5.30'
sessioninfo::session_info(include_base = TRUE)
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.3 (2024-02-29)
 os       Ubuntu 22.04.4 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2024-04-21
 pandoc   3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 base        * 4.3.3   2024-04-18 [3] local
 brio          1.1.4   2023-12-10 [1] RSPM
 cli           3.6.2   2023-12-11 [1] RSPM
 colorspace    2.1-0   2023-01-23 [1] RSPM
 compiler      4.3.3   2024-04-18 [3] local
 crayon        1.5.2   2022-09-29 [1] RSPM
 datasets    * 4.3.3   2024-04-18 [3] local
 desc          1.4.3   2023-12-10 [1] RSPM
 diffobj       0.3.5   2021-10-05 [1] RSPM
 digest        0.6.35  2024-03-11 [1] RSPM
 evaluate      0.23    2023-11-01 [1] RSPM
 fansi         1.0.6   2023-12-08 [1] RSPM
 farver        2.1.1   2022-07-06 [1] RSPM
 fastmap       1.1.1   2023-02-24 [1] RSPM
 ggplot2     * 3.5.0   2024-02-23 [1] RSPM
 glue          1.7.0   2024-01-09 [1] RSPM
 graphics    * 4.3.3   2024-04-18 [3] local
 grDevices   * 4.3.3   2024-04-18 [3] local
 grid          4.3.3   2024-04-18 [3] local
 gtable        0.3.4   2023-08-21 [1] RSPM
 htmltools     0.5.8.1 2024-04-04 [1] RSPM
 jsonlite      1.8.8   2023-12-04 [1] RSPM
 knitr         1.46    2024-04-06 [1] RSPM
 labeling      0.4.3   2023-08-29 [1] RSPM
 later         1.3.2   2023-12-06 [1] RSPM
 lattice       0.22-5  2023-10-24 [3] CRAN (R 4.3.3)
 lifecycle     1.0.4   2023-11-07 [1] RSPM
 magrittr      2.0.3   2022-03-30 [1] RSPM
 Matrix        1.6-5   2024-01-11 [3] CRAN (R 4.3.3)
 methods     * 4.3.3   2024-04-18 [3] local
 mgcv          1.9-1   2023-12-21 [3] CRAN (R 4.3.3)
 munsell       0.5.1   2024-04-01 [1] RSPM
 nlme          3.1-164 2023-11-27 [3] CRAN (R 4.3.3)
 pillar        1.9.0   2023-03-22 [1] RSPM
 pkgconfig     2.0.3   2019-09-22 [1] RSPM
 pkgload       1.3.4   2024-01-16 [1] RSPM
 png           0.1-8   2022-11-29 [1] RSPM
 processx      3.8.4   2024-03-16 [1] RSPM
 ps            1.7.6   2024-01-18 [1] RSPM
 quarto        1.4.1   2024-03-10 [1] Github (quarto-dev/quarto-r@ba8485a)
 R6            2.5.1   2021-08-19 [1] RSPM
 Rcpp          1.0.12  2024-01-09 [1] RSPM
 rematch2      2.1.2   2020-05-01 [1] RSPM
 rlang         1.1.3   2024-01-10 [1] RSPM
 rmarkdown     2.26    2024-03-05 [1] RSPM
 rprojroot     2.0.4   2023-11-05 [1] RSPM
 rstudioapi    0.16.0  2024-03-24 [1] RSPM
 scales        1.3.0   2023-11-28 [1] RSPM
 sessioninfo   1.2.2   2021-12-06 [1] any (@1.2.2)
 splines       4.3.3   2024-04-18 [3] local
 stats       * 4.3.3   2024-04-18 [3] local
 testthat    * 3.2.1.1 2024-04-14 [1] RSPM
 tibble        3.2.1   2023-03-20 [1] RSPM
 tools         4.3.3   2024-04-18 [3] local
 utf8          1.2.4   2023-10-22 [1] RSPM
 utils       * 4.3.3   2024-04-18 [3] local
 vctrs         0.6.5   2023-12-01 [1] RSPM
 vdiffr      * 1.0.7   2023-09-22 [1] RSPM
 waldo         0.5.2   2023-11-02 [1] RSPM
 withr         3.0.0   2024-01-16 [1] RSPM
 xfun          0.43    2024-03-25 [1] RSPM
 yaml          2.3.8   2023-12-11 [1] RSPM

 [1] /home/runner/work/_temp/Library
 [2] /opt/R/4.3.3/lib/R/site-library
 [3] /opt/R/4.3.3/lib/R/library

──────────────────────────────────────────────────────────────────────────────