DRY Package Development in R

Indrajeet Patil

Source code for these slides can be found on GitHub.

“Copy and paste is a design error.”   - David Parnas

Why So DRY

Why should you not repeat yourself?

Don’t Repeat Yourself (DRY) Principle

The DRY Principle states that:

Every piece of knowledge must have a single representation in the codebase.

That is, you should not express the same thing in multiple places in multiple ways.

It’s about knowledge and not just code

The DRY principle is about duplication of knowledge. Thus, it applies to all programming entities that encode knowledge:

  • You should not duplicate code.
  • You should not duplicate intent across code and comments.
  • You should not duplicate knowledge in data structures.

Benefits of DRY codebase

  • Duplicated code requires parallel changes in multiple places; DRY code eliminates this need.

  • Easier to maintain - update knowledge in one place only.

  • Routines developed to remove duplicated code can become general-purpose utilities.


Further Reading

Plan

Apply DRY to remove duplication in:

  • Documentation
  • Vignette setup
  • Example datasets
  • Unit testing
  • Exceptions
  • Dependency management

Documentation

Avoiding repetition in documentation.

What do users read?

Users consult different sources depending on context.


README: While exploring the package repository.


Vignettes: When first learning how to use a package.


Manual: When checking details about a specific function.


Including crucial information in only one place risks users missing it in certain contexts.

Go forth and multiply (without repetition)

Some documentation is important enough to be included in multiple places (e.g. in the function documentation and in a vignette).


How can you document something just once but include it in multiple locations?

Child documents

You can stitch an R Markdown document from smaller child documents.

Information is stored once in child documents and reused multiple times across parents.

Storing child documents in package

Store child documents in the man/ directory and reuse them.

Child documents

├── DESCRIPTION
├── man
│   └── rmd-children
│       └── info1.Rmd
│       └── ...
info1.Rmd
This is some crucial information to be repeated across documentation.

```{r}
1 + 1
```


Tips

  • Include as many child documents as needed.
  • Child documents are standard .Rmd files with full .Rmd capabilities.
  • You can choose a different name for the folder containing child documents (e.g. rmd-fragments).
  • Make sure to include the Roxygen: list(markdown = TRUE) field in the DESCRIPTION file.
  • Child documents pose no problem for either R CMD check or a {pkgdown} website.

Using child documents in package: Part-1

Include child document contents in multiple documentation locations.

Vignette

├── DESCRIPTION
├── vignettes
│   └── vignette1.Rmd
│   └── ...
│   └── web_only
│       └── vignette2.Rmd
│       └── ...

README

├── DESCRIPTION
├── README.Rmd
vignette1.Rmd
---
output: html_vignette
---

Vignette content.

```{r, child="../man/rmd-children/info1.Rmd"}
```
README.Rmd
---
output: github_document
---

README content.

```{r, child="man/rmd-children/info1.Rmd"}
```

Using child documents in package: Part-2

Include child document contents in multiple documentation locations.

Manual

├── DESCRIPTION
├── R
│   └── foo1.R
│   └── foo2.R
├── man
│   └── foo1.Rd
│   └── foo2.Rd
│   └── ...
foo1.R
#' @title Foo1
#' @section Information:
#'
#' ```{r, child="man/rmd-children/info1.Rmd"}
#' ```
foo1 <- function() { ... }


Important

The underlying assumption here is that you are using {roxygen2} to generate package documentation.

What about non-child documents?

You can include the contents of any file in an .Rmd document, not just a child document!

Storing other documentation files in package

Like child documents, store other document types in the man/ folder.

Reusable content

├── DESCRIPTION
├── man
│   └── rmd-children
│       └── info1.Rmd
│       └── ...
│   └── md-fragments
│       └── fragment1.md
│       └── ...
│   └── r-chunks
│       └── chunk1.R
│       └── ...
fragment1.md
This `.md` file contains
content to be included *as is*
across multiple locations
in the documentation.
chunk1.R
# some comment and code
1 + 1

# more comments and code
2 + 3


Folder names

Name folders to describe their contents (e.g., r-examples, yaml-snippets, md-fragments).

Using non-child documents in package: Part-1

Include various file contents in multiple documentation locations.

Vignette

├── DESCRIPTION
├── vignettes
│   └── vignette1.Rmd
│   └── ...
│   └── web_only
│       └── vignette2.Rmd
│       └── ...

README

├── DESCRIPTION
├── README.Rmd
vignette1.Rmd
---
output: html_vignette
---

Vignette content.

```{asis, file="../man/md-fragments/fragment1.md"}
```

```{r, file="../man/r-chunks/chunk1.R"}
```
README.Rmd
---
output: github_document
---

README content.

```{asis, file="man/md-fragments/fragment1.md"}
```

```{r, file="man/r-chunks/chunk1.R"}
```

Using non-child documents in package: Part-2

Include various file contents in multiple documentation locations.

Manual

├── DESCRIPTION
├── R
│   └── foo1.R
│   └── ...
├── man
│   └── foo1.Rd
│   └── ...
foo1.R
#' @title Foo1
#' @section Information:
#'
#' ```{asis, file="man/md-fragments/fragment1.md"}
#' ```
#'
#' @example man/r-chunks/chunk1.R
foo1 <- function() { ... }


Important

The underlying assumption here is that you are using {roxygen2} to generate package documentation.

Summary on how to repeat documentation

If you are overwhelmed by this section, note that you actually need to remember only the following rules:

  • Store reusable document files in the man/ folder.

  • When you wish to include their contents, provide paths to these files relative to the document you are linking from.

  • If it’s a child .Rmd document, use the child option to include its contents.

  • If it’s not an .Rmd document, use the file option to include its contents, together with an appropriate {knitr} engine. To see available engines, run names(knitr::knit_engines$get()).
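
The child rule can be tried outside a package. The following is a minimal sketch, assuming {knitr} is installed; the temporary directory and the file name parent.Rmd are made up for the demonstration, and the chunk fences are assembled with strrep() only so that they can be written from R code.

```r
library(knitr)

# Work in a throwaway directory (hypothetical layout, not a real package).
doc_dir <- tempfile("dry-docs-")
dir.create(doc_dir)

fence <- strrep("`", 3) # three backticks, built programmatically

# The child document: knowledge stored exactly once.
writeLines(
  c(
    "This is some crucial information to be repeated across documentation.",
    "",
    paste0(fence, "{r}"),
    "1 + 1",
    fence
  ),
  file.path(doc_dir, "info1.Rmd")
)

# A parent document that pulls in the child through the `child` chunk option.
writeLines(
  c(
    "Parent content.",
    "",
    paste0(fence, "{r, child=\"info1.Rmd\"}"),
    fence
  ),
  file.path(doc_dir, "parent.Rmd")
)

# Knit the parent; the relative child path is resolved against the parent's folder.
out_file <- knit(
  input = file.path(doc_dir, "parent.Rmd"),
  output = file.path(doc_dir, "parent.md"),
  quiet = TRUE
)
rendered <- readLines(out_file)
```

The rendered lines should contain both the parent text and the child text, so any number of parents can reuse info1.Rmd without copying it.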

Self-study

Example packages that use reusable component documents to repeat documentation.

Vignette Setup

Avoiding repetition in vignette setup.

Setup chunks in vignettes

Setup chunks for vignettes often contain duplication.

Some setup code is identical across vignettes.


├── DESCRIPTION
├── vignettes
│   └── vignette1.Rmd
│   └── vignette2.Rmd
│   └── ...
vignette1.Rmd
---
title: "Vignette-1"
output: html_vignette
---

```{r}
knitr::opts_chunk$set(
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)
```
vignette2.Rmd
---
title: "Vignette-2"
output: html_vignette
---

```{r}
knitr::opts_chunk$set(
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)

options(crayon.enabled = TRUE)
```


How can this repetition be avoided?

Sourcing setup chunks in vignettes

Avoid this by moving common setup to a script and sourcing it from vignettes. If you have many reusable artifacts, store scripts in a /setup folder.

Option 1

├── DESCRIPTION
├── vignettes
│   └── setup.R

Option 2

├── DESCRIPTION
├── vignettes
│   └── setup
│       └── setup.R

Common setup

setup.R
knitr::opts_chunk$set(
  message = FALSE,
  collapse = TRUE,
  comment = "#>"
)

Sourcing common setup

vignette1.Rmd
---
title: "Vignette-1"
output: html_vignette
---

```{r setup, include = FALSE}
source("setup/setup.R")
```

Sourcing common setup

vignette2.Rmd
---
title: "Vignette-2"
output: html_vignette
---

```{r setup, include = FALSE}
source("setup/setup.R")
options(crayon.enabled = TRUE)
```

No parallel modification

Modify common setup in one place only!
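
To see the effect of sourcing a shared script, here is a small sketch that assumes {knitr} is installed. The temporary file stands in for vignettes/setup/setup.R, and the options are restored at the end because opts_chunk is global state.

```r
library(knitr)

# Hypothetical location standing in for vignettes/setup/setup.R.
setup_file <- file.path(tempdir(), "setup.R")
writeLines(
  c(
    "knitr::opts_chunk$set(",
    "  message = FALSE,",
    "  collapse = TRUE,",
    "  comment = \"#>\"",
    ")"
  ),
  setup_file
)

# Remember the current chunk options so they can be restored afterwards.
old_opts <- opts_chunk$get()

# What each vignette's setup chunk does: source the shared script.
source(setup_file)
shared <- opts_chunk$get(c("message", "collapse", "comment"))

# Undo the global change made for this demonstration.
opts_chunk$restore(old_opts)
```

Every vignette that sources the script ends up with the same three options, and vignette-specific settings (such as the crayon option above) stay in the vignette itself.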

Self-study

Packages in the wild that use this trick.

Data

Avoiding repetition with example datasets.

Illustrative example datasets

Create new datasets when existing ones don’t illustrate your functions well.

Using dataset exdat with function foo() in examples, vignettes, and README requires defining it multiple times.

In examples

foo.R
#' @examples
#' exdat <- matrix(c(71, 50))
#' foo(exdat)

In vignettes

vignette.Rmd
---
title: "My Vignette"
output: html_vignette
---

```{r}
exdat <- matrix(c(71, 50))
foo(exdat)
```

In README

README.Rmd
---
output: github_document
---

```{r}
exdat <- matrix(c(71, 50))
foo(exdat)
```


How can this repetition be avoided?

Shipping data in a package

Define data once, save it, and ship it with the package.

Store datasets in data/ and document them in R/data.R.

Saving data

exdat.R
exdat <- matrix(c(71, 50))
save(exdat, file="data/exdat.rdata")

Directory structure

├── DESCRIPTION
├── R
│   └── data.R
├── data-raw
│   └── exdat.R
├── data
│   └── exdat.rdata

Don’t forget!

  • Save the creation script in data-raw/ for future updates.
  • Set LazyData: true in DESCRIPTION when including datasets.
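
The two pieces (creation script and documentation) might look roughly like the sketch below. The temporary folder stands in for the package's data/ directory, and the title and @format text are illustrative assumptions rather than part of the original example.

```r
# data-raw/exdat.R (sketch): create the dataset once and save it.
exdat <- matrix(c(71, 50))

# A temporary folder stands in for the package's data/ directory here.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
data_file <- file.path(data_dir, "exdat.rdata")
save(exdat, file = data_file)

# R/data.R (sketch): a dataset is documented by documenting its name as a string.
#' Example dataset for illustrating foo()
#'
#' @format A numeric matrix with 2 rows and 1 column.
"exdat"

# Check that the saved object round-trips unchanged.
loaded <- new.env()
load(data_file, envir = loaded)
roundtrip_ok <- identical(loaded$exdat, exdat)
```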

Reusable dataset

exdat can now be used everywhere without redefining it.

In examples

foo.R
#' @examples
#' foo(exdat)

In vignettes

vignette.Rmd
---
title: "My Vignette"
output: html_vignette
---

```{r}
foo(exdat)
```

In README

README.Rmd
---
output: github_document
---

```{r}
foo(exdat)
```


No parallel modification

Update the dataset in one place only!

Self-study

Examples of R packages that define datasets and use them repeatedly.

Unit testing

Avoiding repetition in unit tests.

Repeated test patterns

A unit test compares the actual output of a function with its expected output.


Testing functions with a range of inputs often recycles test patterns.

Not DRY

But such recycling violates the DRY principle. How can you avoid this?

multiplier.R
# Function to test
multiplier <- function(x, y) {
  x * y
}

# Tests
test_that(
  desc = "multiplier works as expected",
  code = {
    expect_identical(multiplier(-1, 3),  -3)
    expect_identical(multiplier(0,  3.4), 0)
    expect_identical(multiplier(NA, 4),   NA_real_)
    expect_identical(multiplier(-2, -2),  4)
    expect_identical(multiplier(3,  3),   9)
  }
)

Parametrized unit testing

Write parametrized unit tests using {patrick}.

Repeated test pattern

expect_identical() used repeatedly.

test-multiplier.R
test_that(
  desc = "multiplier works as expected",
  code = {
    expect_identical(multiplier(-1, 3),  -3)
    expect_identical(multiplier(0,  3.4), 0)
    expect_identical(multiplier(NA, 4),   NA_real_)
    expect_identical(multiplier(-2, -2),  4)
    expect_identical(multiplier(3,  3),   9)
  }
)

Parametrized test pattern

expect_identical() used once.

test-multiplier.R
patrick::with_parameters_test_that(
  desc_stub = "multiplier works as expected",
  code = expect_identical(multiplier(x, y), res),
  .cases = tibble::tribble(
    ~x,  ~y,  ~res,
    -1,  3,   -3,
    0,   3.4,  0,
    NA,  4,    NA_real_,
    -2,  -2,   4,
    3,   3,    9
  )
)

Combinatorial explosion

The parametrized version may not seem impressive for this simple example, but it becomes exceedingly useful when there is a combinatorial explosion of possibilities. Creating each such test manually is cumbersome and error-prone.
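
When the cases are combinations of inputs, the table itself can be generated rather than typed. The sketch below uses only base R and checks a property (commutativity) instead of hard-coded results; the chosen values are arbitrary. A data frame like cases could presumably also be supplied to the .cases argument shown above, which is documented as taking a data frame with one row per test.

```r
# Function under test (same as in the slides).
multiplier <- function(x, y) {
  x * y
}

# Every combination of a few interesting inputs: 4 x 4 = 16 cases.
values <- c(-2, 0, 1.5, NA)
cases <- expand.grid(x = values, y = values)

# Property checked for each row: multiplication is commutative.
is_commutative <- mapply(
  function(x, y) identical(multiplier(x, y), multiplier(y, x)),
  cases$x,
  cases$y
)

n_cases <- nrow(cases)
all_pass <- all(is_commutative)
```

Adding one more value to values grows the grid from 16 to 25 cases without any new test code.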

Repeated usage of testing datasets

Like user-facing datasets, define developer-facing test datasets once and reuse them across multiple tests.

Either location works for saving datasets.

├── DESCRIPTION
├── tests
│   └── data
│       └── script.R
│       └── testdat1.rdata
│       └── testdat2.rdata
│       └── ...

├── DESCRIPTION
├── tests
│   └── testthat
│       └── data
│           └── script.R
│           └── testdat1.rdata
│           └── testdat2.rdata
│           └── ...

Save the script!

Always save the script used to create datasets. This script:

  • acts as documentation for the datasets
  • makes it easy to modify the datasets in the future (if needed)

Using test datasets

Without stored datasets, datasets are defined multiple times across test files.

test-foo1.R
testdat1 <- { ... }
foo1(testdat1)
test-foo2.R
testdat1 <- { ... }
foo2(testdat1)

        ...


With saved datasets, define once and load from test files.

test-foo1.R
testdat1 <- readRDS("testdat1")
foo1(testdat1)
test-foo2.R
testdat1 <- readRDS("testdat1")
foo2(testdat1)

        ...


Note

The exact path provided to readRDS() will depend on where the datasets are stored inside the tests/ folder.
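
As a rough sketch of the save-once, load-many pattern: the example below writes a small dataset with saveRDS() and reads it back with readRDS(). The .rds extension, the data frame contents, and the temporary folder (standing in for tests/testthat/data/) are assumptions made for illustration. Inside real tests, testthat::test_path() is a common way to build such paths.

```r
# script.R (sketch): create the test dataset once and save it.
data_dir <- file.path(tempdir(), "tests", "testthat", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

testdat1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
testdat1_path <- file.path(data_dir, "testdat1.rds")
saveRDS(testdat1, testdat1_path)

# test-foo1.R, test-foo2.R, ... (sketch): load the same object wherever needed.
loaded_in_foo1 <- readRDS(testdat1_path)
loaded_in_foo2 <- readRDS(testdat1_path)

same_everywhere <- identical(loaded_in_foo1, testdat1) &&
  identical(loaded_in_foo2, testdat1)
```

Note that readRDS() reads files written by saveRDS(); files written by save() are read with load() instead.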

Self-study

Examples of R packages that save datasets required for unit testing.

Exceptions

Avoiding repetition when signaling exceptions

Sending signals

Functions use exceptions (messages, warnings, errors) to signal unexpected events. Similar exceptions are often signaled across functions.

E.g., for functions that don’t accept negative values:

Input validation

foo1.R
foo1 <- function(x) {
  if (x < 0) {
    stop("Argument `x` should be positive.")
  }
  ...
}
foo2.R
foo2 <- function(y) {
  if (y < 0) {
    stop("Argument `y` should be positive.")
  }
  ...
}

Unit testing

test-foo1.R
expect_error(
  foo1(-1),
  "Argument `x` should be positive."
)
test-foo2.R
expect_error(
  foo2(-1),
  "Argument `y` should be positive."
)

How can this repetition be avoided?

List of exception functions

Extract exception message strings into named functions and store them in a list.

exceptions.R
exceptions <- list(
  # returns the message string only; the caller decides how to signal it
  only_positives_allowed = function(arg_name) {
    paste0("Argument `", arg_name, "` should be positive.")
  }
  # ... you can store as many functions as you want
)

Why not include the entire validation?

You can move the entire if() block to only_positives_allowed() and create a new validation function.

But this is not done here to address the most general case where:

  • the exception message string can be used outside of an if() block
  • it can be used not only as a message, but also as a warning or an error
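
To illustrate the second point, here is a small base R sketch in which one message function feeds all three kinds of conditions. The tryCatch() wrappers are there only to capture the text that each condition carries.

```r
exceptions <- list(
  only_positives_allowed = function(arg_name) {
    paste0("Argument `", arg_name, "` should be positive.")
  }
)

msg <- exceptions$only_positives_allowed("x")

# The same string signalled as an error ...
as_error <- tryCatch(stop(msg), error = function(e) conditionMessage(e))

# ... as a warning ...
as_warning <- tryCatch(warning(msg), warning = function(w) conditionMessage(w))

# ... or as a message (message() appends a newline to the text).
as_message <- tryCatch(message(msg), message = function(m) conditionMessage(m))
```

Because the string lives in one function, changing its wording updates the error, the warning, and the message together.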

Reusable exceptions: Part-1

Use these functions to signal exceptions.

Input validation

foo1.R
foo1 <- function(x) {
  if (x < 0) {
    stop(exceptions$only_positives_allowed("x"))
  }
  ...
}
foo2.R
foo2 <- function(y) {
  if (y < 0) {
    stop(exceptions$only_positives_allowed("y"))
  }
  ...
}

Unit testing

test-foo1.R
expect_error(
  foo1(-1),
  exceptions$only_positives_allowed("x")
)
test-foo2.R
expect_error(
  foo2(-1),
  exceptions$only_positives_allowed("y")
)


No parallel modification

Change the condition string in one place only!

Reusable exceptions: Part-2

Alternatively, move entire validations to new functions:

exceptions.R
exceptions <- list(
  check_only_positive = function(arg) {
    # capture the name of the variable supplied by the caller
    arg_name <- deparse(substitute(arg))
    if (arg < 0) {
      stop(paste0("Argument `", arg_name, "` should be positive."))
    }
  }
  # ... you can store as many functions as you want
)

Input validation

foo1.R
foo1 <- function(x) {
  exceptions$check_only_positive(x)
  ...
}
foo2.R
foo2 <- function(y) {
  exceptions$check_only_positive(y)
  ...
}

Unit testing

test-check-only-positive.R
x <- -1
expect_error(
  exceptions$check_only_positive(x),
  "Argument `x` should be positive."
)

Since the validation has moved to a new function, you only need to test it once.

DRY once, DRY multiple times

Exceptions are usually useful only within their own package. Generic exceptions, however, can be exported and reused across packages.

DRYing up exceptions in one package does the same for many!

Why a list?

Storing exceptions in a list is optional - individual functions work too.

Lists offer advantages:

  • Simpler NAMESPACE: One export for all exceptions instead of dozens that can overpower the package API.

  • Extensibility: Easily extend imported exceptions with package-specific ones (e.g., exceptions$my_new_exception <- function() {...}).
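
A minimal sketch of the extension idea follows. The imported_exceptions list stands in for a list exported by another package, and only_integers_allowed is a hypothetical package-specific addition.

```r
# Stand-in for a list exported by another package (hypothetical content).
imported_exceptions <- list(
  only_positives_allowed = function(arg_name) {
    paste0("Argument `", arg_name, "` should be positive.")
  }
)

# Start from the imported list and add a package-specific exception.
exceptions <- imported_exceptions
exceptions$only_integers_allowed <- function(arg_name) {
  paste0("Argument `", arg_name, "` should be an integer.")
}

available <- names(exceptions)
example_msg <- exceptions$only_integers_allowed("n")
```

The downstream package keeps a single exceptions object, so calling code does not need to know which entries were imported and which were added locally.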

Self-study

Example of an R package that creates a list of exception functions and exports it:

{ospsuite.utils}

Example of an R package that imports this list and appends to it:

{ospsuite}

Dependency management

Avoiding repetition when importing external functions.

Imports

Instead of using :: to access external package functions (rlang::warn()), specify imports explicitly via #' @importFrom.

Collect repeated imports in a single file instead of specifying them multiple times.

Import statements scattered across files:

Multiple R files
# file-1
#' @importFrom rlang warn
...

# file-2
#' @importFrom rlang warn
...

#' @importFrom purrr pluck
...

# file-3
#' @importFrom rlang warn seq2
...

# file-4, file-5, etc.
...

Import statements in a single file:

{pkgname}-package.R
## {pkgname} namespace: start
#'
#' @importFrom rlang warn seq2
#' @importFrom purrr pluck
#'
## {pkgname} namespace: end
NULL

Self-study

Examples of R packages that list the NAMESPACE imports in a single file this way.

Conclusion

These techniques make R package development faster, more maintainable, and less error-prone.

Advanced

These meta-level topics are beyond this presentation’s scope. See these resources to get started:

Thank You

And Happy (DRY) Package Development! 😊



Check out my other slide decks on software development best practices