- Phil Karlton
The following advice on naming applies to all kinds of programming entities (variables, functions, packages, classes, etc.) and is language-agnostic.
How good a name is can be assessed by how detailed the accompanying comment needs to be.
E.g., the function and parameter are named poorly here, and so comments need to do all the heavy lifting:
Contrast it with this:
No need for a comment here!
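The contrast described above can be sketched as follows. The function and parameter names are hypothetical stand-ins chosen for illustration, not the ones from the original slide.

```python
# Poor names: the comment has to do all the heavy lifting.
# Converts a temperature from Fahrenheit to Celsius;
# `temp` is the temperature in Fahrenheit.
def unit_converter(temp):
    return (temp - 32) * 5 / 9


# Descriptive names: the signature already says what the comment said.
def fahrenheit_to_celsius(temp_fahrenheit):
    return (temp_fahrenheit - 32) * 5 / 9


print(fahrenheit_to_celsius(212))
```

Both functions compute the same thing; only the second can be understood at the call site without opening its definition.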
Using generic names can improve code readability, but only if language or domain customs are followed.
Using j for the outer and i for the inner loop index is confusing!
tmp shouldn’t be used to store objects that are not temporary.
retVal shouldn’t be used for objects not returned from a function.
If a loop is longer than a few lines, use more meaningful loop variable names than i, j, k because you will quickly lose track of what they mean.
All variables are temporary in some sense. Calling one tmp is inviting carelessness.
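As a minimal sketch of the loop-index advice, here is a nested loop whose indices say what they index. The `scores` data and all variable names are assumptions made up for this example.

```python
# Hypothetical data: one row per student, one column per exam.
scores = [
    [80, 90, 70],
    [60, 75, 85],
]

num_students = len(scores)
num_exams = len(scores[0])

# Descriptive index names make it hard to swap the two subscripts by accident.
exam_totals = [0] * num_exams
for student_index in range(num_students):
    for exam_index in range(num_exams):
        exam_totals[exam_index] += scores[student_index][exam_index]

print(exam_totals)
```

With single-letter indices, `scores[j][i]` and `scores[i][j]` look equally plausible; with descriptive ones, the wrong order reads as wrong.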
Consistent names reduce cognitive burden because if the reader encounters a name in one context, they can safely reuse that knowledge in another context.
For example, these names are inconsistent since the reader can’t safely assume that the name size means the same thing throughout the program.
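One way to remove this kind of ambiguity is to avoid the bare word and name the quantity being measured. The sketch below is an assumption-laden illustration: both function names are hypothetical.

```python
def payload_num_bytes(payload: str) -> int:
    """Number of bytes the payload occupies when encoded as UTF-8."""
    return len(payload.encode("utf-8"))


def payload_num_chars(payload: str) -> int:
    """Number of characters in the payload."""
    return len(payload)


payload = "caf\u00e9"  # four characters, the last one is non-ASCII
print(payload_num_chars(payload), payload_num_bytes(payload))
```

Had both functions been called something like `payload_size`, a reader could not tell which of the two quantities a given call returns.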
How precise (and thus long) the name should be is a subjective decision, but keep in mind that long names can obscure the visual structure of a program.
You can typically find a middle ground between too short and too long names.
Try your best to misinterpret candidate names and see if you succeed.
E.g., here is a GUI text editor class method to get position of a character:
How I interpret: “x and y refer to pixel positions for a character.”
In reality: “x and y refer to line of text and character position in that line.”
You can avoid such misinterpretation with better names:
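A minimal sketch of what harder-to-misread names might look like. The `TextEditor` class, the `char_at` method, and its parameter names are hypothetical, not taken from the original example.

```python
class TextEditor:
    def __init__(self, text: str) -> None:
        self.lines = text.split("\n")

    # An ambiguous signature would be char_at(self, x, y).
    # These parameter names rule out the "pixel position" reading.
    def char_at(self, line_index: int, char_index_in_line: int) -> str:
        return self.lines[line_index][char_index_in_line]


editor = TextEditor("abc\ndef")
print(editor.char_at(line_index=1, char_index_in_line=2))
```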
Names that are too similar make great candidates for mistaken identity.
Near-identical names such as nnn and its look-alikes are easily confused, and such confusion can lead to painful bugs.
While naming, always ask yourself how easy it would be to find and update the name.
E.g., this plotting function uses the identifier p to represent a scatter plot object.
In the future, it won’t be easy to search for it or to rename it in the code base, because searching for p would flag every p (in ggplot, mpg, etc.).
If the scatter_plot identifier is used instead, both search and replace operations will be straightforward.
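The search problem can be made concrete with a plain substring count. The two source strings below are hypothetical R snippets held in Python strings; they are assumptions for illustration and are never executed as R.

```python
# Hypothetical R source using the short identifier.
short_name_source = (
    "p = ggplot(mpg, aes(displ, hwy)) + geom_point()\n"
    "print(p)\n"
)

# The same source using the descriptive identifier.
long_name_source = (
    "scatter_plot = ggplot(mpg, aes(displ, hwy)) + geom_point()\n"
    "print(scatter_plot)\n"
)

# A naive text search for the identifier in each version.
hits_for_p = short_name_source.count("p")
hits_for_scatter_plot = long_name_source.count("scatter_plot")

print(hits_for_p, hits_for_scatter_plot)
```

Only two of the hits for the short name are the identifier itself; every hit for the long name is.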
The names should respect the conventions adopted in a given project, organization, programming language, domain of knowledge, etc.
For example, a common C++ convention is to use PascalCase for class names and lowerCamelCase for variables.
If a set of functions has dependencies (e.g., because they share the same data), their names should make this dependence clear.
E.g., computing annual revenues involves computing quarterly revenues, which in turn requires computing monthly revenues.
Each of the function names makes clear the order in which they need to be run.
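A sketch of such a chain, assuming hypothetical function names and a toy data layout (a list of sales figures per month). Each name states its time scale, so the data flow can be read off the names alone.

```python
def compute_monthly_revenues(sales_by_month):
    """One total per month, from a list of per-month sales lists."""
    return [sum(sales) for sales in sales_by_month]


def compute_quarterly_revenues(monthly_revenues):
    """One total per quarter, from consecutive groups of three months."""
    return [
        sum(monthly_revenues[start:start + 3])
        for start in range(0, len(monthly_revenues), 3)
    ]


def compute_annual_revenue(quarterly_revenues):
    """A single total for the year, from the quarterly totals."""
    return sum(quarterly_revenues)


sales_by_month = [[1, 2]] * 12  # toy data: two sales per month
monthly = compute_monthly_revenues(sales_by_month)
quarterly = compute_quarterly_revenues(monthly)
annual = compute_annual_revenue(quarterly)
print(annual)
```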
Names for Boolean variables or functions should make clear what true and false mean. This can be done using prefixes (is, has, can, etc.).
Use positive terms for Booleans since they are easier to process.
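A small sketch of both points, using hypothetical names. The prefix makes the meaning of true explicit, and the positive form avoids double negatives at the call site.

```python
def is_valid_percentage(value: float) -> bool:
    """True means the value lies in the closed interval [0, 100]."""
    return 0 <= value <= 100


has_header = True  # "has_" prefix: true means a header row is present

# Positive name: reads directly.
if is_valid_percentage(42):
    print("ok")

# A negative name such as is_not_invalid_percentage would force the
# reader to untangle `not is_not_invalid_percentage(value)`.
```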
If unit testing in a given programming language requires writing test functions, choose names that describe the details of the test.
The test function names should effectively act as a comment.
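As an illustration, here are two test functions whose names describe the scenario and the expected outcome. The `mean` function and the test names are assumptions invented for this sketch; the tests use plain assertions so that no test framework is required.

```python
def mean(values):
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_single_value_is_that_value():
    assert mean([5]) == 5


def test_mean_of_empty_list_raises_value_error():
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty list")


test_mean_of_single_value_is_that_value()
test_mean_of_empty_list_raises_value_error()
```

A failure report that lists only these names already tells you what broke, which a name like `test_mean_2` would not.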
To resist software entropy, not only should you name entities properly, but you should also update them. Otherwise, names will become something worse than meaningless or confusing: misleading.
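For example, let’s say your class has a getter method whose implementation later changes so that it no longer simply returns a stored value.
It is then misleading to continue to call it a getter method, and it should be renamed to reflect what it now does.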
This is probably the weakest of the requirements, but one can’t deny the ease of communication when names are pronounceable.
If you are writing a function to generate a time-stamp, discussing the following function verbally would be challenging.
This is a much better (and pronounceable) alternative:
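A sketch of the pronounceable variant in Python. The name `generate_timestamp` and the output format are assumptions; an unpronounceable counterpart might be something like `gen_ymdhms`, which is equally hypothetical.

```python
from datetime import datetime, timezone


def generate_timestamp(moment=None):
    """Format a moment as 'YYYY-MM-DD HH:MM:SS' (defaults to now, in UTC)."""
    moment = moment or datetime.now(timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(generate_timestamp(datetime(2023, 2, 5, 12, 30, 0)))
```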
Additionally, avoid naming separate entities with homonyms.
Discussing entities named waste and waist is inevitably going to lead to confusion.
Once you settle on a mapping from an abstraction to a name, use it consistently throughout the code base.
E.g., two similar methods here have different names across
Both of these methods should either be named
Having different name formats for different entities acts like syntax highlighting. That is, a name not only represents an entity but also provides hints about its nature.
There are various casing conventions used for software development.
Illustration (CC-BY) by Allison Horst
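To see how name formats can act like syntax highlighting, here is a short sketch that follows Python's PEP 8 naming conventions. The class and constant are hypothetical and exist only to show the three formats side by side.

```python
MAX_LOGIN_ATTEMPTS = 3  # UPPER_SNAKE_CASE: module-level constant


class LoginSession:  # PascalCase: class
    def __init__(self):
        self.failed_attempts = 0  # snake_case: attribute

    def record_failed_attempt(self):  # snake_case: method
        self.failed_attempts += 1

    def is_locked(self):
        return self.failed_attempts >= MAX_LOGIN_ATTEMPTS


session = LoginSession()
for _ in range(MAX_LOGIN_ATTEMPTS):
    session.record_failed_attempt()
print(session.is_locked())
```

Even without knowing the code, a reader can tell from the casing alone which names are constants, which are classes, and which are functions or variables.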
You won’t have to remember any of these rules if you follow this principle:
“Names must be readable for the reader, not writer, of code.”
female_birdsong_recording is a better variable name than
exit() is better than
last is worse than either
Instead of using numColumns (number of columns) in one function and noRows (number of rows) in another, choose one abbreviation as a prefix and use it consistently.
0, etc.). With certain fonts, it can be hard to distinguish
highlight. The benefit is not worth the cost here.
The na.rm parameter removes (rm) missing values (NA). Using it to mean “remove (rm) non-authorized (NA) entries” for a function parameter will be misleading.
center. Adopt one standard and stick to it.
patientReps are easily confused because they are so similar. There should be at least a two-letter difference:
Looking at names in the wild that violate presented guidelines.
These examples are not meant as criticism, but as learning opportunities to drive home the importance of these guidelines.
R is a programming language for statistical computing, and function names can be expected to respect the domain conventions.
Statistical distributions can be characterized by centrality measures, like mean, median, mode, etc., and R has functions with names that wouldn’t surprise you, except one:
The mode() function actually returns the storage mode of an R object!
This function could have been named (e.g.) storageMode(), which is more precise and doesn’t break domain-specific expectations.
std::array definition is too generic.
size is a bit better but still leaves room for misunderstanding: “Does it mean length or memory bytes?”
ggplot2 is a plotting framework in R, and supports both British and American English spelling standards. But does it do so consistently?
A user now believes that both spelling standards for function names and parameters are supported. And, since they prefer American spellings, they do this:
filter() can be used to apply a function to an iterable.
filter is an ambiguous word. It could mean either picking out the elements that pass a condition or removing them.
If you’ve never used this function before, could you predict whether it returns the elements that pass the condition or those that fail it?
It returns the former, so the intent is to pick out the elements that pass the condition.
In this case, keep() would be a better name.
Had the intent been to find elements to remove, discard() would be a better name.
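The ambiguity and the two suggested names can be sketched in Python, whose built-in `filter()` keeps the elements for which the function returns a truthy value. The `keep()` and `discard()` helpers below are hypothetical wrappers written for this illustration, not existing built-ins.

```python
def is_positive(number):
    return number > 0


def keep(predicate, items):
    """Return the items for which the predicate holds."""
    return [item for item in items if predicate(item)]


def discard(predicate, items):
    """Return the items for which the predicate does not hold."""
    return [item for item in items if not predicate(item)]


numbers = [-1, 0, 1]

print(list(filter(is_positive, numbers)))  # built-in: keeps passing elements
print(keep(is_positive, numbers))
print(discard(is_positive, numbers))
```

With `keep` and `discard`, the direction of the operation is part of the name, so the reader does not need to remember which way the built-in works.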
It is easy to find such violations.
But, whenever you encounter one, make it a personal exercise to come up with a better name.
Deep dive into benefits of thoughtful naming for an entity at the heart of all software: function
Unix philosophy specifies the golden rule for writing a good function:
“Do One Thing And Do It Well.”
Finding a descriptive name for a function can inform us if we are following this rule.
Consider a function to extract a table of regression estimates. For convenience, it also allows sorting the table by estimate.
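An honest name for such a function would need an "and" (something like `extract_and_sort_estimates`), which hints that it does two things. A possible split is sketched below; the function names are hypothetical, and the fitted model is represented by a plain dictionary as a stand-in for a real regression object.

```python
def extract_estimates(fitted_model):
    """Return (term, estimate) pairs in the model's own order."""
    return list(fitted_model["coefficients"].items())


def sort_by_estimate(estimates):
    """Return the pairs ordered from smallest to largest estimate."""
    return sorted(estimates, key=lambda term_and_estimate: term_and_estimate[1])


# Stand-in for a fitted regression model (made-up numbers).
fitted_model = {"coefficients": {"intercept": 37.2, "wt": -3.2, "hp": -0.03}}

estimates = extract_estimates(fitted_model)
print(sort_by_estimate(estimates))
```

Each function now has a name that fully describes it, and callers who want a sorted table compose the two.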
When it comes to writing a good function, finding a good name for a parameter can also reveal design problems.
E.g., a Boolean or flag parameter means the function is doing more than one thing.
Consider a function that converts Markdown or HTML documents to PDF.
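The design problem and one possible fix can be sketched as follows. All names are hypothetical, and `_render_pdf` is a stand-in that only reports what it would do; no real conversion takes place.

```python
from pathlib import PurePosixPath


def _render_pdf(source_path, source_format):
    """Stand-in for a real renderer: returns the format and output path."""
    output_path = str(PurePosixPath(source_path).with_suffix(".pdf"))
    return (source_format, output_path)


# Flag-based design: the Boolean hides two separate behaviours.
def convert_to_pdf(source_path, is_markdown):
    if is_markdown:
        return _render_pdf(source_path, "markdown")
    return _render_pdf(source_path, "html")


# Split design: each name states the one thing the function does.
def convert_markdown_to_pdf(source_path):
    return _render_pdf(source_path, "markdown")


def convert_html_to_pdf(source_path):
    return _render_pdf(source_path, "html")


print(convert_markdown_to_pdf("report.md"))
```

A call such as `convert_to_pdf("page.html", False)` forces the reader to look up what `False` means, whereas `convert_html_to_pdf("page.html")` does not.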
- Psalms 52:9
Initially, you may struggle to find good names and settle for the first serviceable name that pops into your head.
Resist the urge!
Adopt an investment mindset and remember that the little extra time invested in finding good names early on will pay dividends in the long run by reducing the accumulation of complexity in the system.
The more you do it, the easier it will get!
And, after a while, you won’t even need to think long and hard to come up with a good name. You will instinctively think of one.
For a more detailed discussion about how to name things, see the following references.
McConnell, S. (2004). Code Complete. Microsoft Press. (pp. 259-290)
Boswell, D., & Foucher, T. (2011). The Art of Readable Code. O’Reilly Media, Inc. (pp. 7-31)
Martin, R. C. (2009). Clean Code. Pearson Education. (pp. 17-52)
Ousterhout, J. K. (2018). A Philosophy of Software Design. Palo Alto: Yaknyam Press. (pp. 121-129)
Goodliffe, P. (2007). Code Craft. No Starch Press. (pp. 39-56)
Thomas, D., & Hunt, A. (2019). The Pragmatic Programmer. Addison-Wesley Professional. (pp. 238-242)
For a good example of organizational naming guidelines, see Google C++ Style Guide.
If you are interested in good programming and software development practices, check out my other slide decks.
And Happy Naming! 😊