Writing a custom diagnostic

Michael Chirico

2021-07-11

Motivation

potools provides several "diagnostic" functions used to check the "health" of the messaging corpus available in a given package. These are check_cracked_messages, which looks for messages split into chunks which are hard to translate; check_untranslated_cat, which looks for messages displayed via cat() which are not marked for translation; and check_untranslated_src, which looks for messages in the src directory which are not marked for translation.

These just crack the surface of the types of diagnostics that are possible for improving the quality of messaging to users -- not only in the process of translation, but also for bettering the experience in English!

In this vignette we'll demonstrate just such a use case by writing a custom diagnostic function that checks for typos in your messages by applying the function utils::aspell().

Writing the diagnostic

We'll call our function check_spelling; it will take as input a data.table like that produced by get_message_data(), and give as output a data.table indexing any issues found. Specifically, it should have three or four columns: call, file, line_number, and replacement. The first three come directly from the input; the last one is optional and suggests to the user a way to repair any "unhealthy" messages.

check_spelling = function(message_data) {
  # if aspell isn't installed, this won't work; be sure to return an object with the right schema anyway
  if (!nzchar(Sys.which("aspell"))) {
    warning("'aspell' is not installed; returning nothing")
    return(message_data[0, .(call, file, line_number)])
  }

  # aspell() works on files, so we'll write the msgid to files
  aspell_dir <- file.path(tempdir(), 'aspell')
  dir.create(aspell_dir)
  original_dir <- setwd(aspell_dir)
  on.exit({unlink(aspell_dir, recursive = TRUE); setwd(original_dir)})
  
  # (!is_repeat) makes sure we only check duplicate messages once
  # plural messages are in a list, so handle them separately
  message_data[(!is_repeat), by = .(file, type), {
    if (.BY$type == "singular") {
      cat(msgid, file = .BY$file, sep = "\n")
      # aspell() results has 5 columns: Original, File, Line, Column, Suggestions; we only need 1 & 5
      results = utils::aspell(.BY$file)
      unlink(.BY$file)
      
      typo_idx <- sapply(results$Original, grep, msgid)
      # take the first suggestion
      replacement = sapply(
        seq_along(results$Suggestions),
        function(typo_i) {
          # take the identified typo & replace it with aspell's 1st suggestion in the original `call`
          gsub(
            results$Original[typo_i], results$Suggestions[[typo_i]][1L],
            call[typo_idx[typo_i]], fixed = TRUE
          )
        }
      )
      
      .(
        call = call[typo_idx],
        file = file[typo_idx],
        line_number = line_number[typo_idx],
        replacement = replacement
      )
    } else {
      # unlist() to write both the n=1 and n!=1 messages to the file side-by-side
      all_msgid <- unlist(msgid_plural)
      cat(all_msgid, file = .BY$file, sep = "\n")
      results = utils::aspell(.BY$file)
      unlink(.BY$file)
      
      # odd numbers in grep output --> first entry for each plural_msgid; even numbers --> second entry.
      # do this arithmetic trick to re-map that to the original entry number in msgid_plural
      typo_idx <- ((sapply(results$Original, grep, all_msgid) - 1L) %/% 2L) + 1L
      # potentially overwrite each call >1 time if both messages have a typo
      replacement = call
      for (typo_i in seq_along(results$Suggestions)) {
        replacement[typo_idx[typo_i]] <- gsub(
          results$Original[typo_i], results$Suggestions[[typo_i]][1L],
          replacement[typo_idx[typo_i]], fixed = TRUE
        )
      }
      typo_idx <- unique(typo_idx)

      .(
        call = call[typo_idx],
        file = file[typo_idx],
        line_number = line_number[typo_idx],
        replacement = replacement[typo_idx]
      )
    }
  }]
}

In a package, we would probably use a few more helper functions to clean up & simplify the body of this diagnostic; we're piling everything in sequence for illustration to have everything in one place.

Running the diagnostic

We can check how the diagnostic works on a simple test package GreatSpelling created for this vignette.

library(potools)
great_spelling_messages = get_message_data("GreatSpelling")

# showing the structure of the messagedata for this package
great_spelling_messages
##    message_source     type       file                           msgid
## 1:              R singular    hazel.R These dark arts are forbiddden!
## 2:              R singular spellman.R     This is byond my abilities!
## 3:              R   plural   merlin.R                            <NA>
##                  msgid_plural                                         call
## 1:                                 stop("These dark arts are forbiddden!")
## 2:                                  warning("This is byond my abilities!")
## 3: %d lyfe left,%d lyves left ngettext(n, "%d lyfe left", "%d lyves left")
##    line_number is_repeat is_marked_for_translation is_templated
## 1:           2     FALSE                      TRUE        FALSE
## 2:           2     FALSE                      TRUE        FALSE
## 3:           2     FALSE                      TRUE        FALSE
# running our diagnostic
check_spelling(great_spelling_messages)
##          file     type                                         call       file
## 1:    hazel.R singular      stop("These dark arts are forbiddden!")    hazel.R
## 2: spellman.R singular       warning("This is byond my abilities!") spellman.R
## 3:   merlin.R   plural ngettext(n, "%d lyfe left", "%d lyves left")   merlin.R
##    line_number                                 replacement
## 1:           2      stop("These dark arts are forbidden!")
## 2:           2     warning("This is beyond my abilities!")
## 3:           2 ngettext(n, "%d lye left", "%d laves left")

That should covers the basics -- I look forward to seeing all the great uses you more creative developers can devise. Thanks for reading!