Welcome to the ‘Get started’ vignette of the jfa
package. This vignette provides a simple explanation of the functions in
the package and how they facilitate the statistical audit sampling
workflow. See the other vignettes for a more detailed explanation of the
functionality of the package.
To concretely illustrate jfa’s functionality, we
consider the BuildIt data set that is included in the
package (for more info, see ?BuildIt). This data set
contains a population of 3500 invoices paid to the fictional ‘BuildIt’
construction company. Each invoice has an identification number
(ID), a recorded value (bookValue), and a
corresponding audit (true) value (auditValue).
Note: The information in the auditValue
column is added for illustrative purposes since it will only be known to
the auditor after having inspected a sample of invoices.
First, we load the jfa
package and the
BuildIt
data set. The first 10 invoices from the data set
are displayed below.
library(jfa)
data('BuildIt')
head(BuildIt, n = 10)
## ID bookValue auditValue
## 1 82884 242.61 242.61
## 2 25064 642.99 642.99
## 3 81235 628.53 628.53
## 4 71769 431.87 431.87
## 5 55080 620.88 620.88
## 6 93224 501.76 501.76
## 7 24331 466.01 466.01
## 8 81460 295.20 295.20
## 9 14608 216.48 216.48
## 10 79064 243.43 243.43
For a fully illustrated walkthrough of jfa’s workflow
functionality using the BuildIt data set, see Workflow:
Classical audit sampling. For a Bayesian version of the illustrated
walkthrough, see Workflow: Bayesian audit sampling.
auditPrior(): The basics
The auditPrior()
function can be used to create a prior
distribution for the misstatement parameter in a statistical audit
sampling model. In an audit sampling context, an advantage of Bayesian
inference is that the prior distribution can be used to incorporate
existing information into the statistical procedure. Incorporating
existing information can potentially yield a decrease in sample size and
an increase in efficiency. The type of audit information that can be
incorporated depends on the information that is available in the context
of the audit. See the vignette Planning:
Prior distributions or the accompanying article
for a detailed explanation of the types of audit information that
jfa
is able to incorporate into a prior distribution.
With the prior distribution in hand, Bayesian audit sampling can be
performed by providing the object returned by the
auditPrior()
function as input for the prior
argument in subsequent calls to the planning()
and
evaluation()
functions.
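To make the effect of prior information concrete, the following base R sketch uses a conjugate gamma prior with a Poisson likelihood and zero errors found in the sample. The prior parameters (alpha, beta) and the helper min_sample_size() are hypothetical and chosen for illustration only; they are not taken from jfa, and the prior that auditPrior() actually constructs may differ.

```r
# Smallest n for which the 95% posterior upper bound falls below materiality,
# assuming zero errors in the sample and a gamma(alpha, beta) prior.
# Both the prior values and this helper are illustrative assumptions.
min_sample_size <- function(alpha, beta, materiality = 0.05, conf_level = 0.95) {
  for (n in 1:10000) {
    # Conjugate update with x = 0 errors: posterior is gamma(alpha + 0, beta + n)
    upper <- qgamma(conf_level, shape = alpha, rate = beta + n)
    if (upper <= materiality) return(n)
  }
  NA_integer_
}

n_flat     <- min_sample_size(alpha = 1, beta = 0)   # no prior observations
n_informed <- min_sample_size(alpha = 1, beta = 20)  # prior worth 20 error-free items
c(flat = n_flat, informed = n_informed)
```

In this sketch the informed prior behaves like 20 previously inspected error-free items, so the required sample shrinks by the same amount.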
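planning(): The basics
Planning a minimum sample size requires knowledge of the conditions that lead to acceptance of the population (i.e., the sampling objectives). Generally, a sampling objective can be one (or both) of the following: testing the hypothesis that the misstatement in the population is lower than the performance materiality, or estimating the misstatement in the population with a minimum precision.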
Next to determining the sampling objective(s), it is also important
to determine the statistical distribution linking the sample outcomes to
the population misstatement (e.g., poisson, binomial, or
hypergeometric). All three distributions are standard in an audit
sampling context because they are (approximations of) the
hypergeometric distribution, but poisson is the default in jfa
because it is the most conservative.
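The claim that poisson is the most conservative choice can be checked with a short base R sketch. It searches for the smallest sample size at which the probability of finding zero errors, in a population that is exactly 5% misstated, drops to 5% or below. The population size of 3500 comes from the BuildIt data; rounding the number of misstated units to 175 for the hypergeometric case is an assumption of this sketch and not necessarily how jfa handles it.

```r
materiality <- 0.05
alpha       <- 1 - 0.95               # 1 minus the confidence level
N           <- 3500                   # population size (BuildIt)
K           <- round(materiality * N) # assumed number of misstated units

# Probability of observing zero errors in a sample of size n
prob_zero <- list(
  poisson        = function(n) dpois(0, lambda = n * materiality),
  binomial       = function(n) dbinom(0, size = n, prob = materiality),
  hypergeometric = function(n) dhyper(0, m = K, n = N - K, k = n)
)

# Smallest candidate n for which that probability is at most alpha
candidates <- 1:500
min_n <- sapply(prob_zero, function(f) candidates[which(f(candidates) <= alpha)[1]])
min_n
```

The Poisson model never requires fewer items than the binomial, which in turn never requires fewer than the hypergeometric, so planning with poisson errs on the side of a larger sample.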
Lastly, it is advised to obtain knowledge of the expected (or tolerable) errors in the sample. It is strongly recommended to set the value for the expected errors in the sample conservatively to minimize the chance of the observed errors in the sample exceeding the expected errors, which would imply that insufficient work has been done in the end.
With the BuildIt
data set, because the booked amounts
(monetary values) of each invoice in the population are given, an
auditor may want to make a statement about the amount of misstatement in
the population. For illustrative purposes we will tolerate zero
misstatements in the sample.
First, let’s take a look at how you can use the
planning()
function to calculate the minimum sample size
for testing the hypothesis that the misstatement in the population is
lower than the performance materiality. In this example the performance
materiality is set to 5% of the total population value, meaning that the
population may not contain more than 5% misstatement.
Sampling objective: Calculate a minimum sample size such that, when no misstatements are found in the sample, there is a 95% chance that the misstatement in the population is lower than 5% of the population value.
A minimum sample size for this sampling objective can be calculated
by specifying the materiality
parameter in the
planning()
function, see the code below. Next, a summary of
the statistical results can be obtained using the summary()
function. The results show that, given zero tolerable errors, the
minimum sample size is 60 units.
stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H₀: Θ >= 0.05 vs. H₁: Θ < 0.05
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 60
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.049929
## Expected precision: 0.049929
## Expected p-value: < 2.22e-16
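For zero expected errors under the Poisson likelihood, the numbers in this summary can be reproduced by hand. The sketch below uses only base R and the closed-form relation exp(-n * materiality) <= 1 - confidence; it is a check of the arithmetic, not a description of how planning() is implemented internally.

```r
materiality <- 0.05
conf_level  <- 0.95

# With zero errors, the Poisson upper bound is -log(1 - conf_level) / n,
# so the smallest adequate n follows directly.
n <- ceiling(-log(1 - conf_level) / materiality)

# Upper bound on the misstatement when no errors are found in n items
upper_bound <- -log(1 - conf_level) / n

c(n = n, upper_bound = upper_bound)
```

The resulting sample size and upper bound agree with the minimum sample size of 60 and the expected upper bound of 0.049929 shown above.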
Next, let’s take a look at how you can use the
planning()
function to calculate the minimum sample size
for estimating the misstatement in the population with a minimum
precision. The precision is defined as the difference between the most
likely misstatement and the upper confidence bound on the misstatement.
For this example, the minimum precision is set to 2% of the population
value.
Sampling objective: Calculate a minimum sample size such that, when zero misstatements are found in the sample, there is a 95% chance that the misstatement in the population is at most 2% above the most likely misstatement.
A minimum sample size for this sampling objective can be calculated
by specifying the min.precision
parameter in the
planning()
function, see the code below. The results show
that, given zero tolerable errors, the minimum sample size is 150
units.
stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Min. precision: 0.02
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 150
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.019971
## Expected precision: 0.019971
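The precision objective can be sketched in base R as a search over sample sizes. The helper n_for_precision() below is hypothetical; it assumes a Poisson likelihood, takes the upper bound from the gamma quantile function, and defines precision as the upper bound minus the most likely error x / n, following the definition given in the text.

```r
# Smallest n whose expected precision (upper bound minus most likely error)
# does not exceed min_precision, given x expected errors in the sample.
n_for_precision <- function(min_precision, x = 0, conf_level = 0.95) {
  for (n in (x + 1):100000) {
    upper     <- qgamma(conf_level, shape = x + 1, rate = n)  # Poisson upper bound
    precision <- upper - x / n
    if (precision <= min_precision) return(n)
  }
  NA_integer_
}

n_zero <- n_for_precision(0.02, x = 0)  # zero expected errors, as in the text
n_one  <- n_for_precision(0.02, x = 1)  # one expected error, for comparison
c(n_zero = n_zero, n_one = n_one)
```

Expecting an error in the sample widens the interval around the most likely error, so the same precision target needs more items.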
selection(): The basics
Selecting a sample using the selection()
function
requires knowledge of units in the population that are eligible for
selection (i.e., sampling units). Sampling units can be items or
monetary units. Items can be selected from the population using record
sampling (also known as attribute sampling or item sampling) with
units = 'items'. On the other hand, monetary units can be
selected from the population using monetary unit sampling (MUS) with
units = 'values'.
Once the sampling units are determined, the method used to select the
units (i.e., the selection method) must also be determined. Sampling
units can be selected with fixed interval sampling (also known as
systematic sampling) using method = 'interval' (the default), with cell
sampling using method = 'cell', with random sampling using
method = 'random', or with modified sieve sampling using
method = 'sieve'. See the vignette Selection: Sampling methodology for
a more detailed explanation of the selection methods implemented in jfa.
First, let’s take a look at how the selection() function
can be used to perform random record sampling. Random record sampling
implies that the sampling units are set to items and the
selection method is set to random. The code below selects
the 60 planned invoices from the BuildIt data set using
such a random record sampling scheme.
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 60
## Sampling units: items
## Method: random sampling
##
## Data:
## Population size: 3500
##
## Results:
## Selected sampling units: 60
## Selected items: 60
## Proportion of size: 0.017143
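Conceptually, random record sampling gives every item the same chance of selection. The base R sketch below shows the idea on a simulated stand-in population (the data frame and its values are made up for illustration); it is not expected to reproduce the rows that selection() picks.

```r
set.seed(1)

# Simulated stand-in for a population of 3500 invoices (hypothetical values)
population <- data.frame(
  ID        = seq_len(3500),
  bookValue = round(runif(3500, min = 10, max = 1000), 2)
)

# Draw 60 distinct row indices, each item being equally likely
rows         <- sample(nrow(population), size = 60)
sample_items <- population[rows, ]

# Proportion of the population that ends up in the sample
proportion <- nrow(sample_items) / nrow(population)
proportion
```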
Next, let’s take a look at how the selection() function
can be used to perform fixed interval monetary unit sampling. Fixed
interval monetary unit sampling implies that the sampling units are set
to values and the selection method is set to interval. The code below
selects 150 monetary units from the BuildIt data set using such a
fixed interval monetary unit sampling scheme.
stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 150
## Sampling units: monetary units
## Method: fixed interval sampling
## Starting point: 1
##
## Data:
## Population size: 3500
## Population value: 1403221
## Selection interval: 9354.8
##
## Results:
## Selected sampling units: 150
## Proportion of value: 0.0001069
## Selected items: 150
## Proportion of size: 0.042857
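The selection interval in this summary is the population value divided by the sample size (1403221 / 150 ≈ 9354.8). The mechanics of fixed interval monetary unit sampling can be sketched in base R on a toy population of five invoices. The values below are made up, and the mapping from monetary units to items is a simplified illustration rather than jfa's implementation.

```r
book_values <- c(100, 250, 50, 400, 200)  # toy invoices (hypothetical)
size        <- 4                          # number of monetary units to select

interval <- sum(book_values) / size       # 1000 / 4 = 250
start    <- 1                             # starting point within the first interval

# Selected monetary units: one per interval, at a fixed offset
units <- start + (seq_len(size) - 1) * interval

# Map each monetary unit to the invoice that contains it:
# unit u belongs to item i when cumsum[i - 1] < u <= cumsum[i]
upper_edges <- cumsum(book_values)
items       <- findInterval(units, upper_edges, left.open = TRUE) + 1

# An invoice larger than the interval can be hit more than once
times <- table(items)
times
```

Here the fourth invoice (400) is larger than the interval of 250 and is selected twice, which is the kind of situation that a times column in the selected sample keeps track of.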
The selected units and corresponding items are stored in the object
that is returned by the selection() function. The sample
can be extracted from this object by indexing it via
$sample, see the code below. After this step it is up to
the auditor to annotate the sample.
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
sample <- stage2$sample
head(sample, n = 10)
## row times ID bookValue auditValue
## 1 1017 1 50755 618.24 618.24
## 2 679 1 20237 669.75 669.75
## 3 2177 1 9517 454.02 454.02
## 4 930 1 85674 257.82 257.82
## 5 1533 1 31051 308.53 308.53
## 6 471 1 84375 824.66 824.66
## 7 2347 1 75616 623.70 623.70
## 8 270 1 82033 352.75 352.75
## 9 1211 1 12877 52.89 52.89
## 10 3379 1 85322 330.24 330.24
evaluation(): The basics
After annotating the items in the sample with their audit values you
can perform statistical inference about the misstatement in the
population with the evaluation()
function. Next to a data
sample as input, this function can also be used when only summary
statistics from a data sample (e.g., sample size and number of errors)
are available. For a more elaborate explanation of the output of this
function for each sampling objective, see the package vignettes Evaluation:
Testing misstatement and Evaluation:
Estimating misstatement.
First, let’s take a look at how the evaluation()
function can be combined with summary statistics from a sample. Suppose
that in the previously selected sample of 60 invoices it is found that a
single invoice is missing a signature. These summary statistics can be
provided to the evaluation() function with x = 1 and n = 60.
The function also requires that you specify the sampling objectives
using the materiality or min.precision arguments. Again,
a performance materiality of 5% applies.
stage4 <- evaluation(materiality = 0.05, method = 'poisson', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Materiality: 0.05
## Hypotheses: H₀: Θ >= 0.05 vs. H₁: Θ < 0.05
## Method: poisson
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 1
##
## Results:
## Most likely error: 0.016667
## 95 percent confidence interval: [0, 0.079064]
## Precision: 0.062398
## p-value: 0.19915
The results indicate that the most likely error in the population is 1.67%. Moreover, the 95% one-sided confidence interval for the population misstatement ranges from 0% to 7.9% and contains the performance materiality of 5%. This implies that we cannot reject the null hypothesis that the population misstatement is at least 5%, which is also indicated by a non-significant p value (p = 0.199).
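The figures in this summary follow from standard Poisson results and can be reproduced in base R. The sketch below is a check of the arithmetic for x = 1 error in n = 60 items; it is not a description of the internals of evaluation().

```r
x           <- 1     # errors found in the sample
n           <- 60    # sample size
materiality <- 0.05
conf_level  <- 0.95

mle         <- x / n                                      # most likely error
upper_bound <- qgamma(conf_level, shape = x + 1, rate = n) # one-sided upper bound
precision   <- upper_bound - mle

# p value: probability of x or fewer errors if the misstatement equals materiality
p_value <- ppois(x, lambda = n * materiality)

round(c(mle = mle, upper_bound = upper_bound,
        precision = precision, p_value = p_value), 6)
```

Because the upper bound of about 7.9% exceeds the materiality of 5% and the p value is above 0.05, the sample does not provide enough evidence to conclude that the misstatement is below materiality.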
Next, let’s take a look at how the evaluation()
function
can be combined with a data sample. Returning to our annotated sample
from the selection()
function, suppose that in the
previously selected sample of 60 invoices it is found that a single
invoice has a true value that deviates from its booked value.
sample$auditValue <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100
These data can be provided to the evaluation() function
using the data, values, values.audit, and times arguments. The
method argument determines the method of inference. For
example, the code below evaluates the misstatement in the population
using the commonly used Stringer bound. You can find more information
about which evaluation methods are implemented on the home page.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Method: stringer
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 0.1617495
##
## Results:
## Most likely error: 0.0026958
## 95 percent confidence interval: [0, 0.053222]
## Precision: 0.050526
The results indicate that the most likely error in the population is
0.27%. Moreover, the 95% one-sided confidence interval for the population
misstatement ranges from 0% to 5.3% and contains the performance
materiality. The stringer method does not provide a
p value for hypothesis testing.
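To show where these numbers come from, the following base R sketch computes the taint of the adjusted invoice and a textbook version of the Stringer bound built on binomial (Clopper-Pearson) upper limits. The helper stringer_bound() is hypothetical, and whether jfa computes the bound in exactly this way is an assumption; the result should nonetheless land close to the upper bound printed above.

```r
# Textbook Stringer bound: start from the zero-error upper limit and add,
# for each taint (largest first), the increment between successive limits.
stringer_bound <- function(taints, n, conf_level = 0.95) {
  taints <- sort(taints[taints > 0], decreasing = TRUE)
  # Clopper-Pearson upper limit for j errors in n items
  upper <- function(j) qbeta(conf_level, shape1 = j + 1, shape2 = n - j)
  bound <- upper(0)
  for (j in seq_along(taints)) {
    bound <- bound + (upper(j) - upper(j - 1)) * taints[j]
  }
  bound
}

# The first sampled invoice: booked at 618.24, audited at 100 less
book  <- 618.24
audit <- book - 100
taint <- (book - audit) / book   # proportional misstatement of that item

n     <- 60
mle   <- taint / n               # most likely error: sum of taints over n
bound <- stringer_bound(taint, n = n)

c(taint = taint, mle = mle, bound = bound)
```

The single taint of roughly 0.16 explains why the most likely error is far smaller than in the attribute example, where the one error counted as a full taint of 1.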
report(): The basics
With the results from the evaluation()
function in hand,
a call to the report()
function automatically generates a
report containing the data, the statistical results and their
interpretation, and the conclusion of the sampling procedure with
respect to the sampling objectives. The object returned by the
evaluation()
function can be supplied directly to the
report()
function, see the code below.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
report(stage4, file = 'report.html', format = 'html_document') # Generates .html report