The goal of the stratallo package is to provide implementations of efficient algorithms that solve a classical problem in survey methodology: the optimum sample allocation problem in stratified sampling schemes. In this context, the classical problem of optimum sample allocation is understood in the Tschuprov-Neyman sense (Neyman 1934; Tschuprov 1923). It is formulated as the determination of a vector of strata sample sizes that minimizes the variance of the \(\pi\)-estimator of the population total of a given study variable, under a constraint on the total sample size. This problem can be further complemented by adding lower or upper bounds constraints on sample sizes in strata.
A minor modification of the classical optimum sample allocation problem leads to the minimum sample size allocation problem. This problem lies in the determination of a vector of strata sample sizes that minimizes the total sample size, under an assumed fixed level of the \(\pi\)-estimator’s variance. As in the case of the classical optimum allocation, the problem of minimum sample size allocation can be complemented by imposing upper bounds constraints on sample sizes in strata.
The stratallo package provides two user functions, dopt and nopt, that solve the sample allocation problems briefly characterized above. In this context, it is assumed that the sampling
designs in strata are chosen so that the variance of the \(\pi\)-estimator of the population total is
of the following generic form: \[
D^2_{st}(x_w,\, w \in \mathcal W) = \sum_{w \in \mathcal W}\,
\frac{a_w^2}{x_w} - b,
\] where \(\mathcal W = \{1, \ldots, H\}\) denotes the set of strata labels, \(H\) is the total number of strata, \((x_w)_{w \in \mathcal W}\) are the strata sample sizes, and the parameters \(b\) and \(a_w > 0,\, w \in \mathcal W\), do not depend on \((x_w)_{w \in \mathcal W}\). Among the stratified sampling designs for which the \(\pi\)-estimator’s variance has the above form is stratified simple random sampling without replacement. Under this design, \(a_w = N_w S_w,\, w \in \mathcal W\), and \(b = \sum_{w \in \mathcal W}\, N_w S_w^2\), where \(S_w,\, w \in \mathcal W\), denote the stratum standard deviations of the study variable and \(N_w,\, w \in \mathcal W\), are the strata sizes (see e.g. Sarndal et al. (1993), Result 3.7.2, p. 103).
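As an illustration of the generic variance form, the sketch below evaluates \(\sum_{w} a_w^2 / x_w - b\) in base R and compares it with the textbook variance of the stratified estimator under simple random sampling without replacement. The helper name variance_generic is hypothetical and is not part of stratallo; the strata sizes and standard deviations are those of the 4-strata example population used in this README, and the sample sizes are a proportional-to-\(a_w\) split of \(n = 190\).

```r
# Hypothetical helper (not part of stratallo): generic variance form
# sum(a^2 / x) - b, for strata sample sizes x.
variance_generic <- function(x, a, b) sum(a^2 / x) - b

N <- c(3000, 4000, 5000, 2000)  # Strata sizes.
S <- c(48, 79, 76, 17)          # Stratum standard deviations.

# Parameters under stratified simple random sampling without replacement.
a <- N * S
b <- sum(N * S^2)

# Sample sizes proportional to a_w, with total sample size 190.
x <- 190 * a / sum(a)

v_generic <- variance_generic(x, a, b)
# Textbook form: sum over strata of N_w^2 * (1 / x_w - 1 / N_w) * S_w^2.
v_textbook <- sum(N^2 * (1 / x - 1 / N) * S^2)

isTRUE(all.equal(v_generic, v_textbook))
```

The two expressions agree because \(N_w^2 (1/x_w - 1/N_w) S_w^2 = a_w^2 / x_w - N_w S_w^2\) for each stratum.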
Apart from dopt and nopt, stratallo provides the var_tst and var_tst_si functions that compute the value of the variance \(D^2_{st}\). The var_tst_si function is a simple wrapper of var_tst dedicated to the case of simple random sampling without replacement in each stratum. Furthermore, the package comes with two predefined, artificial populations with 507 and 969 strata. These are stored in the pop507 and pop969 objects, respectively.
dopt function
The dopt function solves the following two types of the allocation problem, formulated in the language of mathematical optimization.
Problem 1 (one-sided upper bounds constraints)
Given numbers \(a_w > 0,\, M_w > 0,\, w
\in \mathcal W\) and \(b,\, n <
\sum_{w \in \mathcal W}\, M_w\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad f(\mathbf x) = \sum_{w \in \mathcal W} \tfrac{a_w^2}{x_w} -
b \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W} x_w = n \\
& \quad x_w \le M_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.
Problem 2 (one-sided lower bounds constraints)
Given numbers \(a_w > 0,\, m_w > 0,\, w
\in \mathcal W\), and \(b,\, n >
\sum_{w \in \mathcal W} m_w\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad f(\mathbf x) = \sum_{w \in \mathcal W} \tfrac{a_w^2}{x_w} -
b \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W} x_w = n \\
& \quad x_w \geq m_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.
The user of dopt can choose whether the solution is computed for Problem 1 or Problem 2. This is achieved with the proper use of the m and M arguments of the function. For Problem 1, the user provides the values of the upper bounds with the M argument, while leaving m as NULL. Similarly, for Problem 2, the user provides the values of the lower bounds with the m argument, while leaving M as NULL. If both m and M are NULL (default), dopt returns the Tschuprov-Neyman allocation, which minimizes the variance \(D^2_{st}\) under the constraint on the total sample size \(\sum_{w \in \mathcal W} x_w = n\), and is given by \[
x_w = a_w \frac{n}{\sum_{v \in \mathcal W} a_v}, \quad w \in \mathcal W.
\] Four algorithms are available for Problem 1: rNa (default), sga, sgaplus, and coma. All of them except sgaplus are described in detail in Wesołowski et al. (2021). The sgaplus algorithm is defined in Wójciak (2019) as the Sequential Allocation (version 1) algorithm.
Problem 2 is solved by the LrNa algorithm, which in principle is based on rNa and was introduced in Wójciak (2022).
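To make the recursive idea behind Problems 1 and 2 more concrete, here is a minimal base R sketch, assuming the usual recursive Neyman scheme: compute the Tschuprov-Neyman allocation on the strata that are still free, fix every stratum that violates its bound at that bound, reduce the remaining sample size, and repeat. The function recursive_neyman and its arguments are hypothetical; this is not the code used by dopt, and the cited papers remain the reference for the actual algorithms and their optimality proofs.

```r
# Hypothetical helper (not part of stratallo): recursive Neyman-style
# allocation under one-sided bounds, written in base R for illustration.
recursive_neyman <- function(n, a, bounds, type = c("upper", "lower")) {
  type <- match.arg(type)
  x <- numeric(length(a))
  active <- rep(TRUE, length(a))
  repeat {
    # Tschuprov-Neyman allocation restricted to the strata still active.
    x[active] <- n * a[active] / sum(a[active])
    out_of_bounds <- if (type == "upper") x > bounds else x < bounds
    violated <- active & out_of_bounds
    if (!any(violated)) break
    # Fix violating strata at their bounds and reallocate the remainder.
    x[violated] <- bounds[violated]
    n <- n - sum(bounds[violated])
    active <- active & !violated
  }
  x
}

# Example population with 4 strata (sizes times standard deviations).
a <- c(3000, 4000, 5000, 2000) * c(48, 79, 76, 17)

x_upper <- recursive_neyman(190, a, c(100, 90, 70, 80), type = "upper")
x_lower <- recursive_neyman(190, a, c(50, 120, 1, 1), type = "lower")
```

For this population, the sketch fixes the third stratum at its upper bound of 70 in the first case, and the first two strata at their lower bounds of 50 and 120 in the second case; the resulting allocations agree with the dopt outputs reported in the examples of this README.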
nopt function
The nopt function solves the following minimum sample size allocation problem, formulated in the language of mathematical optimization.
Problem 3
Given numbers \(a_w > 0,\, M_w > 0,\, w
\in \mathcal W\), and \(b,\, D >
\sum_{w \in \mathcal W} \tfrac{a_w^2}{M_w} - b > 0\), \[\begin{align*}
\underset{\mathbf x\in (0, +\infty)^{H}}{\mathrm{minimize ~\,}}
& \quad n(\mathbf x) = \sum_{w \in \mathcal W} x_w \\
\mathrm{subject ~ to} & \quad \sum_{w \in \mathcal W}
\tfrac{a_w^2}{x_w} - b = D \\
& \quad x_w \le M_w, \quad \forall w \in \mathcal W,
\end{align*}\] where \(\mathbf x=
(x_w)_{w \in \mathcal W}\) is the optimization variable.
The algorithm that solves Problem 3 is based on LrNa and is described in Wójciak (2022).
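For intuition about Problem 3, consider the special case in which no upper bound is active. A standard Lagrange multiplier argument then gives the closed form \(x_w = a_w \sum_{v \in \mathcal W} a_v / (D + b)\). The base R sketch below evaluates this formula; the helper min_size_unbounded is hypothetical, it does not handle the case where some \(x_w\) would exceed \(M_w\), and it is not the algorithm implemented in nopt.

```r
# Hypothetical helper (not part of stratallo): minimum sample size
# allocation for a target variance D, ignoring the upper bounds.
min_size_unbounded <- function(D, a, b) a * sum(a) / (D + b)

a <- c(3000, 4000, 5000, 2000)
b <- 70000
M <- c(100, 90, 70, 80)  # Upper bounds, used here only as a check.
D <- 1e6                 # Variance constraint.

x <- min_size_unbounded(D, a, b)

bounds_ok <- all(x <= M)          # TRUE when the closed form is feasible.
variance_at_x <- sum(a^2 / x) - b # Should reproduce D.
total_size <- sum(x)
```

When bounds_ok is FALSE, the closed form is not feasible and the bounded problem has to be solved instead, which is what nopt is for.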
You can install the released version of the stratallo package from CRAN with:
install.packages("stratallo")
These are basic examples that show how to use the dopt and nopt functions to solve optimal sample allocation problems for an example population with 4 strata.
library(stratallo)
dopt
# Define example population
N <- c(3000, 4000, 5000, 2000) # Strata sizes.
S <- c(48, 79, 76, 17) # Standard deviations of a study variable in strata.
a <- N * S
M <- c(100, 90, 70, 80) # Upper bounds constraints imposed on the sample sizes in strata.
all(M <= N)
#> [1] TRUE
n <- 190 # Total sample size.
n < sum(M)
#> [1] TRUE
# Solution to Problem 1.
opt <- dopt(n = n, a = a, M = M)
opt
#> [1] 34.979757 76.761134 70.000000 8.259109
sum(opt) # Equals to n.
#> [1] 190
all(opt <= M) # Does not violate upper bounds constraints.
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 4035156476
m <- c(50, 120, 1, 1) # Lower bounds constraints imposed on the sample sizes in strata.
n > sum(m)
#> [1] TRUE
# Solution to Problem 2.
opt <- dopt(n = n, a = a, m = m)
opt
#> [1] 50.000000 120.000000 18.357488 1.642512
sum(opt) # Equals to n.
#> [1] 190
all(opt >= m) # Does not violate lower bounds constraints.
#> [1] TRUE
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 9755319333
# Tschuprov-Neyman allocation (no inequality constraints).
opt <- dopt(n = n, a = a)
opt
#> [1] 31.304348 68.695652 82.608696 7.391304
sum(opt) # Equals to n.
#> [1] 190
# Variance of the pi-estimator that corresponds to a given optimal allocation.
var_tst_si(opt, N, S)
#> [1] 3959066000
nopt
a <- c(3000, 4000, 5000, 2000)
b <- 70000
M <- c(100, 90, 70, 80)
D <- 1e6 # Variance constraint.
opt <- nopt(D, a, b, M)
sum(opt)
#> [1] 183.1776