Returns a stratified sample without replacement
sampleBy.Rd
Returns a stratified sample without replacement based on the fraction given for each stratum.
Usage
sampleBy(x, col, fractions, seed)
# S4 method for SparkDataFrame,character,list,numeric
sampleBy(x, col, fractions, seed)
Arguments
- x
A SparkDataFrame
- col
Name of the column that defines the strata
- fractions
A named list giving the sampling fraction for each stratum. If a stratum is not specified, its fraction is treated as zero.
- seed
random seed
See also
Other stat functions:
approxQuantile(), corr(), cov(), crosstab(), freqItems()
Examples
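if (FALSE) {
df <- read.json("/path/to/file.json")
# Illustrative values only: assumes the "key" column holds the strata "0" and "1"
fractions <- list("0" = 0.1, "1" = 0.2)
sampled <- sampleBy(df, "key", fractions, 36)
}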
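The role of the fractions argument can be tried out without a Spark session. The following is a minimal base R sketch, not SparkR's implementation: the helper sample_by_sketch and the local_df data are hypothetical, and the sketch assumes each row is kept independently with probability equal to its stratum's fraction, so per-stratum counts are approximate rather than exact.

```r
# Base R sketch of stratified sampling without replacement.
# Assumption: each row is kept independently with probability equal to
# its stratum's fraction; strata missing from `fractions` get fraction 0.
sample_by_sketch <- function(data, col, fractions, seed) {
  set.seed(seed)
  strata <- as.character(data[[col]])
  # Look up the fraction for each row's stratum, defaulting to 0.
  p <- vapply(strata, function(s) {
    f <- fractions[[s]]
    if (is.null(f)) 0 else f
  }, numeric(1))
  # Keep a row when its uniform draw falls below its stratum's fraction.
  data[runif(nrow(data)) < p, , drop = FALSE]
}

# Hypothetical local data with three strata of 100 rows each.
local_df <- data.frame(key = rep(c("a", "b", "c"), each = 100),
                       value = seq_len(300))

# Stratum "c" is not listed, so none of its rows are sampled.
picked <- sample_by_sketch(local_df, "key", list(a = 0.1, b = 0.5), seed = 36)
table(picked$key)
```

Because no row can be drawn twice, the result is a sample without replacement, and a fraction of 1 keeps every row of that stratum.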