sampleBy {SparkR} | R Documentation
Returns a stratified sample without replacement, based on the sampling fraction given for each stratum.
sampleBy(x, col, fractions, seed)
## S4 method for signature 'SparkDataFrame,character,list,numeric'
sampleBy(x, col, fractions, seed)
x | A SparkDataFrame
col | The column that defines the strata
fractions | A named list giving the sampling fraction for each stratum. A stratum that is not specified is treated as having a fraction of zero.
seed | Random seed
A new SparkDataFrame that represents the stratified sample
sampleBy since 1.6.0
Other stat functions: approxQuantile(), corr(), cov(), crosstab(), freqItems()
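## Not run:
##D df <- read.json("/path/to/file.json")
##D # Hypothetical strata "0" and "1" in column "key"; any stratum not listed gets fraction zero
##D fractions <- list("0" = 0.1, "1" = 0.2)
##D sample <- sampleBy(df, "key", fractions, 36)
## End(Not run)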
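The per-stratum behaviour described above can be illustrated without a Spark session. The following is a minimal base-R sketch, not SparkR's implementation: `sample_by_sketch` is a hypothetical helper, the data frame and its strata "a", "b" and "c" are made up, and it assumes each row is kept independently with probability equal to its stratum's fraction (this page does not say how the sample is drawn). Under that assumption the number of rows kept per stratum is only approximately the requested fraction.

```r
# Plain-R illustration of stratified sampling by fraction; not SparkR code.
sample_by_sketch <- function(df, col, fractions, seed) {
  set.seed(seed)
  # Look up each row's fraction by stratum name; strata not listed get 0.
  keys <- as.character(df[[col]])
  frac <- vapply(keys, function(k) {
    f <- fractions[[k]]
    if (is.null(f)) 0 else f
  }, numeric(1), USE.NAMES = FALSE)
  # Keep each row at most once (no replacement), with probability frac.
  df[runif(nrow(df)) < frac, , drop = FALSE]
}

# Made-up data: three strata of 100 rows each.
df <- data.frame(key = rep(c("a", "b", "c"), each = 100),
                 value = seq_len(300))

# "c" is omitted, so its fraction is treated as zero.
fractions <- list(a = 1.0, b = 0.5)
out <- sample_by_sketch(df, "key", fractions, seed = 36)

# Count of sampled rows per stratum.
table(out$key)
```

With these fractions every "a" row is kept, no "c" row is kept, and roughly half of the "b" rows are kept, the exact count depending on the seed.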