subset {SparkR} | R Documentation
Return subsets of SparkDataFrame according to given conditions
subset(x, ...)

## S4 method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]]

## S4 replacement method for signature 'SparkDataFrame,numericOrcharacter'
x[[i]] <- value

## S4 method for signature 'SparkDataFrame'
x[i, j, ..., drop = F]

## S4 method for signature 'SparkDataFrame'
subset(x, subset, select, drop = F, ...)
x | a SparkDataFrame.
... | currently not used.
i, subset | (Optional) a logical expression to filter on rows. For the extract operator [[ and the replacement operator [[<-, the index (position or name) of a single Column.
value | a Column or an atomic vector of length 1 used as a literal value, or NULL. Assigning NULL drops the specified Column.
j, select | an expression for a single Column, or a list of columns, to select from the SparkDataFrame.
drop | if TRUE, a Column is returned when the resulting dataset has only one column. Otherwise, a SparkDataFrame is always returned.
A new SparkDataFrame containing only the rows that meet the condition, restricted to the selected columns.
[[ since 1.4.0
[[<- since 2.1.1
[ since 1.4.0
subset since 1.5.0
Other SparkDataFrame functions: SparkDataFrame-class, agg, arrange, as.data.frame, attach, cache, coalesce, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, dapplyCollect, dapply, describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, except, explain, filter, first, gapplyCollect, gapply, getNumPartitions, group_by, head, histogram, insertInto, intersect, isLocal, join, limit, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind, registerTempTable, rename, repartition, sample, saveAsTable, schema, selectExpr, select, showDF, show, storageLevel, str, take, union, unpersist, withColumn, with, write.df, write.jdbc, write.json, write.orc, write.parquet, write.text
Other subsetting functions: filter, select
## Not run:
##D # Columns can be selected using [[ and [
##D df[[2]] == df[["age"]]
##D df[,2] == df[,"age"]
##D df[,c("name", "age")]
##D # Or to filter rows
##D df[df$age > 20,]
##D # SparkDataFrame can be subset on both rows and Columns
##D df[df$name == "Smith", c(1,2)]
##D df[df$age %in% c(19, 30), 1:2]
##D subset(df, df$age %in% c(19, 30), 1:2)
##D subset(df, df$age %in% c(19), select = c(1,2))
##D subset(df, select = c(1,2))
##D # Columns can be selected and set
##D df[["age"]] <- 23
##D df[[1]] <- df$age
##D df[[2]] <- NULL # drop column
## End(Not run)
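The examples above assume that a SparkDataFrame named df, with name and age columns, already exists. The following is a minimal sketch of one way to set that up and exercise subset and the drop argument. It assumes a local Spark installation that SparkR can reach; the "local[1]" master and the sample rows are illustrative assumptions, not part of the original documentation.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

# Hypothetical sample data with the columns the examples above rely on.
local_df <- data.frame(
  name = c("Smith", "Jones", "Lee"),
  age = c(19, 30, 45),
  stringsAsFactors = FALSE
)
df <- createDataFrame(local_df)

# Filter rows and select columns in one call; the result is a SparkDataFrame.
adults <- subset(df, df$age > 20, select = c("name", "age"))
result <- collect(adults)  # bring the rows back as a local data.frame

# With drop = TRUE and a single selected column, a Column is returned
# instead of a SparkDataFrame.
age_col <- df[, "age", drop = TRUE]
is_column <- is(age_col, "Column")

sparkR.session.stop()
```

Calling collect is only needed to inspect the rows locally; the subset itself stays a distributed SparkDataFrame until then.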