coalesce {SparkR} | R Documentation |
Returns a new SparkDataFrame that has exactly numPartitions
partitions.
This operation results in a narrow dependency. For example, if you go from 1000 partitions to 100
partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of
the current partitions. If a larger number of partitions is requested, the SparkDataFrame stays at
its current number of partitions.
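The partition arithmetic described here can be sketched in base R, without a Spark session. This is only an illustration: `plan_coalesce` is a hypothetical helper, and the contiguous grouping it uses is an assumption, not Spark's actual assignment of current partitions to new ones.

```r
# Hypothetical helper: map current partition indices onto new partitions.
# Contiguous grouping is assumed for illustration only.
plan_coalesce <- function(current, requested) {
  # Requesting more partitions than exist leaves the count unchanged.
  target <- min(current, requested)
  idx <- seq_len(current)
  # Integer arithmetic assigns each current partition to one new partition.
  groups <- ((idx - 1L) * target) %/% current + 1L
  split(idx, groups)
}

plan <- plan_coalesce(1000L, 100L)
length(plan)                        # 100 new partitions
unique(lengths(plan))               # each claims 10 current partitions
length(plan_coalesce(100L, 1000L))  # larger request: stays at 100
```

No data moves between groups in this sketch, which mirrors the absence of a shuffle in the narrow dependency.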
Returns the first column that is not NA, or NA if all inputs are.
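The "first column that is not NA" rule can be mimicked on ordinary R vectors. The sketch below is a base R analogy, not the SparkR implementation; `first_non_na` and the sample vectors are hypothetical names introduced for illustration.

```r
# Hypothetical base R analogue of the Column method: for each position,
# keep the first value that is not NA across the inputs.
first_non_na <- function(...) {
  cols <- list(...)
  # Start from the first input and fill its NA slots from later inputs.
  Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), cols)
}

c1 <- c(1, NA, NA)
d1 <- c(9, 2, NA)
e1 <- c(8, 7, NA)
first_non_na(c1, d1, e1)  # 1 2 NA
```

The last position stays NA because every input is NA there, matching the "NA if all inputs are" case.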
coalesce(x, ...)

## S4 method for signature 'SparkDataFrame'
coalesce(x, numPartitions)

## S4 method for signature 'Column'
coalesce(x, ...)
x |
a Column or a SparkDataFrame. |
... |
additional argument(s). If x is a Column, additional Columns can be optionally provided. |
numPartitions |
the number of partitions to use. |
However, if you're doing a drastic coalesce on a SparkDataFrame, e.g. to numPartitions = 1,
this may result in your computation taking place on fewer nodes than
you would like (e.g. one node in the case of numPartitions = 1). To avoid this,
call repartition. This will add a shuffle step, but means the
current upstream partitions will be executed in parallel (per whatever
the current partitioning is).
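The choice between the two calls can be wrapped in a small function. This is a sketch only: `shrink_partitions` and its `shuffle` flag are hypothetical and not part of SparkR. Calling it assumes the SparkR package is installed, a session is active, and `df` is a SparkDataFrame.

```r
# Hypothetical helper (not part of SparkR): choose between the two calls.
shrink_partitions <- function(df, numPartitions, shuffle = FALSE) {
  if (shuffle) {
    # Adds a shuffle step; upstream partitions still run in parallel.
    SparkR::repartition(df, numPartitions = numPartitions)
  } else {
    # Narrow dependency; upstream work may run on fewer nodes.
    SparkR::coalesce(df, numPartitions)
  }
}

# Usage, assuming an active session and a SparkDataFrame `df`:
# oneDF <- shrink_partitions(df, 1L, shuffle = TRUE)
```

Defaulting `shuffle` to FALSE keeps the cheaper path unless a drastic reduction makes the shuffle worthwhile.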
coalesce(SparkDataFrame) since 2.1.1
coalesce(Column) since 2.1.1
Other SparkDataFrame functions: SparkDataFrame-class, agg, arrange, as.data.frame, attach,
cache, collect, colnames, coltypes, createOrReplaceTempView, crossJoin, dapplyCollect, dapply,
describe, dim, distinct, dropDuplicates, dropna, drop, dtypes, except, explain, filter, first,
gapplyCollect, gapply, getNumPartitions, group_by, head, histogram, insertInto, intersect,
isLocal, join, limit, merge, mutate, ncol, nrow, persist, printSchema, randomSplit, rbind,
registerTempTable, rename, repartition, sample, saveAsTable, schema, selectExpr, select,
showDF, show, storageLevel, str, subset, take, union, unpersist, withColumn, with, write.df,
write.jdbc, write.json, write.orc, write.parquet, write.text
Other normal_funcs: abs, bitwiseNOT, column, expr, greatest, ifelse, isnan, least, lit,
nanvl, negate, randn, rand, struct, when
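## Not run: 
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Reduce df to a single partition without a shuffle
newDF <- coalesce(df, 1L)
## End(Not run)

## Not run: 
# First value that is not NA among columns c, d and e
coalesce(df$c, df$d, df$e)
## End(Not run)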