GroupedData.agg(*exprs)
GroupedData.agg
Computes aggregates and returns the result as a DataFrame.
GroupedData.apply(udf)
GroupedData.apply
An alias of pyspark.sql.GroupedData.applyInPandas() that takes a pyspark.sql.functions.pandas_udf(), whereas pyspark.sql.GroupedData.applyInPandas() takes a native Python function.
GroupedData.applyInPandas(func, schema)
GroupedData.applyInPandas
Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.
GroupedData.avg(*cols)
GroupedData.avg
Computes the average value of each numeric column for each group.
GroupedData.cogroup(other)
GroupedData.cogroup
Cogroups this group with another group so that we can run cogrouped operations.
GroupedData.count()
GroupedData.count
Counts the number of records for each group.
GroupedData.max(*cols)
GroupedData.max
Computes the max value for each numeric column for each group.
GroupedData.mean(*cols)
GroupedData.mean
Computes the average value of each numeric column for each group; mean() is an alias for avg().
GroupedData.min(*cols)
GroupedData.min
Computes the min value for each numeric column for each group.
GroupedData.pivot(pivot_col[, values])
GroupedData.pivot
Pivots a column of the current DataFrame and performs the specified aggregation.
GroupedData.sum(*cols)
GroupedData.sum
Computes the sum for each numeric column for each group.
PandasCogroupedOps.applyInPandas(func, schema)
PandasCogroupedOps.applyInPandas
Applies a function to each cogroup using pandas and returns the result as a DataFrame.