pyspark.pandas.groupby.GroupBy.nth

GroupBy.nth(n: int) → FrameLike

Take the nth row from each group.

New in version 3.4.0.

Parameters

n : int
    Position of the row to take within each group. Positions are zero-based, and negative values count from the end of the group.

Returns

Series or DataFrame

See also

pyspark.pandas.Series.groupby
pyspark.pandas.DataFrame.groupby

Notes

There is a behavior difference between pandas-on-Spark and pandas:
when there is no aggregation column and n is not equal to 0 or -1,
the returned empty DataFrame may have an index of a different length (__len__).

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5]}, columns=['A', 'B'])
>>> g = df.groupby('A')
>>> g.nth(0)
     B
A
1  NaN
2  3.0
>>> g.nth(1)
     B
A
1  2.0
2  5.0
>>> g.nth(-1)
     B
A
1  4.0
2  5.0
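
The selection rule shown in the examples above can be sketched in plain Python, without Spark. This is only an illustration of the semantics, not the library's implementation: `nth_per_group` is a hypothetical helper, `None` stands in for NaN, and skipping groups that have too few rows is an assumption modeled on pandas.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def nth_per_group(
    rows: Iterable[Tuple[Hashable, Optional[float]]], n: int
) -> Dict[Hashable, Optional[float]]:
    """Hypothetical helper: pick the nth value of each group, in row order."""
    groups: Dict[Hashable, List[Optional[float]]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)

    result: Dict[Hashable, Optional[float]] = {}
    for key, values in groups.items():
        # Negative n counts from the end of the group, like list indexing.
        if -len(values) <= n < len(values):
            result[key] = values[n]
        # Assumption: groups with too few rows are left out of the result.
    return result


# Same data as the example above, as (A, B) pairs; None stands in for NaN.
rows = [(1, None), (1, 2.0), (2, 3.0), (1, 4.0), (2, 5.0)]
first = nth_per_group(rows, 0)    # first row of each group
second = nth_per_group(rows, 1)   # second row of each group
last = nth_per_group(rows, -1)    # last row of each group
third = nth_per_group(rows, 2)    # only group 1 has a third row
```

Missing values are not skipped in this sketch, which matches `g.nth(0)` returning NaN for group 1 in the example above.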
