DataFrame([data, index, columns, dtype, copy])
DataFrame
pandas-on-Spark DataFrame that corresponds to pandas DataFrame logically.
DataFrame.index
The index (row labels) column of the DataFrame.
DataFrame.columns
The column labels of the DataFrame.
DataFrame.empty
Return True if the current DataFrame is empty.
DataFrame.dtypes
Return the dtypes in the DataFrame.
DataFrame.shape
Return a tuple representing the dimensionality of the DataFrame.
DataFrame.axes
Return a list representing the axes of the DataFrame.
DataFrame.ndim
Return an int representing the number of array dimensions.
DataFrame.size
Return an int representing the number of elements in this object.
DataFrame.select_dtypes([include, exclude])
DataFrame.select_dtypes
Return a subset of the DataFrame’s columns based on the column dtypes.
DataFrame.values
Return a NumPy representation of the DataFrame or the Series.
DataFrame.copy([deep])
DataFrame.copy
Make a copy of this object’s indices and data.
DataFrame.isna()
DataFrame.isna
Detect missing values for items in the current DataFrame.
DataFrame.astype(dtype)
DataFrame.astype
Cast a pandas-on-Spark object to the specified dtype.
DataFrame.isnull()
DataFrame.isnull
Detect missing values for items in the current DataFrame (alias of DataFrame.isna()).
DataFrame.notna()
DataFrame.notna
Detect non-missing values for items in the current DataFrame.
DataFrame.notnull()
DataFrame.notnull
Detect non-missing values for items in the current DataFrame (alias of DataFrame.notna()).
DataFrame.pad([axis, inplace, limit])
DataFrame.pad
Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
DataFrame.bool()
DataFrame.bool
Return the bool of a single element in the current object.
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.iat
Access a single value for a row/column pair by integer position.
DataFrame.head([n])
DataFrame.head
Return the first n rows.
DataFrame.idxmax([axis])
DataFrame.idxmax
Return index of first occurrence of maximum over requested axis.
DataFrame.idxmin([axis])
DataFrame.idxmin
Return index of first occurrence of minimum over requested axis.
DataFrame.loc
Access a group of rows and columns by label(s) or a boolean Series.
DataFrame.iloc
Purely integer-location based indexing for selection by position.
DataFrame.items()
DataFrame.items
Alias of DataFrame.iteritems().
DataFrame.iteritems()
DataFrame.iteritems
Iterator over (column name, Series) pairs.
DataFrame.iterrows()
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples([index, name])
DataFrame.itertuples
Iterate over DataFrame rows as namedtuples.
DataFrame.keys()
DataFrame.keys
Return the column labels; alias for DataFrame.columns.
DataFrame.pop(item)
DataFrame.pop
Return item and drop from frame.
DataFrame.tail([n])
DataFrame.tail
Return the last n rows.
DataFrame.xs(key[, axis, level])
DataFrame.xs
Return cross-section from the DataFrame.
DataFrame.get(key[, default])
DataFrame.get
Get item from object for given key (for example, a DataFrame column).
DataFrame.where(cond[, other, axis])
DataFrame.where
Replace values where the condition is False.
DataFrame.mask(cond[, other])
DataFrame.mask
Replace values where the condition is True.
DataFrame.query(expr[, inplace])
DataFrame.query
Query the columns of a DataFrame with a boolean expression.
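The difference between where, mask and query is easiest to see side by side. The sketch below is a minimal illustration that uses plain pandas as a stand-in so it runs without a Spark session; it assumes the pandas-on-Spark methods listed above behave the same way for this simple case. The frame and column names are made up for the example.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# where: keep values where the condition is True, replace the rest with `other`.
kept = df.where(df > 2, other=0)

# mask: the inverse of where; replace values where the condition is True.
masked = df.mask(df > 2, other=0)

# query: select rows with a boolean expression over column names.
selected = df.query("a > 2")

print(kept)
print(masked)
print(selected)
```

Note that where and mask keep the shape of the frame and only change values, while query drops the rows that do not match.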
DataFrame.add(other)
DataFrame.add
Get addition of DataFrame and other, element-wise (binary operator +).
DataFrame.radd(other)
DataFrame.radd
Get addition of DataFrame and other, element-wise, with reversed operands (other + DataFrame).
DataFrame.div(other)
DataFrame.div
Get floating division of DataFrame and other, element-wise (binary operator /).
DataFrame.rdiv(other)
DataFrame.rdiv
Get floating division of DataFrame and other, element-wise, with reversed operands (other / DataFrame).
DataFrame.truediv(other)
DataFrame.truediv
Get floating division of DataFrame and other, element-wise (binary operator /).
DataFrame.rtruediv(other)
DataFrame.rtruediv
Get floating division of DataFrame and other, element-wise, with reversed operands (other / DataFrame).
DataFrame.mul(other)
DataFrame.mul
Get multiplication of DataFrame and other, element-wise (binary operator *).
DataFrame.rmul(other)
DataFrame.rmul
Get multiplication of DataFrame and other, element-wise, with reversed operands (other * DataFrame).
DataFrame.sub(other)
DataFrame.sub
Get subtraction of DataFrame and other, element-wise (binary operator -).
DataFrame.rsub(other)
DataFrame.rsub
Get subtraction of DataFrame and other, element-wise, with reversed operands (other - DataFrame).
DataFrame.pow(other)
DataFrame.pow
Get exponential power of DataFrame and other, element-wise (binary operator **).
DataFrame.rpow(other)
DataFrame.rpow
Get exponential power of DataFrame and other, element-wise, with reversed operands (other ** DataFrame).
DataFrame.mod(other)
DataFrame.mod
Get modulo of DataFrame and other, element-wise (binary operator %).
DataFrame.rmod(other)
DataFrame.rmod
Get modulo of DataFrame and other, element-wise, with reversed operands (other % DataFrame).
DataFrame.floordiv(other)
DataFrame.floordiv
Get integer division of DataFrame and other, element-wise (binary operator //).
DataFrame.rfloordiv(other)
DataFrame.rfloordiv
Get integer division of DataFrame and other, element-wise, with reversed operands (other // DataFrame).
DataFrame.lt(other)
DataFrame.lt
Compare if the current value is less than the other.
DataFrame.gt(other)
DataFrame.gt
Compare if the current value is greater than the other.
DataFrame.le(other)
DataFrame.le
Compare if the current value is less than or equal to the other.
DataFrame.ge(other)
DataFrame.ge
Compare if the current value is greater than or equal to the other.
DataFrame.ne(other)
DataFrame.ne
Compare if the current value is not equal to the other.
DataFrame.eq(other)
DataFrame.eq
Compare if the current value is equal to the other.
DataFrame.dot(other)
DataFrame.dot
Compute the matrix multiplication between the DataFrame and other.
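The forward and reversed binary operators differ only in which side the DataFrame sits on. As a small sketch, again using plain pandas as a stand-in and assuming pandas-on-Spark gives the same results for scalar operands:

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical single-column frame for illustration only.
df = pd.DataFrame({"x": [1, 2, 3]})

forward = df.sub(10)    # same as df - 10
reverse = df.rsub(10)   # same as 10 - df
power = df.pow(2)       # same as df ** 2
rpower = df.rpow(2)     # same as 2 ** df

# Comparison methods return a boolean DataFrame of the same shape.
below_two = df.lt(2)

print(forward, reverse, power, rpower, below_two, sep="\n")
```

The reversed forms matter mainly for non-commutative operations such as subtraction, division and exponentiation.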
DataFrame.apply(func[, axis, args])
DataFrame.apply
Apply a function along an axis of the DataFrame.
DataFrame.applymap(func)
DataFrame.applymap
Apply a function to a DataFrame element-wise.
DataFrame.pipe(func, *args, **kwargs)
DataFrame.pipe
Apply func(self, *args, **kwargs).
DataFrame.agg(func)
DataFrame.agg
Aggregate using one or more operations over the specified axis.
DataFrame.aggregate(func)
DataFrame.aggregate
Aggregate using one or more operations over the specified axis.
DataFrame.groupby(by[, axis, as_index, dropna])
DataFrame.groupby
Group DataFrame or Series using one or more columns.
DataFrame.rolling(window[, min_periods])
DataFrame.rolling
Provide rolling transformations.
DataFrame.expanding([min_periods])
DataFrame.expanding
Provide expanding transformations.
DataFrame.transform(func[, axis])
DataFrame.transform
Call func on self, producing a Series with transformed values that has the same length as its input.
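To make the function-application entries concrete, here is a minimal sketch of apply, pipe and agg. It uses plain pandas as a stand-in and assumes the pandas-on-Spark methods accept the same arguments for this simple case. The helper add_total and the column names are hypothetical.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# apply: call a function once per column (axis=0 is the default).
col_range = df.apply(lambda s: s.max() - s.min())


def add_total(frame, name):
    """Hypothetical helper: add a column holding a + b under the given name."""
    return frame.assign(**{name: frame["a"] + frame["b"]})


# pipe: equivalent to add_total(df, "total"), but chainable.
piped = df.pipe(add_total, "total")

# agg: run several aggregations at once; one result row per aggregation.
summary = df.agg(["min", "max"])

print(col_range)
print(piped)
print(summary)
```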
DataFrame.abs()
DataFrame.abs
Return a Series/DataFrame with absolute numeric value of each element.
DataFrame.all([axis])
DataFrame.all
Return whether all elements are True.
DataFrame.any([axis])
DataFrame.any
Return whether any element is True.
DataFrame.clip([lower, upper])
DataFrame.clip
Trim values at input threshold(s).
DataFrame.corr([method])
DataFrame.corr
Compute pairwise correlation of columns, excluding NA/null values.
DataFrame.count([axis, numeric_only])
DataFrame.count
Count non-NA cells for each column.
DataFrame.describe([percentiles])
DataFrame.describe
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
DataFrame.kurt([axis, numeric_only])
DataFrame.kurt
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
DataFrame.kurtosis([axis, numeric_only])
DataFrame.kurtosis
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
DataFrame.mad([axis])
DataFrame.mad
Return the mean absolute deviation of values.
DataFrame.max([axis, numeric_only])
DataFrame.max
Return the maximum of the values.
DataFrame.mean([axis, numeric_only])
DataFrame.mean
Return the mean of the values.
DataFrame.min([axis, numeric_only])
DataFrame.min
Return the minimum of the values.
DataFrame.median([axis, numeric_only, accuracy])
DataFrame.median
Return the median of the values for the requested axis.
DataFrame.pct_change([periods])
DataFrame.pct_change
Percentage change between the current and a prior element.
DataFrame.prod([axis, numeric_only, min_count])
DataFrame.prod
Return the product of the values.
DataFrame.product([axis, numeric_only, …])
DataFrame.product
Return the product of the values.
DataFrame.quantile([q, axis, numeric_only, …])
DataFrame.quantile
Return value at the given quantile.
DataFrame.nunique([axis, dropna, approx, rsd])
DataFrame.nunique
Return number of unique elements in the object.
DataFrame.sem([axis, ddof, numeric_only])
DataFrame.sem
Return unbiased standard error of the mean over requested axis.
DataFrame.skew([axis, numeric_only])
DataFrame.skew
Return unbiased skew normalized by N-1.
DataFrame.sum([axis, numeric_only, min_count])
DataFrame.sum
Return the sum of the values.
DataFrame.std([axis, ddof, numeric_only])
DataFrame.std
Return sample standard deviation.
DataFrame.var([axis, ddof, numeric_only])
DataFrame.var
Return unbiased variance.
DataFrame.cummin([skipna])
DataFrame.cummin
Return cumulative minimum over a DataFrame or Series axis.
DataFrame.cummax([skipna])
DataFrame.cummax
Return cumulative maximum over a DataFrame or Series axis.
DataFrame.cumsum([skipna])
DataFrame.cumsum
Return cumulative sum over a DataFrame or Series axis.
DataFrame.cumprod([skipna])
DataFrame.cumprod
Return cumulative product over a DataFrame or Series axis.
DataFrame.round([decimals])
DataFrame.round
Round a DataFrame to a variable number of decimal places.
DataFrame.diff([periods, axis])
DataFrame.diff
First discrete difference of element.
DataFrame.eval(expr[, inplace])
DataFrame.eval
Evaluate a string describing operations on DataFrame columns.
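A few of the computation methods above, shown on a tiny column so the results can be checked by hand. This is a sketch using plain pandas as a stand-in, assuming pandas-on-Spark returns the same values here; the data is invented for the example.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"v": [1, 3, 6, 10]})

running_total = df.cumsum()          # cumulative sum down the column
step = df.diff()                     # difference from the previous row (first row is NaN)
bounded = df.clip(lower=2, upper=6)  # trim values to the range [2, 6]
average = df.mean()                  # one value per column

print(running_total)
print(step)
print(bounded)
print(average)
```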
DataFrame.add_prefix(prefix)
DataFrame.add_prefix
Prefix labels with string prefix.
DataFrame.add_suffix(suffix)
DataFrame.add_suffix
Suffix labels with string suffix.
DataFrame.align(other[, join, axis, copy])
DataFrame.align
Align two objects on their axes with the specified join method.
DataFrame.at_time(time[, asof, axis])
DataFrame.at_time
Select values at particular time of day (example: 9:30AM).
DataFrame.between_time(start_time, end_time)
DataFrame.between_time
Select values between particular times of the day (example: 9:00-9:30 AM).
DataFrame.drop([labels, axis, columns])
DataFrame.drop
Drop specified labels from columns.
DataFrame.droplevel(level[, axis])
DataFrame.droplevel
Return DataFrame with requested index / column level(s) removed.
DataFrame.drop_duplicates([subset, keep, …])
DataFrame.drop_duplicates
Return DataFrame with duplicate rows removed, optionally only considering certain columns.
DataFrame.duplicated([subset, keep])
DataFrame.duplicated
Return boolean Series denoting duplicate rows, optionally only considering certain columns.
DataFrame.equals(other)
DataFrame.equals
Compare if the current value is equal to the other.
DataFrame.filter([items, like, regex, axis])
DataFrame.filter
Subset rows or columns of the DataFrame according to labels in the specified index.
DataFrame.first(offset)
DataFrame.first
Select first periods of time series data based on a date offset.
DataFrame.last(offset)
DataFrame.last
Select final periods of time series data based on a date offset.
DataFrame.rename([mapper, index, columns, …])
DataFrame.rename
Alter axes labels.
DataFrame.rename_axis([mapper, index, …])
DataFrame.rename_axis
Set the name of the axis for the index or columns.
DataFrame.reset_index([level, drop, …])
DataFrame.reset_index
Reset the index, or a level of it.
DataFrame.set_index(keys[, drop, append, …])
DataFrame.set_index
Set the DataFrame index (row labels) using one or more existing columns.
DataFrame.swapaxes(i, j[, copy])
DataFrame.swapaxes
Interchange axes and swap values axes appropriately.
DataFrame.swaplevel([i, j, axis])
DataFrame.swaplevel
Swap levels i and j in a MultiIndex on a particular axis.
DataFrame.take(indices[, axis])
DataFrame.take
Return the elements in the given positional indices along an axis.
DataFrame.isin(values)
DataFrame.isin
Whether each element in the DataFrame is contained in values.
DataFrame.sample([n, frac, replace, …])
DataFrame.sample
Return a random sample of items from an axis of object.
DataFrame.truncate([before, after, axis, copy])
DataFrame.truncate
Truncate a Series or DataFrame before and after some index value.
DataFrame.backfill([axis, inplace, limit])
DataFrame.backfill
Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
DataFrame.dropna([axis, how, thresh, …])
DataFrame.dropna
Remove missing values.
DataFrame.fillna([value, method, axis, …])
DataFrame.fillna
Fill NA/NaN values.
DataFrame.replace([to_replace, value, …])
DataFrame.replace
Returns a new DataFrame replacing a value with another value.
DataFrame.bfill([axis, inplace, limit])
DataFrame.bfill
Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
DataFrame.ffill([axis, inplace, limit])
DataFrame.ffill
Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
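The missing-data methods can be compared on a single column with one gap. The sketch below uses plain pandas as a stand-in and assumes pandas-on-Spark produces the same values; the column name is made up.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical column with one missing value in the middle.
df = pd.DataFrame({"a": [1.0, None, 3.0]})

missing = df.isna()        # True where a value is missing
forward = df.ffill()       # propagate the last valid value forward
backward = df.bfill()      # propagate the next valid value backward
constant = df.fillna(0.0)  # replace missing values with a constant
dropped = df.dropna()      # drop rows containing missing values

print(missing, forward, backward, constant, dropped, sep="\n")
```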
DataFrame.pivot_table([values, index, …])
DataFrame.pivot_table
Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot([index, columns, values])
DataFrame.pivot
Return reshaped DataFrame organized by given index / column values.
DataFrame.sort_index([axis, level, …])
DataFrame.sort_index
Sort object by labels (along an axis).
DataFrame.sort_values(by[, ascending, …])
DataFrame.sort_values
Sort by the values along either axis.
DataFrame.nlargest(n, columns)
DataFrame.nlargest
Return the first n rows ordered by columns in descending order.
DataFrame.nsmallest(n, columns)
DataFrame.nsmallest
Return the first n rows ordered by columns in ascending order.
DataFrame.stack()
DataFrame.stack
Stack the prescribed level(s) from columns to index.
DataFrame.unstack()
DataFrame.unstack
Pivot the (necessarily hierarchical) index labels.
DataFrame.melt([id_vars, value_vars, …])
DataFrame.melt
Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
DataFrame.explode(column)
DataFrame.explode
Transform each element of a list-like to a row, replicating index values.
DataFrame.squeeze([axis])
DataFrame.squeeze
Squeeze 1-dimensional axis objects into scalars.
DataFrame.T
Transpose index and columns.
DataFrame.transpose()
DataFrame.transpose
Transpose index and columns.
DataFrame.reindex([labels, index, columns, …])
DataFrame.reindex
Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
DataFrame.reindex_like(other[, copy])
DataFrame.reindex_like
Return a DataFrame with indices matching those of the other object.
DataFrame.rank([method, ascending])
DataFrame.rank
Compute numerical data ranks (1 through n) along axis.
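As an illustration of the reshaping and sorting entries, the sketch below melts a wide frame to long format, pivots it back, and picks the top row by a column. It uses plain pandas as a stand-in and assumes pandas-on-Spark accepts the same keyword arguments; row order in a distributed result is not guaranteed in the same way, so treat the ordering here as a pandas detail. All names are hypothetical.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical wide-format data: one row per id, one column per measurement.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 30], "y": [20, 40]})

# melt: wide to long, one row per (id, variable) pair.
long = wide.melt(id_vars=["id"], value_vars=["x", "y"])

# pivot: long back to wide, with id as the index.
back = long.pivot(index="id", columns="variable", values="value")

# nlargest / sort_values: order rows by a column.
top = wide.nlargest(1, "y")
by_y_desc = wide.sort_values("y", ascending=False)

print(long)
print(back)
print(top)
```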
DataFrame.append(other[, ignore_index, …])
DataFrame.append
Append rows of other to the end of caller, returning a new object.
DataFrame.assign(**kwargs)
DataFrame.assign
Assign new columns to a DataFrame.
DataFrame.merge(right[, how, on, left_on, …])
DataFrame.merge
Merge DataFrame objects with a database-style join.
DataFrame.join(right[, on, how, lsuffix, …])
DataFrame.join
Join columns of another DataFrame.
DataFrame.update(other[, join, overwrite])
DataFrame.update
Modify in place using non-NA values from another DataFrame.
DataFrame.insert(loc, column, value[, …])
DataFrame.insert
Insert column into DataFrame at specified location.
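merge and join cover similar ground: merge matches on columns, join matches on the index. A minimal sketch with plain pandas as a stand-in, assuming pandas-on-Spark follows the same join semantics (the row order of a Spark-backed result may differ); the frames and column names are invented.

```python
import pandas as pd  # stand-in; pyspark.pandas is assumed to mirror these calls

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# merge: database-style join on a column.
inner = left.merge(right, on="key", how="inner")  # keys present in both
outer = left.merge(right, on="key", how="outer")  # keys present in either

# join: match on the index instead of a column.
joined = left.set_index("key").join(right.set_index("key"), how="left")

print(inner)
print(outer)
print(joined)
```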
DataFrame.shift([periods, fill_value])
DataFrame.shift
Shift DataFrame by desired number of periods.
DataFrame.first_valid_index()
DataFrame.first_valid_index
Retrieves the index of the first valid value.
DataFrame.last_valid_index()
DataFrame.last_valid_index
Return index for last non-NA/null value.
DataFrame.from_records(data[, index, …])
DataFrame.from_records
Convert structured or record ndarray to DataFrame.
DataFrame.info([verbose, buf, max_cols, …])
DataFrame.info
Print a concise summary of a DataFrame.
DataFrame.to_table(name[, format, mode, …])
DataFrame.to_table
Write the DataFrame into a Spark table.
DataFrame.to_delta(path[, mode, …])
DataFrame.to_delta
Write the DataFrame out as a Delta Lake table.
DataFrame.to_parquet(path[, mode, …])
DataFrame.to_parquet
Write the DataFrame out as a Parquet file or directory.
DataFrame.to_spark_io([path, format, mode, …])
DataFrame.to_spark_io
Write the DataFrame out to a Spark data source.
DataFrame.to_csv([path, sep, na_rep, …])
DataFrame.to_csv
Write object to a comma-separated values (csv) file.
DataFrame.to_pandas()
DataFrame.to_pandas
Return a pandas DataFrame.
DataFrame.to_html([buf, columns, col_space, …])
DataFrame.to_html
Render a DataFrame as an HTML table.
DataFrame.to_numpy()
DataFrame.to_numpy
A NumPy ndarray representing the values in this DataFrame or Series.
DataFrame.to_spark([index_col])
DataFrame.to_spark
Return the current DataFrame as a Spark DataFrame.
DataFrame.to_string([buf, columns, …])
DataFrame.to_string
Render a DataFrame to a console-friendly tabular output.
DataFrame.to_json([path, compression, …])
DataFrame.to_json
Convert the object to a JSON string.
DataFrame.to_dict([orient, into])
DataFrame.to_dict
Convert the DataFrame to a dictionary.
DataFrame.to_excel(excel_writer[, …])
DataFrame.to_excel
Write object to an Excel sheet.
DataFrame.to_clipboard([excel, sep])
DataFrame.to_clipboard
Copy object to the system clipboard.
DataFrame.to_markdown([buf, mode])
DataFrame.to_markdown
Print Series or DataFrame in Markdown-friendly format.
DataFrame.to_records([index, column_dtypes, …])
DataFrame.to_records
Convert DataFrame to a NumPy record array.
DataFrame.to_latex([buf, columns, …])
DataFrame.to_latex
Render an object to a LaTeX tabular environment table.
DataFrame.style
Property returning a Styler object containing methods for building a styled HTML representation for the DataFrame.
DataFrame.spark provides features that exist in Spark but not in pandas. These can be accessed by DataFrame.spark.<function/property>.
DataFrame.spark.frame([index_col])
DataFrame.spark.frame
Return the current DataFrame as a Spark DataFrame.
DataFrame.spark.cache()
DataFrame.spark.cache
Yields and caches the current DataFrame.
DataFrame.spark.persist([storage_level])
DataFrame.spark.persist
Yields and caches the current DataFrame with a specific StorageLevel.
DataFrame.spark.hint(name, *parameters)
DataFrame.spark.hint
Specifies some hint on the current DataFrame.
DataFrame.spark.to_table(name[, format, …])
DataFrame.spark.to_table
Write the DataFrame into a Spark table.
DataFrame.spark.to_spark_io([path, format, …])
DataFrame.spark.to_spark_io
Write the DataFrame out to a Spark data source.
DataFrame.spark.apply(func[, index_col])
DataFrame.spark.apply
Applies a function that takes and returns a Spark DataFrame.
DataFrame.spark.repartition(num_partitions)
DataFrame.spark.repartition
Returns a new DataFrame partitioned by the given partitioning expressions.
DataFrame.spark.coalesce(num_partitions)
DataFrame.spark.coalesce
Returns a new DataFrame that has exactly num_partitions partitions.
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
alias of pyspark.pandas.plot.core.PandasOnSparkPlotAccessor
DataFrame.plot.area([x, y])
DataFrame.plot.area
Draw a stacked area plot.
DataFrame.plot.barh([x, y])
DataFrame.plot.barh
Make a horizontal bar plot.
DataFrame.plot.bar([x, y])
DataFrame.plot.bar
Vertical bar plot.
DataFrame.plot.hist([bins])
DataFrame.plot.hist
Draw one histogram of the DataFrame’s columns.
DataFrame.plot.line([x, y])
DataFrame.plot.line
Plot DataFrame/Series as lines.
DataFrame.plot.pie(**kwds)
DataFrame.plot.pie
Generate a pie plot.
DataFrame.plot.scatter(x, y, **kwds)
DataFrame.plot.scatter
Create a scatter plot with varying marker point size and color.
DataFrame.plot.density([bw_method, ind])
DataFrame.plot.density
Generate Kernel Density Estimate plot using Gaussian kernels.
DataFrame.hist([bins])
DataFrame.hist
Draw one histogram of the DataFrame’s columns.
DataFrame.kde([bw_method, ind])
DataFrame.kde
Generate Kernel Density Estimate plot using Gaussian kernels.
DataFrame.pandas_on_spark provides features that exist only in pandas API on Spark. These can be accessed by DataFrame.pandas_on_spark.<function/property>.
DataFrame.pandas_on_spark.apply_batch(func)
DataFrame.pandas_on_spark.apply_batch
Apply a function that takes a pandas DataFrame and returns a pandas DataFrame.
DataFrame.pandas_on_spark.transform_batch(…)
DataFrame.pandas_on_spark.transform_batch
Transform chunks with a function that takes a pandas DataFrame and returns a pandas DataFrame.
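The batch functions are handed a function that works on ordinary pandas DataFrames. The sketch below only shows what such a function might look like and exercises it on a plain pandas chunk, so it runs without Spark. The function plus_one and the name psdf are hypothetical, and the commented-out call is an assumption about typical usage rather than a tested invocation.

```python
import pandas as pd


def plus_one(pdf):
    """Hypothetical batch function: receives a pandas DataFrame chunk
    and returns a pandas DataFrame of the same shape."""
    return pdf + 1


# Exercise the function on a plain pandas chunk, standing in for one batch.
chunk = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = plus_one(chunk)
print(result)

# With a pandas-on-Spark DataFrame `psdf` (hypothetical name), the same
# function would be passed to the accessor, for example:
#   psdf.pandas_on_spark.apply_batch(plus_one)
#   psdf.pandas_on_spark.transform_batch(plus_one)
```

Because the function is called once per chunk rather than once for the whole frame, it should not rely on seeing all rows at once; operations such as a global mean inside the function would be computed per chunk.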