pyspark.sql.functions.split

pyspark.sql.functions.split(str: ColumnOrName, pattern: str, limit: int = -1) → pyspark.sql.column.Column

Splits str around matches of the given pattern.

New in version 1.5.0.

Parameters
str : Column or str

a string expression to split.

pattern : str

a string representing a regular expression. The regex string should be a Java regular expression; see the escaping note after the examples below.

limit : int, optional

an integer which controls the number of times pattern is applied.

  • limit > 0: The resulting array’s length will not be more than limit, and the resulting array’s last entry will contain all input beyond the last matched pattern.
  • limit <= 0: pattern will be applied as many times as possible, and the resulting array can be of any size.

Changed in version 3.0: split now takes an optional limit argument. If not provided, the default limit value is -1.

Examples

>>> from pyspark.sql.functions import split
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s'])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]
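
Because pattern is a Java regular expression, metacharacters such as . must be escaped to split on them literally. A minimal sketch, assuming the same spark session as above (df2 is introduced here purely for illustration):

>>> df2 = spark.createDataFrame([('a.b.c',)], ['s'])
>>> df2.select(split(df2.s, r'\.').alias('s')).collect()
[Row(s=['a', 'b', 'c'])]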
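
Since str accepts a ColumnOrName, a column name may be passed directly in place of a Column object; the following is equivalent to the first example above, assuming the same df:

>>> df.select(split('s', '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]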