pyspark.sql.functions.split(str, pattern, limit=-1)
Splits str around matches of the given pattern.
New in version 1.5.0.
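Parameters
str : Column or str
    a string expression to split
pattern : str
    a string representing a regular expression. The regex string should be a Java regular expression.
limit : int, optional
    an integer which controls the number of times pattern is applied.
    limit > 0: the resulting array's length will not be more than limit, and the resulting array's last entry will contain all input beyond the last matched pattern.
    limit <= 0: pattern will be applied as many times as possible, and the resulting array can be of any size.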
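To make the two limit regimes concrete, here is a minimal pure-Python sketch that approximates them with Python's re module rather than Spark. The helper name split_like_spark is hypothetical. It assumes the pattern has no capturing groups and uses regex syntax that Java and Python interpret the same way (zero-width patterns, for example, can split differently), and it treats every limit <= 0 alike, keeping trailing empty strings as in the -1 doctest example on this page.

```python
import re
from typing import List


def split_like_spark(s: str, pattern: str, limit: int = -1) -> List[str]:
    """Hypothetical pure-Python helper mimicking the documented `limit` rules.

    Assumptions: `pattern` has no capturing groups (re.split would return the
    captured text as extra entries) and uses syntax that Python and Java
    regexes interpret the same way. This is not Spark's implementation.
    """
    if limit == 1:
        # A limit of 1 allows a single entry, so the pattern is never applied.
        return [s]
    if limit > 1:
        # At most `limit` entries: split at most `limit - 1` times, so the
        # last entry holds all input beyond the last applied match.
        return re.split(pattern, s, maxsplit=limit - 1)
    # limit <= 0: apply the pattern as many times as possible and keep
    # trailing empty strings (in re.split, maxsplit=0 means "no limit").
    return re.split(pattern, s, maxsplit=0)


print(split_like_spark('oneAtwoBthreeC', '[ABC]', 2))   # ['one', 'twoBthreeC']
print(split_like_spark('oneAtwoBthreeC', '[ABC]', -1))  # ['one', 'two', 'three', '']
print(split_like_spark('oneAtwoBthreeC', '[ABC]', 1))   # ['oneAtwoBthreeC']
```

Note the limit == 1 branch: passing maxsplit=0 to re.split would mean "split everywhere", the opposite of what a limit of 1 asks for.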
Changed in version 3.0: split now takes an optional limit argument. If not provided, the default limit value is -1.
Examples
>>> from pyspark.sql.functions import split
>>> df = spark.createDataFrame([('oneAtwoBthreeC',)], ['s',])
>>> df.select(split(df.s, '[ABC]', 2).alias('s')).collect()
[Row(s=['one', 'twoBthreeC'])]
>>> df.select(split(df.s, '[ABC]', -1).alias('s')).collect()
[Row(s=['one', 'two', 'three', ''])]