pyspark.pandas.read_csv
Read CSV (comma-separated) file into DataFrame or Series.
Parameters
path
Path(s) of the CSV file(s) to be read.
sep
Delimiter to use. Must be a non-empty string.
header
Whether to use the first line of the file as the column names, which also determines where the data starts. By default the column names are inferred: if no names are passed, the behavior is identical to header=0 and the column names are taken from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names.
names
List of column names to use. If the file contains no header row, you should explicitly pass header=None. Duplicates in this list will cause an error to be raised. If a string is given, it should be a DDL-formatted string in Spark SQL; this form is preferred because it avoids schema inference and therefore performs better.
index_col
Index column of table in Spark.
usecols
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If callable, the function is evaluated against the column names, and the names for which it returns True are kept.
squeeze
If the parsed data only contains one column then return a Series.
Deprecated since version 3.4.0.
mangle_dupe_cols
Duplicate columns will be specified as ‘X0’, ‘X1’, … ‘XN’, rather than ‘X’ … ‘X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns. Currently only True is allowed.
dtype
Data type for data or columns, e.g. {'a': np.float64, 'b': np.int32}. Use str or object together with suitable na_values settings to preserve and not interpret dtype.
nrows
Number of rows to read from the CSV file.
parse_dates
Currently only False is allowed.
quotechar
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
escapechar
One-character string used to escape other characters.
comment
Single character that marks a comment line. Lines beginning with this character are not parsed.
encoding
Encoding to use when reading the file.
options
All other options passed directly into Spark’s data source.
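The interplay between header and names described above can be summarized in a few lines of plain Python. This is only an illustrative sketch of the documented rule, not the library's implementation: `resolve_header` is a hypothetical helper name, and the `"infer"` sentinel used for the default is an assumption.

```python
def resolve_header(header="infer", names=None):
    """Return the effective header setting.

    0 means the first line holds the column names; None means there is no header row.
    """
    if header == "infer":
        # No explicit names: behave like header=0 and take names from the first line.
        # Explicit names: behave like header=None and treat every line as data.
        return 0 if names is None else None
    # An explicit value always wins over the inferred one.
    return header


print(resolve_header())                            # 0
print(resolve_header(names=["a", "b"]))            # None
print(resolve_header(header=0, names=["a", "b"]))  # 0
```

The last call corresponds to the case where the file has a header row whose names should be replaced by the ones passed in names.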
See also
DataFrame.to_csv
Write DataFrame to a comma-separated values (csv) file.
Examples
>>> import pyspark.pandas as ps
>>> ps.read_csv('data.csv')
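A slightly fuller sketch of column selection, assuming a small file with an `id,name,age` header row. The file name, its contents, and the helper names `write_sample_csv` and `read_people` are made up for illustration; the keyword arguments are the ones documented above.

```python
import os
import tempfile


def write_sample_csv(directory):
    """Write a small CSV file with a header row and return its path."""
    path = os.path.join(directory, "people.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,age\n1,Ann,34\n2,Bob,27\n3,Cid,41\n")
    return path


def read_people(path):
    """Read two columns of the sample file, indexed by id (needs a working Spark setup)."""
    # Imported lazily so that write_sample_csv can be used without Spark installed.
    import pyspark.pandas as ps

    return ps.read_csv(
        path,
        header=0,                # first line holds the column names
        usecols=["id", "name"],  # keep only these columns
        index_col="id",          # use the id column as the index
        nrows=2,                 # read at most two rows
    )
```

With Spark available, `read_people(write_sample_csv(tempfile.mkdtemp()))` should return a DataFrame indexed by `id` with a single `name` column.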
Load multiple CSV files as a single DataFrame:
>>> ps.read_csv(['data-01.csv', 'data-02.csv'])
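When the files have no header row, names can be given as a DDL-formatted string together with header=None, which also avoids schema inference. The following is a minimal sketch under assumptions: the file names, the column layout, and the helper names `write_headerless_parts` and `read_parts` are hypothetical.

```python
import os
import tempfile


def write_headerless_parts(directory):
    """Write two CSV files without header rows and return their paths."""
    rows_per_file = {
        "part-01.csv": "1,Ann,34\n2,Bob,27\n",
        "part-02.csv": "3,Cid,41\n",
    }
    paths = []
    for filename, content in rows_per_file.items():
        path = os.path.join(directory, filename)
        with open(path, "w", newline="") as f:
            f.write(content)
        paths.append(path)
    return paths


def read_parts(paths):
    """Read all parts as one DataFrame with an explicit schema (needs a working Spark setup)."""
    # Imported lazily so that write_headerless_parts can be used without Spark installed.
    import pyspark.pandas as ps

    return ps.read_csv(
        paths,
        header=None,                            # the files contain data only
        names="id INT, name STRING, age INT",   # DDL string: column names and types
    )
```

With Spark available, `read_parts(write_headerless_parts(tempfile.mkdtemp()))` should return the three rows from both files in one DataFrame with columns `id`, `name`, and `age`.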