The entry point for programming Spark with the Dataset and DataFrame API. To create a SparkSession, use the SparkSession.builder attribute.
SparkSession.builder
A builder attribute used to construct SparkSession instances.
SparkSession.builder.appName(name)
Sets a name for the application, which will be shown in the Spark web UI.
SparkSession.builder.config([key, value, conf])
Sets a config option.
SparkSession.builder.enableHiveSupport()
Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.
SparkSession.builder.getOrCreate()
Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.
SparkSession.builder.master(master)
Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
SparkSession.catalog
Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
SparkSession.conf
Runtime configuration interface for Spark.
SparkSession.createDataFrame(data[, schema, …])
Creates a DataFrame from an RDD, a list or a pandas.DataFrame.
SparkSession.getActiveSession()
Returns the active SparkSession for the current thread, as returned by the builder.
SparkSession.newSession()
Returns a new SparkSession that has separate SQLConf, registered temporary views, and UDFs, but shares the SparkContext and table cache.
SparkSession.range(start[, end, step, …])
Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
SparkSession.read
Returns a DataFrameReader that can be used to read data in as a DataFrame.
SparkSession.readStream
Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
SparkSession.sparkContext
Returns the underlying SparkContext.
SparkSession.sql(sqlQuery, **kwargs)
Returns a DataFrame representing the result of the given query.
SparkSession.stop()
Stops the underlying SparkContext.
SparkSession.streams
Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
SparkSession.table(tableName)
Returns the specified table as a DataFrame.
SparkSession.udf
Returns a UDFRegistration for UDF registration.
SparkSession.version
The version of Spark on which this application is running.