pyspark.RDD.persist

RDD.persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

Examples

>>> rdd = sc.parallelize(["b", "a", "c"])
>>> rdd.persist().is_cached
True
