FPGrowthModel

class pyspark.mllib.fpm.FPGrowthModel(java_model: py4j.java_gateway.JavaObject)

An FP-Growth model for mining frequent itemsets using the Parallel FP-Growth algorithm.

New in version 1.4.0.

Examples

>>> data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
>>> rdd = sc.parallelize(data, 2)
>>> model = FPGrowth.train(rdd, 0.6, 2)
>>> sorted(model.freqItemsets().collect())
[FreqItemset(items=['a'], freq=4), FreqItemset(items=['c'], freq=3), ...
>>> model_path = temp_path + "/fpm"
>>> model.save(sc, model_path)
>>> sameModel = FPGrowthModel.load(sc, model_path)
>>> sorted(model.freqItemsets().collect()) == sorted(sameModel.freqItemsets().collect())
True
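
To make the first result above concrete, the following is a minimal plain-Python sketch that recomputes the frequent itemsets for the same four transactions by brute force. It is not the Parallel FP-Growth algorithm and does not use Spark. It assumes that the second argument to `FPGrowth.train` (0.6) is the minimum support as a fraction of all transactions, and that the threshold count is that fraction times the number of transactions, rounded up. The function name `brute_force_frequent_itemsets` is hypothetical.

```python
from itertools import combinations
from math import ceil


def brute_force_frequent_itemsets(transactions, min_support):
    """Return {itemset as sorted tuple: count} for itemsets meeting min_support.

    Illustration only: enumerates candidates by size instead of building an FP-tree.
    """
    # Assumption: threshold count = ceil(min_support * number of transactions).
    min_count = ceil(min_support * len(transactions))
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for basket in baskets if basket.issuperset(candidate))
            if count >= min_count:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no itemset
            # of this size qualifies, no larger one can.
            break
    return frequent


data = [["a", "b", "c"], ["a", "b", "d", "e"], ["a", "c", "e"], ["a", "c", "f"]]
result = brute_force_frequent_itemsets(data, 0.6)
```

With four transactions and a minimum support of 0.6, an itemset must appear in at least three transactions to be kept, which is why `a` (4 occurrences) and `c` (3 occurrences) qualify while `b` (2 occurrences) does not.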

Methods

`call`(name, *a)	Call method of java_model
`freqItemsets`()	Returns the frequent itemsets of this model.
`load`(sc, path)	Load a model from the given path.
`save`(sc, path)	Save this model to the given path.

Methods Documentation

call(name: str, *a: Any) → Any: Call method of java_model

freqItemsets() → pyspark.rdd.RDD[pyspark.mllib.fpm.FPGrowth.FreqItemset]: Returns the frequent itemsets of this model.

New in version 1.4.0.
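
Once the RDD has been collected, each element exposes `items` and `freq` fields, as the `FreqItemset(items=..., freq=...)` repr in the example suggests. The sketch below shows one way such results might be post-processed on the driver. It uses a stand-in `namedtuple` instead of a running Spark job, the sample values are illustrative (the two-item entry is hypothetical, not real output), and the helper `support_table` is a made-up name.

```python
from collections import namedtuple

# Stand-in for the objects returned by model.freqItemsets().collect();
# assumed to have the same two fields, `items` and `freq`.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

# Illustrative values only; the last entry is hypothetical.
collected = [
    FreqItemset(items=["a"], freq=4),
    FreqItemset(items=["c"], freq=3),
    FreqItemset(items=["c", "a"], freq=3),
]


def support_table(itemsets, n_transactions):
    """Map each itemset (as a frozenset) to its relative support."""
    return {frozenset(fi.items): fi.freq / n_transactions for fi in itemsets}


# Relative support of each itemset, given four input transactions.
table = support_table(collected, 4)

# Keep only the itemsets that contain exactly two items.
pairs = [fi for fi in collected if len(fi.items) == 2]
```

Keying the table by `frozenset` makes lookups independent of the order in which items appear inside an itemset.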

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.fpm.FPGrowthModel: Load a model from the given path.

New in version 2.0.0.

save(sc: pyspark.context.SparkContext, path: str) → None: Save this model to the given path.

New in version 1.3.0.
