pyspark.mllib.feature.IDFModel
Represents an IDF model that can transform term frequency vectors.
New in version 1.2.0.
Methods
call(name, *a)
Calls the method called name on the wrapped java_model, passing the arguments a.
docFreq()
Returns the document frequency.
idf()
Returns the current IDF vector.
numDocs()
Returns the number of documents evaluated to compute the IDF.
transform(x)
Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
docFreq()
Returns the document frequency.
New in version 3.0.0.
idf()
Returns the current IDF vector.
New in version 1.4.0.
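The statistics behind docFreq(), numDocs(), and idf() can be sketched in plain Python. This sketch assumes the formula from the Spark MLlib feature-extraction guide, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) is the number of documents with a positive count for term t, and that terms with df(t) < minDocFreq get an IDF of 0. The helper fit_idf and the use of Python lists in place of an RDD of Vectors are hypothetical illustrations, not PySpark API.

```python
import math
from typing import List, Sequence, Tuple


def fit_idf(
    tf_vectors: Sequence[Sequence[float]], min_doc_freq: int = 0
) -> Tuple[List[float], List[int], int]:
    """Hypothetical helper: return (idf, doc_freq, num_docs) for dense TF vectors.

    Mirrors the quantities exposed by IDFModel.idf(), docFreq(), and numDocs(),
    but uses plain lists instead of an RDD of pyspark.mllib.linalg.Vector.
    """
    num_docs = len(tf_vectors)
    size = len(tf_vectors[0]) if tf_vectors else 0
    doc_freq = [0] * size
    for vec in tf_vectors:
        for j, count in enumerate(vec):
            if count > 0:  # a document "contains" term j if its TF is positive
                doc_freq[j] += 1
    # Smoothed IDF; terms below min_doc_freq are zeroed out.
    idf = [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]
    return idf, doc_freq, num_docs


# Three documents over a vocabulary of three terms.
corpus = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0], [3.0, 1.0, 1.0]]
idf, doc_freq, num_docs = fit_idf(corpus, min_doc_freq=2)
print(doc_freq, num_docs)  # [2, 1, 3] 3
```

Term 1 appears in only one document, below the assumed minDocFreq of 2, so its weight is 0; term 2 appears in every document, so the smoothed formula also gives it log(4/4) = 0.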
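transform(x)
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents have an entry of 0 in the output.
Parameters
x : pyspark.RDD or pyspark.mllib.linalg.Vector
An RDD of term frequency vectors, or a single term frequency vector.
Returns
pyspark.RDD or pyspark.mllib.linalg.Vector
An RDD of TF-IDF vectors, or a single TF-IDF vector.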
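transform scales each entry of a TF vector by the IDF weight of the same term, and for an RDD input does this for every vector. A minimal plain-Python sketch of that element-wise product, assuming dense vectors stored as Python lists; transform_one and transform_many are hypothetical names, and the list passed to transform_many only stands in for an RDD.

```python
from typing import List, Sequence


def transform_one(tf: Sequence[float], idf: Sequence[float]) -> List[float]:
    """Scale each term frequency by that term's IDF weight (element-wise product)."""
    if len(tf) != len(idf):
        raise ValueError("TF vector and IDF vector must have the same size")
    return [count * weight for count, weight in zip(tf, idf)]


def transform_many(
    tfs: Sequence[Sequence[float]], idf: Sequence[float]
) -> List[List[float]]:
    """Apply transform_one to every vector; a list stands in for an RDD here."""
    return [transform_one(tf, idf) for tf in tfs]


# Term 1 has weight 0, as if it fell below a (hypothetical) minDocFreq.
idf = [0.5, 0.0, 2.0]
print(transform_one([2.0, 7.0, 1.0], idf))  # [1.0, 0.0, 2.0]
```

Because the term with IDF 0 is multiplied out, a frequent count for a rare-but-filtered term still yields 0 in the TF-IDF vector, which matches the minDocFreq behavior described above.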
Notes
In Python, transform cannot currently be used within an RDD transformation or action (for example, inside a function passed to RDD.map). Call transform directly on the RDD instead.