Normalizer

class pyspark.mllib.feature.Normalizer(p: float = 2.0)

Normalizes samples individually to unit L^p norm.

For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.

For p = float('inf'), max(abs(vector)) is used as the norm for normalization.
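To make the two cases concrete, here is a minimal pure-Python sketch of the norm computation described above. It is not PySpark's implementation (the real class delegates to the JVM). The function names `lp_norm` and `normalize` are hypothetical, a plain list of floats stands in for a vector, and the `p >= 1` check is this sketch's own choice, based on the 1 <= p range stated above.

```python
import math
from typing import List, Sequence


def lp_norm(values: Sequence[float], p: float) -> float:
    """Lp norm: sum(abs(x)^p)^(1/p) for finite p, max(abs(x)) for p = inf."""
    if p < 1:
        # Sketch-level validation of the documented range 1 <= p.
        raise ValueError("p must be >= 1")
    if math.isinf(p):
        # Infinity norm: the largest absolute component (0.0 for an empty vector).
        return max((abs(x) for x in values), default=0.0)
    return sum(abs(x) ** p for x in values) ** (1.0 / p)


def normalize(values: Sequence[float], p: float = 2.0) -> List[float]:
    """Scale values to unit Lp norm; a zero-norm input is returned unchanged."""
    norm = lp_norm(values, p)
    if norm == 0.0:
        return list(values)
    return [x / norm for x in values]


print(normalize([0.0, 1.0, 2.0], 1.0))           # L1 norm of the input is 3
print(normalize([0.0, 1.0, 2.0], float("inf")))  # L-inf norm of the input is 2
```

The two calls use the same input as the examples below, so their results should match the documented outputs once rounded to four decimals.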

New in version 1.2.0.

Parameters
p : float, optional

Normalization in L^p space, p = 2 by default.

Examples

>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])
>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]
>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])
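The examples above exercise only p = 1 and p = float("inf"). As a hedged sanity check for the default p = 2, the expected result for the same vector can be worked out by hand in plain Python. This snippet does not call PySpark; `Normalizer().transform(v)` is expected to agree with it up to display rounding.

```python
import math

# Same values as Vectors.dense(range(3)) in the examples above.
v = [0.0, 1.0, 2.0]

# Default p = 2: the L2 norm is sqrt(0^2 + 1^2 + 2^2) = sqrt(5), about 2.2361.
norm = math.sqrt(sum(x * x for x in v))

# Divide each component by the norm and round for display.
unit = [round(x / norm, 4) for x in v]
print(unit)  # [0.0, 0.4472, 0.8944]
```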

Methods

transform(vector)

Applies unit length normalization on a vector.

Methods Documentation

transform(vector: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]

Applies unit length normalization on a vector.

New in version 1.2.0.

Parameters
vector : pyspark.mllib.linalg.Vector or pyspark.RDD

Vector or RDD of vectors to be normalized.

Returns
pyspark.mllib.linalg.Vector or pyspark.RDD

Normalized vector(s). If the norm of an input vector is zero, that vector is returned unchanged.
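Because `transform` accepts either a single vector or an RDD, and passes zero-norm inputs through unchanged, a minimal pure-Python sketch of that contract may help. It assumes a plain list of vectors as a hypothetical stand-in for an RDD (a real RDD is distributed and evaluated lazily), and the helper names `normalize_one` and `normalize_many` are illustrative only, not part of PySpark.

```python
import math
from typing import Iterable, List, Sequence


def normalize_one(vector: Sequence[float], p: float = 2.0) -> List[float]:
    """Normalize one vector; if its Lp norm is zero, return it unchanged."""
    if math.isinf(p):
        norm = max((abs(x) for x in vector), default=0.0)
    else:
        norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0.0:
        # Mirrors the documented behavior: a zero-norm input comes back as-is.
        return list(vector)
    return [x / norm for x in vector]


def normalize_many(vectors: Iterable[Sequence[float]], p: float = 2.0) -> List[List[float]]:
    """Element-wise normalization, a local stand-in for transform(rdd)."""
    return [normalize_one(v, p) for v in vectors]


batch = [[3.0, 4.0], [0.0, 0.0]]
print(normalize_many(batch))  # [[0.6, 0.8], [0.0, 0.0]]
```

Passing a zero vector through unchanged, rather than dividing by zero, keeps a batch free of NaN values when some samples are all zeros.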