pyspark.mllib.feature.Normalizer
Normalizes samples individually to unit L^p norm.
For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.
For p = float('inf'), max(abs(vector)) is used as the norm for normalization.
New in version 1.2.0.
Parameter p: Normalization in L^p space, p = 2 by default.
Examples
>>> from pyspark.mllib.feature import Normalizer
>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])

>>> rdd = sc.parallelize([v])  # sc is an existing SparkContext
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]

>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])
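The norms behind the examples above can be checked by hand without a Spark session. The following is a minimal pure-Python sketch of the norm definition given earlier; `lp_norm` is a hypothetical helper written for illustration and is not part of the PySpark API.

```python
import math


def lp_norm(values, p=2.0):
    """Return the L^p norm of a sequence of numbers (hypothetical helper)."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= inf")
    magnitudes = [abs(x) for x in values]
    if math.isinf(p):
        # p = inf: the norm is the largest absolute component.
        return max(magnitudes, default=0.0)
    # Finite p: sum(abs(x)^p)^(1/p).
    return sum(m ** p for m in magnitudes) ** (1.0 / p)


v = [0.0, 1.0, 2.0]               # same values as Vectors.dense(range(3))
l1 = lp_norm(v, 1)                # 0 + 1 + 2
linf = lp_norm(v, float("inf"))   # max(0, 1, 2)

# Dividing each component by the norm gives the normalized vector.
l1_scaled = [x / l1 for x in v]
linf_scaled = [x / linf for x in v]
print(l1_scaled)
print(linf_scaled)
```

Up to rounding in the printed representation, the two lists should correspond to the DenseVector results shown for `Normalizer(1)` and `Normalizer(float("inf"))`.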
Methods
transform(vector)
Applies unit length normalization on a vector.
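Methods Documentation
transform(vector)
Parameters: vector (pyspark.mllib.linalg.Vector or pyspark.RDD): vector or RDD of vectors to be normalized.
Returns: pyspark.mllib.linalg.Vector or pyspark.RDD: normalized vector(s). If the norm of the input is zero, the input vector is returned unchanged.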
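The zero-norm rule can be illustrated without Spark. This is a rough sketch under stated assumptions: `normalize` is a hypothetical stand-in for `transform`, plain Python lists stand in for vectors, and a list of lists stands in for an RDD. It is not how PySpark implements the method internally.

```python
import math


def normalize(values, p=2.0):
    """Scale values to unit L^p norm (hypothetical stand-in for transform)."""
    magnitudes = [abs(x) for x in values]
    if math.isinf(p):
        norm = max(magnitudes, default=0.0)
    else:
        norm = sum(m ** p for m in magnitudes) ** (1.0 / p)
    if norm == 0.0:
        # A zero norm leaves nothing to divide by, so the input
        # is handed back unchanged, as the documentation describes.
        return list(values)
    return [x / norm for x in values]


# A list of vectors stands in for an RDD; each one is normalized on its own.
batch = [[3.0, 4.0], [0.0, 0.0]]
result = [normalize(row) for row in batch]
print(result)
```

With the default p = 2, the first vector has norm 5 and is scaled accordingly, while the all-zero vector passes through as is rather than triggering a division by zero.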