pyspark.mllib.regression.IsotonicRegressionModel
Regression model for isotonic regression.
New in version 1.4.0.
Parameters
boundaries: Array of boundaries for which predictions are known. Boundaries must be sorted in increasing order.
predictions: Array of predictions associated with the boundaries at the same index. These are the results of isotonic regression and are therefore monotone.
isotonic: Indicates whether this model is isotonic (increasing) or antitonic (decreasing).
Examples
>>> data = [(1, 0, 1), (2, 1, 1), (3, 2, 1), (1, 3, 1), (6, 4, 1), (17, 5, 1), (16, 6, 1)]
>>> irm = IsotonicRegression.train(sc.parallelize(data))
>>> irm.predict(3)
2.0
>>> irm.predict(5)
16.5
>>> irm.predict(sc.parallelize([3, 5])).collect()
[2.0, 16.5]
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> irm.save(sc, path)
>>> sameModel = IsotonicRegressionModel.load(sc, path)
>>> sameModel.predict(3)
2.0
>>> sameModel.predict(5)
16.5
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
Methods
load(sc, path)
Load an IsotonicRegressionModel.
predict(x)
Predict labels for provided features.
save(sc, path)
Save an IsotonicRegressionModel.
Methods Documentation
predict(x)
Predict labels for the provided features using a piecewise linear function.
1) If x exactly matches a boundary, the associated prediction is returned. If multiple predictions share the same boundary, one of them is returned; which one is undefined (same as java.util.Arrays.binarySearch).
2) If x is lower or higher than all boundaries, the first or last prediction is returned, respectively. If multiple predictions share the same boundary, the lowest or highest is returned, respectively.
3) If x falls between two values in the boundary array, the prediction is treated as a piecewise linear function and the interpolated value is returned. If multiple values share the same boundary, the same rules as in 2) are used.
Parameters
x: pyspark.mllib.linalg.Vector or pyspark.RDD
Feature or RDD of Features to be labeled.
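The three prediction rules above can be illustrated with a small pure-Python sketch. This is not Spark's implementation: the helper name predict_piecewise and the sample arrays are hypothetical, and the sketch assumes the boundaries are strictly increasing, so the duplicate-boundary cases are not modeled.

```python
from bisect import bisect_left


def predict_piecewise(boundaries, predictions, x):
    """Illustrative sketch of the piecewise linear prediction rules.

    Assumes `boundaries` is strictly increasing and has the same length
    as `predictions`. Hypothetical helper, not part of PySpark.
    """
    # Rule 2: outside the boundary range, return the first or last prediction.
    if x <= boundaries[0]:
        return float(predictions[0])
    if x >= boundaries[-1]:
        return float(predictions[-1])

    # Index of the first boundary that is >= x.
    i = bisect_left(boundaries, x)

    # Rule 1: an exact match returns the associated prediction.
    if boundaries[i] == x:
        return float(predictions[i])

    # Rule 3: interpolate linearly between the two neighbouring boundaries.
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = predictions[i - 1], predictions[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up arrays for illustration only (not taken from a trained model).
boundaries = [0.0, 1.0, 3.0]
predictions = [1.0, 2.0, 4.0]

exact = predict_piecewise(boundaries, predictions, 1.0)      # rule 1
below = predict_piecewise(boundaries, predictions, -5.0)     # rule 2, low side
above = predict_piecewise(boundaries, predictions, 10.0)     # rule 2, high side
between = predict_piecewise(boundaries, predictions, 2.0)    # rule 3
```

With these sample arrays, a query at 2.0 lies halfway between the boundaries 1.0 and 3.0, so the sketch returns the midpoint of their predictions. The same function works for an antitonic model, since only the direction of the predictions changes.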