pyspark.mllib.classification.StreamingLogisticRegressionWithSGD
Train or predict a logistic regression model on streaming data. Training uses Stochastic Gradient Descent to update the model based on each new batch of incoming data from a DStream.
Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided.
New in version 1.5.0.
Parameters
stepSize: Step size for each iteration of gradient descent. (default: 0.1)
numIterations: Number of iterations run for each batch of data. (default: 50)
miniBatchFraction: Fraction of each batch of data to use for updates. (default: 1.0)
regParam: L2 regularization parameter. (default: 0.0)
convergenceTol: Value used to determine when to terminate iterations. (default: 0.001)
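To make the interaction of the five settings above more concrete, the following is a minimal pure-Python sketch of a per-batch logistic regression update. It is an illustration only, not Spark's implementation: the function names `sigmoid` and `update_on_batch` are hypothetical, the keyword names mirror the PySpark constructor arguments, and the decaying step size and the stopping rule are assumptions of this sketch.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_on_batch(weights, batch, stepSize=0.1, numIterations=50,
                    miniBatchFraction=1.0, regParam=0.0,
                    convergenceTol=0.001, seed=0):
    """Run up to numIterations gradient steps on one batch.

    batch is a list of (label, features) pairs; every features list
    must have the same length as weights.
    """
    rng = random.Random(seed)
    w = list(weights)
    for i in range(1, numIterations + 1):
        # Sample roughly miniBatchFraction of the batch for this step.
        sample = [p for p in batch if rng.random() < miniBatchFraction]
        if not sample:
            continue
        grad = [0.0] * len(w)
        for label, x in sample:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Decaying step size (an assumption of this sketch).
        step = stepSize / math.sqrt(i)
        # Average gradient plus the L2 penalty term regParam * w.
        new_w = [wj - step * (g / len(sample) + regParam * wj)
                 for wj, g in zip(w, grad)]
        # Stop early once the weights barely move (assumed rule).
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_w, w)))
        norm = math.sqrt(sum(a * a for a in new_w))
        w = new_w
        if diff < convergenceTol * max(norm, 1.0):
            break
    return w


# The model carries its weights from one batch to the next.
weights = [0.0]  # initial weight vector, one entry per feature
batches = [
    [(1.0, [2.0]), (0.0, [-2.0])],
    [(1.0, [1.0]), (0.0, [-1.0]), (1.0, [3.0])],
]
for batch in batches:
    weights = update_on_batch(weights, batch)
```

Batches may differ in size, as in the two batches above, but each feature vector has to match the length of the weight vector, which is why an initial weight vector is required up front.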
Methods
latestModel(): Returns the latest model.
predictOn(dstream): Use the model to make predictions on batches of data from a DStream.
predictOnValues(dstream): Use the model to make predictions on the values of a DStream and carry over its keys.
setInitialWeights(initialWeights): Set the initial value of weights.
trainOn(dstream): Train the model on the incoming DStream.
Methods Documentation
predictOn(dstream) returns a pyspark.streaming.DStream containing predictions.
setInitialWeights(initialWeights) must be called before running trainOn and predictOn.
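As a rough end-to-end illustration of the methods listed here, the sketch below feeds a few synthetic batches through a queue-backed DStream. The helpers `make_batches` and `run_demo`, the application name, the one-dimensional toy data, and the 10-second timeout are all assumptions made for this example. The Spark imports sit inside `run_demo` so that the data helper can be used without a Spark installation; running `run_demo()` itself requires a working local PySpark setup.

```python
import random


def make_batches(num_batches=3, batch_size=20, seed=0):
    """Return a list of batches; each batch is a list of (label, [x]) pairs."""
    rng = random.Random(seed)
    batches = []
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            label = rng.randint(0, 1)
            # Class 1 is centred at +1.5, class 0 at -1.5.
            x = rng.gauss(1.5 if label == 1 else -1.5, 1.0)
            batch.append((float(label), [x]))
        batches.append(batch)
    return batches


def run_demo():
    # Imported here so make_batches stays usable without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import StreamingLogisticRegressionWithSGD

    sc = SparkContext("local[2]", "StreamingLogRegSketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # Each queued RDD of LabeledPoints becomes one batch of the DStream.
    rdds = [sc.parallelize([LabeledPoint(y, x) for y, x in batch])
            for batch in make_batches()]
    stream = ssc.queueStream(rdds)

    model = StreamingLogisticRegressionWithSGD(stepSize=0.1, numIterations=50)
    model.setInitialWeights([0.0])  # one weight per feature, set before training
    model.trainOn(stream)

    # predictOnValues expects (key, features) pairs and keeps the key,
    # so the true label is used as the key for easy comparison.
    keyed = stream.map(lambda lp: (lp.label, lp.features))
    model.predictOnValues(keyed).pprint()

    ssc.start()
    ssc.awaitTerminationOrTimeout(10)
    ssc.stop(stopSparkContext=True, stopGraceFully=True)
    print(model.latestModel().weights)
```

Calling `run_demo()` prints (label, prediction) pairs for each batch, followed by the weights of the latest model once the stream has stopped.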