R: Generalized Linear Models

spark.glm {SparkR}

R Documentation

Generalized Linear Models

Description

Fits a generalized linear model against a SparkDataFrame. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.

Usage

spark.glm(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.glm(data, formula, family = gaussian,
  tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0)

## S4 method for signature 'GeneralizedLinearRegressionModel'
summary(object)

## S3 method for class 'summary.GeneralizedLinearRegressionModel'
print(x, ...)

## S4 method for signature 'GeneralizedLinearRegressionModel'
predict(object, newData)

## S4 method for signature 'GeneralizedLinearRegressionModel,character'
write.ml(object, path,
  overwrite = FALSE)

Arguments

`data`	a SparkDataFrame for training.
`formula`	a symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
`...`	additional arguments passed to the method.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. Refer to the R family documentation at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.
`tol`	positive convergence tolerance of iterations.
`maxIter`	integer giving the maximal number of IRLS iterations.
`weightCol`	the weight column name. If this is not set or `NULL`, we treat all instance weights as 1.0.
`regParam`	regularization parameter for L2 regularization.
`object`	a fitted generalized linear model.
`x`	a summary object of a fitted generalized linear model, as returned by the `summary` function.
`newData`	a SparkDataFrame for testing.
`path`	the directory where the model is saved.
`overwrite`	whether to overwrite the output path if it already exists. The default is FALSE, which means an exception is thrown if the output path exists.
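
The three accepted forms of `family` and the remaining fitting arguments can be sketched as follows. This is a minimal illustration rather than a recommended configuration: it assumes a running Spark installation, the weight column name `w` is hypothetical, and a constant weight of 1.0 is used so that all three calls should describe the same model.

```r
library(SparkR)
sparkR.session()

# createDataFrame replaces "." in column names with "_"
# (for example, Sepal.Width becomes Sepal_Width).
df <- suppressWarnings(createDataFrame(iris))

# Hypothetical weight column "w": a constant weight of 1.0, which is
# equivalent to leaving weightCol = NULL.
weighted <- withColumn(df, "w", lit(1.0))

# family as a character string naming a family function
byName <- spark.glm(weighted, Sepal_Length ~ Sepal_Width, family = "gaussian")

# family as a family function
byFunction <- spark.glm(weighted, Sepal_Length ~ Sepal_Width, family = gaussian)

# family as the result of a call to a family function, with the other
# arguments from the Usage section spelled out at their default values
# (apart from weightCol)
byCall <- spark.glm(weighted, Sepal_Length ~ Sepal_Width,
                    family = gaussian(link = "identity"),
                    tol = 1e-06, maxIter = 25, weightCol = "w", regParam = 0)
```

Comparing `summary(byName)$coefficients` with the other two fits is a quick way to check that the three specifications agree.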

Value

spark.glm returns a fitted generalized linear model.

summary returns summary information about the fitted model as a list. The list includes at least coefficients (the coefficients matrix, which contains the coefficients, their standard errors, t values and p values), null.deviance (the null deviance), the null and residual degrees of freedom, aic (AIC) and iter (the number of iterations IRLS takes). If there are collinear columns in the data, the coefficients matrix only provides coefficients.
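
Because the summary is a list, its components can be read with `$`. The sketch below only touches components named above and assumes a running Spark installation; the variable names are illustrative.

```r
library(SparkR)
sparkR.session()

df <- suppressWarnings(createDataFrame(iris))
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")

s <- summary(model)

# Coefficients matrix: one row per term in the formula.
coefs <- s$coefficients
print(coefs)

# Scalar diagnostics named in the Value section.
nullDeviance <- s$null.deviance
aic <- s$aic
iterations <- s$iter
```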

predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
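
As a small sketch of scoring data other than the training set (assuming a running Spark installation; `newData` here is simply the first five rows of iris, chosen for illustration), the "prediction" column can be selected and collected into a local data.frame:

```r
library(SparkR)
sparkR.session()

df <- suppressWarnings(createDataFrame(iris))
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")

# Illustrative test data: the first five rows of iris.
newData <- suppressWarnings(createDataFrame(iris[1:5, ]))

# predict returns a SparkDataFrame; the fitted values are in "prediction".
predicted <- predict(model, newData)

# Bring the relevant columns back to the driver as a local data.frame.
localPredictions <- collect(select(predicted, "Sepal_Width", "prediction"))
print(localPredictions)
```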

Note

spark.glm since 2.0.0

summary(GeneralizedLinearRegressionModel) since 2.0.0

print.summary.GeneralizedLinearRegressionModel since 2.0.0

predict(GeneralizedLinearRegressionModel) since 1.5.0

write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0

Examples

## Not run: 
library(SparkR)
sparkR.session()
data(iris)
# createDataFrame replaces "." in column names with "_"
df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
summary(model)

# fitted values on training data
fitted <- predict(model, df)
head(select(fitted, "Sepal_Length", "prediction"))

# save the fitted model to the given path
path <- "path/to/model"
write.ml(model, path)

# read back the saved model and print its summary
savedModel <- read.ml(path)
summary(savedModel)
## End(Not run)

[Package SparkR version 2.1.0 Index]