nntrf: Neural Network based Transformations: Supervised Data Transformation by Means of Neural Network Hidden Layer

Getting Started

The goal of nntrf is to transform a dataset from its original feature space into the space defined by the hidden-layer activations of a three-layer multilayer perceptron (input, hidden, and output layers). To do so, a neural network is trained on the supervised problem, and the hidden-layer activations are then computed for each input pattern. The nnet package is used under the hood for this purpose. The transformation is supervised because it results from solving a supervised problem, unlike, for instance, Principal Component Analysis, which does not use the dependent variable.
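
The idea described above can be sketched directly with nnet, without going through nntrf. This is only an illustration under stated assumptions, not nntrf's actual implementation: it assumes a network without skip-layer connections and relies on nnet storing the hidden-layer weights first in `fit$wts`, grouped by hidden unit as (bias, input 1, ..., input p). The names `w_hidden`, `net_input` and `activations` are chosen here for illustration.

```r
# Sketch only: nntrf's internals may differ. Requires the nnet package.
library(nnet)

set.seed(0)
x <- as.matrix(iris[, 1:4])
p <- ncol(x)   # number of input features (4)
h <- 2         # number of hidden neurons (the new dimensionality)

# Train a single-hidden-layer network on the supervised problem
fit <- nnet(Species ~ ., data = iris, size = h, maxit = 140, trace = FALSE)

# nnet keeps all weights in one vector; the first (p + 1) * h entries feed
# the hidden layer, grouped by hidden unit as (bias, input 1, ..., input p)
w_hidden <- matrix(fit$wts[seq_len((p + 1) * h)], nrow = p + 1, ncol = h)

# Net input of each hidden neuron, then its sigmoid activation
net_input <- cbind(1, x) %*% w_hidden
activations <- 1 / (1 + exp(-net_input))

# One row per input pattern, one column per hidden neuron
print(dim(activations))
```

Here `net_input` is the quantity before the sigmoid is applied and `activations` is the quantity after it. Whether these correspond exactly to what nntrf returns with `use_sigmoid=FALSE` and `use_sigmoid=TRUE` is an assumption of this sketch.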

The following example shows how to use nntrf to transform the iris dataset from its original 4-dimensional space into a 2-dimensional space. nntrf treats the problem as classification if the dependent variable is a factor (as in iris) and as regression if it is numeric.

  library(nntrf)

  # Load iris into the current environment
  iris <- NULL
  data("iris", envir = environment())
  rd <- iris
  n <- nrow(rd)
  # Species is already a factor; the explicit conversion just emphasizes that,
  #   for classification problems, the dependent variable must be a factor.
  rd$Species <- as.factor(rd$Species)
  
  set.seed(0)
  training_index <- sample(1:n, round(0.6*n))
  
  # Get training and test data
  train <- rd[training_index,]
  test <- rd[-training_index,]
  x_train <- as.matrix(train[,-ncol(train)])
  y_train <- train[,ncol(train)]
  x_test <- as.matrix(test[,-ncol(test)])
  y_test <- test[,ncol(test)] 
  
  # Now, use nntrf to transform the original 4-dim space into a 2-dim space (size=2)
  # First, we train the neural network with 2 hidden neurons
  set.seed(0)
  nnpo <- nntrf(formula=Species ~. ,
                data=train,
                size=2, maxit=140, trace=TRUE)
  
  # Second, we transform the dataset using the weights of the hidden layer
  trf_x_train <- nnpo$trf(x=x_train,use_sigmoid=FALSE)
  trf_x_test <- nnpo$trf(x=x_test,use_sigmoid=FALSE)
  
  # It can be seen that the new feature space is 2-dimensional
  print(dim(trf_x_train))
  
  # Third, KNN (K=1) is used to classify the test set in the transformed space
  outputs <- FNN::knn(trf_x_train, trf_x_test, y_train, k=1)
  success <- mean(outputs == y_test)
  cat(paste0("Success rate of KNN (K=1) with iris transformed by nntrf: ", success, "\n"))
  
  # For comparison, KNN (K=1) is next used to classify the test set in the original space
  set.seed(0)
  outputs <- FNN::knn(x_train, x_test, y_train, k=1)
  success <- mean(outputs == y_test)
  cat(paste0("Success rate of KNN (K=1) with original iris: ", success, "\n"))
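
As noted earlier, the type of the dependent variable determines whether the problem is treated as classification or regression. The short sketch below illustrates that rule with base R only. `task_type` is a hypothetical helper written for this illustration, not a function of nntrf, and it only mirrors the rule as stated in this document. It also shows a common pitfall: class labels stored as integers look numeric, so they need converting to a factor first.

```r
# Hypothetical helper, not part of nntrf: reports how a response column
# would be treated under the factor/numeric rule described in this document.
task_type <- function(y) {
  if (is.factor(y)) {
    "classification"
  } else if (is.numeric(y)) {
    "regression"
  } else {
    stop("the dependent variable must be a factor or numeric")
  }
}

print(task_type(iris$Species))        # factor response
print(task_type(iris$Sepal.Length))   # numeric response

# Class labels coded as integers would be read as a regression target
labels <- as.integer(iris$Species)
print(task_type(labels))
print(task_type(as.factor(labels)))   # converting restores classification
```

This is why the example converts `Species` with `as.factor` even though it is already a factor: for a classification problem, the conversion makes the intent explicit.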