An R package implementing a projection pursuit algorithm that uses finite Gaussian mixture models (GMMs) for density estimation and genetic algorithms (GAs) to maximise an approximated negentropy index (PPGMMGA). The ppgmmga algorithm provides a way to visualise high-dimensional data in a lower-dimensional space, with particular emphasis on revealing clustering structure.
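The calls that follow use a data matrix X, and the 3D plot further down uses Class and Class_color, none of which are defined in this section. A minimal setup sketch, assuming the data are the Swiss banknote measurements shipped with mclust (the 200 x 6 dimensions and the variable names Length, Left, Right, Bottom, Top, Diagonal in the output are consistent with that dataset). The two colours are placeholders chosen only for illustration.

```r
# Assumption: X is the banknote data from the mclust package.
data("banknote", package = "mclust")

X <- banknote[, -1]        # six numeric measurements, Status column dropped
Class <- banknote$Status   # two-level factor used to mark points in plots

# Hypothetical colours, one per class; not taken from the original document.
Class_color <- c("dodgerblue2", "red3")

dim(X)
```

The package itself is attached with library(ppgmmga) before the calls below.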
pp1D <- ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
pp1D
## Call:
## ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
##
## 'ppgmmga' object containing:
## [1] "data" "d" "approx" "GMM" "GA"
## [6] "Negentropy" "basis" "Z"
summary(pp1D)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions = 200 x 6
## Data transformation = center & scale
## Projection subspace dimension = 1
## GMM density estimate = (VEE,4)
## Negentropy approximation = UT
## GA optimal negentropy = 0.6345935
## GA encoded basis solution:
## x1 x2 x3 x4 x5
## [1,] 3.268902 2.373044 1.051365 0.3131285 0.531718
##
## Estimated projection basis:
## PP1
## Length -0.01196531
## Left -0.09347750
## Right 0.16021052
## Bottom 0.57406981
## Top 0.34503463
## Diagonal -0.71892026
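The summary prints both a "GA encoded basis solution" (five numbers for six variables) and the estimated basis. For d = 1 the printed values are consistent with a spherical-coordinate encoding, in which the first value acts as an azimuth and the remaining four as polar angles. The sketch below is inferred from the printed numbers only, not from the package source, and decode_angles is a hypothetical helper name.

```r
# Hypothetical helper (not part of ppgmmga): map p - 1 angles to a unit
# vector of length p, using the convention inferred from the printed output.
decode_angles <- function(theta) {
  azimuth <- theta[1]
  polar <- theta[-1]
  # Running products of sines: 1, sin(t2), sin(t2) * sin(t3), ...
  sin_prod <- cumprod(c(1, sin(polar)))
  # Coordinates p, p - 1, ..., 3: cosine of a polar angle times preceding sines
  upper <- cos(polar) * sin_prod[seq_along(polar)]
  rest <- sin_prod[length(sin_prod)]
  # Coordinates 1 and 2 share the full sine product and split on the azimuth
  c(rest * sin(azimuth), rest * cos(azimuth), rev(upper))
}

# Values copied from the d = 1 summary above
theta <- c(3.268902, 2.373044, 1.051365, 0.3131285, 0.531718)
printed <- c(Length = -0.01196531, Left = -0.09347750, Right = 0.16021052,
             Bottom = 0.57406981, Top = 0.34503463, Diagonal = -0.71892026)

b <- decode_angles(theta)
names(b) <- names(printed)
print(round(cbind(decoded = b, printed = printed), 6))
```

Because every coordinate is a product of sines and cosines of the angles, the decoded vector has unit length by construction. For d > 1 the printed columns do not follow from each block of five angles this directly, so the sketch covers only the one-dimensional case.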
pp2D <- ppgmmga(data = X, d = 2, approx = "UT", seed = 1)
summary(pp2D, check = TRUE)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions = 200 x 6
## Data transformation = center & scale
## Projection subspace dimension = 2
## GMM density estimate = (VEE,4)
## Negentropy approximation = UT
## GA optimal negentropy = 1.13624
## GA encoded basis solution:
## x1 x2 x3 x4 x5 x6 x7
## [1,] 2.268667 2.929821 1.061407 1.084929 0.3044298 3.85462 0.9832903
## x8 x9 x10
## [1,] 1.11377 0.1671738 1.668403
##
## Estimated projection basis:
## PP1 PP2
## Length -0.03726866 -0.07183191
## Left 0.03125553 -0.11981164
## Right -0.15480788 0.06300918
## Bottom -0.08569311 0.86390485
## Top -0.10249897 0.46037272
## Diagonal 0.97766012 0.13505761
##
## Monte Carlo Negentropy approximation check:
## UT
## Approx Negentropy 1.136240194
## MC Negentropy 1.137260367
## MC se 0.003527379
## Relative accuracy 0.999102956
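A quick sanity check on the estimated projection basis is that its columns are orthonormal, so the cross-product matrix should be close to the identity. The sketch below uses only the values printed in the d = 2 summary above. With a fitted object the same check could presumably be run on pp2D$basis, given that "basis" is listed among the object's components.

```r
# Values copied from the "Estimated projection basis" table for d = 2
B <- matrix(c(-0.03726866, -0.07183191,
               0.03125553, -0.11981164,
              -0.15480788,  0.06300918,
              -0.08569311,  0.86390485,
              -0.10249897,  0.46037272,
               0.97766012,  0.13505761),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("Length", "Left", "Right",
                              "Bottom", "Top", "Diagonal"),
                            c("PP1", "PP2")))

gram <- crossprod(B)                  # t(B) %*% B
max_dev <- max(abs(gram - diag(2)))   # largest deviation from the identity

print(round(gram, 6))
print(max_dev)
```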
summary(pp2D$GMM)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 4
## components:
##
## log-likelihood n df BIC ICL
## -1191.595 200 51 -2653.405 -2666.898
##
## Clustering table:
## 1 2 3 4
## 16 99 47 38
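The df and BIC columns can be reproduced by hand. mclust reports BIC on the "larger is better" scale, 2 * logLik - df * log(n), and a VEE model has one volume parameter per component plus a shared shape and a shared orientation. A short sketch using the printed values:

```r
loglik <- -1191.595   # printed log-likelihood
n <- 200              # observations
p <- 6                # variables
G <- 4                # mixture components

df_means   <- G * p                           # 24 mean parameters
df_weights <- G - 1                           # 3 free mixing proportions
df_cov     <- G + (p - 1) + p * (p - 1) / 2   # VEE: 4 volumes + 5 shape + 15 orientation
df <- df_means + df_weights + df_cov

bic <- 2 * loglik - df * log(n)

print(c(df = df, BIC = bic))
```

The small difference from the printed BIC in the last digit comes from the log-likelihood being rounded to three decimals.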
pp3D <- ppgmmga(data = X, d = 3,
                center = TRUE, scale = FALSE,
                gatype = "gaisl",
                options = ppgmmga.options(numIslands = 2),
                seed = 1)
summary(pp3D, check = TRUE)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions = 200 x 6
## Data transformation = center
## Projection subspace dimension = 3
## GMM density estimate = (VVE,3)
## Negentropy approximation = UT
## GA optimal negentropy = 1.16915
## GA encoded basis solution:
## x1 x2 x3 x4 x5 x6 x7
## [1,] 4.306147 2.435962 1.072888 1.02168 1.039589 4.934657 2.005115
## x8 x9 x10 ... x14 x15
## [1,] 2.047029 1.950543 2.200697 1.534584 2.504773
##
## Estimated projection basis:
## PP1 PP2 PP3
## Length -0.3849309 0.5240368 -0.5116536
## Left -0.1655861 -0.1697583 -0.3109141
## Right 0.2462001 0.5001222 -0.4154481
## Bottom 0.2973840 0.3653894 0.3867856
## Top 0.3097231 0.4873071 0.3130374
## Diagonal -0.7612025 0.2747140 0.4704789
##
## Monte Carlo Negentropy approximation check:
## UT
## Approx Negentropy 1.169149621
## MC Negentropy 1.173876686
## MC se 0.004294694
## Relative accuracy 0.995973116
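The "Relative accuracy" row in both checks matches the ratio of the approximated negentropy to its Monte Carlo estimate. The sketch below recomputes that ratio from the printed values and also expresses the gap in units of the Monte Carlo standard error. The column names in the data frame are illustrative.

```r
# Values copied from the two Monte Carlo checks (d = 2 and d = 3)
check <- data.frame(
  d        = c(2, 3),
  approx   = c(1.136240194, 1.169149621),
  mc       = c(1.137260367, 1.173876686),
  mc_se    = c(0.003527379, 0.004294694),
  reported = c(0.999102956, 0.995973116)
)

check$ratio <- check$approx / check$mc                     # relative accuracy
check$z     <- (check$approx - check$mc) / check$mc_se     # gap in MC standard errors

print(check, digits = 9)
```

In both cases the UT approximation lies within about 1.1 Monte Carlo standard errors of the simulated value.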
# A rotating 3D plot can be obtained using the msir package
if (!requireNamespace("msir", quietly = TRUE)) install.packages("msir")
msir::spinplot(pp3D$Z, markby = Class,
               pch.points = c(19, 17),
               col.points = Class_color)
Scrucca L, Serafini A (2019). “Projection pursuit based on Gaussian mixtures and evolutionary algorithms.” Journal of Computational and Graphical Statistics. doi: 10.1080/10618600.2019.1598871 (URL: https://doi.org/10.1080/10618600.2019.1598871).
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mclust_5.4.3 ppgmmga_1.2 knitr_1.23
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 GA_3.2 compiler_3.6.0 pillar_1.4.1
## [5] iterators_1.0.10 tools_3.6.0 digest_0.6.19 evaluate_0.14
## [9] tibble_2.1.3 gtable_0.3.0 pkgconfig_2.0.2 rlang_0.4.0
## [13] foreach_1.4.4 cli_1.1.0 yaml_2.2.0 xfun_0.8
## [17] stringr_1.4.0 dplyr_0.8.1 grid_3.6.0 tidyselect_0.2.5
## [21] glue_1.3.1 R6_2.4.0 rmarkdown_1.13 ggplot2_3.2.0
## [25] purrr_0.3.2 magrittr_1.5 scales_1.0.0 codetools_0.2-16
## [29] htmltools_0.3.6 ggthemes_4.2.0 assertthat_0.2.1 colorspace_1.4-1
## [33] labeling_0.3 stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0
## [37] crayon_1.3.4