The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat. In addition, further mining algorithms are available via fim4r.
Code examples can be found in Chapter 5 of the web book R Companion for Introduction to Data Mining.
fim4r() is provided in arules.
Call opus() with format = 'itemsets'.
Stable CRAN version: Install from within R with
install.packages("arules")
Current development version: Install from r-universe.
install.packages("arules", repos = "https://mhahsler.r-universe.dev")
Load package and mine some association rules.
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
## 8993 transactions (rows) and
## 84 items (columns)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 899
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
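The supp and conf arguments used above are shorthand. As a minimal sketch, assuming the standard apriori() interface, the same settings can be written as an explicit parameter list whose names mirror the "Parameter specification" output. The object name rules_explicit is ours and chosen for illustration only.

```r
library("arules")
data("IncomeESL")
trans <- transactions(IncomeESL)

# Same thresholds as above, spelled out as a parameter list.
# verbose = FALSE suppresses the progress log.
rules_explicit <- apriori(
  trans,
  parameter = list(support = 0.1, confidence = 0.9, target = "rules"),
  control = list(verbose = FALSE)
)

# Number of rules found
length(rules_explicit)
```

The explicit form is handy when the settings are built programmatically or reused across several mining runs.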
Inspect the rules with the highest lift.
inspect(head(rules, n = 3, by = "lift"))
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,
##      dual incomes=yes,
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,
##      householder status=own,
##      type of home=house,
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988
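To make the columns in this output concrete, here is a small base R sketch of how support, coverage, confidence, lift and count relate for a single rule, using the standard definitions. The toy matrix and the helper supp() are hypothetical and serve only as illustration; they are not part of arules and do not use the IncomeESL data.

```r
# Toy data: 5 transactions (rows) over 3 items (columns). Hypothetical values.
toy <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, TRUE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# Fraction of transactions that contain every item in `items`.
supp <- function(items) {
  mean(rowSums(toy[, items, drop = FALSE]) == length(items))
}

# Rule {a, b} => {c}
lhs <- c("a", "b")
rhs <- "c"

support    <- supp(c(lhs, rhs))      # share of transactions with lhs and rhs
coverage   <- supp(lhs)              # share of transactions with lhs
confidence <- support / coverage     # how often rhs appears when lhs does
lift       <- confidence / supp(rhs) # confidence relative to the rhs base rate
count      <- support * nrow(toy)    # absolute number of supporting transactions

round(c(support = support, coverage = coverage,
        confidence = confidence, lift = lift, count = count), 3)
```

A lift above 1 means the left-hand side makes the right-hand side more likely than its base rate, which is why the rules above are sorted by lift.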
arules works seamlessly with tidyverse. For example:
dplyr can be used for cleaning and preparing the transactions.
transactions() and other functions accept tibble as input.
arules functions can be chained with the pipe operator %>%.
ggplot2 can be used for visualization.
For instance, we can remove the ethnic information column before creating transactions and then mine and inspect rules.
library("tidyverse")
library("arules")
data("IncomeESL")
trans <- IncomeESL %>%
  select(-`ethnic classification`) %>%
  transactions()
rules <- trans %>%
  apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))
rules %>%
  head(n = 3, by = "lift") %>%
  inspect()
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,
##      dual incomes=yes,
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,
##      householder status=own,
##      type of home=house,
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988
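Going the other way, mined rules can be turned into a regular data frame so that dplyr verbs apply to the quality measures. The following is a sketch that assumes the DATAFRAME() helper exported by arules and an installed dplyr. The names rules_df and strong_rules and the lift cutoff of 2.5 are ours and purely illustrative.

```r
library("arules")
library("dplyr")

data("IncomeESL")
trans <- transactions(IncomeESL)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules",
                 control = list(verbose = FALSE))

# Convert the rules to a data.frame with one row per rule.
# Quality measures (support, confidence, lift, ...) become columns.
rules_df <- DATAFRAME(rules)

# Ordinary dplyr verbs now apply. The cutoff 2.5 is an arbitrary
# value chosen for illustration.
strong_rules <- rules_df %>%
  filter(lift > 2.5) %>%
  arrange(desc(lift))

head(strong_rules, 3)
```

If the result should stay a rules object rather than a data frame, subset() and sort() from arules are the more natural choice.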
See Getting started with arules using Python.
Please report bugs here on GitHub. Questions should be posted on Stack Overflow and tagged with arules.