This vignette gives a very basic introduction to multidimensional unfolding for preference data using smacof. Technical details and more advanced examples can be found in the main package vignette (vignette("smacof")), in Borg, Groenen, and Mair (2018), and in Mair (2018).
Unfolding can be seen as a variant of MDS for preference data. The input data structure is rectangular, as opposed to MDS where we have a symmetric dissimilarity matrix as input. There are two types of preference data that can be subject to unfolding: rankings and ratings.
Let us illustrate the ranking structure using the breakfast dataset:
library(smacof)
head(breakfast)
#> toast butoast engmuff jdonut cintoast bluemuff hrolls toastmarm butoastj
#> 1 13 12 7 3 5 4 8 11 10
#> 2 15 11 6 3 10 5 14 8 9
#> 3 15 10 12 14 3 2 9 8 7
#> 4 6 14 11 3 7 8 12 10 9
#> 5 15 9 6 14 13 2 12 8 7
#> 6 9 11 14 4 7 6 15 10 8
#> toastmarg cinbun danpastry gdonut cofcake cornmuff
#> 1 15 2 1 6 9 14
#> 2 12 7 1 4 2 13
#> 3 11 1 6 4 5 13
#> 4 15 4 1 2 5 13
#> 5 10 11 1 4 3 5
#> 6 12 5 2 3 1 13
42 individuals ranked 15 breakfast items according to their preference. Note that these rankings are a special case of dissimilarities. The smaller the ranking value, the “more similar” a breakfast item and an individual are.
The second data type compatible with unfolding is ratings. A typical example of rating data is item responses in a questionnaire (e.g., on a 1-5 rating scale). Users need to be careful about one issue: if rating scales are scored and labelled in a “positive” direction (e.g., 1 as “fully disagree” and 5 as “fully agree”), the scale needs to be reversed so that the input data are dissimilarities, as required by the unfolding() function. Here we show an example involving 10 items related to Internet privacy, scored on a scale from 1 to 100. Note that a score of 100 implies the highest preference. Therefore, the data have to be reversed prior to the unfolding fit.
library(MPsychoR)
data(Privacy)
Privacy_rev <- 101 - Privacy
head(Privacy_rev)
#> apc1 apc2 apc3 apc4 apc5 apc6 dpc1 dpc2 dpc3 dpc4
#> 1 20 26 16 24 7 15 92 32 51 79
#> 2 59 51 67 63 31 40 37 62 82 74
#> 3 25 29 65 14 15 16 50 10 44 51
#> 4 22 25 87 16 15 1 1 1 11 7
#> 5 46 51 43 1 1 1 85 40 93 69
#> 6 100 88 77 10 22 9 54 35 11 50
We have a total sample size of 405 individuals.
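The reversal used above (101 minus the score on a 1-100 scale) follows a general rule: subtract each score from the sum of the scale endpoints. The following is a minimal base R sketch of that rule. The helper `reverse_scale()` and the toy `ratings` data frame are hypothetical and only serve as an illustration; they are not part of smacof or MPsychoR.

```r
# Hypothetical toy ratings on a 1-5 agreement scale (5 = "fully agree").
# The values are made up for illustration and do not come from a real dataset.
ratings <- data.frame(item1 = c(5, 4, 1),
                      item2 = c(2, 5, 3),
                      item3 = c(1, 1, 4))

# Reverse a rating scale with known endpoints so that small values
# indicate high preference, i.e., the values behave like dissimilarities.
reverse_scale <- function(x, min_score, max_score) {
  (min_score + max_score) - x
}

ratings_rev <- reverse_scale(ratings, min_score = 1, max_score = 5)
ratings_rev
```

With `min_score = 1` and `max_score = 100` the helper reduces to `101 - x`, which matches the transformation applied to the Privacy data.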
Unfolding is a dual scaling method, as we aim to scale the rows (“ideal points”) and the columns (“object points”) of the input data jointly. As in MDS, we try to keep the number of dimensions low in order to be able to plot the unfolding configuration.
Let us start with the breakfast ranking data. A basic 2D unfolding solution can be fitted as follows:
un_breakfast <- unfolding(breakfast)
un_breakfast
#>
#> Call: unfolding(delta = breakfast)
#>
#> Model: Rectangular smacof
#> Number of subjects: 42
#> Number of objects: 15
#> Transformation: none
#> Conditionality: matrix
#>
#> Stress-1 value: 0.308625
#> Penalized Stress: 3.525172
#> Number of iterations: 50
We obtain a stress-1 value of 0.309. As in MDS, users should not judge the goodness-of-fit by solely relying on this stress value but rather use several diagnostic tools in combination (see Mair, Borg, and Rusch 2016 for details).
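As a rough sketch of what such a combination of diagnostics might look like, the code below refits the model and inspects the stress per point and two diagnostic plots. It assumes that the fitted object exposes the components `spp.row` and `spp.col` and that the plot method accepts `plot.type = "Shepard"` and `plot.type = "stressplot"` for unfolding objects; please check `?unfolding` and `?plot.smacof` for the installed package version before relying on these names.

```r
library(smacof)

# Refit the basic ratio unfolding solution from the text
fit <- unfolding(breakfast)

# Overall stress-1 value of the solution
fit$stress

# Stress per point (assumed component names, see ?unfolding):
# which individuals and which items contribute most to the misfit?
head(sort(fit$spp.row, decreasing = TRUE))  # rows (individuals)
sort(fit$spp.col, decreasing = TRUE)        # columns (breakfast items)

# Diagnostic plots (assumed plot types, see ?plot.smacof)
plot(fit, plot.type = "Shepard")     # fitted distances vs. dissimilarities
plot(fit, plot.type = "stressplot")  # stress decomposition
```

Points with a large stress contribution are candidates for a closer look before the configuration is interpreted.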
Note that we fitted a ratio unfolding version, which implies that the input dissimilarities remain untransformed. The unfolding() function provides the same transformation options as mds() for improving the goodness-of-fit. In addition, the function can be forced to fit an individual transformation function for each row, which leads to row-conditional unfolding (conditionality = "row"). In this example we stick to the basic ratio solution and produce the configuration plot.
plot(un_breakfast, main = "Configuration Breakfast Data")
This plot is highly intuitive to interpret as the distances between any pair of points are Euclidean: breakfast items close to each other are similarly preferred; individuals close to each other have similar breakfast preferences; the closer a breakfast item is to an individual, the higher the individual’s preference for this item. Of course, the degree to which this interpretation reflects the actual preferences in the data depends on the goodness-of-fit of the solution. Sometimes it is also possible to interpret the dimensions but, just as in MDS, this is not as crucial.
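The distance-based reading of the plot can also be checked numerically. The sketch below computes the Euclidean distances between all row points and all column points and then orders the items by closeness for each person. The helper `row_col_dist()` and the toy coordinates are hypothetical; they are written in base R and are not part of smacof.

```r
# Euclidean distances between every row point (person) and every
# column point (item) of a configuration.
# X: n x p matrix of row coordinates; Y: m x p matrix of column coordinates.
row_col_dist <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  # squared distance = |x|^2 + |y|^2 - 2 x'y, computed for all pairs at once
  d2 <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * tcrossprod(X, Y)
  d <- sqrt(pmax(d2, 0))  # guard against tiny negative values from rounding
  dimnames(d) <- list(rownames(X), rownames(Y))
  d
}

# Hypothetical toy configuration (made-up coordinates, not a fitted solution)
persons <- matrix(c(0, 0,
                    3, 4), ncol = 2, byrow = TRUE,
                  dimnames = list(c("p1", "p2"), c("D1", "D2")))
items <- matrix(c(3, 4,
                  0, 1,
                  6, 8), ncol = 2, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("D1", "D2")))

d <- row_col_dist(persons, items)
d

# Items ordered from closest (most preferred) to farthest for each person
t(apply(d, 1, function(r) names(sort(r))))
```

For the fitted breakfast solution, the same helper could be called as `row_col_dist(un_breakfast$conf.row, un_breakfast$conf.col)`, and the resulting item order per person compared with the observed rankings.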
The unfolding fit for the privacy rating data can be achieved in an analogous manner.
un_privacy <- unfolding(Privacy_rev)
un_privacy
#>
#> Call: unfolding(delta = Privacy_rev)
#>
#> Model: Rectangular smacof
#> Number of subjects: 405
#> Number of objects: 10
#> Transformation: none
#> Conditionality: matrix
#>
#> Stress-1 value: 0.461778
#> Penalized Stress: 7.268547
#> Number of iterations: 72
The resulting configuration plot looks as follows (we suppress the row labels):
plot(un_privacy, main = "Unfolding Configuration Internet Privacy",
label.conf.rows = list(label = FALSE))
The plot shows nicely three clusters of items: items related to individualization (apc1-3), items related to providing correct data (apc4-6), and items related to disadvantages of personal communication (dpc1-4). The first dimension distinguishes between advantages (apc items) and disadvantages of personal communication (dpc items).
The solution can also be plotted using ggplot2:
library(ggplot2)
conf_items <- as.data.frame(un_privacy$conf.col)
conf_persons <- as.data.frame(un_privacy$conf.row)
p <- ggplot(conf_persons, aes(x = D1, y = D2))
p + geom_point(size = 0.5, colour = "gray") + coord_fixed() +
  geom_point(aes(x = D1, y = D2), conf_items, colour = "cadetblue") +
  geom_text(aes(x = D1, y = D2, label = rownames(conf_items)),
            conf_items, colour = "cadetblue", vjust = -0.8) +
  ggtitle("Unfolding Configuration Internet Privacy")
It is important to fix the aspect ratio to 1 such that the distances in the plot are Euclidean. In order to avoid overlapping labels, the ggrepel package provides some good options.
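One possible way to use ggrepel here is to replace the text layer with `geom_text_repel()`. The sketch below refits the privacy model so that it runs on its own. The extra column `item` and the object name `p_repel` are introduced only for this illustration.

```r
library(smacof)
library(MPsychoR)
library(ggplot2)
library(ggrepel)

# Refit the privacy unfolding solution from the text
data(Privacy)
Privacy_rev <- 101 - Privacy
un_privacy <- unfolding(Privacy_rev)

conf_items <- as.data.frame(un_privacy$conf.col)
conf_persons <- as.data.frame(un_privacy$conf.row)
conf_items$item <- rownames(conf_items)  # label column used by the repel layer

p_repel <- ggplot(conf_persons, aes(x = D1, y = D2)) +
  geom_point(size = 0.5, colour = "gray") +
  geom_point(data = conf_items, colour = "cadetblue") +
  # labels are moved away from each other and from their anchor points
  geom_text_repel(aes(label = item), data = conf_items, colour = "cadetblue") +
  coord_fixed() +  # aspect ratio 1, so plotted distances stay Euclidean
  ggtitle("Unfolding Configuration Internet Privacy")
p_repel
```

The label positions chosen by ggrepel involve random starting values, so setting the `seed` argument of `geom_text_repel()` may help if a reproducible layout is needed.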