library(dad)
The dataset dspg of the dad package is a list of \(T = 7\) matrices. For each of the \(T\) years 1968, 1975, 1982, 1990, 1999, 2010 and 2015, it gives the contingency table of Diploma × Socioprofessional group in France. Each table has:
- 4 rows, one per diploma level (diplome):
  - bepc: brevet
  - cap: NCQ (CAP)
  - bac: baccalaureate
  - sup: higher education (supérieur)
- 6 columns, one per socioprofessional group (csp):
  - agri: farmer (agriculteur)
  - arti: craftsperson (artisan)
  - cadr: senior manager (cadre supérieur)
  - pint: middle manager (profession intermédiaire)
  - empl: employee (employé)
  - ouvr: worker (ouvrier)

data("dspg")
print(dspg)
## $`1968`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 1316116 952960 165616 682144 1927408 3978916
## cap 39104 183004 71440 400456 430988 565404
## bac 23920 96648 127140 483900 169804 75924
## sup 4172 27368 426684 193872 36560 19184
##
## $`1975`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 1040335 804175 212965 802950 2349515 4085075
## cap 106645 288680 105385 515125 745730 986965
## bac 16700 123020 217155 685995 289835 112740
## sup 6565 33780 712780 512515 107645 28710
##
## $`1982`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 732716 770188 242356 879564 2502620 3947964
## cap 150132 385052 118612 613640 1067144 1290200
## bac 46256 174204 293440 770516 474504 135888
## sup 12616 62816 940980 924172 136608 17392
##
## $`1990`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 409172 617841 252302 842414 2459709 3461490
## cap 165573 504027 150024 816798 1651050 1991441
## bac 79492 206386 352212 1105251 733300 182040
## sup 22760 130374 1591719 1225666 243767 41267
##
## $`1999`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 187003 391635 182496 676012 2314729 2628094
## cap 206249 564047 180554 1050683 2309559 2543681
## bac 85088 206085 293472 1088315 1076836 358332
## sup 39690 205774 2110963 2189503 681935 155885
##
## $`2010`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 60768 250307 141650 477459 1524048 1754585
## cap 129022 472352 187644 940102 1987379 2241519
## bac 98110 275878 342985 1211857 1540768 739900
## sup 60565 304784 2957218 3115222 1266854 331672
##
## $`2015`
## csp
## diplome agri arti cadr pint empl ouvr
## bepc 32957 213686 114517 365359 1173116 1406264
## cap 95495 462867 169914 815957 1920849 2118735
## bac 91093 298719 297826 1168208 1645780 844749
## sup 67977 395035 3250543 3468127 1492026 415197
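Each of these tables can be turned into a joint probability distribution, from which the marginal distributions follow. The lines below are a minimal base R sketch on the 1968 table only; the object names (tab1968, joint, marg_diplome, marg_csp) are illustrative and are not part of the dad package.

```r
# 1968 contingency table, copied from the printed output above
tab1968 <- matrix(
  c(1316116, 952960, 165616, 682144, 1927408, 3978916,
      39104, 183004,  71440, 400456,  430988,  565404,
      23920,  96648, 127140, 483900,  169804,   75924,
       4172,  27368, 426684, 193872,   36560,   19184),
  nrow = 4, byrow = TRUE,
  dimnames = list(diplome = c("bepc", "cap", "bac", "sup"),
                  csp = c("agri", "arti", "cadr", "pint", "empl", "ouvr"))
)

# Joint frequencies: each cell divided by the grand total
joint <- prop.table(tab1968)

# Marginal distributions of the two variables
marg_diplome <- rowSums(joint)
marg_csp <- colSums(joint)

round(marg_diplome, 3)
round(marg_csp, 3)
```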
Once the distances or divergences \((\delta_{ts})\) between the distributions of each pair of occasions \(t\) and \(s\) have been computed, the MDS technique looks for a representation of the distributions by \(T\) points in a low-dimensional space such that the distances between these points are as close as possible to the \((\delta_{ts})\).
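The two steps described above (pairwise distances, then a low-dimensional configuration) can be sketched in base R. This is an illustration under stated assumptions, not the internals of the dad package: only three of the seven tables are used to keep it short, and the L1 (Manhattan) distance between joint distributions is an arbitrary choice made here.

```r
# Three of the tables printed above (1968, 1990, 2015), flattened row by row
counts <- list(
  "1968" = c(1316116, 952960, 165616, 682144, 1927408, 3978916,
             39104, 183004, 71440, 400456, 430988, 565404,
             23920, 96648, 127140, 483900, 169804, 75924,
             4172, 27368, 426684, 193872, 36560, 19184),
  "1990" = c(409172, 617841, 252302, 842414, 2459709, 3461490,
             165573, 504027, 150024, 816798, 1651050, 1991441,
             79492, 206386, 352212, 1105251, 733300, 182040,
             22760, 130374, 1591719, 1225666, 243767, 41267),
  "2015" = c(32957, 213686, 114517, 365359, 1173116, 1406264,
             95495, 462867, 169914, 815957, 1920849, 2118735,
             91093, 298719, 297826, 1168208, 1645780, 844749,
             67977, 395035, 3250543, 3468127, 1492026, 415197)
)

# One row per occasion: the joint distribution as a probability vector
probs <- t(sapply(counts, function(x) x / sum(x)))

# Pairwise distances delta_ts (L1 distance: an assumption of this sketch)
delta <- dist(probs, method = "manhattan")

# Classical MDS: one point per occasion in k = 2 dimensions
fit <- cmdscale(delta, k = 2, eig = TRUE)
fit$points
```

With only three occasions, two dimensions reproduce the distances exactly; with more occasions the low-dimensional configuration is in general an approximation, and the eigenvalues in fit$eig indicate how much each axis contributes.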
The dad package includes functions for all the calculations required to implement such a method and to interpret its outputs:
- the mdsdd function performs MDS and generates scores;
- the plot function generates graphics representing the probability distributions on the factorial axes;
- the interpret function returns other aids to interpretation based on the marginal distributions.

The mdsdd function

MDS of discrete probability distributions can be carried out using the mdsdd function. This function applies to a folder, as for fmdsd (see help), except that the columns of each data frame of the folder are factors rather than numeric.

The following example shows the application of mdsdd on a list of arrays. The mdsdd function is built on the cmdscale function of R. It is applied to the dataset dspg as follows:

resultmds <- mdsdd(dspg)
In addition to the add argument of cmdscale, the mdsdd function has two sets of optional arguments. One of them, distance, controls the method used to compute the distances between the distributions.

mdsdd outputs

The mdsdd function returns an object of S3 class "mdsdd", consisting of a list of 9 elements, including the scores, also called principal coordinates, and the marginal and joint distributions of the variables per occasion.
names(resultmds)
## [1] "call" "group" "variables" "d" "inertia"
## [6] "scores" "jointp" "margins" "associations"
The outputs are displayed with the print function:
print(resultmds)
## group variable: group
## variables: diplome csp
## ---------------------------------------------------------------
## inertia
## eigenvalue inertia
## 1 1.23e+00 92.8
## 2 6.63e-02 5.0
## 3 1.59e-02 1.2
## 4 8.16e-03 0.6
## 5 4.65e-03 0.4
## 6 5.28e-17 0.0
## ---------------------------------------------------------------
## coordinates
## group PC.1 PC.2 PC.3
## 1968 1968 -0.61455123 0.08277634 0.051535799
## 1975 1975 -0.41112536 0.03027496 -0.006955327
## 1982 1982 -0.25617158 -0.01895616 -0.048278844
## 1990 1990 -0.01123542 -0.10771438 -0.066369785
## 1999 1999 0.24131168 -0.16253594 0.077859986
## 2010 2010 0.48611906 0.03985732 -0.016564831
## 2015 2015 0.56565286 0.13629786 0.008773002
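The inertia column of this output can be checked by hand. The sketch below assumes that the inertia of an axis is its eigenvalue expressed as a percentage of the sum of the eigenvalues, and it uses the rounded eigenvalues as printed rather than the values stored in resultmds.

```r
# Eigenvalues as printed above (rounded to three significant digits)
eigenvalues <- c(1.23e+00, 6.63e-02, 1.59e-02, 8.16e-03, 4.65e-03, 5.28e-17)

# Assumed definition: share of each eigenvalue in the total, in percent
inertia <- round(100 * eigenvalues / sum(eigenvalues), 1)
inertia

# Share of the total carried by the first two axes together
sum(eigenvalues[1:2]) / sum(eigenvalues)
```

Under that assumption the computed shares agree with the printed column (92.8, 5.0, 1.2, 0.6, 0.4, 0.0).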
Graphical representations on the principal planes are generated with the plot function:
plot(resultmds, fontsize.points = 1)
In this example, a single axis is enough to describe the general trend: the first principal coordinate accounts for 92.8% of the inertia.
The graph shows that the first principal score increases steadily over time: the more recent the year, the higher the score.
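The monotone trend can also be checked numerically from the printed scores. The following lines are a small base R check, with the PC.1 values copied from the coordinates table; the variable names are illustrative.

```r
# Years and first principal coordinates, copied from the printed scores
year <- c(1968, 1975, 1982, 1990, 1999, 2010, 2015)
pc1 <- c(-0.61455123, -0.41112536, -0.25617158, -0.01123542,
         0.24131168, 0.48611906, 0.56565286)

# TRUE when the score rises from each year to the next
all(diff(pc1) > 0)

# Rank correlation between year and score; it equals 1 whenever
# the scores are strictly increasing with the year
cor(year, pc1, method = "spearman")
```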
The interpretation of the outputs is based on the relationships between the principal scores and the marginal or joint frequencies. These relationships are quantified by correlation coefficients and represented graphically by plotting the scores against the frequencies. These interpretation tools are provided by the interpret function, which has two optional arguments: nscore, giving the indices of the score columns to be interpreted, and mma, whose default value "marg1" selects the probability distributions of each variable.
interpret(resultmds, nscore = 1)
## Pearson correlations between scores and probability distributions of each variable
## PC.1
## diplome.bepc -1.00
## diplome.cap 0.79
## diplome.bac 0.99
## diplome.sup 0.98
## csp.agri -0.96
## csp.arti -0.96
## csp.cadr 0.99
## csp.pint 1.00
## csp.empl 0.91
## csp.ouvr -1.00
## Spearman correlations between scores and probability distributions of each variable
## PC.1
## diplome.bepc -1.00
## diplome.cap 0.68
## diplome.bac 1.00
## diplome.sup 1.00
## csp.agri -1.00
## csp.arti -0.96
## csp.cadr 1.00
## csp.pint 1.00
## csp.empl 0.86
## csp.ouvr -1.00
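To make the origin of such correlations concrete, the diplome rows of the Pearson table can be approximated in base R from the printed tables and scores. This is a sketch under assumptions, not the code of interpret: the helper mk is hypothetical, the counts are transcribed from the dspg output above, and the marginal frequencies are taken as row totals divided by the grand total.

```r
# Hypothetical helper: builds a 4 x 6 table (rows bepc, cap, bac, sup)
mk <- function(...) matrix(c(...), nrow = 4, byrow = TRUE,
                           dimnames = list(c("bepc", "cap", "bac", "sup"), NULL))

# Counts per year, copied from the printed dspg tables
tabs <- list(
  "1968" = mk(1316116, 952960, 165616, 682144, 1927408, 3978916,
              39104, 183004, 71440, 400456, 430988, 565404,
              23920, 96648, 127140, 483900, 169804, 75924,
              4172, 27368, 426684, 193872, 36560, 19184),
  "1975" = mk(1040335, 804175, 212965, 802950, 2349515, 4085075,
              106645, 288680, 105385, 515125, 745730, 986965,
              16700, 123020, 217155, 685995, 289835, 112740,
              6565, 33780, 712780, 512515, 107645, 28710),
  "1982" = mk(732716, 770188, 242356, 879564, 2502620, 3947964,
              150132, 385052, 118612, 613640, 1067144, 1290200,
              46256, 174204, 293440, 770516, 474504, 135888,
              12616, 62816, 940980, 924172, 136608, 17392),
  "1990" = mk(409172, 617841, 252302, 842414, 2459709, 3461490,
              165573, 504027, 150024, 816798, 1651050, 1991441,
              79492, 206386, 352212, 1105251, 733300, 182040,
              22760, 130374, 1591719, 1225666, 243767, 41267),
  "1999" = mk(187003, 391635, 182496, 676012, 2314729, 2628094,
              206249, 564047, 180554, 1050683, 2309559, 2543681,
              85088, 206085, 293472, 1088315, 1076836, 358332,
              39690, 205774, 2110963, 2189503, 681935, 155885),
  "2010" = mk(60768, 250307, 141650, 477459, 1524048, 1754585,
              129022, 472352, 187644, 940102, 1987379, 2241519,
              98110, 275878, 342985, 1211857, 1540768, 739900,
              60565, 304784, 2957218, 3115222, 1266854, 331672),
  "2015" = mk(32957, 213686, 114517, 365359, 1173116, 1406264,
              95495, 462867, 169914, 815957, 1920849, 2118735,
              91093, 298719, 297826, 1168208, 1645780, 844749,
              67977, 395035, 3250543, 3468127, 1492026, 415197)
)

# Marginal distribution of diplome for each year (one row per year)
marg <- t(sapply(tabs, function(m) rowSums(m) / sum(m)))

# First principal coordinates, copied from the printed scores
pc1 <- c(-0.61455123, -0.41112536, -0.25617158, -0.01123542,
         0.24131168, 0.48611906, 0.56565286)

# Pearson correlation of PC.1 with each diploma frequency
cors <- cor(pc1, marg)[1, ]
round(cors, 2)
```

If the transcription is correct and interpret uses the same marginal frequencies, the values should be close to the diplome rows of the Pearson table; replacing the default with method = "spearman" in cor gives the rank-based counterpart.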
From the correlations between the principal coordinates (PC) and the distributions of the variables, we deduce that:
- the higher \(PC1\), the higher the frequencies of "diplome.bac" and "diplome.sup", the higher "diplome.cap" tends to be, and the lower the frequencies of "diplome.bepc";
- the higher \(PC1\), the higher the frequencies of "csp.cadr", "csp.pint" and "csp.empl", and the lower the frequencies of "csp.agri", "csp.arti" and "csp.ouvr".

So, recalling that \(PC1\) gets higher for recent years, these results highlight that in France, since 1968: