Load package mvord
library(mvord)
Load data set
data("data_cr")
str(data_cr, vec.len = 3)
## 'data.frame': 690 obs. of 10 variables:
## $ rater1 : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 2 3 3 2 5 4 3 4 ...
## $ rater2 : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 2 4 4 2 5 NA 3 4 ...
## $ rater3 : Ord.factor w/ 6 levels "F"<"G"<"H"<"I"<..: 3 NA NA NA 6 NA 2 5 ...
## $ rater4 : Ord.factor w/ 2 levels "L"<"M": 1 2 2 1 2 2 2 2 ...
## $ firm_id: int 1 2 3 4 5 6 7 8 ...
## $ LR : num 1.72 1.84 2.64 1.31 ...
## $ LEV : num 2.114 0.883 2.3 2.638 ...
## $ PR : num 0.3779 -0.1503 -0.0521 0.3289 ...
## $ RSIZE : num -6.37 -7.84 -7.98 -5.86 ...
## $ BETA : num 0.836 0.49 0.802 1.137 ...
head(data_cr, n = 3)
## rater1 rater2 rater3 rater4 firm_id LR LEV PR RSIZE
## 1 B B H L 1 1.720041 2.1144513 0.37792213 -6.365053
## 2 C D <NA> M 2 1.836574 0.8826725 -0.15032402 -7.839813
## 3 C D <NA> M 3 2.638177 2.2997237 -0.05205389 -7.976650
## BETA
## 1 0.8358773
## 2 0.4895358
## 3 0.8022900
We introduce a simple example with 2 raters (rater1
and rater2
) rating a set of firms. The data set data_cr
has a wide format and hence, we apply the multiple measurement object MMO2
on the left-hand side of the formula object. The firm-level covariates LR
, LEV
, PR
, RSIZE
, BETA
are passed on the right-hand side of the formula. The thresholds are set equal by the argument thresholds.constraints
.
<- mvord(formula = MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE + BETA,
res_cor_probit_2raters threshold.constraints = c(1, 1),
data = data_cr)
The results are displayed by:
summary(res_cor_probit_2raters)
##
## Call: mvord(formula = MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE +
## BETA, data = data_cr, threshold.constraints = c(1, 1))
##
## Formula: MMO2(rater1, rater2) ~ 0 + LR + LEV + PR + RSIZE + BETA
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvprobit flexible 690 2 -799.84 1630.35 1699.91 4476
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## rater1 A|B 8.40533 0.39600 21.225 < 2.2e-16 ***
## rater1 B|C 9.87574 0.42661 23.149 < 2.2e-16 ***
## rater1 C|D 11.68478 0.46471 25.145 < 2.2e-16 ***
## rater1 D|E 14.00771 0.53773 26.050 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## LR 1 0.191816 0.059864 3.2042 0.001354 **
## LR 2 0.089982 0.069426 1.2961 0.194945
## LEV 1 0.456824 0.039714 11.5027 < 2.2e-16 ***
## LEV 2 0.441115 0.044323 9.9522 < 2.2e-16 ***
## PR 1 -2.626680 0.174938 -15.0149 < 2.2e-16 ***
## PR 2 -2.761407 0.203692 -13.5567 < 2.2e-16 ***
## RSIZE 1 -1.166859 0.050982 -22.8877 < 2.2e-16 ***
## RSIZE 2 -1.184520 0.050259 -23.5685 < 2.2e-16 ***
## BETA 1 1.644008 0.100520 16.3550 < 2.2e-16 ***
## BETA 2 1.741845 0.128536 13.5515 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr rater1 rater2 0.872025 0.025461 34.25 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficients can be extracted by:
coef(res_cor_probit_2raters)
## LR 1 LR 2 LEV 1 LEV 2 PR 1 PR 2
## 0.19181645 0.08998178 0.45682446 0.44111540 -2.62668045 -2.76140750
## RSIZE 1 RSIZE 2 BETA 1 BETA 2
## -1.16685941 -1.18452031 1.64400778 1.74184468
A simple example with 3 raters (rater1
, rater2
and rater3
) is presented. The regression coefficients are set equal by the argument coef.constraints
.
<- mvord(formula = MMO2(rater1, rater2, rater 3) ~ 0 + LR + LEV + PR + RSIZE + BETA,
res_cor_probit_3raters coef.constraints = c(1, 1, 1),
data = data_cr,
link = mvlogit())
The results are displayed by:
summary(res_cor_probit_3raters)
##
## Call: mvord(formula = MMO2(rater1, rater2, rater3) ~ 0 + LR + LEV +
## PR + RSIZE + BETA, data = data_cr, link = mvlogit(), coef.constraints = c(1,
## 1, 1))
##
## Formula: MMO2(rater1, rater2, rater3) ~ 0 + LR + LEV + PR + RSIZE + BETA
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvlogit flexible 690 3 -1561.4 3194.64 3357.58 4199
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## rater1 A|B 14.71274 0.84962 17.317 < 2.2e-16 ***
## rater1 B|C 17.46133 0.92699 18.837 < 2.2e-16 ***
## rater1 C|D 20.65831 1.03917 19.880 < 2.2e-16 ***
## rater1 D|E 24.68647 1.20698 20.453 < 2.2e-16 ***
## rater2 A|B 14.87021 0.85299 17.433 < 2.2e-16 ***
## rater2 B|C 17.43898 0.93096 18.732 < 2.2e-16 ***
## rater2 C|D 20.47191 1.04358 19.617 < 2.2e-16 ***
## rater2 D|E 24.80047 1.22174 20.299 < 2.2e-16 ***
## rater3 F|G 14.66330 0.86375 16.976 < 2.2e-16 ***
## rater3 G|H 17.35100 0.92572 18.743 < 2.2e-16 ***
## rater3 H|I 20.73811 1.04700 19.807 < 2.2e-16 ***
## rater3 I|J 23.25687 1.15044 20.216 < 2.2e-16 ***
## rater3 J|K 25.33851 1.22854 20.625 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## LR 1 0.333598 0.112624 2.9621 0.003056 **
## LEV 1 0.746848 0.076165 9.8057 < 2.2e-16 ***
## PR 1 -4.930273 0.356518 -13.8289 < 2.2e-16 ***
## RSIZE 1 -2.073427 0.108014 -19.1958 < 2.2e-16 ***
## BETA 1 3.021155 0.222735 13.5639 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr rater1 rater2 0.866603 0.027446 31.575 < 2.2e-16 ***
## corr rater1 rater3 0.903774 0.024724 36.554 < 2.2e-16 ***
## corr rater2 rater3 0.822652 0.045875 17.933 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The threshold parameters are presented by:
thresholds(res_cor_probit_3raters)
## $rater1
## A|B B|C C|D D|E
## 14.71274 17.46133 20.65831 24.68647
##
## $rater2
## A|B B|C C|D D|E
## 14.87021 17.43898 20.47191 24.80047
##
## $rater3
## F|G G|H H|I I|J J|K
## 14.66330 17.35100 20.73811 23.25687 25.33851
This example presents a four dimensional ordinal regression model with probit link and a general correlation error structure cor_general(~ 1)
. The fourth rater only rates the firms on a binary scale: investment grade vs. speculative grade.
<- mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE + BETA, data = data_cr) res_cor_probit_simple
The results are displayed by:
summary(res_cor_probit_simple)
##
## Call: mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR +
## LEV + PR + RSIZE + BETA, data = data_cr)
##
## Formula: MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE +
## BETA
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvprobit flexible 690 4 -2925.83 6037.38 6458.64 15701
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## rater1 A|B 8.04920 0.44306 18.167 < 2.2e-16 ***
## rater1 B|C 9.56783 0.47381 20.194 < 2.2e-16 ***
## rater1 C|D 11.34991 0.51747 21.933 < 2.2e-16 ***
## rater1 D|E 13.51638 0.60129 22.479 < 2.2e-16 ***
## rater2 A|B 8.59513 0.49834 17.247 < 2.2e-16 ***
## rater2 B|C 10.05454 0.53941 18.640 < 2.2e-16 ***
## rater2 C|D 11.85857 0.59749 19.847 < 2.2e-16 ***
## rater2 D|E 14.33336 0.70092 20.450 < 2.2e-16 ***
## rater3 F|G 8.24303 0.51775 15.921 < 2.2e-16 ***
## rater3 G|H 9.77659 0.55593 17.586 < 2.2e-16 ***
## rater3 H|I 11.70601 0.62343 18.777 < 2.2e-16 ***
## rater3 I|J 13.09380 0.68810 19.029 < 2.2e-16 ***
## rater3 J|K 14.17279 0.72153 19.643 < 2.2e-16 ***
## rater4 L|M 13.52691 1.00085 13.515 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## LR 1 0.208652 0.068020 3.0675 0.002159 **
## LR 2 0.153212 0.073420 2.0868 0.036906 *
## LR 3 0.180531 0.078643 2.2956 0.021700 *
## LR 4 0.150887 0.111750 1.3502 0.176944
## LEV 1 0.430213 0.043774 9.8281 < 2.2e-16 ***
## LEV 2 0.432842 0.050129 8.6346 < 2.2e-16 ***
## LEV 3 0.399459 0.050826 7.8594 3.861e-15 ***
## LEV 4 0.624967 0.074146 8.4289 < 2.2e-16 ***
## PR 1 -2.573616 0.194046 -13.2629 < 2.2e-16 ***
## PR 2 -2.827056 0.217061 -13.0242 < 2.2e-16 ***
## PR 3 -2.678170 0.222705 -12.0257 < 2.2e-16 ***
## PR 4 -2.793063 0.279732 -9.9848 < 2.2e-16 ***
## RSIZE 1 -1.129967 0.056379 -20.0424 < 2.2e-16 ***
## RSIZE 2 -1.196432 0.061776 -19.3671 < 2.2e-16 ***
## RSIZE 3 -1.196481 0.066503 -17.9914 < 2.2e-16 ***
## RSIZE 4 -1.565903 0.115749 -13.5285 < 2.2e-16 ***
## BETA 1 1.602187 0.110868 14.4513 < 2.2e-16 ***
## BETA 2 1.801526 0.140123 12.8567 < 2.2e-16 ***
## BETA 3 1.517633 0.139319 10.8932 < 2.2e-16 ***
## BETA 4 1.988644 0.203338 9.7800 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr rater1 rater2 0.875996 0.024554 35.677 < 2.2e-16 ***
## corr rater1 rater3 0.913588 0.023462 38.940 < 2.2e-16 ***
## corr rater1 rater4 0.896170 0.033495 26.755 < 2.2e-16 ***
## corr rater2 rater3 0.831679 0.042740 19.459 < 2.2e-16 ***
## corr rater2 rater4 0.926139 0.031808 29.117 < 2.2e-16 ***
## corr rater3 rater4 0.866521 0.051285 16.896 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The threshold coefficients can be extracted by:
thresholds(res_cor_probit_simple)
## $rater1
## A|B B|C C|D D|E
## 8.049203 9.567831 11.349908 13.516375
##
## $rater2
## A|B B|C C|D D|E
## 8.595126 10.054536 11.858567 14.333362
##
## $rater3
## F|G G|H H|I I|J J|K
## 8.243027 9.776591 11.706012 13.093799 14.172787
##
## $rater4
## L|M
## 13.52691
The regression coefficients are obtained by:
coef(res_cor_probit_simple)
## LR 1 LR 2 LR 3 LR 4 LEV 1 LEV 2 LEV 3
## 0.2086517 0.1532119 0.1805309 0.1508872 0.4302126 0.4328417 0.3994591
## LEV 4 PR 1 PR 2 PR 3 PR 4 RSIZE 1 RSIZE 2
## 0.6249669 -2.5736158 -2.8270564 -2.6781697 -2.7930627 -1.1299673 -1.1964316
## RSIZE 3 RSIZE 4 BETA 1 BETA 2 BETA 3 BETA 4
## -1.1964814 -1.5659034 1.6021871 1.8015260 1.5176327 1.9886441
The error structure for firm with firm_id = 11
is displayed by:
error_structure(res_cor_probit_simple)[[11]]
## rater1 rater2 rater3 rater4
## rater1 1.0000000 0.8759965 0.9135876 0.8961703
## rater2 0.8759965 1.0000000 0.8316790 0.9261393
## rater3 0.9135876 0.8316790 1.0000000 0.8665213
## rater4 0.8961703 0.9261393 0.8665213 1.0000000
In this example, we extend the setting of the above example by imposing constraints on the regression as well as on the threshold parameters and changing the link function to the multivariate logit link.
<- mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~
res_cor_logit 0 + LR + LEV + PR + RSIZE + BETA, data = data_cr, link = mvlogit(),
coef.constraints = cbind(LR = c(1, 1, 1, 1),
LEV = c(1, 2, 3, 4),
PR = c(1, 1, 1, 1),
RSIZE = c(1, 1, 1, 2),
BETA = c(1, 1, 2, 3)),
threshold.constraints = c(1, 1, 2, 3))
The results are displayed by:
summary(res_cor_logit)
##
## Call: mvord(formula = MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR +
## LEV + PR + RSIZE + BETA, data = data_cr, link = mvlogit(),
## coef.constraints = cbind(c(1, 1, 1, 1), c(1, 2, 3, 4), c(1,
## 1, 1, 1), c(1, 1, 1, 2), c(1, 1, 2, 3)), threshold.constraints = c(1,
## 1, 2, 3))
##
## Formula: MMO2(rater1, rater2, rater3, rater4) ~ 0 + LR + LEV + PR + RSIZE +
## BETA
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvlogit flexible 690 4 -2926.42 5987.81 6293.98 9338
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## rater1 A|B 15.04918 0.82409 18.262 < 2.2e-16 ***
## rater1 B|C 17.75218 0.89728 19.785 < 2.2e-16 ***
## rater1 C|D 20.97821 1.00773 20.817 < 2.2e-16 ***
## rater1 D|E 25.13047 1.17487 21.390 < 2.2e-16 ***
## rater3 F|G 14.47061 0.83922 17.243 < 2.2e-16 ***
## rater3 G|H 17.17327 0.89515 19.185 < 2.2e-16 ***
## rater3 H|I 20.56634 1.01119 20.339 < 2.2e-16 ***
## rater3 I|J 23.00523 1.11045 20.717 < 2.2e-16 ***
## rater3 J|K 24.97258 1.18725 21.034 < 2.2e-16 ***
## rater4 L|M 23.92770 1.63196 14.662 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## LR 1 0.340210 0.110547 3.0775 0.002087 **
## LEV 1 0.784294 0.075977 10.3228 < 2.2e-16 ***
## LEV 2 0.779694 0.078364 9.9496 < 2.2e-16 ***
## LEV 3 0.718329 0.093425 7.6888 1.485e-14 ***
## LEV 4 1.107836 0.123681 8.9572 < 2.2e-16 ***
## PR 1 -4.917963 0.343464 -14.3187 < 2.2e-16 ***
## RSIZE 1 -2.093378 0.103690 -20.1889 < 2.2e-16 ***
## RSIZE 2 -2.746163 0.188731 -14.5507 < 2.2e-16 ***
## BETA 1 3.135692 0.221944 14.1283 < 2.2e-16 ***
## BETA 2 2.733088 0.252960 10.8044 < 2.2e-16 ***
## BETA 3 3.572691 0.349493 10.2225 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr rater1 rater2 0.859776 0.027907 30.809 < 2.2e-16 ***
## corr rater1 rater3 0.908834 0.024636 36.891 < 2.2e-16 ***
## corr rater1 rater4 0.903956 0.031858 28.374 < 2.2e-16 ***
## corr rater2 rater3 0.834904 0.044259 18.864 < 2.2e-16 ***
## corr rater2 rater4 0.932242 0.032173 28.976 < 2.2e-16 ***
## corr rater3 rater4 0.856234 0.058392 14.664 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The constraints are displayed by:
constraints(res_cor_logit)
## $LR
## LR 1
## A|B 1
## B|C 1
## C|D 1
## D|E 1
## A|B 1
## B|C 1
## C|D 1
## D|E 1
## F|G 1
## G|H 1
## H|I 1
## I|J 1
## J|K 1
## L|M 1
##
## $LEV
## LEV 1 LEV 2 LEV 3 LEV 4
## A|B 1 0 0 0
## B|C 1 0 0 0
## C|D 1 0 0 0
## D|E 1 0 0 0
## A|B 0 1 0 0
## B|C 0 1 0 0
## C|D 0 1 0 0
## D|E 0 1 0 0
## F|G 0 0 1 0
## G|H 0 0 1 0
## H|I 0 0 1 0
## I|J 0 0 1 0
## J|K 0 0 1 0
## L|M 0 0 0 1
##
## $PR
## PR 1
## A|B 1
## B|C 1
## C|D 1
## D|E 1
## A|B 1
## B|C 1
## C|D 1
## D|E 1
## F|G 1
## G|H 1
## H|I 1
## I|J 1
## J|K 1
## L|M 1
##
## $RSIZE
## RSIZE 1 RSIZE 2
## A|B 1 0
## B|C 1 0
## C|D 1 0
## D|E 1 0
## A|B 1 0
## B|C 1 0
## C|D 1 0
## D|E 1 0
## F|G 1 0
## G|H 1 0
## H|I 1 0
## I|J 1 0
## J|K 1 0
## L|M 0 1
##
## $BETA
## BETA 1 BETA 2 BETA 3
## A|B 1 0 0
## B|C 1 0 0
## C|D 1 0 0
## D|E 1 0 0
## A|B 1 0 0
## B|C 1 0 0
## C|D 1 0 0
## D|E 1 0 0
## F|G 0 1 0
## G|H 0 1 0
## H|I 0 1 0
## I|J 0 1 0
## J|K 0 1 0
## L|M 0 0 1
Note that the composite likelihood information criteria can be used for model comparison. For objects of class mvord
the composite likelihood AIC and BIC are computed by AIC()
and BIC()
, respectively. The model fits of examples one and two are compared by means of BIC and AIC. We observe that the model of Example 3b has a lower BIC and AIC indicating a better model fit:
BIC(res_cor_probit_simple, res_cor_logit)
## df BIC
## res_cor_probit_simple 92.85637 6458.641
## res_cor_logit 67.48687 6293.977
AIC(res_cor_probit_simple)
## [1] 6037.38
AIC(res_cor_logit)
## [1] 5987.81
The value of the pairwise log-likelihood of the two models can be extracted by:
logLik(res_cor_probit_simple)
## 'log Lik.' -2925.834 (df=92.85637)
logLik(res_cor_logit)
## 'log Lik.' -2926.418 (df=67.48687)
Marginal predictions are obtained by:
<- marginal_predict(res_cor_logit, type = "class") mp
table(data_cr$rater1, mp$rater1)
##
## A B C D E
## A 47 23 4 0 0
## B 11 65 44 3 0
## C 0 28 135 38 0
## D 1 2 44 116 17
## E 0 0 0 30 47
table(data_cr$rater2, mp$rater2)
##
## A B C D E
## A 29 20 8 0 0
## B 12 41 22 2 0
## C 1 18 103 24 0
## D 0 0 35 109 7
## E 0 0 0 13 39
table(data_cr$rater3, mp$rater3)
##
## F G H I J K
## F 25 8 2 0 0 0
## G 4 33 23 1 0 0
## H 1 18 85 13 0 0
## I 0 0 28 30 3 2
## J 0 0 2 13 10 7
## K 0 0 0 6 8 23
table(data_cr$rater4, mp$rater4)
##
## L M
## L 195 40
## M 30 425
Joint predictions are obtained by:
<- joint_probabilities(res_cor_logit,
jp response.cat = data_cr[,1:4],
type = "prob")
table(data_cr$rater1, jp$rater1)
##
## A B C D E
## A 50 20 4 0 0
## B 11 68 41 3 0
## C 0 36 130 35 0
## D 1 2 47 112 18
## E 0 0 0 25 52
table(data_cr$rater2, jp$rater2)
##
## A B C D E
## A 33 17 7 0 0
## B 12 46 17 2 0
## C 1 24 99 22 0
## D 0 0 39 103 9
## E 0 0 0 10 42
table(data_cr$rater3, jp$rater3)
##
## F G H I J K
## F 27 7 1 0 0 0
## G 6 33 21 1 0 0
## H 1 24 77 15 0 0
## I 0 0 25 35 1 2
## J 0 0 2 17 3 10
## K 0 0 0 9 1 27
table(data_cr$rater4, jp$rater4)
##
## L M
## L 175 60
## M 22 433
In this example, we present a longitudinal multivariate ordinal probit regression model with a covariate dependent AR(1) error structure using the data set data_cr_panel:
data("data_cr_panel", package = "mvord")
str(data_cr_panel, vec.len = 3)
## 'data.frame': 11320 obs. of 9 variables:
## $ rating : Ord.factor w/ 5 levels "A"<"B"<"C"<"D"<..: 5 3 3 3 3 1 3 3 ...
## $ firm_id: int 1 2 3 4 5 6 7 8 ...
## $ year : Factor w/ 8 levels "year1","year2",..: 1 1 1 1 1 1 1 1 ...
## $ LR : num 572.86 1.38 7.46 10.9 ...
## $ LEV : num 1.2008 0.0302 0.1517 0.5485 ...
## $ PR : num 0.1459 -0.0396 0.0508 0.1889 ...
## $ RSIZE : num 1.423 -1.944 2.024 -0.433 ...
## $ BETA : num 1.148 1.693 0.761 2.24 ...
## $ BSEC : Factor w/ 8 levels "BSEC1","BSEC2",..: 3 6 3 7 6 7 7 7 ...
head(data_cr_panel, n = 3)
## rating firm_id year LR LEV PR RSIZE BETA
## 1 E 1 year1 572.864658 1.20084294 0.14585117 1.422948 1.1481020
## 2 C 2 year1 1.379547 0.03022761 -0.03962597 -1.944265 1.6926956
## 3 C 3 year1 7.462706 0.15170420 0.05083517 2.024098 0.7610057
## BSEC
## 1 BSEC3
## 2 BSEC6
## 3 BSEC3
<- mvord(formula = MMO(rating, firm_id, year) ~ LR + LEV +
res_AR1_probit + RSIZE + BETA,
PR error.structure = cor_ar1(~ BSEC), link = mvprobit(),
data = data_cr_panel,
coef.constraints = c(rep(1, 4), rep(2, 4)),
threshold.constraints = rep(1, 8),
threshold.values = rep(list(c(0, NA, NA, NA)),8),
control = mvord.control(solver = "BFGS"))
The results of the model can be presented by
summary(res_AR1_probit, short = TRUE, call = FALSE)
## Formula: MMO(rating, firm_id, year) ~ LR + LEV + PR + RSIZE + BETA
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvprobit fix1first 1415 8 -74805.56 150098.19 151377.93 181
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## year1 A|B 0.000000 0.000000 NA NA
## year1 B|C 1.016137 0.026379 38.521 < 2.2e-16 ***
## year1 C|D 2.444887 0.041170 59.385 < 2.2e-16 ***
## year1 D|E 3.891073 0.058843 66.127 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1 1.4190682 0.0532243 26.6620 < 2.2e-16 ***
## (Intercept) 2 1.4975500 0.0494038 30.3124 < 2.2e-16 ***
## LR 1 0.0193546 0.0006938 27.8966 < 2.2e-16 ***
## LR 2 0.0323230 0.0010102 31.9955 < 2.2e-16 ***
## LEV 1 0.0268125 0.0019263 13.9191 < 2.2e-16 ***
## LEV 2 0.0180050 0.0013143 13.6993 < 2.2e-16 ***
## PR 1 -1.0930080 0.0427073 -25.5930 < 2.2e-16 ***
## PR 2 -0.7411681 0.0372295 -19.9081 < 2.2e-16 ***
## RSIZE 1 -0.3676665 0.0114038 -32.2407 < 2.2e-16 ***
## RSIZE 2 -0.3608979 0.0115297 -31.3016 < 2.2e-16 ***
## BETA 1 0.0541624 0.0296028 1.8296 0.0673 .
## BETA 2 0.1099998 0.0236754 4.6462 3.382e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.490395 0.051540 28.9173 < 2.2e-16 ***
## BSECBSEC2 -0.552317 0.069241 -7.9767 1.503e-15 ***
## BSECBSEC3 -0.062028 0.066553 -0.9320 0.3513
## BSECBSEC4 -0.085592 0.065126 -1.3142 0.1888
## BSECBSEC5 -0.066598 0.085692 -0.7772 0.4371
## BSECBSEC6 -0.683429 0.069184 -9.8785 < 2.2e-16 ***
## BSECBSEC7 -0.863911 0.064855 -13.3206 < 2.2e-16 ***
## BSECBSEC8 -0.757997 0.078506 -9.6553 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The default error structure output for AR(1) models:
error_structure(res_AR1_probit)
## V16 V17 V18 V19 V20 V21
## 1.49039489 -0.55231725 -0.06202790 -0.08559155 -0.06659849 -0.68342920
## V22 V23
## -0.86391149 -0.75799729
Correlation parameters \(\rho_i\) for each firm are obtained by choosing type = "corr"
in error_structure()
:
head(error_structure(res_AR1_probit, type = "corr"), n = 3)
## Correlation
## 1 0.8913315
## 2 0.6679130
## 3 0.8913315
Correlation matrices for each specific firm are obtained by choosing type = "sigmas"
:
head(error_structure(res_AR1_probit, type = "sigmas"), n = 1)
## $`1`
## year1 year2 year3 year4 year5 year6 year7
## year1 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854 0.5625954 0.5014590
## year2 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854 0.5625954
## year3 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377 0.6311854
## year4 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718 0.7081377
## year5 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315 0.7944718
## year6 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000 0.8913315
## year7 0.5014590 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315 1.0000000
## year8 0.4469662 0.5014590 0.5625954 0.6311854 0.7081377 0.7944718 0.8913315
## year8
## year1 0.4469662
## year2 0.5014590
## year3 0.5625954
## year4 0.6311854
## year5 0.7081377
## year6 0.7944718
## year7 0.8913315
## year8 1.0000000
In this example we analyze longitudinal data on self-reported health status measured on a five point ordinal scale (5 = poor, 4 = fair, 3 = good, 2 = very good and 1 = excellent) derived from the Health and Retirement Study conducted by the University of Michigan which is available in the LMest package:
data(data_SRHS_long, package = "LMest")
str(data_SRHS_long)
## 'data.frame': 56592 obs. of 6 variables:
## $ id : num 1 1 1 1 1 1 1 1 2 2 ...
## $ gender : num 1 1 1 1 1 1 1 1 2 2 ...
## $ race : num 1 1 1 1 1 1 1 1 1 1 ...
## $ education: num 3 3 3 3 3 3 3 3 5 5 ...
## $ age : num 56 58 60 62 64 66 68 70 54 55 ...
## $ srhs : num 4 4 3 3 4 3 3 3 3 3 ...
The dataset contains a self-reported health status srhs
together with covariates such as ethnicity (race
coded as 1 = white, 2 = black, 3 = others), gender
(coded as 1 = male, 2 = female), education
level (coded as 1 = high school, 2 = general educational diploma, 3 = high school graduate, 4 = some college, 5 = college and above) and age
for 7074 subjects on 8 different (approximately equally spaced) time occasions.
We add a time index to the data set:
$time <- rep(1:8, length(unique(data_SRHS_long$id))) data_SRHS_long
We estimate a multivariate ordinal logit model similar to the one of the models employed by Bartolucci, Pandolfi, and Pennoni (2017), where for every subject in the sample the errors follow an \(AR(1)\) process. Moreover, the threshold and regression coefficients are equal across all years. In order to reduce the computational burden we only consider pairs of observations not more than two time points appart by setting PL.lag = 2
.
<- mvord(formula = MMO(srhs, id, time) ~ 0 + factor(gender) +
res_srhs factor(race) + factor(education) + age,
data = data_SRHS_long,
threshold.constraints = rep(1, 8),
coef.constraints = rep(1, 8),
error.structure = cor_ar1(~ 1), link = mvlogit(),
PL.lag = 2)
(runtime 12 minutes).
The persistence in the reported health status is high. The correlation parameter in the cor_ar1
error structure is
unique(error_structure(res_srhs, type = "corr"))
## Correlation
## 1 0.8820163
The estimated parameters are shown by the function summary()
:
summary(res_srhs, call = FALSE)
## Formula: MMO(srhs, id, time) ~ 0 + factor(gender) + factor(race) + factor(education) +
## age
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvlogit flexible 7074 8 -239860.22 479933.65 480665.42 404
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## 1 1|2 -1.78119 0.11595 -15.3621 <2e-16 ***
## 1 2|3 -0.11563 0.11679 -0.9901 0.3221
## 1 3|4 1.40676 0.11857 11.8644 <2e-16 ***
## 1 4|5 3.03358 0.12245 24.7741 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## factor(gender)2 1 -0.0059763 0.0382602 -0.1562 0.8759
## factor(race)2 1 0.6083505 0.0560734 10.8492 < 2.2e-16 ***
## factor(race)3 1 0.5050706 0.0994654 5.0779 3.817e-07 ***
## factor(education)2 1 -0.5344365 0.0889470 -6.0085 1.873e-09 ***
## factor(education)3 1 -1.0114449 0.0516341 -19.5887 < 2.2e-16 ***
## factor(education)4 1 -1.3037171 0.0583832 -22.3304 < 2.2e-16 ***
## factor(education)5 1 -1.7340257 0.0603490 -28.7333 < 2.2e-16 ***
## age 1 0.0161467 0.0016881 9.5649 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.3847764 0.0088376 156.69 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the logit model the coefficients can be interpreted in terms of log-odds ratios. The results suggest that being non-white increases the chances of reporting a worse health status while higher people with education levels tend to report better health. Moreover, every additional year increases the odds of reporting a worse health status by 1.64% (\(exp(0.0162628)=1.016396\)).
The mvord package can also be employed to measure agreement among several rating sources either with or without additional relevant covariate information.
We use the multirater agreement data from Chapter 5 in Johnson and Albert (1999). These data consist of grades assigned to 198 essays by 5 experts, each of whom rated all essays on a 10-point scale. A score of 10 indicates an excellent essay. In addition, the average word length is also available as an essay characteristic.
<- "http://www-math.bgsu.edu/~albert/ord_book/Chapter5/essay_data/N.dat"
N <- "http://www-math.bgsu.edu/~albert/ord_book/Chapter5/essay_data/X.dat"
X <- read.delim(url(N), header = F, sep = "")
y <- read.delim(url(X), header = F, sep = "")[,2]
wl <- cbind.data.frame(y, wl)
df colnames(df)[1:5] <- paste0("Judge", 1:5)
save(df, file = "df.rda")
The traditional measure of agreement between raters in the social sciences is the polychoric correlation. The polychoric correlation can be assessed using the mvord package by estimating of a model with no covariates and probit link. The correlation parameters of the cor_general
error structure are to be interpretated as the measure of agreement.
head(df)
## Judge1 Judge2 Judge3 Judge4 Judge5 wl
## 1 8 6 9 10 5 4.760
## 2 7 5 3 5 3 4.244
## 3 2 1 1 1 1 4.087
## 4 5 3 4 5 3 4.355
## 5 7 2 7 7 7 4.306
## 6 10 6 8 10 6 4.508
The data is in the wide format (each ordinal response is in one column) so the MMO2
object will be used in the formula
object:
<- mvord(
res_essay_0 formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ -1,
data = df, threshold.constraints = rep(1, 5),
coef.constraints = rep(1, 5))
(runtime 14 seconds).
summary(res_essay_0, call = FALSE)
## Formula: MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ -1
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvprobit flexible 198 5 -8528.42 17172.01 17361.38 1443
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## Judge1 1|2 -1.2448340 0.0795063 -15.6571 < 2.2e-16 ***
## Judge1 2|3 -0.6634138 0.0676890 -9.8009 < 2.2e-16 ***
## Judge1 3|4 -0.3297115 0.0651692 -5.0593 4.208e-07 ***
## Judge1 4|5 0.0001387 0.0632803 0.0022 0.9983
## Judge1 5|6 0.3496646 0.0654689 5.3409 9.247e-08 ***
## Judge1 6|7 0.6352872 0.0668804 9.4989 < 2.2e-16 ***
## Judge1 7|8 0.9937520 0.0740965 13.4116 < 2.2e-16 ***
## Judge1 8|9 1.3344835 0.0806365 16.5494 < 2.2e-16 ***
## Judge1 9|10 1.9196999 0.1064901 18.0270 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr Judge1 Judge2 0.476080 0.061835 7.6992 1.369e-14 ***
## corr Judge1 Judge3 0.569194 0.046682 12.1930 < 2.2e-16 ***
## corr Judge1 Judge4 0.645198 0.046595 13.8468 < 2.2e-16 ***
## corr Judge1 Judge5 0.322528 0.090319 3.5710 0.0003556 ***
## corr Judge2 Judge3 0.396377 0.056243 7.0475 1.821e-12 ***
## corr Judge2 Judge4 0.634422 0.046930 13.5186 < 2.2e-16 ***
## corr Judge2 Judge5 0.621900 0.041815 14.8725 < 2.2e-16 ***
## corr Judge3 Judge4 0.391911 0.072272 5.4227 5.871e-08 ***
## corr Judge3 Judge5 0.382999 0.066020 5.8013 6.582e-09 ***
## corr Judge4 Judge5 0.693040 0.041727 16.6087 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The word length can be included as a covariate in the model:
<- mvord(
res_essay_wl formula = MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0 + wl,
data = df, threshold.constraints = rep(1, 5),
coef.constraints = rep(1, 5))
(runtime 0 minutes).
summary(res_essay_wl, call = FALSE)
## Formula: MMO2(Judge1, Judge2, Judge3, Judge4, Judge5) ~ 0 + wl
##
## link threshold nsubjects ndim logPL CLAIC CLBIC fevals
## mvprobit flexible 198 5 -8488.74 17110.73 17329.81 2161
##
## Thresholds:
## Estimate Std. Error z value Pr(>|z|)
## Judge1 1|2 1.09799 0.78039 1.4070 0.1594326
## Judge1 2|3 1.68728 0.78221 2.1571 0.0309994 *
## Judge1 3|4 2.02523 0.78287 2.5869 0.0096838 **
## Judge1 4|5 2.36059 0.78294 3.0150 0.0025694 **
## Judge1 5|6 2.71667 0.78472 3.4620 0.0005362 ***
## Judge1 6|7 3.00649 0.78550 3.8275 0.0001295 ***
## Judge1 7|8 3.36963 0.78891 4.2712 1.944e-05 ***
## Judge1 8|9 3.71636 0.79304 4.6862 2.783e-06 ***
## Judge1 9|10 4.31272 0.79820 5.4031 6.551e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## wl 1 0.54193 0.17837 3.0382 0.00238 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error Structure:
## Estimate Std. Error z value Pr(>|z|)
## corr Judge1 Judge2 0.459154 0.062913 7.2983 2.915e-13 ***
## corr Judge1 Judge3 0.558919 0.047414 11.7881 < 2.2e-16 ***
## corr Judge1 Judge4 0.631977 0.048175 13.1182 < 2.2e-16 ***
## corr Judge1 Judge5 0.295951 0.089837 3.2943 0.0009866 ***
## corr Judge2 Judge3 0.382554 0.056902 6.7230 1.780e-11 ***
## corr Judge2 Judge4 0.621086 0.048802 12.7266 < 2.2e-16 ***
## corr Judge2 Judge5 0.608996 0.043499 14.0004 < 2.2e-16 ***
## corr Judge3 Judge4 0.374731 0.073984 5.0651 4.083e-07 ***
## corr Judge3 Judge5 0.368174 0.067220 5.4772 4.322e-08 ***
## corr Judge4 Judge5 0.681338 0.044116 15.4442 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The probabilities of agreement among all five judges which implied by the model can be computed by the function joint_probabilities()
:
<- lapply(1:10, function(i)
agree_prob_list joint_probabilities(res_essay_wl, rep(i, 5)))
<- Reduce("+", agree_prob_list)
agree_prob summary(agree_prob)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.005150 0.006445 0.008724 0.009701 0.011795 0.023869
In order to assess the relationship between the average word length and the agreement probabilities, we plot the probabilities of agreement implied by the model against the word length.
plot(df$wl, agree_prob,
xlab = "word length", ylab = "probability of agreement")
The graphic suggests that the judges tend to agree more on the quality of essays with lower average word length than on the essays with larger average word length.