Jackknife Covariance and Missing Values

Martin Ueding

Missing column

We create some data and replace one column with NA.

data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA
print(data)
##             [,1]         [,2] [,3]       [,4]        [,5]       [,6]
##  [1,]  0.8079227  0.953249395   NA  1.1274464  0.09712716 -0.4742720
##  [2,]  0.9832281 -1.260280844   NA -0.1783912 -1.03575574 -0.3311356
##  [3,]  0.7170138  0.262284696   NA -0.8507424 -0.07323595  1.2373899
##  [4,] -1.0496321  0.001392252   NA  1.4477400  0.33728515  0.1810778
##  [5,]  1.0878961  0.878943747   NA -0.2420444  0.89014789 -0.5636744
##  [6,]  1.1928337 -2.258967269   NA -1.1358111 -2.01461250 -0.1612763
##  [7,] -0.6118118  0.593095212   NA -0.1851273 -0.22177968 -0.8654250
##  [8,]  0.8141878 -1.513587594   NA -0.8900747  0.56687196 -1.0843648
##  [9,] -1.4052400  0.233289389   NA -1.2110320  1.26545184  0.7710383
## [10,] -1.0421440 -0.418332662   NA  0.4143678  1.01131831  1.6529310
## [11,] -0.7593017  0.115098120   NA  1.6526934 -0.42903052  0.8128500
## [12,]  0.8663020  0.570655355   NA  0.9834333 -0.94808047  1.4868666
##              [,7]       [,8]       [,9]      [,10]
##  [1,] -2.62666773 -1.4722316 -0.3164024 -0.1323967
##  [2,] -0.12335079  1.0972550  2.1741523  0.5336786
##  [3,] -0.44908266 -1.0502304 -0.6344078  0.7937576
##  [4,] -0.04179480 -0.5329891 -0.3001254  0.5143836
##  [5,] -1.00694068 -0.2079030  0.1998212 -0.4027840
##  [6,] -0.19144789 -0.6729825 -1.6493459  0.7047418
##  [7,]  1.69847767  1.1487469  1.3176655 -0.8453007
##  [8,]  0.09829714  0.1868569 -0.3452368 -1.7971696
##  [9,] -0.35486162  0.3770601  0.4880484  0.5611533
## [10,] -0.95844137  0.2238168  0.7127507 -0.2597389
## [11,] -0.34446407 -1.2050152 -0.2222073 -0.3870832
## [12,] -1.72659855  1.7059909 -0.3943868 -2.3102125

With the default use = 'everything', cov() propagates the missing values, so row 3 and column 3 of the covariance matrix form a “cross” of NA.

cov(data)
##              [,1]        [,2] [,3]        [,4]        [,5]        [,6]
##  [1,]  1.00481929 -0.25953847   NA -0.26191673 -0.49433916 -0.33281927
##  [2,] -0.25953847  1.03356272   NA  0.46433141  0.44723463  0.19329342
##  [3,]          NA          NA   NA          NA          NA          NA
##  [4,] -0.26191673  0.46433141   NA  1.05110996 -0.02342355  0.22988607
##  [5,] -0.49433916  0.44723463   NA -0.02342355  0.91857807  0.06178755
##  [6,] -0.33281927  0.19329342   NA  0.22988607  0.06178755  0.88860997
##  [7,] -0.33972499 -0.33570333   NA -0.39318178 -0.07746939 -0.32934492
##  [8,] -0.01822199 -0.05081532   NA -0.17677354 -0.07053865  0.02496831
##  [9,] -0.25764497  0.15639461   NA  0.01073743  0.23005399 -0.13299972
## [10,] -0.17436534 -0.17321098   NA -0.23505304 -0.03599258  0.06184745
##              [,7]        [,8]        [,9]       [,10]
##  [1,] -0.33972499 -0.01822199 -0.25764497 -0.17436534
##  [2,] -0.33570333 -0.05081532  0.15639461 -0.17321098
##  [3,]          NA          NA          NA          NA
##  [4,] -0.39318178 -0.17677354  0.01073743 -0.23505304
##  [5,] -0.07746939 -0.07053865  0.23005399 -0.03599258
##  [6,] -0.32934492  0.02496831 -0.13299972  0.06184745
##  [7,]  1.10731732  0.31389085  0.32705747  0.11035842
##  [8,]  0.31389085  1.01367751  0.56074750 -0.49520843
##  [9,]  0.32705747  0.56074750  0.98163234  0.01398087
## [10,]  0.11035842 -0.49520843  0.01398087  0.99145070
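One way to confirm that the NA entries form exactly this cross is to list their positions. The sketch below rebuilds the same kind of data with an arbitrary seed (set.seed(1) is an assumption for reproducibility, so the numbers differ from those printed above) and checks that every NA lies in row 3 or column 3.

```r
# Rebuild the missing-column example; the seed is arbitrary
set.seed(1)
data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA

# With the default use = "everything", any pair involving column 3 is NA
cv <- cov(data)
na_pos <- which(is.na(cv), arr.ind = TRUE)

# Every NA sits in row 3 or column 3
print(all(na_pos[, "row"] == 3 | na_pos[, "col"] == 3))
# 19 entries: 10 in row 3 plus 10 in column 3, minus the shared diagonal element
print(nrow(na_pos))
```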

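The jackknife covariance behaves the same way. Apart from the NA cross, each entry is the ordinary covariance scaled by (n - 1)^2/n = 121/12 ≈ 10.08 for n = 12 rows, e.g. 1.00481929 × 121/12 ≈ 10.1319.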

jackknife_cov(data)
##             [,1]       [,2] [,3]       [,4]       [,5]       [,6]       [,7]
##  [1,] 10.1319278 -2.6170129   NA -2.6409937 -4.9845865 -3.3559276 -3.4255603
##  [2,] -2.6170129 10.4217575   NA  4.6820084  4.5096159  1.9490419 -3.3850086
##  [3,]         NA         NA   NA         NA         NA         NA         NA
##  [4,] -2.6409937  4.6820084   NA 10.5986921 -0.2361874  2.3180179 -3.9645829
##  [5,] -4.9845865  4.5096159   NA -0.2361874  9.2623288  0.6230244 -0.7811497
##  [6,] -3.3559276  1.9490419   NA  2.3180179  0.6230244  8.9601505 -3.3208946
##  [7,] -3.4255603 -3.3850086   NA -3.9645829 -0.7811497 -3.3208946 11.1654496
##  [8,] -0.1837384 -0.5123878   NA -1.7824665 -0.7112647  0.2517638  3.1650661
##  [9,] -2.5979201  1.5769790   NA  0.1082691  2.3197110 -1.3410805  3.2978295
## [10,] -1.7581839 -1.7465440   NA -2.3701182 -0.3629252  0.6236285  1.1127807
##             [,8]       [,9]      [,10]
##  [1,] -0.1837384 -2.5979201 -1.7581839
##  [2,] -0.5123878  1.5769790 -1.7465440
##  [3,]         NA         NA         NA
##  [4,] -1.7824665  0.1082691 -2.3701182
##  [5,] -0.7112647  2.3197110 -0.3629252
##  [6,]  0.2517638 -1.3410805  0.6236285
##  [7,]  3.1650661  3.2978295  1.1127807
##  [8,] 10.2212483  5.6542040 -4.9933517
##  [9,]  5.6542040  9.8981261  0.1409737
## [10,] -4.9933517  0.1409737  9.9971279
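The (n - 1)^2/n factor is what one gets if each row of the input is treated as one jackknife sample: the jackknife covariance is (n - 1)/n times the sum of outer products of the deviations, and that sum is (n - 1) times cov(). The function below is a hypothetical re-implementation for illustration only, not the actual jackknife_cov(); the row-as-jackknife-sample interpretation is an assumption that happens to match the numbers printed above. It also reproduces the NA cross, because the NA in column 3 propagates through colMeans() and crossprod().

```r
# Hypothetical sketch, not the real jackknife_cov().
# Assumption: each row of x is one jackknife sample, each column one observable.
jackknife_cov_sketch <- function(x) {
  n <- nrow(x)
  # Deviations of each jackknife sample from the mean over all samples
  centered <- sweep(x, 2, colMeans(x))
  # Jackknife covariance: (n - 1) / n times the sum of outer products
  (n - 1) / n * crossprod(centered)
}

# Arbitrary seed; the values differ from the printed example
set.seed(1)
data <- matrix(rnorm(120), ncol = 10)
data[, 3] <- NA

jk <- jackknife_cov_sketch(data)
n <- nrow(data)
# Same as rescaling the ordinary covariance by (n - 1)^2 / n, NA cross included
print(isTRUE(all.equal(jk, (n - 1)^2 / n * cov(data))))
```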

Missing row

When a row contains NA values, the jackknife has a conceptual problem: the width of the jackknife distribution is tied to the number of measurements n, so simply discarding an incomplete row also changes n.
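The dependence on n can be seen with the simplest case, the jackknife of a mean. The leave-one-out samples are much narrower than the data (their standard deviation is sd(x)/(n - 1)), and only the n-dependent rescaling factor (n - 1)/n turns their spread into the usual standard error sd(x)/sqrt(n). The helper names below are hypothetical and the data are arbitrary random numbers.

```r
# Leave-one-out (jackknife) samples of the mean of a vector x
jackknife_means <- function(x) {
  vapply(seq_along(x), function(i) mean(x[-i]), numeric(1))
}

# Jackknife error estimate; the rescaling factor (n - 1) / n depends on n
jackknife_error <- function(samples) {
  n <- length(samples)
  sqrt((n - 1) / n * sum((samples - mean(samples))^2))
}

set.seed(1)
x <- rnorm(12)
samples <- jackknife_means(x)

# The samples themselves are narrow: their sd is sd(x) / (n - 1)
print(isTRUE(all.equal(sd(samples), sd(x) / (length(x) - 1))))
# Only the n-dependent rescaling recovers the standard error sd(x) / sqrt(n)
print(isTRUE(all.equal(jackknife_error(samples), sd(x) / sqrt(length(x)))))
```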

data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA
print(data)
##             [,1]       [,2]       [,3]       [,4]        [,5]       [,6]
##  [1,] -0.7157144  0.3830505 -0.7483596  0.7578021 -0.42463748 -1.0319888
##  [2,]         NA         NA         NA         NA          NA         NA
##  [3,]  1.5840008  2.3416447 -0.6236529  0.7555545 -0.88647717  0.9775473
##  [4,] -0.5344119 -1.2091202  0.6460490  1.3712069 -1.24553237  0.3204870
##  [5,] -0.1075479 -1.0631454  0.5848066  0.2025899  0.44373591  0.5551791
##  [6,]  1.3168021 -0.2133760  0.6932702 -0.4290675  1.19539207  0.9631749
##  [7,]  0.1555438  0.3464016 -0.1989959  2.1343560  0.44835542  0.9661736
##  [8,]  0.5374163 -0.5724669 -1.7348646  1.1058122 -2.58813398  0.9200456
##  [9,] -1.4434792  1.9038218 -0.5851163  0.2887971 -1.26253861  1.0112359
## [10,]  0.5052784  0.4225661 -0.5862763  0.0177994 -2.16879414 -0.3203245
## [11,] -0.5389071 -1.9780189 -0.6286661  1.1347285  0.05426532 -0.3279949
## [12,] -0.5772848 -1.3236084  0.7361014  0.5699565 -0.54043595 -0.3360399
##              [,7]        [,8]        [,9]       [,10]
##  [1,] -0.98568735  0.83193395  0.05163127  1.26338239
##  [2,]          NA          NA          NA          NA
##  [3,] -0.43978461  0.07868216 -1.16101851 -1.09880530
##  [4,]  1.95037020 -0.66837550  0.31933346 -0.61617362
##  [5,]  0.02014322  0.27293429  0.58948715  1.26511164
##  [6,] -0.69760198  0.68743862 -0.02808972 -0.76255415
##  [7,] -0.15452058  0.14798859 -0.57284665  0.48192309
##  [8,]  0.20937894 -0.33652017  0.36277158 -0.21724651
##  [9,]  0.75228259 -0.55324366 -1.36605017  0.23668048
## [10,]  0.97152448 -0.55950405  0.02085035  0.90439651
## [11,]  0.33203712 -2.59494362  0.45120574 -0.95755261
## [12,]  0.96568011  0.88597241  0.51337796  0.06969566

Here, too, both functions propagate the NA by default, and because every column contains an NA, the entire matrix is NA:

cov(data)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
jackknife_cov(data)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## [10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
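The reason the whole matrix is NA, rather than a cross, is that the single missing row puts one NA into every column. A quick check on freshly generated data (arbitrary seed, so values differ from the output above):

```r
set.seed(1)
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA

# Each column has exactly one NA, all coming from row 2
print(colSums(is.na(data)))
# Hence with use = "everything" every pair of columns is affected
print(all(is.na(cov(data))))
# Row 2 is the only incomplete row
print(which(!complete.cases(data)))
```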

With use = 'complete' (partially matched to 'complete.obs'), cov() discards every row that contains an NA, which gives the same result as dropping those rows by hand.

cov(data, use = 'complete')
##              [,1]        [,2]         [,3]         [,4]        [,5]
##  [1,]  0.83250622  0.29721215 -0.040818273 -0.121794571  0.08951248
##  [2,]  0.29721215  1.80989413 -0.368006348 -0.136563215 -0.31446225
##  [3,] -0.04081827 -0.36800635  0.631771866 -0.138924601  0.55825904
##  [4,] -0.12179457 -0.13656322 -0.138924601  0.503761136 -0.06619509
##  [5,]  0.08951248 -0.31446225  0.558259040 -0.066195088  1.32129893
##  [6,]  0.26572290  0.37253587 -0.004927037  0.023871159  0.05484673
##  [7,] -0.32454495 -0.36480161  0.161003109  0.085960166 -0.47637573
##  [8,]  0.20371515  0.36304579  0.262045321 -0.183206838  0.24357836
##  [9,] -0.05847824 -0.83474393  0.140077916 -0.009062944  0.05487916
## [10,] -0.29156935  0.04493083 -0.005503744 -0.071729203 -0.03559882
##               [,6]        [,7]        [,8]         [,9]        [,10]
##  [1,]  0.265722902 -0.32454495  0.20371515 -0.058478236 -0.291569350
##  [2,]  0.372535872 -0.36480161  0.36304579 -0.834743927  0.044930829
##  [3,] -0.004927037  0.16100311  0.26204532  0.140077916 -0.005503744
##  [4,]  0.023871159  0.08596017 -0.18320684 -0.009062944 -0.071729203
##  [5,]  0.054846728 -0.47637573  0.24357836  0.054879159 -0.035598821
##  [6,]  0.524206505 -0.03716920  0.04319642 -0.243855291 -0.229269321
##  [7,] -0.037169205  0.72991222 -0.32109435  0.113442823 -0.087148907
##  [8,]  0.043196423 -0.32109435  0.96679949 -0.043367164  0.334500851
##  [9,] -0.243855291  0.11344282 -0.04336716  0.453328559  0.089814514
## [10,] -0.229269321 -0.08714891  0.33450085  0.089814514  0.741091738
all(cov(data, use = 'complete') == cov(data[complete.cases(data), ]))
## [1] TRUE
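Exact == comparison happens to succeed here, but for floating-point results all.equal() with its default tolerance is the more robust idiom, and spelling out "complete.obs" avoids relying on partial argument matching. A minimal variant of the check above, on arbitrary random data:

```r
set.seed(1)
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA

# Full option name instead of the partially matched 'complete'
cov_complete <- cov(data, use = "complete.obs")
# Drop incomplete rows by hand
cov_dropped <- cov(data[complete.cases(data), ])

# Tolerance-based comparison instead of exact ==
print(isTRUE(all.equal(cov_complete, cov_dropped)))
```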

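With our jackknife function and na.rm = TRUE the result is wrong, which should not happen! Each entry equals the complete-case covariance above multiplied by 121 (e.g. 0.83250622 × 121 ≈ 100.7333). If the incomplete row were simply dropped, n = 11 rows would remain and one would expect a factor of (n - 1)^2/n = 100/11 ≈ 9.09.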

jackknife_cov(data, na.rm = TRUE)
##             [,1]       [,2]        [,3]       [,4]       [,5]        [,6]
##  [1,] 100.733253   35.96267  -4.9390111 -14.737143  10.831010  32.1524711
##  [2,]  35.962670  218.99719 -44.5287681 -16.524149 -38.049932  45.0768405
##  [3,]  -4.939011  -44.52877  76.4443958 -16.809877  67.549344  -0.5961715
##  [4,] -14.737143  -16.52415 -16.8098768  60.955097  -8.009606   2.8884102
##  [5,]  10.831010  -38.04993  67.5493438  -8.009606 159.877171   6.6364541
##  [6,]  32.152471   45.07684  -0.5961715   2.888410   6.636454  63.4289871
##  [7,] -39.269939  -44.14099  19.4813762  10.401180 -57.641463  -4.4974738
##  [8,]  24.649533   43.92854  31.7074839 -22.168027  29.472981   5.2267672
##  [9,]  -7.075867 -101.00402  16.9494278  -1.096616   6.640378 -29.5064902
## [10,] -35.279891    5.43663  -0.6659531  -8.679234  -4.307457 -27.7415878
##             [,7]       [,8]        [,9]       [,10]
##  [1,] -39.269939  24.649533   -7.075867 -35.2798913
##  [2,] -44.140994  43.928541 -101.004015   5.4366303
##  [3,]  19.481376  31.707484   16.949428  -0.6659531
##  [4,]  10.401180 -22.168027   -1.096616  -8.6792335
##  [5,] -57.641463  29.472981    6.640378  -4.3074573
##  [6,]  -4.497474   5.226767  -29.506490 -27.7415878
##  [7,]  88.319379 -38.852416   13.726582 -10.5450178
##  [8,] -38.852416 116.982738   -5.247427  40.4746029
##  [9,]  13.726582  -5.247427   54.852756  10.8675561
## [10,] -10.545018  40.474603   10.867556  89.6721003
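For comparison, here is what the drop-the-row convention would give. The helper below is hypothetical and not part of any package; it assumes, as above, that rows are jackknife samples, and it assumes that discarding incomplete samples is acceptable, which is exactly the conceptual question raised for missing rows. It rescales with the number of rows that actually remain.

```r
# Hypothetical helper: drop incomplete rows (jackknife samples) and
# rescale with the number of remaining rows.
# Whether dropping samples is statistically valid is the open question.
jackknife_cov_complete <- function(x) {
  x <- x[complete.cases(x), , drop = FALSE]
  n <- nrow(x)
  (n - 1)^2 / n * cov(x)
}

set.seed(1)
data <- matrix(rnorm(120), ncol = 10)
data[2, ] <- NA

jk <- jackknife_cov_complete(data)
# With 11 complete rows the factor is 10^2 / 11 (about 9.09), not 121
print(isTRUE(all.equal(jk, 100 / 11 * cov(data, use = "complete.obs"))))
```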