TestDimorph examples

Bassam Ahmed Elsayed Abulnoor, MennattAllah H. Attia, Iain R. Konigsberg, and Lyle W. Konigsberg

August 22, 2022



Table 1 Description of the main functions of the TestDimorph R package.

Function | Description | Notes
aov_ss | Calculates sex-specific one-way ANOVA from summary statistics and performs pairwise comparisons. | Uses the summary statistics.
D_index | Dissimilarity index (Chakraborty and Majumder 1982) for statistical computation and visualization of the area of non-overlap in the trait distribution between the sexes. | Provides a table and a graphical representation of the selected traits and their corresponding dissimilarity indices. Also provides confidence intervals via a bias-corrected parametric bootstrap.
extract_sum | Extracts the summary statistics needed for the other functions directly from uploaded raw data, without the need for a third-party package. | Can also run the aov_ss, multivariate, t_greene, univariate, or van_vark functions after extracting the summary statistics.
Hedges_g | Calculates Hedges’ (1981) g for effect size between the sexes for a single trait. | The confidence interval is found using a method described in Goulet-Pelletier and Cousineau (2018). Can also find the confidence interval using a bias-corrected parametric bootstrap.
MI_index | Mixture Index is the mixture intersection measure of sexual dimorphism (Ipiña and Durand 2010). | Ipiña and Durand (2010) also define a normal intersection (NI) measure, which is the overlap coefficient of two normal distributions, equivalent to Inman and Bradley’s (1989) overlap coefficient. Can produce confidence intervals using a bias-corrected parametric bootstrap.
multivariate | An extension of the univariate analysis of sexual dimorphism between different samples. A MANOVA test is used to analyze the interaction effects and main effects. | Type of MANOVA test employed can be “I,” “II,” or “III” sum of squares and cross products. The test statistics can be Wilks’ lambda, Pillai’s trace, Hotelling-Lawley’s trace, or Roy’s largest root. If the univariate argument is TRUE, the function conducts ANOVAs on each variable.
raw_gen | Raw data generation from summary statistics using univariate or multivariate normal distributions (with truncation as an option). |
t_greene | Relethford and Hodges’ (1985) and Greene’s (1989) t-test of sexual dimorphism. | A plot of p-values for differences in sexual dimorphism across all pairs of samples can be produced with plot=TRUE.
univariate | Univariate analysis of sexual dimorphism using two-way ANOVA. | Type of sums of squares can be “I,” “II,” or “III.”
van_vark | Provides testing for differences in sexual dimorphism between samples using van Vark et al.’s (1989) method. |

This is Table 2. The R script extracts summary statistics for body mass from the 1999-2000 NHANES raw data and stores them in NHANES_univariate. The “populations” are self-reported race, coded as “Black” = Non-Hispanic Black, “Mex.Am” = Mexican American, or “White” = Non-Hispanic White, and the data are restricted to ages 20-40 years (inclusive). The output is an ANOVA table (type II sums of squares by default) with eta-squared values, followed by a table of pairwise comparisons. For more information on the NHANES studies see https://www.cdc.gov/nchs/nhanes/index.htm

Table.02=function () 
{
library(TestDimorph)
options(width=100) # Widens the console so the R Markdown output tables do not wrap
# The global assignment (<<-) keeps NHANES_univariate available to Table.04()
NHANES_univariate<<-extract_sum(NHANES_1999,test='uni',run=FALSE) # BMXWT (Body mass)
univariate(NHANES_univariate,es_anova = "eta2",pairwise = TRUE)
}

Table.02()
The parameter used is BMXWT
$univariate
       term   df    sumsq  meansq statistic p.value signif   eta2 lower.eta2 upper.eta2
1       Sex    1  25378.2 25378.2   63.8746  0.0000    *** 0.0429     0.0247     0.0652
2       Pop    2  20970.0 10485.0   26.3898  0.0000    *** 0.0357     0.0186     0.0558
3   Sex*Pop    2   4141.8  2070.9    5.2123  0.0056     ** 0.0073     0.0007     0.0177
4 Residuals 1424 565773.3   397.3        NA      NA   <NA>     NA         NA         NA

$pairwise
   populations   df mean.diff conf.low conf.high statistic p.value signif
1 Black-Mex.Am  764   -5.9980 -11.8657   -0.1304   -2.0067  0.0451      *
2  Black-White  965   -8.9769 -14.8397   -3.1142   -3.0048  0.0027     **
3 Mex.Am-White 1119   -2.9789  -7.4263    1.4685   -1.3142  0.1890     ns
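
The statistic and eta2 columns of the $univariate table can be checked by hand from the printed sums of squares. The sketch below uses base R only and assumes that es_anova = "eta2" reports partial eta-squared, SS_effect / (SS_effect + SS_residual). That definition is an assumption: it is consistent with the printed values but is not stated in the output. The object names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the $univariate table
ss <- c("Sex" = 25378.2, "Pop" = 20970.0, "Sex*Pop" = 4141.8)
df_effect <- c("Sex" = 1, "Pop" = 2, "Sex*Pop" = 2)
ss_resid <- 565773.3
df_resid <- 1424

# F statistic: effect mean square divided by residual mean square
f_stat <- (ss / df_effect) / (ss_resid / df_resid)

# Partial eta-squared (assumed definition, see the note above)
eta2_partial <- ss / (ss + ss_resid)

print(round(f_stat, 2))
print(round(eta2_partial, 4))
```

Because the inputs are the rounded values from the table, the results agree with the printed columns only up to rounding.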

This is Table 3. The R script extracts summary statistics for body mass, standing height, and upper arm length from the 1999-2000 NHANES raw data and stores them in NHANES_multivariate. The “populations” are self-reported race, coded as “Black” = Non-Hispanic Black, “Mex.Am” = Mexican American, or “White” = Non-Hispanic White, and the data are restricted to ages 20-40 years (inclusive). The output is a MANOVA table (type II sums of squares and cross products by default) using Wilks’ lambda (the default).


Table.03=function()
{
library(TestDimorph)
NHANES_multivariate<<-extract_sum(NHANES_1999,test='multi',run=FALSE)
multivariate(NHANES_multivariate)
}

Table.03()
The parameters used are BMXWT,BMXHT,BMXARML
        term df  Wilks approx.f num.df den.df p.value signif
1     Sex(E)  1 0.5223 433.5580      3   1422  0.0000    ***
2     Pop(E)  2 0.7637  68.4009      6   2844  0.0000    ***
3 Sex*Pop(E)  2 0.9851   3.5834      6   2844  0.0015     **
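
The approx.f, num.df, and den.df columns are consistent with Rao's F approximation to Wilks' lambda. The following base R sketch applies that approximation to the printed lambda values. Whether multivariate() computes its F statistic in exactly this way is an assumption, and the helper rao_f is hypothetical, not part of TestDimorph.

```r
# Rao's F approximation for Wilks' lambda (hypothetical helper, not from TestDimorph)
# p = number of response variables, q = hypothesis df, v = residual df
rao_f <- function(wilks, p, q, v) {
  s <- if (p^2 + q^2 - 5 > 0) sqrt((p^2 * q^2 - 4) / (p^2 + q^2 - 5)) else 1
  m <- v - (p - q + 1) / 2
  num_df <- p * q
  den_df <- m * s - p * q / 2 + 1
  root <- wilks^(1 / s)
  c(approx.f = (1 - root) / root * den_df / num_df,
    num.df = num_df, den.df = den_df)
}

# Wilks' lambda values copied from the table above; v = 1424 is the residual df
# implied by the printed den.df values (assumed equal to the univariate residual df)
rao_tab <- rbind("Sex"     = rao_f(0.5223, p = 3, q = 1, v = 1424),
                 "Pop"     = rao_f(0.7637, p = 3, q = 2, v = 1424),
                 "Sex*Pop" = rao_f(0.9851, p = 3, q = 2, v = 1424))
print(round(rao_tab, 2))
```

The degrees of freedom match the table exactly. The F values differ slightly from the printed ones because Wilks' lambda is only available here to four decimal places.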

This is Table 4. The R script uses NHANES_univariate and does an ANOVA with type III sums of squares. It then applies t_greene to do Relethford and Hodges’ (1985) and Greene’s (1989) t-test for all pairs, adjusts the p-values using the false discovery rate, and finally returns a “corrplot” view (https://cran.r-project.org/package=corrplot) of the p-values.


Table.04=function()
{
library(TestDimorph)
print(univariate(NHANES_univariate, type_anova='III'))
t_greene(NHANES_univariate,plot = TRUE,padjust ="fdr")
}

Table.04()
       term   df    sumsq  meansq statistic p.value signif
1       Sex    1  17356.4 17356.4   43.6845  0.0000    ***
2       Pop    2  18902.3  9451.2   23.7877  0.0000    ***
3   Sex*Pop    2   4141.8  2070.9    5.2123  0.0056     **
4 Residuals 1424 565773.3   397.3        NA      NA   <NA>

   populations   df mean.diff conf.low conf.high statistic p.value signif
1 Black-Mex.Am  764   -5.9980 -11.8657   -0.1304   -2.0067 0.06765     ns
2  Black-White  965   -8.9769 -14.8397   -3.1142   -3.0048 0.00810     **
3 Mex.Am-White 1119   -2.9789  -7.4263    1.4685   -1.3142 0.18900     ns
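
The adjusted p-values above can be reproduced from the unadjusted ones with base R's p.adjust(). The sketch assumes that padjust = "fdr" corresponds to the Benjamini-Hochberg procedure as implemented in stats::p.adjust(). This matches the printed values, although the internals of t_greene are not shown here.

```r
# Unadjusted pairwise p-values copied from the $pairwise table of Table 2
p_raw <- c("Black-Mex.Am" = 0.0451, "Black-White" = 0.0027, "Mex.Am-White" = 0.1890)

# "fdr" is an alias for the Benjamini-Hochberg method in stats::p.adjust()
p_fdr <- p.adjust(p_raw, method = "fdr")
print(p_fdr)
```

After adjustment only the Black-White comparison remains below 0.05, which is why the Black-Mex.Am row changes from "*" to "ns".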

This is Table 5. The R script extracts summary statistics on eight variables from four samples in W. W. Howells’ (1973, 1989, 1995, 1996) dataset. The full dataset is available at https://rdrr.io/github/geanes/bioanth/man/howell.html. The script then runs van Vark et al.’s (1989) analysis.


Table.05=function()
{
library(TestDimorph)
to_van_Vark=extract_sum(Howells,test='van',run=FALSE) # spell out FALSE; F can be reassigned
van_vark(to_van_Vark)
}

Table.05()
The parameters used are GOL,NOL,BNL,BBH,XCB,XFB,ZYB,AUB
The maximum possible value of q is (7).

  populations statistic df p.value signif
1 NORSE-EGYPT    1.2809  2  0.5271     ns
2 NORSE-TOLAI    8.8981  2  0.0117      *
3  NORSE-PERU    0.4268  2  0.8078     ns
4 EGYPT-TOLAI    5.2097  2  0.0739     ns
5  EGYPT-PERU    0.7477  2  0.6881     ns
6  TOLAI-PERU    5.4584  2  0.0653     ns

This is Table 6. The R script writes summary statistics on femur head diameters for males and females from four samples (Curate et al. 2017; Gulhan 2017; Kranioti et al. 2009; Timonov et al. 2014) to the object df. It then uses aov_ss to do ANOVA within males and within females across the four samples with each ANOVA followed by post-hoc pairwise comparisons.


Table.06=function () 
{
# Comparisons of femur head diameter in four populations
library(TestDimorph)
df <- data.frame(
  Pop = c("Turkish", "Bulgarian", "Greek", "Portuguese"),
  m = c(150.00, 82.00, 36.00, 34.00),
  f = c(150.00, 58.00, 34.00, 24.00),
  M.mu = c(49.39, 48.33, 46.99, 45.20),
  F.mu = c(42.91, 42.89, 42.44, 40.90),
  M.sdev = c(3.01, 2.53, 2.47, 2.00),
  F.sdev = c(2.90, 2.84, 2.26, 2.90)
)
print(aov_ss(x = df, CI=0.95),digits=6)
}

Table.06()
$`Male model`
         term  df    sumsq   meansq statistic p.value signif
1 Populations   3  566.214 188.7379   25.4042       0    ***
2   Residuals 298 2213.959   7.4294        NA      NA   <NA>

$`Male posthoc`
           populations mean.diff conf.low conf.high p.value signif
1      Greek-Bulgarian     -1.34  -2.7479    0.0679  0.0686     ns
2 Portuguese-Bulgarian     -3.13  -4.5664   -1.6936  0.0000    ***
3    Turkish-Bulgarian      1.06   0.0929    2.0271  0.0254      *
4     Portuguese-Greek     -1.79  -3.4741   -0.1059  0.0323      *
5        Turkish-Greek      2.40   1.0930    3.7070  0.0000    ***
6   Turkish-Portuguese      4.19   2.8524    5.5276  0.0000    ***

$`Female model`
         term  df     sumsq  meansq statistic p.value signif
1 Populations   3   88.4265 29.4755    3.7221   0.012      *
2   Residuals 262 2074.8100  7.9191        NA      NA   <NA>

$`Female posthoc`
           populations mean.diff conf.low conf.high p.value signif
1      Greek-Bulgarian     -0.45  -2.0216    1.1216  0.8807     ns
2 Portuguese-Bulgarian     -1.99  -3.7560   -0.2240  0.0202      *
3    Turkish-Bulgarian      0.02  -1.1050    1.1450  1.0000     ns
4     Portuguese-Greek     -1.54  -3.4798    0.3998  0.1716     ns
5        Turkish-Greek      0.47  -0.9120    1.8520  0.8156     ns
6   Turkish-Portuguese      2.01   0.4104    3.6096  0.0071     **
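
The male ANOVA table can be rebuilt from the summary statistics alone, which shows why raw data are not required. This is a base R sketch of the standard one-way ANOVA decomposition using the male sample sizes, means, and standard deviations from the data frame in Table.06(). It is not the source code of aov_ss, and the object names are illustrative.

```r
# Male summary statistics copied from the data frame in Table.06()
n  <- c(Turkish = 150, Bulgarian = 82, Greek = 36, Portuguese = 34)
mu <- c(49.39, 48.33, 46.99, 45.20)
s  <- c(3.01, 2.53, 2.47, 2.00)

# Between-group sum of squares: weighted squared deviations from the grand mean
grand_mean <- sum(n * mu) / sum(n)
ss_between <- sum(n * (mu - grand_mean)^2)

# Within-group sum of squares: recovered from the sample variances
ss_within <- sum((n - 1) * s^2)

df_between <- length(n) - 1
df_within  <- sum(n) - length(n)

f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

print(round(c(sumsq.between = ss_between, sumsq.within = ss_within,
              statistic = f_stat), 4))
```

The same steps applied to the f, F.mu, and F.sdev columns give the female model.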

This is Table 7. The R script uses the maximum width of the patella from Cavazzuti et al. (2019). Calculated values are the mixture index (MI) and normal intersection (NI) (Ipiña and Durand 2010), the D index (Chakraborty and Majumder 1982), and Hedges’ g (Hedges 1981). For each statistic, the bias-corrected parametric bootstrap (Efron 1981; Tibshirani 1984) is run with 1,000 replicates, starting from a set.seed value of 42.


Table.07=function (i.which=13) 
{
library(TestDimorph)
print(MI_index(Cremains_measurements[i.which,],B=1000,rand=FALSE,verbose=FALSE,plot=TRUE))
print(MI_index(Cremains_measurements[i.which,],index_type='NI',
   B=1000,rand=FALSE,plot=TRUE,verbose=FALSE))
print(D_index(Cremains_measurements[i.which,],B=1000,rand=FALSE,verbose=FALSE,plot=TRUE))
print(Hedges_g(Cremains_measurements[i.which,],B=1000,rand=FALSE,verbose=FALSE))
}

Table.07()

   Trait lower     MI upper
1 PA-MXW 0.025 0.1108 0.231

   Trait  lower     NI upper
1 PA-MXW 0.0544 0.2496 0.521

   Trait  lower      D  upper
1 PA-MXW 0.4788 0.7504 0.9454
   Trait  lower      g  upper
1 PA-MXW 1.2094 2.2429 3.7713
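
In the output above, NI and D sum to one (0.2496 + 0.7504), as expected if D is the area of non-overlap and NI the area of overlap of the two normal densities. The base R sketch below illustrates these point estimates together with Hedges' g. The patella summary statistics are not printed in this document, so the Turkish femoral head values from Table 6 serve as illustrative input. Treating D as 1 - NI and using the exact gamma-function correction for g are assumptions. The package may compute these quantities differently, and no bootstrap intervals are attempted.

```r
# Turkish femoral head diameters from Table 6 (illustrative input only)
m <- 150; f <- 150
M.mu <- 49.39; F.mu <- 42.91
M.sdev <- 3.01; F.sdev <- 2.90

# Overlap coefficient: area under the pointwise minimum of the two normal
# densities, approximated by a Riemann sum on a fine grid
x <- seq(min(M.mu, F.mu) - 8 * max(M.sdev, F.sdev),
         max(M.mu, F.mu) + 8 * max(M.sdev, F.sdev), length.out = 20001)
dx <- x[2] - x[1]
NI <- sum(pmin(dnorm(x, M.mu, M.sdev), dnorm(x, F.mu, F.sdev))) * dx

# Area of non-overlap (assumed here to be the complement of NI)
D <- 1 - NI

# Hedges' g: standardized mean difference with pooled SD, multiplied by the
# small-sample correction factor J (exact gamma-function form, assumed)
df_pooled <- m + f - 2
s_pooled <- sqrt(((m - 1) * M.sdev^2 + (f - 1) * F.sdev^2) / df_pooled)
d <- (M.mu - F.mu) / s_pooled
J <- exp(lgamma(df_pooled / 2) - lgamma((df_pooled - 1) / 2) - 0.5 * log(df_pooled / 2))
g <- J * d

print(round(c(NI = NI, D = D, g = g), 4))
```

With samples this large, J is very close to one, so g is only marginally smaller than the uncorrected standardized difference d.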


References

Cavazzuti, Claudio, Benedetta Bresadola, Chiara d’Innocenzo, Stella Interlando, and Alessandra Sperduti. 2019. “Towards a New Osteometric Method for Sexing Ancient Cremated Human Remains. Analysis of Late Bronze Age and Iron Age Samples from Italy with Gendered Grave Goods.” PloS One 14 (1): e0209423.
Chakraborty, Ranajit, and Partha P Majumder. 1982. “On Bennett’s Measure of Sex Dimorphism.” American Journal of Physical Anthropology 59 (3): 295–98.
Curate, Francisco, Cláudia Umbelino, Andreia Perinha, Carla Nogueira, Ana Maria Silva, and Eugénia Cunha. 2017. “Sex Determination from the Femur in Portuguese Populations with Classical and Machine-Learning Classifiers.” Journal of Forensic and Legal Medicine 52: 75–81.
Efron, Bradley. 1981. “Nonparametric Standard Errors and Confidence Intervals.” Canadian Journal of Statistics 9 (2): 139–58.
Goulet-Pelletier, Jean-Christophe, and Denis Cousineau. 2018. “A Review of Effect Sizes and Their Confidence Intervals, Part I: The Cohen’s d Family.” The Quantitative Methods for Psychology 14 (4): 242–65.
Greene, D. L. 1989. “Comparison of t-Tests for Differences in Sexual Dimorphism Between Populations.” American Journal of Physical Anthropology 79: 121–25.
Gulhan, Oznur. 2017. “Skeletal Sexing Standards of Human Remains in Turkey.” PhD thesis.
Hedges, Larry V. 1981. “Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators.” Journal of Educational Statistics 6 (2): 107–28.
Howells, William White. 1973. Cranial Variation in Man: A Study by Multivariate Analysis of Patterns of Difference Among Recent Human Populations. Vol. 67. Papers of the Peabody Museum of Archaeology and Ethnology, Harvard University. Cambridge, MA: Peabody Museum of Archaeology and Ethnology.
———. 1989. Skull Shapes and the Map: Craniometric Analyses in the Dispersion of Modern Homo. Vol. 79. Papers of the Peabody Museum of Archaeology and Ethnology, Harvard University. Cambridge, MA: Peabody Museum of Archaeology and Ethnology.
———. 1995. Who’s Who in Skulls: Ethnic Identification of Crania from Measurements. Vol. 82. Papers of the Peabody Museum of Archaeology and Ethnology. Cambridge, MA: Peabody Museum.
———. 1996. “Notes and Comments: Howells’ Craniometric Data on the Internet.” American Journal of Physical Anthropology 101 (3): 441–42.
Inman, Henry F, and Edwin L Bradley. 1989. “The Overlapping Coefficient as a Measure of Agreement Between Probability Distributions and Point Estimation of the Overlap of Two Normal Densities.” Communications in Statistics-Theory and Methods 18 (10): 3851–74.
Ipiña, Santiago L, and Ana I Durand. 2010. “Assessment of Sexual Dimorphism: A Critical Discussion in a (Paleo-) Anthropological Context.” Human Biology 82 (2): 199–220.
Kranioti, Elena F, Nikolaos Vorniotakis, Christianna Galiatsou, Mehmet Y İşcan, and Manolis Michalodimitrakis. 2009. “Sex Identification and Software Development Using Digital Femoral Head Radiographs.” Forensic Science International 189 (1-3): 113.e1–7.
Relethford, J. H., and D. C. Hodges. 1985. “A Statistical Test for Differences in Sexual Dimorphism Between Populations.” American Journal of Physical Anthropology 66: 55–61.
Tibshirani, Robert J. 1984. Bootstrap Confidence Intervals. Technical Report No. 3, Laboratory for Computational Statistics, Department of Statistics, Stanford University.
Timonov, Pavel, Antoaneta Fasova, Dobrinka Radoinova, Alexandar Alexandrov, and Delian Delev. 2014. “A Study of Sexual Dimorphism in the Femur Among Contemporary Bulgarian Population.” Eurasian Journal of Anthropology 5 (2): 46–53.
Vark, GN van, PGM van der Sman, J Dijkema, and JE Buikstra. 1989. “Some Multivariate Tests for Differences in Sexual Dimorphism Between Human Populations.” Annals of Human Biology 16 (4): 301–10.