Get started

The goal of this package is to easily apply the same t-tests/basic data description across several sub-groups, with the output provided as a nice arranged data.frame. Multiple comparison and \(p\)-value significance symbols are also provided.

This kind of analysis is commonly seen in ROI (Region-of-interest) analysis of brain imaging data (hence the name of the package, roistats).

Get some basic description about the data

The package comes with a data.frame called color_index, which is similar in format to what we might get after cleaning and wrangling ROI data. This data.frame contains the neural analysis result of the degree of color memory sensitivity at each brain region of each subject. The color_index data frame has three columns:

subj_id: identifies the subjects in the study.
roi_id: identifies the brain sub-region of interest for the analysis. We are interested in eight brain regions. Note that the combination of subj_id and roi_id uniquely identifies rows in the dataset.
color_index: indicates how sensitive a certain brain region is to the memory of color for each subject/brain region combination.

head(color_index)
#> # A tibble: 6 x 3
#>   subj_id roi_id  color_index
#>   <chr>   <chr>         <dbl>
#> 1 01      AnG        -0.0324 
#> 2 01      dLatIPS    -0.0425 
#> 3 01      LO         -0.0326 
#> 4 01      pIPS       -0.0148 
#> 5 01      V1         -0.00126
#> 6 01      vIPS       -0.0238

Before we dive into the statistical test, we likely want to get descriptive statistics for color_index, including the mean, standard deviation, and standard error of the mean for each brain region. The df_sem function help us with this, and is designed to be used in combination with dplyr, specifically dplyr::group_by(). To obtain our descriptive statistics by roi_id, we just first group the data frame by roi_id, then pass the data frame to def_sem().

library(dplyr)

color_index %>%
  group_by(roi_id) %>% # The column to get summaries by
  df_sem(color_index) # The column to summarize
#> # A tibble: 8 x 5
#>   roi_id  mean_color_index     sd     n      se
#>   <chr>              <dbl>  <dbl> <int>   <dbl>
#> 1 AnG              0.00537 0.0507    29 0.00942
#> 2 LO               0.0181  0.0428    29 0.00796
#> 3 V1               0.00955 0.0421    29 0.00782
#> 4 VTC              0.00468 0.0218    29 0.00405
#> 5 dLatIPS          0.0159  0.0510    29 0.00946
#> 6 pIPS             0.0102  0.0297    29 0.00552
#> 7 vIPS             0.0162  0.0327    29 0.00607
#> 8 vLatIPS          0.0162  0.0514    29 0.00955

Yay! We have obtained the SEM (which is commonly used for error bar plotting in psych and cog neuro area) for each sub-group easily.

Note that if we do not use dplyr::group_by() we just get the overall summaries. The package reports a warning when this happens because it generally goes against the intent of the package (computing multiple comparisons).

df_sem(data = color_index, x = color_index)
#> Warning: The `df_sem()` function expects a grouped data frame (i.e., from
#> `dplyr::group_by()`). Returning the overall mean, sd, n and se.
#> # A tibble: 1 x 4
#>   mean_color_index     sd     n      se
#>              <dbl>  <dbl> <int>   <dbl>
#> 1           0.0120 0.0413   232 0.00271

One-sample t-tests for all sub-groups

Now suppose we want to test whether color_index is significantly different (i.e., significantly different than zero) for each possible sub-group (roi_id). Note that we are not computing pairwise comparisons yet, just whether the mean for each subgroup is different from zero. Here, we have eight sub-groups, which means we will get eight one-sample t-test results in total. As a first step in the analysis we probably don’t care much about all the detailed output from stats::t.test. Instead, we’re just looking for mean difference and significance. This is what the t_test_one_sample function was designed to accommodate. We again pass the function a grouped data frame, and we get \(t\)-test results back for each group.

By default, a Bonferroni \(p\)-value adjustment is applied, but any adjustment available through stats::p.adjust() can be supplied. Similarly, each mean is compared to zero by default, but this can be adjusted through the optional mu argument. The interface for the function works essentially equivalanetly to df_sem().

color_index %>% 
  group_by(roi_id) %>% 
  t_test_one_sample(color_index)
#> # A tibble: 8 x 5
#> # Groups:   roi_id [8]
#>   roi_id  tvalue    df      p p_bonferroni
#>   <chr>    <dbl> <dbl>  <dbl>        <dbl>
#> 1 AnG      0.570    28 0.573        1     
#> 2 dLatIPS  1.68     28 0.104        0.835 
#> 3 LO       2.27     28 0.0311       0.249 
#> 4 pIPS     1.85     28 0.0752       0.601 
#> 5 V1       1.22     28 0.232        1     
#> 6 vIPS     2.67     28 0.0124       0.0991
#> 7 vLatIPS  1.69     28 0.101        0.811 
#> 8 VTC      1.16     28 0.257        1

Here, we see the t-values, degrees of freedom, as well as the uncorrected and bonferroni corrected \(p\)-values! Nice! Note that the multiple comparison corrected \(p\)-values are provided by the p_bonferroni column, but the name of this column will change depending on the method you want to use. Let’s try again, but this time using both the Bonferroni and the Benjamini and Hochberg (1995) method. We’ll put this in a nice table as well.

color_index_one_sample_t_res <- color_index %>%
  group_by(roi_id) %>% 
  t_test_one_sample(color_index, p_adjust = c("bonferroni", "BH"))

knitr::kable(color_index_one_sample_t_res, digits = 3)

roi_id	tvalue	df	p	p_bonferroni	p_BH
AnG	0.570	28	0.573	1.000	0.573
dLatIPS	1.678	28	0.104	0.835	0.167
LO	2.270	28	0.031	0.249	0.124
pIPS	1.848	28	0.075	0.601	0.167
V1	1.221	28	0.232	1.000	0.294
vIPS	2.673	28	0.012	0.099	0.099
vLatIPS	1.694	28	0.101	0.811	0.167
VTC	1.156	28	0.257	1.000	0.294

As before, if we our data frame is not grouped, we’ll get the stats returned, but with a warning.

color_index %>% 
  t_test_one_sample(color_index)
#> Warning: The `t_test_one_sample()` function expects a grouped data frame (i.e.,
#> from `dplyr::group_by()`). Returning statistics for the overall column.
#> Warning: `...` must not be empty for ungrouped data frames.
#> Did you want `data = everything()`?
#> # A tibble: 1 x 4
#>   tvalue    df         p p_bonferroni
#>    <dbl> <dbl>     <dbl>        <dbl>
#> 1   4.44   231 0.0000142    0.0000142

Significance symbols for an even clearer table and possible visulization

Usually, we want the significance symbol to highlight the result in a table or plot. Here we have the p_range function to create the significance symbol:

color_index_one_sample_t_with_sig <- color_index_one_sample_t_res %>% 
  mutate(sig_origin_p = p_range(p))

knitr::kable(color_index_one_sample_t_with_sig, digits = 3)

roi_id	tvalue	df	p	p_bonferroni	p_BH	sig_origin_p
AnG	0.570	28	0.573	1.000	0.573
dLatIPS	1.678	28	0.104	0.835	0.167
LO	2.270	28	0.031	0.249	0.124	*
pIPS	1.848	28	0.075	0.601	0.167
V1	1.221	28	0.232	1.000	0.294
vIPS	2.673	28	0.012	0.099	0.099	*
vLatIPS	1.694	28	0.101	0.811	0.167
VTC	1.156	28	0.257	1.000	0.294

You can use p_range for a single number too:

p_range(0.002)
#> [1] "**"

Two-sample t-tests for all sub-groups

The t_test_two_sample function is used for applying two-sample t-tests to all sub-groups. This dataset has all the same columns as color_index, but also includes a group column specifying the condition of the experiement (paired versus control).

head(color_index_two_sample)
#> # A tibble: 6 x 4
#>   subj_id roi_id  group  color_effect
#>   <chr>   <chr>   <fct>         <dbl>
#> 1 01      AnG     Paired     -0.0155 
#> 2 01      dLatIPS Paired     -0.0484 
#> 3 01      LO      Paired     -0.00366
#> 4 01      pIPS    Paired     -0.0398 
#> 5 01      V1      Paired     -0.0120 
#> 6 01      vIPS    Paired     -0.0366

We can obtain paired t-test for each sub-group using a similar approach as before. Here we’re including the additional paired = TRUE argument to specify we want a paired \(t\)-test. This argument defaults to FALSE.

color_index_two_sample %>% 
  group_by(roi_id) %>% 
  t_test_two_sample(color_effect, group, paired = TRUE)
#> # A tibble: 8 x 5
#> # Groups:   roi_id [8]
#>   roi_id  tvalue    df      p p_bonferroni
#>   <chr>    <dbl> <dbl>  <dbl>        <dbl>
#> 1 AnG      0.570    28 0.573        1     
#> 2 dLatIPS  1.68     28 0.104        0.835 
#> 3 LO       2.27     28 0.0311       0.249 
#> 4 pIPS     1.85     28 0.0752       0.601 
#> 5 V1       1.22     28 0.232        1     
#> 6 vIPS     2.67     28 0.0124       0.0991
#> 7 vLatIPS  1.69     28 0.101        0.811 
#> 8 VTC      1.16     28 0.257        1