The goal of this package is to easily apply the same t-tests/basic data description across several sub-groups, with the output provided as a nice arranged data.frame
. Multiple comparison and \(p\)-value significance symbols are also provided.
This kind of analysis is commonly seen in ROI (Region-of-interest) analysis of brain imaging data (hence the name of the package, roistats).
The package comes with a data.frame called color_index
, which is similar in format to what we might get after cleaning and wrangling ROI data. This data.frame contains the neural analysis result of the degree of color memory sensitivity at each brain region of each subject. The color_index
data frame has three columns:
subj_id
: identifies the subjects in the study.roi_id
: identifies the brain sub-region of interest for the analysis. We are interested in eight brain regions. Note that the combination of subj_id
and roi_id
uniquely identifies rows in the dataset.color_index
: indicates how sensitive a certain brain region is to the memory of color for each subject/brain region combination.head(color_index)
#> # A tibble: 6 x 3
#> subj_id roi_id color_index
#> <chr> <chr> <dbl>
#> 1 01 AnG -0.0324
#> 2 01 dLatIPS -0.0425
#> 3 01 LO -0.0326
#> 4 01 pIPS -0.0148
#> 5 01 V1 -0.00126
#> 6 01 vIPS -0.0238
Before we dive into the statistical test, we likely want to get descriptive statistics for color_index
, including the mean, standard deviation, and standard error of the mean for each brain region. The df_sem
function help us with this, and is designed to be used in combination with dplyr, specifically dplyr::group_by()
. To obtain our descriptive statistics by roi_id
, we just first group the data frame by roi_id
, then pass the data frame to def_sem()
.
library(dplyr)
color_index %>%
group_by(roi_id) %>% # The column to get summaries by
df_sem(color_index) # The column to summarize
#> # A tibble: 8 x 5
#> roi_id mean_color_index sd n se
#> <chr> <dbl> <dbl> <int> <dbl>
#> 1 AnG 0.00537 0.0507 29 0.00942
#> 2 LO 0.0181 0.0428 29 0.00796
#> 3 V1 0.00955 0.0421 29 0.00782
#> 4 VTC 0.00468 0.0218 29 0.00405
#> 5 dLatIPS 0.0159 0.0510 29 0.00946
#> 6 pIPS 0.0102 0.0297 29 0.00552
#> 7 vIPS 0.0162 0.0327 29 0.00607
#> 8 vLatIPS 0.0162 0.0514 29 0.00955
Yay! We have obtained the SEM
(which is commonly used for error bar plotting in psych and cog neuro area) for each sub-group easily.
Note that if we do not use dplyr::group_by()
we just get the overall summaries. The package reports a warning when this happens because it generally goes against the intent of the package (computing multiple comparisons).
Now suppose we want to test whether color_index
is significantly different (i.e., significantly different than zero) for each possible sub-group (roi_id
). Note that we are not computing pairwise comparisons yet, just whether the mean for each subgroup is different from zero. Here, we have eight sub-groups, which means we will get eight one-sample t-test results in total. As a first step in the analysis we probably don’t care much about all the detailed output from stats::t.test
. Instead, we’re just looking for mean difference and significance. This is what the t_test_one_sample
function was designed to accommodate. We again pass the function a grouped data frame, and we get \(t\)-test results back for each group.
By default, a Bonferroni \(p\)-value adjustment is applied, but any adjustment available through stats::p.adjust()
can be supplied. Similarly, each mean is compared to zero by default, but this can be adjusted through the optional mu
argument. The interface for the function works essentially equivalanetly to df_sem()
.
color_index %>%
group_by(roi_id) %>%
t_test_one_sample(color_index)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1
Here, we see the t-values, degrees of freedom, as well as the uncorrected and bonferroni corrected \(p\)-values! Nice! Note that the multiple comparison corrected \(p\)-values are provided by the p_bonferroni
column, but the name of this column will change depending on the method you want to use. Let’s try again, but this time using both the Bonferroni and the Benjamini and Hochberg (1995) method. We’ll put this in a nice table as well.
color_index_one_sample_t_res <- color_index %>%
group_by(roi_id) %>%
t_test_one_sample(color_index, p_adjust = c("bonferroni", "BH"))
knitr::kable(color_index_one_sample_t_res, digits = 3)
roi_id | tvalue | df | p | p_bonferroni | p_BH |
---|---|---|---|---|---|
AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 |
dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 |
LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 |
pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 |
V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 |
vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 |
vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 |
VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |
As before, if we our data frame is not grouped, we’ll get the stats returned, but with a warning.
color_index %>%
t_test_one_sample(color_index)
#> Warning: The `t_test_one_sample()` function expects a grouped data frame (i.e.,
#> from `dplyr::group_by()`). Returning statistics for the overall column.
#> Warning: `...` must not be empty for ungrouped data frames.
#> Did you want `data = everything()`?
#> # A tibble: 1 x 4
#> tvalue df p p_bonferroni
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4.44 231 0.0000142 0.0000142
Usually, we want the significance symbol to highlight the result in a table or plot. Here we have the p_range
function to create the significance symbol:
color_index_one_sample_t_with_sig <- color_index_one_sample_t_res %>%
mutate(sig_origin_p = p_range(p))
knitr::kable(color_index_one_sample_t_with_sig, digits = 3)
roi_id | tvalue | df | p | p_bonferroni | p_BH | sig_origin_p |
---|---|---|---|---|---|---|
AnG | 0.570 | 28 | 0.573 | 1.000 | 0.573 | |
dLatIPS | 1.678 | 28 | 0.104 | 0.835 | 0.167 | |
LO | 2.270 | 28 | 0.031 | 0.249 | 0.124 | * |
pIPS | 1.848 | 28 | 0.075 | 0.601 | 0.167 | |
V1 | 1.221 | 28 | 0.232 | 1.000 | 0.294 | |
vIPS | 2.673 | 28 | 0.012 | 0.099 | 0.099 | * |
vLatIPS | 1.694 | 28 | 0.101 | 0.811 | 0.167 | |
VTC | 1.156 | 28 | 0.257 | 1.000 | 0.294 |
You can use p_range
for a single number too:
The t_test_two_sample
function is used for applying two-sample t-tests to all sub-groups. This dataset has all the same columns as color_index
, but also includes a group
column specifying the condition of the experiement (paired versus control).
head(color_index_two_sample)
#> # A tibble: 6 x 4
#> subj_id roi_id group color_effect
#> <chr> <chr> <fct> <dbl>
#> 1 01 AnG Paired -0.0155
#> 2 01 dLatIPS Paired -0.0484
#> 3 01 LO Paired -0.00366
#> 4 01 pIPS Paired -0.0398
#> 5 01 V1 Paired -0.0120
#> 6 01 vIPS Paired -0.0366
We can obtain paired t-test for each sub-group using a similar approach as before. Here we’re including the additional paired = TRUE
argument to specify we want a paired \(t\)-test. This argument defaults to FALSE
.
color_index_two_sample %>%
group_by(roi_id) %>%
t_test_two_sample(color_effect, group, paired = TRUE)
#> # A tibble: 8 x 5
#> # Groups: roi_id [8]
#> roi_id tvalue df p p_bonferroni
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 AnG 0.570 28 0.573 1
#> 2 dLatIPS 1.68 28 0.104 0.835
#> 3 LO 2.27 28 0.0311 0.249
#> 4 pIPS 1.85 28 0.0752 0.601
#> 5 V1 1.22 28 0.232 1
#> 6 vIPS 2.67 28 0.0124 0.0991
#> 7 vLatIPS 1.69 28 0.101 0.811
#> 8 VTC 1.16 28 0.257 1