Introduction

Analysts typically work with raw or unitary data as many have access to either student information systems or data warehouses that store information at the student level. Most functions in the DisImpact package are designed with such data structures in mind. However, when analysts collaborate with other data providers or have limited access to data, the data provided are typically summarized or aggregated to protect student privacy. For example, the California Community Colleges Chancellor’s Office (CCCCO) Student Success Metrics (SSM) dashboard allows users to download the data that underlies the visualizations.

This data set is summarized by cohort, outcome, time window, and value, meaning each row corresponds to a data point in a visualization within the dashboard. The DisImpact package allows one to calculate disproportionate impact (DI) for such a data structure using the di_iterate_on_long function, which is very similar to the di_iterate function illustrated in the Scaling DI Calculations vignette.
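To make the difference between unitary and summarized data concrete, here is a minimal base R sketch. The student-level data frame and its values are hypothetical; only the output column names (`disagg1`, `subgroup1`, `value`, `denom`) are borrowed from the toy data set used in this vignette.

```r
# Hypothetical unitary data: one row per student.
students <- data.frame(
  student_id = 1:8,
  ethnicity  = c('Asian', 'Asian', 'Asian', 'White', 'White', 'White', 'White', 'Hispanic'),
  transfer   = c(1, 0, 1, 1, 0, 0, 1, 0)  # 1 = student achieved the outcome
)

# Summarize to one row per group: success count and group size.
value <- tapply(students$transfer, students$ethnicity, sum)
denom <- tapply(students$transfer, students$ethnicity, length)

# Long, summarized layout: one row per disaggregation group.
long <- data.frame(
  disagg1   = 'Ethnicity',
  subgroup1 = names(value),
  value     = as.vector(value),
  denom     = as.vector(denom)
)
long
```

A summarized table like `long` no longer identifies individual students, which is why it is the form typically shared between data providers.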

Load `DisImpact` and toy data set

First, load the necessary packages.

library(DisImpact)
library(dplyr) # Ease in manipulations with data frames

Second, load a toy data set.

data(ssm_cohort) # provided from DisImpact
dim(ssm_cohort)

## [1] 5760   20

# head(ssm_cohort)

A few rows from the `ssm_cohort` data set. The following variables are omitted from this printout: `description`, `categoryLabel`, `source`.
cohort	localeName	academicYear	metricID	title	categoryID	disagg1	subgroup1	disagg2	subgroup2	value	denom	perc	dataType	missingFlag	ferpaFlag	X20
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	19 or Less	Gender	All Other Values	NA	NA	NA	Percent	1	1	NA
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	19 or Less	Gender	Female	169	957	0.17659	Percent	0	0	NA
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	19 or Less	Gender	Male	182	1149	0.15840	Percent	0	0	NA
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	19 or Less	None	None	353	2131	0.16565	Percent	0	0	NA
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	20 to 24	Gender	All Other Values	NA	NA	NA	Percent	1	1	NA
After 3 Years	Community College A	2015	SM 504C3	Completed Transfer-Level Math and English	501	Age	20 to 24	Gender	Female	NA	NA	NA	Percent	1	1	NA

To get a description of each variable, type ?ssm_cohort in the R console.

Select relevant rows

In the following code, we select relevant rows that correspond to the outcomes of interest (categoryLabel), the disaggregations of interest (disagg1), and all non-missing and non-FERPA-suppressed groups:

d_relevant <- ssm_cohort %>%
  filter(
    categoryLabel %in% c('Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF'
                       , 'Attained the Vision Goal Definition of Completion'
                       , 'Earned an Associate Degree'
                       , 'Transferred to a Four-Year Postsecondary Institution'
                         )
    , disagg1 %in% c('Ethnicity', 'Foster Youth', 'Veterans')
    , disagg2 == 'None' # There's also Gender
    , missingFlag == 0
    , ferpaFlag == 0
  )
d_relevant %>%
  group_by(disagg1, subgroup1) %>%
  tally

## # A tibble: 13 x 3
## # Groups:   disagg1 [3]
##    disagg1      subgroup1                     n
##    <chr>        <chr>                     <int>
##  1 Ethnicity    All Masked Values            14
##  2 Ethnicity    Asian                        14
##  3 Ethnicity    Black or African American     4
##  4 Ethnicity    Filipino                     14
##  5 Ethnicity    Hispanic                     14
##  6 Ethnicity    Two or More Races            14
##  7 Ethnicity    White                        14
##  8 Foster Youth All Masked Values            11
##  9 Foster Youth Foster Youth                  3
## 10 Foster Youth Not Foster Youth              3
## 11 Veterans     All Masked Values            12
## 12 Veterans     Not Veteran                   2
## 13 Veterans     Veteran                       1

In the following code, we make a similar selection, but keep the rows where each group within the first level of disaggregation is further disaggregated by gender (disagg2):

d_relevant_gender <- ssm_cohort %>%
  filter(
    categoryLabel %in% c('Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF'
                       , 'Attained the Vision Goal Definition of Completion'
                       , 'Earned an Associate Degree'
                       , 'Transferred to a Four-Year Postsecondary Institution'
                         )
    , disagg1 %in% c('Ethnicity', 'Foster Youth', 'Veterans')
    # , disagg2 == 'None' # There's also Gender
    , disagg2 == 'Gender'
    , missingFlag == 0
    , ferpaFlag == 0
  )
d_relevant_gender %>%
  group_by(disagg1, subgroup1, disagg2, subgroup2) %>%
  tally

## # A tibble: 21 x 5
## # Groups:   disagg1, subgroup1, disagg2 [12]
##    disagg1   subgroup1         disagg2 subgroup2            n
##    <chr>     <chr>             <chr>   <chr>            <int>
##  1 Ethnicity All Masked Values Gender  All Other Values    14
##  2 Ethnicity Asian             Gender  Female              14
##  3 Ethnicity Asian             Gender  Male                14
##  4 Ethnicity Filipino          Gender  Female              10
##  5 Ethnicity Filipino          Gender  Male                 7
##  6 Ethnicity Hispanic          Gender  Female              14
##  7 Ethnicity Hispanic          Gender  Male                14
##  8 Ethnicity Two or More Races Gender  Female              13
##  9 Ethnicity Two or More Races Gender  Male                13
## 10 Ethnicity White             Gender  Female              14
## # ... with 11 more rows

For an ethnicity group like Asian (or any group specified by the disaggregation variable disagg1), the data set d_relevant has one row for the group for each outcome and cohort year, while the data set d_relevant_gender has multiple rows, one corresponding to each gender class.

Execute `di_iterate_on_long` on a data set

Let’s illustrate the di_iterate_on_long function with some key arguments:

data: A data frame for which to iterate DI calculations for a set of variables.
num_var: A variable name (character value) from data where the variable stores success counts (the numerator in success rates). Success rates are calculated by aggregating num_var and denom_var for each unique combination of values in disagg_var_col, group_var_col, disagg_var_col_2, group_var_col_2, cohort_var_col, and summarize_by_vars. If such combinations are unique (single row), then rows are not collapsed.
denom_var: A variable name (character value) from data where the variable stores the group size (the denominator in success rates).
disagg_var_col: A variable name (character value) from data where the variable stores the different disaggregation scenarios. The disaggregation variable could include such values as ‘Ethnicity’, ‘Age Group’, and ‘Foster Youth’, corresponding to three disaggregation scenarios.
group_var_col: A variable name (character value) from data where the variable stores the group name for each group within a level of disaggregation specified in disagg_var_col. For example, the group names could include ‘Asian’, ‘White’, ‘Black’, ‘Latinx’, ‘Native American’, and ‘Other’ for a disaggregation on ethnicity; ‘Under 18’, ‘18-21’, ‘22-25’, and ‘25+’ for an age group disaggregation; and ‘Yes’ and ‘No’ for a foster youth status disaggregation.
disagg_var_col_2: (Optional) A variable name (character value) from data where the variable stores an optional second disaggregation variable, which allows for the intersectionality of variables listed in disagg_var_col and disagg_var_col_2. The second disaggregation variable could describe something not in disagg_var_col, such as ‘Gender’, which would require all groups described in group_var_col to be broken out by gender.
group_var_col_2: (Optional) A variable name (character value) from data where the variable stores the group name for each group within a second level of disaggregation specified in disagg_var_col_2. For example, the group names could include ‘Male’, ‘Female’, ‘Non-binary’, and ‘Unknown’ if ‘Gender’ is a value in the variable disagg_var_col_2.
cohort_var_col: (Optional) A variable name (character value) from data where the variable stores the cohort label for the data described in each row.
summarize_by_vars: (Optional) A character vector of variable names in data for which num_var and denom_var are used for aggregation to calculate success rates for the disproportionate impact (DI) analysis set up by disagg_var_col, group_var_col, disagg_var_col_2, and group_var_col_2. For example, summarize_by_vars=c('Outcome') could specify a single variable/column that describes the outcome or metric in num_var, where the outcome values might include ‘Completion of Transfer-Level Math’, ‘Completion of Transfer-Level English’, ‘Transfer’, and ‘Associate Degree’.
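As a rough sketch of the input shape these arguments expect, the following builds a small long data frame by hand and checks whether each combination of the key columns identifies a single row (in which case no rows would be collapsed). The column names (`outcome`, `year`, `disagg`, `group`, `success`, `n`) are made up for illustration; the counts mirror the 2015 Ethnicity rows printed for the first example in this vignette. The call at the end only runs when DisImpact is installed.

```r
# Illustrative long, summarized input; column names are assumptions.
toy <- data.frame(
  outcome = 'Math and English',
  year    = 2015,
  disagg  = 'Ethnicity',
  group   = c('All Masked Values', 'Asian', 'Filipino',
              'Hispanic', 'Two or More Races', 'White'),
  success = c(10, 115, 16, 69, 21, 133),   # numerator (num_var)
  n       = c(132, 938, 97, 903, 162, 1178), # denominator (denom_var)
  stringsAsFactors = FALSE
)

# Do the key columns identify each row uniquely?
key_cols <- c('outcome', 'year', 'disagg', 'group')
has_duplicates <- any(duplicated(toy[, key_cols]))
has_duplicates

# Map the columns onto the arguments described above.
if (requireNamespace('DisImpact', quietly = TRUE)) {
  di_toy <- DisImpact::di_iterate_on_long(
    data = toy
    , num_var = 'success'
    , denom_var = 'n'
    , disagg_var_col = 'disagg'
    , group_var_col = 'group'
    , cohort_var_col = 'year'
    , summarize_by_vars = 'outcome'
  )
  print(as.data.frame(di_toy))
}
```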

To see the details of these and other arguments, type ?di_iterate_on_long in the R console.

# Example 1: By outcome, cohort
di_summ_1 <- di_iterate_on_long(data=d_relevant
                                , num_var='value'
                                , denom_var='denom'
                                , disagg_var_col='disagg1'
                                , group_var_col='subgroup1'
                                , cohort_var_col='academicYear'
                                , summarize_by_vars=c('categoryLabel', 'cohort')
                                , ppg_reference_groups='all but current' # PPG-1
                                , di_80_index_reference_groups='all but current' # Relative rates analogous to PPG-1 for reference group
                                  )

## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1")
## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1",
## "subgroup1")
## Joining, by = c("categoryLabel", "cohort", "disagg1", "academicYear")
## Joining, by = c("..scenario..", "..groupref..")

nrow(di_summ_1)

## [1] 120

nrow(d_relevant)

## [1] 120

di_summ_1 %>%
  head %>%
  as.data.frame

##                                                                                            categoryLabel
## 1 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 2 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 3 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 4 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 5 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 6 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
##          cohort   disagg1 academicYear         subgroup1    n success
## 1 After 3 Years Ethnicity         2015 All Masked Values  132      10
## 2 After 3 Years Ethnicity         2015             Asian  938     115
## 3 After 3 Years Ethnicity         2015          Filipino   97      16
## 4 After 3 Years Ethnicity         2015          Hispanic  903      69
## 5 After 3 Years Ethnicity         2015 Two or More Races  162      21
## 6 After 3 Years Ethnicity         2015             White 1178     133
##          pct ppg_reference ppg_reference_group        moe       pct_lo
## 1 0.07575758     0.1079927     all but current 0.08529805 -0.009540476
## 2 0.12260128     0.1007282     all but current 0.03199813  0.090603145
## 3 0.16494845     0.1050407     all but current 0.09950392  0.065444529
## 4 0.07641196     0.1176705     all but current 0.03261236  0.043799602
## 5 0.12962963     0.1056034     all but current 0.07699607  0.052633558
## 6 0.11290323     0.1034946     all but current 0.03000000  0.082903226
##      pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1 0.1610556                0                         0
## 2 0.1545994                0                         0
## 3 0.2644524                0                         0
## 4 0.1090243                1                         8
## 5 0.2066257                0                         0
## 6 0.1429032                0                         0
##   success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1                              5     0.7097070                       1
## 2                              0     1.1485450                       0
## 3                              0     1.5452589                       0
## 4                             38     0.7158373                       1
## 5                              0     1.2143875                       0
## 6                              0     1.0576923                       0
##   success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1                                2                                     5
## 2                                0                                     0
## 3                                0                                     0
## 4                               11                                    38
## 5                                0                                     0
## 6                                0                                     0
##   di_80_index_reference_group di_80_index di_indicator_80_index
## 1             all but current   0.7015066                     1
## 2             all but current   1.2171501                     0
## 3             all but current   1.5703282                     0
## 4             all but current   0.6493721                     1
## 5             all but current   1.2275132                     0
## 6             all but current   1.0909091                     0
##   success_needed_not_di_80_index success_needed_full_parity_80_index
## 1                              2                                   5
## 2                              0                                   0
## 3                              0                                   0
## 4                             17                                  38
## 5                              0                                   0
## 6                              0                                   0
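The main columns in the six rows above can be reproduced by hand, which may help when interpreting the output. The base R sketch below uses only the printed `n` and `success` values. The margin of error formula (1.96 * sqrt(0.25 / n), floored at 0.03) and the 0.8 cutoffs are inferred from the printed output rather than taken from the package source, so treat them as assumptions.

```r
# Counts copied from the six 2015 Ethnicity rows printed above.
d <- data.frame(
  subgroup1 = c('All Masked Values', 'Asian', 'Filipino',
                'Hispanic', 'Two or More Races', 'White'),
  n         = c(132, 938, 97, 903, 162, 1178),
  success   = c(10, 115, 16, 69, 21, 133)
)

d$pct <- d$success / d$n

# 'all but current' reference: success rate of everyone outside the group.
d$ppg_reference <- (sum(d$success) - d$success) / (sum(d$n) - d$n)

# Margin of error (assumed form): 1.96 * sqrt(0.25 / n), with a 0.03 floor.
d$moe <- pmax(1.96 * sqrt(0.25 / d$n), 0.03)

# PPG flag: the group's rate plus its margin of error is still below the reference.
d$di_indicator_ppg <- as.integer(d$pct + d$moe < d$ppg_reference)

# Proportionality index: share of successes divided by share of the cohort.
d$di_prop_index <- (d$success / sum(d$success)) / (d$n / sum(d$n))

# 80% index: group rate relative to the reference rate (same reference here).
d$di_80_index <- d$pct / d$ppg_reference

# A 0.8 cutoff (assumed) reproduces both printed index indicators.
d$di_indicator_prop_index <- as.integer(d$di_prop_index < 0.8)
d$di_indicator_80_index   <- as.integer(d$di_80_index < 0.8)

print(d, digits = 4)
```

For the Hispanic group, for example, the reference rate is (364 - 69) / (3410 - 903), which agrees with the printed `ppg_reference` of 0.1176705.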

To calculate DI with the cohort years collapsed, one could omit the cohort_var_col argument. Rows that share the same values of disagg_var_col, group_var_col, and the variables in summarize_by_vars are then aggregated (collapsed) before DI is calculated:

# Example 2: by outcome, collapse cohort academic years
di_summ_2 <- di_iterate_on_long(data=d_relevant
                                , num_var='value'
                                , denom_var='denom'
                                , disagg_var_col='disagg1'
                                , group_var_col='subgroup1'
                                # , cohort_var_col='academicYear'
                                , summarize_by_vars=c('categoryLabel', 'cohort')
                                , ppg_reference_groups='all but current'
                                , di_80_index_reference_groups='all but current'
                                  )

## Joining, by = c("cohort", "categoryLabel", "disagg1")
## Joining, by = c("cohort", "categoryLabel", "disagg1", "subgroup1")
## Joining, by = c("categoryLabel", "cohort", "disagg1")
## Joining, by = c("..scenario..", "..groupref..")

nrow(di_summ_2)

## [1] 43

nrow(d_relevant)

## [1] 120

di_summ_2 %>%
  head %>%
  as.data.frame

##                                                                                            categoryLabel
## 1 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 2 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 3 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 4 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 5 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 6 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
##          cohort   disagg1         subgroup1    n success       pct
## 1 After 3 Years Ethnicity All Masked Values  798     117 0.1466165
## 2 After 3 Years Ethnicity             Asian 6391    1316 0.2059146
## 3 After 3 Years Ethnicity          Filipino  622     174 0.2797428
## 4 After 3 Years Ethnicity          Hispanic 4757     647 0.1360101
## 5 After 3 Years Ethnicity Two or More Races  984     251 0.2550813
## 6 After 3 Years Ethnicity             White 6797    1262 0.1856701
##   ppg_reference ppg_reference_group        moe    pct_lo    pct_hi
## 1     0.1865634     all but current 0.03469162 0.1119249 0.1813082
## 2     0.1754724     all but current 0.03000000 0.1759146 0.2359146
## 3     0.1820249     all but current 0.03929442 0.2404483 0.3190372
## 4     0.1998851     all but current 0.03000000 0.1060101 0.1660101
## 5     0.1814533     all but current 0.03124126 0.2238400 0.2863226
## 6     0.1846685     all but current 0.03000000 0.1556701 0.2156701
##   di_indicator_ppg success_needed_not_di_ppg success_needed_full_parity_ppg
## 1                1                         5                             32
## 2                0                         0                              0
## 3                0                         0                              0
## 4                1                       162                            304
## 5                0                         0                              0
## 6                0                         0                              0
##   di_prop_index di_indicator_prop_index success_needed_not_di_prop_index
## 1     0.7925135                       1                                2
## 2     1.1130399                       0                                0
## 3     1.5121070                       0                                0
## 4     0.7351819                       1                               71
## 5     1.3788032                       0                                0
## 6     1.0036118                       0                                0
##   success_needed_full_parity_prop_index di_80_index_reference_group di_80_index
## 1                                    32             all but current   0.7858807
## 2                                     0             all but current   1.1734871
## 3                                     0             all but current   1.5368383
## 4                                   304             all but current   0.6804415
## 5                                     0             all but current   1.4057685
## 6                                     0             all but current   1.0054242
##   di_indicator_80_index success_needed_not_di_80_index
## 1                     1                              3
## 2                     0                              0
## 3                     0                              0
## 4                     1                            114
## 5                     0                              0
## 6                     0                              0
##   success_needed_full_parity_80_index
## 1                                  32
## 2                                   0
## 3                                   0
## 4                                 304
## 5                                   0
## 6                                   0
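The collapsing step can be pictured as summing the numerator and denominator over the academic years before any rate is computed. The sketch below uses base R and hypothetical counts (they are not taken from `ssm_cohort`); it illustrates the idea and is not the package's internal code.

```r
# Hypothetical long data: two groups observed in two cohort years.
long <- data.frame(
  categoryLabel = 'Earned an Associate Degree',
  cohort        = 'After 3 Years',
  disagg1       = 'Ethnicity',
  subgroup1     = rep(c('Asian', 'White'), times = 2),
  academicYear  = rep(c(2015, 2016), each = 2),
  value         = c(10, 20, 15, 25),     # success counts
  denom         = c(100, 200, 120, 220)  # group sizes
)

# Leave academicYear out of the grouping so that years are summed together.
collapsed <- aggregate(
  cbind(value, denom) ~ categoryLabel + cohort + disagg1 + subgroup1
  , data = long
  , FUN = sum
)
collapsed$pct <- collapsed$value / collapsed$denom
collapsed
```

Note that the collapsed rate is the pooled rate (total successes over total group size), not the average of the yearly rates.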

Second layer of disaggregation / Intersectionality

Sometimes, users may want to incorporate a second layer of disaggregation, or intersection, with a second variable (e.g., gender). The Intersectionality vignette discusses this in some detail. One could do this using the second derived data set, d_relevant_gender, which contains summarized data with rows split out by gender:

# Example 3: by outcome, intersecting gender
di_summ_3 <- di_iterate_on_long(data=d_relevant_gender
                                , num_var='value'
                                , denom_var='denom'
                                , disagg_var_col='disagg1'
                                , group_var_col='subgroup1'
                                , disagg_var_col_2='disagg2'
                                , group_var_col_2='subgroup2'
                                , cohort_var_col='academicYear'
                                , summarize_by_vars=c('categoryLabel', 'cohort')
                                , ppg_reference_groups='overall'
                                , di_80_index_reference_groups='all but current'
                                  )

## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1",
## "disagg2")
## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1",
## "subgroup1", "disagg2", "subgroup2")
## Joining, by = c("categoryLabel", "cohort", "disagg1", "disagg2",
## "academicYear")
## Joining, by = c("..scenario..", "..groupref..")

nrow(di_summ_3)

## [1] 201

nrow(d_relevant_gender)

## [1] 201

di_summ_3 %>%
  head %>%
  as.data.frame

##                                                                                            categoryLabel
## 1 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 2 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 3 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 4 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 5 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 6 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
##          cohort   disagg1 disagg2 academicYear         subgroup1
## 1 After 3 Years Ethnicity  Gender         2015 All Masked Values
## 2 After 3 Years Ethnicity  Gender         2015             Asian
## 3 After 3 Years Ethnicity  Gender         2015             Asian
## 4 After 3 Years Ethnicity  Gender         2015          Hispanic
## 5 After 3 Years Ethnicity  Gender         2015          Hispanic
## 6 After 3 Years Ethnicity  Gender         2015 Two or More Races
##          subgroup2   n success        pct ppg_reference ppg_reference_group
## 1 All Other Values 444      46 0.10360360     0.1064011             overall
## 2           Female 490      54 0.11020408     0.1064011             overall
## 3             Male 427      60 0.14051522     0.1064011             overall
## 4           Female 457      34 0.07439825     0.1064011             overall
## 5             Male 438      35 0.07990868     0.1064011             overall
## 6             Male  98      13 0.13265306     0.1064011             overall
##          moe     pct_lo    pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1 0.04650874 0.05709486 0.1501123                0                         0
## 2 0.04427189 0.06593219 0.1544760                0                         0
## 3 0.04742552 0.09308970 0.1879407                0                         0
## 4 0.04584247 0.02855578 0.1202407                0                         0
## 5 0.04682621 0.03308246 0.1267349                0                         0
## 6 0.09899495 0.03365811 0.2316480                0                         0
##   success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1                              2     0.9737077                       0
## 2                              0     1.0357416                       0
## 3                              0     1.3206177                       0
## 4                             15     0.6992242                       1
## 5                             12     0.7510134                       1
## 6                              0     1.2467260                       0
##   success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1                                0                                     2
## 2                                0                                     0
## 3                                0                                     0
## 4                                6                                    17
## 5                                3                                    14
## 6                                0                                     0
##   di_80_index_reference_group di_80_index di_indicator_80_index
## 1             all but current   0.9700203                     0
## 2             all but current   1.0417730                     0
## 3             all but current   1.3818822                     0
## 4             all but current   0.6691466                     1
## 5             all but current   0.7253068                     1
## 6             all but current   1.2556108                     0
##   success_needed_not_di_80_index success_needed_full_parity_80_index
## 1                              0                                   2
## 2                              0                                   0
## 3                              0                                   0
## 4                              7                                  17
## 5                              4                                  14
## 6                              0                                   0

Custom Reference Groups

The function di_iterate, the workhorse function underlying the function di_iterate_on_long, defaults to the overall rate and the highest performing group rate as reference when determining disproportionate impact using the percentage point gap method and the 80% index method, respectively (function arguments default: ppg_reference_groups="overall", di_80_index_reference_groups="hpg").
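To see how the choice of reference changes the comparison, the sketch below computes the three built-in reference rates by hand for the 2015 Ethnicity counts printed earlier for Example 1. It reads 'overall' as the pooled rate across all groups and 'hpg' as the highest group rate, following the description above; it is an illustration in base R, not the package's implementation.

```r
# Counts copied from the 2015 Ethnicity rows printed for Example 1.
group   <- c('All Masked Values', 'Asian', 'Filipino',
             'Hispanic', 'Two or More Races', 'White')
n       <- c(132, 938, 97, 903, 162, 1178)
success <- c(10, 115, 16, 69, 21, 133)

pct <- setNames(success / n, group)

# 'overall': one pooled rate shared by every group.
ref_overall <- sum(success) / sum(n)

# 'hpg': rate of the highest performing group.
ref_hpg <- max(pct)

# 'all but current': a different reference for each group.
ref_all_but_current <- setNames((sum(success) - success) / (sum(n) - n), group)

# Compare the 80% index under two references, and the gap to the overall rate.
round(cbind(
  pct
  , di_80_hpg = pct / ref_hpg
  , di_80_all_but_current = pct / ref_all_but_current
  , gap_to_overall = pct - ref_overall
), 4)
```

Because the highest performing group sets a higher bar, the 80% index under 'hpg' is never larger than under 'all but current' for the same group, so more groups tend to be flagged.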

When using the di_iterate_on_long function, the user could specify 'overall', 'hpg', or 'all but current' to override the default for the ppg_reference_groups and the di_80_index_reference_groups arguments. To specify custom reference groups for comparison, the user could use the argument custom_reference_group_flag_var to name a variable in the data set passed to data that flags the rows/groups to be used as reference. The same groups will be used as reference for both the percentage point gap method and the 80% index method. The following is an illustration:

# Example 4: By outcome, cohort; custom reference groups
di_summ_4 <- di_iterate_on_long(data=d_relevant %>%
                                  filter(subgroup1 != 'All Masked Values') %>% # some foster youth and veterans disaggregations have just a single All Masked Values row; removing these scenarios for purposes of illustration
                                  mutate(custom_reference=ifelse(subgroup1 %in% c('White','Not Foster Youth', 'Not Veteran'), 1, 0)) # create a variable that flags the reference groups
                                , num_var='value'
                                , denom_var='denom'
                                , disagg_var_col='disagg1'
                                , group_var_col='subgroup1'
                                , cohort_var_col='academicYear'
                                , summarize_by_vars=c('categoryLabel', 'cohort')
                                , custom_reference_group_flag_var='custom_reference' # Specify variable/flag for custom reference groups
                                  )

## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1")
## Joining, by = c("cohort", "academicYear", "categoryLabel", "disagg1",
## "subgroup1", "custom_reference")
## Joining, by = c("categoryLabel", "cohort", "disagg1", "academicYear")
## Joining, by = c("..scenario..", "..groupref..")

nrow(di_summ_4)

## [1] 83

di_summ_4 %>%
  head %>%
  as.data.frame

##                                                                                            categoryLabel
## 1 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 2 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 3 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 4 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 5 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
## 6 Completed Both Transfer-Level Math and English Within the District in the First Year Aligned with SCFF
##          cohort   disagg1 academicYear         subgroup1 custom_reference    n
## 1 After 3 Years Ethnicity         2015             Asian                0  938
## 2 After 3 Years Ethnicity         2015          Filipino                0   97
## 3 After 3 Years Ethnicity         2015          Hispanic                0  903
## 4 After 3 Years Ethnicity         2015 Two or More Races                0  162
## 5 After 3 Years Ethnicity         2015             White                1 1178
## 6 After 3 Years  Veterans         2015       Not Veteran                1 3385
##   success        pct ppg_reference ppg_reference_group        moe     pct_lo
## 1     115 0.12260128     0.1129032              Custom 0.03199813 0.09060315
## 2      16 0.16494845     0.1129032              Custom 0.09950392 0.06544453
## 3      69 0.07641196     0.1129032              Custom 0.03261236 0.04379960
## 4      21 0.12962963     0.1129032              Custom 0.07699607 0.05263356
## 5     133 0.11290323     0.1129032              Custom 0.03000000 0.08290323
## 6     364 0.10753323     0.1075332              Custom 0.03000000 0.07753323
##      pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1 0.1545994                0                         0
## 2 0.2644524                0                         0
## 3 0.1090243                1                         4
## 4 0.2066257                0                         0
## 5 0.1429032                0                         0
## 6 0.1375332                0                         0
##   success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1                              0     1.1352740                       0
## 2                              0     1.5274040                       0
## 3                             33     0.7075661                       1
## 4                              0     1.2003557                       0
## 5                              0     1.0454711                       0
## 6                              0     1.0000000                       0
##   success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1                                0                                     0
## 2                                0                                     0
## 3                               12                                    40
## 4                                0                                     0
## 5                                0                                     0
## 6                                0                                     0
##   di_80_index_reference_group di_80_index di_indicator_80_index
## 1                      Custom   1.0858970                     0
## 2                      Custom   1.4609720                     0
## 3                      Custom   0.6767916                     1
## 4                      Custom   1.1481481                     0
## 5                      Custom   1.0000000                     0
## 6                      Custom   1.0000000                     0
##   success_needed_not_di_80_index success_needed_full_parity_80_index
## 1                              0                                   0
## 2                              0                                   0
## 3                             13                                  33
## 4                              0                                   0
## 5                              0                                   0
## 6                              0                                   0
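The custom reference columns above can be checked by hand for the Ethnicity scenario. The sketch below uses the five printed Ethnicity rows, flags White as the reference in the same way as the `mutate` call, and takes the reference rate as the pooled rate of the flagged rows. That pooling rule is an assumption for the case where several rows are flagged; here only one row is flagged, so the reference is simply the White rate.

```r
# Counts copied from the five 2015 Ethnicity rows printed above.
d <- data.frame(
  subgroup1 = c('Asian', 'Filipino', 'Hispanic', 'Two or More Races', 'White'),
  n         = c(938, 97, 903, 162, 1178),
  success   = c(115, 16, 69, 21, 133)
)

# Flag the reference group(s), mirroring the mutate() call in Example 4.
d$custom_reference <- ifelse(
  d$subgroup1 %in% c('White', 'Not Foster Youth', 'Not Veteran'), 1, 0
)

d$pct <- d$success / d$n

# Reference rate: pooled rate of the flagged rows (assumed rule).
flagged <- d[d$custom_reference == 1, ]
ref <- sum(flagged$success) / sum(flagged$n)

d$ppg_reference <- ref
d$di_80_index   <- d$pct / ref

print(d, digits = 7)
```

The resulting `ppg_reference` (about 0.1129) and the `di_80_index` for the Hispanic group (about 0.6768) agree with the printed output.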

Additional Information

For additional illustrations of various parameter changes in di_iterate_on_long, please see the Scaling DI Calculations vignette. The di_iterate_on_long function is very similar to di_iterate, which is applied to a unitary data set.

Appendix: R and R Package Versions

This vignette was generated using an R session with the following packages. Readers who replicate the code with different package versions may see some discrepancies in the results.

sessionInfo()

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.39       dplyr_1.0.8      DisImpact_0.0.19
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8.3     highr_0.9        pillar_1.7.0     bslib_0.3.1     
##  [5] compiler_4.0.2   jquerylib_0.1.4  prettydoc_0.4.1  tools_4.0.2     
##  [9] digest_0.6.25    jsonlite_1.7.0   evaluate_0.15    lifecycle_1.0.1 
## [13] tibble_3.1.6     fstcore_0.9.12   pkgconfig_2.0.3  rlang_1.0.1     
## [17] cli_3.2.0        DBI_1.1.0        yaml_2.3.5       parallel_4.0.2  
## [21] xfun_0.30        fastmap_1.1.0    stringr_1.4.0    generics_0.1.2  
## [25] vctrs_0.3.8      sass_0.4.1       tidyselect_1.1.2 glue_1.6.1      
## [29] R6_2.3.0         fansi_1.0.2      rmarkdown_2.14   tidyr_1.2.0     
## [33] purrr_0.3.4      blob_1.2.1       magrittr_2.0.2   ellipsis_0.3.2  
## [37] htmltools_0.5.2  fst_0.9.8        assertthat_0.2.1 utf8_1.2.2      
## [41] stringi_1.4.6    crayon_1.5.0

Disproportionate Impact (DI) Calculations on Long, Summarized Data Sets

Vinh Nguyen

2022-08-22
