The observer package checks that a given dataset passes user-specified rules. The main functions are observe_if and inspect.
For instance, according to the documentation of the diamonds dataset in package ggplot2, the column depth is equal to 100*2*z/(x+y). Let us make an observation of this:
library(dplyr)     # provides %>% and mutate()
library(observer)

df <- ggplot2::diamonds %>%
  mutate(depth2 = 100*2*z/(x+y)) %>%
  observe_if(x > 0,
             y > 0,
             z > 0,
             abs(depth-depth2) < 1)
obs(df)
#> # A tibble: 4 × 8
#> Id Predicate Passed Failed Missing Rows Status
#> * <int> <chr> <int> <int> <int> <list> <chr>
#> 1 1 x > 0 53932 8 0 <S3: bit> failed
#> 2 2 y > 0 53933 7 0 <S3: bit> failed
#> 3 3 z > 0 53920 20 0 <S3: bit> failed
#> 4 4 abs(depth - depth2) < 1 53840 93 7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
We observe that 93 rows fail to satisfy the last rule, and 7 more give a missing result. To go further we need to see what is happening; with inspect we can select the rows at stake:
inspect(df, 4)
#> # A tibble: 100 × 11
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 1.00 Premium G SI2 59.1 59 3142 6.55 6.48 0.00
#> 2 1.22 Premium J SI2 62.6 59 3156 6.79 4.24 3.76
#> 3 1.01 Premium H I1 58.1 59 3167 6.66 6.60 0.00
#> 4 0.70 Ideal G VS2 62.7 54 3172 5.65 5.70 3.65
#> 5 1.00 Very Good J SI2 62.8 63 3293 6.26 6.19 3.19
#> 6 0.70 Premium E IF 62.9 59 3403 5.66 5.59 3.40
#> 7 1.01 Fair F SI2 64.6 59 3540 6.19 6.25 4.20
#> 8 1.00 Fair G SI1 43.0 59 3634 6.32 6.27 3.97
#> 9 0.81 Premium E VS2 61.5 58 3674 5.99 5.94 3.97
#> 10 1.10 Premium G SI2 63.0 59 3696 6.50 6.47 0.00
#> # ... with 90 more rows, and 1 more variables: depth2 <dbl>
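The Passed, Failed and Missing counts can be made concrete with a few lines of base R. The following is only an illustrative sketch on a made-up data frame (toy and its values are hypothetical), not a description of how observer works internally. It assumes, as the 100 rows returned above suggest (93 failed plus 7 missing), that both failed and missing rows are the ones worth looking at.

```r
# Toy data, made up for illustration (not taken from ggplot2::diamonds)
toy <- data.frame(
  x     = c(4.0, 5.0, 0.0, 6.0),
  y     = c(4.0, 5.0, 0.0, 6.0),
  z     = c(2.4, 0.0, 0.0, 3.6),
  depth = c(60.0, 61.0, 62.0, 75.0)
)

# Recompute depth from the documented formula
toy$depth2 <- 100 * 2 * toy$z / (toy$x + toy$y)

# Evaluate the rule row by row; 0/0 gives NaN, so the comparison yields NA
ok <- abs(toy$depth - toy$depth2) < 1

passed  <- sum(ok, na.rm = TRUE)    # rows where the rule holds
failed  <- sum(!ok, na.rm = TRUE)   # rows where the rule is violated
missing <- sum(is.na(ok))           # rows where the rule cannot be evaluated

# Rows that did not pass, whether they failed or are missing
flagged <- toy[is.na(ok) | !ok, ]
```

Here one row passes, two fail and one is missing, so flagged holds three rows. The row with x, y and z all equal to zero is the missing one, which mirrors how a zero denominator leads to a missing result in the diamonds example.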
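Another way is to write the predicates with standard evaluation. Since df already carries the four observations made above, the new ones are appended to them, so each predicate appears twice in the output: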
## Write your predicates first
p <- c(~ x > 0, ~ y > 0, ~ z > 0,
       ~ abs(depth-depth2) < 1)

## Make observations
df %>%
  observe_if_(.dots = p) %>%
  obs()
#> # A tibble: 8 × 8
#> Id Predicate Passed Failed Missing Rows Status
#> * <int> <chr> <int> <int> <int> <list> <chr>
#> 1 1 x > 0 53932 8 0 <S3: bit> failed
#> 2 2 y > 0 53933 7 0 <S3: bit> failed
#> 3 3 z > 0 53920 20 0 <S3: bit> failed
#> 4 4 abs(depth - depth2) < 1 53840 93 7 <S3: bit> failed
#> 5 5 x > 0 53932 8 0 <S3: bit> failed
#> 6 6 y > 0 53933 7 0 <S3: bit> failed
#> 7 7 z > 0 53920 20 0 <S3: bit> failed
#> 8 8 abs(depth - depth2) < 1 53840 93 7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
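Storing predicates as one-sided formulas means they are ordinary R objects that can be built, kept in a list and evaluated later. The base R sketch below shows the general idea on a made-up data frame; the helper check and the data frame toy are hypothetical names introduced for illustration, and this is not claimed to be what observe_if_ does internally.

```r
# Toy data, made up for illustration
toy <- data.frame(x = c(1.5, 0, 2.0), z = c(0.9, 1.1, NA))

# Predicates stored as one-sided formulas; c() returns a list of formulas
p <- c(~ x > 0, ~ z > 0)

# For a one-sided formula, element 2 is the expression to the right of ~.
# It is evaluated with the columns of `data` in scope.
check <- function(f, data) {
  eval(f[[2]], envir = data, enclos = environment(f))
}

results <- lapply(p, check, data = toy)

# Label each predicate with its text and count the rows that fail it
labels <- vapply(p, function(f) deparse(f[[2]]), character(1))
failed <- vapply(results, function(r) sum(!r, na.rm = TRUE), integer(1))
names(failed) <- labels
```

With this toy data, the predicate x > 0 fails on one row and z > 0 fails on none, the NA in z being counted as missing rather than failed.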