{datawizard}
package aims to make basic data wrangling
easier than with base R. The data wrangling workflow it supports is
similar to the one supported by the tidyverse package combination of
{dplyr}
and {tidyr}
. However, one of its main
features is that it has a very few dependencies: {stats}
and {utils}
(included in base R) and
{insight}
, which is the core package of the
easystats ecosystem. This package grew organically to
simultaneously satisfy the “0 non-base hard dependency” principle of
easystats and the data wrangling needs of the constituent
packages in this ecosystem.
One drawback of this genesis is that not all features of the
{tidyverse}
packages are supported since only features that
were necessary for easystats ecosystem have been implemented.
Some of these missing features (such as summarize
or the
pipe operator %>%
) are made available in other
dependency-free packages, such as {poorman}
.
It is also important to note that {datawizard}
was designed
to avoid namespace collisions with {tidyverse}
packages.
In this article, we will see how to go through basic data wrangling
steps with {datawizard}
. We will also compare it to the
{tidyverse}
syntax for achieving the same. This way, if you
decide to make the switch, you can easily find the translations here.
This vignette is largely inspired from {dplyr}
’s Getting started
vignette.
library(dplyr)
library(tidyr)
library(datawizard)
Before we look at their tidyverse equivalents, we can first
have a look at {datawizard}
’s key functions for data
wrangling:
Function | Operation |
---|---|
data_filter() |
to select only certain observations |
data_select() |
to select only a few variables |
data_extract() |
to extract a single variable |
data_rename() |
to rename variables |
data_relocate() |
to reorder a data frame |
data_to_long() |
to convert data from wide to long |
data_to_wide() |
to convert data from long to wide |
data_join() |
to join two data frames |
Note that there are a few functions in {datawizard}
that
have no strict equivalent in {dplyr}
or
{tidyr}
(e.g data_rotate()
), and so we won’t
discuss them in the next section.
{dplyr}
/ {tidyr}
Before we look at them individually, let’s first have a look at the summary table of this equivalence.
Function | Tidyverse equivalent(s) |
---|---|
data_filter() |
dplyr::filter() |
data_select() |
dplyr::select() |
data_extract() |
dplyr::pull() |
data_rename() |
dplyr::rename() |
data_relocate() |
dplyr::relocate() |
data_to_long() |
tidyr::pivot_longer() |
data_to_wide() |
tidyr::pivot_wider() |
data_join() |
dplyr::inner_join() ,
dplyr::left_join() , dplyr::right_join() , |
dplyr::full_join() ,
dplyr::anti_join() , dplyr::semi_join() |
data_filter()
is a wrapper around subset()
.
Therefore, if you want to have several filtering conditions, you need to
use &
. Separating the conditions with a comma (as in
dplyr::filter()
) will not work; it will
only apply the first condition.
# ---------- datawizard -----------
%>%
starwars data_filter(skin_color == "light" &
== "brown") eye_color
# ---------- tidyverse -----------
%>%
starwars filter(
== "light",
skin_color == "brown"
eye_color )
## # A tibble: 7 × 14
## name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ sex gender homew…⁵
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
## 2 Biggs Darkl… 183 84 black light brown 24 male mascu… Tatooi…
## 3 Cordé 157 NA brown light brown NA fema… femin… Naboo
## 4 Dormé 165 NA brown light brown NA fema… femin… Naboo
## 5 Raymus Anti… 188 79 brown light brown NA male mascu… Aldera…
## 6 Poe Dameron NA NA brown light brown NA male mascu… <NA>
## 7 Padmé Amida… 165 45 brown light brown 46 fema… femin… Naboo
## # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
## # starships <list>, and abbreviated variable names ¹hair_color, ²skin_color,
## # ³eye_color, ⁴birth_year, ⁵homeworld
## # ℹ Use `colnames()` to see all variable names
data_select()
is the equivalent of
dplyr::select()
. The main difference between these two
functions is that data_select()
uses two arguments
(select
and exclude
) and requires quoted
column names if we want to select several variables, while
dplyr::select()
accepts any unquoted column names.
# ---------- datawizard -----------
%>%
starwars data_select(select = c("hair_color", "skin_color", "eye_color"))
# ---------- tidyverse -----------
%>%
starwars select(hair_color, skin_color, eye_color)
## # A tibble: 6 × 3
## hair_color skin_color eye_color
## * <chr> <chr> <chr>
## 1 blond fair blue
## 2 <NA> gold yellow
## 3 <NA> white, blue red
## 4 none white yellow
## 5 brown light brown
## 6 brown, grey light blue
# ---------- datawizard -----------
%>%
starwars data_select(select = -ends_with("color"))
# ---------- tidyverse -----------
%>%
starwars select(-ends_with("color"))
## # A tibble: 6 × 11
## name height mass birth…¹ sex gender homew…² species films vehic…³ stars…⁴
## * <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list> <list>
## 1 Luke … 172 77 19 male mascu… Tatooi… Human <chr> <chr> <chr>
## 2 C-3PO 167 75 112 none mascu… Tatooi… Droid <chr> <chr> <chr>
## 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr> <chr>
## 4 Darth… 202 136 41.9 male mascu… Tatooi… Human <chr> <chr> <chr>
## 5 Leia … 150 49 19 fema… femin… Aldera… Human <chr> <chr> <chr>
## 6 Owen … 178 120 52 male mascu… Tatooi… Human <chr> <chr> <chr>
## # … with abbreviated variable names ¹birth_year, ²homeworld, ³vehicles,
## # ⁴starships
# ---------- datawizard -----------
%>%
starwars data_select(select = -hair_color:eye_color)
# ---------- tidyverse -----------
%>%
starwars select(!(hair_color:eye_color))
## # A tibble: 6 × 11
## name height mass birth…¹ sex gender homew…² species films vehic…³ stars…⁴
## * <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list> <list>
## 1 Luke … 172 77 19 male mascu… Tatooi… Human <chr> <chr> <chr>
## 2 C-3PO 167 75 112 none mascu… Tatooi… Droid <chr> <chr> <chr>
## 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr> <chr>
## 4 Darth… 202 136 41.9 male mascu… Tatooi… Human <chr> <chr> <chr>
## 5 Leia … 150 49 19 fema… femin… Aldera… Human <chr> <chr> <chr>
## 6 Owen … 178 120 52 male mascu… Tatooi… Human <chr> <chr> <chr>
## # … with abbreviated variable names ¹birth_year, ²homeworld, ³vehicles,
## # ⁴starships
# ---------- datawizard -----------
%>%
starwars data_select(exclude = regex("color$"))
# ---------- tidyverse -----------
%>%
starwars select(-contains("color$"))
## # A tibble: 6 × 11
## name height mass birth…¹ sex gender homew…² species films vehic…³ stars…⁴
## * <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list> <list>
## 1 Luke … 172 77 19 male mascu… Tatooi… Human <chr> <chr> <chr>
## 2 C-3PO 167 75 112 none mascu… Tatooi… Droid <chr> <chr> <chr>
## 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr> <chr>
## 4 Darth… 202 136 41.9 male mascu… Tatooi… Human <chr> <chr> <chr>
## 5 Leia … 150 49 19 fema… femin… Aldera… Human <chr> <chr> <chr>
## 6 Owen … 178 120 52 male mascu… Tatooi… Human <chr> <chr> <chr>
## # … with abbreviated variable names ¹birth_year, ²homeworld, ³vehicles,
## # ⁴starships
# ---------- datawizard -----------
%>%
starwars data_select(select = is.numeric)
# ---------- tidyverse -----------
%>%
starwars select(where(is.numeric))
## # A tibble: 6 × 3
## height mass birth_year
## * <int> <dbl> <dbl>
## 1 172 77 19
## 2 167 75 112
## 3 96 32 33
## 4 202 136 41.9
## 5 150 49 19
## 6 178 120 52
You can find a list of all the select helpers with
?data_select
.
Although we mostly work on dataframes, it is sometimes useful to
extract a single column as a vector. This can be done with
data_extract()
, which reproduces the behavior of
dplyr::pull()
:
# ---------- datawizard -----------
%>%
starwars data_extract(gender)
# ---------- tidyverse -----------
%>%
starwars pull(gender)
## [1] "masculine" "masculine" "masculine" "masculine" "feminine" "masculine"
We can also specify several variables in select
. In this
case, data_extract()
is equivalent to
data_select()
:
%>%
starwars data_extract(select = contains("color"))
## # A tibble: 6 × 3
## hair_color skin_color eye_color
## <chr> <chr> <chr>
## 1 blond fair blue
## 2 <NA> gold yellow
## 3 <NA> white, blue red
## 4 none white yellow
## 5 brown light brown
## 6 brown, grey light blue
data_rename()
is the equivalent of
dplyr::rename()
but the syntax between the two is
different. While dplyr::rename()
takes new-old pairs of
column names, data_rename()
requires a vector of column
names to rename, and then a vector of new names for these columns that
must be of the same length.
# ---------- datawizard -----------
%>%
starwars data_rename(
pattern = c("sex", "hair_color"),
replacement = c("Sex", "Hair Color")
)
# ---------- tidyverse -----------
%>%
starwars rename(
Sex = sex,
"Hair Color" = hair_color
)
## # A tibble: 6 × 14
## name height mass Hair …¹ skin_…² eye_c…³ birth…⁴ Sex gender homew…⁵
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 Luke Skywal… 172 77 blond fair blue 19 male mascu… Tatooi…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu… Tatooi…
## 3 R2-D2 96 32 <NA> white,… red 33 none mascu… Naboo
## 4 Darth Vader 202 136 none white yellow 41.9 male mascu… Tatooi…
## 5 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
## 6 Owen Lars 178 120 brown,… light blue 52 male mascu… Tatooi…
## # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
## # starships <list>, and abbreviated variable names ¹`Hair Color`,
## # ²skin_color, ³eye_color, ⁴birth_year, ⁵homeworld
## # ℹ Use `colnames()` to see all variable names
The way data_rename()
is designed makes it easy to apply
the same modifications to a vector of column names. For example, we can
remove underscores and use TitleCase with the following code:
<- names(starwars)
to_rename
%>%
starwars data_rename(
pattern = to_rename,
replacement = tools::toTitleCase(gsub("_", " ", to_rename))
)
## # A tibble: 6 × 14
## Name Height Mass Hair …¹ Skin …² Eye C…³ Birth…⁴ Sex Gender Homew…⁵
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 Luke Skywal… 172 77 blond fair blue 19 male mascu… Tatooi…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascu… Tatooi…
## 3 R2-D2 96 32 <NA> white,… red 33 none mascu… Naboo
## 4 Darth Vader 202 136 none white yellow 41.9 male mascu… Tatooi…
## 5 Leia Organa 150 49 brown light brown 19 fema… femin… Aldera…
## 6 Owen Lars 178 120 brown,… light blue 52 male mascu… Tatooi…
## # … with 4 more variables: Species <chr>, Films <list>, Vehicles <list>,
## # Starships <list>, and abbreviated variable names ¹`Hair Color`,
## # ²`Skin Color`, ³`Eye Color`, ⁴`Birth Year`, ⁵Homeworld
## # ℹ Use `colnames()` to see all variable names
It is also possible to add a prefix or a suffix to all or a subset of
variables with data_addprefix()
and
data_addsuffix()
. The argument select
accepts
all select helpers that we saw above with
data_select()
:
%>%
starwars data_addprefix(
pattern = "OLD.",
select = contains("color")
%>%
) data_addsuffix(
pattern = ".NEW",
select = -contains("color")
)
## # A tibble: 6 × 14
## name.NEW heigh…¹ mass.…² OLD.h…³ OLD.s…⁴ OLD.e…⁵ birth…⁶ sex.NEW gende…⁷
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Skywalker 172 77 blond fair blue 19 male mascul…
## 2 C-3PO 167 75 <NA> gold yellow 112 none mascul…
## 3 R2-D2 96 32 <NA> white,… red 33 none mascul…
## 4 Darth Vader 202 136 none white yellow 41.9 male mascul…
## 5 Leia Organa 150 49 brown light brown 19 female femini…
## 6 Owen Lars 178 120 brown,… light blue 52 male mascul…
## # … with 5 more variables: homeworld.NEW <chr>, species.NEW <chr>,
## # films.NEW <list>, vehicles.NEW <list>, starships.NEW <list>, and
## # abbreviated variable names ¹height.NEW, ²mass.NEW, ³OLD.hair_color,
## # ⁴OLD.skin_color, ⁵OLD.eye_color, ⁶birth_year.NEW, ⁷gender.NEW
## # ℹ Use `colnames()` to see all variable names
Sometimes, we want to relocate one or a small subset of columns in
the dataset. Rather than typing many names in
data_select()
, we can use data_relocate()
,
which is the equivalent of dplyr::relocate()
. Just like
data_select()
, we can specify a list of variables we want
to relocate with select
and exclude
. Then, the
arguments before
and after
1 specify where the
selected columns should be relocated:
# ---------- datawizard -----------
%>%
starwars data_relocate(sex:homeworld, before = "height")
# ---------- tidyverse -----------
%>%
starwars relocate(sex:homeworld, .before = height)
## # A tibble: 6 × 14
## name sex gender homew…¹ height mass hair_…² skin_…³ eye_c…⁴ birth…⁵
## * <chr> <chr> <chr> <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywal… male mascu… Tatooi… 172 77 blond fair blue 19
## 2 C-3PO none mascu… Tatooi… 167 75 <NA> gold yellow 112
## 3 R2-D2 none mascu… Naboo 96 32 <NA> white,… red 33
## 4 Darth Vader male mascu… Tatooi… 202 136 none white yellow 41.9
## 5 Leia Organa fema… femin… Aldera… 150 49 brown light brown 19
## 6 Owen Lars male mascu… Tatooi… 178 120 brown,… light blue 52
## # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
## # starships <list>, and abbreviated variable names ¹homeworld, ²hair_color,
## # ³skin_color, ⁴eye_color, ⁵birth_year
## # ℹ Use `colnames()` to see all variable names
In addition to column names, before
and
after
accept column indices. Finally, one can use
before = -1
to relocate the selected columns just before
the last column, or after = -1
to relocate them after the
last column.
# ---------- datawizard -----------
%>%
starwars data_relocate(sex:homeworld, after = -1)
## # A tibble: 6 × 14
## name height mass hair_…¹ skin_…² eye_c…³ birth…⁴ species films vehic…⁵
## * <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <lis> <list>
## 1 Luke Skywa… 172 77 blond fair blue 19 Human <chr> <chr>
## 2 C-3PO 167 75 <NA> gold yellow 112 Droid <chr> <chr>
## 3 R2-D2 96 32 <NA> white,… red 33 Droid <chr> <chr>
## 4 Darth Vader 202 136 none white yellow 41.9 Human <chr> <chr>
## 5 Leia Organa 150 49 brown light brown 19 Human <chr> <chr>
## 6 Owen Lars 178 120 brown,… light blue 52 Human <chr> <chr>
## # … with 4 more variables: starships <list>, sex <chr>, gender <chr>,
## # homeworld <chr>, and abbreviated variable names ¹hair_color, ²skin_color,
## # ³eye_color, ⁴birth_year, ⁵vehicles
## # ℹ Use `colnames()` to see all variable names
Reshaping data from wide to long or from long to wide format can be
done with data_to_long()
and data_to_wide()
.
These functions were designed to match
tidyr::pivot_longer()
and tidyr::pivot_wider()
arguments, so that the only thing to do is to change the function name.
However, not all of tidyr::pivot_longer()
and
tidyr::pivot_wider()
features are available yet.
We will use the relig_income
dataset, as in the {tidyr}
vignette.
relig_income
## # A tibble: 18 × 11
## religion `<$10k` $10-2…¹ $20-3…² $30-4…³ $40-5…⁴ $50-7…⁵ $75-1…⁶ $100-…⁷
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122 109
## 2 Atheist 12 27 37 52 35 70 73 59
## 3 Buddhist 27 21 30 34 33 58 62 39
## 4 Catholic 418 617 732 670 638 1116 949 792
## 5 Don’t know/r… 15 14 15 11 10 35 21 17
## 6 Evangelical … 575 869 1064 982 881 1486 949 723
## 7 Hindu 1 9 7 9 11 34 47 48
## 8 Historically… 228 244 236 238 197 223 131 81
## 9 Jehovah's Wi… 20 27 24 24 21 30 15 11
## 10 Jewish 19 19 25 25 30 95 69 87
## 11 Mainline Prot 289 495 619 655 651 1107 939 753
## 12 Mormon 29 40 48 51 56 112 85 49
## 13 Muslim 6 7 9 10 9 23 16 8
## 14 Orthodox 13 17 23 32 32 47 38 42
## 15 Other Christ… 9 7 11 13 13 14 18 14
## 16 Other Faiths 20 33 40 46 49 63 46 40
## 17 Other World … 5 2 3 4 2 7 3 4
## 18 Unaffiliated 217 299 374 365 341 528 407 321
## # … with 2 more variables: `>150k` <dbl>, `Don't know/refused` <dbl>, and
## # abbreviated variable names ¹`$10-20k`, ²`$20-30k`, ³`$30-40k`, ⁴`$40-50k`,
## # ⁵`$50-75k`, ⁶`$75-100k`, ⁷`$100-150k`
## # ℹ Use `colnames()` to see all variable names
We would like to reshape this dataset to have 3 columns: religion,
count, and income. The column “religion” doesn’t need to change, so we
exclude it with -religion
. Then, each remaining column
corresponds to an income category. Therefore, we want to move all these
column names to a single column called “income”. Finally, the values
corresponding to each of these columns will be reshaped to be in a
single new column, called “count”.
# ---------- datawizard -----------
%>%
relig_income data_to_long(
-religion,
names_to = "income",
values_to = "count"
)
# ---------- tidyverse -----------
%>%
relig_income pivot_longer(
!religion,
names_to = "income",
values_to = "count"
)
## # A tibble: 180 × 3
## religion income count
## * <chr> <chr> <dbl>
## 1 Agnostic <$10k 27
## 2 Agnostic $10-20k 34
## 3 Agnostic $20-30k 60
## 4 Agnostic $30-40k 81
## 5 Agnostic $40-50k 76
## 6 Agnostic $50-75k 137
## 7 Agnostic $75-100k 122
## 8 Agnostic $100-150k 109
## 9 Agnostic >150k 84
## 10 Agnostic Don't know/refused 96
## # … with 170 more rows
## # ℹ Use `print(n = ...)` to see more rows
To explore a bit more the arguments of data_to_long()
,
we will use another dataset: the billboard
dataset.
billboard
## # A tibble: 317 × 79
## artist track date.ent…¹ wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 Pac Baby… 2000-02-26 87 82 72 77 87 94 99 NA NA
## 2 2Ge+h… The … 2000-09-02 91 87 92 NA NA NA NA NA NA
## 3 3 Doo… Kryp… 2000-04-08 81 70 68 67 66 57 54 53 51
## 4 3 Doo… Loser 2000-10-21 76 76 72 69 67 65 55 59 62
## 5 504 B… Wobb… 2000-04-15 57 34 25 17 17 31 36 49 53
## 6 98^0 Give… 2000-08-19 51 39 34 26 26 19 2 2 3
## 7 A*Tee… Danc… 2000-07-08 97 97 96 95 100 NA NA NA NA
## 8 Aaliy… I Do… 2000-01-29 84 62 51 41 38 35 35 38 38
## 9 Aaliy… Try … 2000-03-18 59 53 38 28 21 18 16 14 12
## 10 Adams… Open… 2000-08-26 76 76 74 69 68 67 61 58 57
## # … with 307 more rows, 67 more variables: wk10 <dbl>, wk11 <dbl>, wk12 <dbl>,
## # wk13 <dbl>, wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>, wk18 <dbl>,
## # wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, wk22 <dbl>, wk23 <dbl>, wk24 <dbl>,
## # wk25 <dbl>, wk26 <dbl>, wk27 <dbl>, wk28 <dbl>, wk29 <dbl>, wk30 <dbl>,
## # wk31 <dbl>, wk32 <dbl>, wk33 <dbl>, wk34 <dbl>, wk35 <dbl>, wk36 <dbl>,
## # wk37 <dbl>, wk38 <dbl>, wk39 <dbl>, wk40 <dbl>, wk41 <dbl>, wk42 <dbl>,
## # wk43 <dbl>, wk44 <dbl>, wk45 <dbl>, wk46 <dbl>, wk47 <dbl>, wk48 <dbl>, …
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
# ---------- datawizard -----------
%>%
billboard data_to_long(
cols = starts_with("wk"),
names_to = "week",
values_to = "rank",
values_drop_na = TRUE
)
# ---------- tidyverse -----------
%>%
billboard pivot_longer(
cols = starts_with("wk"),
names_to = "week",
values_to = "rank",
values_drop_na = TRUE
)
## # A tibble: 5,307 × 5
## artist track date.entered week rank
## * <chr> <chr> <date> <chr> <dbl>
## 1 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk1 87
## 2 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk2 82
## 3 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk3 72
## 4 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk4 77
## 5 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk5 87
## 6 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk6 94
## 7 2 Pac Baby Don't Cry (Keep... 2000-02-26 wk7 99
## 8 2Ge+her The Hardest Part Of ... 2000-09-02 wk1 91
## 9 2Ge+her The Hardest Part Of ... 2000-09-02 wk2 87
## 10 2Ge+her The Hardest Part Of ... 2000-09-02 wk3 92
## # … with 5,297 more rows
## # ℹ Use `print(n = ...)` to see more rows
Once again, we use an example in the {tidyr}
vignette to
show how close data_to_wide()
and
pivot_wider()
are:
fish_encounters
## # A tibble: 114 × 3
## fish station seen
## <fct> <fct> <int>
## 1 4842 Release 1
## 2 4842 I80_1 1
## 3 4842 Lisbon 1
## 4 4842 Rstr 1
## 5 4842 Base_TD 1
## 6 4842 BCE 1
## 7 4842 BCW 1
## 8 4842 BCE2 1
## 9 4842 BCW2 1
## 10 4842 MAE 1
## # … with 104 more rows
## # ℹ Use `print(n = ...)` to see more rows
# ---------- datawizard -----------
%>%
fish_encounters data_to_wide(
names_from = "station",
values_from = "seen",
values_fill = 0
)
# ---------- tidyverse -----------
%>%
fish_encounters pivot_wider(
names_from = station,
values_from = seen,
values_fill = 0
)
## # A tibble: 19 × 12
## fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
## * <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4842 1 1 1 1 1 1 1 1 1 1 1
## 2 4843 1 1 1 1 1 1 1 1 1 1 1
## 3 4844 1 1 1 1 1 1 1 1 1 1 1
## 4 4845 1 1 1 1 1 0 0 0 0 0 0
## 5 4847 1 1 1 0 0 0 0 0 0 0 0
## 6 4848 1 1 1 1 0 0 0 0 0 0 0
## 7 4849 1 1 0 0 0 0 0 0 0 0 0
## 8 4850 1 1 0 1 1 1 1 0 0 0 0
## 9 4851 1 1 0 0 0 0 0 0 0 0 0
## 10 4854 1 1 0 0 0 0 0 0 0 0 0
## 11 4855 1 1 1 1 1 0 0 0 0 0 0
## 12 4857 1 1 1 1 1 1 1 1 1 0 0
## 13 4858 1 1 1 1 1 1 1 1 1 1 1
## 14 4859 1 1 1 1 1 0 0 0 0 0 0
## 15 4861 1 1 1 1 1 1 1 1 1 1 1
## 16 4862 1 1 1 1 1 1 1 1 1 0 0
## 17 4863 1 1 0 0 0 0 0 0 0 0 0
## 18 4864 1 1 0 0 0 0 0 0 0 0 0
## 19 4865 1 1 1 0 0 0 0 0 0 0 0
In {datawizard}
, joining datasets is done with
data_join()
(or its alias data_merge()
).
Contrary to {dplyr}
, this unique function takes care of all
types of join, which are then specified inside the function with the
argument join
(by default, join = "left"
).
Below, we show how to perform the four most common joins: full, left,
right and inner. We will use the datasets band_members
and
band_instruments
provided by {dplyr}
:
band_members
## # A tibble: 3 × 2
## name band
## <chr> <chr>
## 1 Mick Stones
## 2 John Beatles
## 3 Paul Beatles
band_instruments
## # A tibble: 3 × 2
## name plays
## <chr> <chr>
## 1 John guitar
## 2 Paul bass
## 3 Keith guitar
# ---------- datawizard -----------
%>%
band_members data_join(band_instruments, join = "full")
# ---------- tidyverse -----------
%>%
band_members full_join(band_instruments)
## # A tibble: 4 × 3
## name band plays
## * <chr> <chr> <chr>
## 1 Mick Stones <NA>
## 2 John Beatles guitar
## 3 Paul Beatles bass
## 4 Keith <NA> guitar
# ---------- datawizard -----------
%>%
band_members data_join(band_instruments, join = "left")
# ---------- tidyverse -----------
%>%
band_members left_join(band_instruments)
## # A tibble: 3 × 3
## name band plays
## * <chr> <chr> <chr>
## 1 Mick Stones <NA>
## 2 John Beatles guitar
## 3 Paul Beatles bass
# ---------- datawizard -----------
%>%
band_members data_join(band_instruments, join = "right")
# ---------- tidyverse -----------
%>%
band_members right_join(band_instruments)
## # A tibble: 3 × 3
## name band plays
## * <chr> <chr> <chr>
## 1 John Beatles guitar
## 2 Paul Beatles bass
## 3 Keith <NA> guitar
# ---------- datawizard -----------
%>%
band_members data_join(band_instruments, join = "inner")
# ---------- tidyverse -----------
%>%
band_members inner_join(band_instruments)
## # A tibble: 2 × 3
## name band plays
## * <chr> <chr> <chr>
## 1 John Beatles guitar
## 2 Paul Beatles bass
{datawizard}
contains other functions that are not
necessarily included in {dplyr}
or {tidyr}
.
Some of them are inspired from the package janitor
.
We can convert a column in rownames and move rownames to a new column
with rownames_as_column()
and
column_as_rownames()
:
<- head(mtcars)
mtcars mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
<- mtcars %>%
mtcars2 rownames_as_column(var = "model")
mtcars2
## model mpg cyl disp hp drat wt qsec vs am gear carb
## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
%>%
mtcars2 column_as_rownames(var = "model")
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
When dealing with messy data, it is sometimes useful to use a row as
column names, and vice versa. This can be done with
row_to_colnames()
and colnames_to_row()
.
<- data.frame(
x X_1 = c(NA, "Title", 1:3),
X_2 = c(NA, "Title2", 4:6)
) x
## X_1 X_2
## 1 <NA> <NA>
## 2 Title Title2
## 3 1 4
## 4 2 5
## 5 3 6
<- x %>%
x2 row_to_colnames(row = 2)
x2
## Title Title2
## 1 <NA> <NA>
## 3 1 4
## 4 2 5
## 5 3 6
%>%
x2 colnames_to_row()
## x1 x2
## 1 Title Title2
## 11 <NA> <NA>
## 3 1 4
## 4 2 5
## 5 3 6
Note that we use before
and
after
whereas dplyr::relocate()
uses
.before
and .after
.↩︎