This vignette shows how to use the R package covid19br to download and explore data on the COVID-19 pandemic in Brazil and around the world. The package downloads datasets from the following repositories:
The official Brazilian repository provided by the Brazilian government: https://covid.saude.gov.br;
The Johns Hopkins University’s repository: https://github.com/CSSEGISandData/COVID-19
The latter repository has data on the COVID-19 pandemic at the global level (daily counts of confirmed cases, deaths, and recovered patients by country and territory) and has been widely used around the world as a reliable source of data on the pandemic. The former repository, on the other hand, provides data on the Brazilian territory at the city, state, region, and national levels.
We hope that this package may be helpful to other researchers and scientists to understand and fight this terrible pandemic that has been plaguing the world.
We get started by showing how to load COVID-19 data into R, downloading the national-level data set from the official Brazilian repository https://covid.saude.gov.br:
library(covid19br)
library(tidyverse)
# downloading the data (at national level):
brazil <- downloadCovid19("brazil")
# looking at the downloaded data:
glimpse(brazil)
#> Rows: 764
#> Columns: 9
#> $ date <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
# plotting the cumulative number of deaths:
ggplot(brazil, aes(x = date, y = accumDeaths)) +
  geom_point() +
  geom_path()
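As a quick sanity check on how the columns relate, the sketch below copies the first values of newCases and accumCases from the glimpse() output above and confirms that, for these rows, the cumulative series is the running sum of the daily counts. It only uses the values printed in this vignette; we have not checked that the identity holds for every row of the full data set.

```r
# First values of newCases and accumCases, copied from the glimpse() output.
newCases   <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25)
accumCases <- c(0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77)

# The cumulative series is the running sum of the daily counts ...
cumsum_matches <- all(cumsum(newCases) == accumCases)

# ... and the daily counts are recovered by differencing.
diff_matches <- all(diff(c(0, accumCases)) == newCases)

c(cumsum_matches, diff_matches)
```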
Next, we show how to draw a plot with the daily count of new deaths along with its moving average. Here, we use the function pracma::movavg() to compute a 7-day simple moving average.
library(pracma)
# computing the moving average:
brazil <- brazil %>%
  mutate(
    ma_newDeaths = movavg(newDeaths, n = 7, type = "s")
  )
# looking at the transformed data:
glimpse(brazil)
#> Rows: 764
#> Columns: 10
#> $ date <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
#> $ ma_newDeaths <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
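To make the smoothing step concrete, here is a minimal base R sketch of a trailing simple moving average. The helper `trailing_mean()` is hypothetical and not part of pracma or covid19br. It assumes that each value is the mean of the current and the previous n - 1 observations, with a shorter window at the start of the series, which is our understanding of `type = "s"`; the input vector is toy data.

```r
# Hypothetical helper: trailing simple moving average with window size n.
# For the first n - 1 positions, the mean is taken over the values seen so far.
trailing_mean <- function(x, n = 7) {
  vapply(seq_along(x), function(k) {
    mean(x[max(1, k - n + 1):k])
  }, numeric(1))
}

# Toy series (not real data), smoothed with a 3-point window.
x <- c(0, 0, 7, 14, 7, 0, 0, 7)
smoothed <- trailing_mean(x, n = 3)
smoothed
```

A trailing window uses only past values, so the smoothed curve lags behind the raw counts by a few days.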
After computing the desired moving average, it is convenient to reorganize the data into the so-called tidy (long) format, with one row per date and series. This can be done easily with the function tidyr::pivot_longer():
deaths <- brazil %>%
  select(date, newDeaths, ma_newDeaths) %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    values_to = "deaths", names_to = "type"
  ) %>%
  mutate(
    type = recode(type,
      ma_newDeaths = "moving average",
      newDeaths = "count"
    )
  )
# looking at the (tidy) data:
glimpse(deaths)
#> Rows: 1,528
#> Columns: 3
#> $ date <date> 2020-02-25, 2020-02-25, 2020-02-26, 2020-02-26, 2020-02-27, 20…
#> $ type <chr> "count", "moving average", "count", "moving average", "count", …
#> $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
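The reshaping above doubles the number of rows (from 764 to 1,528) because each date now appears once per series. The following sketch illustrates this on a three-row toy data frame; the values are made up for illustration and are not taken from the repository.

```r
library(tidyr)

# Toy data in wide format: one row per date, one column per series.
wide <- data.frame(
  date = as.Date("2020-03-17") + 0:2,
  newDeaths = c(1, 3, 2),
  ma_newDeaths = c(1, 2, 2)
)

# Long format: one row per (date, series) pair.
long <- pivot_longer(
  wide,
  cols = c("newDeaths", "ma_newDeaths"),
  names_to = "type", values_to = "deaths"
)

nrow(long)  # 6, i.e. twice the number of rows in `wide`
```

The long format is what lets ggplot2 map the series name to the color aesthetic with a single `color = type`.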
# drawing the desired plot:
ggplot(deaths, aes(x = date, y = deaths, color = type)) +
  geom_point() +
  geom_path() +
  theme(legend.position = "bottom")
When dealing with epidemiological data, we are often interested in computing quantities such as incidence, mortality, and lethality rates. The function covid19br::add_epi_rates() can be used to add those rates to the downloaded data, as shown below:
# downloading the data (region level):
regions <- downloadCovid19("regions")
# adding the rates to the downloaded data:
regions <- regions %>%
  add_epi_rates()
# looking at the data:
glimpse(regions)
#> Rows: 3,820
#> Columns: 13
#> $ region <chr> "Midwest", "Midwest", "Midwest", "Midwest", "Midwest", "M…
#> $ date <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 3, 4, …
#> $ accumCases <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 9, …
#> $ newDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop <dbl> 16297074, 16297074, 16297074, 16297074, 16297074, 1629707…
#> $ incidence <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.000…
#> $ lethality <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ mortality <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
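For readers who want to see what such rates look like when computed by hand, here is a minimal sketch. The helper `epi_rates()` is hypothetical and is not the implementation of add_epi_rates(). It assumes incidence and mortality are expressed per 100,000 inhabitants and lethality as a percentage of confirmed cases; the lethality scaling agrees with the capitals table at the end of this vignette, while the per-100,000 scaling is an assumption.

```r
# Hypothetical helper; the per-100,000 and percentage scalings are assumptions.
epi_rates <- function(accumCases, accumDeaths, pop) {
  data.frame(
    incidence = 1e5 * accumCases / pop,
    mortality = 1e5 * accumDeaths / pop,
    # report 0 instead of NaN when no cases have been recorded yet
    lethality = ifelse(accumCases > 0, 100 * accumDeaths / accumCases, 0)
  )
}

# Counts from the São Luís row of the capitals table; `pop` is a made-up
# placeholder, so only the lethality value is comparable with the table.
rates <- epi_rates(accumCases = 58090, accumDeaths = 2698, pop = 1e6)
round(rates$lethality, 2)  # 4.64
```

Whatever the scaling constant, mortality divided by incidence equals lethality divided by 100, which gives a simple consistency check on the three columns.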
The function plotly::ggplotly() can be used to draw an interactive plot as follows:
library(plotly)
p <- ggplot(regions, aes(x = date, y = mortality, color = region)) +
  geom_point() +
  geom_path()
ggplotly(p)
In our last example, we build a table summarizing the COVID-19 pandemic in the 27 Brazilian capitals on 2022-03-29, the most recent date in the downloaded data.
library(kableExtra)
cities <- downloadCovid19("cities")
# keeping only the capitals on the most recent date and adding the rates:
capitals <- cities %>%
  filter(capital == TRUE, date == max(date)) %>%
  add_epi_rates() %>%
  select(
    region, state, city, newCases, newDeaths, accumCases, accumDeaths,
    incidence, mortality, lethality
  ) %>%
  arrange(desc(lethality), desc(mortality), desc(incidence))
# printing the table:
capitals %>%
  kable(
    full_width = FALSE,
    caption = "Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states."
  )
region | state | city | newCases | newDeaths | accumCases | accumDeaths | incidence | mortality | lethality |
---|---|---|---|---|---|---|---|---|---|
Northeast | MA | São Luís | 216 | 37 | 58090 | 2698 | 5271.880 | 244.8534 | 4.64 |
Southeast | SP | São Paulo | 441 | 25 | 1049194 | 42085 | 8563.435 | 343.4943 | 4.01 |
Southeast | RJ | Rio de Janeiro | 1097 | 14 | 950318 | 36653 | 14143.946 | 545.5206 | 3.86 |
North | PA | Belém | 124 | 0 | 137601 | 5300 | 9217.984 | 355.0506 | 3.85 |
North | AM | Manaus | 28 | 0 | 290265 | 9694 | 13298.054 | 444.1160 | 3.34 |
South | PR | Curitiba | 24 | 2 | 245942 | 8166 | 12722.641 | 422.4292 | 3.32 |
Northeast | CE | Fortaleza | 28 | 13 | 361022 | 10974 | 13524.756 | 411.1126 | 3.04 |
Northeast | BA | Salvador | 129 | 5 | 292684 | 8606 | 10189.716 | 299.6156 | 2.94 |
Midwest | MT | Cuiabá | 79 | 1 | 129374 | 3668 | 21120.665 | 598.8112 | 2.84 |
Northeast | PE | Recife | 736 | 7 | 222818 | 6087 | 13539.184 | 369.8669 | 2.73 |
Northeast | AL | Maceió | 35 | 2 | 115474 | 3045 | 11332.669 | 298.8376 | 2.64 |
Midwest | GO | Goiânia | 699 | 2 | 285625 | 7443 | 18839.295 | 490.9265 | 2.61 |
North | RO | Porto Velho | 122 | 0 | 105807 | 2658 | 19980.776 | 501.9413 | 2.51 |
South | RS | Porto Alegre | 224 | 4 | 246318 | 6189 | 16600.810 | 417.1129 | 2.51 |
Midwest | MS | Campo Grande | 42 | 3 | 188098 | 4419 | 20993.502 | 493.2019 | 2.35 |
Northeast | PI | Teresina | 0 | 4 | 119192 | 2764 | 13781.892 | 319.5948 | 2.32 |
Northeast | RN | Natal | 0 | 0 | 131402 | 2931 | 14862.428 | 331.5153 | 2.23 |
Northeast | PB | João Pessoa | 24 | 0 | 146214 | 3178 | 18073.089 | 392.8234 | 2.17 |
Southeast | MG | Belo Horizonte | 2183 | 0 | 373286 | 7617 | 14859.697 | 303.2161 | 2.04 |
North | AP | Macapá | 2 | 0 | 82691 | 1578 | 16428.882 | 313.5139 | 1.91 |
North | AC | Rio Branco | 0 | 0 | 62933 | 1178 | 15450.544 | 289.2082 | 1.87 |
Northeast | SE | Aracaju | 1 | 2 | 150168 | 2537 | 22856.169 | 386.1415 | 1.69 |
Midwest | DF | Brasília | 285 | 3 | 691980 | 11579 | 22949.204 | 384.0123 | 1.67 |
North | RR | Boa Vista | 9 | 0 | 119553 | 1618 | 29947.171 | 405.2974 | 1.35 |
Southeast | ES | Vitória | 98 | 2 | 107806 | 1404 | 29772.685 | 387.7414 | 1.30 |
South | SC | Florianópolis | 115 | 0 | 120285 | 1231 | 24010.276 | 245.7218 | 1.02 |
North | TO | Palmas | 4 | 0 | 73745 | 727 | 24653.408 | 243.0406 | 0.99 |