The flight data in the flightsbr package is downloaded from Brazil’s Civil Aviation Agency (ANAC). It covers every international flight to and from Brazil, as well as domestic flights within the country. Each record holds flight-level information such as the airports of origin and destination, flight duration, aircraft type, payload, number of passengers, and several other variables.
Now we can load some libraries we’ll use in this vignette:
library(flightsbr)
library(data.table)
library(ggplot2)
Download data of all flights:
# in a given month of a given year (yyyymm)
df_201506 <- read_flights(date=201506, showProgress = FALSE)

# in a given year (yyyy)
df_2015 <- read_flights(date=2015, showProgress = FALSE)
If you already know which data columns you need, you can pass a vector with their names to the select parameter, and read_flights() will load only those columns. This makes the function a bit faster.
df_201506 <- read_flights(date=201506,
                          showProgress = FALSE,
                          select = c('id_empresa', 'nr_voo', 'dt_partida_real',
                                     'sg_iata_origem' , 'sg_iata_destino'))
head(df_201506)
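As a quick illustration of what can be done with only these columns, the sketch below counts flights per origin-destination pair. It runs on a small hand-made table instead of ANAC data, so the values are made up; only the column names come from the example above.

```r
library(data.table)

# toy stand-in for the output of read_flights(); all values are made up
toy <- data.table(
  id_empresa      = c("AAA", "AAA", "BBB", "BBB"),
  nr_voo          = c("1001", "1002", "2001", "2002"),
  sg_iata_origem  = c("GRU", "GRU", "BSB", "GRU"),
  sg_iata_destino = c("BSB", "BSB", "GRU", "SDU")
)

# number of flights per route, most frequent route first
routes <- toy[, .(n_flights = .N), by = .(sg_iata_origem, sg_iata_destino)]
setorder(routes, -n_flights)
routes
```

The same grouping should carry over to a table returned by read_flights(), provided the two airport columns were included in select.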
The package makes it easy to compare the daily number of passengers across different years. In the example below we compare the daily number of air passengers in Brazil in 2019 and 2020. This gives us a glimpse of the impact of COVID-19 on Brazilian aviation, similar to the study by Bazzo, Braga and Pereira (2021).
# download flights data
df_2019 <- read_flights(date=2019, showProgress = TRUE)
df_2020 <- read_flights(date=2020, showProgress = TRUE)

# count daily passengers
count_2019 <- df_2019[, .(total_pass = sum(nr_passag_pagos, na.rm=TRUE)) , by = dt_partida_real]
count_2020 <- df_2020[, .(total_pass = sum(nr_passag_pagos, na.rm=TRUE)) , by = dt_partida_real]

# keep only departures within each calendar year
count_2019 <- count_2019[ between(dt_partida_real, as.Date('2019-01-01'), as.Date('2019-12-31')) ]
count_2020 <- count_2020[ between(dt_partida_real, as.Date('2020-01-01'), as.Date('2020-12-31')) ]

# reformat date: map both years onto the same placeholder year (2030)
# so that they share a single x axis
count_2019[, date := paste0("2030-", format(dt_partida_real, "%m-%d"))]
count_2019[, date := as.IDate(date, format="%Y-%m-%d") ]

count_2020[, date := paste0("2030-", format(dt_partida_real, "%m-%d"))]
count_2020[, date := as.IDate(date, format="%Y-%m-%d") ]
# plot
# unit_format() comes from the scales package, which is not attached above,
# so it is called with an explicit namespace
fig <- ggplot() +
  geom_point( data= count_2019, aes(x=date, y=total_pass, color='gray50'), alpha=.4, size=1) +
  geom_point( data= count_2020, aes(x=date, y=total_pass, color='#006890') , alpha=.7, size=1) +
  scale_y_log10(name="Number of Passengers",
                labels = scales::unit_format(unit = ""), limits=c(1000,NA)) +
  scale_x_date(date_breaks = "1 months", date_labels = "%b") +
  labs(subtitle ='Daily number of air passengers in Brazil', color = "Legend") +
  scale_color_identity(labels = c("2020", "2019"), name = "", guide = "legend") +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        axis.text = element_text(size = 7),
        axis.title=element_text(size=9),
        plot.background = element_rect(fill='white', colour='white'))
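The date alignment used in this example can be checked on a small synthetic table. The sketch below builds made-up daily counts for two years and moves their dates onto a common placeholder year, so both series can share one x axis. The helper align_year() is hypothetical and not part of flightsbr. It uses the leap year 2024 as the placeholder so that 29 February 2020 still maps to a valid date; that choice is an assumption of this sketch, since with a non-leap placeholder year that day has no counterpart and becomes NA.

```r
library(data.table)

# synthetic daily counts for two years; the numbers are made up
toy_2019 <- data.table(
  dt_partida_real = as.Date(c("2019-02-28", "2019-03-01")),
  total_pass      = c(100, 120)
)
toy_2020 <- data.table(
  dt_partida_real = as.Date(c("2020-02-28", "2020-02-29", "2020-03-01")),
  total_pass      = c(90, 95, 80)
)

# hypothetical helper: keep month and day, replace the year
# (2024 is a leap year, so Feb 29 remains a valid date)
align_year <- function(x, year = 2024) {
  as.IDate(paste0(year, "-", format(x, "%m-%d")), format = "%Y-%m-%d")
}

toy_2019[, date := align_year(dt_partida_real)]
toy_2020[, date := align_year(dt_partida_real)]

# stack both years, labelling each row with the year it came from
toy <- rbindlist(list(`2019` = toy_2019, `2020` = toy_2020), idcol = "year")
toy[order(date, year)]
```

Stacking the two tables with a year label is one possible design; it would allow a single geom_point() layer with the color mapped to year, instead of one layer per year as in the plot above.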