This article provides an example of how to use owidR to create a
country-level dataset consisting of multiple variables. It does this in
the context of trying to answer the research question: does higher
internet use lead to higher levels of democracy? This is based on
research done by (citation). We will do a similar analysis using data
gathered from Our World in Data. This analysis is only intended to be an
example of how to use owidR and not a robust research paper. The article
assumes some basic knowledge of the tidyverse, especially dplyr and
ggplot2. If you aren’t familiar with either of these you should still be
able to follow along, but I would recommend reading R for Data Science.
An understanding of basic R is essential.
To begin we’ll load owidR using the library() function. We’ll also load
dplyr, ggplot2, plm and texreg, which we’ll be using to do the analysis.
library(owidR)
library(dplyr)
library(ggplot2)
library(plm)
library(texreg)
Searching for and importing data using owidR is very easy. First, we can
search for data on a topic using owid_search(). We’ll start by searching
for data to use as our explanatory variable: internet use. To do this I
enter the keyword “internet” as the argument in owid_search().
owid_search("internet")
Running this line of code returns around 10 datasets about the
internet. Let’s use the dataset with the title: Share of the
population using the Internet. The corresponding chart_id for this data
is: “share-of-individuals-using-the-internet”. Using the chart_id as
an argument to owid() imports that data into R, and we assign it to an
object called internet. We use the rename argument to give the value
column a shorter, cleaner name.
internet <- owid("share-of-individuals-using-the-internet", rename = "internet_use")
internet
#> # A tibble: 6,660 × 4
#> entity code year internet_use
#> * <chr> <chr> <int> <dbl>
#> 1 Afghanistan AFG 1990 0
#> 2 Afghanistan AFG 1991 0
#> 3 Afghanistan AFG 1992 0
#> 4 Afghanistan AFG 1993 0
#> 5 Afghanistan AFG 1994 0
#> 6 Afghanistan AFG 1995 0
#> 7 Afghanistan AFG 2001 0.00472
#> 8 Afghanistan AFG 2002 0.00456
#> 9 Afghanistan AFG 2003 0.0879
#> 10 Afghanistan AFG 2004 0.106
#> # … with 6,650 more rows
#> # ℹ Use `print(n = ...)` to see more rows
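Before going further it can be useful to check which years each country actually covers, since the output above already shows a gap for Afghanistan between 1995 and 2001. The following is a minimal sketch using dplyr on a small hand-made data frame that has the same column names as the tibble above. The name internet_toy and all of its values are invented for illustration, so that the example runs without downloading anything.

```r
library(dplyr)

# Toy stand-in for the data returned by owid(); the values are invented
internet_toy <- data.frame(
  entity = c("Afghanistan", "Afghanistan", "Albania", "Albania", "Albania"),
  code = c("AFG", "AFG", "ALB", "ALB", "ALB"),
  year = c(2003L, 2004L, 2002L, 2003L, 2004L),
  internet_use = c(0.09, 0.11, 0.4, 1.0, 2.4)
)

# One row per entity: first year, last year and number of observations
coverage <- internet_toy %>%
  group_by(entity) %>%
  summarise(first_year = min(year), last_year = max(year), n_obs = n())

coverage
```

Running the same pipeline on the real internet object should give a quick overview of how complete the series is for each country.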
We can find information about the source of the data by using
owid_source(), with the owid dataset object as the argument. This gives
us the original publisher of the data as well as a link to the data. For
some datasets additional information about how the variables are
calculated is also provided. Using view_chart() takes you to the Our
World in Data webpage for that dataset, where there is also additional
information and a pretty graph.
owid_source(internet)
#> Dataset Name: International Telecommunication Union (via World Bank)
#>
#> Published By: World Development Indicators - World Bank (2022.05.26)
#>
#> Link: https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators
view_chart(internet)
To create a simple plot showing how internet use has changed over time,
use owid_plot(), filtering to give the World total. Given that this
function is a wrapper around ggplot2, you can use normal ggplot2
functions to further manipulate the graph. I’m going to add a title
using labs() and change the y axis scale so that it starts from 0 (this
makes the graph clearer to interpret given that the value is a
percentage, otherwise small variations can appear large).
owid_plot(internet, filter = "World") +
labs(title = "Share of the World Population using the Internet") +
scale_y_continuous(limits = c(0, 100))
#> Loading required namespace: showtext
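To make the "wrapper around ggplot2" point concrete, here is a rough sketch of a similar line chart built directly with ggplot2. It is not the code that owid_plot() runs internally, only an illustration of the kind of plot object you end up with. The data frame world_toy and its values are invented.

```r
library(ggplot2)

# Invented values standing in for the "World" rows of the internet data
world_toy <- data.frame(
  year = c(2000, 2005, 2010, 2015),
  internet_use = c(7, 16, 29, 41)
)

# A plain ggplot2 line chart with the same title and y axis limits
p <- ggplot(world_toy, aes(x = year, y = internet_use)) +
  geom_line() +
  labs(title = "Share of the World Population using the Internet",
       x = NULL, y = "% of population") +
  scale_y_continuous(limits = c(0, 100))

p
```

Because the result is an ordinary ggplot object, layers such as labs() and scale_y_continuous() can be added with + in the usual way.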
We can see how internet use varies between countries by creating a
choropleth map using owid_map(). It shows that, in 2017, there was
still a large variation in the level of internet use between countries,
with many African countries having particularly low use.
owid_map(internet, year = 2017) +
labs(title = "Share of Population Using the Internet, 2017")
It’s also possible to compare countries’ levels of internet use across
time, again using owid_plot(). By using the argument summarise = FALSE,
owid_plot() will show individual countries instead of aggregating them
into the total. You can then use the filter argument to select which
countries you want to be displayed.
owid_plot(internet, summarise = FALSE, filter = c("United Kingdom", "Spain", "Russia", "Egypt", "Nigeria")) +
labs(title = "Share of Population Using the Internet") +
scale_y_continuous(limits = c(0, 100), labels = scales::label_number(suffix = "%")) # The labels argument allows you to make it clear that the value is a percentage
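If you are curious what the labels argument is doing, the helper from the scales package can be tried on its own. This small sketch builds the same labelling function and applies it to a few values. The name percent_label is just for illustration.

```r
library(scales)

# Build a labelling function that appends a percent sign to each value
percent_label <- label_number(suffix = "%")

# Apply it to some axis breaks that are already on a 0-100 scale
percent_label(c(0, 50, 100))
```

label_number() is used here rather than label_percent() because the internet use values are already percentages. label_percent() multiplies by 100 by default, which would be wrong for this data.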
Now let’s get data on democracy, first searching for a data source and
then importing it using owid(). Using that data we’ll do some similar
exploration to what we did with the internet use data.
owid_search("democrac")
democracy <- owid("electoral-democracy", rename = c("electoral_democracy", "vdem_high", "vdem_low"))
democracy
#> # A tibble: 31,995 × 6
#> entity code year electoral_democracy vdem_high vdem_low
#> * <chr> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Afghanistan AFG 1789 0.018 0.026 0.012
#> 2 Afghanistan AFG 1790 0.018 0.026 0.012
#> 3 Afghanistan AFG 1791 0.018 0.026 0.012
#> 4 Afghanistan AFG 1792 0.018 0.026 0.012
#> 5 Afghanistan AFG 1793 0.018 0.026 0.012
#> 6 Afghanistan AFG 1794 0.018 0.026 0.012
#> 7 Afghanistan AFG 1795 0.018 0.026 0.012
#> 8 Afghanistan AFG 1796 0.018 0.026 0.012
#> 9 Afghanistan AFG 1797 0.018 0.026 0.012
#> 10 Afghanistan AFG 1798 0.018 0.026 0.012
#> # … with 31,985 more rows
#> # ℹ Use `print(n = ...)` to see more rows
owid_source(democracy)
#> Value:
#>
#> Dataset Name: OWID based on V-Dem (v12) and Lührmann et al. (2018)
#>
#> Published By: Our World in Data, Bastian Herre
#>
#> Link: http://v-dem.net/vdemds.html
#>
#> This dataset provides information on political regimes, using data from the Varieties of Democracy project (v11.1), and the Regimes of the World classification by Lührmann et al. (2018).
#>
#> We expand the countries and years covered, and refine the coding of the Regimes of the World classification. You can read a detailed description of the data in this post: https://ourworldindata.org/regimes-of-the-world-data
#>
#> You can download the code and complete dataset, including supplementary variables, from GitHub: https://github.com/owid/notebooks/tree/main/BastianHerre/political_regimes
owid_map(democracy, palette = "YlGn") +
labs(title = "Electoral Democracy")
So we’ve done some nice exploratory analysis and produced some pretty graphs, but now let’s get into some more in-depth analysis. We’ll use a fixed effects (FE) regression analysis to estimate the average effect that an increase in internet use has on democracy within a country. If you aren’t familiar with FE regression, this article explains its purpose well and this chapter from Introduction to Econometrics with R shows how to implement it in R. To estimate the effect of internet use we’re going to use a within-unit fixed effects model.
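To give some intuition for what "within-unit" means, here is a small base R sketch on simulated data. It shows that subtracting each country's own mean from the outcome and the predictor, and then running an ordinary regression, gives the same slope as a regression with one dummy variable per country. The data frame panel and every number in it are invented, and this is only an illustration of the idea, not a description of how plm is implemented.

```r
# Toy panel: 3 countries observed over 4 years (invented numbers)
set.seed(1)
panel <- data.frame(
  entity = rep(c("A", "B", "C"), each = 4),
  year = rep(2001:2004, times = 3)
)
panel$x <- rnorm(12) + rep(c(0, 2, 4), each = 4)
panel$y <- 0.5 * panel$x + rep(c(1, -1, 3), each = 4) + rnorm(12, sd = 0.1)

# Within transformation: subtract each country's own mean
panel$x_dm <- panel$x - ave(panel$x, panel$entity)
panel$y_dm <- panel$y - ave(panel$y, panel$entity)

# Slope from the regression on the demeaned variables
within_slope <- coef(lm(y_dm ~ x_dm - 1, data = panel))[["x_dm"]]

# Slope from a regression with one dummy variable per country
lsdv_slope <- coef(lm(y ~ x + factor(entity), data = panel))[["x"]]

# The two slopes agree up to numerical precision
all.equal(within_slope, lsdv_slope)
```

The country-level constants (1, -1 and 3 in the simulation) drop out of the demeaned regression, which is why a fixed effects model removes any confounder that does not change over time within a country.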
This model will require us to adjust for confounding factors, so we’ll
need some extra data. I’m going to use data on variables that I think
might be confounding the relationship between internet use and
democracy. These are: GDP per Capita, Government Expenditure, Age
Dependency and Unemployment. There are almost certainly more confounding
factors, so feel free to use owid_search() to find data on other
variables you think might be confounders and add them to the analysis.
gdp <- owid("gdp-per-capita-worldbank", rename = "gdp")
gov_exp <- owid("total-gov-expenditure-gdp-wdi", rename = "gov_exp")
age_dep <- owid("age-dependency-ratio-of-working-age-population", rename = "age_dep")
unemployment <- owid("unemployment-rate", rename = "unemp")
In order to create an FE model, all these separate dataframes now need
to be combined into one. To do this I’m going to use the left_join()
function from dplyr and create a new dataframe called data that combines
all the other dataframes.
data <- internet %>%
  left_join(democracy) %>%
  left_join(gdp) %>%
  left_join(gov_exp) %>%
  left_join(age_dep) %>%
  left_join(unemployment)
#> Joining, by = c("entity", "code", "year")
#> Joining, by = c("entity", "code", "year")
#> Joining, by = c("entity", "code", "year")
#> Joining, by = c("entity", "code", "year")
#> Joining, by = c("entity", "code", "year")
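The "Joining, by" messages appear because no join columns were specified, so dplyr uses every column name the two tables share. The sketch below shows the same join on two tiny invented data frames with the keys spelled out in the by argument, and makes visible what a left join does with rows that have no match. The names internet_toy and democracy_toy and their values are made up for illustration.

```r
library(dplyr)

# Invented stand-ins for two of the imported datasets
internet_toy <- data.frame(
  entity = c("A", "A", "B"),
  code = c("AAA", "AAA", "BBB"),
  year = c(2014L, 2015L, 2015L),
  internet_use = c(10, 12, 55)
)
democracy_toy <- data.frame(
  entity = c("A", "B", "C"),
  code = c("AAA", "BBB", "CCC"),
  year = c(2015L, 2015L, 2015L),
  electoral_democracy = c(0.3, 0.8, 0.6)
)

# Naming the keys explicitly silences the message and documents the intent
joined <- left_join(internet_toy, democracy_toy,
                    by = c("entity", "code", "year"))

joined
```

Every row of the left-hand table is kept: country A in 2014 has no democracy score, so it gets NA, while country C is dropped because it never appears in the left-hand table. This matters for the analysis because any country-year missing from internet will not be in data at all.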
Now that we have a combined dataset we can get to the analysis.
First, let’s use ggplot2 to create a graph to see the correlation
between internet use and democracy in 2015.
data %>%
  filter(year == 2015) %>%
  ggplot(aes(internet_use, electoral_democracy)) +
  geom_point(colour = "#57677D") +
  geom_smooth(method = "lm", colour = "#DC5E78") +
  labs(title = "Relationship Between Internet Use and Electoral Democracy", x = "Internet Use", y = "Electoral Democracy") +
  theme_owid()
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 43 rows containing non-finite values (stat_smooth).
#> Warning: Removed 43 rows containing missing values (geom_point).
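The warnings above tell us that rows with missing values were dropped before plotting. If you want a single number to go with the graph, the correlation can be computed in a way that handles missing values explicitly. The following is a minimal sketch on an invented data frame called toy. With the real data you would first filter data to 2015 as in the plotting code.

```r
# Invented values with some gaps, mimicking one year of the combined data
toy <- data.frame(
  internet_use = c(5, 20, 40, 60, 80, NA),
  electoral_democracy = c(0.2, 0.35, 0.5, 0.7, NA, 0.9)
)

# Pearson correlation using only rows where both variables are present
r <- cor(toy$internet_use, toy$electoral_democracy, use = "complete.obs")

# Number of rows that actually went into the calculation
n_used <- sum(complete.cases(toy))

r
n_used
```

Reporting n_used alongside the correlation makes it clear how much of the data the estimate is based on.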
There appears to be some relationship but this could easily be explained by countries with higher development also being more democratic and not actually the result of internet access. That’s why we control for GDP and the other confounders. Next, we’ll create two models, one with just internet use and democracy, and the other with the confounders added.
fe_model <- plm(electoral_democracy ~ internet_use, data,
                effect = c("individual"), index = "entity")
fe_model_2 <- plm(electoral_democracy ~ internet_use + gdp + gov_exp + age_dep + unemp, data,
                  effect = c("individual"), index = "entity")
htmlreg(list(fe_model, fe_model_2))
| | Model 1 | Model 2 |
|---|---|---|
| internet_use | 0.00*** | 0.00 |
| | (0.00) | (0.00) |
| gdp | | 0.00 |
| | | (0.00) |
| gov_exp | | 0.00** |
| | | (0.00) |
| age_dep | | -0.00*** |
| | | (0.00) |
| unemp | | 0.00*** |
| | | (0.00) |
| R2 | 0.03 | 0.08 |
| Adj. R2 | -0.01 | 0.03 |
| Num. obs. | 5313 | 2700 |

***p < 0.001; **p < 0.01; *p < 0.05
You can see that internet use has a statistically significant positive effect in the first model, but once the confounders are added the effect is no longer significant (the coefficients themselves are small enough that they round to 0.00 at two decimal places). This means that our model provides no evidence that internet use has an effect on democracy. However, feel free to play around with this data yourself and see if you get a different result when other variables are used.
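One thing to keep in mind when comparing the two columns is that they are estimated on different numbers of observations (5313 versus 2700), because rows with missing confounders are dropped from the second model. A possible way to check whether that matters is to restrict the data to rows that are complete for every variable before fitting both models. The sketch below shows the filtering step on an invented data frame. The names toy and common_sample are hypothetical, and only one confounder is included to keep it short.

```r
library(dplyr)

# Invented data with gaps in one of the confounders
toy <- data.frame(
  entity = c("A", "A", "B", "B", "C"),
  electoral_democracy = c(0.3, 0.4, 0.8, 0.8, 0.6),
  internet_use = c(10, 12, 55, 60, 30),
  gdp = c(1000, NA, 30000, 31000, NA)
)

# Keep only rows where every model variable is observed
common_sample <- toy %>%
  filter(!is.na(electoral_democracy), !is.na(internet_use), !is.na(gdp))

nrow(toy)
nrow(common_sample)
```

Fitting both models on a data frame filtered like this would give them the same "Num. obs.", so any change in the internet_use coefficient could be attributed to the added controls rather than to a change in sample.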