library(knitr)
library(kableExtra)
This guide is an entry in a series of proposed vignettes in which we walk through a deep cleaning or exploratory data analysis (EDA) of a widely employed environment-security dataset. For this entry, we will explore the Varieties of Democracy dataset (V-Dem; (Coppedge2020?)). V-Dem is a massive dataset that aims to provide quantitative assessments of historical and contemporary nation-state democracy. V-Dem provides both multidimensional and disaggregated measures of democracy across five primary principles: electoral, liberal, participatory, deliberative, and egalitarian (Pemstein2018?). The V-Dem team consists of dozens of scientists spread across the globe working with thousands of local experts to quantify local and regional aspects of democracy.
V-Dem is not alone in its efforts to quantify qualitative aspects of nation-state democracy, civil liberties, and elections. Similar datasets include Polity5 (Marshall2002?), Freedom House’s Freedom in the World, Countries at the Crossroads, and Freedom of the Press (FreedomHouse2014?), and the Institutions and Elections Project (Wig2015?). Although these datasets are similar in many ways, V-Dem stands out with the sheer number of metrics included. V-Dem features over 470 indicators, 82 indices, and 5 high-level metrics. That is an overwhelming amount of data on par with the World Development Indicators (TheWorldBank2017?). Let’s get started.
The most recent V-Dem dataset is available from the V-Dem data homepage in preconfigured csv, SPSS, and STATA formats; however, there is a recommended package, vdemdata, available to R users on GitHub. Installing the remote package from GitHub requires devtools. As a non-standard package (not on CRAN or Bioconductor), vdemdata can cause issues for certain workflows, but you can use the demcon::get_vdem() function to directly download the latest dataset from vdemdata’s GitHub repo without installing the non-standard package.
For this guide we’ll be using data.table, but all of these steps could be performed with dplyr and the greater tidyverse, or even base R if you’re a sadist. Lastly, to assist with country coding, we’ll be using the states package.
# If you would like to install the package over GitHub.
devtools::install_github("vdeminstitute/vdemdata")
After the packages are installed, load data.table and states (and vdemdata, if you installed it).
# library(vdemdata)
library(data.table)
library(states)
We’ll get the dataset with demcon::get_vdem().
vdem <- demcon::get_vdem(write_out = FALSE)
data.table::setDT(vdem)
For the purposes of this guide we’ll focus on 2 widely used high-level metrics from vdem: v2x_libdem and v2x_polyarchy. The codebook can be filtered to provide greater context.
metrics <- c('v2x_libdem', 'v2x_polyarchy')
The V-Dem codebook reveals that these are 2 high-level (vartype==D) democracy indices quantifying the extent of electoral (v2x_polyarchy) and liberal (v2x_libdem) democracy. Both metrics are continuous variables bounded between 0 and 1. In addition to our desired indices, we should also subset the raw data for identification variables such as country names, observation year, coding schemes that assist with harmonizing V-Dem data with other datasets, and indicators for country start and stop dates to manage secessions, civil wars, etc.
id.vars <- c('country_name', 'COWcode', 'histname', 'codingstart_contemp', 'codingend_contemp', 'year')
vars <- c(id.vars, metrics)
Now we can subset the raw data and toss what we don’t need.
vdem <- vdem[, ..vars]
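The `..vars` prefix tells data.table to look up `vars` in the calling scope instead of treating it as a column name. A minimal sketch on a toy table (the table and its values are made up for illustration, not taken from V-Dem):

```r
library(data.table)

# Toy table standing in for the raw data (illustrative values only)
toy <- data.table(country_name = c("A", "B"),
                  year = c(2000L, 2000L),
                  v2x_libdem = c(0.5, 0.7),
                  extra = c(1, 2))

keep <- c("country_name", "year", "v2x_libdem")

# `..keep` refers to the character vector `keep` defined outside the table
toy_sub <- toy[, ..keep]

# Equivalent, more explicit spelling of the same selection
toy_sub2 <- toy[, keep, with = FALSE]
```

Both forms return a new data.table holding only the named columns, so the unwanted columns are dropped as soon as the result is assigned back.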
We’ll perform a last bit of pruning for temporal considerations. V-Dem has a large historical record dating back to 1789. This is valuable data, but far more than many practitioners or analysts require. More commonly, analyses will start just before or after key events, e.g. WWII, the Cold War, and the War on Terror. Practically speaking, when preparing historical country-year data, we are most concerned with the headaches brought on by coding nation-state secessions, independence, unifications, etc.
With this in mind, important periods to consider/avoid are: Sudan 2011, Yugoslavia/Kosovo/Serbia/Montenegro 2003-2008, Eritrea 1993, Czech/Slovakia 1993, the complicated Yugoslavian dissolution, and Cold War fallout 1989-1991. Sudan is usually an easy check, but Yugoslavia/Kosovo/Serbia/Montenegro are almost always a real pain to manage across multiple datasets and they usually must be included in the analysis. For the purpose of this guide we will subset our data to the years after 1950 and investigate any issues associated with Yugoslavia/Kosovo/Serbia/Montenegro.
vdem <- vdem[year>1950]
The most important issue to address with country-year datasets is accurate annual country codes. This includes nation-state secessions and independence (Sudan, Yugoslavia), independently listed territories (Hong Kong, Puerto Rico, Guam, French Guiana), and states with limited international recognition (Kosovo, West Bank/Palestine, Taiwan). These issues afflict international datasets in a wide variety of ways. Before you attempt to “fix” these issues, it’s important to consider how they will be addressed in all the datasets required for your analysis. Do not spend copious amounts of time coding changes to Kosovo and the West Bank if they’re completely ignored in your other datasets of concern.
V-Dem contains Correlates of War (CoW; COWcode) country codes. This is a popular coding scheme that makes country-coding an easier task. We’ll start by renaming the variable, because we will have to manipulate it a lot.
# rename by column name rather than by position
data.table::setnames(vdem, "COWcode", "cow")
The states package can serve as a reference to check Correlates of War and Gleditsch and Ward country codes. Both are embedded in the package and available with calls to states::cowstates or states::gwstates. Let’s start by checking if any CoW codes are missing.
unique(vdem[is.na(cow),country_name])
#> [1] "Palestine/West Bank" "Palestine/Gaza" "Somaliland"
#> [4] "Hong Kong"
It may seem like the easy way out, but these states are commonly ignored in popular environment-security datasets, and can usually be dropped from analysis. One dataset where they would be included is United Nations refugee and asylum seeker data, in which case, you would have to introduce ISO codes to harmonize them with other United Nations data. This could be done with minimal trouble using the countrycode package, but will likely lead to other issues.
library(countrycode)
vdem[, iso3 := countrycode::countrycode(cow,
                                        origin = "cown",
                                        destination = "iso3c")]
#> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: 260, 265, 315, 345, 347, 511, 678, 680, 817
Now we go down the rabbit hole: which countries were not matched unambiguously?
vdem[cow %in% c(260, 265, 315, 345, 347, 511, 678, 680, 817), unique(country_name)]
#> [1] "Yemen" "South Yemen"
#> [3] "Republic of Vietnam" "Kosovo"
#> [5] "Germany" "German Democratic Republic"
#> [7] "Czech Republic" "Serbia"
#> [9] "Zanzibar"
These require hard-coded fixes to their ISO3 values. That is beyond the scope of this vignette, so we will drop the observations with a missing cow value in V-Dem and move on, but I wanted to illustrate the beginning of a country code black hole.
vdem <- vdem[!is.na(cow)][, iso3 := NULL]
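For readers who do need the ISO3 codes, one way to apply hard-coded fixes is a small override table and a data.table update join. This is only a sketch on toy data: the override table `fixes` and the ISO3 values in it are assumptions for illustration ("XKX" is a user-assigned, unofficial code often used for Kosovo), not decisions made in this guide.

```r
library(data.table)

# Toy table with CoW codes and an incomplete ISO3 column (illustrative only)
toy <- data.table(cow = c(2, 345, 347),
                  iso3 = c("USA", NA, NA))

# Hypothetical override table; the ISO3 choices here are assumptions
fixes <- data.table(cow = c(345, 347),
                    iso3_fix = c("SRB", "XKX"))

# Update join: overwrite iso3 only for rows whose cow code appears in `fixes`
toy[fixes, on = "cow", iso3 := i.iso3_fix]
```

Keeping the overrides in their own table makes the manual decisions easy to review and to reuse across datasets.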
Official CoW codes for Yugoslavia, Serbia, Montenegro, and Kosovo are 345, 345, 341, and 347, respectively. CoW maintains the 345 numeric AND YUG character designations for Serbia after the Yugoslavia break. CoW assigns Montenegro 341 starting in 2006 and Kosovo 347 in 2008 (review these changes in states::cowstates).
Check how V-Dem assigns these changes.
dcast(vdem[cow %in% c(345, 341, 347), .(country_name, cow, year)],
      year~cow, value.var = "country_name")
#> Key: <year>
#> year 341 345 347
#> <num> <char> <char> <char>
#> 1: 1951 <NA> Serbia <NA>
#> 2: 1952 <NA> Serbia <NA>
#> 3: 1953 <NA> Serbia <NA>
#> 4: 1954 <NA> Serbia <NA>
#> 5: 1955 <NA> Serbia <NA>
#> 6: 1956 <NA> Serbia <NA>
#> 7: 1957 <NA> Serbia <NA>
#> 8: 1958 <NA> Serbia <NA>
#> 9: 1959 <NA> Serbia <NA>
#> 10: 1960 <NA> Serbia <NA>
#> 11: 1961 <NA> Serbia <NA>
#> 12: 1962 <NA> Serbia <NA>
#> 13: 1963 <NA> Serbia <NA>
#> 14: 1964 <NA> Serbia <NA>
#> 15: 1965 <NA> Serbia <NA>
#> 16: 1966 <NA> Serbia <NA>
#> 17: 1967 <NA> Serbia <NA>
#> 18: 1968 <NA> Serbia <NA>
#> 19: 1969 <NA> Serbia <NA>
#> 20: 1970 <NA> Serbia <NA>
#> 21: 1971 <NA> Serbia <NA>
#> 22: 1972 <NA> Serbia <NA>
#> 23: 1973 <NA> Serbia <NA>
#> 24: 1974 <NA> Serbia <NA>
#> 25: 1975 <NA> Serbia <NA>
#> 26: 1976 <NA> Serbia <NA>
#> 27: 1977 <NA> Serbia <NA>
#> 28: 1978 <NA> Serbia <NA>
#> 29: 1979 <NA> Serbia <NA>
#> 30: 1980 <NA> Serbia <NA>
#> 31: 1981 <NA> Serbia <NA>
#> 32: 1982 <NA> Serbia <NA>
#> 33: 1983 <NA> Serbia <NA>
#> 34: 1984 <NA> Serbia <NA>
#> 35: 1985 <NA> Serbia <NA>
#> 36: 1986 <NA> Serbia <NA>
#> 37: 1987 <NA> Serbia <NA>
#> 38: 1988 <NA> Serbia <NA>
#> 39: 1989 <NA> Serbia <NA>
#> 40: 1990 <NA> Serbia <NA>
#> 41: 1991 <NA> Serbia <NA>
#> 42: 1992 <NA> Serbia <NA>
#> 43: 1993 <NA> Serbia <NA>
#> 44: 1994 <NA> Serbia <NA>
#> 45: 1995 <NA> Serbia <NA>
#> 46: 1996 <NA> Serbia <NA>
#> 47: 1997 <NA> Serbia <NA>
#> 48: 1998 Montenegro Serbia <NA>
#> 49: 1999 Montenegro Serbia Kosovo
#> 50: 2000 Montenegro Serbia Kosovo
#> 51: 2001 Montenegro Serbia Kosovo
#> 52: 2002 Montenegro Serbia Kosovo
#> 53: 2003 Montenegro Serbia Kosovo
#> 54: 2004 Montenegro Serbia Kosovo
#> 55: 2005 Montenegro Serbia Kosovo
#> 56: 2006 Montenegro Serbia Kosovo
#> 57: 2007 Montenegro Serbia Kosovo
#> 58: 2008 Montenegro Serbia Kosovo
#> 59: 2009 Montenegro Serbia Kosovo
#> 60: 2010 Montenegro Serbia Kosovo
#> 61: 2011 Montenegro Serbia Kosovo
#> 62: 2012 Montenegro Serbia Kosovo
#> 63: 2013 Montenegro Serbia Kosovo
#> 64: 2014 Montenegro Serbia Kosovo
#> 65: 2015 Montenegro Serbia Kosovo
#> 66: 2016 Montenegro Serbia Kosovo
#> 67: 2017 Montenegro Serbia Kosovo
#> 68: 2018 Montenegro Serbia Kosovo
#> 69: 2019 Montenegro Serbia Kosovo
#> 70: 2020 Montenegro Serbia Kosovo
#> 71: 2021 Montenegro Serbia Kosovo
#> year 341 345 347
Thankfully the codes themselves are correct. However, V-Dem maintains independent listings for all three states even while they were unified under various arrangements between 1992 and 2006. The course of action here depends on your intended use and additional datasets. Taking the mean of Serbia and Montenegro (maybe even Kosovo) over this time period is one potential correction. For this guide, we will average Serbia and Montenegro. You may want to consider doing the same for Kosovo and Serbia or all 3 states.
for(i in 1995:2005) vdem[cow %in% c(341,345) & year==i, (metrics):=lapply(.SD, mean, na.rm = TRUE), .SDcols = metrics]
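The same averaging can be written without an explicit loop by grouping on year. A minimal sketch on a toy table (the values and the name `metrics_toy` are made up for illustration); it is meant to show the grouping idiom, not to replace the step above:

```r
library(data.table)

# Toy country-year table: two states to be averaged (341, 345) and one other
toy <- data.table(cow = c(341, 345, 341, 345, 2),
                  year = c(2000L, 2000L, 2001L, 2001L, 2000L),
                  v2x_libdem = c(0.2, 0.4, 0.3, 0.5, 0.8))
metrics_toy <- "v2x_libdem"

# Within each year, replace both states' values with their joint mean;
# rows outside the filter (cow 2 here) are left untouched
toy[cow %in% c(341, 345) & year %in% 2000:2001,
    (metrics_toy) := lapply(.SD, mean, na.rm = TRUE),
    by = year, .SDcols = metrics_toy]
```

Note that in years where only one of the two states has a row, the "mean" is simply that state's own value.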
The coverage and coding for Kosovo are correct; it can be left in if your other data of interest recognize the state.
Sudan (625) and South Sudan (626) split in 2011. Check them in V-Dem.
dcast(vdem[cow %in% c(625,626), .(country_name, cow, year)],year~cow, value.var = "country_name")
#> Key: <year>
#> year 625 626
#> <num> <char> <char>
#> 1: 1951 Sudan <NA>
#> 2: 1952 Sudan <NA>
#> 3: 1953 Sudan <NA>
#> 4: 1954 Sudan <NA>
#> 5: 1955 Sudan <NA>
#> 6: 1956 Sudan <NA>
#> 7: 1957 Sudan <NA>
#> 8: 1958 Sudan <NA>
#> 9: 1959 Sudan <NA>
#> 10: 1960 Sudan <NA>
#> 11: 1961 Sudan <NA>
#> 12: 1962 Sudan <NA>
#> 13: 1963 Sudan <NA>
#> 14: 1964 Sudan <NA>
#> 15: 1965 Sudan <NA>
#> 16: 1966 Sudan <NA>
#> 17: 1967 Sudan <NA>
#> 18: 1968 Sudan <NA>
#> 19: 1969 Sudan <NA>
#> 20: 1970 Sudan <NA>
#> 21: 1971 Sudan <NA>
#> 22: 1972 Sudan <NA>
#> 23: 1973 Sudan <NA>
#> 24: 1974 Sudan <NA>
#> 25: 1975 Sudan <NA>
#> 26: 1976 Sudan <NA>
#> 27: 1977 Sudan <NA>
#> 28: 1978 Sudan <NA>
#> 29: 1979 Sudan <NA>
#> 30: 1980 Sudan <NA>
#> 31: 1981 Sudan <NA>
#> 32: 1982 Sudan <NA>
#> 33: 1983 Sudan <NA>
#> 34: 1984 Sudan <NA>
#> 35: 1985 Sudan <NA>
#> 36: 1986 Sudan <NA>
#> 37: 1987 Sudan <NA>
#> 38: 1988 Sudan <NA>
#> 39: 1989 Sudan <NA>
#> 40: 1990 Sudan <NA>
#> 41: 1991 Sudan <NA>
#> 42: 1992 Sudan <NA>
#> 43: 1993 Sudan <NA>
#> 44: 1994 Sudan <NA>
#> 45: 1995 Sudan <NA>
#> 46: 1996 Sudan <NA>
#> 47: 1997 Sudan <NA>
#> 48: 1998 Sudan <NA>
#> 49: 1999 Sudan <NA>
#> 50: 2000 Sudan <NA>
#> 51: 2001 Sudan <NA>
#> 52: 2002 Sudan <NA>
#> 53: 2003 Sudan <NA>
#> 54: 2004 Sudan <NA>
#> 55: 2005 Sudan <NA>
#> 56: 2006 Sudan <NA>
#> 57: 2007 Sudan <NA>
#> 58: 2008 Sudan <NA>
#> 59: 2009 Sudan <NA>
#> 60: 2010 Sudan <NA>
#> 61: 2011 Sudan South Sudan
#> 62: 2012 Sudan South Sudan
#> 63: 2013 Sudan South Sudan
#> 64: 2014 Sudan South Sudan
#> 65: 2015 Sudan South Sudan
#> 66: 2016 Sudan South Sudan
#> 67: 2017 Sudan South Sudan
#> 68: 2018 Sudan South Sudan
#> 69: 2019 Sudan South Sudan
#> 70: 2020 Sudan South Sudan
#> 71: 2021 Sudan South Sudan
#> year 625 626
This is correct. Lastly, we should check V-Dem against our CoW reference (states::cowstates) to see if V-Dem is missing any countries.
cowstates <- data.table::setDT(states::cowstates)
missing_in_vdem <- cowstates[end >= sprintf("%s-01-01", 1995)][!cowcode %in% vdem$cow]
knitr::kable(missing_in_vdem)
cowcode | cowc | country_name | start | end | microstate |
---|---|---|---|---|---|
31 | BHM | Bahamas | 1973-07-10 | 9999-12-31 | FALSE |
54 | DMA | Dominica | 1978-11-03 | 9999-12-31 | TRUE |
55 | GRN | Grenada | 1974-02-07 | 9999-12-31 | TRUE |
56 | SLU | St. Lucia | 1979-02-22 | 9999-12-31 | TRUE |
57 | SVG | St. Vincent and the Grenadines | 1979-10-27 | 9999-12-31 | TRUE |
58 | AAB | Antigua & Barbuda | 1981-11-01 | 9999-12-31 | TRUE |
60 | SKN | St. Kitts and Nevis | 1983-09-19 | 9999-12-31 | TRUE |
80 | BLZ | Belize | 1981-09-21 | 9999-12-31 | FALSE |
221 | MNC | Monaco | 1993-05-28 | 9999-12-31 | TRUE |
223 | LIE | Liechtenstein | 1990-09-18 | 9999-12-31 | TRUE |
232 | AND | Andorra | 1993-07-28 | 9999-12-31 | TRUE |
331 | SNM | San Marino | 1992-03-02 | 9999-12-31 | TRUE |
835 | BRU | Brunei | 1984-01-01 | 9999-12-31 | FALSE |
946 | KIR | Kiribati | 1999-09-14 | 9999-12-31 | TRUE |
947 | TUV | Tuvalu | 2000-09-05 | 9999-12-31 | TRUE |
955 | TON | Tonga | 1999-09-14 | 9999-12-31 | TRUE |
970 | NAU | Nauru | 1999-09-14 | 9999-12-31 | TRUE |
983 | MSI | Marshall Islands | 1991-09-17 | 9999-12-31 | TRUE |
986 | PAL | Palau | 1994-12-15 | 9999-12-31 | TRUE |
987 | FSM | Federated States of Micronesia | 1991-09-17 | 9999-12-31 | TRUE |
990 | WSM | Samoa | 1976-12-15 | 9999-12-31 | TRUE |
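The reference check is an anti-join: keep the rows of the reference table whose code never appears in the data. A minimal sketch with data.table's `!` join syntax on toy tables (the tables `ref` and `obs` are made up for illustration):

```r
library(data.table)

# Toy reference list of states and a toy set of observed codes
ref <- data.table(cowcode = c(2, 31, 80),
                  country_name = c("United States of America", "Bahamas", "Belize"))
obs <- data.table(cow = c(2, 2, 20))

# Anti-join: reference rows with no matching code in `obs`
# (names in `on` are columns of `ref`, values are columns of `obs`)
missing_toy <- ref[!obs, on = c(cowcode = "cow")]
```

This returns the same rows as filtering with `!cowcode %in% obs$cow`, but extends naturally to multi-column keys such as code plus year.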
There is nothing of consequence here; these are mostly microstates that are commonly omitted from environment-security analysis. For simplicity, the remaining microstates included in V-Dem may be dropped unless you are carrying out a specialized analysis.
microstates <- cowstates[microstate==TRUE, unique(cowcode)]
vdem <- vdem[!(cow %in% microstates)]
Finally, we’ll examine V-Dem for duplicate country names to ensure we don’t miss any peculiarities.
dupes <- unique(vdem[,.(country_name, cow)])
# check for duplicate names across codes
table(duplicated(dupes$country_name))
#>
#> FALSE TRUE
#> 175 3
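Nearly all names are unique, but the TRUE count shows that 3 country names appear under more than one CoW code. Inspect these before joining on country_name; joins on cow and year are unaffected.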
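If the duplicate check reports any TRUE values, the offending names can be listed directly. A minimal sketch on a toy table (the names and codes are placeholders, not V-Dem values):

```r
library(data.table)

# Toy name/code pairs: name "A" appears under two different codes
dupes_toy <- data.table(country_name = c("A", "A", "B"),
                        cow = c(101, 102, 201))

# Names that occur more than once across codes
dup_names <- dupes_toy[duplicated(country_name), unique(country_name)]

# All rows for those names, so every competing code is visible
dup_rows <- dupes_toy[country_name %in% dup_names]
```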
As previously covered, v2x_polyarchy and v2x_libdem are 2 high-level (vartype==D) democracy indices quantifying the extent of electoral and liberal democracy in a given state. Both metrics are continuous variables bounded between 0 and 1. We can quickly check their distributions to get a better sense of the data.
hist.dat <- melt(vdem,
                 id.vars = c("cow", "year"),
                 measure.vars = c("v2x_libdem",
                                  "v2x_polyarchy"),
                 variable.name = "metric",
                 value.name = "value")
ggplot2::ggplot(hist.dat, ggplot2::aes(x=value))+
  ggplot2::geom_histogram()+
  ggplot2::facet_wrap(~metric)+
  ggplot2::labs(title = "V-Dem Metric Distributions",
                x = "Value",
                y = "Count")+
  ggplot2::theme_minimal()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 106 rows containing non-finite values (stat_bin).
These look pretty good with (mostly) uniform coverage. The warnings have tipped us off to some missing values; let’s investigate further.
vdem[is.na(v2x_libdem) | is.na(v2x_polyarchy),.(unique(country_name), n=.N, last_year=max(year)), by=cow]
#> cow V1 n last_year
#> <num> <char> <int> <num>
#> 1: 482 Central African Republic 2 1965
#> 2: 860 Timor-Leste 48 1998
#> 3: 705 Kazakhstan 1 1990
#> 4: 565 Namibia 1 1979
#> 5: 701 Turkmenistan 1 1990
#> 6: 692 Bahrain 51 2001
There are 104 country-years with a missing value, almost all of them in Timor-Leste and Bahrain, and they should be investigated. First Timor-Leste.
vdem[cow==860, .(country_name, year, v2x_libdem, v2x_polyarchy)]
#> country_name year v2x_libdem v2x_polyarchy
#> <char> <num> <num> <num>
#> 1: Timor-Leste 1951 NA 0.016
#> 2: Timor-Leste 1952 NA 0.016
#> 3: Timor-Leste 1953 NA 0.016
#> 4: Timor-Leste 1954 NA 0.016
#> 5: Timor-Leste 1955 NA 0.016
#> 6: Timor-Leste 1956 NA 0.016
#> 7: Timor-Leste 1957 NA 0.016
#> 8: Timor-Leste 1958 NA 0.016
#> 9: Timor-Leste 1959 NA 0.016
#> 10: Timor-Leste 1960 NA 0.017
#> 11: Timor-Leste 1961 NA 0.017
#> 12: Timor-Leste 1962 NA 0.017
#> 13: Timor-Leste 1963 NA 0.017
#> 14: Timor-Leste 1964 NA 0.017
#> 15: Timor-Leste 1965 NA 0.017
#> 16: Timor-Leste 1966 NA 0.017
#> 17: Timor-Leste 1967 NA 0.017
#> 18: Timor-Leste 1968 NA 0.017
#> 19: Timor-Leste 1969 NA 0.017
#> 20: Timor-Leste 1970 NA 0.017
#> 21: Timor-Leste 1971 NA 0.017
#> 22: Timor-Leste 1972 NA 0.017
#> 23: Timor-Leste 1973 NA 0.047
#> 24: Timor-Leste 1974 NA 0.049
#> 25: Timor-Leste 1975 NA 0.069
#> 26: Timor-Leste 1976 NA 0.021
#> 27: Timor-Leste 1977 NA 0.076
#> 28: Timor-Leste 1978 NA 0.076
#> 29: Timor-Leste 1979 NA 0.076
#> 30: Timor-Leste 1980 NA 0.076
#> 31: Timor-Leste 1981 NA 0.076
#> 32: Timor-Leste 1982 NA 0.076
#> 33: Timor-Leste 1983 NA 0.076
#> 34: Timor-Leste 1984 NA 0.076
#> 35: Timor-Leste 1985 NA 0.076
#> 36: Timor-Leste 1986 NA 0.076
#> 37: Timor-Leste 1987 NA 0.076
#> 38: Timor-Leste 1988 NA 0.076
#> 39: Timor-Leste 1989 NA 0.076
#> 40: Timor-Leste 1990 NA 0.078
#> 41: Timor-Leste 1991 NA 0.078
#> 42: Timor-Leste 1992 NA 0.078
#> 43: Timor-Leste 1993 NA 0.078
#> 44: Timor-Leste 1994 NA 0.078
#> 45: Timor-Leste 1995 NA 0.078
#> 46: Timor-Leste 1996 NA 0.078
#> 47: Timor-Leste 1997 NA 0.078
#> 48: Timor-Leste 1998 NA 0.090
#> 49: Timor-Leste 1999 0.087 0.090
#> 50: Timor-Leste 2000 0.186 0.225
#> 51: Timor-Leste 2001 0.237 0.293
#> 52: Timor-Leste 2002 0.384 0.503
#> 53: Timor-Leste 2003 0.414 0.568
#> 54: Timor-Leste 2004 0.414 0.568
#> 55: Timor-Leste 2005 0.416 0.576
#> 56: Timor-Leste 2006 0.414 0.575
#> 57: Timor-Leste 2007 0.471 0.616
#> 58: Timor-Leste 2008 0.477 0.627
#> 59: Timor-Leste 2009 0.482 0.632
#> 60: Timor-Leste 2010 0.488 0.633
#> 61: Timor-Leste 2011 0.488 0.633
#> 62: Timor-Leste 2012 0.498 0.644
#> 63: Timor-Leste 2013 0.501 0.648
#> 64: Timor-Leste 2014 0.474 0.632
#> 65: Timor-Leste 2015 0.461 0.632
#> 66: Timor-Leste 2016 0.454 0.615
#> 67: Timor-Leste 2017 0.471 0.648
#> 68: Timor-Leste 2018 0.492 0.674
#> 69: Timor-Leste 2019 0.508 0.684
#> 70: Timor-Leste 2020 0.480 0.664
#> 71: Timor-Leste 2021 0.487 0.680
#> country_name year v2x_libdem v2x_polyarchy
Timor-Leste is missing v2x_libdem for 1951-1998. These years precede its internationally recognized independence and include the Indonesian occupation. They can be ignored or dropped unless you have a special use case.
Now Bahrain.
vdem[cow==692, .(country_name, year, v2x_libdem, v2x_polyarchy)]
#> country_name year v2x_libdem v2x_polyarchy
#> <char> <num> <num> <num>
#> 1: Bahrain 1951 NA 0.023
#> 2: Bahrain 1952 NA 0.023
#> 3: Bahrain 1953 NA 0.026
#> 4: Bahrain 1954 NA 0.026
#> 5: Bahrain 1955 NA 0.026
#> 6: Bahrain 1956 NA 0.026
#> 7: Bahrain 1957 NA 0.025
#> 8: Bahrain 1958 NA 0.025
#> 9: Bahrain 1959 NA 0.025
#> 10: Bahrain 1960 NA 0.023
#> 11: Bahrain 1961 NA 0.024
#> 12: Bahrain 1962 NA 0.024
#> 13: Bahrain 1963 NA 0.024
#> 14: Bahrain 1964 NA 0.024
#> 15: Bahrain 1965 NA 0.024
#> 16: Bahrain 1966 NA 0.024
#> 17: Bahrain 1967 NA 0.024
#> 18: Bahrain 1968 NA 0.024
#> 19: Bahrain 1969 NA 0.024
#> 20: Bahrain 1970 NA 0.024
#> 21: Bahrain 1971 NA 0.026
#> 22: Bahrain 1972 NA 0.076
#> 23: Bahrain 1973 NA 0.150
#> 24: Bahrain 1974 NA 0.126
#> 25: Bahrain 1975 NA 0.105
#> 26: Bahrain 1976 NA 0.051
#> 27: Bahrain 1977 NA 0.051
#> 28: Bahrain 1978 NA 0.051
#> 29: Bahrain 1979 NA 0.051
#> 30: Bahrain 1980 NA 0.049
#> 31: Bahrain 1981 NA 0.049
#> 32: Bahrain 1982 NA 0.049
#> 33: Bahrain 1983 NA 0.049
#> 34: Bahrain 1984 NA 0.049
#> 35: Bahrain 1985 NA 0.049
#> 36: Bahrain 1986 NA 0.049
#> 37: Bahrain 1987 NA 0.049
#> 38: Bahrain 1988 NA 0.049
#> 39: Bahrain 1989 NA 0.049
#> 40: Bahrain 1990 NA 0.049
#> 41: Bahrain 1991 NA 0.049
#> 42: Bahrain 1992 NA 0.049
#> 43: Bahrain 1993 NA 0.049
#> 44: Bahrain 1994 NA 0.049
#> 45: Bahrain 1995 NA 0.049
#> 46: Bahrain 1996 NA 0.049
#> 47: Bahrain 1997 NA 0.049
#> 48: Bahrain 1998 NA 0.049
#> 49: Bahrain 1999 NA 0.049
#> 50: Bahrain 2000 NA 0.067
#> 51: Bahrain 2001 NA 0.113
#> 52: Bahrain 2002 0.071 0.147
#> 53: Bahrain 2003 0.085 0.216
#> 54: Bahrain 2004 0.081 0.216
#> 55: Bahrain 2005 0.082 0.225
#> 56: Bahrain 2006 0.089 0.230
#> 57: Bahrain 2007 0.088 0.232
#> 58: Bahrain 2008 0.088 0.232
#> 59: Bahrain 2009 0.087 0.230
#> 60: Bahrain 2010 0.084 0.227
#> 61: Bahrain 2011 0.052 0.184
#> 62: Bahrain 2012 0.045 0.166
#> 63: Bahrain 2013 0.046 0.164
#> 64: Bahrain 2014 0.047 0.162
#> 65: Bahrain 2015 0.045 0.138
#> 66: Bahrain 2016 0.045 0.131
#> 67: Bahrain 2017 0.043 0.126
#> 68: Bahrain 2018 0.042 0.121
#> 69: Bahrain 2019 0.048 0.118
#> 70: Bahrain 2020 0.048 0.118
#> 71: Bahrain 2021 0.052 0.122
#> country_name year v2x_libdem v2x_polyarchy
Bahrain declared independence in 1971 and converted to a Constitutional Monarchy in 2001. V-Dem reports no v2x_libdem for Bahrain before 2002. The missing value in 2001 may pose an issue when trying to join on additional datasets. A simple fix would be to replace the 2001 value with the 2002 value. A more complicated fix would be some type of lead-in imputation. Let’s examine the time series.
ggplot2::ggplot(vdem[cow==692], ggplot2::aes(x=year, y=v2x_libdem))+
  ggplot2::geom_point(size = 2)+
  ggplot2::labs(title="Bahrain Libdem Time Series",
                x = "Year",
                y = "Libdem")+
  ggplot2::theme_minimal()
#> Warning: Removed 51 rows containing missing values (geom_point).
There is a bit of a linear trend, but imputation would be more trouble than it’s worth. An adequate correction is to put in the 2002 value.
vdem[cow==692 & year==2001, v2x_libdem := vdem[cow==692 & year==2002, v2x_libdem]]
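The replacement above is a one-year "next observation carried backward" fill. As a sketch of a more general version, data.table's `nafill()` with `type = "nocb"` does the same for a whole column. The toy series below reuses the 2002 and 2003 Bahrain values from the output above; the cut-off year is an assumption chosen to mirror the single-year fix.

```r
library(data.table)

# Toy series: missing through 2001, observed from 2002 onward
toy <- data.table(year = 1999:2003,
                  v2x_libdem = c(NA, NA, NA, 0.071, 0.085))

# "nocb" fills every leading NA with the next observed value ...
toy[, filled := nafill(v2x_libdem, type = "nocb")]

# ... so blank out the years that should stay missing (before 2001 here)
toy[year < 2001, filled := NA_real_]
```

Carrying a value backward over many years invents data, which is why the fill is restricted to the single year adjacent to the first observation.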
Before finishing, we will perform a few final processing steps. First, extract only the minimum number of variables.
vdem <- vdem[, .(cow, year, v2x_libdem, v2x_polyarchy)]
Next, set the year and CoW columns to integers.
cols <- c("cow", "year")
vdem[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
Lastly, if you are working with other colleagues, strip the data.table class from the object.
data.table::setDF(vdem)
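Before handing the data off, a few cheap assertions can confirm that the cleaning steps held. The helper below, `check_country_year()`, is a hypothetical name introduced for illustration and uses base R only; the toy data frame stands in for the finished object.

```r
# Hypothetical helper: returns TRUE if the checks pass, errors otherwise
check_country_year <- function(df, metric_cols) {
  stopifnot(
    is.integer(df$cow), is.integer(df$year),      # keys stored as integers
    !anyNA(df$cow), !anyNA(df$year),              # no missing keys
    anyDuplicated(df[, c("cow", "year")]) == 0    # one row per country-year
  )
  for (m in metric_cols) {
    # indices are bounded between 0 and 1; missing values are tolerated
    stopifnot(all(df[[m]] >= 0 & df[[m]] <= 1, na.rm = TRUE))
  }
  TRUE
}

# Toy data frame with illustrative values
toy <- data.frame(cow = c(2L, 2L, 20L),
                  year = c(2000L, 2001L, 2000L),
                  v2x_libdem = c(0.80, 0.81, NA),
                  v2x_polyarchy = c(0.85, 0.86, 0.80))

checks_ok <- check_country_year(toy, c("v2x_libdem", "v2x_polyarchy"))
```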
And we’re finished. I hope you found this exercise informative, and please contact me with any questions, concerns, or tips.