The goal of the datazoom.amazonia package is to facilitate access to official data about agriculture, deforestation, production and other activities in the Brazilian Amazon. The package provides functions that download and pre-process selected datasets. Note that our functions pull data from databases maintained by other institutions; if one of these institutions decides to stop sharing its data with the public, there is nothing we can do about it.
datazoom.amazonia::datasets_link()
#> # A tibble: 125 x 6
#> survey dataset sidra_code available_time available_geo link
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 PAM-IBGE all_crops 5457/all/all 1974-2020 Country, State,~ http~
#> 2 PAM-IBGE permanent_crops 1613/all/all 1974-2020 Country, State,~ http~
#> 3 PAM-IBGE temporary_crops 1612/all/all 1974-2020 Country, State,~ http~
#> 4 PAM-IBGE corn 839/all/all 2003-2020 Country, State,~ http~
#> 5 PAM-IBGE potato 1001/all/all 2003-2020 Country, State,~ http~
#> 6 PAM-IBGE peanut 1000/all/all 2003-2020 Country, State,~ http~
#> 7 PAM-IBGE beans 1002/all/all 2003-2020 Country, State,~ http~
#> 8 PAM-IBGE temporary_total 1612/c81/0 1974-2020 Country, State,~ http~
#> 9 PAM-IBGE pineapple 1612/c81/2688 1974-2020 Country, State,~ http~
#> 10 PAM-IBGE alfafa 1612/c81/40471 1974-1987 Country, State,~ http~
#> # ... with 115 more rows
You can install the released version of datazoom.amazonia
from CRAN with:
And the development version from GitHub with:
Before installing our development version, you must first install this package:
After installing it, you will be able to install our development version.
1 - The Structure of the functions
Loads information on clearcut deforestation in the Legal Amazon and annual deforestation rates in the region. Data is available from 2000 to 2020.
There are four parameters in this function:
1. dataset = "prodes"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. time_period: available from 2000 to 2020
4. language: you can choose between Portuguese ("pt") and English ("eng")
# Download raw data (raw_data = TRUE) from 2000 to 2005 (time_period = 2000:2005).
data <- load_prodes(dataset = "prodes",
raw_data = TRUE,
time_period = 2000:2005)
# Download treated data (raw_data = FALSE) from 2010 (time_period = 2010)
# in portuguese (language = 'pt').
data <- load_prodes(dataset = "prodes",
raw_data = FALSE,
time_period = 2010,
language = 'pt')
Loads information on changes in forest cover in the Amazon and the Cerrado.
There are three parameters in this function:
1. dataset: there are two options:
# information about Amazon: dataset = "deter_amz"
# information about Cerrado: dataset = "deter_cerrado"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. language: you can choose between Portuguese ("pt") and English ("eng")
# Download raw data (raw_data = TRUE) from Amazonia (dataset = "deter_amz")
deter_amz <- load_deter(dataset = 'deter_amz',
raw_data = TRUE)
# Download treated data (raw_data = FALSE) from Cerrado (dataset = "deter_cerrado")
# in portuguese (language = 'pt')
deter_cer <- load_deter(dataset = 'deter_cerrado',
raw_data = FALSE,
language = "pt")
Loads information on forest degradation in the Brazilian Amazon from the DEGRAD system, which was replaced by DETER-B in December 2016. Data is available from 2007 to 2016.
There are four parameters in this function:
1. dataset = "degrad"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. time_period: data is available from 2007 to 2016
4. language: you can choose between Portuguese ("pt") and English ("eng")
# download raw data (raw_data = TRUE) related to forest degradation from 2010 to 2012 (time_period = 2010:2012).
data <- load_degrad(dataset = 'degrad',
raw_data = TRUE,
time_period = 2010:2012)
# download treated data (raw_data = FALSE) related to forest degradation from 2013 (time_period = 2013) in portuguese (language = "pt").
data <- load_degrad(dataset = 'degrad',
raw_data = FALSE,
time_period = 2013,
language = 'pt')
The Comex dataset gathers data extracted from Siscomex (Integrated System of Foreign Trade), a database containing information on all products imported to or exported from Brazil. Using data reported by the companies responsible for transporting the products, the system adheres to internationally standardized nomenclatures, such as the Harmonized System and the Mercosul Common Nomenclature (which pertains to members of the Mercosul organization).
The data has a monthly frequency and is available starting from the year 1989. From 1989 to 1996, a different system of nomenclatures was adopted, but all conversions are available in a dictionary on the Comex website (https://www.gov.br/produtividade-e-comercio-exterior/pt-br/assuntos/comercio-exterior/estatisticas/base-de-dados-bruta/).
There are four parameters in this function:
1. dataset: there are four choices:
# "comex_export_mun": selects exports data by municipality
# "comex_import_mun": selects imports data by municipality
# "comex_export_prod": selects exports data by producer
# "comex_import_prod": selects imports data by producer
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. time_period: available starting from the year 1989
4. language: you can choose between Portuguese ("pt") and English ("eng")
# download treated (raw_data = FALSE) exports data by municipality (dataset = "comex_export_mun") from 1997 to 2021 (time_period = 1997:2021)
data <- load_br_trade(dataset = "comex_export_mun",
raw_data = FALSE,
time_period = 1997:2021)
# download treated (raw_data = FALSE) imports data by municipality (dataset = "comex_import_mun") from 1997 to 2021 (time_period = 1997:2021)
data <- load_br_trade(dataset = "comex_import_mun",
raw_data = FALSE,
time_period = 1997:2021)
Loads information on gross domestic product at current prices, taxes (net of subsidies) on products at current prices, and gross value added at current prices, in total and by economic activity, along with their respective shares. Data is available at the country, state and municipality levels from 2002 to 2018.
There are six parameters in this function:
1. dataset = "pibmunic"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. time_period: data is available from 2002 to 2018
4. geo_level: there are three options
# "country"
# "state"
# "municipality"
5. language: you can choose between Portuguese ("pt") and English ("eng")
6. legal_amazon_only: whether to return data for the Legal Amazon only (legal_amazon_only = TRUE) or for the entire country (legal_amazon_only = FALSE)
# Download raw data (raw_data = TRUE) on gross domestic product
# (dataset = 'pibmunic') from the entire country (legal_amazon_only = FALSE)
# by state (geo_level = 'state') from 2012 (time_period = 2012)
data <- load_pibmunic(dataset = 'pibmunic',
raw_data = TRUE,
geo_level = 'state',
time_period = 2012,
legal_amazon_only = FALSE)
# Download treated data (raw_data = FALSE) on gross domestic product
# (dataset = 'pibmunic') from the Legal Amazon (legal_amazon_only = TRUE)
# by municipality (geo_level = 'municipality') from 2012 (time_period = 2012).
data <- load_pibmunic(dataset = 'pibmunic',
raw_data = FALSE,
geo_level = 'municipality',
time_period = 2012,
legal_amazon_only = TRUE)
Loads information on companies and other organizations and their respective formally constituted local units, registered with the CNPJ - National Register of Legal Entities. Data is available from 2006 to 2019.
There are seven parameters in this function:
1. dataset = "cempre"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. geo_level: "country", "state" or "municipality"
4. time_period: data is available from 2006 to 2019
5. language: you can choose between Portuguese ("pt") and English ("eng")
6. sectors: defines whether the data is returned separated by sector (sectors = TRUE) or not (sectors = FALSE)
7. legal_amazon_only: whether to return data for the Legal Amazon only (legal_amazon_only = TRUE) or for the entire country (legal_amazon_only = FALSE)
# Download raw data (raw_data = TRUE) at the country level (geo_level = "country") from 2008 to 2010 (time_period = 2008:2010).
# In this example, the data is not split by sector (sectors = FALSE) and is not restricted to the Legal Amazon (legal_amazon_only = FALSE).
data <- load_cempre(dataset = "cempre",
raw_data = TRUE,
geo_level = "country",
time_period = 2008:2010,
sectors = FALSE,
legal_amazon_only = FALSE)
# Download treated data (raw_data = FALSE) by state (geo_level = "state") from 2008 to 2010 (time_period = 2008:2010) in Portuguese (language = "pt").
# In this example, the data is split by sector (sectors = TRUE) and restricted to the Legal Amazon (legal_amazon_only = TRUE).
data <- load_cempre(dataset = "cempre",
raw_data = FALSE,
geo_level = "state",
time_period = 2008:2010,
language = "pt",
sectors = TRUE,
legal_amazon_only = TRUE)
Municipal Agricultural Production (PAM, in Portuguese) is a nationwide annual survey conducted by IBGE (Brazilian Institute of Geography and Statistics). It provides information on agricultural products, such as quantity produced, area planted and harvested, average yield and monetary value of output. The products are divided into permanent and temporary crops, and there are dedicated surveys for the four products that yield multiple harvests a year (beans, potato, peanut and corn). Together these cover 64 agricultural products (31 from temporary tillage and 33 from permanent tillage). Output, however, is only included in the dataset if the planted area occupies over 1 acre or if output exceeds one tonne.
There are five parameters in this function:
1. dataset: There are seven main choices (crop-specific subsets are listed in the tables below).
These datasets contain statistics such as quantity produced, area planted and harvested, average quantity of output and monetary value of such output.
# 'all_crops': selects data about agricultural products from both permanent and temporary farmed land.
# 'permanent_crops': selects data about agricultural products produced on permanent farmed land.
# 'temporary_crops': selects data about agricultural products produced on temporary farmed land.
# 'corn': selects data from the first and second harvests of corn.
# 'potato': selects data from the first, second and third harvests of potato.
# 'peanut': selects data from the first and second harvests of peanut.
# 'beans': selects data from the first, second and third harvests of beans.
2. raw_data: there are two options:
# TRUE: if you want the data as it appears on the IBGE website.
# FALSE: if you want the treated (more organized) version of the data.
3. geo_level: there are four options:
# "country"
# "region"
# "state"
# "municipality"
4. time_period:
# For all_crops, permanent_crops and temporary_crops, data is available from 1974 to 2019.
# For corn, potato, peanut and beans, data is available from 2003 to 2019.
5. language: you can choose between Portuguese ("pt") and English ("eng")
The datasets supported are shown in the tables below, made up of both the original databases and their narrower subsets. Note that downloading only specific crops is considerably faster. First, the datasets provided by IBGE in their entirety:
dataset |
---|
all_crops |
temporary_crops |
permanent_crops |
corn |
potato |
peanut |
beans |
Datasets generated from Temporary Crops:
dataset | Name (pt) | Name (eng) |
---|---|---|
pineapple | Abacaxi | Pineapple |
alfafa | Alfafa Fenada | Alfalfa (Hay) |
cotton_herbaceous | Algodao Herbaceo (em Caroco) | Herbaceous Cotton (Seed Cotton) |
garlic | Alho | Garlic |
peanut_temporary | Amendoim (em Casca) | Peanuts (in Shell) |
rice | Arroz (em Casca) | Rice (in husk) |
oats | Aveia (em Grao) | Oats (in grain) |
sweet_potato | Batata Doce | Sweet potato |
potato_temporary | Batata Inglesa | Potato |
sugar_cane | Cana de Acucar | Sugar cane |
forage_cane | Cana para Forragem | Forage cane |
onion | Cebola | Onion |
rye | Centeio (em Grao) | Rye (in grain) |
barley | Cevada (em Grao) | Barley (in Grain) |
pea | Ervilha (em Grao) | Pea (in Grain) |
broad_bean | Fava (em Grao) | Broad Bean (in Grain) |
beans_temporary | Feijao (em Grao) | Beans (in Grain) |
tobacco | Fumo (em Folha) | Tobacco (in Leaf) |
sunflower_seeds | Girassol (em Grao) | Sunflower (in Grain) |
jute_fiber | Juta (Fibra) | Jute (Fiber) |
linen_seeds | Linho (Semente) | Flax (Seed) |
malva_fiber | Malva (Fibra) | Malva (Fiber) |
castor_bean | Mamona (Baga) | Castor bean (Berry) |
cassava | Mandioca | Cassava |
watermelon | Melancia | Watermelon |
melon | Melao | Melon |
corn_temporary | Milho (em Grao) | Corn (in grain) |
ramie_fiber | Rami (Fibra) | Ramie (Fiber) |
soybean | Soja (em Grao) | Soybean (in grain) |
sorghum | Sorgo (em Grao) | Sorghum (in Grain) |
tomato | Tomate | Tomato |
wheat | Trigo (em Grao) | Wheat (in grain) |
triticale | Triticale (em Grao) | Triticale (in grain) |
temporary_total | Total | Total |
Datasets generated from Permanent Crops:
dataset | Name (pt) | Name (eng) |
---|---|---|
avocado | Abacate | Avocado |
cotton_arboreo | Algodao Arboreo (em Caroco) | Tree Cotton (Seed Cotton) |
acai | Acai | Acai |
olive | Azeitona | Olive |
banana | Banana (Cacho) | Banana (Bunch) |
rubber_coagulated_latex | Borracha (Latex Coagulado) | Rubber (Coagulated Latex) |
rubber_liquid_latex | Borracha (Latex Liquido) | Rubber (Liquid Latex) |
cocoa_beans | Cacau (em Amendoa) | Cocoa (Beans) |
coffee_total | Cafe (em Grao) Total | Coffee (in Grain) Total |
coffee_arabica | Cafe (em Grao) Arabica | Coffee (in Grain) Arabica |
coffee_canephora | Cafe (em Grao) Canephora | Coffee (in Grain) Canephora |
cashew | Caju | Cashew |
khaki | Caqui | Persimmon |
cashew_nut | Castanha de Caju | Cashew Nuts |
india_tea | Cha da India (Folha Verde) | India Tea (Leaf) |
coconut | Coco da Baia | Coconut |
coconut_bunch | Dende (Cacho de Coco) | Oil Palm (Coconut Bunch) |
yerba_mate | Erva Mate (Folha Verde) | Yerba Mate (Green Leaf) |
fig | Figo | Fig |
guava | Goiaba | Guava |
guarana_seeds | Guarana (Semente) | Guarana (Seed) |
orange | Laranja | Orange |
lemon | Limao | Lemon |
apple | Maca | Apple |
papaya | Mamao | Papaya |
mango | Manga | Mango |
passion_fruit | Maracuja | Passion fruit |
quince | Marmelo | Quince |
walnut | Noz (Fruto Seco) | Walnut (Dry Fruit) |
heart_of_palm | Palmito | Palm heart |
pear | Pera | Pear |
peach | Pessego | Peach |
black_pepper | Pimenta do Reino | Black pepper |
sisal_or_agave | Sisal ou Agave (Fibra) | Sisal or Agave (Fiber) |
tangerine | Tangerina | Tangerine |
tung | Tungue (Fruto Seco) | Tung (Dry Fruit) |
annatto_seeds | Urucum (Semente) | Annatto (Seed) |
grape | Uva | Grape |
permanent_total | Total | Total |
# Download treated (raw_data = FALSE) data on production from permanent and temporary farmed land (dataset = 'all_crops') by state (geo_level = "state") from 1980 to 1990 (time_period = 1980:1990) in English (language = "eng").
data <- load_pam(dataset = 'all_crops',
raw_data = FALSE,
geo_level = "state",
time_period = 1980:1990,
language = "eng")
# Download raw data (raw_data = TRUE) on corn production (dataset = 'corn') by municipality (geo_level = "municipality") from 2010 to 2012 (time_period = 2010:2012) in Portuguese (language = "pt").
data <- load_pam(dataset = 'corn',
raw_data = TRUE,
geo_level = "municipality",
time_period = 2010:2012,
language = "pt")
Loads information on the quantity and value of production from the exploitation of native plant resources and planted forests, as well as the total existing and harvested areas of forest crops. The survey is conducted at the municipal level and data is available from 1986 to 2019.
There are five parameters in this function:
1. dataset: there are three choices:
# 'pevs_forest_crops': provides data related to both quantity and value of the forestry activities. The data goes from 1986 to 2019 and it is divided by type of product.
# 'pevs_silviculture': provides data related to both quantity and value of the silviculture. The data goes from 1986 to 2019 and it is divided by type of product.
# 'pevs_silviculture_area': total existing area used for silviculture on December 31. The data goes from 2013 to 2019 and it is divided by forestry species.
2. raw_data: there are two options:
# TRUE: if you want the data as it appears on the IBGE website.
# FALSE: if you want the treated (more organized) version of the data.
3. geo_level: there are four options:
# "country"
# "region"
# "state"
# "municipality"
4. time_period: data goes from 1986 to 2019, except for the pevs_silviculture_area dataset where data is available from 2013 to 2019.
5. language: you can choose between Portuguese ("pt") and English ("eng")
# Download treated (raw_data = FALSE) silviculture data (dataset = 'pevs_silviculture') by state (geo_level = 'state') from 2012 (time_period = 2012) in portuguese (language = "pt")
data <- load_pevs(dataset = 'pevs_silviculture',
raw_data = FALSE,
geo_level = 'state',
time_period = 2012,
language = "pt")
# Download raw (raw_data = TRUE) forest crops data (dataset = 'pevs_forest_crops') by municipality (geo_level = 'municipality') from 2012 to 2013 (time_period = 2012:2013) in english (language = "eng")
data <- load_pevs(dataset = 'pevs_forest_crops',
raw_data = TRUE,
geo_level = "municipality",
time_period = 2012:2013,
language = "eng")
Downloads data from PPM (“Pesquisa da Pecuária Municipal”). This survey contains information on livestock inventories (e.g., cattle, pigs and hogs) in Brazilian municipalities. It also provides information on production of animal origin (e.g., output of milk, hen eggs, quail eggs, honey) and the value of that production during the reference year.
The periodicity of the survey is annual. The geographic coverage is national, with results released for Brazil, Major Regions, Federation Units, Mesoregions, Microregions and Municipalities.
There are five parameters in this function:
1. dataset: There are five possible choices.
# 'ppm_livestock_inventory'
# 'ppm_sheep_farming'
# 'ppm_animal_orig_production'
# 'ppm_cow_farming'
# 'ppm_aquaculture'
2. raw_data: there are two options:
# TRUE: if you want the data as it appears on the IBGE website.
# FALSE: if you want the treated (more organized) version of the data.
3. geo_level: there are four options:
# "country"
# "region"
# "state"
# "municipality"
4. time_period:
# For ppm_livestock_inventory, ppm_sheep_farming, ppm_animal_orig_production and ppm_cow_farming: data is available from 1974 to 2019.
# For ppm_aquaculture: data is available from 2013 to 2019
5. language: you can choose between Portuguese ("pt") and English ("eng")
# Download treated data (raw_data = FALSE) about aquaculture (dataset = 'ppm_aquaculture') from 2013 to 2015 (time_period = 2013:2015) in english (language = "eng") with the level of aggregation being the country (geo_level = "country").
data <- load_ppm(dataset = 'ppm_aquaculture',
raw_data = FALSE,
geo_level = "country",
time_period = 2013:2015,
language = "eng")
# Download raw data (raw_data = TRUE) about sheep farming (dataset = 'ppm_sheep_farming') by state (geo_level = "state") from 1980 to 1995 (time_period = 1980:1995) in portuguese (language = "pt")
data <- load_ppm(dataset = 'ppm_sheep_farming',
raw_data = TRUE,
geo_level = "state",
time_period = 1980:1995,
language = "pt")
Loads information on the social and environmental performance of the Legal Amazon. The survey is conducted at the municipal level and data is available for 2014 and 2018.
There are four parameters in this function:
1. dataset = "ips"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. time_period: data is available for 2014 and 2018.
4. language: you can choose between Portuguese ("pt") and English ("eng")
# Download raw data (raw_data = TRUE) from 2014 (time_period = 2014)
data <- load_ips(dataset = "ips", raw_data = TRUE, time_period = 2014)
# Download treated data (raw_data = FALSE) from 2018
# (time_period = 2018) in portuguese (language = 'pt')
data <- load_ips(dataset = "ips", raw_data = FALSE,
time_period = 2018, language = 'pt')
Loads estimates of greenhouse gas emissions for Brazilian cities and states.
According to the “SEEG Brasil” website, all five sectors that are sources of emissions (Agriculture, Energy, Land Use Change, Industrial Processes and Waste) are evaluated with the same degree of detail contained in the emissions inventories. The data provided in SEEG’s Collection 9 is a series covering the period from 1970 to 2020, except for the Land Use Change sector, whose series runs from 1990 to 2020.
There are four parameters in this function:
1. dataset: There are six choices:
# "seeg": provides all sectors in a single data frame. Only works with raw_data = TRUE.
# "seeg_farming"
# "seeg_industry"
# "seeg_energy"
# "seeg_land"
# "seeg_residuals"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. geo_level: there are three options:
# "country"
# "state"
# "municipality"
4. language: you can choose between Portuguese ("pt") and English ("eng")
# Download raw data (raw_data = TRUE) of greenhouse gases (dataset = "seeg") by municipality (geo_level = "municipality")
data <- load_seeg(dataset = "seeg",
raw_data = TRUE,
geo_level = "municipality")
# Download treated data (raw_data = FALSE) of industry greenhouse gases (dataset = "seeg_industry") by state (geo_level = "state")
data <- load_seeg(dataset = "seeg_industry",
raw_data = FALSE,
geo_level = "state")
# Download treated data (raw_data = FALSE) of energy greenhouse gases (dataset = "seeg_energy") by state (geo_level = "state")
data <- load_seeg(dataset = "seeg_energy",
raw_data = FALSE,
geo_level = "state")
# Download treated data (raw_data = FALSE) of land greenhouse gases (dataset = "seeg_land") by country (geo_level = "country")
data <- load_seeg(dataset = "seeg_land",
raw_data = FALSE,
geo_level = "country")
# Download raw data (raw_data = TRUE) of greenhouse gases (dataset = "seeg") by state (geo_level = "state")
data <- load_seeg(dataset = "seeg",
raw_data = TRUE,
geo_level = "state")
Downloads and compiles data on environmental fines at the municipality or state level for the Amazon region.
The function returns either the raw data or a data frame of aggregates that gives, for each time-location pair, the total number of infractions, the number of infractions that have already gone to trial, and the number of unique perpetrators.
There are five parameters in this function:
1. dataset = "areas_embargadas"
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed version of the data.
3. geo_level: "municipality"
4. language: you can choose between Portuguese ("pt") and English ("eng")
5. legal_amazon_only: whether to return data for the Legal Amazon only (legal_amazon_only = TRUE) or for the entire country (legal_amazon_only = FALSE)
# Download treated data (raw_data = FALSE) from the entire country
# (legal_amazon_only = FALSE) in english (language = "eng")
data <- load_ibama(dataset = "areas_embargadas", raw_data = FALSE,
language = "eng", legal_amazon_only = FALSE)
# Download raw data (raw_data = TRUE) from Legal Amazon region
# (legal_amazon_only = TRUE)
data <- load_ibama(dataset = "areas_embargadas", raw_data = TRUE,
legal_amazon_only = TRUE)
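To make the processed output more concrete, the sketch below builds the three aggregates described above (total infractions, infractions that went to trial, unique perpetrators) from a made-up table using base R. Every column name and value here is hypothetical and chosen only for illustration; the real data uses its own variable names, and this is not the package's actual code.

```r
# Toy infraction records (hypothetical columns and values)
fines <- data.frame(
  municipality  = c("A", "A", "A", "B", "B"),
  year          = c(2019, 2019, 2019, 2019, 2020),
  perpetrator   = c("p1", "p1", "p2", "p3", "p3"),
  went_to_trial = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Split the records into one group per municipality-year pair
groups <- split(fines, list(fines$municipality, fines$year), drop = TRUE)

# Compute the three counts for each group and stack the results
summary_table <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    municipality   = g$municipality[1],
    year           = g$year[1],
    n_infractions  = nrow(g),
    n_tried        = sum(g$went_to_trial),
    n_perpetrators = length(unique(g$perpetrator)),
    stringsAsFactors = FALSE
  )
}))
rownames(summary_table) <- NULL
```

Each row of `summary_table` then corresponds to one municipality in one year.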
Loads information on the mines being legally explored in Brazil, including their location, status, product being mined and area in square meters. The survey is conducted at the municipal and state levels. The National Mining Agency (ANM) is responsible for this survey.
There are three parameters in this function:
1. dataset = 'sigmine_active'
2. raw_data: there are two options:
# TRUE: if you want the raw data.
# FALSE: if you want the processed data.
3. language: you can choose between Portuguese ("pt") and English ("eng")
# Download raw data (raw_data = TRUE).
data <- load_sigmine(dataset = 'sigmine_active', raw_data = TRUE)
# Download treated data (raw_data = FALSE) in portuguese (language = "pt").
data <- load_sigmine(dataset = 'sigmine_active',
raw_data = FALSE,
language = "pt")
The MAPBIOMAS project gathers data on the type of land cover, by area, for each year; for example, the area used for a temporary crop such as soybeans. It also reports transitions between cover types over given years.
There are seven parameters in this function:
1. dataset: There are five possible choices.
# 'mapbiomas_cover': type of land cover by biome, state and municipality. Data is available from 1985 to 2020.
# 'mapbiomas_transition': transition between cover types by biome, state and municipality. Data is available from 1985 to 2020.
# 'mapbiomas_deforestation_regeneration': deforestation and regeneration by biome and state. Data is available from 1988 to 2020.
# 'mapbiomas_irrigation': data about irrigated area by biome and state. Data is available from 2000 to 2020.
# 'mapbiomas_grazing_quality': data about grazing quality by biome, state and municipality. Data is available from 2010 and 2018.
2. raw_data: there are two options:
# TRUE: if you want the data as it appears on the MapBiomas website.
# FALSE: if you want the treated (more organized) version of the data.
3. geo_level: 'state', 'municipality', 'biomes'
Check the dataset descriptions above, because some datasets have no data for 'municipality'.
4. time_period: The data availability for each dataset is detailed above
5. language: you can choose between Portuguese ('pt') and English ('eng')
6. time_id: 'year'
7. cover_level: Five options:
# cover_level = 0 has categories such as: Anthropic, Natural, Not Applied
# cover_level = 1 has categories such as: Forest, Non Forest Natural Formation, Farming, Non Vegetated Area, Water, Non Observed
# cover_level = 2 has categories such as: Agriculture, Aquaculture, Beach and Dune, Forest Plantation, Pasture, River, Lake and Ocean
# cover_level = 3 has categories such as: Aquaculture, Beach and Dune, Forest Formation, Forest Plantation
# cover_level = 4 has categories such as: Aquaculture, Beach and Dune, Forest Formation, Forest Plantation
# download treated MapBiomas cover data in English
data <- load_mapbiomas(dataset = "mapbiomas_cover", raw_data = FALSE, time_period = "all",
language = "eng")
# download treated MapBiomas transition data in Portuguese
data <- load_mapbiomas(dataset = "mapbiomas_transition", raw_data = FALSE, time_period = "all",
language = "pt")
# download treated MapBiomas grazing quality data in Portuguese
data <- load_mapbiomas(dataset = "mapbiomas_grazing_quality", raw_data = FALSE, time_period = "all", language = "pt")
# download treated MapBiomas irrigation data in English
data <- load_mapbiomas(dataset = "mapbiomas_irrigation", raw_data = FALSE, time_period = "all", language = "eng")
# download treated MapBiomas deforestation/regeneration data in English
data <- load_mapbiomas(dataset = "mapbiomas_deforestation_regeneration", raw_data = FALSE, time_period = "all", language = "eng")
Loads Plataforma CIPÓ’s mappings of the organizations involved in combating environmental crimes. Each mapping consists of a list of actors or agreements along with their attributions and descriptions.
There are two parameters in this function:
1. dataset: There are three choices:
# "brazilian_actors": mapping of Brazilian actors involved in preventing environmental crimes.
# "international_cooperation": mapping of international agreements, treaties and conventions.
# "forest_governance": mapping of forest governance arrangements.
2. search: Filters the dataset to the rows containing the chosen search parameter.
# download the spreadsheet on Brazilian actors involved in fighting environmental crimes
brazilian_actors <- load_cipo(dataset = "brazilian_actors")
# searching only for entries containing IBAMA
actors_ibama <- load_cipo(dataset = "brazilian_actors",
search = "ibama")
# entries containing IBAMA or FUNAI
actors_ibama_funai <- load_cipo(dataset = "brazilian_actors",
search = "ibama|funai")
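The `search` argument in the examples above behaves like a pattern match, with `|` acting as "or". The base R sketch below illustrates the idea on a made-up table. The toy rows, the column names, the helper `filter_rows()` and the choice of case-insensitive matching across every column are all assumptions made for illustration, not the package's actual implementation.

```r
# Toy table standing in for a mapping (hypothetical columns and rows)
actors <- data.frame(
  actor = c("IBAMA", "FUNAI", "ICMBio"),
  role  = c("environmental enforcement", "indigenous affairs", "protected areas"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any column matches the pattern
filter_rows <- function(df, search) {
  # One logical vector per column, TRUE where the pattern is found
  hits <- lapply(df, function(col) grepl(search, col, ignore.case = TRUE))
  # A row is kept if at least one of its columns matched
  keep <- Reduce(`|`, hits)
  df[keep, , drop = FALSE]
}

only_ibama     <- filter_rows(actors, "ibama")
ibama_or_funai <- filter_rows(actors, "ibama|funai")
```

Because the pattern is a regular expression, alternatives can be chained with `|` without calling the function several times.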
Loads TerraClimate data by Climatology Lab on many available climate variables. Data is available from 1958 to 2020.
There are five parameters in this function:
1. dataset: picks the variable to be read. Possible options are shown in the table below.
2. raw_data: there are two options:
# TRUE: if you want the data with fewer manipulations.
# FALSE: if you want the treated (more organized) version of the data.
3. time_period: available from 1958 to 2020
4. language: you can choose between Portuguese ('pt') and English ('eng')
5. legal_amazon_only: whether to return data for the Legal Amazon only (legal_amazon_only = TRUE) or for the entire country (legal_amazon_only = FALSE)
Possible dataset choices:
Dataset | Code | Description | Units |
---|---|---|---|
max_temperature | tmax | Maximum 2-m Temperature | degC |
min_temperature | tmin | Minimum 2-m Temperature | degC |
wind_speed | ws | Wind Speed at 10-m | m/s |
vapor_pressure_deficit | vpd | Vapor Pressure Deficit | kPa |
vapor_pressure | vap | 2-m Vapor Pressure | kPa |
snow_water_equivalent | swe | Snow Water Equivalent at End of Month | mm |
shortwave_radiation_flux | srad | Downward Shortwave Radiation Flux at the Surface | W/m^2 |
soil_moisture | soil | Soil Moisture at End of Month | mm |
runoff | q | Runoff | mm |
precipitation | ppt | Accumulated Precipitation | mm |
potential_evaporation | pet | Reference Evapotranspiration | mm |
climatic_water_deficit | def | Climatic Water Deficit | mm |
water_evaporation | aet | Actual Evapotranspiration | mm |
palmer_drought_severity_index | PDSI | Palmer Drought Severity Index | unitless |
# Downloading maximum temperature data from 2000 to 2020
max_temp <- load_climate(dataset = "max_temperature", time_period = 2000:2020)
# Downloading precipitation data only for the legal Amazon in 2010
amz_precipitation <- load_climate(dataset = "precipitation",
time_period = 2010,
legal_amazon_only = TRUE)
Loads disaggregated data on bilateral trade flows for more than 5000 products and 200 countries.
There are four parameters in this function:
1. dataset: There is one choice:
# "HS92" which follows the Harmonized System method
2. raw_data: there are two options:
# TRUE: if you want the data as it appears on the CEPII website.
# FALSE: if you want the treated (more organized) version of the data.
3. time_period: The data availability for each dataset is detailed above
4. language: you can choose between Portuguese ('pt') and English ('eng')
# download raw data
raw_baci <- load_baci(dataset = "HS92", raw_data = TRUE, time_period = 2016)
# download treated data
clean_baci <- load_baci(dataset = "HS92", raw_data = FALSE, time_period = 2016,
language = "pt")
1. Bind Global Variables: The goal is to ensure that every variable used in the function is initialized to some value. We also do this to avoid errors when we check the function.
2. Define Basic Parameters: Create a list with all the parameters from the function. The list param will be an organized list with all the parameters of interest.
3. Download Data: In the majority of our functions, we download data by using external_download(). However, when we download data from IBGE, we use a function called sidra_download(). Both of these functions can be found in the “download.R” file.
4. Data Engineering: In this section of the code, we (i) exclude variables that we judge not to be relevant; (ii) sometimes change the class of some variables; and (iii) sometimes reshape the data into long or wide format, depending on what we want. Generally speaking, this is the part of the code where we make the most changes to the original data frame.
5. Harmonizing Variable Names: Rename columns with better names.
6. Load Dictionary: In the functions that work with IBGE’s data, we use the function “load_dictionary()”. This function creates an organized correspondence between the code of each product, its name, its unit of measure and other attributes.
7. Translation / add variables: After organizing the data frame, we translate it. In some functions, the translation starts in a section called “Labelling” and uses data from the “dictionary.R” file. In other functions, the column names are translated first and then the values in each row of the data frame are translated.
8. Return Data Frame: In the structure of our functions, you will see if (raw_data == TRUE){return(dat)} right after the “Download Data” step. All the changes explained in this document only happen when the user specifies raw_data = FALSE.
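As a minimal sketch of step 8, the snippet below shows where the raw_data check sits relative to the processing steps. The function load_toy() and its in-memory data are hypothetical stand-ins and are not part of the package; only the position of the early return mirrors the description above.

```r
# Hypothetical stand-in for a load_*() function; nothing is downloaded.
load_toy <- function(raw_data = FALSE) {
  # "Download Data" step, replaced by a small in-memory data frame.
  dat <- data.frame(
    Ano = c(2020, 2019),
    Valor = c(7, 10)
  )

  # Raw data is returned immediately, before any processing.
  if (raw_data == TRUE) {
    return(dat)
  }

  # Everything below runs only when raw_data == FALSE.
  names(dat) <- tolower(names(dat))
  dat[order(dat$ano), , drop = FALSE]
}

raw <- load_toy(raw_data = TRUE)     # columns and row order untouched
clean <- load_toy(raw_data = FALSE)  # lower-case names, sorted by year
```

With raw_data = TRUE the user gets exactly what the download step produced, so any later step can change without affecting the raw output.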
1. Bind Global Variables: example from load_cempre()
sidra_code <- available_time <- AMZ_LEGAL <- municipio_codigo <- ano <- ano_codigo <- classificacao_nacional_de_atividades_economicas_cnae_2_0_codigo <- geo_id <- id_code <- nivel_territorial <- nivel_territorial_codigo <- valor <- variavel <- unidade_de_medida <- unidade_de_medida_codigo <- NULL
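To illustrate why the names are bound to NULL, here is a small sketch in base R. The helper keep_positive() is hypothetical; it refers to a column by its bare name, which static code analysis cannot distinguish from an undefined global variable. The optional codetools call assumes that package is installed (it normally ships with R).

```r
# Hypothetical helper using non-standard evaluation: `valor` is a column
# name, not a variable, so code analysis cannot see where it is defined.
keep_positive <- function(dat) {
  valor <- NULL  # bind the name locally; the column in `dat` is found first
  subset(dat, valor > 0)
}

toy <- data.frame(valor = c(-1, 2, 5))
result <- keep_positive(toy)

# Optional: run the same kind of usage check on a version of the function
# without the NULL assignment, to see what it reports about `valor`.
if (requireNamespace("codetools", quietly = TRUE)) {
  unbound <- function(dat) subset(dat, valor > 0)
  codetools::checkUsage(unbound, name = "unbound")
}
```

The NULL assignment does not change the result, because subset() looks for valor in the data frame before it looks in the function's environment.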
2. Define Basic Parameters: example from load_deter()
# param=list()
# param$dataset = dataset
# param$time_period = time_period
# param$language = language
# param$raw_data = raw_data
# param$survey_name = datasets_link() %>%
# dplyr::filter(dataset == param$dataset) %>%
# dplyr::select(survey) %>%
# unlist()
# param$url = datasets_link() %>%
# dplyr::filter(dataset == param$dataset) %>%
# dplyr::select(link) %>%
# unlist()
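The same idea can be tried out without the package. The sketch below uses base R and a toy lookup table in place of datasets_link(); the second row of the table, both URLs and the helper build_param() are invented for illustration.

```r
# Toy lookup table standing in for datasets_link(); the second row and
# both URLs are invented.
toy_links <- data.frame(
  survey = c("PAM-IBGE", "TOY-SURVEY"),
  dataset = c("all_crops", "toy_dataset"),
  link = c("https://example.org/pam", "https://example.org/toy"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that collects the arguments and the looked-up values.
build_param <- function(dataset, time_period, language, raw_data,
                        links = toy_links) {
  row <- links[links$dataset == dataset, , drop = FALSE]
  if (nrow(row) != 1) {
    stop("Unknown or ambiguous dataset: ", dataset)
  }
  list(
    dataset = dataset,
    time_period = time_period,
    language = language,
    raw_data = raw_data,
    survey_name = row$survey,
    url = row$link
  )
}

param <- build_param("all_crops", 2010:2012, "eng", FALSE)
```

Keeping everything in one list means later steps only need param, for example param$url for the download and param$language for the translation.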
3. Download Data: example from load_degrad(). It uses the external_download() function.
# dat = suppressWarnings(as.list(param$time_period) %>%
# purrr::map(
# function(t){external_download(dataset = param$dataset,
# source='degrad', year = t) %>%
# janitor::clean_names()
# }
# ))
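The year-by-year pattern can be sketched offline as follows. fake_download() is a hypothetical stand-in for external_download() that builds a small data frame instead of fetching a file, lapply() plays the role of purrr::map(), and the gsub() line is only a rough approximation of janitor::clean_names(). Stacking the yearly pieces at the end is an assumption, not something the excerpt above shows.

```r
# Hypothetical stand-in for external_download(); no file is fetched.
fake_download <- function(dataset, source, year) {
  data.frame(Year = year, `Class Name` = "DEGRAD", Area = year / 1000,
             check.names = FALSE)
}

time_period <- 2007:2009

# One call per year, returning a list with one data frame per year.
dat <- lapply(time_period, function(t) {
  out <- fake_download(dataset = "degrad", source = "degrad", year = t)
  # Rough stand-in for janitor::clean_names(): lower case, underscores.
  names(out) <- gsub("[^a-z0-9]+", "_", tolower(names(out)))
  out
})
names(dat) <- time_period

# Assumption: the yearly pieces share columns and can be stacked.
stacked <- do.call(rbind, dat)
```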
4. Data Engineering: example from load_pam(). Here we drop some columns and convert the variable “valor” to numeric. After that, we remove all rows where “valor” is NA.
# dat = dat %>%
# janitor::clean_names() %>%
# dplyr::mutate_all(function(var){stringi::stri_trans_general(str=var,id="Latin-ASCII")})# %>%
# dplyr::mutate_all(clean_custom)
# dat = dat %>%
# dplyr::select(-c(nivel_territorial_codigo,nivel_territorial,ano_codigo)) %>%
# dplyr::mutate(valor=as.numeric(valor))
# dat = dat %>%
# dplyr::filter(!is.na(valor))
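A base R sketch of the same three operations on toy data is shown below. The values are invented, and the "..." entry stands for a non-numeric placeholder in the downloaded table, which is assumed here to be the reason NA values appear after the conversion.

```r
# Toy data in the shape of the columns used above; values are invented.
dat <- data.frame(
  nivel_territorial_codigo = c("6", "6", "6"),
  nivel_territorial = c("Municipio", "Municipio", "Municipio"),
  ano_codigo = c("2019", "2019", "2020"),
  ano = c("2019", "2019", "2020"),
  valor = c("120", "...", "95.5"),
  stringsAsFactors = FALSE
)

# Drop the columns judged not to be relevant.
drop_cols <- c("nivel_territorial_codigo", "nivel_territorial", "ano_codigo")
dat <- dat[, setdiff(names(dat), drop_cols), drop = FALSE]

# Convert valor to numeric; non-numeric placeholders become NA.
dat$valor <- suppressWarnings(as.numeric(dat$valor))

# Keep only rows with an observed value.
dat <- dat[!is.na(dat$valor), , drop = FALSE]
```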
5. Harmonizing Variable Names: example from load_pam(). We identify each dataset by its numeric table code and, within each dataset, rename some columns so that they share the same names.
# if (param$code == 5457){
# dat = dat %>%
# dplyr::rename(produto_das_lavouras_codigo = produto_das_lavouras_temporarias_e_permanentes_codigo,
# produto_das_lavouras = produto_das_lavouras_temporarias_e_permanentes)
# }
# if (param$code == 1613){
# dat = dat %>%
# dplyr::rename(produto_das_lavouras_codigo = produto_das_lavouras_permanentes_codigo,
# produto_das_lavouras = produto_das_lavouras_permanentes)
# }
# if (param$code %in% c(839,1000,1001,1002,1612)){
# dat = dat %>%
# dplyr::rename(produto_das_lavouras_codigo = produto_das_lavouras_temporarias_codigo,
# produto_das_lavouras = produto_das_lavouras_temporarias)
# }
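The three if blocks can also be expressed with a lookup vector. This is an alternative sketch, not the code used in load_pam(): the helper harmonize_names() is hypothetical, and only the table codes and column names are taken from the excerpt above.

```r
# Column prefix used by each table code, taken from the renaming rules above.
temporarias <- "produto_das_lavouras_temporarias"
prefix_by_code <- c(
  "5457" = "produto_das_lavouras_temporarias_e_permanentes",
  "1613" = "produto_das_lavouras_permanentes",
  "839" = temporarias, "1000" = temporarias, "1001" = temporarias,
  "1002" = temporarias, "1612" = temporarias
)

# Hypothetical helper: rename the product columns to the common names.
harmonize_names <- function(dat, code) {
  key <- as.character(code)
  if (!key %in% names(prefix_by_code)) {
    return(dat)  # unknown code: leave the data frame unchanged
  }
  old <- prefix_by_code[[key]]
  names(dat)[names(dat) == paste0(old, "_codigo")] <- "produto_das_lavouras_codigo"
  names(dat)[names(dat) == old] <- "produto_das_lavouras"
  dat
}

# Toy data with invented values.
toy <- data.frame(
  produto_das_lavouras_permanentes_codigo = "1",
  produto_das_lavouras_permanentes = "Banana",
  valor = 1,
  stringsAsFactors = FALSE
)
out <- harmonize_names(toy, 1613)
```

After this step every PAM table exposes the same two column names, so the later steps do not need to know which table the data came from.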
6. Load Dictionary: example from load_pam(). For functions with data from IBGE, we load the dictionary and convert the variable “var_code” to character. Finally, we exclude the entries where var_code == “0”.
# dic = load_dictionary(param$dataset)
# types = as.character(dic$var_code)
# types = types[types != "0"]
7. Translation / add variables: example from load_degrad(). This section translates the column names of the original data frame. In this example, the original columns (variables) were in English, so we translate them to Portuguese if the user chooses that language.
# if (param$language == 'pt'){
# dat_mod = dat %>%
# dplyr::select(ano = year, linkcolumn, scene_id,
# cod_uf = code_state, cod_municipio = code_muni,
# classe = class_name, pathrow, area, data = view_date,
# julday, geometry
# ) %>%
# dplyr::arrange(ano, cod_municipio, classe)
# }
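The renaming and sorting can be sketched in base R as follows. The helper translate_columns() and the toy values are hypothetical; the name mapping follows the select() call above for a subset of the columns. Returning the data unchanged for any language other than 'pt' is an assumption of this sketch, not a statement about what the package does.

```r
# English-to-Portuguese column names, following the select() call above.
pt_names <- c(year = "ano", code_state = "cod_uf", code_muni = "cod_municipio",
              class_name = "classe", view_date = "data")

# Hypothetical helper: rename and sort only when Portuguese is requested.
translate_columns <- function(dat, language) {
  if (language == "pt") {
    hit <- names(dat) %in% names(pt_names)
    names(dat)[hit] <- unname(pt_names[names(dat)[hit]])
    dat <- dat[order(dat$ano, dat$cod_municipio, dat$classe), , drop = FALSE]
  }
  dat
}

# Toy data with invented values.
toy <- data.frame(
  year = c(2008, 2007),
  code_state = c(15, 13),
  code_muni = c(1500107, 1300029),
  class_name = c("DEGRAD", "DEGRAD"),
  area = c(1.2, 0.4),
  view_date = c("2008-07-01", "2007-08-15"),
  stringsAsFactors = FALSE
)

dat_pt <- translate_columns(toy, "pt")
dat_eng <- translate_columns(toy, "eng")
```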
DataZoom is developed by a team at Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Department of Economics. Our official website is at: http://www.econ.puc-rio.br/datazoom/.