It is crucial to assess the indivudal exposure of environmental factors at different exposure periods (i.e, progestation and second trimester), so as to accurately assess how the air pollutant exposure work on the certain outcomes (i.e, birth defect and preterm labor). Compared with the traditional way which only uses the average concnetration among the whole monitoring sites, the EnvExpInd
package taken the spatial heterogeneity into consideration and obtained a more accurate assessment of environmental exposure.
Generally, three datasets including the concentration of environmental factors at each time point, individual information and the environmental monitoring sites should be prepared. Here, we take a simple example to illustrate how we can use the EnvExpInd packageto estimate the individual exposure of air pollutants.
The package hasn’t been published in R-Project,whereas, you can install it from the github (Spatial-R):
devtools::install_github("Spatial-R/EnvExpInd")
After you have installed the EnvExpInd package, you can load the “envid” dataset in the package using the following codes: data("envind)
. The envid dataset contains the information about the air pollutants, montoring site and the detailed information for each patient. In order to omit the potential error in the following procedure, you should make sure that the variables regarding date information in the air pollutants and patient dataset should be formated as “date/time”.
library(sp);
library(gstat);
library(EnvExpInd)
data("envind")
individual_data_tem$date <- as.Date(individual_data_tem$date);
pollutant_data$date <- as.Date(pollutant_data$date)
pollutant_data_tem <- pollutant_data
The individual_data_tem contains at least three variables: id (unique ID for the each patient), date (date of outcome occured, formated as “2014-10-20”) and address (detailed address for each individual).
The pollutant_data also contains at least three variables: date (date of montoring, formated as “2014-10-20”), site_name (the name for the monitoring site) and concentration of certain air pollutants. (Notes: There were some missing values in the pullutant dataset, therefore, you should make some interpolations to fill with)
The site_data contains at least two variables: site_name (the name for the monitoring site) and address (detailed address for the montoring site).
There are several ways to get the longitude and latitude information based on the detailed address: Baidu, Google, and Amap. For the user in China, you can use the API of Baidu and Amap to transform the address into the longitude and latitude. Here, we provide a simple function get_latlon_china to help the uses to realise the process.
(Note: Daily call limitation for each api_key was 6000)
site_data <- get_latlon_china(data = site_data, add_var = "address", api_key = "your key")
individual_data_tem <- get_latlon_china(data = individual_data_tem, add_var = "address", api_key = "your key")
Generally, there were two ways to calculate the environmental exposure on the individual level: simple or complicated. The simple one here is that we choose the air pollutant concentration in the nearest monitoring site as the reference for each individual; however, for the complicated one, we use the spatial interpolation method (i.e, inverse distance weighted, Ordinary kriging, and land use regression) to assess the air pollutant concentration. In the current version of EnvExpInd, there are only two spatial interpolation methods available: inverse distance weighted and kriging.
Two improtant functions can be used to realised this aim: the function get_refrence_id_simple
and expoure_estimate_simple
. The function get_refrence_id_simple
is used to match the nearest monitoring site for each individual. There were eight arugments (four arugments for the individual dataset and four arugments for the monitoring sites) that we should define firstly:
individual_data
: dataframe with at least three variables: longitude, latitude and unique id.individual_lat
: column name of latitude for each individual.individual_lon
: column name of longitude for each individual.individual_id
: column name of unique id for each individual.site_data
: dataframe with at least three variables: longitude, latitude and unique id.site_lat
: column name of latitude for each monitoring site.site_lon
: column name of longitude for each monitoring site.site_id
: column name of unique id for each monitoring site.The codes are as followed:
individual_data_tem$refrence_id <- get_refrence_id_simple(individual_data = individual_data_tem,
individual_lat = "lat",
individual_lon ="lon",
individual_id = "id",
site_data = site_data,
site_lat = "lat",
site_lon = "lon",
site_id = "site")
kable(head(individual_data_tem[,-3]))
After that, we can use the function expoure_estimate_simple
to get the exposure during the study period:
pollutant_data_tem$date <- as.Date(pollutant_data_tem$date)
individual_data_tem$date <- as.Date(individual_data_tem$date)
exposure.simple <- expoure_estimate_simple(individual_data = individual_data_tem,
individual_id = "id",
refrence_id = "refrence_id",
exposure_date = "date",
pollutant_data = pollutant_data_tem,
pollutant_site = "site.name",
pollutant_date = "date",
pollutant_name = c("PM10","SO2"),
estimate_interval = c(0:1))
kable(head(exposure.simple$PM10),digits= 1) #### PM10 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 202 | 206 |
2563 | 206 | 220 |
2562 | 206 | 220 |
2561 | 230 | 200 |
2560 | 212 | 146 |
2559 | 176 | 136 |
kable(head(exposure.simple$SO2),digits= 1) #### SO2 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 27 | 47 |
2563 | 47 | 60 |
2562 | 47 | 60 |
2561 | 82 | 52 |
2560 | 54 | 72 |
2559 | 44 | 52 |
The spatial interpolation method can be realised using the gstat package. In the EnvExpInd package, we can use exposure_estimate_idw
function to estimate the individual level exposure based on the inverse distance weighted method.
pollutant_data_tem_idw <- merge(pollutant_data_tem,site_data[,c(1,3,4)],by.x = "site.name",by.y = "site", all.x = T)
exposure.idw <- exposure_estimate_idw(individual_data = individual_data_tem,
individual_id = "id",
exposure_date ="date",
individual_lat ="lat",
individual_lon ="lon",
pollutant_data = pollutant_data_tem_idw,
pollutant_date = "date",
pollutant_site_lat = "lat",
pollutant_site_lon = "lon",
pollutant_name = c("PM10","SO2"),
estimate_interval = c(0:1))
kable(head(exposure.idw$PM10),digits = 1) #### PM10 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 196.1 | 198.6 |
2563 | 220.2 | 198.6 |
2562 | 227.0 | 194.7 |
2561 | 191.0 | 224.1 |
2560 | 141.6 | 186.6 |
2559 | 136.1 | 176.1 |
kable(head(exposure.idw$SO2),digits= 1) #### SO2 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 52.9 | 33.1 |
2563 | 72.1 | 53.0 |
2562 | 73.3 | 53.2 |
2561 | 50.8 | 81.0 |
2560 | 58.7 | 44.5 |
2559 | 51.9 | 44.0 |
Ordinary kriging to estimate the environmental exposure can be realised by the function exposure_estimate_krige
. Generally, you should define the semivariogram firstly.
example.date <- range(pollutant_data_tem$date)[2]
test.pollutant <- filter(pollutant_data_tem,date == example.date)[,c(2,5)]
test.pollutant <- merge(test.pollutant,site_data,by.x = "site.name",by.y = "site")
coordinates(test.pollutant) <- ~lat + lon
m <- fit.variogram(variogram(PM10~1, test.pollutant), vgm(1, "Sph", 200, 1))
estimate.krige <- exposure_estimate_krige(individual_data = individual_data_tem,
individual_id = "id",
exposure_date = "date",
individual_lat = "lat",
individual_lon = "lon",
pollutant_data = pollutant_data_tem_idw,
pollutant_date = "date",
pollutant_site_lat = "lat",
pollutant_site_lon = "lon",
pollutant_name = c("PM10","SO2"),
estimate_interval = c(0:1),
krige_model = m,
nmax = 7,
krige_method = "med")
kable(head(estimate.krige$PM10),digits = 1) #### PM10 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 194 | 198 |
2563 | 220 | 198 |
2562 | 220 | 194 |
2561 | 192 | 220 |
2560 | 136 | 178 |
2559 | 136 | 178 |
kable(head(estimate.krige$SO2),digits = 1) #### SO2 estimation
id | day.0 | day.1 |
---|---|---|
2564 | 58.0 | 35.0 |
2563 | 82.0 | 58.0 |
2562 | 80.0 | 58.0 |
2561 | 46.7 | 82.0 |
2560 | 52.0 | 46.7 |
2559 | 52.0 | 46.7 |
Li Wu, Lei Jin, Tingming Shi, Bing Zhang, et.al. Association between ambient particulate matter exposure and semen quality in Wuhan, China, Environment International, 98, 2017,219-228.
Tao Liu, Min Kang, Bing Zhang, et.al. Independent and interactive effects of ambient temperature and absolute humidity on the risks of avian influenza A(H7N9) infection in China. Science of The Total Environment, 2018, 619-620: 1358-1365.
EnvExpInd is a package to estimate the environmental exposure on the individual level. If you have found any bugs, please set up an issue here.