2. Data manipulation with cubble

Global Historical Climatology Network (GHCN) provides daily climate measures from stations across the world. The dataset tmax_hist extracts the maximum temperature for 39 Australian stations in the period 1970 - 1975 and 2016 - 2020 from the full Australian historical records starting from year 1969. tmax_hist is already in a cubble, with id as the key, yearmonth as the index, and c(longitude, latitude) as the coordinates. This vignette shows the basic manipulation of cubble in an example to compares the maximum temperature in these two periods for weather stations in Victoria, Australia.

Spatial manipulation

Victoria stations can be subset by the station number: Australia GHCN station number starts with “ASN00” and followed by the Bureau of Meteorology (BOM) station number, where the 2nd and 3rd digit (7th and 8th in the GHCN number) define the state of the station. Victoria stations start from 76 to 90 and filtering Victoria stations is an operation in the spatial dimension. Hence we want to use the nested form:

tmax <- tmax_hist %>% 
  filter(between(stringr::str_sub(id, 7, 8), 76, 90))

tmax
#> # cubble:   id [24]: nested form
#> # bbox:     [142, -39.13, 149.92, -34.23]
#> # temporal: yearmonth [mth], tmax [dbl]
#>    id          latitude longitude elevation name                 wmo_id ts      
#>    <chr>          <dbl>     <dbl>     <dbl> <chr>                 <dbl> <list>  
#>  1 ASN00076031    -34.2      142.      50   mildura airport       94693 <tibble>
#>  2 ASN00076047    -35.1      142.      50.3 ouyen (post office)   94831 <tibble>
#>  3 ASN00076064    -35.1      142.     105   walpeup research      95831 <tibble>
#>  4 ASN00079028    -36.7      142.     133   longerenong           95835 <tibble>
#>  5 ASN00080015    -36.2      145.      96   echuca  aerodrome     94861 <tibble>
#>  6 ASN00080023    -35.7      144.      77.7 kerang                94844 <tibble>
#>  7 ASN00080091    -36.3      145.     105   kyabram dpi           95833 <tibble>
#>  8 ASN00081049    -36.4      145.     114   tatura inst sustain…  95836 <tibble>
#>  9 ASN00082039    -36.1      147.     175   rutherglen research   95837 <tibble>
#> 10 ASN00084016    -37.6      150.      15.2 gabo island lightho…  94933 <tibble>
#> # … with 14 more rows

Temporal manipulation

We would also like to summarise the daily recording into monthly to smooth the maximum temperature and since it is also a time dimension operation, we can keep using the long form:

tmax <- tmax %>% 
  face_temporal() %>% 
  group_by(
    month = lubridate::month(yearmonth),
    group = as.factor(ifelse(lubridate::year(yearmonth) > 2015, "2016 ~ 2020","1971 ~ 1975"))) %>%  
  summarise(tmax = mean(tmax, na.rm = TRUE))

tmax
#> # cubble:  month, group, id [24]: long form
#> # bbox:    [142, -39.13, 149.92, -34.23]
#> # spatial: latitude [dbl], longitude [dbl], elevation [dbl], name [chr], wmo_id
#> #   [dbl]
#>    id          month group        tmax
#>    <chr>       <dbl> <fct>       <dbl>
#>  1 ASN00076031     1 1971 ~ 1975  30.8
#>  2 ASN00076031     1 2016 ~ 2020  34.6
#>  3 ASN00076031     2 1971 ~ 1975  31.0
#>  4 ASN00076031     2 2016 ~ 2020  32.6
#>  5 ASN00076031     3 1971 ~ 1975  27.7
#>  6 ASN00076031     3 2016 ~ 2020  30.1
#>  7 ASN00076031     4 1971 ~ 1975  23.4
#>  8 ASN00076031     4 2016 ~ 2020  25.2
#>  9 ASN00076031     5 1971 ~ 1975  19.4
#> 10 ASN00076031     5 2016 ~ 2020  19.6
#> # … with 542 more rows

Back to spatial

A data quality issue with the rnoaa data is that while it records the first and last year recorded of each series without the period of missingness. For example, station ASN00085279 starts it first record in 1943, pauses for a period from 1946 to 1982, and then resumes it recording till today. We would like to remove stations like this by examining whether the summarised time series has a total of 24 months. This is a station-wise operation and face_spatial() can be used to convert the cubble back to the nested form:

tmax <- tmax %>%  face_spatial() %>%  filter(nrow(ts) == 24)
tmax
#> # cubble:   id [22]: nested form
#> # bbox:     [142, -39.13, 149.92, -34.23]
#> # temporal: month [dbl], group [fct], tmax [dbl]
#>    id          latitude longitude elevation name                 wmo_id ts      
#>    <chr>          <dbl>     <dbl>     <dbl> <chr>                 <dbl> <list>  
#>  1 ASN00076031    -34.2      142.      50   mildura airport       94693 <tibble>
#>  2 ASN00076047    -35.1      142.      50.3 ouyen (post office)   94831 <tibble>
#>  3 ASN00076064    -35.1      142.     105   walpeup research      95831 <tibble>
#>  4 ASN00079028    -36.7      142.     133   longerenong           95835 <tibble>
#>  5 ASN00080015    -36.2      145.      96   echuca  aerodrome     94861 <tibble>
#>  6 ASN00080023    -35.7      144.      77.7 kerang                94844 <tibble>
#>  7 ASN00080091    -36.3      145.     105   kyabram dpi           95833 <tibble>
#>  8 ASN00081049    -36.4      145.     114   tatura inst sustain…  95836 <tibble>
#>  9 ASN00082039    -36.1      147.     175   rutherglen research   95837 <tibble>
#> 10 ASN00084016    -37.6      150.      15.2 gabo island lightho…  94933 <tibble>
#> # … with 12 more rows

The final step

In the glyph map we are about to create, the four variables (major and minor axis for x and y) need to be in the same table. In cubble, you can move time invariant variable into the long form with unfold():

tmax <- tmax %>%  face_temporal() %>%  unfold(latitude, longitude)
tmax
#> # cubble:  month, group, id [22]: long form
#> # bbox:    [142, -39.13, 149.92, -34.23]
#> # spatial: latitude [dbl], longitude [dbl], elevation [dbl], name [chr], wmo_id
#> #   [dbl]
#>    id          month group        tmax latitude longitude
#>    <chr>       <dbl> <fct>       <dbl>    <dbl>     <dbl>
#>  1 ASN00076031     1 1971 ~ 1975  30.8    -34.2      142.
#>  2 ASN00076031     1 2016 ~ 2020  34.6    -34.2      142.
#>  3 ASN00076031     2 1971 ~ 1975  31.0    -34.2      142.
#>  4 ASN00076031     2 2016 ~ 2020  32.6    -34.2      142.
#>  5 ASN00076031     3 1971 ~ 1975  27.7    -34.2      142.
#>  6 ASN00076031     3 2016 ~ 2020  30.1    -34.2      142.
#>  7 ASN00076031     4 1971 ~ 1975  23.4    -34.2      142.
#>  8 ASN00076031     4 2016 ~ 2020  25.2    -34.2      142.
#>  9 ASN00076031     5 1971 ~ 1975  19.4    -34.2      142.
#> 10 ASN00076031     5 2016 ~ 2020  19.6    -34.2      142.
#> # … with 518 more rows

Glyph map

The cubble package also provides a geom_glyph() interface to plot the glyph map. The geometry asks for x_major, y_major, x_minor, and y_minor as the required aesthetics:

nsw_vic <- ozmaps::abs_ste %>%  filter(NAME %in% c("Victoria"))

ggplot() + 
  geom_sf(data = nsw_vic, fill = "transparent", color = "grey", linetype = "dotted") + 
  geom_glyph(data = tmax, 
             aes(x_major = longitude, x_minor = month, 
                 y_major = latitude, y_minor = tmax,
                 group = interaction(id, group), color = group),
             width = 1, height = 0.4) +
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() + 
  coord_sf(xlim = c(141, 150)) + 
  theme(legend.position = "bottom") +
  labs(x = "Longitude", y = "Latitude")

Remember that Australia is in the southern hemisphere, so the temperature curves are in U shapes :)