Switching between space and time: Spatio-temporal analysis with cubble

H. Sherry Zhang

Monash University, Australia

2023 May 02

Hi!

  • A final-year PhD student in the Department of Econometrics and Business Statistics

  • My research centers on exploring multivariate spatio-temporal data with data wrangling and visualisation tools.

  • Find me on

    • Twitter: huizezhangsh,
    • GitHub: huizezhang-sherry, and
    • https://huizezhangsh.netlify.app/

Spatio-temporal data

People can mean a whole range of different things when they call their data spatio-temporal!

The focus of today will be on vector data.

Example of vector data

Physical sensors that measure temperature, rainfall, and wind speed and direction

Australian weather station data:

stations
# A tibble: 88 × 6
  id            lat  long  elev name              wmo_id
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl>
1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214
2 ASN00002032 -17.0  128. 203   warmun             94213
3 ASN00003080 -17.6  124.  77.5 curtin aero        94204
4 ASN00005007 -22.2  114.   5   learmonth airport  94302
5 ASN00006044 -25.9  114.   9   denham             94402
# … with 83 more rows

ts
# A tibble: 32,208 × 5
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01   164  38.3  25.3
2 ASN00001006 2020-01-02     0  40.6  30.5
3 ASN00001006 2020-01-03    16  39.7  27.2
4 ASN00001006 2020-01-04     0  38.2  27.3
5 ASN00001006 2020-01-05     2  39.3  26.7
# … with 32,203 more rows

What’s available for spatio-temporal data? - stars

Cubble: a spatio-temporal vector data structure


Cubble is a nested object built on tibble that allows easy switching between the spatial (nested) form and the temporal (long) form.
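To make the two forms concrete, the sketch below mimics their shapes with tidyr's nest() and unnest() on a hypothetical two-station toy dataset (stations_toy and ts_toy are made up for illustration). It is an analogy only, not how cubble is implemented: the nested form has one row per station with its time series in a list-column, and the long form has one row per station-day.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two stations (spatial) and three days of tmax each (temporal)
stations_toy <- tibble(
  id   = c("A", "B"),
  long = c(128.1, 114.1),
  lat  = c(-15.5, -22.2)
)
ts_toy <- tibble(
  id   = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(38.3, 40.6, 39.7, 33.0, 34.1, 32.5)
)

# Nested form: one row per station, its time series stored in the list-column `ts`
nested <- stations_toy %>%
  left_join(nest(ts_toy, ts = -id), by = "id")

# Long form: one row per station-day, keeping only the key and temporal columns
long_form <- nested %>%
  select(id, ts) %>%
  unnest(ts)
```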

Cast your data into a cubble

(weather <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, coords = c(long, lat)
))
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>            
1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214 <tibble [366 × 4]>
2 ASN00002032 -17.0  128. 203   warmun             94213 <tibble [366 × 4]>
3 ASN00003080 -17.6  124.  77.5 curtin aero        94204 <tibble [366 × 4]>
4 ASN00005007 -22.2  114.   5   learmonth airport  94302 <tibble [366 × 4]>
5 ASN00006044 -25.9  114.   9   denham             94402 <tibble [366 × 4]>
# … with 83 more rows
  • The spatial data (stations) can also be an sf object, and the temporal data (ts) can be a tsibble object.
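Because as_cubble() links the two tables through the key (id here), it can be worth checking before casting that the keys line up and that each key-index pair occurs only once, the same uniqueness a tsibble requires. This is a minimal dplyr sketch on hypothetical toy tables; it does not reproduce whatever matching cubble performs itself.

```r
library(dplyr)

# Hypothetical toy tables: station "C" has no time series, "D" has no metadata
stations_toy <- tibble(
  id   = c("A", "B", "C"),
  long = c(128.1, 114.1, 150.0),
  lat  = c(-15.5, -22.2, -33.9)
)
ts_toy <- tibble(
  id   = c("A", "A", "B", "D"),
  date = as.Date("2020-01-01") + c(0, 1, 0, 0),
  tmax = c(38.3, 40.6, 33.0, 25.0)
)

# Keys present in one table but missing from the other
only_spatial  <- setdiff(stations_toy$id, ts_toy$id)   # "C"
only_temporal <- setdiff(ts_toy$id, stations_toy$id)   # "D"

# Key-index pairs that appear more than once (none in this toy data)
dup <- ts_toy %>%
  count(id, date) %>%
  filter(n > 1)
```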

Switch between the two forms

Long form

(weather_long <- weather %>% 
  face_temporal())
# cubble:  date, id [88]: long form
# bbox:    [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl],
#   name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin
  <chr>       <date>     <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01   164  38.3  25.3
2 ASN00001006 2020-01-02     0  40.6  30.5
3 ASN00001006 2020-01-03    16  39.7  27.2
4 ASN00001006 2020-01-04     0  38.2  27.3
5 ASN00001006 2020-01-05     2  39.3  26.7
# … with 32,203 more rows

Back to the nested form:

(weather_back <- weather_long %>% 
  face_spatial())
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl],
#   tmin [dbl]
  id         lat  long  elev name  wmo_id ts      
  <chr>    <dbl> <dbl> <dbl> <chr>  <dbl> <list>  
1 ASN0000… -15.5  128.   3.8 wynd…  95214 <tibble>
2 ASN0000… -17.0  128. 203   warm…  94213 <tibble>
3 ASN0000… -17.6  124.  77.5 curt…  94204 <tibble>
4 ASN0000… -22.2  114.   5   lear…  94302 <tibble>
5 ASN0000… -25.9  114.   9   denh…  94402 <tibble>
# … with 83 more rows
identical(weather_back, weather)
[1] TRUE
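The round trip can be mirrored with ordinary tibbles: nest a long table into a list-column, expand it again, and re-nest. The helpers to_long() and to_nested() and the toy data are hypothetical; they only imitate the behaviour of face_temporal() and face_spatial() and are not their implementation.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: nested -> long, one row per station-day
to_long <- function(nested) {
  unnest(select(nested, id, ts), ts)
}

# Hypothetical helper: long -> nested, re-attaching the spatial columns by key
to_nested <- function(long, spatial) {
  left_join(spatial, nest(long, ts = -id), by = "id")
}

# Hypothetical toy data
stations_toy <- tibble(id = c("A", "B"), long = c(128.1, 114.1), lat = c(-15.5, -22.2))
long_toy <- tibble(
  id   = rep(c("A", "B"), each = 2),
  date = rep(as.Date("2020-01-01") + 0:1, times = 2),
  tmax = c(38.3, 40.6, 33.0, 34.1)
)

nested <- to_nested(long_toy, stations_toy)
back   <- to_nested(to_long(nested), stations_toy)
identical(back, nested)
```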

Access variables in the other form

Reference temporal variables with $

weather %>% 
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE))
# cubble:   id [88]: nested form
# bbox:     [113.53, -43.49, 153.64, -10.58]
# temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
  id            lat  long  elev name              wmo_id ts                 avg_tmax
  <chr>       <dbl> <dbl> <dbl> <chr>              <dbl> <list>                <dbl>
1 ASN00001006 -15.5  128.   3.8 wyndham aero       95214 <tibble [366 × 4]>     36.7
2 ASN00002032 -17.0  128. 203   warmun             94213 <tibble [366 × 4]>     35.8
3 ASN00003080 -17.6  124.  77.5 curtin aero        94204 <tibble [366 × 4]>     35.9
4 ASN00005007 -22.2  114.   5   learmonth airport  94302 <tibble [366 × 4]>     33.2
5 ASN00006044 -25.9  114.   9   denham             94402 <tibble [366 × 4]>     27.1
# … with 83 more rows
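Each station gets its own avg_tmax above, so ts$tmax is evaluated one station at a time, which suggests the nested cubble behaves like a row-wise data frame. The same effect can be reproduced with dplyr::rowwise() on a plain nested tibble; this is an analogy on hypothetical toy data, not cubble's internals.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, nested into one row per station (one missing tmax)
ts_toy <- tibble(
  id   = rep(c("A", "B"), each = 3),
  date = rep(as.Date("2020-01-01") + 0:2, times = 2),
  tmax = c(38.3, 40.6, 39.7, 33.0, NA, 32.5)
)
nested <- nest(ts_toy, ts = -id)

# In a rowwise tibble, list-column elements are unwrapped, so ts$tmax is the
# current station's own tmax vector rather than the whole column
nested_avg <- nested %>%
  rowwise() %>%
  mutate(avg_tmax = mean(ts$tmax, na.rm = TRUE)) %>%
  ungroup()
nested_avg$avg_tmax  # about 39.53 for A and 32.75 for B
```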

Move spatial variables into the long form

weather_long %>% unfold(long, lat)
# cubble:  date, id [88]: long form
# bbox:    [113.53, -43.49, 153.64, -10.58]
# spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
  id          date        prcp  tmax  tmin  long   lat
  <chr>       <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1 ASN00001006 2020-01-01   164  38.3  25.3  128. -15.5
2 ASN00001006 2020-01-02     0  40.6  30.5  128. -15.5
3 ASN00001006 2020-01-03    16  39.7  27.2  128. -15.5
4 ASN00001006 2020-01-04     0  38.2  27.3  128. -15.5
5 ASN00001006 2020-01-05     2  39.3  26.7  128. -15.5
# … with 32,203 more rows
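In the output above, unfold() repeats each station's long and lat on every one of its daily rows. A join on the key gives the same shape in plain dplyr; this is an analogy on hypothetical toy tables, not unfold()'s implementation.

```r
library(dplyr)

# Hypothetical toy tables: station metadata and daily long-form records
stations_toy <- tibble(
  id   = c("A", "B"),
  long = c(128.1, 114.1),
  lat  = c(-15.5, -22.2),
  elev = c(3.8, 5)
)
ts_long <- tibble(
  id   = c("A", "A", "B"),
  date = as.Date("2020-01-01") + c(0, 1, 0),
  tmax = c(38.3, 40.6, 33.0)
)

# Copy only the requested spatial columns onto every matching station-day row
ts_unfolded <- ts_long %>%
  left_join(select(stations_toy, id, long, lat), by = "id")
```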

Why do you need a glyph map?


Glyph map transformation

# template: DATA and the UPPERCASE names are placeholders for your own data and columns
DATA %>%
  ggplot() +
  geom_glyph(
    aes(x_major = X_MAJOR, x_minor = X_MINOR,
        y_major = Y_MAJOR, y_minor = Y_MINOR)) +
  ...
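The idea behind the transformation is linear: each station's minor values (e.g. month and tmax) are rescaled into a small box centred on its major coordinates (long, lat). Below is a minimal base-R sketch, assuming each minor variable is rescaled to [-1, 1] within its glyph and then multiplied by half the box width or height. The helpers rescale11() and glyph_coords() are hypothetical, and geom_glyph() may differ in details such as whether rescaling is done per glyph or across all glyphs.

```r
# Hypothetical helper: linearly rescale a numeric vector to [-1, 1]
# (assumes x is not constant)
rescale11 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Hypothetical helper: place one glyph's minor values in a width x height box
# centred on its major (map) coordinates
glyph_coords <- function(x_major, x_minor, y_major, y_minor, width, height) {
  data.frame(
    x = x_major + rescale11(x_minor) * width / 2,
    y = y_major + rescale11(y_minor) * height / 2
  )
}

# One station at (128, -15.5) with 12 monthly values (made-up tmax numbers)
g <- glyph_coords(
  x_major = 128, x_minor = 1:12,
  y_major = -15.5, y_minor = c(38, 37, 37, 36, 33, 31, 31, 33, 36, 38, 39, 39),
  width = 1.3, height = 0.5
)
range(g$x)  # spans 128 +/- 0.65
range(g$y)  # spans -15.5 +/- 0.25
```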

Aggregated maximum temperature by month

library(cubble)
library(dplyr)
library(ggplot2)

cb <- as_cubble(
  list(spatial = stations, temporal = ts),
  key = id, index = date, 
  coords = c(long, lat)
)

# monthly mean of daily maximum temperature for each station,
# with the station coordinates copied into the long form for plotting
cb_glyph <- cb %>%
  face_temporal() %>%
  mutate(month = lubridate::month(date)) %>%
  group_by(month) %>% 
  summarise(tmax = mean(tmax, na.rm = TRUE)) %>%
  unfold(long, lat)

# oz_simp is assumed to be a simplified sf outline of Australia (not defined here)
cb_glyph %>% 
  ggplot(aes(x_major = long, 
             x_minor = month,
             y_major = lat, 
             y_minor = tmax)) +
  geom_sf(data = oz_simp, fill = "grey90",
          color = "white", inherit.aes = FALSE) +
  geom_glyph_box(width = 1.3, height = 0.5) + 
  geom_glyph(width = 1.3, height = 0.5) + 
  ggthemes::theme_map()
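The aggregation step can be mimicked in plain dplyr. The cubble pipeline above only writes group_by(month), yet the plot has one glyph per station, which suggests the station key stays a grouping variable in the long form; with an ordinary tibble the key has to be grouped explicitly. The toy data are hypothetical, and base R's format() stands in for lubridate::month() to keep the sketch light.

```r
library(dplyr)

# Hypothetical long-form toy data: two stations, daily tmax over two months
ts_toy <- tibble(
  id   = rep(c("A", "B"), each = 4),
  date = rep(as.Date(c("2020-01-01", "2020-01-02", "2020-02-01", "2020-02-02")), times = 2),
  tmax = c(38, 40, 36, NA, 33, 35, 30, 32)
)

# Monthly mean tmax per station; the station key is grouped explicitly here
monthly <- ts_toy %>%
  mutate(month = as.integer(format(date, "%m"))) %>%
  group_by(id, month) %>%
  summarise(tmax = mean(tmax, na.rm = TRUE), .groups = "drop")
```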

Now with German stations

Acknowledgements

  • The slides are made with Quarto and are available at https://sherryzhang-germany2023.netlify.app
  • All the materials used to prepare the slides are available at https://github.com/huizezhang-sherry/germany2023
