(tie-dye graphs)

Long time series can be difficult to plot and examine using static figures. Interactive plots from the dygraphs package are a great tool, but they are not currently easy to use with tidy data. This package provides methods for dygraph() that make it easier to plot tidy time series. There are currently methods for data frames and tibbles, grouped tibbles, and tsibbles.
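
For context, here is a minimal sketch of the conversion step that plain dygraphs typically needs before a tidy data frame can be plotted. The data frame `tidy_df` and its columns are made up for illustration and are not part of tydygraphs. The sketch only uses `xts::xts()` and `dygraphs::dygraph()`.

```r
library(xts)
library(dygraphs)

# Hypothetical tidy data: one row per date, one value column
tidy_df <- data.frame(
  date  = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  value = sin(seq_len(24) / 4)
)

# Build a time-series object indexed by the date column
series <- xts(tidy_df$value, order.by = tidy_df$date)
colnames(series) <- "value"

# Plot the converted object with plain dygraphs
plot_widget <- dygraphs::dygraph(series)
plot_widget
```

The methods in this package are meant to remove this manual conversion so a data frame can be passed to dygraph() directly.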

Installation

Currently tydygraphs is only available on GitHub. You can install the current development version using remotes or by downloading and building from source:

# Install using remotes
remotes::install_github("jpshanno/tydygraphs")

# or from source
tydygraphs_source <- file.path(tempdir(), "tydygraphs-master.zip")
download.file("https://github.com/jpshanno/tydygraphs/archive/master.zip",
              tydygraphs_source)
unzip(tydygraphs_source,
      exdir = dirname(tydygraphs_source))
install.packages(sub("\\.zip$", "", tydygraphs_source),
                 repos = NULL,
                 type = "source")

Use

A Simple Example

The dataset great_lakes_hydro included in tydygraphs demonstrates the problems with examining long-term time series datasets and the benefits of using dygraphs. The dataset consists of monthly water levels, precipitation, evaporation, outflow discharge, and inflow runoff for Lake Superior and for the combined Lakes Michigan and Huron. The data were downloaded from NOAA’s Great Lakes Dashboard Project (Smith et al. 2016).

library(tydygraphs)
#> Loading required package: dygraphs
#> 
#> Attaching package: 'tydygraphs'
#> The following object is masked from 'package:dygraphs':
#> 
#>     dygraph
great_lakes_hydro
#> # A tibble: 1,535 x 7
#>    lake  measurement_date water_level_m precip_mm evaporation_mm
#>    <chr> <date>                   <dbl>     <dbl>          <dbl>
#>  1 Supe… 1950-01-01             183.42       95           128.15
#>  2 Supe… 1950-02-01             183.362      39.1          49.4 
#>  3 Supe… 1950-03-01             183.316      48.2          38.01
#>  4 Supe… 1950-04-01             183.347      72.9          25.4 
#>  5 Supe… 1950-05-01             183.539      85.3          -0.51
#>  6 Supe… 1950-06-01             183.725      95.5          -4.86
#>  7 Supe… 1950-07-01             183.801      82            -5.26
#>  8 Supe… 1950-08-01             183.841      81.8           1.66
#>  9 Supe… 1950-09-01             183.81       56.1          24.13
#> 10 Supe… 1950-10-01             183.795      61.6          42.62
#> # … with 1,525 more rows, and 2 more variables: discharge_mm <dbl>,
#> #   runoff_mm <dbl>

Let’s start simple and show how we can use the function for a single time series by filtering the data and supplying just a single variable to dygraph(). The first column representing a time series will be used if nothing is supplied for time.

# dplyr provides filter() and group_by() used in the examples
library(dplyr)

great_lakes_hydro %>% 
  filter(lake == "Superior") %>% 
  dygraph(water_level_m)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo

Adding More Series

As you may have noticed in the data, all of the other variables are expressed in millimeters. Because tydygraphs is built on top of dygraphs, we can easily add a second y-axis to look at precipitation in the same plot. This plot also makes it clear why a tool like dygraphs is great for long time series. When the entire dataset is displayed at once it is nearly useless for exploratory analysis or diagnostics, but an interactive plot lets us zoom in on periods of interest.

two_variables <- 
  great_lakes_hydro %>% 
  filter(lake == "Superior") %>% 
  dygraph(precip_mm,
          water_level_m) %>% 
  dySeries("precip_mm", 
           axis = "y2")

two_variables

Making it Look Nice

If we wanted to use this plot in something like a Shiny application or a blog post, we could build on it further with the customizations dygraphs offers. The chunk below makes precipitation into a filled step plot, adds a border around the water level series, and starts the plot zoomed in on a period of interest.

There are many ways to customize dygraphs, and I highly recommend looking at the dygraphs documentation.

# Start with the plot from above
two_variables %>% 
  # Modify precipitation to make a filled step plot with no border by adjusting
  # the strokePattern
  dySeries("precip_mm",
           axis = "y2",
           stepPlot = TRUE,
           fillGraph = TRUE,
           color = "dodgerblue",
           strokePattern = c(0,1)) %>%
  # Modify water level to add a white border around the line to ensure it stands
  # out from precipitation
  dySeries("water_level_m",
           strokeBorderWidth = 2,
           color = "darkblue") %>%  
  dyRangeSelector(dateWindow = c("2000-01-01", "2010-01-01")) %>% 
  dyLegend(labelsSeparateLines = TRUE)
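
Since Shiny was mentioned as a possible destination, here is a minimal sketch of embedding a dygraph in a Shiny app using `dygraphOutput()` and `renderDygraph()` from dygraphs. To stay self-contained it plots the built-in `ldeaths` series from the datasets package rather than great_lakes_hydro. The output id `series_plot` is an arbitrary name chosen for this example.

```r
library(shiny)
library(dygraphs)

# UI: a title and a placeholder for the interactive plot
ui <- fluidPage(
  titlePanel("Monthly deaths from lung diseases (datasets::ldeaths)"),
  dygraphOutput("series_plot")
)

# Server: render the dygraph into the placeholder defined above
server <- function(input, output, session) {
  output$series_plot <- renderDygraph({
    dyRangeSelector(dygraphs::dygraph(ldeaths))
  })
}

# Build the app object; start it interactively with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

Any dygraph object, including the customized plot above, could presumably be returned from the `renderDygraph()` expression in the same way.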

Working with Grouped Data

The final example shows how we can plot tidy data by passing a grouped tibble to dygraph(), here using water levels grouped by lake.

great_lakes_hydro %>% 
  group_by(lake) %>% 
  dygraph(water_level_m)
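
For comparison, here is a rough sketch of one way to get a multi-series plot from long data with plain dygraphs: reshape to one column per group, then convert to xts. This is an illustration under assumed, made-up data (`long_df` and its columns are hypothetical) and is not necessarily how the grouped-tibble method is implemented.

```r
library(tidyr)
library(xts)
library(dygraphs)

# Hypothetical long-format data: two groups observed on the same dates
long_df <- data.frame(
  lake  = rep(c("A", "B"), each = 12),
  date  = rep(seq(as.Date("2000-01-01"), by = "month", length.out = 12),
              times = 2),
  level = c(seq(183, 184, length.out = 12),
            seq(176, 177, length.out = 12))
)

# Reshape so each group becomes its own column
wide_df <- pivot_wider(long_df, names_from = lake, values_from = level)

# Convert the value columns to an xts object indexed by date
wide_xts <- xts(as.matrix(wide_df[, c("A", "B")]), order.by = wide_df$date)

# Each column is drawn as a separate series
grouped_plot <- dygraphs::dygraph(wide_xts)
grouped_plot
```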

Future Plans

  • Add unit tests

References

Smith, Joeseph P., Timothy S. Hunter, Anne H. Clites, Craig A. Stow, Tad Slawecki, Glenn C. Muhr, and Andrew D. Gronewold. 2016. “An Expandable Web-Based Platform for Visually Analyzing Basin-Scale Hydro-Climate Time Series Data.” Environmental Modelling & Software 78 (April). Elsevier BV: 97–105. https://doi.org/10.1016/j.envsoft.2015.12.005.