Exogenous variables are external factors that provide additional information about the behavior of the target variable in time series forecasting. These variables, which are correlated with the target, can significantly improve predictions. Examples of exogenous variables include weather data, economic indicators, holiday markers, and promotional sales.
TimeGPT allows you to include exogenous variables when
generating a forecast. This vignette will show you how to include them.
It assumes you have already set up your API key. If you haven’t done
this, please read the Get
Started vignette first.
For this vignette, we will use the electricity consumption dataset
with exogenous variables included in nixtlar. This dataset
contains hourly prices from five different electricity markets, along
with two exogenous variables related to the prices and binary variables
indicating the day of the week.
df_exo_vars <- nixtlar::electricity_exo_vars
head(df_exo_vars)
#>   unique_id                  ds     y Exogenous1 Exogenous2 day_0 day_1 day_2
#> 1        BE 2016-10-22 00:00:00 70.00      49593      57253     0     0     0
#> 2        BE 2016-10-22 01:00:00 37.10      46073      51887     0     0     0
#> 3        BE 2016-10-22 02:00:00 37.10      44927      51896     0     0     0
#> 4        BE 2016-10-22 03:00:00 44.75      44483      48428     0     0     0
#> 5        BE 2016-10-22 04:00:00 37.10      44338      46721     0     0     0
#> 6        BE 2016-10-22 05:00:00 35.61      44504      46303     0     0     0
#>   day_3 day_4 day_5 day_6
#> 1     0     0     1     0
#> 2     0     0     1     0
#> 3     0     0     1     0
#> 4     0     0     1     0
#> 5     0     0     1     0
#> 6     0     0     1     0When using exogenous variables, nixtlar distinguishes
between historical and future exogenous variables:
Historical Exogenous Variables: These should be
included in the input data immediately following the
id_col, ds, and y columns. If
your dataset contains additional columns that are not exogenous
variables, you must remove them before using any core functions of
nixtlar.
Future Exogenous Variables: These correspond to
the X_df parameter and should cover the entire forecast
horizon. This dataset must include columns with the appropriate
timestamps and, if applicable, unique identifiers.
future_exo_vars <- nixtlar::electricity_future_exo_vars
head(future_exo_vars)
#>   unique_id                  ds Exogenous1 Exogenous2 day_0 day_1 day_2 day_3
#> 1        BE 2016-12-31 00:00:00      64108      70318     0     0     0     0
#> 2        BE 2016-12-31 01:00:00      62492      67898     0     0     0     0
#> 3        BE 2016-12-31 02:00:00      61571      68379     0     0     0     0
#> 4        BE 2016-12-31 03:00:00      60381      64972     0     0     0     0
#> 5        BE 2016-12-31 04:00:00      60298      62900     0     0     0     0
#> 6        BE 2016-12-31 05:00:00      60339      62364     0     0     0     0
#>   day_4 day_5 day_6
#> 1     0     1     0
#> 2     0     1     0
#> 3     0     1     0
#> 4     0     1     0
#> 5     0     1     0
#> 6     0     1     0To generate a forecast with exogenous variables, use the
nixtla_client_forecast function as you would for forecasts
without them. The only difference is that you must add the future
exogenous variables using the X_df argument.
fcst_exo_vars <- nixtla_client_forecast(df_exo_vars, h = 24, X_df = future_exo_vars)
#> Frequency chosen: h
#> Using historical exogenous features: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
#> Using future exogenous features: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
head(fcst_exo_vars)
#>   unique_id                  ds  TimeGPT
#> 1        BE 2016-12-31 00:00:00 74.54077
#> 2        BE 2016-12-31 01:00:00 43.34429
#> 3        BE 2016-12-31 02:00:00 44.42921
#> 4        BE 2016-12-31 03:00:00 38.09440
#> 5        BE 2016-12-31 04:00:00 37.38914
#> 6        BE 2016-12-31 05:00:00 39.08574For comparison, we will also generate a forecast without exogenous variables.
df <- nixtlar::electricity # same dataset but without exogenous variables
fcst <- nixtla_client_forecast(df, h = 24)
#> Frequency chosen: h
head(fcst)
#>   unique_id                  ds  TimeGPT
#> 1        BE 2016-12-31 00:00:00 45.19045
#> 2        BE 2016-12-31 01:00:00 43.24445
#> 3        BE 2016-12-31 02:00:00 41.95839
#> 4        BE 2016-12-31 03:00:00 39.79649
#> 5        BE 2016-12-31 04:00:00 39.20454
#> 6        BE 2016-12-31 05:00:00 40.10878nixtlar includes a function to plot the historical data
and any output from nixtla_client_forecast,
nixtla_client_historic,
nixtla_client_anomaly_detection and
nixtla_client_cross_validation. If you have long series,
you can use max_insample_length to only plot the last N
historical values (the forecast will always be plotted in full).