This research develops new methods to forecast an economic variable from a (very) large collection of potentially relevant variables. For example, it predicts unemployment in the Netherlands from the popularity of Google search queries such as "werkloosheidsuitkering" (unemployment benefit) and "vacatures" (vacancies).

Traditional economic models consider the effects of only a few variables at a time. Nowadays, much larger datasets are available that potentially contain new information to exploit. A simple idea would be to throw all the data into one model, but that leaves a lot of room for error. In economics especially, variables such as unemployment are known to display strongly trending behaviour over time: if unemployment was very high in January, it will likely be high in February as well. This kind of behaviour requires careful treatment in statistical models.

Therefore, different techniques from the statistics literature were combined into an estimation method that automatically removes the irrelevant variables from the model, while at the same time respecting the unique (trending) characteristics of the variables that remain.
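The two ingredients described above can be illustrated with a toy sketch. This is not the estimator developed in the thesis, only a minimal stand-in: the trending target series is first-differenced to remove the trend, and a lasso-type penalty (implemented here as a small hand-rolled coordinate-descent routine, `lasso_cd`) shrinks the coefficients of irrelevant predictors exactly to zero. All names and data in the example are hypothetical.

```python
import random

random.seed(0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso: minimises 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            # Soft-thresholding: small correlations are set exactly to zero.
            if rho > lam:
                beta[j] = (rho - lam) / col_sq[j]
            elif rho < -lam:
                beta[j] = (rho + lam) / col_sq[j]
            else:
                beta[j] = 0.0
    return beta

# Synthetic example: 10 candidate predictors, only the first is relevant.
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# The target trends upward over time on top of the signal from predictor 0.
level = [0.05 * t + 2.0 * X[t][0] + random.gauss(0, 0.1) for t in range(n)]
# First-difference target and predictors so the trend does not distort selection.
dy = [level[t] - level[t - 1] for t in range(1, n)]
dX = [[X[t][j] - X[t - 1][j] for j in range(p)] for t in range(1, n)]

beta = lasso_cd(dX, dy, lam=20.0)
selected = [j for j, b in enumerate(beta) if abs(b) > 1e-6]
print("selected predictors:", selected)
```

Running this keeps only predictor 0 in the model and drives the other nine coefficients to exactly zero, which is the automatic variable removal the summary refers to; the thesis method additionally handles the trending (nonstationary) behaviour in a theoretically grounded way rather than by simple differencing.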
Award date: 27 Mar 2020
Publication status: Published - 2020
- big data
- high-dimensional statistics
- time series