Why consider using quantile regression in your research

Posted by Andrew Yim - Dec 15, 2020


In graduate school, we were taught about the nice properties of the coefficients estimated with the ordinary least squares (OLS) regression model: they are BLUE (best linear unbiased estimators). (Note: the ‘best’ in BLUE refers to the sampling distribution having the minimum variance, i.e., being most efficient.) If normality is additionally assumed for the disturbance term, then the estimated coefficients are also normally distributed, allowing hypotheses about them to be tested with t- and F-tests. What might not have been emphasized are the consequences when the normality assumption is violated.

Profitability (as scaled earnings) is among the most important inputs used for valuation. In a profitability forecasting setting, my co-authors and I show that median regression, as a special case of quantile regression (with 𝜏 = 0.5), produces more accurate forecasts than OLS regression does. Simulation and archival-data analyses indicate that the incremental forecasting accuracy is related to the tail-heaviness of the earnings distribution. As an external validation, the distributional shape analysis is applied to cash flow forecasting and yields the same conclusion (Tian, Yim, and Newton [2020]: Tail-Heaviness, Asymmetry, and Profitability Forecasting by Quantile Regression, forthcoming in Management Science). Recognizing quantile regression’s advantage, other accounting researchers have also used this estimation approach in their research (e.g., Easton et al. [2020]).

Even in an inference setting, disregarding the violation of the normality assumption can critically bias the conclusions of statistical tests, because t- and F-tests are not accurate when the OLS coefficient estimates are no longer normally distributed. A large sample size might not be an effective fix.

In practice, one often needs to deal with the violation of the normality assumption arising from heavy tails (see the discussion in this blog post: The dangerous disregard for fat tails in quantitative finance). A frequently forgotten truth is that with heavy tails, the sample median can be better than the sample mean as an estimator for the population mean, assuming a symmetric distribution so that the two coincide (see this publication in The American Statistician for a more systematic comparison).
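A small simulation makes this point concrete. The sketch below (a minimal illustration, not from the papers cited above) draws repeated samples from a symmetric, heavy-tailed Student's t distribution, whose population mean and median are both zero, and compares the mean squared error of the two estimators; the degrees of freedom and sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000

# Student's t with 2.5 degrees of freedom: symmetric around zero with a
# finite mean but heavy tails, so mean and median estimate the same target.
means = np.empty(reps)
medians = np.empty(reps)
for i in range(reps):
    x = rng.standard_t(df=2.5, size=n)
    means[i] = x.mean()
    medians[i] = np.median(x)

# With heavy tails, the sample median is typically the more precise
# estimator of the common population center (zero).
mse_mean = np.mean(means**2)
mse_median = np.mean(medians**2)
print(f"MSE of sample mean:   {mse_mean:.4f}")
print(f"MSE of sample median: {mse_median:.4f}")
```

With lighter tails (e.g., normal errors) the ranking flips, which mirrors the efficiency trade-off discussed below.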

Prior research has proposed median regression as an alternative to OLS to avoid misleading conclusions that might result from the latter when the distribution of the error term has heavy tails (Harden and Desmarais 2011). For large samples, OLS and median regression estimates are often quite similar (see the example on p. 16 of these slides). If the normality assumption is met, the cost of using median regression is reduced efficiency: the less efficient estimates set a higher hurdle for rejecting a null hypothesis. However, if the error term has heavy tails, median regression has the benefits of producing more efficient estimates and more robust conclusions that are less affected by outliers (see the discussions here and here).

Tian, Yim, and Newton (2020) have only scratched the surface of quantile regression’s usefulness by focusing on median regression as its special case. Quantile regression in general can produce optimal estimates/forecasts under asymmetric loss functions (when 𝜏 ≠ 0.5). Prior research has argued that financial analysts have an asymmetric loss function (Clatworthy, Peel, and Pope [2012]: Are Analysts’ Loss Functions Asymmetric?, Journal of Forecasting). If they do, would they find formulating their forecasts based on quantile regression with 𝜏 ≠ 0.5 more aligned with their forecasting objective? What is the implied 𝜏 that can be inferred from analyst earnings forecasts? Are the implied 𝜏’s similar across different types of analyst forecasts (cash flow forecasts, revenue forecasts, etc.)? These are interesting questions left for future research to answer.