Review Article
Quantile regression and restricted cubic splines are useful for exploring relationships between continuous variables

https://doi.org/10.1016/j.jclinepi.2008.05.015

Abstract

Objective

Ordinary least squares (OLS) regression, commonly called linear regression, is often used to assess, or adjust for, the relationship between a continuous independent variable and the mean of a continuous dependent variable, implicitly assuming a linear relationship between them. Linearity may not hold, however, and analyzing the mean of the dependent variable may not capture the full nature of such relationships. Our goal is to demonstrate how combined use of quantile regression and restricted cubic splines (RCS) can reveal the true nature and complexity of relationships between continuous variables.

Study Design and Setting

We provide a review of methodologic concepts, followed by two examples using real data sets. In the first example, we analyzed the relationship between cognition and disease duration in multiple sclerosis. In the second example, we analyzed the relationship between length of stay (LOS) and severity of illness in the intensive care unit (ICU).

Results

In both examples, quantile regression showed that the relationship between the variables of interest was heterogeneous. In the second example, RCS uncovered nonlinearity of the relationship between severity of illness and length of stay.

Conclusion

Together, quantile regression and RCS are a powerful combination for exploring relationships between continuous variables.

Introduction

In a well-designed study, appropriate statistical methods enhance the understanding of relationships between variables. Health research frequently evaluates relationships between continuous variables, such as the influence of age on health care costs. Making valid comparisons may require adjusting for variables that confound the relationship under study; for example, failing to adjust for age could produce erroneous results in an observational study of health care costs in smokers vs. nonsmokers. Even in randomized studies, unadjusted comparison between groups can be confounded by significant baseline differences with respect to continuous independent variables [1].

In these situations, researchers often use ordinary least squares regression (OLS, commonly referred to as linear regression) to assess, or adjust for, the relationship between the continuous independent variable and the continuous dependent variable. Generally, however, no single algebraic relationship accurately describes how independent variables are related to a continuous dependent variable [2]. OLS regression only describes how the conditional mean of a continuous dependent variable relates to the independent variables. Additionally, investigators usually assume a linear relationship between dependent variables and continuous independent variables; yet, linearity may or may not hold in health care research.

Two statistical methods tackle these issues: restricted cubic splines (RCS) and quantile regression. Cubic splines provide a way to represent nonlinear relationships for continuous independent variables. Quantile regression allows one to evaluate the relationship of independent variables across the full range of a continuous dependent variable rather than just its conditional mean. Although modern statistical packages make both of these methods broadly accessible, they are not widely used by researchers. One reason for this may be a lack of familiarity with, and appreciation for, these methods. A better understanding of these methods could extend their use, and improve our understanding of relationships in health research. We hope to make these methods more accessible by discussing and illustrating their use.
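As an illustration of how a restricted cubic spline represents a nonlinear relationship, the sketch below builds the spline terms for one continuous independent variable, using the truncated power form common in the regression modeling literature. It is a minimal sketch, not the authors' code: the function name `rcs_basis`, the knot locations, and the scaling by the squared distance between the outer knots are assumptions chosen for illustration. With k knots the variable is expanded into k - 1 terms (one linear, k - 2 nonlinear), and the fitted curve is constrained to be linear beyond the outermost knots.

```python
def rcs_basis(x, knots):
    """Return restricted cubic spline terms for a single value x.

    Hypothetical helper for illustration. With k knots it returns k - 1
    terms: x itself, followed by k - 2 nonlinear terms that are zero below
    the first knot and linear in x above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(v):
        # Truncated cubic: (v)_+^3
        return max(v, 0.0) ** 3

    denom = t[-1] - t[-2]          # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2    # assumed scaling; keeps terms on the scale of x

    terms = [x]
    for j in range(k - 2):
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / denom
            + pos3(x - t[-1]) * (t[-2] - t[j]) / denom
        )
        terms.append(term / scale)
    return terms


# Example with four illustrative knots (not taken from the article's data).
example_knots = [5.0, 12.0, 20.0, 30.0]
for value in (2.0, 10.0, 25.0, 40.0):
    print(value, rcs_basis(value, example_knots))
```

The resulting terms would be entered into a regression model in place of the single linear term; a test that the coefficients on the nonlinear terms are jointly zero then serves as a test of linearity.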

Regression for continuous variables

When the outcome being studied can be represented as a continuous variable, researchers often use OLS regression [3]. A Medline search for the terms “ordinary least squares regression,” “OLS regression,” or “linear regression” returned 4,541 hits for 2006, a number that had risen steadily from 627 hits in 1990. Approximately 3% of original articles in six high-profile journals used OLS regression analysis [4].

To explain key details for creating and interpreting an OLS regression model, we

Examples

To illustrate the combined use of quantile regression and RCS, we applied them to two real data sets. Data are presented as mean ± standard deviation (SD), or median (interquartile range [IQR]). P < 0.05 is considered significant. Analysis was done using Stata 10.0 (StataCorp, College Station, TX, USA).

Discussion

Relationships in health research should not be assumed to be homogeneous [17] or linear [18]. Yet, both of these assumptions are implicit in most OLS regression analyses. In analyzing a continuous dependent variable, OLS regression accurately describes the Y-X relationship only for individuals with values of Y near the mean of the conditional distribution of that variable; furthermore, it can produce misleading results if underlying assumptions of normality and homoscedasticity are violated.
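To make the idea of a heterogeneous relationship concrete, the following sketch fits a one-predictor quantile regression line by minimizing the check (pinball) loss. It is a minimal sketch under stated assumptions, not the procedure used in the article (which used Stata): the function names are hypothetical, the data are synthetic, and the brute-force search relies on the fact that, for a single predictor, a minimizer of the check loss can be found among the lines passing through two data points. That makes it practical only for small samples; statistical packages use linear programming instead.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Brute-force quantile regression of y on a single x.

    Hypothetical helper for illustration: tries every line through two
    data points and keeps the one with the smallest total check loss.
    Returns (intercept, slope).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(
            check_loss(y - (intercept + slope * x), tau)
            for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


# Synthetic, deterministic data (an assumption for illustration): the
# spread of y widens as x increases, so the effect of x differs by quantile.
pattern = [-1.0, -0.5, 0.0, 0.5, 1.0]
xs = [float(i + 1) for i in range(30)]
ys = [2.0 + 0.5 * x + x * pattern[i % 5] for i, x in enumerate(xs)]

for tau in (0.1, 0.5, 0.9):
    intercept, slope = fit_quantile_line(xs, ys, tau)
    print(f"tau={tau}: intercept={intercept:.3f}, slope={slope:.3f}")
```

In data like these, a single OLS slope summarizes only the change in the conditional mean, whereas the quantile-specific slopes show that the association with x is much steeper in the upper tail of y than in the lower tail.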
