Review Article
Quantile regression and restricted cubic splines are useful for exploring relationships between continuous variables
Introduction
In a well-designed study, appropriate statistical methods enhance the understanding of relationships between variables. Health research frequently evaluates relationships between continuous variables, such as the influence of age on health care costs. Making valid comparisons may require adjusting for variables that confound the relationship under study; for example, failing to adjust for age could produce erroneous results in an observational study of health care costs in smokers vs. nonsmokers. Even in randomized studies, unadjusted comparisons between groups can be confounded by substantial baseline differences in continuous independent variables [1].
In these situations, researchers often use ordinary least squares regression (OLS, commonly referred to as linear regression) to assess, or adjust for, the relationship between a continuous independent variable and a continuous dependent variable. Generally, however, no single algebraic relationship accurately describes how independent variables are related to a continuous dependent variable [2]. OLS regression describes only how the conditional mean of a continuous dependent variable relates to the independent variables. Investigators also usually assume a linear relationship between the dependent variable and continuous independent variables, yet linearity does not always hold in health care research.
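To make the "conditional mean" point concrete, the short Python sketch below (Python is chosen only for illustration; the analyses in this article used Stata) compares the two fitting criteria with no predictors at all. The squared-error criterion behind OLS is minimized by the sample mean, whereas the asymmetric "check" (pinball) loss used in quantile regression is minimized by a sample quantile. The cost values are hypothetical, and the candidate search in `best_constant` is a simplification for this toy case, not a general optimizer.

```python
# Sketch: which single number does each fitting criterion pick?
# Squared error (the OLS criterion) vs. the check loss of quantile regression.


def squared_loss(values, c):
    """Sum of squared deviations from the constant c (the OLS criterion)."""
    return sum((v - c) ** 2 for v in values)


def check_loss(values, c, tau):
    """Sum of check (pinball) losses rho_tau(v - c) at quantile level tau."""
    total = 0.0
    for v in values:
        u = v - c
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total


def best_constant(values, loss):
    """Return the candidate constant with the smallest loss.

    Candidates are the observed values plus their mean: the mean minimizes
    squared error, and the check loss, being piecewise linear, always has a
    minimizer at an observed value.
    """
    candidates = list(values) + [sum(values) / len(values)]
    return min(candidates, key=lambda c: loss(values, c))


# Hypothetical, right-skewed "costs" (illustrative numbers only).
costs = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 40.0]
mean_fit = best_constant(costs, squared_loss)
median_fit = best_constant(costs, lambda v, c: check_loss(v, c, 0.5))
p75_fit = best_constant(costs, lambda v, c: check_loss(v, c, 0.75))
print(round(mean_fit, 2), median_fit, p75_fit)  # 8.14 3.0 5.0
```

With a right-skewed outcome such as health care costs, one extreme value pulls the mean well above the median, so a mean-only summary can misrepresent most individuals.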
Two statistical methods address these issues: restricted cubic splines (RCS) and quantile regression. Restricted cubic splines provide a way to represent nonlinear relationships for continuous independent variables. Quantile regression allows one to evaluate the relationship of independent variables across the full range of a continuous dependent variable rather than just its conditional mean. Although modern statistical packages make both of these methods broadly accessible, they are not widely used by researchers. One reason may be a lack of familiarity with, and appreciation for, these methods. A better understanding of these methods could extend their use and improve our understanding of relationships in health research. We hope to make these methods more accessible by discussing and illustrating their use.
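As a sketch of how an RCS enters a regression model, the function below builds the restricted cubic spline columns for one predictor using the widely used truncated-power parameterization associated with Harrell (Stata's `mkspline` with the `cubic` option creates variables of this kind). With k knots it returns k - 1 columns: x itself plus k - 2 nonlinear terms. Between knots the curve is cubic; beyond the outer knots the cubic and quadratic terms cancel, so the curve is linear there. The knot values and the division by (t_k - t_1)^2 are assumptions for illustration; implementations differ in scaling, which changes the coefficients but not the fitted curve.

```python
def _pos_cube(u):
    """Truncated cube (u)_+^3: zero for u <= 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Return [x, S_1(x), ..., S_{k-2}(x)] for knots t_1 < ... < t_k.

    S_j(x) = (x - t_j)_+^3
             - (x - t_{k-1})_+^3 * (t_k - t_j) / (t_k - t_{k-1})
             + (x - t_k)_+^3     * (t_{k-1} - t_j) / (t_k - t_{k-1}),
    divided by (t_k - t_1)^2 to keep the terms on a moderate scale.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")
    scale = (t[-1] - t[0]) ** 2
    span = t[-1] - t[-2]
    row = [float(x)]
    for j in range(k - 2):
        s = (_pos_cube(x - t[j])
             - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / span
             + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / span)
        row.append(s / scale)
    return row


# Hypothetical knots, e.g. at chosen percentiles of age.
knots = [30.0, 50.0, 70.0]
for age in (20.0, 40.0, 60.0, 70.0, 80.0, 90.0):
    print(age, rcs_basis(age, knots))
```

Below the first knot the spline term is zero, and from 70 upward it rises by the same amount per year (15 units per 10 years with these knots), which is the linearity constraint at work.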
Regression for continuous variables
When the outcome being studied can be represented as a continuous variable, researchers often use OLS regression [3]. A Medline search for the terms "ordinary least squares regression," "OLS regression," or "linear regression" returned 4,541 hits for 2006, a number that had increased continuously from 627 hits in 1990. Approximately 3% of original articles in six high-profile journals used OLS regression analysis [4].
To explain key details for creating and interpreting an OLS regression model, we…
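As a minimal illustration of creating and interpreting a one-predictor OLS model (a Python sketch, not the authors' Stata workflow), the closed-form estimates are slope = S_xy / S_xx and intercept = ȳ - slope · x̄. The slope is the estimated change in the mean of y per one-unit increase in x. Plotting residuals against x is a standard check of the linearity assumption; the second, hypothetical data set shows the U-shaped residual pattern that a curved relationship leaves behind. The helper names `ols_fit` and `residuals` are illustrative.

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx  # change in the MEAN of y per unit of x
    return ybar - slope * xbar, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values; plot against x to check linearity."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data lying exactly on y = 2 + 0.5 * x.
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
b0, b1 = ols_fit(x, y)
print(b0, b1)  # 2.0 0.5

# Hypothetical curved data y = x^2: OLS finds slope 0, residuals form a U.
xc = [-2.0, -1.0, 0.0, 1.0, 2.0]
yc = [4.0, 1.0, 0.0, 1.0, 4.0]
c0, c1 = ols_fit(xc, yc)
print(c0, c1, residuals(xc, yc, c0, c1))  # 2.0 0.0 [2.0, -1.0, -2.0, -1.0, 2.0]
```

For the curved data, OLS reports a slope of zero even though y depends strongly on x, which is the kind of failure of the linearity assumption that restricted cubic splines are meant to address.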
Examples
To illustrate the combined use of quantile regression and RCS, we applied them to two real data sets. Data are presented as mean ± standard deviation (SD) or median (interquartile range [IQR]). P < 0.05 was considered statistically significant. Analyses were performed with Stata 10.0 (StataCorp, College Station, TX, USA).
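The combined approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Stata code used for the examples: the predictor is expanded into a restricted cubic spline basis (truncated-power form, hypothetical knots at 30, 50 and 70), and quantile regression is fitted to that basis. Because quantile regression is a linear program with an optimal fit that passes exactly through p observations (p = number of coefficients), a tiny data set can be solved exactly by trying every p-subset. The helpers `rcs_row`, `solve` and `quantile_fit` are hypothetical and only practical for very small n; a real analysis would use a dedicated solver such as Stata's `qreg` or statsmodels' `QuantReg`.

```python
from itertools import combinations


def rcs_row(x, knots):
    """Design row [1, x, S_1(x), ..., S_{k-2}(x)] for a restricted cubic spline."""
    t = sorted(knots)

    def cube(u):
        # Truncated cube (u)_+^3.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    row = [1.0, float(x)]
    for j in range(len(t) - 2):
        s = (cube(x - t[j])
             - cube(x - t[-2]) * (t[-1] - t[j]) / span
             + cube(x - t[-1]) * (t[-2] - t[j]) / span)
        row.append(s / scale)
    return row


def solve(a, b):
    """Solve the square system a @ beta = b (Gauss-Jordan); None if singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def check_loss(resid, tau):
    """Quantile regression criterion: sum of rho_tau(residual)."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in resid)


def quantile_fit(rows, y, tau):
    """Exact quantile regression by enumerating p-subsets (tiny n only)."""
    p = len(rows[0])
    best, best_loss = None, float("inf")
    for idx in combinations(range(len(y)), p):
        beta = solve([rows[i] for i in idx], [y[i] for i in idx])
        if beta is None:  # these p observations do not determine a unique fit
            continue
        resid = [yi - sum(b * v for b, v in zip(beta, row))
                 for row, yi in zip(rows, y)]
        loss = check_loss(resid, tau)
        if loss < best_loss:
            best, best_loss = beta, loss
    return best


# Hypothetical data: three observations per x, at a curve value and +/- 1.
knots = [30.0, 50.0, 70.0]
xs, ys = [], []
for v in (20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0):
    r = rcs_row(v, knots)
    center = 10.0 + 0.5 * r[1] + 2.0 * r[2]  # generating curve (hypothetical)
    for d in (-1.0, 0.0, 1.0):
        xs.append(v)
        ys.append(center + d)
rows = [rcs_row(v, knots) for v in xs]
beta_med = quantile_fit(rows, ys, 0.5)
beta_p90 = quantile_fit(rows, ys, 0.9)
print([round(b, 3) for b in beta_med], [round(b, 3) for b in beta_p90])
```

With these hypothetical data the median fit recovers the generating curve (coefficients of approximately 10, 0.5 and 2), and the 90th-percentile fit is the same curve shifted up by one unit.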
Discussion
Relationships in health research should not be assumed to be homogeneous [17] or linear [18]. Yet, both of these assumptions are implicit in most OLS regression analyses. In analyzing a continuous dependent variable, OLS regression accurately describes the Y–X relationship only for individuals with values of Y near the mean of the conditional distribution of that variable; furthermore, it can produce misleading results if underlying assumptions of normality and homoscedasticity are violated.
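The consequence of heteroscedasticity can be seen in a small, hypothetical data set in which the spread of y grows with x. OLS reports one slope for the conditional mean, while quantile regression estimates a separate slope for each percentile. The Python sketch below (for illustration only) solves one-predictor quantile regression exactly by trying every line through two observations, which is valid because some optimal quantile regression line always passes through two of the observations when x takes at least two distinct values; it is a teaching device, not an efficient algorithm, and `ols_slope` and `quantile_line` are hypothetical helper names.

```python
from itertools import combinations


def ols_slope(x, y):
    """OLS slope S_xy / S_xx (describes the conditional mean only)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def quantile_line(x, y, tau):
    """(intercept, slope) minimizing the check loss at quantile level tau."""
    best, best_loss = None, float("inf")
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with the same x do not define a line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = 0.0
        for a, b in zip(x, y):
            u = b - (intercept + slope * a)
            loss += u * (tau - (1.0 if u < 0 else 0.0))
        if loss < best_loss:
            best, best_loss = (intercept, slope), loss
    return best


# Hypothetical data: at each x, y = x + d * x for d in {-1, -0.5, 0, 0.5, 1},
# so the mean of y is x but the spread of y widens as x grows.
x, y = [], []
for xv in (1.0, 2.0, 3.0, 4.0):
    for d in (-1.0, -0.5, 0.0, 0.5, 1.0):
        x.append(xv)
        y.append(xv + d * xv)

print("OLS slope:", ols_slope(x, y))
for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_line(x, y, tau))
```

Here OLS gives a slope of 1.0, whereas the quantile fits give slopes of 0.0, 1.0 and 2.0 at the 10th, 50th and 90th percentiles: the single OLS slope describes only the middle of the distribution.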
References (18)
- et al. High-dose furosemide for established ARF: a prospective, randomized, double-blind, placebo-controlled, multicenter trial. Am J Kidney Dis (2004).
- et al. Impact of body mass index on outcomes following critical care. Chest (2003).
- et al. A gentle introduction to quantile regression for ecologists. Front Ecol Environ (2003).
- et al. Applied regression analysis and multivariable methods (1998).
- et al. Medicine residents' understanding of the biostatistics and results in the medical literature. JAMA (2007).
- Regression diagnostics (1991).
- et al. The risk of determining risk with multivariable methods. Ann Intern Med (1993).
- et al. Quantile regression: an introduction. J Econ Perspect (2001).
- et al. Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc (1999).