Original Contribution
Inter-Rater, Intra-Rater, and Inter-Machine Reliability of Quantitative Ultrasound Measurements of the Patellar Tendon

https://doi.org/10.1016/j.ultrasmedbio.2012.12.001

Abstract

The use of ultrasound (US) to perform quantitative measurements of musculoskeletal tissues requires accurate and reliable measurements between investigators and ultrasound machines. The objective of this study was to evaluate inter-rater and intra-rater reliability of patellar tendon measurements between providers with different levels of US experience and inter-machine reliability of US machines. Sixteen subjects without a history of knee pain were evaluated with US examinations of the patellar tendon. Each tendon was scanned independently by two investigators using two different ultrasound machines. Tendon length and cross-sectional area (CSA) were obtained, and examiners were blinded to each other's results. Tendon length was measured using a validated system involving surface markers and calipers, and CSA was measured using each machine's measuring software. Intra-class correlation coefficients (ICCs) were used to determine reliability of measurements between observers, where ICC > 0.75 was considered good and ICC > 0.9 was considered excellent. Inter-rater reliability between sonographers was excellent, with an ICC of 0.90 to 0.92 for patellar tendon CSA and an ICC of 0.96 for tendon length. ICC for intra-rater reliability of tendon CSA was also generally excellent, with ICC between 0.87 and 0.96. Inter-machine reliability was excellent, with ICC of 0.91–0.98 for tendon CSA and 0.96–0.98 for tendon length. Bland–Altman plots were constructed to measure validity and demonstrated a mean difference between sonographers of 0.03 mm² for CSA measurements and 0.2 mm for tendon length. Using well-defined scanning protocols, a novice and an experienced musculoskeletal sonographer attained high levels of inter-rater agreement, with similarly excellent results for intra-rater and inter-machine reliability. To our knowledge, this study is the first to report inter-machine reliability in the setting of quantitative musculoskeletal ultrasound.
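
For readers who want to reproduce this kind of analysis, the two statistics named in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis: the text available here does not state which ICC form was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1). The function names, the "below good" label, the 1.96 multiplier for the limits of agreement, and the example values are likewise assumptions or hypothetical; only the 0.75 and 0.9 thresholds come from the abstract.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per subject, each row holding one value per rater
    (or per machine). The choice of ICC form is an assumption.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters or machines
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify(icc):
    """Apply the thresholds stated in the abstract (> 0.75 good, > 0.9 excellent)."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.75:
        return "good"
    return "below good"  # hypothetical label; the abstract names no lower category


def bland_altman(a, b):
    """Return the mean difference (bias) and conventional 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical tendon lengths in mm for two raters; NOT data from the study.
    rater_a = [45.1, 47.3, 43.8, 50.2, 46.0]
    rater_b = [45.4, 47.0, 44.1, 49.8, 46.3]
    icc = icc_2_1([list(pair) for pair in zip(rater_a, rater_b)])
    print(round(icc, 3), classify(icc))
    print(bland_altman(rater_a, rater_b))
```

The ICC summarizes how much of the total variance is attributable to differences between subjects rather than between raters or machines, whereas the Bland–Altman mean difference reports any systematic offset in the original measurement units, which is why the two are typically reported together.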

Introduction

The use of musculoskeletal ultrasound for evaluating soft tissue structures is increasing rapidly in both research and clinical settings, with advantages including high axial resolution, short examination time, real-time image capture, lack of ionizing radiation, wide availability, and relatively low cost. A wide range of ultrasound machines is available to researchers and clinicians, and as sonographic standards are established for the measurement of anatomical structures, it is critical that these measurements be comparable from one machine to another. In addition, ultrasound is frequently described as an operator-dependent imaging modality (Wakefield et al. 2005); therefore, ensuring the repeatability of measurements between sonographers is of high importance. Few studies have evaluated the inter-rater and intra-rater reliability of quantitative ultrasound-based measurements of the musculoskeletal system, and no studies have evaluated inter-machine reliability.

Issues regarding reliability and repeatability are not limited to musculoskeletal ultrasound, and the subject is of broad clinical significance in many applications of diagnostic ultrasound. Inter- and intra-rater reliability have been evaluated in other areas of ultrasound, in both research and clinical settings. In the research setting, with the use of high-frequency transducers (40 MHz) and a strict imaging protocol, an extremely high level of inter-rater and intra-rater reliability is possible. For example, when measuring murine colon wall thickness, mean inter-rater and intra-rater differences of 0.03 and 0.06 mm, respectively, have recently been reported (Abdelrahman et al. 2012). In the clinical setting, protocols may be less strict, imaging environments are often less controlled, and available transducers tend to be lower in frequency, especially for deeper anatomic structures. These factors can preclude such negligible variation. A systematic review evaluating reliability in measurements of the abdominal aorta diameter revealed significant variability in reliability, with several studies showing inter-rater reliability within the clinically acceptable level of 5 mm and other studies showing reliability outside this range (Beales et al. 2011). Therefore, uncertainty remains regarding the ability to minimize inter-rater and intra-rater variability, especially in the clinical arena.

The present study was designed to address these issues in the setting of musculoskeletal ultrasound, specifically to evaluate the reliability of a validated method of measuring patellar tendon dimensions between two sonographers with different levels of experience, and between two different ultrasound machines. The patellar tendon was chosen because of its clinical importance in tendinopathy, patella infera and patella alta, as well as its relatively straightforward anatomic course.

Study sample

A convenience sample of 16 healthy subjects without a prior history of knee pain participated in the study. The subjects were medical residents at a university hospital and underwent ultrasound scanning of their dominant knee as part of an educational activity. Seven men and nine women, 25–36 years old, were included in the study. One subject was excluded when significant patellar tendinopathy was discovered during the scanning session. The two sonographers were medical residents with minimal

Results

Inter-rater, intra-rater, and inter-machine reliability are shown in Table 1. Inter-rater reliability in measuring both tendon CSA and tendon length was excellent, with ICC between 0.90 and 0.96. Intra-rater reliability for tendon CSA was also generally excellent, with ICC between 0.87 and 0.96. Intra-rater reliability could not be calculated for tendon length measurements, because only one length measurement per tendon was obtained by each investigator. Inter-machine reliability was

Discussion

The operator dependence of ultrasound as an imaging modality has been highlighted as potentially problematic for both qualitative and quantitative anatomic measurements (Wakefield et al. 2005). Two discrete tasks are required to obtain ultrasound measurements, both of which require an operator's input: physically obtaining the ultrasound images, and measuring or grading the structure of interest on the acquired images. This procedure is notably different from other modern imaging modalities

Conclusions

An experienced and a novice sonographer attained high levels of inter-rater reliability when measuring the patellar tendon using strict scanning protocols. Inter-machine and intra-rater reliability were similarly excellent. To our knowledge, this study is the first to report inter-machine reliability in the setting of quantitative musculoskeletal ultrasound tendon measurements.

Acknowledgments

The authors thank Michael Z. Levy for his invaluable assistance with statistical support.
