Respiratory research

Performance of popular pulse oximeters compared with simultaneous arterial oxygen saturation or clinical-grade pulse oximetry: a cross-sectional validation study in intensive care patients

Abstract

Objectives To evaluate the performance of direct-to-consumer pulse oximeters under clinical conditions, with arterial blood gas measurement (SaO2) as the reference standard.

Design Cross-sectional validation study.

Setting Intensive care.

Participants Adult patients requiring SaO2-monitoring.

Interventions The studied oximeters are top sellers in Europe/USA (AFAC FS10D, AGPTEK FS10C, ANAPULSE ANP 100, Cocobear, Contec CMS50D1, HYLOGY MD-H37, Mommed YM101, PRCMISEMED F4PRO, PULOX PO-200 and Zacurate Pro Series 500 DL). Directly after collection of an arterial blood sample for SaO2, we obtained pulse oximeter readings (SpO2). SpO2 readings were taken in rotating order, blinded to SaO2, and completed <10 min after blood sample collection.

Outcome measures Mean bias (SpO2–SaO2), root mean square difference (ARMS), mean absolute error (MAE) and accuracy in identifying hypoxaemia (SaO2 ≤90%). As a clinical index test, we included a hospital-grade SpO2 monitor (Philips).

Results In 35 consecutive patients, we obtained 2258 SpO2 readings and 234 SaO2 samples. Mean bias ranged from −0.6 to −4.8. None of the pulse oximeters met ARMS ≤3%, the requirement set by International Organisation for Standardisation (ISO) standards and required for Food and Drug Administration (FDA) 510(k) clearance. The MAE ranged from 2.3 to 5.1, and five of the ten pulse oximeters met the MAE threshold of ≤3%. For hypoxaemia, negative predictive values were 98%–99% and positive predictive values ranged from 11% to 30%. The highest accuracy (95% CI) was found for Contec CMS50D1, 91% (86%–94%), and Zacurate Pro Series 500 DL, 90% (85%–94%). The hospital-grade SpO2 monitor had an ARMS of 3.0% and MAE of 1.9, and an accuracy of 95% (91%–97%).

Conclusion Top-selling, direct-to-consumer pulse oximeters can accurately rule out hypoxaemia, but do not meet the ISO standards required for FDA clearance.

Key messages

What is the key question?

  • How well do direct-to-consumer pulse oximeters perform in a clinical setting?

What is the bottom line?

  • The tested pulse oximeters do not meet the required standards, but can be used safely to rule out (but not rule in) hypoxaemia.

Why read on?

  • Pulse oximeters are widely used to make treatment or referral decisions, so understanding their limitations is essential.

Background

Pulse oximetry has become an indispensable, low-cost, non-invasive diagnostic tool for assessing a patient’s oxygen saturation. Typically, the device has a clip that is placed on a patient’s finger to obtain the peripheral arterial oxygen saturation (SpO2), which serves as a proxy for tissue oxygenation.1 Although most direct-to-consumer oximeters are not intended for use in clinical settings, pulse oximeters have come to play a pivotal role in routine medical care and are an essential bedside tool for making treatment and/or referral decisions in community-based healthcare settings. In the current COVID-19 pandemic, pulse oximeters have become even more indispensable, as hypoxaemia is a common diagnostic finding and reports indicate that patients with COVID-19 may have few symptoms relative to the degree of hypoxaemia (termed ‘silent hypoxaemia’).2 Given this risk, the WHO recommends home oximetry monitoring for patients with COVID-19 who have risk factors for progression to severe disease.3 Considering how much these devices guide medical decision-making, it is remarkable how little is known about their diagnostic accuracy when used under clinical conditions in actual patients.4 We therefore aimed to evaluate whether popular direct-to-consumer fingertip pulse oximeters meet the accuracy standards set by regulatory bodies when used under real-world conditions.

Methods

We reported this diagnostic accuracy study in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 statement.5

Study design

We enrolled consecutive patients aged 18 years or older who were admitted to the intensive care unit (ICU) of a large community-based hospital (Flevoziekenhuis, Almere, the Netherlands). Eligible patients had a radial artery catheter for arterial blood oxygen saturation (SaO2) assessment as part of routine medical care. We excluded patients without a clinical indication for arterial access, those with known inherited forms of abnormal haemoglobin, and those who were rapidly deteriorating due to acute haemodynamic compromise, in whom the study measurements could hinder medical interventions and thereby compromise patient safety.

Study procedures

Intensive Care personnel notified the site investigator (LB) of potentially eligible patients. The site investigator enrolled each patient after obtaining verbal and written consent from the patient or his/her legal representative. During office hours, the Intensive Care personnel notified the site investigator when an arterial blood gas sample was about to be drawn. The site investigator applied the pulse oximeters directly after the arterial blood gas sample (SaO2) was obtained. To reduce detection bias, the devices were applied in rotating order: the device sequence was fixed, but each consecutive sample started with a different device. As each of the 10 SpO2 readings took 30 s, each measurement round lasted at least 300 s. The Intensive Care personnel (SaO2) and the site investigator (SpO2 readings) recorded their findings on separate digital forms and were blinded to each other’s findings.
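The rotating order can be sketched as a cyclic shift of a fixed device list. The protocol does not state how far the starting device moved between samples; the sketch below assumes it advanced by one position per consecutive blood gas sample and that readings followed each other back to back, 30 s apart. Under those assumptions, each device occupies every position exactly once over 10 consecutive samples, so no device is systematically read closest to the blood draw. The function names are hypothetical.

```python
"""Sketch of a rotating (cyclic) device order for sequential SpO2 readings.

Assumptions (not from the study protocol): the starting device advances by
one position per consecutive blood gas sample, and each reading takes exactly
30 s with no gap between devices.
"""

DEVICES = [
    "AFAC FS10D", "AGPTEK FS10C", "ANAPULSE ANP 100", "Cocobear",
    "Contec CMS50D1", "HYLOGY MD-H37", "Mommed YM101", "PRCMISEMED F4PRO",
    "PULOX PO-200", "Zacurate Pro Series 500 DL",
]
READING_SECONDS = 30


def device_order(sample_index: int, devices: list[str] = DEVICES) -> list[str]:
    """Return the fixed sequence rotated so that sample k starts at device k mod n."""
    start = sample_index % len(devices)
    return devices[start:] + devices[:start]


def reading_offsets(sample_index: int) -> dict[str, int]:
    """Seconds after the arterial draw at which each device's value is read."""
    order = device_order(sample_index)
    return {name: (pos + 1) * READING_SECONDS for pos, name in enumerate(order)}
```

For example, device_order(1) starts with AGPTEK FS10C, and in every round the last device is read 300 s after the draw, which matches the minimum round length stated above.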

Index test and devices

We evaluated 10 oximeters, each of which ranked among the 10 most purchased pulse oximeters on Amazon in at least two of the following countries: USA, UK, Germany, Italy or France. Amazon was chosen because of its dominance of the e-commerce market. In alphabetical order, these oximeters were: AFAC FS10D, AGPTEK FS10C, ANAPULSE ANP 100, Cocobear, Contec CMS50D1, HYLOGY MD-H37, Mommed YM101, PRCMISEMED F4PRO, PULOX PO-200 and Zacurate Pro Series 500 DL. These pulse oximeters cost between €20 and €50 each, and all claimed to meet International Organisation for Standardisation (ISO) standards (see ‘Outcomes of interest’ for specifics). As a clinical index test, we also included a hospital-grade pulse oximeter (Philips M1191BL sensor glove, Philips, The Netherlands), which was the standard of care for continuous SpO2 monitoring at the study site; it meets ISO standards and has received 510(k) clearance from the Food and Drug Administration (FDA) (clearance number: K062455). For each device, we recorded the SpO2 value shown on the display 30 s after placement on the patient’s fingertip, using the same fingertip for every oximeter. When no result was displayed 30 s after placement, the reading was documented as invalid.

Reference standard

The reference standard was the arterial oxygen saturation (SaO2) of arterial blood gas samples collected as part of routine intensive care, measured with a point-of-care blood gas analyser at the study site (ABL90 Flex Plus, Radiometer Medical ApS, Brønshøj, Denmark; calibrated as per regulatory standards).

Outcomes of interest

We formulated the following outcome measures: mean bias, root mean square difference (ARMS), mean absolute error (MAE) and diagnostic accuracy for hypoxaemia, defined as SaO2 ≤90%. Bias was calculated as SpO2–SaO2. ARMS and MAE are derived from calculations involving mean bias and precision (SD of bias). Because ARMS squares each difference, outliers have a disproportionate effect on it; we therefore also assessed the MAE, which is more robust to outliers. The formulas used to calculate these outcomes are provided in the supplement. We evaluated the accuracy of the pulse oximeters against the standard defined by the International Organisation for Standardisation in ISO 80601-2-61:2017, which supersedes the ISO 80601-2-61:2011 standard referenced by the US FDA in its 510(k) Premarket Notification Submissions guidance for pulse oximeters. This standard considers an ARMS of ≤3% in the SaO2 range of 70%–100% acceptable.6 7
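The supplement with the authors’ formulas is not reproduced here. As a minimal sketch, assuming the commonly used definitions (bias as SpO2 − SaO2 per paired reading, ARMS as the square root of the mean squared bias, MAE as the mean absolute bias), the three metrics can be computed as follows. The function name agreement_metrics is hypothetical, and this plain version ignores the clustering of readings within patients.

```python
import math
from statistics import fmean, pstdev


def agreement_metrics(spo2: list[float], sao2: list[float]) -> dict[str, float]:
    """Mean bias, precision, ARMS and MAE for paired SpO2/SaO2 readings (%)."""
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("need equal-length, non-empty paired readings")
    diffs = [s - a for s, a in zip(spo2, sao2)]   # bias per pair: SpO2 - SaO2
    mean_bias = fmean(diffs)
    precision = pstdev(diffs)                      # population SD of the bias
    arms = math.sqrt(fmean(d * d for d in diffs))  # root mean square difference
    mae = fmean(abs(d) for d in diffs)             # mean absolute error
    return {"mean_bias": mean_bias, "precision": precision, "arms": arms, "mae": mae}
```

With the population SD, ARMS² equals mean bias² plus SD², which is one way in which ARMS follows from mean bias and precision. Because each difference is squared, a single large discrepancy raises ARMS more than MAE.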

Data collection and sample size

We collected data on SpO2, SaO2, heart rate, systolic and diastolic blood pressure, sex, age, skin type (Fitzpatrick classification scale, as assessed by the site investigator), vasopressor use (ie, noradrenaline dose (mg/h)), body temperature (°C) and hand temperature to touch (as assessed by the site investigator). We followed the FDA advice for sample size determination, which states that at least 200 measurements per pulse oximeter should be obtained from at least 10 subjects, of whom at least two subjects or 15% of the subject pool should have a dark skin type (Fitzpatrick IV–VI).7
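The FDA sample size advice can be expressed as a simple check. This is a sketch with hypothetical names; it reads ‘at least two or 15%’ as whichever is larger, which is an assumption about the guidance wording, and it rounds 15% of the subject count up using integer arithmetic to avoid floating-point surprises.

```python
from dataclasses import dataclass


@dataclass
class SampleSummary:
    n_paired_readings: int  # valid SpO2/SaO2 pairs for one pulse oximeter
    n_subjects: int
    n_dark_skin: int        # subjects with Fitzpatrick skin type IV-VI


def required_dark_skin(n_subjects: int, minimum: int = 2, percent: int = 15) -> int:
    """Larger of `minimum` and `percent`% of subjects, rounded up (assumed reading)."""
    return max(minimum, (percent * n_subjects + 99) // 100)


def meets_sample_size(s: SampleSummary, min_readings: int = 200, min_subjects: int = 10) -> bool:
    """True if the sample satisfies all three sample size conditions."""
    return (
        s.n_paired_readings >= min_readings
        and s.n_subjects >= min_subjects
        and s.n_dark_skin >= required_dark_skin(s.n_subjects)
    )
```

For 35 subjects the check requires at least six with a dark skin type. The dark-skin counts used when trying this function are hypothetical; the actual distribution is reported in table 1.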

Statistical analysis

We calculated mean bias (SpO2–SaO2) and its SD, and subsequently calculated ARMS and MAE using the formulas described in the supplemental data. We created Bland-Altman plots for each pulse oximeter to display its bias (SpO2–SaO2) relative to the reference standard (SaO2), with a zero line and upper and lower limits of agreement (±1.96 SD). To visualise the accuracy standards required by the regulatory bodies, ±3% lines are also displayed. As multiple observations were obtained per individual, we calculated the SD from the within-subject variance (σ²) and the between-subject variance (σμ²). Because ARMS is easily affected by outliers, we complemented it with MAE and with a sensitivity analysis that restricted the sample to measurements within 1.96 SD of each pulse oximeter’s mean and recalculated ARMS and MAE on this subset. This sensitivity analysis is analogous to discarding extreme readings in routine clinical care when a reading does not match the patient’s apparent clinical state. We also assessed the diagnostic performance (sensitivity, specificity, predictive values and accuracy) of the pulse oximeters in detecting hypoxaemia, defined as SaO2 ≤90%. Finally, we evaluated factors associated with poor performance of each pulse oximeter. We used bias as a continuous measure of performance, and used logistic regression models that included relevant patient and pulse oximeter characteristics. We used SPSS (IBM SPSS, V.26.0, IBM, Armonk, USA), R software (R V.3.6.1, The R Foundation for Statistical Computing) and MedCalc Statistical Software V.18.5 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018) to conduct the analyses. We assessed statistical significance at the 0.05 level in all analyses.
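The text does not give the exact formula for combining σ² and σμ². One established approach, assumed here, is the Bland-Altman method for repeated measurements in which the true value varies: a one-way analysis of variance with patient as the grouping factor gives the within-subject variance (mean square within) and the between-subject variance, and the total SD is the square root of their sum. The helper name is hypothetical, and the authors’ SPSS/R/MedCalc implementation may differ.

```python
import math
from statistics import fmean


def repeated_measures_loa(diffs_by_subject: list[list[float]], z: float = 1.96) -> dict[str, float]:
    """Limits of agreement for several SpO2 - SaO2 differences per patient.

    Variance components come from a one-way ANOVA with patient as the group.
    """
    groups = [g for g in diffs_by_subject if g]
    n_subjects = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if n_subjects < 2 or n_total <= n_subjects:
        raise ValueError("need >= 2 patients and at least one repeated reading")

    grand_mean = fmean([d for g in groups for d in g])
    subject_means = [fmean(g) for g in groups]

    ss_between = sum(m * (mu - grand_mean) ** 2 for m, mu in zip(sizes, subject_means))
    ss_within = sum((d - mu) ** 2 for g, mu in zip(groups, subject_means) for d in g)
    ms_between = ss_between / (n_subjects - 1)
    ms_within = ss_within / (n_total - n_subjects)

    # Effective number of readings per patient (handles unequal group sizes).
    k = (n_total ** 2 - sum(m * m for m in sizes)) / ((n_subjects - 1) * n_total)
    var_within = ms_within                                  # sigma^2
    var_between = max(0.0, (ms_between - ms_within) / k)    # sigma_mu^2, truncated at 0
    sd = math.sqrt(var_within + var_between)
    return {
        "mean_bias": grand_mean,
        "sd": sd,
        "loa_lower": grand_mean - z * sd,
        "loa_upper": grand_mean + z * sd,
    }
```

Treating all readings as independent would ignore the between-patient component; in the toy input [[0, 2], [4, 6]] the repeated-measures SD is 3, whereas the naive sample SD of the four values is about 2.58.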

Results

Baseline characteristics

In July and August 2020, we enrolled 35 consecutive patients (median age 69 years; 40% female). Table 1 displays the baseline characteristics. Patients were primarily admitted for respiratory failure due to COVID-19 or other pulmonary diseases (such as chronic obstructive pulmonary disease). Other diagnoses were septic shock, pulmonary embolism, arterial thrombosis of the extremities, severe hyperglycaemia and attempted suicide. Almost half of the patients required vasopressors, and all patients required supplemental oxygen.

Table 1
Patient and pulse oximeter characteristics

Blood gas samples and pulse oximeter readings

We obtained a total of 234 arterial blood gas samples, with SaO2 ranging from 85.6% to 99.8%; the distribution is illustrated in figure 1. Of these 234 SaO2 samples, 12 (5.1%) indicated hypoxaemia (SaO2 ≤90%). Each SaO2 sample was followed by 10 SpO2 measurements (one from each pulse oximeter), giving a total of 2340 SpO2 measurements. Eighty-two (3.5%) SpO2 measurements were invalid because the pulse oximeter did not display a result after 30 s (see also table 1), leaving 2258 interpretable SpO2 measurements, all obtained within 10 min (range 0–540 s) of acquiring the reference standard blood sample.

Figure 1

Distribution of arterial blood gas saturation (SaO2) among obtained samples.

Test results of pulse oximeters

Table 2 shows the test characteristics of the pulse oximeters, which are displayed graphically in figure 2 and online supplemental figure S1. All pulse oximeters had a negative mean bias (range −0.6 to −4.8), meaning that SpO2 was on average lower than SaO2. None of the tested pulse oximeters met the ARMS ≤3% accuracy requirement. Using MAE, which is less affected by outliers, five oximeters performed within a ≤3% margin of the reference standard. When extreme outliers (>1.96 SD) were excluded, the performance of Contec CMS50D1, Mommed YM101, PULOX PO-200 and Zacurate Pro Series 500 DL was within the ARMS ≤3% limit, and all but two oximeters (ANAPULSE ANP 100 and Cocobear) met the MAE ≤3% threshold.
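The outlier-restricted sensitivity analysis can be sketched as follows, assuming for simplicity that the 1.96 SD window uses the ordinary sample SD of one device’s bias values rather than the repeated-measures SD. The function name is hypothetical, and the toy data (one −12 reading among nine −1 readings) are invented purely to show why a single extreme reading inflates ARMS much more than MAE.

```python
import math
from statistics import fmean, stdev


def trimmed_agreement(diffs: list[float], z: float = 1.96) -> dict[str, float]:
    """ARMS and MAE before and after dropping biases beyond z SD of the mean."""
    centre = fmean(diffs)
    spread = stdev(diffs)  # assumption: plain sample SD, not the repeated-measures SD
    kept = [d for d in diffs if abs(d - centre) <= z * spread]
    return {
        "n_dropped": len(diffs) - len(kept),
        "arms_all": math.sqrt(fmean(d * d for d in diffs)),
        "mae_all": fmean(abs(d) for d in diffs),
        "arms_trimmed": math.sqrt(fmean(d * d for d in kept)),
        "mae_trimmed": fmean(abs(d) for d in kept),
    }
```

For the toy data, the single outlier raises ARMS from 1.0 to about 3.9 but MAE only from 1.0 to 2.1, mirroring the pattern in which more devices met the MAE threshold than the ARMS threshold.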

Figure 2

Bland-Altman plots of the bias relative to SaO2 for the two pulse oximeters with the smallest mean bias (Contec CMS50D1, −0.6; Zacurate Pro Series 500 DL, −1.4). Each plot displays the bias (SpO2–SaO2) relative to the reference standard (SaO2), with a zero line and upper and lower limits of agreement (±1.96 SD); ±3% lines visualise the accuracy standards required by the regulatory bodies. Bland-Altman plots of the other pulse oximeters are provided as a supplemental figure.

Table 2
Performance of pulse oximeters compared with arterial blood gas (using the metrics: mean bias, root mean square difference and mean absolute error)

Detection of hypoxaemia

Table 3 summarises the accuracy of each pulse oximeter in detecting hypoxaemia. The oximeters with the highest specificity were Contec CMS50D1 (93%) and Zacurate Pro Series 500 DL (91%); those with the highest sensitivity were HYLOGY MD-H37 (92%) and ANAPULSE ANP 100 (91%). All pulse oximeters performed well in ruling out hypoxaemia, with negative predictive values of 98%–99% in a population in which ≈5% of measurements indicated hypoxaemia. Confirming the presence of hypoxaemia was poor, with positive predictive values of 11%–30%. Across all readings, accuracy was highest for Contec CMS50D1 (91%) and Zacurate Pro Series 500 DL (90%). As a sensitivity analysis, we also assessed performance at other thresholds for abnormal oxygenation (SaO2 ≤92% and SaO2 ≤94%); these results are provided in online supplemental table S1.
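Sensitivity, specificity, predictive values and accuracy all follow from a 2×2 table of SpO2 against SaO2. The text defines hypoxaemia only by SaO2 ≤90%; the sketch below assumes, as a hypothetical choice, that an SpO2 reading ≤90% counts as test-positive, and that invalid readings (nothing displayed at 30 s, coded as None) are excluded. The function name is hypothetical.

```python
from typing import Optional


def hypoxaemia_metrics(
    pairs: list[tuple[Optional[float], float]],
    ref_cutoff: float = 90.0,   # hypoxaemia: SaO2 <= 90% (from the study definition)
    test_cutoff: float = 90.0,  # assumed SpO2 cut-off for a positive index test
) -> dict[str, float]:
    """2x2 diagnostic metrics for (SpO2, SaO2) pairs."""
    tp = fp = tn = fn = 0
    for spo2, sao2 in pairs:
        if spo2 is None:        # invalid reading: excluded from the 2x2 table
            continue
        hypoxaemic = sao2 <= ref_cutoff
        flagged = spo2 <= test_cutoff
        if hypoxaemic and flagged:
            tp += 1
        elif hypoxaemic:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "n_valid": tp + fp + tn + fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

With only 12 hypoxaemic samples, one extra true or false positive shifts sensitivity and PPV by several percentage points, which is one reason these estimates carry wide confidence intervals.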

Table 3
Accuracy of pulse oximeters in detecting hypoxaemia (prevalence of SaO2 ≤90% is 5.1%)

Factors associated with poor pulse oximeter performance

Online supplemental table S2 displays the factors associated with bias (SpO2–SaO2) for each pulse oximeter. Darker skin complexion and inaccurate pulse rate measurement (the difference between the pulse oximeter pulse rate and the ‘true’ pulse rate captured by the ICU monitor) were associated with poorer SpO2 performance in five of the 10 pulse oximeters. Systolic blood pressure and poor peripheral perfusion (cold hands to touch) were associated with poorer performance in four and three pulse oximeters, respectively.

Test results and diagnostic accuracy of the SpO2 continuous monitor (clinical index test)

The hospital-grade Philips sensor glove SpO2 continuous monitor, our clinical index test, performed better, with an ARMS of 3.0 and MAE of 1.9 for all measurements (n=234), and an ARMS of 2.1 and MAE of 1.6 when outliers (>1.96 SD) were excluded. For detecting hypoxaemia, it had a sensitivity of 67% (35%–90%), specificity of 96% (93%–98%), positive and negative predictive values of 50% (31%–69%) and 98% (96%–99%), respectively, and an accuracy of 95% (91%–97%). Cold extremities (acra) were associated with greater inaccuracy of the SpO2 continuous monitor compared with SaO2.

Discussion

Our study found that top-selling, low-cost direct-to-consumer pulse oximeters do not meet the requirements set by the regulatory bodies (ISO/FDA) when compared with the gold standard of arterial oxygen saturation from blood gas samples in a population of intensive care patients. However, when extreme outliers were disregarded, four pulse oximeters would meet the ARMS ≤3% requirement and eight would meet the MAE ≤3% threshold. The hospital-grade Philips sensor glove SpO2 monitor performed slightly better. The direct-to-consumer pulse oximeters tested in this study performed well in ruling out hypoxaemia but were not reliable in confirming its presence. Therefore, when such a pulse oximeter indicates a below-normal SpO2 (for instance, when used by a patient for home monitoring), confirmation with a medical-grade oximeter is required. Caution is warranted when factors are present that negatively affect the reliability of these pulse oximeters, such as an inaccurate pulse rate reading, darker skin pigmentation or cold extremities.

Strengths and limitations

Our study was performed under clinical conditions in consecutive patients with direct access to arterial blood gas samples. Patients were diverse in terms of age, skin type and underlying clinical conditions. We used popular pulse oximeter devices, and the number of samples was sufficient to draw reliable conclusions. Moreover, we used a hospital-grade SpO2 device as a clinical index test. However, our study also has several limitations. First, the performance of fingertip pulse oximeters may have been affected by the poor health status and poorer peripheral perfusion of intensive care patients compared with community-based patients. Moreover, the distribution of SaO2 differs between intensive care and community-based populations (ie, in the ICU, oxygenation and ventilation settings are titrated to target an SaO2 of 92%), which may also affect performance. Furthermore, because SpO2 measurements were taken sequentially, the interval between the arterial blood gas draw (for SaO2) and an SpO2 measurement could be up to several minutes. This is relevant because oximetry readings may vary from minute to minute in critically ill patients, even in the relatively stable ICU patients enrolled in this study. We applied a rotating order of device use to reduce the bias arising from this limitation. Finally, we did not capture detailed ICU data (such as severity scores) or laboratory findings such as haemoglobin or acid–base status. These variables are relevant because SpO2 readings are based on photoplethysmographic measurements of light absorption by oxyhaemoglobin and deoxyhaemoglobin at red and infrared wavelengths. A decrease in blood haemoglobin concentration may therefore make it harder to capture a sufficient signal. Furthermore, a change in carbon dioxide concentration and/or a pH shift could alter the oxyhaemoglobin dissociation curve (Bohr effect), resulting in inaccurate SpO2 as well as SaO2 measurements.

Prior studies

Many commonly used direct-to-consumer pulse oximeters do not undergo rigorous in vivo testing, so little is known about their accuracy. An important study that did perform in vivo testing of inexpensive pulse oximeters was published by Lipnick et al in 2016.4 In that study, six finger pulse oximeters (not cleared by the FDA) were evaluated in 22 healthy subjects, in whom stable SaO2 plateaus between 70% and 100% were achieved under controlled conditions via a partial rebreathing circuit. Two of the pulse oximeters tested (Contec CMS50DL and Beijing Choice C20) demonstrated an ARMS of ≤3%, thereby meeting the ISO criteria for accuracy. The Contec CMS50DL may be comparable to the Contec CMS50D1 model that we tested. In our study, which was performed under clinical conditions, the Contec device did not fare as well, although it was still one of the better pulse oximeters we tested. Prior studies found that oximeters performed worse under hypoxic conditions, with mean bias increasing at lower oxygen saturations compared with arterial blood gas4 or a conventional bedside pulse oximeter.8 9 We observed a similar pattern in some, but not all, oximeters. Still, from a clinical perspective, minimal bias is particularly important in the 90%–95% range, as this is where a hypoxaemic state must be differentiated from a non-hypoxaemic state.
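To check whether bias grows at lower saturations, readings can be stratified by the reference SaO2. The band limits below (≤90%, >90% to 95%, >95%) are an illustrative assumption chosen to isolate the 90%–95% range discussed in the preceding paragraph; they are not bands reported in this study, and the function name is hypothetical.

```python
from statistics import fmean

# Illustrative SaO2 bands (assumption): (label, lower bound exclusive, upper bound inclusive)
SAO2_BANDS = [
    ("SaO2 <=90%", None, 90.0),
    ("SaO2 >90% to 95%", 90.0, 95.0),
    ("SaO2 >95%", 95.0, None),
]


def bias_by_sao2_band(pairs: list[tuple[float, float]]) -> dict[str, tuple[int, float]]:
    """Return {band: (n, mean bias)} for (SpO2, SaO2) pairs, binned by reference SaO2."""
    result = {}
    for label, lower, upper in SAO2_BANDS:
        diffs = [
            spo2 - sao2
            for spo2, sao2 in pairs
            if (lower is None or sao2 > lower) and (upper is None or sao2 <= upper)
        ]
        result[label] = (len(diffs), fmean(diffs) if diffs else float("nan"))
    return result
```

Binning by the reference value (SaO2) rather than by SpO2 avoids placing a reading in the wrong band because of the very bias being measured.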

Implications for practice

Due to the current COVID-19 pandemic, pulse oximeters have become more popular than ever. Despite their limitations, these devices are a welcome tool for remote patient monitoring and for ruling out hypoxaemia, particularly in populations where the prevalence of hypoxaemia is low. In our population, in which approximately 5% of samples were hypoxaemic, a selection of top-selling low-cost devices safely ruled out hypoxaemia in virtually all cases (negative predictive values of 98%–99%). This percentage would approach 100% even more closely in lower-prevalence settings. Still, our findings, like those of other studies,4 8 9 illustrate that physicians should remain alert in patients with other symptoms suggestive of hypoxaemia, particularly high-risk patients with pre-existing pulmonary disease. In these scenarios, FDA-cleared devices should be used instead, as they show a smaller increase in bias at lower SaO2.10–12 Finally, irrespective of the device used, clinicians should realise that a number of factors negatively affect the reliability of pulse oximetry. These include poor peripheral perfusion, inaccurate pulse rate measurement, motion, anaemia, nail polish and dark skin pigmentation, as shown in our study as well as by others.13–16
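The expectation that the negative predictive value rises further as prevalence falls follows from Bayes’ theorem. The sketch below plugs in the point estimates reported for the hospital-grade monitor (sensitivity 67%, specificity 96%) purely as an illustration, ignoring their confidence intervals; per-device values from table 3 could be used in the same way. The function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test with fixed sensitivity/specificity (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # 5.1% matches the hypoxaemia prevalence in this study; 1% mimics a low-risk community setting.
    for prevalence in (0.20, 0.051, 0.01):
        ppv, npv = predictive_values(0.67, 0.96, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At a prevalence of 5.1% this gives an NPV of about 98%, in line with the observed value, and lowering the prevalence raises NPV while PPV falls sharply. This is why a normal reading is reassuring in low-risk populations, whereas a low reading still needs confirmation.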

Conclusion

Direct-to-consumer pulse oximeters are widely available and in high demand, yet most of these low-cost devices have not been rigorously tested. In this study, we tested 10 popular pulse oximeters in ICU patients with direct access to arterial blood gas sampling. Overall, the tested pulse oximeters did not meet the strict ISO requirements used by the FDA in its 510(k) premarket notification clearance process. However, most devices can safely rule out hypoxaemia in the vast majority of patients, which is particularly relevant in community-based populations with a low a priori risk of hypoxaemia. Future studies are warranted to further assess the accuracy of pulse oximeters in community-based patients, and to gain insight into how to further improve this non-invasive, low-cost and potentially life-saving technology.