Objectives To evaluate the performance of direct-to-consumer pulse oximeters under clinical conditions, with arterial blood gas measurement (SaO2) as reference standard.
Design Cross-sectional, validation study.
Setting Intensive care.
Participants Adult patients requiring SaO2-monitoring.
Interventions The studied oximeters are top-selling in Europe/USA (AFAC FS10D, AGPTEK FS10C, ANAPULSE ANP 100, Cocobear, Contec CMS50D1, HYLOGY MD-H37, Mommed YM101, PRCMISEMED F4PRO, PULOX PO-200 and Zacurate Pro Series 500 DL). Directly after collection of a SaO2 blood sample, we obtained pulse oximeter readings (SpO2). SpO2-readings were performed in rotating order, blinded to SaO2 and completed <10 min after blood sample collection.
Outcome measures Bias (SpO2–SaO2) mean, root mean square difference (ARMS), mean absolute error (MAE) and accuracy in identifying hypoxaemia (SaO2 ≤90%). As a clinical index test, we included a hospital-grade SpO2-monitor (Philips).
Results In 35 consecutive patients, we obtained 2258 SpO2-readings and 234 SaO2-samples. Mean bias ranged from −0.6 to −4.8. None of the pulse oximeters met ARMS ≤3%, the requirement set by International Organisation for Standardisation (ISO)-standards and required for Food and Drug Administration (FDA) 510(k)-clearance. The MAE ranged from 2.3 to 5.1, and five out of ten pulse oximeters met the requirement of ≤3%. For hypoxaemia, negative predictive values were 98%–99%. Positive predictive values ranged from 11% to 30%. Highest accuracy (95% CI) was found for Contec CMS50D1; 91% (86–94) and Zacurate Pro Series 500 DL; 90% (85–94). The hospital-grade SpO2-monitor had an ARMS of 3.0% and MAE of 1.9, and an accuracy of 95% (91%–97%).
Conclusion Top-selling, direct-to-consumer pulse oximeters can accurately rule out hypoxaemia, but do not meet ISO-standards required for FDA-clearance.
- respiratory measurement
Data availability statement
Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as supplementary information.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is the key question?
How well do direct-to-consumer pulse oximeters perform in a clinical setting?
What is the bottom line?
The tested pulse oximeters do not meet the required standards, but can be used safely to rule out (but not rule in) hypoxaemia.
Why read on?
Pulse oximeters are widely used for making treatment or referral decisions, thus understanding their limitations is essential.
Pulse oximetry has become an indispensable, low-cost, non-invasive, diagnostic tool to assess a patient’s oxygen saturation. Typically, this tool has a clip that can be put on a patient’s finger to obtain information on the peripheral arterial oxygen saturation (SpO2), which serves as a proxy for tissue oxygenation.1 While most direct-to-consumer oximeters are not intended for use in clinical settings, pulse oximeters have evolved to play a pivotal role in routine medical care, and are an essential bedside tool in making treatment and/or referral decisions in community-based healthcare settings. In the current COVID-19 pandemic, the use of pulse oximeters has become even more indispensable, as hypoxaemia is a common diagnostic finding, with reports indicating that patients with COVID-19 may have few symptoms relative to the degree of hypoxaemia (termed ‘silent hypoxaemia’).2 Given the risk of hypoxaemia, the WHO recommends home oximetry monitoring for patients with COVID-19 and risk factors for progression to severe disease.3 Given its importance in guiding medical decision-making, it is remarkable how little is known about the diagnostic accuracy of these devices when used under clinical conditions in actual patients.4 As such, we aimed to evaluate whether popular direct-to-consumer fingertip pulse oximeters meet the accuracy standards proposed by regulatory bodies when used under real-world conditions.
We reported this diagnostic accuracy study in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 statement.5
We enrolled consecutive patients, who were at least 18 years of age, admitted to the intensive care unit (ICU) of a large community-based hospital (Flevoziekenhuis, Almere, the Netherlands). Eligible patients had a radial artery catheter for arterial blood oxygen saturation assessments (SaO2) as part of routine medical care. Exclusion criteria were patients without a clinical indication for arterial access, those with known inherited forms of abnormal haemoglobin, and those who rapidly deteriorated due to acute haemodynamic compromise, in which the measurements required for this study could hinder medical interventions and thereby negatively affect patient safety.
Intensive Care personnel notified the site investigator (LB) of a potentially eligible patient for study enrolment. The site investigator enrolled each patient after verbal and written consent, either by the patient or his/her legal representative. During office hours, the site investigator was notified by the Intensive Care personnel when a blood gas sample was about to be taken. The site investigator positioned the pulse oximeters for readings directly after the arterial blood gas sample (SaO2) was obtained. In order to reduce detection bias, the site investigator applied the pulse oximeters in a rotating fashion, maintaining a fixed device order but starting with a different device in each consecutive sample. As each SpO2 measurement took 30 s, the measurement time window involved at least 300 s. The Intensive Care personnel (SaO2) and site investigator (SpO2-readings) reported their findings on separate digital forms, and were blinded to each other’s findings.
Index test and devices
We evaluated 10 oximeters, which were selected from the top 10 of most purchased pulse oximeters on Amazon in at least two of the following countries: USA, UK, Germany, Italy or France. Amazon was chosen because of its dominance on the e-commerce market. In alphabetical order, these oximeters were as follows: AFAC FS10D, AGPTEK FS10C, ANAPULSE ANP 100, Cocobear, Contec CMS50D1, HYLOGY MD-H37, Mommed YM101, PRCMISEMED F4PRO, PULOX PO-200 and Zacurate Pro Series 500 DL. These pulse oximeters cost between 20€ and 50€ each, and all claimed to meet International Organisation for Standardisation (ISO) standards (see paragraph ‘outcomes of interest’ for specifics). As a clinical index test, we also included a hospital-grade pulse oximeter (Philips M1191BL sensor glove, Philips, The Netherlands), which was used as the clinical standard of care for continuous SpO2 monitoring at the study site, and has met ISO standards and received 510(k) clearance of the Food and Drug Administration (FDA) (clearance number: K062455). We used the SpO2 value that was shown on the pulse oximeter’s display 30 s after placement on a patient’s fingertip. The same fingertip was used for each oximeter. When no result was displayed 30 s after placement, this was documented as an invalid reading.
The reference standard was a point of care testing analyser (ABL90 Flex Plus, Radiometer Medical ApS, Brønshøj, Denmark, calibrated as per regulatory standards), which was used to perform blood gas analysis on arterial blood gas samples to obtain the arterial oxygen saturation (SaO2) at the study site as part of routine Intensive Care.
Outcomes of interest
We formulated the following outcome measures: mean bias, root mean square difference (ARMS), the mean absolute error (MAE) and diagnostic accuracy for hypoxaemia, defined as SaO2 ≤90%. Mean bias is calculated as SpO2–SaO2. ARMS and MAE are derived from calculations involving mean bias and precision (SD of bias). Because outliers have an excessive negative effect on results of the ARMS parameter, we also assessed the MAE, a measurement that is more robust in the presence of outliers. The formulas to calculate these outcomes can be found in the supplement. We evaluated the diagnostic accuracy of the pulse oximeters according to the standards defined by the International Organisation for Standardisation in ISO 80601-2-61:2017, which supersedes the ISO 80601-2-61:2011 standard advised by the American FDA in their 510(k) Premarket Notification Submissions Guidance for pulse oximeters. This standard considers an ARMS of ≤3% in the SaO2 range of 70%–100% acceptable.6 7
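The exact formulas used in this study are given in the supplement and are not reproduced here. As a minimal sketch, assuming the conventional definitions (ARMS as the root mean square of the paired SpO2–SaO2 differences, and MAE as the mean of their absolute values), the three accuracy measures could be computed as follows. The function name and the example readings are hypothetical and are not study data.

```python
import math


def accuracy_metrics(spo2, sao2):
    """Return (mean bias, ARMS, MAE) for paired readings, in percentage points.

    Assumes the conventional definitions; the study's own formulas
    are in its supplement and may differ in detail.
    """
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("need equal-length, non-empty paired readings")
    # Per-pair difference, with the sign convention SpO2 - SaO2
    diffs = [s - a for s, a in zip(spo2, sao2)]
    n = len(diffs)
    mean_bias = sum(diffs) / n
    arms = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    return mean_bias, arms, mae


# Made-up illustrative readings (NOT study data)
spo2 = [94.0, 91.0, 97.0, 88.0]
sao2 = [95.0, 94.0, 96.0, 93.0]
bias, arms, mae = accuracy_metrics(spo2, sao2)
print(f"bias={bias:.1f}, ARMS={arms:.1f}, MAE={mae:.1f}")
```

In this made-up example the single 5-point deviation raises ARMS more than MAE, because squaring weights large differences more heavily. This is the reason for reporting MAE alongside ARMS.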
Data collection and sample size
We included data on SpO2, SaO2, heart rate, systolic blood pressure and diastolic blood pressure, sex, age, skin type (Fitzpatrick classification scale) as assessed by the site investigator, vasopressor use (ie, noradrenaline dose (mg/h)), body temperature (°C) and hand temperature to touch as assessed by the site investigator. We followed the FDA advice for sample size determination, which states that at least 200 measurements per pulse oximeter should be obtained from at least 10 subjects, of which at least two or 15% have a dark skin type (Fitzpatrick IV–VI).7
We assessed mean bias (SpO2–SaO2) and SD, and subsequently calculated the ARMS and MAE using the formulas as described in the supplemental data. We created Bland-Altman plots for each pulse oximeter to graphically display its bias (SpO2–SaO2) to the reference standard (SaO2). We added a zero-line and upper and lower limits of agreement (±1.96 SD). To visualise the accuracy standards required by the regulatory bodies, ±3% lines are also displayed in the figures. As multiple observations were performed per individual, we calculated the SD by using the within-subject variance (σ²) and the between-subject variance (σμ²). Since ARMS can be easily affected by the presence of outliers, we added MAE, which is more robust in the presence of outliers, as well as a sensitivity analysis restricting the sample to measurements within 1.96 SD of each pulse oximeter’s mean and calculating ARMS and MAE on this sample. This sensitivity analysis is analogous to discarding extreme readings as done in routine clinical care when a reading does not coincide with the patient’s apparent clinical state. We also assessed the diagnostic performance (sensitivity, specificity, predictive values and accuracy) of pulse oximeters in detecting hypoxaemia, which we defined as SaO2 ≤90%. Finally, we evaluated factors associated with poor performance of each pulse oximeter. We used bias as a continuous measurement for performance, and used logistic regression models in which we included relevant patient and pulse oximeter characteristics. We used SPSS (IBM SPSS, V.26.0, IBM, Armonk, USA), R software (R V.3.6.1, The R Foundation for statistical computing) and MedCalc Statistical Software V.18.5 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018) to conduct the analyses. We assessed statistical significance at the 0.05 level in all analyses.
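The limits of agreement and the outlier-restricted sensitivity analysis can be illustrated with a short sketch. It is a simplification under stated assumptions: it uses the plain sample SD of the differences rather than the within-subject and between-subject variance components described above, and it takes ‘each pulse oximeter’s mean’ to be the mean bias. The function names and the example differences are hypothetical.

```python
import math
import statistics


def limits_of_agreement(diffs):
    """Bland-Altman limits: mean difference +/- 1.96 SD.

    Simplification: uses the plain sample SD and ignores the
    clustering of repeated measurements within patients.
    """
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return mean - 1.96 * sd, mean + 1.96 * sd


def restrict_to_limits(diffs):
    """Keep only differences inside the limits of agreement."""
    lower, upper = limits_of_agreement(diffs)
    return [d for d in diffs if lower <= d <= upper]


def arms(diffs):
    """Root mean square of the SpO2 - SaO2 differences."""
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up SpO2 - SaO2 differences for one device (NOT study data);
# the last value mimics a single extreme reading.
diffs = [-1.0, -2.0, 0.0, -1.0, -2.0, -1.0, 0.0, -12.0]
kept = restrict_to_limits(diffs)
print(len(diffs), "readings ->", len(kept), "after restriction")
print(f"ARMS all: {arms(diffs):.2f}, ARMS restricted: {arms(kept):.2f}")
```

Because the limits are computed from the same data that are then restricted, a single extreme reading also widens the limits. A repeated-measures SD, as used in the study, would shift them somewhat.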
In July and August 2020, we enrolled 35 consecutive patients, with a median age of 69 years, and 40% female. Table 1 displays the baseline characteristics. Patients were primarily admitted for respiratory failure due to COVID-19 or other pulmonary diseases (such as chronic obstructive pulmonary disease). Other diagnoses were septic shock, pulmonary embolism, arterial thrombosis in extremities, severe hyperglycaemia and suicide attempt. Almost half of patients required vasopressors, and all patients required supplemental oxygen.
Blood gas samples and pulse oximeter readings
We obtained a total of 234 arterial blood gas samples; SaO2 ranged between 85.6% and 99.8%, with the distribution illustrated in figure 1. Of those 234 SaO2 samples, 12 samples (5.1%) were classified as hypoxaemia. Moreover, each SaO2 sample was followed by 10 SpO2 measurements (one from each pulse oximeter), which resulted in a total of 2340 SpO2 measurements. Eighty-two (3.5%) SpO2 measurements were invalid, as the pulse oximeter did not display a result after 30 s (also see table 1), leaving 2258 interpretable SpO2 measurements, which were all obtained within 10 min (range 0–540 s) after acquiring the reference standard’s blood sample.
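The counts reported above can be cross-checked with simple arithmetic. The sketch below only recombines the figures stated in this paragraph; the variable names are illustrative.

```python
# Figures as reported in the text
sao2_samples = 234      # arterial blood gas samples
devices = 10            # direct-to-consumer oximeters read per sample
invalid_readings = 82   # no result displayed after 30 s
hypoxaemic_samples = 12 # SaO2 <= 90%

planned_readings = sao2_samples * devices
valid_readings = planned_readings - invalid_readings
invalid_pct = round(100 * invalid_readings / planned_readings, 1)
hypoxaemia_pct = round(100 * hypoxaemic_samples / sao2_samples, 1)

print(planned_readings, valid_readings, invalid_pct, hypoxaemia_pct)
```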
Test results of pulse oximeters
Table 2 shows the test characteristics of the pulse oximeters that are graphically displayed in figure 2 and online supplemental figure S1. We found that most pulse oximeters overall had a negative mean bias, thus SpO2 was on average lower than SaO2. Moreover, none of the tested pulse oximeters met the requirement of diagnostic accuracy, using the ARMS≤3% threshold. When using MAE, which is less affected by outliers, five oximeters performed within a ≤3% difference margin of the reference standard. When excluding extreme outliers (>1.96 SD), the performance of Contec CMS50D1, Mommed YM101, PULOX PO-200 and Zacurate Pro Series 500 DL was within the ARMS≤3% limit, and all but two oximeters (ANAPULSE ANP 100 and Cocobear) met the MAE≤3% threshold.
Detection of hypoxaemia
Table 3 summarises the accuracy of each pulse oximeter in detecting hypoxaemia. Overall, the oximeters with the highest specificity were Contec CMS50D1 (93%) and Zacurate Pro Series 500 DL (91%). Oximeters with the highest sensitivity were HYLOGY MD-H37 (92%) and ANAPULSE ANP 100 (91%). In terms of predictive values, all pulse oximeters performed well in ruling out hypoxaemia, with negative predictive values of 98%–99% in a population with ≈5% hypoxaemia measurements. Performance in confirming the presence of hypoxaemia was poor, with positive predictive values of 11%–30%. For all readings, accuracy was highest for Contec CMS50D1 (91%) and Zacurate Pro Series 500 DL (90%). As a sensitivity analysis, we also provided data on the performance of pulse oximeters at different abnormal oxygenation thresholds (SaO2 ≤92% and SaO2 ≤94%), which can be found in online supplemental table S1.
Factors associated with poor pulse oximeter performance
Online supplemental table S2 displays factors that were associated with bias (SpO2–SaO2) of each pulse oximeter. Overall, darker skin complexion and an inaccurate pulse rate measurement (difference between pulse oximeter pulse rate and ‘true’ pulse rate as captured by ICU monitor), were associated with poorer SpO2 performance in five out of 10 pulse oximeters. Other factors that negatively affected results were systolic blood pressure and poor peripheral perfusion (cold hands to touch), in four and three pulse oximeters, respectively.
Test results and diagnostic accuracy of the SpO2 continuous monitor (clinical index test)
The hospital-grade Philips sensor glove SpO2 continuous monitor, our clinical index test, performed better with an ARMS of 3.0 and MAE of 1.9 for all measurements (n=234), and ARMS of 2.1 and MAE of 1.6 when excluding outliers (>1.96 SD). For detecting hypoxaemia, the hospital-grade Philips sensor glove SpO2 monitor had a sensitivity of 67% (35%–90%), specificity of 96% (93%–98%), positive and negative predictive values of 50% (31%–69%) and 98% (96%–99%), respectively, and an accuracy of 95% (91%–97%). The presence of cold hands/acra was associated with higher inaccuracy of the SpO2 continuous monitor compared with SaO2.
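To show how sensitivity, specificity, predictive values and accuracy relate to one another, the sketch below derives them from a 2×2 table. The underlying counts are not reported in the text. The counts used here (8 true positives, 4 false negatives, 8 false positives and 214 true negatives) are an assumption, back-calculated so that they are consistent with n=234, 12 hypoxaemic samples and the rounded percentages reported for the hospital-grade monitor. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic accuracy measures, as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# ASSUMED counts, chosen only to be consistent with the reported
# percentages; they are not taken from the study's tables.
metrics = diagnostic_metrics(tp=8, fn=4, fp=8, tn=214)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.0f}%")
```

With only 12 hypoxaemic samples, each additional missed or detected case moves sensitivity by about 8 percentage points, which is in line with the wide CI reported for sensitivity.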
Our study found that the top selling low-cost, popular direct-to-consumer pulse oximeters do not meet the requirements set by the regulatory bodies (ISO/FDA), when compared with the gold standard of arterial oxygen saturation, as obtained by blood gas samples, in a population of intensive care patients. However, when extreme outliers were disregarded, four pulse oximeters would meet the ARMS≤3% requirements, and eight pulse oximeters would meet MAE≤3% standards. The hospital-grade Philips sensor glove SpO2 monitor performed slightly better. We found that the direct-to-consumer pulse oximeters tested in this study performed well in ruling out hypoxaemia, but are not reliable in confirming the presence of hypoxaemia. Therefore, when such a pulse oximeter indicates a below normal SpO2 (for instance, when used by a patient for home monitoring), confirmation with a medical-grade oximeter is required. Caution is warranted when factors are present that negatively affect the reliability of these pulse oximeters, such as an inaccurate pulse rate reading, darker skin pigmentation or cold extremities.
Strengths and limitations
Our study was performed under clinical conditions involving consecutive patients with direct access to arterial blood gas samples. Patients were diverse in terms of age, skin type and underlying clinical conditions. We used popular pulse oximeter devices and the number of samples was sufficient to draw accurate conclusions. Moreover, we used a hospital-grade SpO2 device as a clinical index test. However, there are also a number of limitations that should be mentioned. First, the performance of fingertip pulse oximeters may have been affected by the poor health status and poorer peripheral perfusion of intensive care patients, when compared with community-based patients. Moreover, the distribution of SaO2 is different in intensive care populations (ie, oxygenation and ventilation settings are set to strive for SaO2 of 92%), when compared with community-based populations, which would also affect performance. Furthermore, the use of sequential SpO2 measurements meant that the time interval between the arterial blood gas draw (for SaO2) and the SpO2 measurement may have been up to several minutes. This is relevant, as minute-to-minute variation in oximetry readings may be present among critically ill patients, even in the relatively stable ICU patients that were enrolled in this study. We applied a rotating order of device usage in order to reduce bias from this limitation. Finally, we did not capture detailed ICU data (such as severity scores), or laboratory findings on our patients, such as haemoglobin or acid–base status. These variables are of relevance as SpO2-readings are based on photoplethysmographic measurements using red and infrared wavelengths through (de)oxyhaemoglobin. As such, a decrease in haemoglobin blood concentration may affect capturing a sufficient signal. Furthermore, a change in carbon dioxide concentration and/or pH shift could in turn alter the oxyhaemoglobin dissociation curve (Bohr effect), resulting in inaccurate SpO2 as well as SaO2 measurements.
Many commonly used direct-to-consumer pulse oximeters do not undergo rigorous in vivo testing, and thus, little is known about the accuracy of these devices. An important study which did perform in vivo testing of inexpensive pulse oximeters was published by Lipnick et al in 2016.4 In this study, six finger pulse oximeters (not cleared by the FDA) were evaluated in 22 healthy subjects, in which stable SaO2 plateaus between 70% and 100% were achieved under controlled conditions via a partial rebreathing circuit. The study found that two pulse oximeters tested (Contec CMS50DL and Beijing Choice C20) demonstrated an ARMS of ≤3%, thereby meeting the ISO criteria for accuracy. Of the tested oximeters, the Contec CMS50DL may be comparable with the Contec CMS50D1 model that we tested. In our study, which was performed under clinical conditions, the Contec device did not fare as well, although it was still one of the better pulse oximeters that we tested. Prior studies found that oximeters performed worse in hypoxic conditions, with mean bias increasing at lower oxygen saturations compared with arterial blood gas4 or a conventional bedside pulse oximeter.8 9 In our study, we found a similar observation in some, but not all oximeters. Still, from a clinical perspective, it is particularly important to have minimal bias in the range of 90%–95%, as this is where a hypoxaemic state should be differentiated from a non-hypoxaemic state.
Implications for practice
Due to the current COVID-19 pandemic, pulse oximeters have become more popular than ever before. Despite their limitations, these devices present a welcome tool for remote monitoring of patients and for ruling out hypoxaemia, particularly in a population where the prevalence of hypoxaemia is low. In our population, in which approximately 5% was hypoxaemic, a selection of top selling low-cost devices were able to safely rule out hypoxaemia in virtually all cases (98%–99%). This percentage would even further approach 100% in low prevalence settings. Still, our findings, like those of other studies,4 8 9 illustrate that in patients with other symptoms suggestive of hypoxaemia, physicians should remain alert, particularly in high-risk patients with preexisting pulmonary disease. In these scenarios, devices that are FDA-cleared should be used instead, as they show a smaller degree of increasing bias during lower SaO2 conditions.10–12 Finally, irrespective of the device used, it is important for clinicians to realise that there are a number of factors that negatively impact the reliability of pulse oximetry. These factors include poor peripheral perfusion, inaccurate pulse rate measurement, motion, anaemia, nail polish, and dark skin pigmentation, as shown in our study as well as by others.13–16
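The dependence of predictive values on prevalence follows from Bayes’ theorem and can be sketched as below. The sensitivity and specificity of 90% are hypothetical round numbers chosen for illustration, not values for any tested device, and the function names are illustrative.

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical device characteristics (NOT a tested device)
sens, spec = 0.90, 0.90
for prevalence in (0.05, 0.01):
    print(
        f"prevalence {prevalence:.0%}: "
        f"NPV {npv(sens, spec, prevalence):.1%}, "
        f"PPV {ppv(sens, spec, prevalence):.1%}"
    )
```

For fixed sensitivity and specificity, lowering the prevalence raises the negative predictive value and lowers the positive predictive value. A normal reading therefore becomes more reassuring in low-prevalence settings, while an abnormal reading more often needs confirmation.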
Direct-to-consumer pulse oximeters are widely available and in high demand. Most of these low-cost pulse oximeters have not been rigorously tested. In this study, we tested 10 popular pulse oximeters in ICU patients with direct access to arterial blood gas. Overall, we found that the tested pulse oximeters would not meet strict ISO requirements used by the FDA in their 510(k) premarket notification clearance process. However, most devices can safely rule out hypoxaemia in the vast majority of patients, which is particularly relevant in community-based populations with a low a priori hypoxaemia risk. Future studies are warranted to further assess the accuracy of pulse oximeters in community-based patients, and to gain insight into how to further improve this non-invasive, low-cost, and potentially life-saving technology.
Patient consent for publication
The study was approved by the local ethics committee and the board of directors of the Flevoziekenhuis. We obtained informed written consent from each participant. When a patient was unable to provide informed consent himself/herself, for example, when sedated or unconscious, a legal representative provided informed consent on the patient’s behalf.
We would like to thank all participating patients, their families, as well as the personnel of the Intensive Care of Flevoziekenhuis.
REH and LB are joint first authors.
Contributors RH and LB designed the study protocol. LB was involved in the recruitment of patients for participation in data collection, supervised by MES. LB performed the statistical analysis and interpreted all data with statistical guidance from JCLH, LDC, EPMK, WL and RH. RH and LB drafted the manuscript and all authors contributed to its revision. All authors read and approved the final manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.