Study design
We enrolled consecutive patients aged 18 years or older who were admitted to the intensive care unit (ICU) of a large community-based hospital (Flevoziekenhuis, Almere, the Netherlands). Eligible patients had a radial artery catheter in place for arterial blood oxygen saturation (SaO2) assessments as part of routine medical care. We excluded patients without a clinical indication for arterial access, those with known inherited forms of abnormal haemoglobin, and those who rapidly deteriorated due to acute haemodynamic compromise, in whom the measurements required for this study could hinder medical interventions and thereby compromise patient safety.
Study procedures
Intensive Care personnel notified the site investigator (LB) of each potentially eligible patient. The site investigator enrolled each patient after obtaining verbal and written consent from either the patient or his/her legal representative. During office hours, Intensive Care personnel notified the site investigator when an arterial blood gas sample was about to be drawn. The site investigator positioned the pulse oximeters for readings directly after the arterial blood gas sample (SaO2) was obtained. To reduce detection bias, the site investigator applied the pulse oximeters in a rotating fashion, maintaining a fixed device order but starting with a different device for each consecutive sample. As each SpO2 measurement took 30 s, the measurement time window was at least 300 s. The Intensive Care personnel (SaO2) and the site investigator (SpO2 readings) reported their findings on separate digital forms and were blinded to each other’s findings.
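The rotating application scheme described above can be illustrated with a short Python sketch. It is not the study's actual tooling; the function name and the device labels are hypothetical, and the sketch only assumes that the cyclic order of devices stays fixed while the starting device shifts by one position per sample.

```python
from collections import deque


def rotated_orders(devices, n_samples):
    """Yield the device application order for each consecutive sample.

    The cyclic order of the devices is fixed; only the starting device
    shifts by one position from one sample to the next.
    """
    order = deque(devices)
    for _ in range(n_samples):
        yield list(order)
        order.rotate(-1)  # the next sample starts with the following device


if __name__ == "__main__":
    # Hypothetical device labels, used for illustration only.
    for sample_no, order in enumerate(rotated_orders(["A", "B", "C", "D"], 5), start=1):
        print(sample_no, order)
```

With this scheme, every device occupies every position in the sequence equally often once the number of samples is a multiple of the number of devices.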
Index test and devices
We evaluated 10 oximeters that ranked among the 10 most purchased pulse oximeters on Amazon in at least two of the following countries: USA, UK, Germany, Italy or France. Amazon was chosen because of its dominance in the e-commerce market. In alphabetical order, these oximeters were as follows: AFAC FS10D, AGPTEK FS10C, ANAPULSE ANP 100, Cocobear, Contec CMS50D1, HYLOGY MD-H37, Mommed YM101, PRCMISEMED F4PRO, PULOX PO-200 and Zacurate Pro Series 500 DL. These pulse oximeters cost between 20€ and 50€ each, and all claimed to meet International Organisation for Standardisation (ISO) standards (see the section ‘Outcomes of interest’ for specifics). As a clinical index test, we also included a hospital-grade pulse oximeter (Philips M1191BL sensor glove, Philips, The Netherlands), which was used as the clinical standard of care for continuous SpO2 monitoring at the study site, meets ISO standards and has received 510(k) clearance from the Food and Drug Administration (FDA) (clearance number: K062455). We used the SpO2 value shown on the pulse oximeter’s display 30 s after placement on a patient’s fingertip. The same fingertip was used for each oximeter. When no result was displayed 30 s after placement, this was documented as an invalid reading.
Reference standard
The reference standard was a point-of-care testing analyser (ABL90 Flex Plus, Radiometer Medical ApS, Brønshøj, Denmark, calibrated as per regulatory standards), which was used at the study site to perform blood gas analysis on arterial blood samples and obtain the arterial oxygen saturation (SaO2) as part of routine intensive care.
Outcomes of interest
We formulated the following outcome measures: mean bias, root mean square difference (ARMS), mean absolute error (MAE) and diagnostic accuracy for hypoxaemia, defined as SaO2 ≤90%. Mean bias is calculated as SpO2–SaO2. ARMS and MAE are derived from calculations involving mean bias and precision (SD of bias). Because outliers disproportionately inflate ARMS, we also assessed the MAE, a measure that is more robust in the presence of outliers. The formulas used to calculate these outcomes can be found in the supplement. We evaluated the accuracy of the pulse oximeters according to the standards defined by the International Organisation for Standardisation in ISO 80601-2-61:2017, which supersedes the ISO 80601-2-61:2011 standard advised by the US FDA in its 510(k) Premarket Notification Submissions Guidance for pulse oximeters. This standard considers an ARMS of ≤3% in the SaO2 range of 70%–100% acceptable.6 7
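As an illustration of these outcome measures, the following Python sketch computes mean bias, ARMS and MAE from paired readings. It assumes the commonly used definitions (ARMS as the square root of the mean squared SpO2–SaO2 difference, MAE as the mean absolute difference); the exact formulas used in this study are those in the supplement and may differ in detail. The function name and the example values are hypothetical.

```python
import math


def accuracy_metrics(spo2, sao2, arms_limit=3.0):
    """Return mean bias, ARMS and MAE for paired SpO2/SaO2 readings (in %).

    Assumed definitions (not taken from the study supplement):
      bias_i = SpO2_i - SaO2_i
      ARMS   = sqrt(mean(bias_i ** 2))
      MAE    = mean(|bias_i|)
    """
    if len(spo2) != len(sao2) or not spo2:
        raise ValueError("spo2 and sao2 must be non-empty and of equal length")
    diffs = [s - a for s, a in zip(spo2, sao2)]
    n = len(diffs)
    arms = math.sqrt(sum(d * d for d in diffs) / n)
    return {
        "mean_bias": sum(diffs) / n,
        "arms": arms,
        "mae": sum(abs(d) for d in diffs) / n,
        # The <=3% criterion applies to readings with SaO2 between 70% and 100%.
        "meets_arms_limit": arms <= arms_limit,
    }


if __name__ == "__main__":
    # Made-up readings; the last pair is a deliberate outlier (bias of +10).
    print(accuracy_metrics([97, 94, 90, 99], [96, 95, 92, 89]))
```

In this made-up example the single outlier pushes ARMS above 5 while MAE stays at 3.5, which shows why MAE is reported alongside ARMS.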
Data collection and sample size
We collected data on SpO2, SaO2, heart rate, systolic and diastolic blood pressure, sex, age, skin type (Fitzpatrick classification scale) as assessed by the site investigator, vasopressor use (ie, noradrenaline dose (mg/h)), body temperature (°C) and hand temperature to touch as assessed by the site investigator. We followed the FDA advice for sample size determination, which states that at least 200 measurements per pulse oximeter should be obtained from at least 10 subjects, of whom at least two or 15% have a dark skin type (Fitzpatrick IV–VI).7
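The sample size criteria above amount to a simple check, sketched here in Python. The function name and inputs are hypothetical. The sketch also assumes that "at least two or 15%" means whichever number is larger; this is the stricter reading and is an assumption, not something stated in the text above.

```python
def meets_sample_size_guidance(n_measurements, fitzpatrick_types):
    """Check the sample size criteria for one pulse oximeter.

    n_measurements    -- number of paired SpO2/SaO2 measurements
    fitzpatrick_types -- one Fitzpatrick type (1-6) per enrolled subject
    """
    n_subjects = len(fitzpatrick_types)
    n_dark = sum(1 for t in fitzpatrick_types if t >= 4)  # Fitzpatrick IV-VI
    # Assumption: "two or 15%" is read as whichever is larger.
    # (15 * n + 99) // 100 is the integer ceiling of 15% of n.
    required_dark = max(2, (15 * n_subjects + 99) // 100)
    return n_measurements >= 200 and n_subjects >= 10 and n_dark >= required_dark


if __name__ == "__main__":
    # Made-up cohort: 10 subjects, two of whom have Fitzpatrick type V.
    print(meets_sample_size_guidance(200, [2, 2, 3, 1, 2, 3, 2, 2, 5, 5]))
```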
Statistical analysis
We assessed mean bias (SpO2–SaO2) and SD, and subsequently calculated the ARMS and MAE using the formulas described in the supplemental data. We created Bland-Altman plots for each pulse oximeter to graphically display its bias (SpO2–SaO2) against the reference standard (SaO2). We added a zero line and upper and lower limits of agreement (±1.96 SD). To visualise the accuracy standards required by the regulatory bodies, ±3% lines are also displayed in the figures. As multiple observations were performed per individual, we calculated the SD using the within-subject variance (σ²) and the between-subject variance (σμ²). Since ARMS is easily affected by outliers, we added MAE, which is more robust in the presence of outliers, as well as a sensitivity analysis that restricted the sample to measurements within 1.96 SD of each pulse oximeter’s mean and recalculated ARMS and MAE on this restricted sample. This sensitivity analysis is analogous to discarding extreme readings, as is done in routine clinical care when a reading does not match the patient’s apparent clinical state. We also assessed the diagnostic performance (sensitivity, specificity, predictive values and accuracy) of the pulse oximeters in detecting hypoxaemia, which we defined as SaO2 ≤90%. Finally, we evaluated factors associated with poor performance of each pulse oximeter. We used bias as a continuous measurement for performance, and used logistic regression models in which we included relevant patient and pulse oximeter characteristics. We used SPSS (IBM SPSS, V.26.0, IBM, Armonk, USA), R software (R V.3.6.1, The R Foundation for statistical computing) and MedCalc Statistical Software V.18.5 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2018) to conduct the analyses. We assessed statistical significance at the 0.05 level in all analyses.
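The sensitivity analysis and the diagnostic performance measures can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in the study (which was run in SPSS, R and MedCalc). The function names are hypothetical; the trimming step assumes that "within 1.96 SD of the mean" refers to the bias and uses a plain sample SD rather than the repeated-measures SD described above; and the diagnostic step assumes that a reading of SpO2 ≤90% counts as a positive test.

```python
import statistics

HYPOXAEMIA_CUTOFF = 90.0  # SaO2 <= 90% defines hypoxaemia


def trim_within_1_96_sd(spo2, sao2):
    """Keep the pairs whose bias lies within 1.96 SD of the mean bias.

    Assumption: a plain sample SD is used here for simplicity.
    """
    diffs = [s - a for s, a in zip(spo2, sao2)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return [(s, a) for s, a, d in zip(spo2, sao2, diffs) if abs(d - mean) <= 1.96 * sd]


def diagnostic_performance(spo2, sao2, cutoff=HYPOXAEMIA_CUTOFF):
    """Build a 2x2 table for detecting hypoxaemia and derive summary measures.

    Assumption: the same cutoff is applied to SpO2 (test) and SaO2 (reference).
    """
    tp = fp = fn = tn = 0
    for s, a in zip(spo2, sao2):
        test_pos, ref_pos = s <= cutoff, a <= cutoff
        if test_pos and ref_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(diagnostic_performance([88, 92, 95, 89, 97, 91], [89, 90, 96, 93, 98, 88]))
```

ARMS and MAE for the sensitivity analysis would then be recomputed on the pairs returned by `trim_within_1_96_sd`.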