The Apgar score is used to describe the clinical condition of newborns. However, clinicians show low reliability when assigning Apgar scores to video recordings of actual neonatal resuscitations. Simulators provide a controlled environment for recreating and recording resuscitations. In this study, clinicians assigned Apgar scores to such recordings to test the representativeness of the simulator and the recordings. Study design was guided by Brunswik's probabilistic functionalism.
Judgment analysis methods were used to design 51 recordings of neonatal resuscitation scenarios, simulated with SimNewB (Laerdal, Stavanger, Norway). A step-by-step explanation of the design, preparation, and testing of the recordings is provided.
Recorded Apgar scores, calculated from the clinical signs presented, were compared with the designed scores. Working independently and without feedback, three experts assigned Apgar scores to confirm that the recordings could be interpreted as intended. In a separate experiment, seventeen neonatal resuscitation clinicians scored the recordings.
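The calculation of an Apgar score from presented clinical signs follows the conventional scheme: five signs (appearance, pulse, grimace, activity, respiration), each rated 0, 1, or 2, summed to a total of 0 to 10. A minimal sketch, with illustrative sign names and example values not taken from the study:

```python
# Conventional Apgar scoring: five clinical signs, each rated 0-2,
# summed to a total score of 0-10. Sign labels are illustrative.
APGAR_SIGNS = ("appearance", "pulse", "grimace", "activity", "respiration")

def apgar_total(ratings: dict) -> int:
    """Validate that each of the five ratings is 0, 1, or 2, then sum them."""
    for sign in APGAR_SIGNS:
        value = ratings[sign]
        if value not in (0, 1, 2):
            raise ValueError(f"{sign} must be 0, 1, or 2, got {value!r}")
    return sum(ratings[sign] for sign in APGAR_SIGNS)

# Example: a vigorous newborn with slightly blue extremities.
example = {"appearance": 1, "pulse": 2, "grimace": 2,
           "activity": 2, "respiration": 2}
print(apgar_total(example))  # -> 9
```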
Correlations between Apgar scores assigned by the 20 viewers (experts plus clinicians) and recorded Apgar scores were high (0.78–0.91) and significant (P < 0.01). Fourteen of the 20 viewers scored the recordings without significant bias. Correlations between viewers’ scores and scores of individualized linear models calculated for each viewer were high (0.79–0.97) and significant (P < 0.01), indicating systematic judgments.
SimNewB provided a realistic presentation of clinical conditions that was preserved in the recordings. Clinicians could interpret clinical conditions systematically and accurately without feedback or detailed instructions. These methods are applicable to future research about accuracy of clinical assessments in actual and simulated environments.