CUSUM and monitoring
cause of the poor performance. The CUSUM mon-
itoring scheme is then restarted. Restart should theor-
etically be at 0, but one often restarts at h as the new
X-axis, so that a rising CUSUM graph can be obtained
to represent the learning curve that is typically seen for
a trainee.
in this study could understand what a beta of 0.2
(power=0.8) means in relation to monitoring and how
changes to beta could affect the scheme; while it was
easy to explain to them that an OC-ARL of 12 means
that, for an operator performing at an unacceptable
level, the chart would on average take 12 consecutive
procedures before it signals. If they find 12 unacceptable,
they could suggest a higher or lower number that they
may be more comfortable with before being subjected
to monitoring. Specification in terms of ARL also makes
explicit the trade-offs between sensitivity and false
alarm, and forces participants to be aware of the trade-
offs they are making when their inputs are sought at
the design stage of the CUSUM scheme. Otherwise, as
a result of lack of understanding, there is a tendency
to resort to conventional specifications such as
power = 0.8 and alpha = 0.05, as in hypothesis testing in the
context of clinical trials. It is obviously undesirable to
have one set of specifications for all procedures being
monitored. For the purpose of monitoring, the trade-off
between alpha and beta errors should be allowed to
vary depending on the nature of the procedure being
monitored.
h is determined by specifying the in-control (IC) and
out-of-control (OC) average run length (ARL) of a
CUSUM chart. The IC-ARL is the average number of
consecutive procedures required for a CUSUM chart
to cross a decision interval despite an individual per-
forming at an acceptable level. This is analogous to a
Type I (alpha) or false positive error in hypothesis
testing. A design with a short IC-ARL (large Type
I error) is prone to false alarms. The OC-ARL is the
average number of procedures performed before the
CUSUM chart signals during a period when an
individual is performing at an unacceptable level. The
OC-ARL is a measure of sensitivity and is analogous
to power [1 - Type II (beta) or 1 - false negative error] in
hypothesis testing. A design with a short OC-ARL
(high power) will quickly detect poor performance. In
general, we want a CUSUM monitoring scheme to have
long runs before a false alarm (long IC-ARL or small
Type I error) and short runs before the chart signals
actual deterioration in performance (short OC-ARL or
high power). Unfortunately these objectives conflict, so
we have to trade one off against the other. This is also
analogous to the trade-off between Type I and Type II
errors in hypothesis testing. Thus, a desirably long IC-
ARL (small Type I error) will lead to an unacceptably
long OC-ARL (low power). On the other hand, the
desired short OC-ARL (high power) will lead to more
frequent false alarms (large Type I error). The amount
of trade-off between IC- and OC-ARL that is acceptable
to the doctor clearly depends on the nature of what is
being monitored. For example, a monitoring scheme
for cardiothoracic surgery that entails life-threatening
complications would require a highly sensitive chart to
detect poor performance but at the expense of more
frequent false alarms. On the other hand, for a procedure
like renal biopsy, we would be prepared to tolerate a
less sensitive scheme so as not to be frequently distracted
by false alarms.
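
The trade-off described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the design method used in this study (h was calculated as in [9]): outcomes are assumed to be binary (1 = failure, 0 = success), and the failure rates p0 and p1, the reference value k and the two candidate values of h are hypothetical values chosen purely for illustration. The function names run_length and estimate_arl are likewise our own.

```python
import random


def run_length(p, k, h, rng, max_n=100_000):
    """Number of procedures until an upward CUSUM first exceeds h.

    p is the assumed failure probability per procedure, k the reference
    value and h the decision interval. The run is capped at max_n.
    """
    c = 0.0
    for n in range(1, max_n + 1):
        x = 1 if rng.random() < p else 0   # 1 = failure, 0 = success
        c = max(0.0, c + x - k)            # upward CUSUM recursion
        if c > h:
            return n
    return max_n


def estimate_arl(p, k, h, runs=500, seed=0):
    """Monte Carlo estimate of the average run length (ARL)."""
    rng = random.Random(seed)
    return sum(run_length(p, k, h, rng) for _ in range(runs)) / runs


# Hypothetical illustration values, not taken from the study.
p0, p1, k = 0.05, 0.20, 0.1   # acceptable rate, unacceptable rate, reference value

# h values are placed off the 0.1 lattice to avoid floating-point ties.
for h in (1.05, 2.05):
    ic_arl = estimate_arl(p0, k, h)   # run length while performance is acceptable
    oc_arl = estimate_arl(p1, k, h)   # run length while performance is unacceptable
    print(f"h={h}: IC-ARL ~ {ic_arl:.0f}, OC-ARL ~ {oc_arl:.0f}")
```

Under these assumptions, raising h lengthens both estimates: false alarms become rarer, but genuine deterioration also takes longer to detect.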
4. $\max(0, C_{n-1} + X_n - k)$ is the maximum function that
returns the larger of the two arguments, 0 and
$C_{n-1} + X_n - k$. This function applies only to monitoring for
an upward shift in mean (upward CUSUM). That is,
monitoring to detect deviation from an acceptable to
an unacceptable level of performance. This was the
purpose of this study. For a scheme designed to detect
‘better’ than acceptable performance, the function is
$\min(0, C_{n-1} + X_n - k)$ with a signal if $C_n < -h$. Such
a scheme (downward CUSUM) is not defined for this
study for several reasons:
Some acceptable standards are so good [for example the
2% failure rate for breast biopsy (see below)] that designing
to detect better performance at say 1% is difficult.
There was genuinely no interest at all in detecting ‘better’
than acceptable performance. Acceptable performance ought
to reflect the performance of trained and experienced op-
erators. Admittedly, a few exceptional individuals may perform
better than their peers. It is, however, undesirable to base a
monitoring scheme on results of ‘star’ performers. On the
other hand, if most experienced operators performed at the
‘better’ level, it is only logical to define that level as the
acceptable level.
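
The upward recursion in item 4 can be sketched in a few lines of code. This is a minimal illustration rather than the implementation used in this study. It assumes that the chart signals when the CUSUM exceeds h (by symmetry with the downward rule stated above) and that monitoring restarts at 0 after a signal. The function name upward_cusum and the example values of k and h are hypothetical.

```python
def upward_cusum(outcomes, k, h):
    """Return the CUSUM path and the 1-based indices at which it signals.

    outcomes: sequence of X_n values (here 1 = failure, 0 = success).
    k: reference value subtracted at each step.
    h: decision interval; a signal is assumed when C_n > h.
    """
    c = 0.0
    path, signals = [], []
    for n, x in enumerate(outcomes, start=1):
        c = max(0.0, c + x - k)   # C_n = max(0, C_{n-1} + X_n - k)
        path.append(c)
        if c > h:
            signals.append(n)
            c = 0.0               # assumed restart at 0 after a signal
    return path, signals


# Hypothetical outcome sequence and design values, for illustration only.
path, signals = upward_cusum([0, 1, 1, 0, 1, 1, 1], k=0.5, h=1.5)
print(path)     # [0.0, 0.5, 1.0, 0.5, 1.0, 1.5, 2.0]
print(signals)  # [7]
```

Successes pull the CUSUM back towards 0 but never below it, so a long run of good results cannot build up credit that would mask later deterioration.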
The participating doctors in this study specified the
acceptable IC- and OC-ARL for monitoring their
performance. Once these are specified, the decision interval
h can be calculated [9]. The larger the specified IC-ARL
(the OC-ARL will be correspondingly large), the larger
is h. We could have specified the CUSUM design in
terms of Type I and Type II error rates since they are
analogous to IC- and OC-ARL, respectively. However,
in our experience in designing the various CUSUM
monitoring schemes in this study, it turned out that
specification in terms of ARL was more intuitive and
easier to explain to doctors. This is important because
their inputs are required when designing a CUSUM
scheme. For example, not a single doctor participating
Upward CUSUM chart design for the procedures
studied
In designing the CUSUM monitoring schemes used in this
study, we have to explicitly specify the following for each
procedure being monitored:
1. Acceptable and unacceptable levels of performance for
the chosen outcome measure. Ideally these should be
based on universally accepted standards published by
authoritative medical professional bodies. Unfortun-
ately, to our knowledge, such performance standards