impact that GER has on patients’ symptoms given a set of
physical findings.
TABLE V. Mean Severity Scores for GER Findings According to Intent to Treat.
We were somewhat surprised that the otolaryngolo-
gists were unable to demonstrate good intrarater reliabil-
ity (Table IV). It is apparent from the statistical analysis
that individual otolaryngologists may have difficulty in
consistently identifying and rating physical findings in
their endoscopic assessments of the larynx. However,
post hoc analysis of intrarater consistency found that the
otolaryngologist-raters were able to separate patients into
what they thought were treatment-appropriate and
treatment-inappropriate groups (Table V), which were
consistent with their mean severity scores for LPRD find-
ings. Furthermore, Table V does provide affirmative data
to suggest that there can be consistency in the evaluation
of LPRD. The intent-to-treat data indicate that, overall,
the otolaryngologists each individually used a consistent
method or methods to determine whether or not to recom-
mend treatment for presumed LPRD. Therefore, it is pos-
sible that despite inter-rater and intrarater variability for
the individual laryngeal findings associated with LPRD,
some overall consistency may be expected in the diagnosis
based on physical findings. This observation implies
that other factors in the examination may be important in
suggesting LPRD. These factors may include subcordal
edema, hypervascularity, or other features not yet
determined.
The average-measure intraclass correlation coeffi-
cients were reasonably high. This indicates that when
multiple raters are used to evaluate these physical exam-
ination variables, reasonable reliability can be expected
from the average of their ratings. Consequently, future
studies attempting to correlate physical findings with pH
monitoring results or other tests for LPRD should use
multiple raters for each patient examination, because the
reliability of the average measure among those raters is
acceptable. However, the lowest values for
the average-measure intraclass correlation coefficients
were noted for both edema and erythema of the arytenoids.
Thus the posterior larynx, which has traditionally been
thought to be the primary site affected by LPRD, is the
most difficult area to rate and assess consistently. The
likely explanation is anatomic. The musculomembranous
folds are sharply demarcated anatomic entities and are
generally white and lustrous, so edema and erythema, when
present, are relatively easily discerned there.
The same does not hold true for the arytenoid
region, which has an exceedingly variable natural archi-
tecture from patient to patient.
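The relationship between single-rater and average-measure reliability can be illustrated with a short sketch. The Python code below assumes a one-way random-effects intraclass correlation model; the model actually used is not stated in this section, so that choice is an assumption. The function name and the ratings in the usage note are hypothetical and are not study data.

```python
from statistics import mean


def icc_oneway(ratings):
    """Single- and average-measure ICC under a one-way random-effects model.

    `ratings` is a list of rows, one row per examination, each row holding
    the scores given by k raters. This is an illustrative helper, not the
    analysis used in the study.
    """
    n = len(ratings)        # number of examinations rated
    k = len(ratings[0])     # number of raters per examination

    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Mean square between examinations
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)

    # Mean square within examinations (disagreement among raters)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    # Reliability of one rater's score
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    # Reliability of the mean of all k raters' scores
    average = (ms_between - ms_within) / ms_between
    return single, average
```

For example, calling `icc_oneway([[1, 2, 1], [3, 3, 2], [0, 1, 0], [2, 2, 3]])` on these hypothetical severity scores returns a single-measure value and a higher average-measure value. Under this model the two are linked by the Spearman-Brown relation, average = k × single / (1 + (k − 1) × single), which is why averaging across several raters can yield acceptable reliability even when any one rater is less dependable.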
           Recommend Treatment Group   No GER Treatment Group   P value
Rater 1    1.83                        0.40                     <.001
Rater 2    1.71                        0.16                     <.001
Rater 3    2.19                        1.12                     <.001
Rater 4    2.07                        0.94                     <.001
Rater 5    2.00                        0.22                     <.001
GER = gastroesophageal reflux.
laryngitis, reflux laryngitis, and LPRD.2 Various laryn-
geal findings have been associated with the diagnosis of
LPRD. These include erythema or edema of the posterior
one-third of the glottis, hyperemia of the posterior larynx,
cobblestoning, and “heaping up” or thickening of the in-
terarytenoid mucosa (pachydermia laryngis).12 The wide
array of nonspecific laryngeal findings and vague diagnos-
tic terminology all suggest that the true pathophysiology
of LPRD is poorly understood. If otolaryngologists are to
be successful in making a diagnosis of LPRD based on
clinical assessment without pH probe data, two criteria
must be satisfied. First, sensitivities and specificities
should be determined for various laryngoscopic findings in
the diagnosis of LPRD. Second, otolaryngologists must be
able to demonstrate reliability in identifying these physi-
cal findings to make accurate diagnoses. If the laryngeal
findings cannot be reliably determined among different
otolaryngologists, even with knowledge of the sensitivities
and specificities for the various physical findings, accurate
clinical diagnosis of LPRD will be difficult. Therefore, we
sought to determine the reliability among otolaryngolo-
gists for the identification and quantification of various
laryngeal physical findings potentially associated with
LPRD.
Our data indicate that otolaryngologists vary signif-
icantly in their ratings of the various laryngoscopic phys-
ical findings that could be associated with LPRD. We
found relatively poor inter-rater reliability for all of the
visually assessed variables. This indicates that, even if the
most sensitive or specific physical examination findings
among these variables were known for the diagnosis of
LPRD, different otolaryngologists might be unable to ac-
curately diagnose LPRD based solely on such findings. We
were not entirely surprised that such variability among
the otolaryngologist-raters was encountered because all of
these clinical variables are subjective in interpretation.
Otolaryngologists are not alone in their difficulties with
rating mucosal disease potentially attributable to GER.
Studies in the gastroenterology literature have docu-
mented poor correlation coefficients ranging from 0.15 to
0.40 for the assessment and grading of reflux esophagitis
among different endoscopists.13 Furthermore, the poor re-
liability of the ratings for the “severity of GER” and “like-
lihood GER component” variables suggests additional in-
consistency. Because these correlation coefficients were
lower than those of the physical finding variables, otolar-
yngologists are also likely to disagree on the degree of
Notably, this study used an idealized form of laryn-
geal assessment for comparisons. In clinical practice, oto-
laryngologists may use indirect laryngoscopy or flexible
fiberoptic laryngoscopy to visualize the larynx. Also, pa-
tients may be examined at different times of the day or
after different dietary challenges the night before their
evaluation. Thus, additional variability may be introduced
into the examination process, which could lead to further
unreliability in assessment of the laryngeal findings in
LPRD. It is possible that otolaryngologists who “subspe-
cialize” in the management of voice disorders would ex-
hibit better inter-rater and intrarater reliabilities for
Laryngoscope 112: June 2002
Branski et al.: Laryngopharyngeal Reflux Disease
1023