1
Department of Radiology, University of Iowa Hospitals and Clinics, 200 Hawkins
Dr., Iowa City, IA 52240.
2
College of Medicine, University of Iowa Hospitals and Clinics, Iowa City, IA
52240.
3
Department of Orthopaedic Surgery, University of Iowa Hospitals and Clinics,
Iowa City, IA 52240.
4
Present address: Department of Radiology, St. Mary's Hospital, 707 S. Mills
St., Madison, WI 53715.
5
Present address: Radiology, P.C., 1221 Pleasant St., Ste. 150, Des Moines, IA
50309.
6
Present address: Associated Radiologists, 1125 E. Southern Ave., Ste. 300,
Mesa, AZ 85204.
Received May 30, 2000;
accepted after revision August 14, 2000.
Address correspondence to E. A. Brandser.
Abstract
MATERIALS AND METHODS. Radiographs of the left hand of 107 patients were interpreted by four radiologists on two separate occasions, once with and once without knowledge of the patient's chronologic age at time of interpretation. Twenty-five radiographs were randomly selected and reevaluated twice by each radiologist. Interobserver and intraobserver variability were calculated and compared for the two conditions. The distribution of studies with normal and abnormal findings was then compared across knowledge conditions for all observers and by individual observer, using two standard deviations above and below chronologic age as the range of "normal".
RESULTS. The interobserver reliability coefficient was 0.954 when chronologic age was known and 0.952 when it was not known. The intraobserver reliability coefficients ranged from 0.944 to 0.967 when chronologic age was known and from 0.938 to 0.980 when it was not known. Observers interpreted 58% (248/428) of the radiographs as having normal findings when chronologic age was known and 48% (205/428) when chronologic age was not known.
CONCLUSION. Knowing chronologic age before assessing bone age radiographs does not affect the reproducibility of assessment. However, observers are more likely to interpret the radiograph as showing normal findings when chronologic age is known than if the interpretation is performed with the observer unaware of chronologic age.
Introduction
Some researchers have criticized the Greulich and Pyle method as having high variability [5, 11, 12]. Little research has been done with the intent of minimizing inter- and intraobserver variability of this method or to address the lack of a standardized technique for radiograph review. Greulich and Pyle themselves did not indicate a specific method for assessing the standardized plates and leave the interpreter to develop his or her own style [1, 2, 13]. Some observers begin the assessment by comparing the patient's radiograph with the standard plate corresponding to the patient's chronologic age, whereas others match plates without knowledge of chronologic age. Although the method has not been formally studied, other researchers have expressed concern that the former method may introduce bias toward normal findings [5, 14, 15]. It has not been determined whether this methodologic alteration in radiograph review alters reproducibility or affects the probability of study findings being assessed as normal or abnormal.
Our goal was to address these inconsistencies by assessing the impact of one methodologic variation in the application of the Greulich and Pyle method, namely, knowledge of the patient's chronologic age when comparing the hand and wrist radiograph with the atlas. We examined the effect of this knowledge on interobserver (variation among different observers) and intraobserver (variation within individual observers) variability and on final interpretation.
Materials and Methods
Each radiograph and its report was copied, and all were separated into two groups according to sex. The radiographs in each group were then randomly ordered. Chronologic age in years, months, and days was calculated by subtracting the patient's birth date from the service date on the radiography report. If the number of days was greater than 15, the age was rounded up to the next month. The standard deviation from the Greulich and Pyle atlas [1] corresponding to each chronologic age was converted into years.
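The age calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' actual procedure: the function name `chronologic_age_months` and the convention of borrowing days from the month preceding the service date are assumptions. Only the rule of rounding up to the next month when more than 15 days remain comes from the text.

```python
import calendar
from datetime import date


def chronologic_age_months(birth: date, service: date) -> int:
    """Age in whole months at the service date, rounded up when the
    leftover days exceed 15 (hypothetical helper, see lead-in)."""
    if service < birth:
        raise ValueError("service date precedes birth date")

    # Whole calendar months between the two dates.
    months = (service.year - birth.year) * 12 + (service.month - birth.month)
    days = service.day - birth.day

    if days < 0:
        # Assumed convention: borrow the length of the month that
        # precedes the service month.
        months -= 1
        prev_month = service.month - 1 or 12
        prev_year = service.year if service.month > 1 else service.year - 1
        days += calendar.monthrange(prev_year, prev_month)[1]

    # Rule from the text: more than 15 leftover days rounds up a month.
    if days > 15:
        months += 1
    return months


def months_to_years(months: int) -> float:
    """Express an age in months as decimal years."""
    return months / 12
```

Expressing the result in decimal years puts chronologic age on the same scale as the atlas standard deviations, which the text says were converted into years.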
Interpretation of radiographs involved assigning a bone age to each radiograph on a worksheet that was provided. All 107 radiographs were interpreted in one session on two separate occasions by four radiologists, once with the patient's chronologic age known at the time of interpretation and once without this knowledge. The four radiologists received no detailed training for using the Greulich and Pyle method of assessment. Observers consisted of one fourth-year resident, two fellows, and one attending radiologist; observers therefore had varied experience using the Greulich and Pyle method. Patient-identifying information, except sex, was masked during interpretation. Observers worked independently. The radiograph order was randomized between interpretations, and all interpretations were separated by a 5-week period to minimize recall bias. Each interpretation session included cases in which chronologic age was known and cases in which chronologic age was unknown; for each patient, the order of known to unknown was randomized. To evaluate intraobserver reliability, 25 radiographs (13 males and 12 females) were randomly selected from the sample of 107 and were reevaluated by the radiologists twice, again with and without knowledge of chronologic age. Again, order of presentation and knowledge condition was randomized across the sample on each occasion. All observers used the first edition of the work of Greulich and Pyle [16]. Interpolation of bone ages between standard Greulich and Pyle atlas references was not allowed.
The bone age assigned during each interpretation (n = 428) was then evaluated as normal or abnormal. "Normal" bone age is defined as being within two standard deviations of chronologic age [1]. The distribution of normal or abnormal was calculated for the two knowledge conditions for the observers as a group and for each individual observer.
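The normal/abnormal rule above amounts to a simple threshold check, sketched here in Python. The function name and the inclusive boundary (exactly two standard deviations counts as normal) are assumptions. The standard deviation passed in would come from the atlas entry for the patient's chronologic age, and any value used in a call is purely illustrative.

```python
def classify_bone_age(bone_age_years: float,
                      chronologic_age_years: float,
                      sd_years: float) -> str:
    """Label an assigned bone age as 'normal' or 'abnormal'.

    'Normal' means within two standard deviations of chronologic age.
    sd_years is the atlas standard deviation for that age, in years
    (supplied by the caller; no atlas values are embedded here).
    """
    if sd_years <= 0:
        raise ValueError("standard deviation must be positive")
    difference = abs(bone_age_years - chronologic_age_years)
    # Assumption: the boundary itself (exactly 2 SD) counts as normal.
    return "normal" if difference <= 2 * sd_years else "abnormal"
```

Applying the same function to the bone ages assigned under each knowledge condition gives the per-condition distribution of normal and abnormal findings.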
We used statistical methods similar to those used in previous work done by an involved author [17]. Reliability was estimated using generalizability coefficients generated by the GENOVA software package (version 2.2; American College Testing Program, Iowa City, IA). The mean square estimates used in the reliability calculation were computed using the PROC GLM (general linear models) procedure in the SAS system (version 6.12 for Windows [Microsoft, Redmond, WA]; SAS Institute, Cary, NC). Inter- and intraobserver reliability coefficients were calculated to simulate a clinical situation in which one clinician evaluates a radiograph once and then assigns a bone age. Inter- and intraobserver generalizability coefficients were calculated separately for each knowledge condition. The distribution of cases in terms of normality was calculated for each knowledge condition. The Greulich and Pyle method uses reference images, which we termed "plates," and some of the reference images span more time than others. Therefore, we also quantified disagreement among and within individual observers in terms of the number of standardized plates and in terms of the bone age assigned.
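To illustrate how a reliability coefficient for a single observer reading once can be derived from mean squares, the sketch below uses the textbook one-facet crossed design (patients by observers) from generalizability theory. The exact design specified in GENOVA is not described in the text, so this design, the function name, and the choice of the relative coefficient are all assumptions.

```python
def single_rater_g_coefficient(scores):
    """Relative generalizability coefficient for one rater.

    scores: list of rows, one per patient, each holding one bone age
    per rater (fully crossed, no missing values).
    """
    n_p = len(scores)
    n_r = len(scores[0])
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete table with >= 2 patients and raters")

    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for the two-way layout without replication.
    ss_person = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_rater

    ms_person = ss_person / (n_p - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Variance components from expected mean squares (negatives set to 0).
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_resid = max(ms_resid, 0.0)

    denom = var_person + var_resid  # error term for a single rater
    if denom == 0:
        raise ValueError("no variance in scores")
    return var_person / denom
```

Dividing by the residual variance for a single rater, rather than by that variance averaged over all raters, corresponds to the clinical situation of one clinician evaluating a radiograph once.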
Results
Average bone age differences among and within individual observers are given in Table 2. The average difference in bone age assessments among observers when chronologic age was known was 0.69 ± 0.48 years (range, 0.00-1.95 years). When the age was not known, the mean was 0.62 ± 0.42 years (range, 0.00-1.83 years). The average interobserver difference in bone age assessments between the two knowledge conditions was 0.59 ± 0.66 years (range, 0.00-3.17 years). The average difference in assessments within individual observers with knowledge of chronologic age ranged from 0.29 to 0.49 years. When the age was not known, the average difference ranged from 0.18 to 0.85 years.
We then compared how much the interpretations differed in regard to the absolute number of standardized plates under each knowledge condition. The interobserver average difference when chronologic age was known was 0.68 ± 0.44 plates (range, 0.00-1.83 plates). The average difference when the age was not known was 0.63 ± 0.40 plates (range, 0.00-1.83 plates). The average interobserver difference in number of standard plates between the knowledge conditions was 0.59 ± 0.64 plates (range, 0-3 plates). The intraobserver differences showed that no observer varied by more than two plates. The average difference in standard plates within individual observers is given in Table 3. The average difference within individual observers with knowledge of chronologic age ranged from 0.28 to 0.50 plates. The average difference without knowledge of age ranged from 0.17 to 0.88 plates.
Using repeated-measures analysis of variance, we found no differences in bone age attributable to observer, knowledge condition, or interaction of these two variables (all p values > 0.05). Additionally, repeated-measures analysis of variance revealed no difference in bone age attributable to the first interpretation versus the second interpretation, the knowledge condition, or interaction between these two variables (all p values > 0.05).
Table 4 describes the consistency of clinical decision making in terms of interpreting a finding as normal or abnormal among and within individual observers. The distribution of normal and abnormal was calculated under both knowledge conditions using assessments from all 428 interpretations. When chronologic age was known, 58% of the assigned bone ages occurred in the normal age range; when the chronologic age was not known, 48% of the assignments occurred in the normal range. This difference was shown to be statistically significant using the McNemar test (p < 0.0001). Using radiographs that each observer had evaluated twice when chronologic age was known, we found the percentage of assigned bone ages evaluated as normal ranged from 59% to 65%, and when chronologic age was not known, from 44% to 51%. The percentage of cases each observer evaluated as normal differed from 11% to 20% between the two knowledge conditions.
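For readers unfamiliar with the McNemar test, a minimal sketch follows. The test uses only the discordant pairs: `b` counts interpretations that were normal with age known but abnormal without it, and `c` counts the reverse. The reported totals (248 versus 205 normal out of 428) imply b - c = 43, but the individual discordant counts are not reported, so the values 60 and 17 in the usage line are hypothetical. The uncorrected chi-square form is also an assumption, since the text does not say which variant was used.

```python
import math


def mcnemar_test(b: int, c: int):
    """Uncorrected McNemar chi-square test on discordant pair counts.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with b + c > 0")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts chosen only so that b - c = 43.
statistic, p_value = mcnemar_test(60, 17)
```

Because concordant pairs do not enter the statistic, the result depends entirely on how often the two knowledge conditions led to opposite conclusions for the same radiograph and observer.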
Discussion
Skeletal maturity can be assessed in a number of ways, but the two most common methods used in clinical practice are the atlas by Greulich and Pyle and the Tanner-Whitehouse II system [2, 3, 14, 18, 20, 21]. Although some researchers claim Tanner-Whitehouse II is more reproducible, it is also known to be more time-consuming and difficult to apply [2-6, 15, 21, 22]. Simplicity, convenience, and speed have made the Greulich and Pyle method the most commonly used standard of reference for assessing skeletal maturation [2-10].
The Greulich and Pyle atlas is based on T. Wingate Todd's investigation of left hand and wrist radiographs [1]. The method involves directly comparing the radiograph to be assessed with a series of standard plates of the same sex by analyzing characteristics such as the appearance of ossification centers, contours of bones, and thinning of growth plates. The standards are stratified by sex and represent the median skeletal maturity for the chronologic age. The bone-specific approach (Tanner-Whitehouse II) assigns a separate rating for each bone of the hand and wrist, with the mean or median rating used as the skeletal age. This approach is more accurate, but rarely done [19]. More commonly, the bone age is determined by the closest overall match using a generalized approach and is considered normal if the bone age is within two standard deviations (as provided by the Greulich and Pyle atlas) of chronologic age. Because skeletal development provides the only means of assessing rates of maturational change throughout the growth period [9], it is imperative to determine the degree of skeletal maturity as accurately as possible.
Some researchers criticize the Greulich and Pyle method as having high variability [5, 11, 12], yet examination of actual studies of reproducibility [2, 3, 8, 13, 14, 20, 23-29] shows considerable controversy, as shown in Table 5. To our knowledge, only five studies [14, 19, 22, 26, 29] specified that chronologic age was not known when skeletal age was assessed. Our results show that knowing or not knowing the chronologic age before assessing bone age radiographs does not differentially affect inter- and intraobserver reliability. Observers are extremely reliable under both knowledge conditions, and no difference was seen in the assessments between the two conditions within individual observers. This conclusion also holds when evaluating the absolute difference in the number of standard plates among and within individual observers' assessments. However, clinical judgments of normality do differ depending on whether one knows the chronologic age before assessment. An observer is more likely to interpret a radiograph as normal if the chronologic age is known than when it is not known. In fact, all four observers assessed more patients as normal when chronologic age was known. The pattern was consistent both among and within individual observers. Therefore, the difference was not attributable to the observer or to the specific 25 radiographs reevaluated. This finding confirms the concern of other researchers that knowledge of chronologic age before bone age assessment introduces bias toward normal [5, 14, 16]. These findings stress the importance of radiologists being consistent when assessing bone age. If one wants to increase sensitivity, then observers should not know chronologic age when evaluating bone age. However, if one wants to maximize specificity, knowledge of chronologic age is recommended. Ultimately, the decision of whether to access chronologic age before assessment should depend on the consequences of the diagnosis of normal or abnormal.
The internal validity of these conclusions may have been affected by the following factors. This was a blinded study, with the order of radiograph presentation and knowledge conditions randomized. Interpretation sessions were separated by a long enough interval (5 weeks) to rule out recall bias as an explanation for the high level of reliability. Also, the observers knew that their assessments would be evaluated and compared with those of their colleagues, which could have produced a Hawthorne effect [30] and increased observers' motivation and commitment. These conditions served to increase internal validity, but perhaps at the expense of external validity.
On the other hand, certain features of this study do improve the extent to which these conclusions are generalizable to typical clinical practice. We chose not to give a formal training session on the Greulich and Pyle method in order to more closely match the diversified approach used in clinical practice. Likewise, because most assessments of bone ages are not performed by physicians specializing in bone age assessment [6], we chose observers of varying experience in general radiology. We used a randomly selected set of cases regardless of pathology to mimic a typical practice. Interpolating the assessed bone age between standard plates may have improved accuracy [31], but we felt justified using only standard plates because this is the usual clinical practice [22].
In summary, knowledge of chronologic age does not affect the reliability of bone age assessments. However, observers are more likely to interpret the radiograph as normal when chronologic age is known than when it is not known. Therefore, it is important that each radiologist, group, or institution adopt a policy indicating whether each will consistently interpret bone age studies with or without knowledge of the patient's chronologic age.