Fundamentals of Clinical Research
1
American College of Radiology, 1891 Preston White Dr., Reston, VA 20191.
2
Department of Radiology, Rainbow Babies & Children's Hospital, 11100
Euclid Ave., Cleveland, OH 44106-5056.
3
Present address: Department of Radiology, Riley Hospital for Children, 702
Barnhill Dr., Indianapolis, IN 46202-5200.
Received April 3, 2001;
accepted after revision April 24, 2001.
Series editors: Craig A. Beam, C. Craig Blackmore, Stephen J. Karlik, and
Caroline Reinhold.
Introduction
"On being asked to talk on the principles of research, my first thought was to arise after the chairman's introduction, to say, `Be careful', and to sit down..." by J. Cornfield [1].
Experienced clinical researchers universally lament that good study design and data collection, although critical to the success of any clinical study, are often ignored [2, 3]. Most researchers are, by their very nature, excited by experimentation and analysis, but few find enjoyment in the design and implementation of data collection. Too often, researchers pay little attention to how data will be collected, whether the data are available or can be measured, or how much data will be incorrect or missing. Even fewer researchers carefully train the data collectors and periodically check their work.
This paper outlines seven basic elements of data collection. We discuss defining the research question, deciding on what data to collect, obtaining institutional review board (IRB) approval, planning statistical analyses, designing the data collection system, establishing quality control, and organizing data entry. This article is by no means comprehensive but provides guidelines that we believe will improve clinical research in radiology.
Three general rules of data collection underlie this discussion. First, researchers should assume they will underestimate the amount of time and effort involved in data collection. Second, the more complex the data collection process is, the longer it will take to acquire and enter the data. Finally, systematic and individual data collection errors must be addressed early in the process, because it is unwise to trust human memory or a statistician's creativity to resolve errors in the data.
Define the Primary Research Question
The end point is the dependent variable, the variable you wish to better understand. Identifying other variables becomes an exercise in determining what factors might explain variations in the study's end point [4, 5, 6]. These factors, known as independent variables, usually include basic demographics such as age, sex, and race. Other independent variables could include comorbidity, stage of disease, signs or symptoms, laboratory test results, imaging test results, clinician experience and training, type of imaging equipment, and patient movement, to name only a few. To be worthy of inclusion in the study, independent variables should either relate directly to the research question or provide useful controls for defining the study population and sample.
Three additional elements must also be considered when determining what data will be collected. Essential for designing data collection forms and creating data files, these elements are the unit of analysis, data precision, and the collection sequence. The involvement of a statistician at this stage of data collection design cannot be overemphasized. A statistician can provide guidance on determining the unit of analysis, data precision, and many other research design issues essential to a defensible statistical analysis.
Unit of Analysis
Determining the unit of analysis is a basic task in designing a study, not
only for methodologic reasons, but also because it affects the design of data
collection forms, the storage and linking of documentation, and the design of
electronic data files. The most common unit of analysis is the individual
patient, but there are many other possibilities, such as the institution, the
type of procedure, the images, or in the case of reader studies, even
individual radiologists.
Data Precision
The degree of accuracy needed in the collected data also deserves early
attention [2, 4]. There are likely to be several different ways to measure
the data you collect. For example, when recording carotid stenosis, is it
sufficient to record stenosis to one decimal place (0.4), two decimal places
(0.44), or three (0.435)? Obviously, the more precise the measure the better,
but the goal of precision may need to be tempered by consideration of the
cost in both time and money and the substantive importance of the measure.
Whenever possible, use well-established measurements and common terminology to reduce design time and improve comparability with other studies [6]. In addition, good research design must address reliability (consistency and reproducibility, such as the extent to which a measure obtains similar results on identical patients) and validity (how often the positive test result is correct) [2, 4, 5, 6]. Both are important for establishing the accuracy of the study outcome.
Collection Sequence
Finally, study collaborators should consider the sequence of data
collection early in the study design. This will allow for thoughtful
preparation of data forms, the design of an adequate data file format, and
development of a suitable analysis plan. Many studies incorporate patient
follow-up, often at multiple intervals. Follow-up measurements should be
recorded on well-focused forms that are coded with a common linking identifier
(generally the case identification number) to ensure they can be aggregated
with previously collected data.
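The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`case_id`, `visit_month`, `stenosis`) and the function `link_followups` are hypothetical and do not come from any particular study database.

```python
def link_followups(baseline, followups):
    """Attach follow-up forms to baseline cases through the shared case ID.

    Returns (linked, orphans). `linked` maps case_id to the baseline record
    plus a 'followups' list sorted by visit. `orphans` holds follow-up forms
    whose case_id matches no baseline record and therefore need review.
    """
    linked = {row["case_id"]: {**row, "followups": []} for row in baseline}
    orphans = []
    for form in followups:
        case = linked.get(form["case_id"])
        if case is None:
            orphans.append(form)  # cannot be aggregated without a valid ID
        else:
            case["followups"].append(form)
    for case in linked.values():
        case["followups"].sort(key=lambda f: f["visit_month"])
    return linked, orphans


# Hypothetical example data (not from the article)
baseline = [{"case_id": 101, "age": 63}, {"case_id": 102, "age": 47}]
followups = [
    {"case_id": 101, "visit_month": 12, "stenosis": 0.44},
    {"case_id": 101, "visit_month": 6, "stenosis": 0.41},
    {"case_id": 999, "visit_month": 6, "stenosis": 0.30},  # mistyped ID
]
linked, orphans = link_followups(baseline, followups)
```

A follow-up form with a mistyped identifier ends up in `orphans` rather than being silently dropped, which is one reason to code every form with the same linking identifier.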
56) converts the
scale of data from interval (computing mean age in years) to ordinal (not
useful for computing a mean). As a result, developing a clear statistical
analysis plan at this early stage can be very useful
[10] not only in providing
focus for the data collection effort (such as specifying sample size
estimation) but also pointing out weaknesses in the scale and precision of the
data before data collection begins. The statistical analysis plan is a detailed outline of what data will be analyzed and how. This plan should include clear definitions of variables and statistical end points (descriptive or inferential), a description of the required subgroup analyses, and identification of the most appropriate statistical techniques and their relationship to the research hypotheses. Although it is poor research methodology, data are frequently collected without a clear understanding of how they will be analyzed or the scale of data necessary for a particular statistical technique. A useful but time-consuming tool in designing a statistical plan is to draft the tables you will use to present the results of your analysis [5]. This approach is helpful in identifying important comparisons while clarifying statistical method requirements and data needs.
Designing the Data Collection System
Regardless of the complexity, factors to consider in designing the data collection system include creating data forms, avoiding systematic bias, and preparing a plan for data administration.
Data Collection Forms
The case report form is a common tool used to collect multiple sources of
data (patients, physicians, records) into one document. In developing and
designing this type of data form, it is wise to allow for detailed notes,
regardless of the number of investigators involved in the study
[12]. These notes may or may
not be entered into an electronic data file, but they can become invaluable in
explaining otherwise unexplainable variations in the data later in the study.
Examples could be patient movement or other uncooperative activity, equipment
malfunctions, previously undisclosed comorbidities, or exceptions to protocol
guidelines that can occur for many reasons including human error and clinical
necessity.
Form development is both an art and a science, but there are a few basic rules to follow. First, forms should be self-explanatory to the person entering the data. Second, data should not require extensive interpretation before recording. Third, the unit of measurement should be defined. Using time as an example, specify which unit of measurement is required (hours, days, weeks, months, or years). Level of precision should be evident (fractions of hours, round to nearest full day).
Also, consistent and complete responses should be required for each section of the form. Never leave a section blank. Leaving a section blank may mean the issues are not applicable (which is important to code) or the originator of the data forgot to respond. In the case of missing data, an assumption of irrelevance may be entirely wrong.
Finally, the form should be visually appealing, easy to navigate, and conducive to data entry [2, 4]. It is often helpful to have the coding conventions for data entry included directly on the form (female [0], male [1]).
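As a rough sketch of how these form rules might carry over to data entry, the Python below maps a written response to a numeric code and never leaves a field blank. The sex codes follow the convention quoted above (female 0, male 1). The codes 8 for "not applicable" and 9 for "missing", and the function name `encode_response`, are assumptions made for illustration only.

```python
SEX_CODES = {"female": 0, "male": 1}  # convention printed on the form
NOT_APPLICABLE = 8  # hypothetical code: item does not apply to this case
MISSING = 9         # hypothetical code: item applies but was left unanswered


def encode_response(raw, codes, applicable=True):
    """Return the numeric code for a form response.

    A blank is never stored: it becomes NOT_APPLICABLE or MISSING, so the
    analyst can tell the two situations apart later.
    """
    if not applicable:
        return NOT_APPLICABLE
    key = (raw or "").strip().lower()
    if key == "":
        return MISSING
    if key not in codes:
        # Do not guess; send the form back to its originator instead.
        raise ValueError(f"unrecognized response: {raw!r}")
    return codes[key]


coded = [encode_response(r, SEX_CODES) for r in ["Female", "male", ""]]
```

Raising an error on an unrecognized response, rather than interpreting it, reflects the rule that data should not require extensive interpretation before recording.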
Pilot Testing
Pretest the forms on individuals who are characteristically the same as
those who will fill out the forms in the study; have physicians fill out
physician forms, technologists fill out technologist forms, and someone who is
not a physician or healthcare worker fill out the patient questionnaires.
Involve everyone who will be handling the data collection, data entry, or
analysis in the form design and testing process. The initial data collection
form may be piloted on a small sample of potential patients to determine
whether the desired data are available and whether the data form is easy to
complete and enter into the database. The data form can then be revised before
the full study has begun. Finally, the principal investigator should not rely
on memory to recall study design issues such as units of analysis, data
measurement techniques, definitions of each measurement and variable, and time
sequence. All members of the research team need a copy of the methodology and
a "code book" to serve as a reminder of the methods established
for data collection.
Avoid Systematic Bias
Although some biases can be corrected in the analysis, some are fatal and
may render the study invalid. Therefore, it is always best to design the study
to avoid or minimize these biases. There are many sources of bias to consider;
however, some are more closely related to the data collection effort than
others. In particular, steps should be taken to maintain objectivity in the
data collection system while avoiding bias in patient recruitment and
minimizing the effects of "interpretation bias" and
"response bias."
Objective measurement of data reduces the likelihood of collection bias, but the degree of objectivity can vary, depending on the means of measurement. As an example, measures of body weight on a calibrated digital scale are unlikely to vary depending on who weighs the patient. In contrast, if surveyors are asked to indicate whether a patient is underweight, normal in weight, or overweight, much will depend on individual perceptions. It is a subjective measure.
Subjective measures are particularly susceptible to prior knowledge of the treatment arm of a clinical trial [10]. In blinded studies neither the patient nor the data collector knows who is in the control group or in the treatment group, thereby minimizing bias. In open studies both the patient and the data collector know who is given treatment. In the event complete blinding is not possible, a blinded clinician could be used to review the data from both groups for consistency. Although a complete discussion is beyond the scope of this paper, be aware that a clinical study may require procedures that fail to completely mirror clinical practice, such as having all available patient information before making an assessment.
If your study calls for patient randomization into multiple study arms, it is essential that the randomization be done, if at all possible, either by a third party or by automation. Before obtaining patient consent, the study monitor should have no knowledge of that patient's study arm placement.
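One way to automate the assignment is a permuted-block schedule, sketched below in Python. Permuted blocks are a common technique chosen here for illustration; the article does not prescribe a particular method. The arm names, the block size, and the function name `permuted_block_schedule` are assumptions. In practice the schedule would be generated and held by a third party, and an arm revealed only after consent is obtained.

```python
import random


def permuted_block_schedule(n_patients, arms=("control", "treatment"),
                            block_size=4, seed=None):
    """Return a list of arm assignments, balanced within each block.

    Each block holds every arm equally often, in shuffled order, so arm
    sizes never drift far apart during enrollment.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a seed makes the schedule reproducible for audit
    schedule = []
    while len(schedule) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_patients]


schedule = permuted_block_schedule(10, seed=2001)
```

Because the people enrolling patients never see the schedule, they cannot anticipate the next assignment, which is the point of delegating randomization.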
In diagnostic imaging, comparisons often involve the same patient receiving two diagnostic tests [13]. Ensuring that the technologists and radiologists are unaware of the competing test results is essential to prevent interpretation bias. This sort of blinding will require two separate, and probably different, data report forms.
Response bias can occur during the follow-up stage of a study because of incomplete responses or patients lost to follow-up [14]. Ill patients may be more likely to complete a follow-up quality of life survey than patients who are not ill. As a result, aggregate quality of life estimates may be lower than what they would have been if all participants responded.
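A small worked calculation shows the size such a distortion can reach. All numbers below are invented for illustration: 100 ill patients with a mean quality of life score of 50 and a 90% response rate, and 100 patients who are not ill with a mean score of 80 and a 50% response rate.

```python
def true_mean(groups):
    """Mean score if every participant responded.

    groups: list of (n_patients, mean_score, response_rate) tuples.
    """
    total_n = sum(n for n, _, _ in groups)
    return sum(n * mean for n, mean, _ in groups) / total_n


def responder_mean(groups):
    """Mean score among responders only, which is what the survey observes."""
    responders = sum(n * rate for n, _, rate in groups)
    return sum(n * rate * mean for n, mean, rate in groups) / responders


# Hypothetical cohort: (n, mean quality of life score, response rate)
groups = [(100, 50.0, 0.9),   # ill patients
          (100, 80.0, 0.5)]   # patients who are not ill
full = true_mean(groups)           # (5000 + 8000) / 200
observed = responder_mean(groups)  # (4500 + 4000) / 140
```

Under these assumed figures the full-cohort mean is 65, whereas the responders' mean is about 60.7, so the aggregate estimate is pulled toward the group that answers more often.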
In general, all personally identifiable information should be collected and stored separately from the case report form. Even though patient identifying information should be separate from the case report form, each set of data should be stored in a secure location with limited access. Assign responsibilities for data storage and maintenance of the master list that contains both patient information (names and addresses) and assigned case IDs. Similarly, a computer specialist must secure the confidentiality of the electronic files.
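The separation of the master list from the case report data might look like the following Python sketch. The field names and the function `split_identifiers` are hypothetical, and the sketch deliberately leaves out the access controls and secure storage that real patient data require.

```python
def split_identifiers(records, id_fields=("name", "address"), start_id=1):
    """Split raw records into a master list and de-identified case reports.

    The master list keeps identifying fields plus the assigned case ID.
    The case reports keep everything else, linked only by that case ID.
    """
    master_list, case_reports = [], []
    for offset, record in enumerate(records):
        case_id = start_id + offset
        identifying = {k: record[k] for k in id_fields}
        clinical = {k: v for k, v in record.items() if k not in id_fields}
        master_list.append({"case_id": case_id, **identifying})
        case_reports.append({"case_id": case_id, **clinical})
    return master_list, case_reports


# Hypothetical records (invented names and values)
records = [
    {"name": "A. Example", "address": "1 Main St.", "age": 63, "sex": 0},
    {"name": "B. Sample", "address": "2 Oak Ave.", "age": 47, "sex": 1},
]
master_list, case_reports = split_identifiers(records, start_id=101)
```

Only the person responsible for the master list can reconnect a case ID to a name, so the two outputs should be stored in separate, access-limited locations.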
Data Cleaning
Cleaning data requires developing a scheme for ensuring that the data are
consistent and accurate. Much depends on the study design, but consider
monitoring for the following: out-of-range data values; missing data; lack of
variability (survey questionnaires can include reversed questions to see
whether the respondent is using the scale appropriately); logic traps (check
combinations of responses for inconsistency, such as a female record that
lists chronic prostatitis as a comorbidity); and date checking (verify forms
are completed in sequential order)
[12]. An entry error in the
year field is much easier to catch early on than it is after the data are
entered and combined with other participants.
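The checks listed above lend themselves to automation. The Python below is a minimal sketch, assuming one record per case stored as a dictionary; the field names, the allowed ranges, and the function names are hypothetical and would come from the study's own code book.

```python
from datetime import date

# Hypothetical allowed ranges; a real study would take these from its code book.
VALID_RANGES = {"age": (0, 120), "stenosis": (0.0, 1.0)}


def check_record(rec):
    """Return a list of problems found in one case record."""
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        value = rec.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: out of range ({value})")
    # Logic trap: sex coded 0 (female) combined with chronic prostatitis.
    if rec.get("sex") == 0 and "chronic prostatitis" in rec.get("comorbidities", []):
        problems.append("logic: female record lists chronic prostatitis")
    # Date check: forms should be completed in sequential order.
    dates = rec.get("form_dates", [])
    if any(later < earlier for earlier, later in zip(dates, dates[1:])):
        problems.append("dates: forms not in sequential order")
    return problems


def lacks_variability(responses):
    """True when a respondent gave the same answer to every survey item."""
    return len(responses) > 1 and len(set(responses)) == 1


suspect = {
    "age": 163, "stenosis": None, "sex": 0,
    "comorbidities": ["chronic prostatitis"],
    "form_dates": [date(2001, 1, 5), date(2000, 7, 9)],  # year typed wrong
}
flags = check_record(suspect)
```

Running such checks as each form is entered, rather than after aggregation, is what allows an error like the mistyped year to be traced back to its source while the original form is still at hand.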
All members of the research team should understand the goals and design of a study so that they may flag questionable data [2]. The study design should clearly identify the target and study populations and patient selection criteria so that variability among centers and the individual investigators who enroll patients is minimized. Develop and enforce consistent rules for data review and cleaning, including specification of how to handle missing data. These rules should be delineated before data aggregation, when the temptation to justify certain decisions in favor of a particular outcome is strongest [10]. The principal investigators should also determine whether "interim analysis" is necessary and determine prospectively when and what is analyzed and identify the decision rules for discontinuing the study [10, 16]. An interim analysis is generally done when it is important to monitor the efficacy or safety of two treatments.
Once the data are clean, the "database lock" occurs, the point at which no additional cases or data will be added to the data file. Always assume, however, that there will be data errors even after a complete quality control plan is used [4, 10]. During the statistical analysis, do not be surprised if it becomes necessary to pull original case report forms to answer questions from the analysis. Outliers can be very revealing in a statistical analysis and it is not unusual to want to verify data integrity when the results run counter to theory or prior experience.
Amoral Consequence of Dishonesty
Our discussion of bias thus far has assumed that errors in data collection
are the result of unintentional practices, such as misunderstanding
instructions or rationalizing postprotocol changes in study design that result
in collecting and reporting inaccurate information. In contrast, dishonesty
biases data collection through deliberate falsification of either the raw data
or the conditions essential for maintaining the integrity of the clinical
study (such as proper patient recruitment). Regardless of whether data
collection errors are accidental, well-intentioned, or the result of a
deliberate fabrication, the amoral consequence is bias
[3]. Some will conclude that
quality control is a necessary evil to prevent the errors caused by others
involved in the study. For most studies, however, the danger of instituting
error in data collection rests less with the dishonest than with those
well-intentioned researchers who fail to recognize and take steps to mitigate
their own potential for bias.
Although there are many available data file formats and complex organizational structures, such as relational databases, most statisticians prefer the traditional rectangular data file (Table 1). Analogous to the common spreadsheet, each row typically represents one case (a patient) and each column represents a variable (a data element). Ideally, most data entries are numerical codes [1, 4]. As an example, although it is possible to enter "male" or "female" for the sex variable, data entry and subsequent statistical programming are much simpler if numbers (numerical fields) are used in place of words (string fields). In Table 1, female patients are coded "0" and male patients are coded "1." If possible, avoid open-ended entries (e.g., free text comment or description fields) because they will inevitably lead to interpretation error. Popular electronic data files include delimited text files, Excel (Microsoft, Redmond, WA), SAS (SAS Institute, Cary, NC), SPSS (SPSS, Chicago, IL), Access (Microsoft), and Epi Info (Centers for Disease Control and Prevention, Atlanta, GA).
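A rectangular file of this kind can be produced as delimited text with Python's standard `csv` module, as in the sketch below. The column names and values are invented for illustration and are not taken from Table 1; only the sex coding (female 0, male 1) follows the text.

```python
import csv
import io

# Hypothetical variables; one column per variable, one row per case.
FIELDS = ["case_id", "sex", "age", "stenosis"]


def to_rectangular_csv(cases):
    """Write cases as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(cases)  # raises ValueError if a case has an unlisted field
    return buffer.getvalue()


cases = [
    {"case_id": 101, "sex": 0, "age": 63, "stenosis": 0.44},
    {"case_id": 102, "sex": 1, "age": 47, "stenosis": 0.31},
]
text = to_rectangular_csv(cases)
```

Because every entry is a numeric code, the resulting file can be read by any of the statistical packages named above without string handling.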
Data collection requires thoughtful preparation and consistent implementation. To be successful, all aspects of data collection must be focused on the goal of obtaining substantively important data that are consistent, accurate, and unbiased. Data collection begins with a clear research question and is followed by careful attention to identifying data needs, anticipating missing or incorrect data, planning statistical analyses, designing a data collection system, establishing quality control, and planning for data entry. Considerable misspent effort can be avoided if the principal investigators, data managers, and statisticians work together early in the design of a data collection effort.
We have presented elements of a data collection checklist that should be addressed in most, if not all, clinical research. This list is not comprehensive; much will depend on the specifics of a particular study, but recognition of the seven primary issues can dramatically improve the quality of research in radiology.