Computers
1 Department of Radiology, Medical College of Wisconsin, 9200 W Wisconsin Ave.,
Milwaukee, WI 53226.
1 Department of Electrical Engineering and Computer Science, University of
Wisconsin, Milwaukee, WI.
Received December 30, 2006; accepted after revision February 21, 2007.
Supported in part by the American Roentgen Ray Society.
OBJECTIVE. We sought to create an Internet-based search engine to retrieve images from a large collection of figures published in peer-reviewed journals.
CONCLUSION. The GoldMiner search engine provides easy, rapid access to a large library of images and their associated text, and it is freely available for use on the Internet.
Keywords: computers in radiology, digital images, GoldMiner, image database, index, Internet, online educational materials, search engine, teaching files
Images published in peer-reviewed radiology journals serve as a valuable source of information for medical education and clinical decision support. Although the articles in which the figures appear are indexed by Medical Subject Headings (MeSH) codes, the more granular information in the individual figures requires additional information for satisfactory search and retrieval. To facilitate concept-based image retrieval, one must index the images using keywords and concepts extracted from the associated captions. To improve radiologists' access, we sought to create an Internet-based search engine to retrieve images from a large collection of high-quality figures published in peer-reviewed journals.
Image Library
We identified open-access content from five leading peer-reviewed radiology journals. Several radiology societies, including the American Roentgen Ray Society, the American Society of Neuroradiology, the British Institute of Radiology, and the Radiological Society of North America (RSNA), make the content of their journals available online without charge 12 to 24 months after publication. All of the selected journals are written in English and are hosted online by HighWire Press, a division of Stanford University Libraries (highwire.stanford.edu). We also included content from the European Society of Radiology's EURORAD E-Learning Initiative, which comprises more than 2,100 peer-reviewed case reports with high-quality images (www.eurorad.org). The collection incorporated a total of 94,256 images from 11,712 articles (Table 1).
We created software, a so-called "Web crawler," to harvest the figures and their captions from these online sources. We stored a small, low-resolution thumbnail image of each figure or figure part. For each article, we recorded the title; journal; URL of the full-text online article; and digital object identifier (DOI), if available. For journal articles, we captured the PubMed identifier (PMID) and obtained MeSH codes from Medline using the National Library of Medicine's eQuery and eFetch Web-based utilities. For EURORAD articles, we captured the assigned MeSH codes. Data were stored in a MySQL database (version 4.1, MySQL AB; www.mysql.net). Customized software was written in the PHP Hypertext Preprocessor programming language.
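The record layout described above can be sketched as a two-table schema, with one row per article and one row per figure or figure part. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and every table name, column name, and helper function is hypothetical rather than taken from the GoldMiner implementation.

```python
import sqlite3

# Hypothetical schema: one row per article, one row per figure (or figure part).
SCHEMA = """
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    journal    TEXT NOT NULL,
    url        TEXT NOT NULL,   -- full-text online article
    doi        TEXT,            -- NULL when no DOI is available
    pmid       TEXT,            -- NULL for sources without a PubMed identifier
    mesh_codes TEXT             -- semicolon-separated, to keep the sketch small
);
CREATE TABLE figure (
    figure_id  INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(article_id),
    figure_no  TEXT NOT NULL,   -- e.g. "3A"
    caption    TEXT NOT NULL,
    thumbnail  BLOB             -- small, low-resolution image bytes
);
"""


def make_library():
    """Create an empty in-memory library with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_article(conn, title, journal, url, doi=None, pmid=None, mesh_codes=""):
    """Insert one article and return its row id."""
    cur = conn.execute(
        "INSERT INTO article (title, journal, url, doi, pmid, mesh_codes) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (title, journal, url, doi, pmid, mesh_codes),
    )
    return cur.lastrowid


def add_figure(conn, article_id, figure_no, caption, thumbnail=None):
    """Insert one figure belonging to an existing article."""
    cur = conn.execute(
        "INSERT INTO figure (article_id, figure_no, caption, thumbnail) "
        "VALUES (?, ?, ?, ?)",
        (article_id, figure_no, caption, thumbnail),
    )
    return cur.lastrowid


def figures_with_articles(conn):
    """Return each figure joined to the title and URL of its source article."""
    return conn.execute(
        "SELECT f.figure_no, f.caption, a.title, a.url "
        "FROM figure f JOIN article a USING (article_id) "
        "ORDER BY f.figure_id"
    ).fetchall()
```

Keeping the article URL on the article row, rather than on each figure, mirrors the description above: several figures share one source article, and each search result can link back to it.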
Search Engine
GoldMiner has a simple, Web-based user interface (goldminer.arrs.org). The search engine applies two distinct retrieval techniques, keyword search and concept search, and returns those images found using either technique. The complementarity of these techniques is a unique aspect of the search engine.
First, GoldMiner searches for the given term as a "keyword," that is, as a case-insensitive string. For example, the search term "gallstone" would match any figure with a caption that contained the word "gallstone," "Gallstone," or "GALLSTONE." It would not, however, match text that contained "gall stone" (two words) or "gallstones" (the plural form).
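The keyword behavior in this example can be reproduced with a case-insensitive, whole-word regular expression. The sketch below is one plausible way to get that behavior, not a description of GoldMiner's code; the function names are hypothetical.

```python
import re


def keyword_pattern(term):
    """Compile a case-insensitive, whole-word pattern for a search term.

    re.escape keeps punctuation in the term literal; the \\b anchors stop
    "gallstone" from matching inside "gallstones".
    """
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def keyword_match(term, caption):
    """Return True if the caption contains the term as a whole word."""
    return keyword_pattern(term).search(caption) is not None
```

With this sketch, `keyword_match("gallstone", "GALLSTONE in the gallbladder neck")` is true, while captions containing only "gall stone" or "gallstones" do not match. Note that `\b` assumes the term begins and ends with a letter, digit, or underscore.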
The second, more powerful, technique is concept-based search. With this technique, GoldMiner uses the knowledge contained in the UMLS Metathesaurus to search using the meaning of the specified term. The Metathesaurus contains lexical variants of terms, such as "gallstone" and "gallstones," and also contains synonyms, such as "cholelithiasis." The Metathesaurus also recognizes that "gallstones" is a subtype of "gallbladder disease." Thus, when a user enters "gallstone" as a search term, GoldMiner understands that images labeled with "gallstone," "gallstones," and "cholelithiasis" should be retrieved, too.
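The idea of concept-based retrieval can be illustrated with a toy index in which lexical variants and synonyms share one concept identifier. Everything in the sketch is an assumption made for illustration: the small dictionary stands in for the UMLS Metathesaurus, the concept identifiers are made up, only single-word terms are handled, and the subtype relation mentioned above is not modeled.

```python
import re

# Hypothetical stand-in for the Metathesaurus: surface form -> made-up concept ID.
TERM_TO_CONCEPT = {
    "gallstone": "CONCEPT_GALLSTONE",
    "gallstones": "CONCEPT_GALLSTONE",       # lexical variant
    "cholelithiasis": "CONCEPT_GALLSTONE",   # synonym
    "appendicitis": "CONCEPT_APPENDICITIS",
}


def caption_concepts(caption):
    """Return the set of concept IDs whose surface forms occur in a caption."""
    words = re.findall(r"[a-z]+", caption.lower())
    return {TERM_TO_CONCEPT[w] for w in words if w in TERM_TO_CONCEPT}


def build_index(captions):
    """Map each concept ID to the set of image IDs whose captions mention it."""
    index = {}
    for image_id, caption in captions.items():
        for concept in caption_concepts(caption):
            index.setdefault(concept, set()).add(image_id)
    return index


def concept_search(term, index):
    """Return image IDs indexed under the concept that the term maps to."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    if concept is None:
        return set()
    return set(index.get(concept, set()))
```

Because all three surface forms map to the same identifier, a query for "gallstone" also retrieves captions that mention only "gallstones" or "cholelithiasis", which a plain keyword match would miss.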
GoldMiner displays a thumbnail image and a portion of the caption for each retrieved image (Fig. 1). Each thumbnail image points to the original figure at its source Website (Fig. 2). Thus, by clicking on a figure, a user can link to the original full-resolution image and its complete caption. GoldMiner also displays the source and title of the article from which each retrieved figure was derived. The title is linked to the full-text article at the original Website. The search engine also displays, if possible, the age and sex of the image's subject, the imaging technique, and the image's figure number in its source article.
GoldMiner includes the ability to limit, or filter, search results by imaging technique, patient age group, and patient sex. From each figure's caption text, the search engine attempts to identify the imaging technique and the patient's age and sex. The filters are presented as a set of pull-down tabs at the top of the search page (Fig. 3A, 3B). Each tab lists the available selections and the number of corresponding images. Users can apply one or more of the filters as needed. For example, one could search for "breast cancer" and then limit the search to male subjects.
Imaging techniques include typical classifications, such as radiography, CT, MRI, sonography, PET, and nuclear medicine, and categories for photos (e.g., photomicrographs and endoscopic images) and graphics (e.g., charts and illustrations). Patients are grouped by age as infants (< 2 years), children (2–17 years), and adults (≥ 18 years).
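The filtering described above can be sketched in three steps: derive an age group and sex from the caption, count the images available under each filter value, and keep only the images that satisfy the chosen filters. The sketch assumes the age boundaries are under 2 years, 2 to 17 years, and 18 years or older. The caption phrasing it recognizes ("45-year-old man") and all function and field names are assumptions; the source does not describe how GoldMiner parses captions.

```python
import re
from collections import Counter


def age_group(age_years):
    """Map an age in years to the assumed infant / child / adult groups."""
    if age_years < 2:
        return "infant"
    if age_years < 18:
        return "child"
    return "adult"


# Assumed caption phrasing only; real captions vary far more than this.
_AGE_SEX = re.compile(
    r"\b(\d{1,3})-year-old\s+(man|woman|boy|girl|male|female)\b", re.IGNORECASE
)
_SEX = {"man": "male", "boy": "male", "male": "male",
        "woman": "female", "girl": "female", "female": "female"}


def describe(caption):
    """Return a record with age_group and sex, or None where not identified."""
    m = _AGE_SEX.search(caption)
    if m is None:
        return {"caption": caption, "age_group": None, "sex": None}
    return {"caption": caption,
            "age_group": age_group(int(m.group(1))),
            "sex": _SEX[m.group(2).lower()]}


def facet_counts(images, field):
    """Count images per value of one filter field, skipping unknown values."""
    return Counter(img[field] for img in images if img.get(field) is not None)


def apply_filters(images, **filters):
    """Keep only the images that match every requested filter value."""
    return [img for img in images
            if all(img.get(k) == v for k, v in filters.items())]
```

For example, after `images = [describe(c) for c in captions]`, calling `apply_filters(images, sex="male")` narrows a "breast cancer" result set to male subjects, and `facet_counts(images, "sex")` gives the per-value counts that a pull-down tab could display. Images whose captions yield no age or sex are excluded once a filter on that field is applied.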
Discussion
GoldMiner returns results for most search queries in less than 0.5 second, and informal responses from users have been positive. Our search engine has several advantages over more generic search engines such as Yahoo! and Google. First, GoldMiner is limited to peer-reviewed radiology materials. Such pre-selection limits the universe of potential Web-based materials to those of greatest potential use and interest to the search engine's intended audience. Second, GoldMiner's concept-based search algorithms find targets that conventional search engines cannot. For example, whereas GoldMiner identified 75 images for the term "phakomatosis," a search using Google yielded only one image, a photograph of a skin lesion.
The current system has several limitations, some of which may be addressed in the coming year. The MMTx software does not recognize negation. As a result, for example, although a figure caption might state, "The image shows no evidence of appendicitis," that image would be indexed with the concept "appendicitis." This concern could be addressed through the use of more sophisticated natural language processing techniques, although one could argue that it is still valuable to index any mention, positive or negative, of a concept. GoldMiner does not conform to RSNA's Medical Image Resource Center (MIRC) database architecture [3] and cannot at present be searched as a MIRC database.
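The negation limitation can be made concrete with a deliberately crude rule: treat a mention as negated when a cue word such as "no" appears within a few words before it in the same sentence. This is an illustrative stand-in for the "more sophisticated natural language processing techniques" mentioned above, not something GoldMiner or MMTx implements. The cue list, window size, and function name are assumptions, and only single-word terms are handled.

```python
import re

# Hypothetical, far-from-complete list of negation cue words.
NEGATION_CUES = {"no", "not", "without"}


def mention_polarity(term, caption, window=5):
    """Classify a single-word term as 'absent', 'negated', or 'affirmed'.

    A mention counts as negated when a cue word occurs among the `window`
    tokens before it in the same sentence. If any mention is not negated,
    the caption as a whole is reported as 'affirmed'.
    """
    term = term.lower()
    found = False
    # Naive sentence split; abbreviations such as "Fig." would be split too.
    for sentence in re.split(r"[.;!?]", caption.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token != term:
                continue
            found = True
            preceding = tokens[max(0, i - window):i]
            if not NEGATION_CUES.intersection(preceding):
                return "affirmed"
    return "negated" if found else "absent"
```

An indexer could keep negated mentions but flag them, which preserves the option raised above of indexing every mention of a concept while letting a search exclude the negative ones.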
We are conducting detailed empiric analyses of the effectiveness of GoldMiner's retrieval and filtering techniques. We plan to incorporate an Advanced Search facility to allow more sophisticated search criteria, such as the ability to require or exclude specific words or concepts. We will integrate the RadLex vocabulary, developed by the RSNA, to allow searching using this unified lexicon for radiologic anatomy, findings, and procedures [4]. We are exploring approaches for content-based image retrieval to improve identification of the imaging technique directly from the images themselves. GoldMiner's collection of images will grow automatically over time as new images are published and become available through their publishers' open-access policies.
In summary, we have developed and applied a process to construct a large Internet-based library of peer-reviewed radiology images. The free-text figure captions were mapped to concepts in a set of controlled vocabularies, which were used to index the images. The techniques developed here provide a fully automated approach to constructing a large, richly indexed collection of radiology images. Radiologists can search across all of the sources simultaneously. The search engine provides an easy-to-use tool for access to a large pool of images and their associated text. GoldMiner is freely available for use on the Internet.
References