Information retrieval is defined as "a branch of computer or library science relating to the storage, locating, searching, and selecting, upon demand, relevant data on a given subject."[1] As noted by Carl Sagan, "human beings have, in the most recent few tenths of a percent of our existence, invented not only extra-genetic but also extrasomatic knowledge: information stored outside our bodies, of which writing is the most notable example."[2] The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge or transactive memory have been shown in comparisons with rote memory.[3][4]
Although information retrieval is usually thought of as being done by computer, retrieval can also be done by humans for other humans.[5] In addition, some Internet search engines such as mahalo.com and http://www.chacha.com/ may have human supervision or editors.
Information discovery is searching for information that the searcher has not seen before and is not sure exists. It includes searching to answer a question at hand, or searching a topic without a specific question in order to improve knowledge of that topic.
Information awareness has also been described as "'systematic serendipity' - an organized process of information discovery of that which he [the searcher] did not know existed".[8] Information awareness can be further divided into:[9]
Information familiarity
Knowledge acquisition (also called recollection) is the ability to apply the new knowledge.
Examples of information awareness prior to the Internet include reading print and online periodicals. With the Internet, new methods include email newsletters[10], email alerts, and RSS feeds.[11]
These methods may increase information familiarity.[9]
Poor formulation of the question to be searched[15]
Difficulty designing a search strategy when multiple resources are available[15]
"Uncertainty about how to know when all the relevant evidence has been found so that the search can stop"[15]
Difficulty synthesizing an answer across multiple documents[15]
Factors associated with successful retrieval
Characteristics of how the information is stored
For storage of text content, the quality of the index to the content is important. For example, the use of stemming, or truncating, words by removing suffixes may help.[16]
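As a rough illustration of how stemming can help an index match word variants, the sketch below strips a few common English suffixes before indexing. The suffix list, the `naive_stem` and `build_index` names, and the sample documents are all assumptions made for this example; it is far cruder than established stemmers such as the Porter algorithm.

```python
# Naive suffix-stripping stemmer: an illustrative sketch, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # assumed, deliberately short suffix list


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each stem to the set of document ids whose text contains it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in documents.items():
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index


# Hypothetical two-document collection.
docs = {"a": "searching indexes", "b": "indexed searches"}
index = build_index(docs)
```

With this index, a query for "search" would reach both documents even though neither contains that exact word.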
Information that is structured may be more effective according to controlled studies.[17][18] In addition, the structure should be layered, with a summary of the content being the first layer that the reader sees.[19] This allows the reader to take only an overview, or choose more detail. Some Internet search engines such as http://www.kosmix.com/ try to organize search results beyond a one-dimensional list of results.
Regarding display of results from search engines, an interface designed to reduce anchoring and order bias may improve decision making.[20]
John Battelle has described features of the perfect search engine of the future.[21] For example, the use of Boolean searching may not be as efficient.[22] Meta-searching and task-based searching may improve decision velocity.[23]
Meta-search engines search multiple resources and integrate the results for the user. Examples in health care include Trip Database, MacPLUS, and QuickClinical.
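One simple way to integrate results from several resources is to interleave the ranked lists and drop duplicates. The sketch below does this round-robin. It is only an assumed strategy for illustration, since how the engines named above actually merge results is not described here, and the `merge_results` name and document ids are hypothetical.

```python
from itertools import zip_longest


def merge_results(*ranked_lists: list[str]) -> list[str]:
    """Interleave ranked result lists round-robin, dropping duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    # zip_longest pads the shorter lists with None once they run out.
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Hypothetical ranked results from two resources.
source_a = ["doc1", "doc2", "doc3"]
source_b = ["doc2", "doc4"]
combined = merge_results(source_a, source_b)
```

Round-robin interleaving treats every resource as equally trustworthy; a real system might instead weight resources or rescore the merged items.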
In healthcare, searchers are more likely to be successful if their answer is correct before searching, if they have experience with the system they are searching, and if they have a high spatial visualization score.[14] Also in healthcare, physicians with less experience are more likely to want more information.[24] Physicians who report stress when uncertain are more likely to search textbooks than source evidence.[25]
In healthcare, using expert searchers on behalf of physicians led to increased satisfaction by the physicians with the search results.[26]
Use of term overlap is associated with success.[27]
The benefits of enhancing personal knowledge with retrieval of extrasomatic knowledge have been shown in a controlled comparison with rote memory.[3]
Various before and after comparisons are summarized in the tables.
Impact of medical searching by physicians and medical students[28][29][23][14]
Search engine
Users
Questions
Portion of answers correct
Portion of answers that moved from correct to incorrect
RCT of 407 inpatients compared to 402 control inpatients. Searcher sought answers to questions that arose during "morning report". Search resources did not include UpToDate. Results emailed to teams.
Before-after study of 146 inpatients. Searcher sought answers to corroborate principal treatment decisions for all patients. Search resources included UpToDate. Search results given to attendings. Blinded outcome assessment.
• Treatments changed in 18% • Treatments improved in 14%
Evaluation of the quality of information retrieval
Survival curve modeling amount of time taken to answer questions. The units for time are arbitrary and meaningless in this example.
Logistic curve modeling rate of correct answers over time. The units for time are arbitrary and meaningless in this example.
Various methods exist to evaluate the quality of information retrieval.[37][38][39] Hersh[38] noted the classification of evaluation developed by Lancaster and Warner[37] in which the first level of evaluation is:
Costs/resources consumed in learning and using a system
Coverage. An estimate of coverage can be crudely automated.[41] However, more accurate judgment of relevance requires a human judge, which introduces subjectivity.[42]
Precision and recall
Novelty. This has been judged by independent reviewers.[31]
Completeness and accuracy of results. An easy method of assessing this is to let the searcher make a subjective assessment.[32][43][44][45] Other methods may be to use a bank of questions with known target documents[46] or known answers[14].
Recall is the fraction of relevant documents that are successfully retrieved. This is the same as sensitivity. The recall has also been called the "yield"[47] and comprehensiveness[48].
Precision is the fraction of retrieved documents that are relevant to the search. Precision has also been called efficiency.[48] This is the same as positive predictive value.
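The two definitions above can be checked with a small worked example. The sketch below assumes that documents are identified by strings and that the full set of relevant documents is known, which in practice requires a human relevance judgment; the `precision_recall` name and the document ids are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = relevant retrieved / all retrieved (positive predictive value)
    recall    = relevant retrieved / all relevant  (sensitivity)
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents retrieved, 3 relevant documents exist, 2 overlap.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
precision, recall = precision_recall(retrieved, relevant)
```

Here half of the retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall of about 0.67).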
The number needed to read (NNR) is "how many papers in a journal have to be read to find one of adequate clinical quality and relevance."[50][51][52][53] Of note, the NNR has been proposed as a metric to help libraries to decide which journals to subscribe to.[50] The NNR has also been called the "burden."[47]
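Read literally, the quoted definition is a ratio of papers read to papers that qualify. The sketch below computes it under that reading; the function name and the counts are hypothetical and are not taken from the cited studies.

```python
def number_needed_to_read(papers_read: int, papers_qualifying: int) -> float:
    """Papers read per paper of adequate clinical quality and relevance."""
    if papers_qualifying <= 0:
        raise ValueError("NNR is undefined when no paper qualifies")
    return papers_read / papers_qualifying


# Hypothetical journal: 120 papers read, 8 judged adequate and relevant.
nnr = number_needed_to_read(120, 8)
```

A lower NNR means less reading per useful paper, which is why it can be used to compare journals.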
Time needed to answer a question can be compared between two systems with a Kaplan-Meier survival analysis method.[23]
In health care, difficult questions may take hours to answer.[56]
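To make the survival-analysis idea concrete, the sketch below computes a Kaplan-Meier curve for a single system, where "survival" means the estimated fraction of questions still unanswered at each time. Questions abandoned before an answer was found are treated as censored. The function name, the time units, and the sample data are assumptions for illustration. Comparing two systems would mean computing one curve per system and applying a test such as the log-rank test, which is not shown.

```python
def kaplan_meier(times: list[float], answered: list[bool]) -> list[tuple[float, float]]:
    """Return (time, fraction still unanswered) at each time an answer was found.

    answered[i] is False when question i was abandoned (censored) at times[i].
    """
    data = sorted(zip(times, answered))
    at_risk = len(data)  # questions still open just before the current time
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0   # questions answered exactly at time t
        removed = 0  # questions leaving the risk set at time t (answered or censored)
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical search times (arbitrary units); False marks an abandoned search.
times = [2, 3, 3, 5, 8]
answered = [True, True, False, True, False]
curve = kaplan_meier(times, answered)
```

In this made-up data the estimated fraction of unanswered questions falls to about 0.3 by time 5, and the search abandoned at time 8 adds no further step.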
If the correct answer to the search question is known, a logistic function can model rate of correct answers over time. The result is an S-curve (also called sigmoid curve or logistic growth curve) in which most questions are answered after an initial delay; however, a minority of questions take a much longer time.
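A minimal sketch of such a logistic model is shown below. The parameter names (`ceiling`, `rate`, `midpoint`) and the values used are assumptions chosen only to illustrate the S-shape; in practice they would be fitted to observed search data.

```python
import math


def logistic(t: float, ceiling: float = 1.0, rate: float = 1.0, midpoint: float = 0.0) -> float:
    """Modeled cumulative fraction of questions answered correctly by time t.

    ceiling  : fraction eventually answered correctly (upper asymptote)
    rate     : steepness of the rise
    midpoint : time at which half of the ceiling is reached
    """
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameters: 90% eventually correct, half of that reached at t = 10.
fractions = [logistic(t, ceiling=0.9, rate=0.5, midpoint=10.0) for t in range(0, 31, 5)]
```

The values rise slowly at first, climb steeply around the midpoint, and then flatten toward the ceiling, which gives the S-curve described above.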
Sagan, Carl (1993). The Dragons of Eden: Speculations on the Evolution of Human Intelligence. New York: Ballantine Books. ISBN 0-345-34629-7.
[3] de Bliek R, Friedman CP, Wildemuth BM, Martz JM, Twarog RG, File D (1994). "Information retrieved from a database and the augmentation of personal knowledge". J Am Med Inform Assoc 1 (4): 328–38. PMID 7719819.
Shaughnessy AF, Slawson DC, Bennett JH (November 1994). "Becoming an information master: a guidebook to the medical information jungle". J Fam Pract 39 (5): 489–99. PMID 7964548.
[8] Garfield, E. “ISI Eases Scientists’ Information Problems: Provides Convenient Orderly Access to Literature,” Karger Gazette No. 13, pg. 2 (March 1966). Reprinted as “The Who and Why of ISI,” Current Contents No. 13, pages 5-6 (March 5, 1969), which was reprinted in Essays of an Information Scientist, Volume 1: ISI Press, pages 33-37 (1977). http://www.garfield.library.upenn.edu/essays/V1p033y1962-73.pdf
[9] Tanna GV, Sood MM, Schiff J, Schwartz D, Naimark DM (2011). "Do E-mail Alerts of New Research Increase Knowledge Translation? A "Nephrology Now" Randomized Control Trial". Acad Med 86 (1): 132-138. DOI:10.1097/ACM.0b013e3181ffe89e. PMID 21099399.
Roland M. Grad et al., “Impact of Research-based Synopses Delivered as Daily email: A Prospective Observational Study,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2563v1 (accessed December 21, 2007).
[12] Berthier Ribeiro-Neto; Ricardo Baeza-Yates; Ribeiro, Berthier de Araújo Neto (2009). Modern information retrieval. Boston: Addison-Wesley. ISBN 0-321-41691-0.
Hersh, William R (2001). “Information Retrieval Systems”, Fagan, Lawrence Marvin; Shortliffe, Edward Hance; Perreault, Leslie E.; Wiederhold, Gio: Medical Informatics: Computer Applications in Health Care and Biomedicine. Berlin: Springer, 549. ISBN 0-387-98472-0.
Beck AL, Bergman DA (September 1986). "Using structured medical information to improve students' problem-solving performance". J Med Educ 61 (9 Pt 1): 749–56. PMID 3528494.
John Battelle. The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. Portfolio Trade. ISBN 1-59184-141-0.
Verhoeff, J (1961). Inefficiency of the use of Boolean functions for information retrieval systems. Communications of the ACM 4: 557. DOI:10.1145/366853.366861
Gruppen LD, Wolf FM, Van Voorhees C, Stross JK (1988). "The influence of general and case-related experience on primary care treatment decision making". Arch. Intern. Med. 148 (12): 2657–63. PMID 3196128.
McKibbon KA, Fridsma DB, Crowley RS (2007). "How primary care physicians' attitudes toward risk and uncertainty affect their use of electronic information resources". J Med Libr Assoc 95 (2): 138–46, e49–50. DOI:10.3163/1536-5050.95.2.138. PMID 17443246.
Shelagh A. Mulvaney et al., “A Randomized Effectiveness Trial of a Clinical Informatics Consult Service: Impact on Evidence Based Decision-Making and Knowledge Implementation,” J Am Med Inform Assoc (December 20, 2007), http://www.jamia.org/cgi/content/abstract/M2461v1 (accessed December 21, 2007).
[37] Lancaster, Frederick Wilfrid; Warner, Amy J. (1993). Information retrieval today. Arlington, Va: Information Resources Press. ISBN 0-87815-064-1.
[38] Hersh, William R. (2008). Information Retrieval: A Health and Biomedical Perspective (Health Informatics). Berlin: Springer. ISBN 0-387-78702-X.
Gorman P (2001). "Information needs in primary care: a survey of rural and nonrural primary care physicians". Stud Health Technol Inform 84 (Pt 1): 338–42. PMID 11604759.
Haynes RB, McKibbon KA, Walker CJ, Ryan N, Fitzgerald D, Ramsden MF (January 1990). "Online access to MEDLINE in clinical settings. A study of use and usefulness". Ann. Intern. Med. 112 (1): 78–84. PMID 2403476.
McKibbon KA, Wilczynski NL, Haynes RB (2004). "What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals?". BMC Med 2: 33. DOI:10.1186/1741-7015-2-33. PMID 15350200.
Bachmann LM, Coray R, Estermann P, Ter Riet G (2002). "Identifying diagnostic studies in MEDLINE: reducing the number needed to read". J Am Med Inform Assoc 9 (6): 653–8. PMID 12386115.
Haase A, Follmann M, Skipka G, Kirchner H (2007). "Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance". BMC Med Res Methodol 7: 28. DOI:10.1186/1471-2288-7-28. PMID 17603909.
Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR (2006). "Using citation data to improve retrieval from MEDLINE". J Am Med Inform Assoc 13 (1): 96–105. DOI:10.1197/jamia.M1909. PMID 16221938.
Berthier Ribeiro-Neto; Ricardo Baeza-Yates; Ribeiro, Berthier de Araújo Neto (2009). Modern information retrieval. Boston: Addison-Wesley. ISBN 0-321-41691-0.
Shortliffe, Edward Hance; Cimino, James D. (2006). Biomedical informatics: computer applications in health care and biomedicine. Berlin: Springer. ISBN 0-387-28986-0.