Findings from these activities

The first results of the research studies concern improving information retrieval by automatically creating queries that account for the specific characteristics of individual patients. An automated method was created to extract information from MEDLINE citations for 4 groups of clinical questions: therapy, diagnosis, etiology, and prognosis. The automated process generated 135,667 MeSH pairs in the therapy group, 110,586 in the prognosis group, 142,915 in the etiology group, and 111,713 in the diagnosis group. Generating all possible semantic pairs from these MeSH pairs increased the number of pairs.
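
The pair generation described above can be illustrated with a minimal sketch. It assumes each citation is available as a list of MeSH headings and that a lookup table from heading to semantic types exists; the function names `mesh_pairs` and `semantic_pairs`, the sample headings, and the type assignments are hypothetical and are not taken from the project's actual code. It shows why expanding to semantic pairs can yield more pairs than the MeSH pairs themselves: a heading with several semantic types contributes one semantic pair per type combination.

```python
from collections import Counter
from itertools import combinations, product


def mesh_pairs(citations):
    """Count unordered co-occurrence pairs of MeSH headings.

    `citations` is an iterable of heading lists, one list per citation
    (an assumed input format, for illustration only).
    """
    counts = Counter()
    for headings in citations:
        # Deduplicate and sort so that (a, b) and (b, a) are the same key.
        for a, b in combinations(sorted(set(headings)), 2):
            counts[(a, b)] += 1
    return counts


def semantic_pairs(pair_counts, semantic_types):
    """Expand heading pairs into all possible semantic-type pairs.

    `semantic_types` maps a heading to its semantic types (hypothetical
    lookup table). Headings missing from the table produce no pairs.
    """
    counts = Counter()
    for (a, b), n in pair_counts.items():
        for sa, sb in product(semantic_types.get(a, ()), semantic_types.get(b, ())):
            counts[tuple(sorted((sa, sb)))] += n
    return counts


if __name__ == "__main__":
    # Illustrative data only, not drawn from the study.
    citations = [
        ["Aspirin", "Myocardial Infarction"],
        ["Aspirin", "Myocardial Infarction", "Humans"],
    ]
    types = {
        "Aspirin": ["Pharmacologic Substance", "Organic Chemical"],
        "Myocardial Infarction": ["Disease or Syndrome"],
    }
    pairs = mesh_pairs(citations)
    print(pairs)
    print(semantic_pairs(pairs, types))
```

In this sketch, pairs that map to the same semantic types are merged by summing their counts, which corresponds loosely to the merging step applied before the statistical analysis.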

The statistical analysis was performed after merging co-occurrence pairs with the same semantic types. We found that 157 (6.10%) pairs differed significantly from the others in the therapy group, 161 (7.43%) in the prognosis group, 201 (7.32%) in the etiology group, and 189 (8.51%) in the diagnosis group (p < 0.05, Bonferroni correction). Performance varied across the tasks: it was significantly better in the therapy task than in the 3 other tasks (p < 0.05). A pilot study evaluating the clinical validity of the retrieved information showed that the results were suitable for the intended purpose (literature retrieval), especially in the therapy group.
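
For readers unfamiliar with the Bonferroni correction mentioned above, the following sketch shows the general idea: with m simultaneous tests, each p-value is compared against alpha / m rather than alpha. The underlying statistical test used in the study is not stated here, so the sketch starts from already-computed p-values; the function name `bonferroni_significant` and the sample values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the keys whose p-value passes a Bonferroni-corrected threshold.

    `p_values` maps an item (for example, a semantic pair) to its
    uncorrected p-value. With m tests, the per-test threshold is alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [key for key, p in p_values.items() if p < threshold]


if __name__ == "__main__":
    # Illustrative p-values only, not results from the study.
    sample = {"pair_a": 0.001, "pair_b": 0.02, "pair_c": 0.5, "pair_d": 0.012}
    print(bonferroni_significant(sample))
```

The correction is conservative: as the number of pairs tested grows, the per-pair threshold shrinks proportionally, so only strongly deviating pairs are flagged.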

One of the key problems in using patient-specific information is the accurate acquisition of that information from patient records. We have extended MedLEE, a natural language processing tool that extracts and encodes clinical information from the patient record. Once extracted, this information will be used to tailor user queries to the patient. In the first year of the project, MedLEE was extended to the domain of electrocardiogram reports, and an independent evaluation was undertaken to determine the performance of the system in that domain. The study was performed by a student in the Department of Medical Informatics as part of her educational training. A paper concerning this study will be submitted to a journal for publication.

As part of the MedLEE project, we have been examining the possibility of sharing processing steps in the matching process. We decided to use UMLS CUIs (Concept Unique Identifiers) as the common interface between our projects. The termfinder already produces these CUIs. A first prototype of a matcher, which finds commonalities in terminology between patient records and medical articles based on the CUIs, has been implemented. Since MedLEE does not deliver CUIs yet (but will do so in the fall), we manually simulated the mapping of MedLEE's output. In related work, we tested our termfinder on the patient records, which produces lower-quality (but probably still acceptable) output. Another experiment varied the part of the article (abstract, method section, result section, full article) from which the terms were drawn. This processing produces tables of correlation between patient records and articles, and these tables are currently being evaluated by our doctors.
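
As a rough illustration of CUI-based matching, the sketch below compares the set of CUIs found in a patient record with the set found in each article and builds a table of overlap scores. This is not the prototype's actual algorithm, which is not described here: the Jaccard measure, the function names `cui_overlap` and `match_table`, and the placeholder identifiers are all assumptions made for the example.

```python
def cui_overlap(record_cuis, article_cuis):
    """Return the shared CUIs and a Jaccard overlap score in [0, 1].

    Jaccard similarity is an assumed stand-in for whatever measure
    the real matcher uses.
    """
    record, article = set(record_cuis), set(article_cuis)
    shared = record & article
    union = record | article
    score = len(shared) / len(union) if union else 0.0
    return shared, score


def match_table(records, articles):
    """Build a {(record_id, article_id): score} table for all combinations."""
    return {
        (rid, aid): cui_overlap(rcuis, acuis)[1]
        for rid, rcuis in records.items()
        for aid, acuis in articles.items()
    }


if __name__ == "__main__":
    # Placeholder identifiers in CUI format, not real UMLS concepts.
    records = {"record_1": ["C0000001", "C0000002", "C0000003"]}
    articles = {
        "article_1": ["C0000002", "C0000003", "C0000004"],
        "article_2": ["C0000009"],
    }
    for key, score in match_table(records, articles).items():
        print(key, round(score, 2))
```

Restricting `article_cuis` to terms drawn from one part of the article (abstract, method section, result section, or full article) would reproduce the kind of variation described in the experiment above.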

Another area of research we have been involved in concerns mapping the information in patient reports to a controlled clinical vocabulary, the Unified Medical Language System (UMLS). Our aim is to use MedLEE to assist us in this mapping. Currently, the output of the NLP system consists of structured target output that includes canonical forms, but not codes. The group concerned with retrieving relevant articles for user queries has based its work on the UMLS. A Ph.D. student at the CUNY Graduate Center is helping with this work. Yet another area of research concerns word sense disambiguation (WSD). We are working on adapting machine learning techniques and plan to incorporate a word sense disambiguation component into MedLEE. This will improve performance and will also facilitate mapping to the UMLS. Finally, we will extend MedLEE to other domains in cardiology, such as cardiac catheterization reports.

Noemie Elhadad
2000-08-01