The first results of the research studies are related to the process of
enhancing information retrieval by automatically creating queries that account
for specific characteristics of individual patients. An automated method was
created to extract information from MEDLINE citations in 4 groups of clinical
questions: therapy, diagnosis, etiology, and prognosis. The automated process
generated 135,667 MeSH pairs in the therapy group, 110,586 in the prognosis
group, 142,915 in the etiology group, and 111,713 in the diagnosis group.
Expanding each MeSH pair into all possible pairs of semantic types further
increased the number of pairs.
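As a rough illustration of the pair-generation step, the sketch below counts
co-occurring MeSH heading pairs across citations and then expands them into
semantic-type pairs. The input layout (one list of headings per citation), the
`SEMANTIC_TYPES` mapping, and both function names are hypothetical and chosen
only for illustration; the extraction method actually used is not described
here.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical mapping from MeSH headings to semantic types (illustrative only).
SEMANTIC_TYPES = {
    "Aspirin": ["Pharmacologic Substance", "Organic Chemical"],
    "Myocardial Infarction": ["Disease or Syndrome"],
    "Humans": ["Human"],
}


def mesh_pairs(citations):
    """Count unordered pairs of MeSH headings that co-occur in a citation."""
    counts = Counter()
    for headings in citations:
        # Deduplicate and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(headings)), 2):
            counts[(a, b)] += 1
    return counts


def semantic_pairs(pair_counts, semantic_types):
    """Expand each MeSH pair into all pairs of its headings' semantic types,
    merging the counts of pairs that share the same semantic types."""
    counts = Counter()
    for (a, b), n in pair_counts.items():
        types_a = semantic_types.get(a, [])
        types_b = semantic_types.get(b, [])
        for sa, sb in product(types_a, types_b):
            counts[tuple(sorted((sa, sb)))] += n
    return counts


citations = [
    ["Aspirin", "Myocardial Infarction", "Humans"],
    ["Aspirin", "Myocardial Infarction"],
]
pair_counts = mesh_pairs(citations)
semantic_counts = semantic_pairs(pair_counts, SEMANTIC_TYPES)
```

Because a heading can carry more than one semantic type, the expansion can
yield more semantic pairs than there were MeSH pairs, as in this toy input.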
The statistical analysis was done after merging the co-occurrence pairs of the
same semantic types. We found that 157 (6.10%) pairs differed significantly
from the others in the therapy group, 161 (7.43%) in the prognosis group, 201
(7.32%) in the etiology group, and 189 (8.51%) in the diagnosis group (p <
0.05, Bonferroni correction). Performance varied across the four tasks: it was
significantly better in the therapy task than in the other three (p < 0.05).
A pilot study evaluating the clinical validity of the retrieved information
showed that the results were suitable for the intended purpose
(literature retrieval), especially in the therapy group.
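The Bonferroni correction mentioned above can be sketched as follows. This
only illustrates the correction itself: the statistical test that produced the
p-values is not stated here, and the function name and the example p-values
are made up for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the indices of tests that stay significant after a Bonferroni
    correction, i.e. whose p-value is below alpha divided by the number of
    tests performed."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p < threshold]


# Made-up p-values for four hypothetical semantic pairs.
example_p_values = [0.001, 0.02, 0.04, 0.2]
significant = bonferroni_significant(example_p_values)
```

With four tests the per-test threshold becomes 0.05 / 4 = 0.0125, so only the
first pair in this example remains significant.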
One of the key problems in using patient-specific information is the accurate
acquisition of that information from patient records. We have extended MedLEE,
a natural language processing tool that extracts and encodes clinical
information from the patient record. Once extracted, this information will be
used to tailor user queries to the patient.
In the first year of the project, MedLEE was extended to the domain of
electrocardiogram reports and an independent evaluation was undertaken to
determine the performance of the system in that domain. The study was performed
by a student in the Department of Medical Informatics as part of her
educational training. A paper concerning this study will be submitted to a
journal for publication.
As part of the MedLEE project, we have been examining the possibility of
sharing processing steps in the matching process. We decided to use UMLS CUIs
(Concept Unique Identifiers) as the single interface between our projects. The
termfinder already produces these CUIs. A first prototype of a matcher which
finds commonalities in terminology between patient records and medical articles
based on the CUIs has been implemented. Since MedLEE does not deliver CUIs yet
(but will do so in the fall), we manually simulated the mapping of MedLEE's
output to CUIs. In related work, we tested our termfinder directly on the
patient records, which produced lower-quality (but probably still acceptable)
output. Another experiment varied the part of the article (abstract, methods
section, results section, or full article) from which the terms were drawn.
This processing produces tables of correlation between patient records and
articles; these tables are currently being evaluated by our doctors.
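A minimal sketch of CUI-based matching is given below, assuming each patient
record and each article has already been reduced to a list of CUIs. The
prototype's actual scoring is not described here; Jaccard overlap is used only
as a simple stand-in, and the function names and the CUI strings are
illustrative placeholders.

```python
def cui_overlap(record_cuis, article_cuis):
    """Return the CUIs shared by a record and an article, together with
    their Jaccard overlap (shared concepts divided by all concepts)."""
    record, article = set(record_cuis), set(article_cuis)
    shared = record & article
    union = record | article
    score = len(shared) / len(union) if union else 0.0
    return shared, score


def match_table(records, articles):
    """Build a table {(record_id, article_id): score} for every pair."""
    return {
        (rid, aid): cui_overlap(rcuis, acuis)[1]
        for rid, rcuis in records.items()
        for aid, acuis in articles.items()
    }


# Placeholder identifiers, not real UMLS concepts.
records = {"r1": ["C0000001", "C0000002", "C0000003"]}
articles = {
    "a1": ["C0000002", "C0000003", "C0000004"],
    "a2": ["C0000009"],
}
table = match_table(records, articles)
```

Restricting `articles` to the CUIs found in one part of each article (for
example the abstract only) would give one such table per article part.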
Another area of research we have been involved in concerns the mapping of the
information in patient reports to a controlled clinical vocabulary, the
Unified Medical Language System (UMLS). Our aim is to use MedLEE to assist us
in a relevant mapping. Currently, the NLP system produces structured output
that includes canonical forms but not codes. The
group concerned with retrieving relevant articles for user queries has based
their work on the use of the UMLS. A Ph.D. student at the CUNY Graduate Center
is helping with this work. Yet another area of research concerns word sense
disambiguation (WSD). We are working on adapting machine learning techniques
and plan on incorporating a word sense disambiguation component into MedLEE.
This will improve performance and will also facilitate mapping to the UMLS.
Finally, we will extend MedLEE to other domains in cardiology, such as
cardiac catheterization reports.
Noemie Elhadad
2000-08-01