After an initial search, our approach re-ranks the results using a more
detailed comparison, based on natural language analysis, of the retrieved
articles against the patient record. Articles that match the patient record
are considered more relevant to the end user.
We have begun by applying this approach to journal articles only. Locating
medical terms and their associated values in the documents and matching them
against the patient record is a central element in our approach to document
filtering and re-ranking of the search results. During the past year, we
implemented new tools for extracting candidate terms and filtering them to keep
only the ones important in the medical domain. We have converted our
finite-state extraction grammar for candidate terms so that it produces a portable XML
format, allowing data exchange and cooperation with additional tools and
improving efficiency and accuracy. We have also implemented an additional
filter that uses WordNet to determine the likelihood that a noun or noun phrase
is used broadly in the language rather than specifically in the medical domain.
We are in the process of combining the syntactic filter offered by the grammar,
the WordNet-based filter, and knowledge in the UMLS (Unified Medical Language
System) database maintained by NLM with additional statistical filters.
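The re-ranking idea described above can be illustrated with a minimal sketch. It assumes that terms have already been extracted from each article and from the patient record, and it scores overlap on terms only (associated values are ignored). The `Article` class, the `rerank` function, and the `weight` parameter are hypothetical names chosen for illustration; they are not the project's actual matching subsystem.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    terms: set            # medical terms extracted from the article (assumed given)
    initial_score: float  # score assigned by the initial search


def rerank(articles, patient_terms, weight=1.0):
    """Sort articles by initial score plus a bonus for overlap with the patient record."""
    def combined(article):
        if not article.terms:
            overlap = 0.0
        else:
            # Fraction of the article's terms that also occur in the patient record.
            overlap = len(article.terms & patient_terms) / len(article.terms)
        return article.initial_score + weight * overlap
    return sorted(articles, key=combined, reverse=True)


# Hypothetical data: article "A" shares a term with the patient record, "B" does not.
articles = [
    Article("B", {"asthma"}, 0.7),
    Article("A", {"atrial fibrillation", "stroke"}, 0.6),
]
patient_terms = {"atrial fibrillation", "chf"}
print([a.title for a in rerank(articles, patient_terms)])  # prints ['A', 'B']
```

A simple additive combination is used here only to keep the example short; the weighting actually used by the document matching subsystem is not specified in this report.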
Our plans for the immediate future include the addition of statistical tests
that will rely on measurable properties of word frequency distributions, such
as dispersion across documents, variance between a medical and general English
corpus, and collocation strength, to complement our existing grammar- and
knowledge-based techniques for identifying terms. We will explore the relevance
of semantic categories provided by the UMLS for term recognition and for
classifying modifiers of terms as values or qualifiers (e.g., "severe CHF" vs.
"atrial fibrillation"). In cooperation with domain experts at CPMC, we are
planning a first evaluation of the terms extracted and of the matching process
that utilizes these terms, to be completed by December 2000. Data from this
analysis will be used to adapt the weighting mechanism employed by our
document matching subsystem.
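As an illustration of the kinds of frequency-based measures mentioned above, the following sketch computes one simple variant of each: dispersion as the fraction of documents containing a term, a smoothed log frequency ratio between a medical and a general corpus, and pointwise mutual information as a measure of collocation strength. These particular formulas, the function names, and the add-one style smoothing are assumptions made for the example; the statistical tests eventually adopted may differ.

```python
import math
from collections import Counter


def dispersion(term, documents):
    """Fraction of documents (lists of tokens) that contain the term at least once."""
    if not documents:
        return 0.0
    return sum(1 for doc in documents if term in doc) / len(documents)


def log_frequency_ratio(term, medical_tokens, general_tokens, smoothing=1.0):
    """Log ratio of the term's smoothed relative frequency, medical vs. general corpus."""
    med = (medical_tokens.count(term) + smoothing) / (len(medical_tokens) + smoothing)
    gen = (general_tokens.count(term) + smoothing) / (len(general_tokens) + smoothing)
    return math.log(med / gen)


def pmi(first, second, tokens):
    """Pointwise mutual information (base 2) of the adjacent word pair (first, second)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(first, second)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_pair = pair_count / (len(tokens) - 1)
    p_first = unigrams[first] / len(tokens)
    p_second = unigrams[second] / len(tokens)
    return math.log2(p_pair / (p_first * p_second))
```

A term that scores high on the frequency ratio and whose words show a high PMI would be a stronger candidate medical term than one that is evenly spread across general English text.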
In the first year, we have completed the following:
1. a preliminary study, together with echocardiography specialists from CPMC,
to acquire domain knowledge about typical recording procedures,
important video views, and diagnosis report structure;
2. an automatic video view segmentation algorithm and its software
implementation. Our algorithm is unique in combining domain-specific
syntactic structure with low-level video features (such as color, window
shape, and object). It achieves satisfactory accuracy (more than 90%) in
recognizing constituent views in echocardiogram video. However, it does not
run in real time;
3. an algorithm that uses the EKG chart embedded in the echo video to
automatically extract the key frame of each view. Our algorithm uses a unique
approach that exploits multi-modal information. It detects the peak points in
the EKG (dynamic graphic data) to determine the end-diastole view (image) within
each heart pumping cycle in the video.
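The key-frame idea in the third item can be sketched as simple peak detection on a one-dimensional signal. The sketch assumes the EKG trace has already been recovered from the video as one amplitude value per frame, so that a peak index is directly a frame index; the function name `find_peaks`, the fixed `threshold`, and the `min_distance` parameter are hypothetical and do not describe the implemented algorithm.

```python
def find_peaks(signal, threshold, min_distance=1):
    """Return indices of local maxima at or above threshold, at least min_distance apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if signal[i] >= threshold and is_local_max:
            if peaks and i - peaks[-1] < min_distance:
                # Too close to the previous peak: keep whichever of the two is higher.
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


# Hypothetical per-frame EKG amplitudes covering two cardiac cycles.
ekg_per_frame = [0.0, 0.1, 1.0, 0.2, 0.0, 0.1, 0.9, 0.1, 0.0]
print(find_peaks(ekg_per_frame, threshold=0.5, min_distance=3))  # prints [2, 6]
```

Each returned index would then select one candidate key frame per heart cycle. Recovering the trace from the video image itself is outside the scope of this sketch.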
Our goals for the next year include improving the speed and accuracy of the
video view segmentation software, aiming for real-time processing of
video and near-perfect accuracy. We are confident we can achieve near-perfect
accuracy by exploiting the highly consistent video capture rules followed in
clinical practice. We are also developing tools for linking findings in
textual reports to individual views in video. We are working with Dr. Henry Wu
(M.D.) of CPMC's echocardiography department and Dr. Carol Friedman of the medical
informatics department to build the initial test set for the echocardiogram
study, consisting of video and associated textual reports. We will develop an
initial set of standard-case video to be used in PERSIVAL. Representative
views and video skims of each important abnormal/normal case will be indexed
and randomly accessible. In addition, we will include a small set of video
studies of actual patients in PERSIVAL. Finally, we will start the collection
and analysis of multimedia content from the World Wide Web related to
cardiology. We will first manually analyze the distribution and
characteristics of such content and identify classification algorithms for
suitable categories. Our goal is to integrate multimedia content from external
sources with that from special sources such as the echocardiogram video
collection at CPMC.
Noemie Elhadad
2000-08-01