Findings from these activities

After an initial search, our approach re-ranks the search results through a more detailed comparison, using natural language processing, of the retrieved articles against the patient record. Articles that match the patient record are judged more relevant to the end user. We have begun by applying this approach to journal articles only. Locating medical terms and their associated values in the documents and matching them against the patient record is a central element in our approach to document filtering and re-ranking of the search results.

During the past year, we implemented new tools for extracting candidate terms and filtering them to keep only the ones important in the medical domain. We have converted our finite-state extraction grammar for candidate terms so that it produces a portable XML format, allowing data exchange and cooperation with additional tools and improving efficiency and accuracy. We have also implemented an additional filter that uses WordNet to determine the likelihood that a noun or noun phrase is used broadly in the language rather than specifically in the medical domain. We are in the process of combining the syntactic filter offered by the grammar, the WordNet-based filter, and knowledge in the UMLS (Unified Medical Language System) database maintained by NLM with additional statistical filters.
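As a rough illustration of the re-ranking idea, the sketch below orders articles by how many of their extracted terms also appear in the patient record. The function name `rerank`, the input layout, and the plain overlap count are assumptions made for this example; they are not the weighting mechanism used by our document matching subsystem.

```python
def rerank(articles, patient_terms):
    """Re-rank articles by overlap with the patient record.

    articles: list of (article_id, set_of_terms) in initial search order
              (hypothetical layout, assumed for this sketch).
    patient_terms: set of terms taken from the patient record.
    Returns article ids, best match first. Python's sort is stable, so
    articles with equal overlap keep their initial search order.
    """
    def overlap(item):
        # Number of article terms that also occur in the patient record.
        return len(item[1] & patient_terms)

    return [article_id for article_id, _ in sorted(articles, key=overlap, reverse=True)]


if __name__ == "__main__":
    results = [
        ("a", {"fever"}),
        ("b", {"chf", "atrial fibrillation"}),
        ("c", {"chf"}),
        ("d", {"cough"}),
    ]
    print(rerank(results, {"chf", "atrial fibrillation"}))
```

A real matcher would also compare the values attached to terms, not only the presence of the terms themselves.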

Our plans for the immediate future include adding statistical tests that rely on measurable properties of word frequency distributions, such as dispersion across documents, variance between a medical corpus and a general English corpus, and collocation strength, to complement our existing grammar- and knowledge-based techniques for identifying terms. We will explore the relevance of the semantic categories provided by the UMLS for term recognition and for classifying modifiers of terms as values or qualifiers (e.g., "severe CHF" vs. "atrial fibrillation"). In cooperation with domain experts at CPMC, we are planning a first evaluation of the extracted terms and of the matching process that uses them, to be completed by December 2000. Data from this analysis will be used to adapt the weighting mechanism employed by our document matching subsystem.
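Two of the frequency-based properties mentioned above can be sketched in a few lines. The function names, the tokenized-list inputs, and the add-one smoothing are assumptions chosen for this illustration; the planned statistical tests may well take a different form.

```python
from collections import Counter


def document_dispersion(term, documents):
    """Fraction of documents (each a list of tokens) containing the term."""
    if not documents:
        return 0.0
    return sum(1 for doc in documents if term in doc) / len(documents)


def relative_frequency_ratio(term, medical_tokens, general_tokens):
    """Ratio of the term's smoothed relative frequency in a medical corpus
    to its smoothed relative frequency in a general English corpus.

    Add-one smoothing (an assumption of this sketch) avoids division by
    zero for terms absent from the general corpus. Values well above 1
    suggest the term is characteristic of the medical corpus.
    """
    medical_counts = Counter(medical_tokens)
    general_counts = Counter(general_tokens)
    medical_rate = (medical_counts[term] + 1) / (len(medical_tokens) + 1)
    general_rate = (general_counts[term] + 1) / (len(general_tokens) + 1)
    return medical_rate / general_rate


if __name__ == "__main__":
    docs = [["stenosis", "patient"], ["patient"], ["valve", "stenosis"], ["report"]]
    medical = ["stenosis", "patient", "stenosis", "valve"]
    general = ["patient", "house", "car", "tree"]
    print(document_dispersion("stenosis", docs))
    print(relative_frequency_ratio("stenosis", medical, general))
```

A candidate that is widely dispersed in medical documents and far more frequent there than in general English is a stronger term candidate than one that scores well on only one measure.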

In the first year, we completed the following for echocardiogram video analysis:

1. a preliminary study, carried out together with echocardiography specialists from CPMC, to acquire domain knowledge about typical recording procedures, important video views, and diagnosis report structure;
2. an automatic video view segmentation algorithm and its software implementation. Our algorithm is unique in combining domain-specific syntactic structure with low-level video features (such as color, window shape, and object). It achieves satisfactory accuracy (more than 90%) in recognizing constituent views in echocardiogram video. However, it does not run in real time.
3. an algorithm that uses the EKG chart embedded in the echo video to automatically extract the key frame of each view. Our algorithm takes a unique approach that exploits multi-modal information. It detects the peak points in the EKG (dynamic graphic data) to determine the end-diastole view (image) within each heart pumping cycle in the video.
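The key-frame idea in item 3 can be sketched as follows, assuming the EKG trace has already been recovered from the video as a one-dimensional list of samples. The thresholded local-maximum test, the function names, and the fixed `samples_per_frame` mapping are simplifications introduced for this example and do not describe our actual implementation.

```python
def find_ekg_peaks(signal, threshold):
    """Return indices of local maxima in the EKG trace at or above threshold.

    On a flat-topped peak only the first sample is reported, because the
    test requires a strict rise from the previous sample.
    """
    peaks = []
    for i in range(1, len(signal) - 1):
        rises = signal[i] > signal[i - 1]
        does_not_rise_after = signal[i] >= signal[i + 1]
        if signal[i] >= threshold and rises and does_not_rise_after:
            peaks.append(i)
    return peaks


def key_frame_indices(signal, threshold, samples_per_frame):
    """Map each detected peak to the video frame displayed at that moment.

    samples_per_frame is a hypothetical constant: the number of EKG
    samples that correspond to one video frame.
    """
    return [peak // samples_per_frame for peak in find_ekg_peaks(signal, threshold)]


if __name__ == "__main__":
    trace = [0, 0, 1, 5, 1, 0, 0, 0, 1, 6, 1, 0]
    print(find_ekg_peaks(trace, threshold=3))
    print(key_frame_indices(trace, threshold=3, samples_per_frame=2))
```

Each detected peak marks one heart cycle, so the procedure yields one candidate key frame per cycle.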

The goals for the next year include improving the speed and accuracy of the video view segmentation software. Our goal is real-time processing of video and near-perfect accuracy. We are confident we can achieve near-perfect accuracy by exploiting the highly consistent video capturing rules followed in practical clinical situations. We are also developing tools for linking findings in textual reports to individual views in video. We are working with Dr. Henry Wu (M.D.) of CPMC's echocardiography department and Dr. Carol Friedman of the medical informatics department to build the initial test set for the echocardiogram study, consisting of video and associated textual reports. We will develop an initial set of standard-case video to be used in PERSIVAL. Representative views and video skims of each important abnormal or normal case will be indexed and randomly accessible. In addition, we will include a small set of video studies of actual patients in PERSIVAL. Finally, we will start the collection and analysis of multimedia content from the World Wide Web related to cardiology. We will first manually analyze the distribution and characteristics of such content and identify classification algorithms for suitable categories. Our goal is to integrate multimedia content from external sources with content from specialized sources such as the echocardiogram video collection at CPMC.

Noemie Elhadad
2000-08-01