Current and Past Research Activities

I am currently in the Parallel and Distributed Intelligent Systems Laboratory (PI: Sal Stolfo), Computer Science Department, Columbia University. We are doing research in data mining and meta-learning. We are developing JAM (Java Agents for Meta-learning), an infrastructure that supports collaborative learning over distributed databases. We are applying JAM to fraud and intrusion detection.

Ph.D. Thesis: A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems
My thesis research automates the development process for Intrusion Detection Systems (IDSs). I designed and developed a data mining framework for adaptively building intrusion detection models. The central idea is to use system audit programs to extract an extensive set of features that describe each network connection or host session, and then apply data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities. These rules are then automatically converted into executable modules for real-time intrusion detection. Detection models for new intrusions or for specific (new) components of a network system are incorporated into an existing IDS through a meta-learning (or co-operative learning) process, which produces a meta detection model that combines evidence from multiple models. To efficiently compute only the "useful" patterns from the large amount of audit data, I modified the basic association rules and frequent episodes algorithms in two ways: they use axis attribute(s) and reference attribute(s) as item constraints that encode domain knowledge, and they use an iterative level-wise approximate mining procedure to uncover low-frequency but important patterns.
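
As a rough illustration of the item-constraint idea, the sketch below counts frequent itemsets over connection records but keeps only those itemsets that include a value of a designated axis attribute. It is a simplified, brute-force stand-in, not the modified algorithms from the thesis: the function name `frequent_itemsets`, the record fields (`service`, `flag`, `src_bytes`), and the reading of "axis attribute" as "every reported itemset must contain it" are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(records, axis, min_support, max_size=3):
    """Return {itemset: support} for itemsets that contain an axis-attribute item.

    Hypothetical helper: brute-force level-wise counting with no candidate
    pruning, intended only to show the effect of the axis constraint.
    """
    n = len(records)
    # Each record becomes a set of (attribute, value) items.
    transactions = [frozenset(r.items()) for r in records]
    result = {}
    for size in range(1, max_size + 1):
        counts = {}
        for t in transactions:
            for combo in combinations(sorted(t), size):
                # Item constraint: skip itemsets without an axis-attribute item.
                if not any(attr == axis for attr, _ in combo):
                    continue
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
        level = {k: c / n for k, c in counts.items() if c / n >= min_support}
        if not level:
            break  # no frequent itemsets at this size, so stop growing
        result.update(level)
    return result


# Hypothetical connection records with discretized feature values.
records = [
    {"service": "http", "flag": "SF", "src_bytes": "low"},
    {"service": "http", "flag": "SF", "src_bytes": "low"},
    {"service": "http", "flag": "S0", "src_bytes": "zero"},
    {"service": "smtp", "flag": "SF", "src_bytes": "low"},
]

result = frequent_itemsets(records, axis="service", min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data the pair (flag=SF, src_bytes=low) occurs in three of the four records, yet it is never reported because it says nothing about the axis attribute `service`. This is the sense in which the constraint restricts mining to patterns considered useful.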

Recently, I participated in the 1998 DARPA Intrusion Detection Evaluation program. The results showed that our system was one of the best IDSs among those submitted to the evaluation. It performed comparably to the best knowledge-engineered system. The detection models (classification rules) automatically constructed by our data mining framework were very effective (with high detection rates and low false positive rates) in detecting "known" intrusions (with instances in the training data) and "new" intrusions (with no instances in the training data) in several attack categories.

During Summer 1997, I was at the IBM T. J. Watson Research Center, doing research in Information Economy. I implemented a prototype multi-agent system to simulate the market dynamics of information filtering.

During Summer 1996, I was in the Network Services Research Lab, AT&T Labs - Research, Murray Hill, New Jersey, where I did research in distributed data visualization environments. I designed and implemented a Java-based system for drawing and viewing directed acyclic graphs (DAGs).

From Fall 1994 through Spring 1996, I was in the Programming Systems Laboratory (PI: Gail Kaiser), Computer Science Department, Columbia University. I did research in software development environments and collaborative workflow systems. I developed several modules of Oz, a workflow system, and applied Oz technologies to healthcare.