Technical Reports | Department of Computer Science, Columbia University

Search By Year

Title	Authors	Published	Abstract	Publication Details
From Brain-Computer Interfaces to AI-Enhanced Diagnostics: Developing Cutting-Edge Tools for Medical and Interactive Technologies	Haowen Wei	2024-06-24	This thesis presents a series of studies that explore advanced computational techniques and interfaces in the domain of human-computer interaction (HCI), specifically focusing on brain-computer interfaces (BCIs), vision transformers for medical diagnosis, and eye-tracking input systems. The first study introduces PhysioLabXR, a Python platform designed for real-time, multi-modal BCI and extended reality experiments. This platform enhances the interaction in neuroscience and HCI by integrating physiological signals with computational models, supporting sophisticated data analysis and visualization tools that cater to a wide range of experimental needs. The second study delves into the application of vision transformers to the medical field, particularly for glaucoma diagnosis. We developed an expert knowledge-distilled vision transformer that leverages deep learning to analyze ocular images, providing a highly accurate and non-invasive tool for detecting glaucoma, thereby aiding in early diagnosis and treatment strategies. The third study explores SwEYEpe, an innovative eye-tracking input system designed for text entry in virtual reality (VR) environments. By leveraging eye movement data to predict text input, SwEYEpe offers a novel method of interaction that enhances user experience by minimizing physical strain and optimizing input efficiency in immersive environments. Together, these studies contribute to the fields of HCI and medical informatics by providing robust tools and methodologies that push the boundaries of how we interact with and through computational systems. This thesis not only demonstrates the feasibility of these advanced technologies but also paves the way for future research that could further integrate such systems into everyday applications for enhanced interaction and diagnostic processes.	(pdf) (ps)
Sample efficient learning of alternative splicing with active learning and deep generative models	Ting Chen	2024-05-06	Recent works have utilized massively parallel reporter assays (MPRAs) which combine high throughput sequencing with extensive sequence variation in order to measure the regulatory activity of sequences. However, these studies often rely on generating and assaying large libraries of randomized sequences. In this work, we look to develop data efficient methods for designing MPRAs by utilizing uncertainty estimates in an active learning setting. These uncertainty estimates are then used to identify subsequent sequences of interest to test and label. We evaluate our methods in an alternative splicing prediction task and overall find that Gaussian Process (GP) based methods outperform other baselines that utilize lower quality uncertainty estimates. We also present a preliminary findings motivating the use of deep generative models to produce synthetic candidate sequences to assay.	(pdf) (ps)
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning	Ryan Shea	2024-01-07	Maintaining a consistent persona is a key quality for any open domain dialogue system. Current state-of-the-art systems do this by training agents with supervised learning or online reinforcement learning (RL). However, systems trained with supervised learning often lack consistency as they are never punished for uttering contradictions. Additional training with RL can alleviate some of these issues, however the training process is expensive. Instead, we propose an offline RL framework to improve the persona consistency of dialogue systems. Our framework allows us to combine the advantages of previous methods as we can inexpensively train our model on existing data as in supervised learning, while punishing and rewarding specific utterances as in RL. We also introduce a simple importance sampling method to reduce the variance of importance weights in offline RL training which we call Variance-Reducing MLE-Initialized (VaRMI) importance sampling. Our automatic and human evaluations show that our framework improves both the persona consistency and dialogue quality of a state-of-the-art social chatbot.	(pdf) (ps)
Beyond Split-Reads: Leveraging Pseudoalignment for Enhanced Splicing Event Detection in Low-Coverage Data with Leafcutter	Xingpei Zhang	2024-01-05	The precise definition and detection of alternative splicing (AS) have long posed challenges for researchers. Previous studies have introduced various methods to address this issue, including rMATS, DEXSeq, and Leafcutter (Anders et al., 2012; Shen et al., 2014; Li et al., 2018). Leafcutter, in particular, uses intron excision ratios to identify differential splicing events. While these methods perform adequately on high-coverage bulk RNA-seq data, their performance deteriorates on low-coverage data, particularly single-cell data, due to limited supporting reads for differential splicing events. To tackle this issue, we leverage Salmon, a pseudoalignment tool, to enhance the detection capabilities of Leafcutter in low-coverage bulk and single-cell pseudobulk data (Patro et al., 2017; Li et al., 2018). Here, we present LeafcutterITI (for “Isoform to Intron”), a modified version of Leafcutter, to quantify introns from the abundance of corresponding isoforms, rather than relying on split reads alone. This method permits more reads to support the intron excision events than what a split-read approach can offer. We validate the performance of this modification through simulations and on real data. Lastly, we analyze single-cell pseudobulk data from the Tabula Muris project (Tabula Muris Consortium et al., 2018) for seven non-myeloid cell types in the mouse brain and identify the intron excision events that are differential used across these cell types.	(pdf) (ps)
Computer Vision-Powered Applications for Interpreting and Interacting with Movement	Basel Nitham Hindi	2023-12-24	Movement and our ability to perceive it are core elements of the human experience. To bridge the gap between artificial intelligence research and the daily lives of people, this thesis explores leveraging advancements in the field of computer vision to enhance human experiences related to movement. Through two projects, I leverage computer vision to aid Blind and Low Vision (BLV) people in perceiving sports gameplay, and provide navigation assistance for pedestrians in outdoor urban environments. I present Front Row, a system that enables BLV viewers to interpret tennis matches through immersive audio cues, along with StreetNav, a system that repurposes street cameras for real-time, precise outdoor navigation assistance and environmental awareness. User studies and technical evaluations demonstrate the potential of these systems in augmenting people’s experiences perceiving and interacting with movement. This exploration also uncovers challenges in deploying such solutions along with opportunities in the design of future technologies.	(pdf) (ps)
Advancing Few-Shot Multi-Label Medication Prediction in Intensive Care Units: The FIDDLE-Rx Approach	Xinghe Chen	2023-12-04	Contemporary intensive care units (ICUs) are navigating the challenge of enhancing medical service quality amidst financial and resource constraints. Machine learning models have surfaced as valuable tools in this context, showcasing notable effectiveness in supporting healthcare delivery. Despite advancements, a gap remains in real-time medical interventions. To bridge this gap, we introduce FIDDLE-Rx, a novel, data-driven machine learning approach designed specifically for real-time medication recommendations in ICUs. This method leverages the eICU Collaborative Research Database (eICU-CRD) for its analysis, which encompasses diverse electronic health records from ICUs (ICU-EHRs) sourced from multiple critical care centers across the US. FIDDLE-Rx employs the Flexible Data-Driven Pipeline (FIDDLE) for transforming tabular data into binary matrix representations and standardizes medication labels using the RxNorm (Rx) API. With the processed dataset, FIDDLE-Rx applies various machine learning models to forecast the requirements for 238 medications. Compared with previous studies, FIDDLE-Rx stands out by extending the scope of the research of ICU-EHRs beyond mortality prediction, offering a more comprehensive approach to enhancing critical care. The experimental results of our models demonstrate high efficacy, evidenced by their impressive performance across two key metrics: the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Remarkably, these results were achieved even when the model was trained with just 20% of the database, underlining its strong generalizability. By broadening the scope of ICU-EHRs research to encompass real-time medication recommendations, FIDDLE-Rx presents a scalable and effective solution for improving patient care in intensive care environments.	(pdf) (ps)
Koopman Constrained Policy Optimization: A Koopman operator theoretic method for differentiable optimal control in robotics	Matthew Retchin	2023-05-10	Deep reinforcement learning has recently achieved state-of-the-art results for robotic control. Robots are now beginning to operate in unknown and highly nonlinear environments, expanding their usefulness for everyday tasks. In contrast, classical control theory is not suitable for these unknown, nonlinear environments. However, it retains an immense advantage over traditional deep reinforcement learning: guaranteed satisfaction of hard constraints, which is critically important for the performance and safety of robots. This thesis introduces Koopman Constrained Policy Optimization (KCPO), combining implicitly differentiable model predictive control with a deep Koopman autoencoder. KCPO brings new optimality guarantees to robot learning in unknown and nonlinear dynamical systems. The use of KCPO is demonstrated in Simple Pendulum and Cartpole with continuous state and action spaces and unknown environments. KCPO is shown to be able to train policies end-to-end with hard box constraints on controls. Compared to several baseline methods, KCPO exhibits superior generalization to constraints that were not part of its training.	(pdf) (ps)
Formal Verification of a Multiprocessor Hypervisor on Arm Relaxed Memory Hardware	Runzhou Tao, Jianan Yao, Xupeng Li, Shih-Wei Li, Jason Nieh, Ronghui Gu	2021-06-01	As Arm servers are increasingly used by cloud providers, the complexity of its system software, such as operating systems and hypervisors, poses a growing security risk, as large codebases contain many vulnerabilities. While formal verification offers a potential solution for secure concurrent systems software, existing approaches have not been able to prove the correctness of systems software on Arm relaxed memory hardware. We introduce VRM, a new framework that can be used to verify kernel-level system software which satisfies a set of synchronization and memory access properties such that these programs can be mostly verified on a sequentially consistent hardware model and the proofs will automatically hold on Arm relaxed memory hardware. VRM can be used to verify concurrent kernel code that is not data race free, which is typical for kernel code responsible for managing shared page tables. Using VRM, we prove for the first time the security guarantees of a retrofitted implementation of the Linux KVM multiprocessor hypervisor on Arm. For multiple versions of KVM, we prove KVM’s security properties on a sequentially consistent model, then prove that KVM satisfies VRM’s required program properties such that its security proofs hold for Arm relaxed memory hardware. Our experimental results across multiple verified KVM versions show that the retrofit does not adversely affect the scalability of verified KVM, as it performs comparably to unmodified KVM when running many virtual machines concurrently with real application workloads on Arm server hardware.	(pdf) (ps)
Topics in Landmarking and Elementwise Mapping	Mehmet Kerem Turkcan	2021-04-12	In this thesis, we consider a number of different landmarking and elementwise mapping problems and propose solutions that are thematically interconnected with each other. We consider diverse problems ranging from landmarking to deep dictionary learning, pan-sharpening, compressive sensing magnetic resonance imaging and microgrid control, introducing novelties that go beyond the state of the art for the problems we discuss. We start by introducing a manifold landmarking approach trainable via stochastic gradient descent that allows for the consideration of structural regularization terms in the objective. We extend the approach for semi-supervised learning problems, showing that it is able to achieve comparable or better results than equivalent $k$-means based approaches. Inspired by these results, we consider an extension of this approach for general supervised and semi-supervised classification for structurally similar deep neural networks with self-modulating radial basis kernels. Secondly, we consider convolutional networks that perform image-to-image mappings for the problems of pan-sharpening and compressive sensing magnetic resonance imaging. Using extensions of deep state of the art image-to-image mapping architectures specifically tailored for these problems, we show that they could be addressed naturally and effectively. After this, we move on to describe a method for multilayer dictionary learning and feedforward sparse coding by formulating the dictionary learning problem using a general deep learning layer architecture inspired by analysis dictionary learning. We find this method to be significantly faster to train than classical online dictionary learning approaches and capable of addressing supervised and semi-supervised classification problems more naturally. Lastly, we look at the problem of per-user power supply delivery on a microgrid powered by solar energy. Using real-world data obtained via The Earth Institute, we consider the problem of deciding the amount of power to supply to all each user for a certain period of time given their current power demand as well as past demand/supply data. We approach the problem as one of demand-to-supply mapping, providing results for a policy network trained via regular propagation for worst-case control and classical deep reinforcement learning.	(pdf) (ps)
SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation	Yiqing Liang, Boyuan Chen, Shuran Song	2021-04-06	This thesis focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment. To complete this task, the algorithm should simultaneously locate and navigate to an instance of the category. In comparison to the traditional point goal navigation, this task requires the agent to have a stronger contextual prior to indoor environments. This thesis introduces SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent’s navigation planning. Given a partial observation of the environment, SSCNav first infers a complete scene representation with semantic labels for the unobserved scene together with a confidence map associated with its own prediction. Then, a policy network infers the action from the scene completion result and confidence map. The experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies. Code and data: https://sscnav.cs.columbia.edu/	(pdf) (ps)
Semantic Controllable Image Generation in Few-shot Settings	Jianjin Xu	2021-04-06	Generative Adversarial Networks (GANs) are able to generate high-quality images, but it remains difficult to explicitly specify the semantics of synthesized images. In this work, we aim to better understand the semantic representation of GANs, and thereby enable semantic control in GAN’s generation process. Interestingly, we find that a well-trained GAN encodes image semantics in its internal feature maps in a surprisingly simple way: a linear transformation of feature maps suffices to extract the generated image semantics. To verify this simplicity, we conduct extensive experiments on various GANs and datasets; and thanks to this simplicity, we are able to learn a semantic segmentation model for a trained GAN from a small number (e.g., 8) of labeled images. Last but not least, leveraging our findings, we propose two few-shot image editing approaches, namely Semantic-Conditional Sampling and Semantic Image Editing. Given an unsupervised GAN and as few as eight semantic annotations, the user is able to generate diverse images subject to a user-provided semantic layout, and control the synthesized image semantics.	(pdf) (ps)
SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation	Yiqing Liang, Boyuan Chen, Shuran Song	2021-03-30	This paper focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment. To complete this task, the algorithm should simultaneously locate and navigate to an instance of the category. In comparison to the traditional point goal navigation, this task requires the agent to have a stronger contextual prior to indoor environments. We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning. Given a partial observation of the environment, SSCNav first infers a complete scene representation with semantic labels for the unobserved scene together with a confidence map associated with its own prediction. Then, a policy network infers the action from the scene completion result and confidence map. Our experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies. Code and data: https://sscnav.cs.columbia.edu/	(pdf) (ps)
Using Continous Logic Networks for Hardware Allocation	Anthony Saieva, Dennis Roellke, Suman Jana, Gail Kaiser	2020-08-04	Increased production efficiency combined with a slowdown in Moore's law and the end of Dennard scaling have made hardware accelerators increasingly important. Accelerators have become available on many different systems from the cloud to embedded systems. This modern computing paradigm makes specialized hardware available at scale in a way it never has before. While accelerators have shown great efficiency in terms of power consumption and performance, matching software functions with the best available hardware remains problematic without manual selection. Since there is some software representation of each accelerator's function, selection can be automated via code analysis. Static similarity analysis has traditionally been based on solving satisfiable modulo theorems (SMT), but continuous logic networks (CLNs) have provided a faster and more efficient alternative to traditional SMT-solving by replacing boolean functions with smooth estimations. These smooth estimates create the opportunity to leverage gradient descent to learn the solution. We present AccFinder, the first CLN-based code similarity solution and evaluate its effectiveness on a realistically complex accelerator benchmark.	(pdf) (ps)
SABER: Identifying SimilAr BEhavioR for Program Comprehension	Aditya Sridhar, Guanming Qiao, Gail Kaiser	2020-07-06	Modern software engineering practices rely on program comprehension as the most basic underlying component for improving developer productivity and software reliability. Software developers are often tasked to work with unfamiliar code in order to remove security vulnerabilities, port and refactor legacy code, and enhance software with new features desired by users. Automatic identification of behavioral clones, or behaviorally-similar code, is one program comprehension technique that can provide developers with assistance. The idea is to identify other code that "does the same thing" and that may be more intuitive; better documented; or familiar to the developer, to help them understand the code at hand. Unlike the detection of syntactic or structural code clones, behavioral clone detection requires executing workloads or test cases to find code that executes similarly on the same inputs. However, a key problem in behavioral clone detection that has not received adequate attention is the "preponderance of the evidence" problem, which advocates for more convincing evidence from nontrivial test case executions to gain confidence in the behavioral similarities. In other words, similar outputs for some inputs matter more than for others. We present a novel system, SABER, to address the "preponderance of the evidence" problem, for which we adapt the legal metaphor of "more likely to be true than not true" burden of proof. We develop a novel test case generation methodology with three primary dynamic analysis techniques for identifying important behavioral clones. Further, we investigate filtering and weighting schemes to guide developers toward the most convincing behavioral similarities germane to specific software engineering tasks, such as code review, debugging, and introducing new features.	(pdf) (ps)
A Secure and Formally Verified Linux KVM Hypervisor	Shih-Wei Li, Xupeng Li, Ronghui Gu, Jason Nieh, John Zhuang Hui	2020-06-27	Commodity hypervisors are widely deployed to support virtual machines (VMs) on multiprocessor hardware. Their growing complexity poses a security risk. To enable formal verification over such a large code base, we present MicroV, a microverification approach for verifying commodity multiprocessor hypervisors. MicroV retrofits an existing, full-featured hypervisor, into a large collection of untrusted hypervisor services, and a small, verifiable hypervisor core. MicroV introduces security-preserving layers to gradually prove that the implementation of the core refines its high-level layered specification, and ensure that security guarantees proven at the top layer are propagated down through all the layers, such that they hold for the entire implementation. MicroV supports proving noninterference in the sanctioned presence of encrypted data sharing, using data oracles to distinguish between intentional and unintentional information flow. Using MicroV, we retrofitted the Linux KVM hypervisor with only modest modifications to its code base and verify in Coq that the retrofitted KVM protects the confidentiality and integrity of VM data. Our work is the first machine-checked security proof for a commodity multiprocessor hypervisor.	(pdf) (ps)
Ad hoc Test Generation Through Binary Rewriting	Anthony Saieva, Gail Kaiser	2020-04-09	When a security vulnerability or other critical bug is not detected by the developers’ test suite, and is discovered post-deployment, developers must quickly devise a new test that reproduces the buggy behavior. Then the developers need to test whether their candidate patch indeed fixes the bug, without breaking other functionality, while racing to deploy before cyberattackers pounce on exposed user installations. This can be challenging when the bug discovery was due to factors that arose, perhaps transiently, in a specific user environment. If recording execution traces when the bad behavior occurred, record-replay technology faithfully replays the execution, in the developer environment, as if the program were executing in that user environment under the same conditions as the bug manifested. This includes intermediate program states dependent on system calls, memory layout, etc. as well as any externally-visible behavior. So the bug is reproduced, and many modern record-replay tools also integrate bug reproduction with interactive debuggers to help locate the root cause, but how do developers check whether their patch indeed eliminates the bug under those same conditions? State-of-the-art record-replay does not support replaying candidate patches that modify the program in ways that diverge program state from the original recording, but successful repairs necessarily diverge so the bug no longer manifests. This work builds on recordreplay, and binary rewriting, to automatically generate and run tests for candidate patches. These tests reflect the arbitrary (ad hoc) user and system circumstances that uncovered the vulnerability, to check whether a patch indeed closes the vulnerability but does not modify the corresponding segment of the program’s core semantics. Unlike conventional ad hoc testing, each test is reproducible and can be applied to as many prospective patches as needed until developers are satisfied. The proposed approach also enables users to make new recordings of her own workloads with the original version of the program, and automatically generate and run the corresponding ad hoc tests on the patched version, to validate that the patch does not introduce new problems before adopting.	(pdf) (ps)
Privacy Threats from Seemingly Innocuous Sensors	Shirish Singh, Anthony Saieva, Gail Kaiser	2020-04-09	Smartphones incorporate a plethora of diverse and powerful sensors that enhance user experience. Two such sensors are the accelerometer and gyroscope, which measure acceleration in all three spatial dimensions and rotation along the three axes of the smartphone, respectively. These sensors are often used by gaming and fitness apps. Unlike other sensors deemed to carry sensitive user data, such as GPS, camera, and microphone, the accelerometer and gyroscope do not require user permission on Android to transmit data to apps. This paper presents our IRB-approved studies showing that the accelerometer and gyroscope gather sufficient data to quickly infer the user's gender. We started with 33 in-person participants, with 88% accuracy, and followed up with 259 on-line participants to show the effectiveness of our technique. Our unobtrusive ShakyHands technique draws on these sensors to deduce additional demographic attributes that might be considered sensitive information, notably pregnancy. We have implemented ShakyHands for Android as an app, available from Google Play store, and as a Javascript browser web-app for Android and iOS smartphones. We show that even a low-skilled attacker, without expertise in signal processing or deep learning, can succeed at inferring demographic information such as gender and pregnancy. Our approach does not require tampering with the victim's device or specialized hardware; all our study participants used their own phones.	(pdf) (ps)
The FHW Project: High-Level Hardware Synthesis from Haskell Programs	Stephen A. Edwards	2019-08-04	The goal of the FHW project was to produce a compiler able to translate programs written in a functional language (we chose Haskell) into synthesizable RTL (we chose SystemVerilog) suitable for execution on an FPGA or ASIC that was highly parallel. We ultimately produced such a compiler, relying on the Glasgow Haskell Compiler (GHC) as a front-end and writing our own back-end that performed a series of lowering transformations to restructure such constructs as recursion, polymorphism, and frst-order functions, into a form suitable for hardware, then transform the now-restricted functional IR into a datafow representation that is then finally transformed into synthesizable SystemVerilog.	(pdf) (ps)
Compiling Irregular Software to Specialized Hardware	Richard Townsend	2019-06-05	High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an accelerator’s behavior in a “high-level” language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications. In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism. This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before.	(pdf) (ps)
Extractive Text Summarization Methods Inspired By Reinforcement Learning for Better Generalization	Yan Virin	2019-05-23	This master thesis opens with a description of several text summarization methods based on machine learning approaches inspired by reinforcement learning. While in many cases Maximum Likelihood Estimation (MLE) approaches work well for text summarization, they tend to suffer from poor generalization. We show that techniques which expose the model to more opportunities to learn from data tend to generalize better and generate summaries with less lead bias. In our experiments we show that out of the box these new models do not perform significantly better than MLE when evaluated using Rouge, however do possess interesting properties which may be used to assemble more sophisticated and better performing summarization systems. The main theme of the thesis is getting machine learning models to generalize better using ideas from reinforcement learning. We develop a new labeling scheme inspired by Reward Augmented Maximum Likelihood (RAML) methods developed originally for the machine translation task, and discuss how difficult it is to develop models which sample from their own distribution while estimating the gradient e.g. in Minimum Risk Training (MRT) and Reinforcement Learning Policy Gradient methods. We show that RAML can be seen as a compromise between direct optimization of the model towards optimal expected reward using Monte Carlo methods which may fail to converge, and standard MLE methods which fail to explore the entire space of summaries, overfit during training by capturing prominent position features and thus perform poorly on unseen data. To that end we describe and show results of domain transfer experiments, where we train the model on one dataset and evaluate on another, and position distribution experiments, in which we show how the distribution of positions of our models differ from the distribution in MLE. We also show that our models work better on documents which are less lead biased, while standard MLE models get significantly worse performance on those documents in particular. Another topic covered in the thesis is Query Focused text summarization, where a search query is used to produce a summary with the query in mind. The summary needs to be relevant to the query, rather than solely contain important information from the document. We use ii the recently published Squad dataset and adapt it for the Query Focused summarization task. We also train deep learning Query Focused models for summarization and discuss problems associated with that approach. Finally we describe a method to reuse an already trained QA model for the Query Focused text summarization by introducing a reduction of the QA task into the Query Focused text summarization. The source code in python for all the techniques and approaches described in this thesis are available at https://github.com/yanvirin/material.	(pdf) (ps)
Easy Email Encryption with Easy Key Management	John S. Koh, Steven M. Bellovin, Jason Nieh	2018-10-02	Email privacy is of crucial importance. Existing email encryption approaches are comprehensive but seldom used due to their complexity and inconvenience. We take a new approach to simplify email encryption and improve its usability by implementing receiver-controlled encryption: newly received messages are transparently downloaded and encrypted to a locally-generated key; the original message is then replaced. To avoid the problem of users having to move a single private key between devices, we implement per-device key pairs: only public keys need be synchronized to a single device. Compromising an email account or email server only provides access to encrypted emails. We have implemented this scheme for both Android and as a standalone daemon; we show that it works with both PGP and S/MIME, is compatible with widely used email services including Gmail and Yahoo! Mail, has acceptable overhead, and that users consider it intuitive and easy to use.	(pdf) (ps)
Analysis of the CLEAR Protocol per the National Academies' Framework	Steven M. Bellovin, Matt Blaze, Dan Boneh, Susan Landau, Ronald L. Rivest	2018-05-10	The debate over "exceptional access"--the government’s ability to read encrypted data--has been going on for many years and shows no signs of resolution any time soon. On the one hand, some people claim it can be accomplished safely; others dispute that. In an attempt to make progress, a National Academies study committee propounded a framework to use when analyzing proposed solutions. We apply that framework to the CLEAR protocol and show the limitations of the design.	(pdf) (ps)
Robot Learning in Simulation for Grasping and Manipulation	Beatrice Liang	2018-05-10	Teaching a robot to acquire complex motor skills in complicated environments is one of the most ambitious problems facing roboticists today. Grasp planning is a subset of this problem which can be solved through complex geometric and physical analysis or computationally expensive data driven analysis. As grasping problems become more difficult, building analytical models becomes challenging. Consequently, we aim to learn a grasping policy through a simulation-based data driven approach. In this paper, we create and execute tests to evaluate simulator’s suitability for manipulating objects in highly constrained settings. We investigate methods for creating forward models of a robot’s dynamics, and apply a Model Free Reinforcement Learning approach with the goal of developing a grasping policy based solely on proprioception.	(pdf) (ps)
Partial Order Aware Concurrency Sampling	Xinhao Yuan, Junfeng Yang, Ronghui Gu	2018-04-15	We present POS, a concurrency testing approach that directly samples the partial orders of a concurrent program. POS uses a novel priority-based scheduling algorithm that naturally considers partial order information dynamically, and guarantees that each partial order will be explored with significant probability. This probabilistic guarantee of error detection is exponentially better than state-of-the-art sampling approaches. Besides theoretical guarantees, POS is extremely simple and lightweight to implement. Evaluations show that POS is effective in covering the partial-order space of micro-benchmarks and finding concurrency bugs in real-world programs such as Firefox’s JavaScript engine SpiderMonkey.	(pdf) (ps)
Stretchcam: zooming using thin, elastic optics	Daniel Sims, Oliver Cossairt, Yonghao Yue, Shree Nayar	2017-12-31	Stretchcam is a thin camera with a lens capable of zooming with small actuations. In our design, an elastic lens array is placed on top of a sparse, rigid array of pixels. This lens array is then stretched using a small mechanical motion in order to change the field of view of the system. We present in this paper the characterization of such a system and simulations which demonstrate the capabilities of stretchcam. We follow this with the presentation of images captured from a prototype device of the proposed design. Our prototype system is able to achieve 1.5 times zoom when the scene is only 300 mm away with only a 3% change of the lens array’s original length.	(pdf) (ps)
Design and Implementation of IoT Android Commissioner	Andy Lianghua Xu, Jan Janak, Henning Schulzrinne	2017-09-21	As Internet of Things (IoT) devices gain more popularity, device management gradually becomes a major issue to IoT device users. To manage an IoT device, the user first needs to join it to an existing network. Then, the IoT device has to be authenticated by the user. The authentication process often requires a two-way communication between the new device and a trusted entity, which is typically a hand- held device owned by the user. To ease and standardize this process, we present the Device Enrollment Protocol (DEP) as a solution to the enrollment problem described above. Starting from DEP, we then showcase the design of an IoT device commissioner and its prototype implementation on Android, named Android Commissioner. The application allows the user to authenticate IoT devices and join them to an existing protected network.	(pdf) (ps)
Searching for Meaning in RNNs using Deep Neural Inspection	Kevin Lin, Eugene Wu	2017-06-01	Recent variants of Recurrent Neural Networks (RNNs)---in particular, Long Short-Term Memory (LSTM) networks---have established RNNs as a deep learning staple in modeling sequential data in a variety of machine learning tasks. However, RNNs are still often used as a black box with limited understanding of the hidden representation that they learn. Existing approaches such as visualization are limited by the manual effort to examine the visualizations and require considerable expertise, while neural attention models change, rather than interpret, the model. We propose a technique to search for neurons based on existing interpretable models, features, or programs.	(pdf)
Reliable Synchronization in Multithreaded Servers	Rui Gu	2017-05-15	State machine replication (SMR) leverages distributed consensus protocols such as PAXOS to keep multiple replicas of a program consistent in face of replica failures or network partitions. This fault tolerance is enticing on implementing a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus non-deterministic. Moreover, existing SMR systems provide narrow state machine interfaces to suit specific programs, and it can be quite strenuous and error-prone to orchestrate a general program into these interfaces This paper presents CRANE, an SMR system that trans- parently replicates general server programs. CRANE achieves distributed consensus on the socket API, a common interface to almost all server programs. It leverages deterministic multithreading (specifically, our prior system PARROT) to make multithreaded replicas deterministic. It uses a new technique we call time bubbling to efficiently tackle a difficult challenge of non-deterministic network input timing. Evaluation on five widely used server programs (e.g., Apache, ClamAV, and MySQL) shows that CRANE is easy to use, has moderate overhead, and is robust.	(pdf)
Deobfuscating Android Applications through Deep Learning	Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Baishakhi Ray	2017-05-12	Android applications are nearly always obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables, modifying the control flow of methods, or inserting additional code. Prior approaches toward automated deobfuscation of Android applications have relied on certain structural parts of apps remaining as landmarks, un-touched by obfuscation. For instance, some prior approaches have assumed that the structural relation- ships between identifiers (e.g. that A represents a class, and B represents a field declared directly in A) are not broken by obfuscators; others have assumed that control flow graphs maintain their structure (e.g. that no new basic blocks are added). Both approaches can be easily defeated by a motivated obfuscator. We present a new approach to deobfuscating Android apps that leverages deep learning and topic modeling on machine code, MACNETO. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform, and we show that it has high precision when applied to two different state-of-the-art obfuscators: ProGuard and Allatori.	(pdf) (ps)
Analysis of Super Fine-Grained Program Phases	Van Bui, Martha A. Kim	2017-04-18	Dynamic reconfiguration systems guided by coarse-grained program phases has found success in improving overall program performance and energy efficiency. These performance/energy savings are limited by the granularity that program phases are detected since phases that occur at a finer granularity goes undetected and reconfiguration opportunities are missed. In this study, we detect program phases using interval sizes on the order of tens, hundreds, and thousands of program cycles. This is in stark contrast with prior phase detection studies where the interval size is on the order of several thousands to millions of cycles. The primary goal of this study is to begin to fill a gap in the literature on phase detection by characterizing super fine-grained program phases and demonstrating an application where detection of these relatively short-lived phases can be instrumental. Traditional models for phase detection including basic block vectors and working set signatures are used to detect super fine-grained phases as well as a less traditional model based on microprocessor activity. Finally, we show an analytical case study where super fine-grained phases are applied to voltage and frequency scaling optimizations.	(pdf)
Understanding and Detecting Concurrency Attacks	Rui Gu, Bo Gan, Jason Zhao, Yi Ning, Heming Cui, Junfeng Yang	2016-12-30	Just like bugs in single-threaded programs can lead to vulnerabilities, bugs in multithreaded programs can also lead to concurrency attacks. Unfortunately, there is little quantitative data on how well existing tools can detect these attacks. This paper presents the first quantitative study on concurrency attacks and their implications on tools. Our study on 10 widely used programs reveals 26 concurrency attacks with broad threats (e.g., OS privilege escalation), and we built scripts to successfully exploit 10 attacks. Our study further reveals that, only extremely small portions of inputs and thread interleavings (or schedules) can trigger these attacks, and existing concurrency bug detectors work poorly because they lack help to identify the vulnerable inputs and schedules. Our key insight is that the reports in existing detectors have implied moderate hints on what inputs and schedules will likely lead to attacks and what will not (e.g., benign bug reports). With this insight, this paper presents a new directed concurrency attack detection approach and its implementation, OWL. It extracts hints from the reports with static analysis, augments existing detectors by pruning out the benign inputs and schedules, and then directs detectors and its own runtime vulnerability verifiers to work on the remaining, likely vulnerable inputs and schedules. Evaluation shows that OWL reduced 94.3% reports caused by benign inputs or schedules and detected 7 known concurrency attacks. OWL also detected 3 previously unknown concurrency attacks, including a use-after-free attack in SSDB confirmed as CVE-2016-1000324, an integer overflow, HTML integrity violation in Apache and three new MySQL data races confirmed with bug ID 84064, 84122, 84241. All OWL source code, exploit scripts, and results are available at https://github.com/ruigulala/ConAnalysis.	(pdf) (ps)
Mysterious Checks from Mauborgne to Fabyan	Steven M. Bellovin	2016-11-28	It has long been known that George Fabyan's Riverbank Laboratories provided the U.S. military with cryptanalytic and training services during World War I. The relationship has always be seen as voluntary. Newly discovered evidence raises the question of whether Fabyan was in fact paid, at least in part, for his services, but available records do not provide a definitive answer.	(pdf)
Further Information on Miller's 1882 One-Time Pad	Steven M. Bellovin	2016-11-25	New information has been discovered about Frank Miller's 1882 one-time pad. These documents explain Miller's threat model and show that he had a reasonably deep understanding of the problem; they also suggest that his scheme was used more than had been supposed.	(pdf)
Kensa: Sandboxed, Online Debugging of Production Bugs with No Overhead	Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser	2016-10-27	Short time-to-bug localization and resolution is extremely important for any 24x7 service-oriented application. In this work, we present a novel-mechanism which allows debugging of production systems on-the-fly. We leverage user-space virtualization technology (OpenVZ/LXC), to launch replicas from running instances of a production application, thereby having two containers: prod uction (which provides the real output), and debug (for debugging). The debug container provides a sandbox environment for debugging without any perturbation to the production environment. Customized network-proxy agents asynchronously replicate and replay network inputs from clients to both the production and debug-container, as well as safely discard all network output from the debug-container. We evaluated this low-overhead record and replay technique on five real-world applications, finding that it was effective at reproducing real bugs. In comparison to existing monitoring solutions which can slow-down production applications, Kensa allows application monitoring at â€œzero-overheadâ€.	(pdf) (ps)
Discovering Functionally Similar Code with Taint Analysis	Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Simha Sethumadhavan	2016-09-30	Identifying similar code in software systems can assist many software engineering tasks such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim at detecting code that functions alike. Detecting these functional clones â€” code that functions alike â€” in object oriented languages remains an open question because of the difficulty in exposing and comparing programsâ€™ functionality effectively, in general cases undecidable. We propose a novel technique, In-Vivo Clone Detection, which detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarities between programs based on their inputs and outputs. Further, to identify inputs and outputs of programs appropriately, we use the techniques of static and dynamic data flow analysis. These enhancements mitigate the problems in object oriented languages with respect to identifying program I/Os as reported by prior work. We implement such techniques in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects âˆ¼ 900 and âˆ¼ 2, 000 functional clones by static and dynamic data flow analysis, respectively, across a corpus of 118 projects. In a random sample of the detected clones by the static data flow analysis, HitoshiIO achieves 68+% true positive rate with only 15% false positive rate.	(pdf) (ps)
Heterogeneous Multi-Mobile Computing	Naser AlDuaij, Alexander Van't Hof, Jason Nieh	2016-08-02	As smartphones and tablets proliferate, there is a growing need to provide ways to combine multiple mobile systems into more capable ones, including using multiple hardware devices such as cameras, displays, speakers, microphones, sensors, and input. We call this multi-mobile computing. However, the tremendous device, hardware, and software heterogeneity of mobile systems makes this difficult in practice. We present M2, a system for multi-mobile computing across heterogeneous mobile systems that enable new ways of sharing and combining multiple devices. M2 leverages higher-level device abstractions and encoding and decoding hardware in mobile systems to define a client-server device stack that shares devices seamlessly across heterogeneous systems. M2 introduces device transformation, a new technique to mix and match heterogeneous input and output device data including rich media content. Example device transformations for transparently making existing unmodified apps multi-mobile include fused devices, which combine multiple devices into a more capable one, and translated devices, which can substitute use of one type of device for another. We have implemented an M2 prototype on Android that operates across heterogeneous hardware and software, including multiple versions of Android and iOS devices, the latter allowing iOS users to also run Android apps. Our results using unmodified apps from Google Play show that M2 can enable apps to be combined in new ways, and can run device-intensive apps across multiple mobile systems with modest overhead and qualitative performance indistinguishable from using local device hardware.	(pdf) (ps)
Why Are We Permanently Stuck in an Elevator? A Software Engineering Perspective on Game Bugs	Iris Zhang	2016-06-01	In the past decade, the complexity of video games have increased dramatically and so have the complexity of software systems behind them. The difficulty in designing and testing games invariably leads to bugs that manifest themselves across funny video reels on graphical glitches and millions of submitted support tickets. This paper presents an analysis of game developers and their teams who have knowingly released bugs to see what factors may motivate them in doing so. It examines different development environments as well as inquiring into varied types of game platforms and play-style. Above all, it seeks out how established research on software development best practices and challenges should inform understanding of these bugs. These findings may lead to targeted efforts to mitigate some of the factors leading to glitches, tailored to the specific needs of the game development team.	(pdf)
Software Engineering Methodologies and Life	Scott Lennon	2016-06-01	The paradigms of design patterns and software engineering methodologies are methods that apply to areas outside the software space. As a business owner and student, I implement many software principles daily in both my work and personal life. After experiencing the power of Agile methodologies outside the scope of software engineering, I always think about how I can integrate the computer science skills that I am learning at Columbia in my life. For my study, I seek to learn about other software engineering development processes that can be useful in life. I theorize that if a model such as Agile can provide me with useful tools, then a model that the government and most of the world trusts should have paradigms I can learn with as well. The software model I will study is open source software (OSS). My research examines the lateral software standards of (OSS) and closed source software (CSS). For the scope of this paper, I will focus on research primarily on Linux as the OSS model and Agile as the CSS model. OSS has had an extraordinary impact on the software revolution [1], and CSS models have gained such popularity that itâ€™s paradigms extend far beyond the software engineering space. Before delving into research, I thought the methodologies of OSS and CSS would be radically different. My study shall describe the similarities that exist between these two methodologies. In the process of my research, I was able to implement the values and paradigms that define the OSS development model to work more productively in my business. Software engineering core values and models can be used as a tool to improve our lives.	(pdf)
User Study: Programming Understanding from Similar Code	Anush Ramsurat	2016-06-01	The aim of the user study conducted is primarily threefold: â€¢ To accurately judge, based on a number of parameters, whether showing similar code helps in code comprehension. â€¢ To investigate separately, a number of cases involving dynamic code, static code, the effect of options on accuracy of responses, and so on. â€¢ To distribute the user survey, collect data, responses and feedback from the user study and draw conclusions.	(pdf)
YOLO: A New Security Architecture for Cyber-Physical Systems	Miguel Arroyo, Jonathan Weisz, Simha Sethumadhavan, Hidenori Kobayashi, Junfeng Yang	2016-05-24	Cyber-physical systems (CPS) are defined by their unique characteristics involving both the cyber and physical domains. Their hybrid nature introduces new attack vectors but also provides an opportunity to design of new security architectures. In this work, we present YOLO,--- You Only Live Once --- a security architecture that leverages two unique physical properties of a CPS, inertia: the tendency of objects to stay at rest or in motion, and its built-in reliability to intermittent faults to survive CPS attacks. At a high level, YOLO aims to use a new diversified variant for every new sensor input to the CPS'. The delays involved in YOLO, viz., the delays for rebooting and diversification, are easily absorbed by the CPS because of the inherent inertia and their ability to withstand minor perturbations. We implement YOLO on an open source Car Engine Control Unit, and with measurements from a real race car engine show that YOLO is imminently practical.	(pdf) (ps)
Identifying Functionally Similar Code in Complex Codebases	Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Simha Sethumadhavan	2016-02-18	Identifying similar code in software systems can assist many software engineering tasks, including program understand- ing. While most approaches focus on identifying code that looks alike, some researchers propose to detect instead code that functions alike, which are known as functional clones. However, previous work has raised the technical challenges to detect these functional clones in object oriented languages such as Java. We propose a novel technique, In-Vivo Clone Detection, a language-agnostic technique that detects functional clones in arbitrary programs by observing and mining inputs and outputs. We implemented this technique targeting programs that run on the JVM, creating HitoshiIO (available freely on GitHub), a tool to detect functional code clones. Our experimental results show that it is powerful in detecting these functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when there are only very few inputs available. In a random sample of the detected clones, HitoshiIO achieves 68+% true positive rate, while the false positive rate is only 15%.	(pdf)
Cambits: A Reconfigurable Camera System	Makoto Odamaki, Shree K. Nayar	2016-02-11	Cambits is a set of physical blocks that can be used to build a wide variety of cam-eras with different functionalities. A unique feature of Cambits is that it is easy and quick to reconfigure. The blocks are assembled using magnets, without any screws or cables. When two blocks are attached, they are electrically connected by spring-loaded pins that carry power, data and control signals. With this novel architecture we can use Cambits to configure various imaging systems. The host computer al-ways knows the current configuration and presents the user with a menu of functionalities that the configuration can perform. We demonstrate a wide range of computational photography methods including HDR, wide angle, panoramic, collage, kaliedoscopic, post-focus, light field and stereo imaging. Cambits can even be used to configure a microscope. Cambits is a scalable system, allowing new blocks and accompanying software to be added to the existing set.	(pdf) (ps)
Grandet: A Unified, Economical Object Store for Web Applications	Yang Tang, Gang Hu, Xinhao Yuan, Lingmei Weng, Junfeng Yang	2016-02-02	Web applications are getting ubiquitous every day because they offer many useful services to consumers and businesses. Many of these web applications are quite storage-intensive. Cloud computing offers attractive and economical choices for meeting their storage needs. Unfortunately, it remains challenging for developers to best leverage them to minimize cost. This paper presents Grandet, a storage system that greatly reduces storage cost for web applications deployed in the cloud. Grandet provides both a key-value interface and a file system interface, supporting a broad spectrum of web applications. Under the hood, it supports multiple heterogeneous stores, and unifies them by placing each data object at the store deemed most economical. We implemented Grandet on Amazon Web Services and evaluated Grandet on a diverse set of four popular open-source web applications. Our results show that Grandet reduces their cost by an average of 42.4%, and it is fast, scalable, and easy to use. The source code of Grandet is at http://columbia.github.io/grandet.	(pdf)
A Measurement Study of ARM Virtualization Performance	Christoffer Dall, Shih-Wei Li, Jintack Lim, Jason Nieh	2015-11-30	ARM servers are becoming increasingly common, making server technologies such as virtualization for ARM of grow- ing importance. We present the first in-depth study of ARM virtualization performance on ARM server hardware, including measurements of two popular ARM and x86 hypervisors, KVM and Xen. We show how the ARM hardware support for virtualization can support much faster transitions between the VM and the hypervisor, a key hypervisor operation. However, current hypervisor designs, including both KVM (Type 1) and Xen (Type 2), are not able to lever- age this performance benefit in practice for real application workloads. We discuss the reasons why and show that other factors related to hypervisor software design and implementation have a larger role in overall performance than the speed of micro architectural operations. Based on our measurements, we discuss changes to ARMâ€™s hardware virtualization support that can potentially bridge the gap to bring its faster virtual machine exit mechanism to modern Type 2 hypervisors running real applications. These changes have been incorporated into the latest ARM architecture.	(pdf)
Use of Fast Multipole to Accelerate Discrete Circulation-Preserving Vortex Sheets for Soap Films and Foams	Fang Da, Christopher Batty, Chris Wojtan, Eitan Grinspun	2015-11-07	We report the integration of a FMM (Fast Multipole Method) template library â€œFMMTLâ€ into the discrete circulation-preserving vortex sheets method to accelerate the Biot-Savart integral. We measure the speed-up on a bubble oscillation test with varying mesh resolution. We also report a few examples with higher complexity than previously achieved.	(pdf)
Hardware in Haskell: Implementing Memories in a Stream-Based World	Richard Townsend, Martha Kim, Stephen Edwards	2015-09-21	Recursive functions and data types pose significant challenges for a Haskell-to-hardware compiler. Directly translating these structures yields infinitely large circuits; a subtler approach is required. We propose a sequence of abstraction-lowering transformations that exposes time and memory in a Haskell program. producing a simpler form for hardware translation. This paper outlines these transformations on a specific example; future research will focus on generalizing and automating them in our group's compiler.	(pdf)
Improving System Reliability for Cyber-Physical Systems	Leon Wu	2015-09-14	Cyber-physical systems (CPS) are systems featuring a tight combination of, and coordination between, the systemâ€™s computational and physical elements. Cyber-physical systems include systems ranging from critical infrastructure such as a power grid and transportation system to health and biomedical devices. System reliability, i.e., the ability of a system to perform its intended function under a given set of environmental and operational conditions for a given period of time, is a fundamental requirement of cyber-physical systems. An unreliable system often leads to disruption of service, financial cost and even loss of human life. An important and prevalent type of cyber-physical system meets the following criteria: processing large amounts of data; employing software as a system component; running online continuously; having operator-in-the-loop because of human judgment and an accountability requirement for safety critical systems. This thesis aims to improve system reliability for this type of cyber-physical system. To improve system reliability for this type of cyber-physical system, I present a system evaluation approach entitled automated online evaluation (AOE), which is a data-centric runtime monitoring and reliability evaluation approach that works in parallel with the cyber-physical system to conduct automated evaluation along the workflow of the system continuously using computational intelligence and self-tuning techniques and provide operator-in-the-loop feedback on reliability improvement. For example, abnormal input and output data at or between the multiple stages of the system can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop. The operator can then take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and increased system reliability. One technique used by the approach is data quality analysis using computational intelligence, which applies computational intelligence in evaluating data quality in an automated and efficient way in order to make sure the running system perform reliably as expected. Another technique used by the approach is self-tuning which automatically self-manages and self-configures the evaluation system to ensure that it adapts itself based on the changes in the system and feedback from the operator. To implement the proposed approach, I further present a system architecture called autonomic reliability improvement system (ARIS). This thesis investigates three hypotheses. First, I claim that the automated online evaluation empowered by data quality analysis using computational intelligence can effectively improve system reliability for cyber-physical systems in the domain of interest as indicated above. In order to prove this hypothesis, a prototype system needs to be developed and deployed in various cyber-physical systems while certain reliability metrics are required to measure the system reliability improvement quantitatively. Second, I claim that the self-tuning can effectively self-manage and self-configure the evaluation system based on the changes in the system and feedback from the operator-in-the-loop to improve system reliability. Third, I claim that the approach is effcient. It should not have a large impact on the overall system performance and introduce only minimal extra overhead to the cyberphysical system. Some performance metrics should be used to measure the effciency and added overhead quantitatively. Additionally, in order to conduct efficient and cost-effective automated online evaluation for data-intensive CPS, which requires large volumes of data and devotes much of its processing time to I/O and data manipulation, this thesis presents COBRA, a cloud-based reliability assurance framework. COBRA provides automated multi-stage runtime reliability evaluation along the CPS workflow using data relocation services, a cloud data store, data quality analysis and process scheduling with self-tuning to achieve scalability, elasticity and efficiency. Finally, in order to provide a generic way to compare and benchmark system reliability for CPS and to extend the approach described above, this thesis presents FARE, a reliability benchmark framework that employs a CPS reliability model, a set of methods and metrics on evaluation environment selection, failure analysis, and reliability estimation. The main contributions of this thesis include validation of the above hypotheses and empirical studies of ARIS automated online evaluation system, COBRA cloud-based reliability assurance framework for data-intensive CPS, and FARE framework for benchmarking reliability of cyber-physical systems. This work has advanced the state of the art in the CPS reliability research, expanded the body of knowledge in this field, and provided some useful studies for further research.	(pdf)
Exploiting Visual Perception for Sampling-Based Approximation on Aggregate Queries	Daniel Alabi	2015-09-07	Efficient sampling algorithms have been developed for approximating answers to aggregate queries on large data sets. In some formulations of the problem, concentration inequalities (such as Hoeffdingâ€™s inequality) are used to estimate the confidence interval for an approximated aggregated value. Samples are usually chosen until the confidence interval is arbitrarily small enough regardless of how the approximated query answers will be used (for example, in interactive visualizations). In this report, we show how to exploit visualization-specific properties to reduce the sampling complexity of a sampling-based approximate query processing algorithm while preserving certain visualization guarantees (the visual property of relative ordering) with a very high probability.	(pdf)
Code Relatives: Detecting Similar Software Behavior	Fang-Hsiang Su, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser, Tony Jebara	2015-08-28	Detecting â€œsimilar codeâ€ is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, some code fragments that behave alike without similar syntax may be missed. In this paper, we propose the term â€œcode relativesâ€ to refer to code with dynamically similar execution features. Code relatives can be used for such tasks as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link analysis based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed 290+ million prospective subgraph matches. The results show that DyCLINK detects not only code relatives, but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average compared with the competitorâ€™s 61%.	(pdf)
Dynamic Taint Tracking for Java with Phosphor (Formal Tool Demonstration)	Jonathan Bell, Gail Kaiser	2015-04-07	Dynamic taint tracking is an information flow analysis that can be applied to many areas of testing. Phosphor is the first portable, accurate and performant dynamic taint tracking system for Java. While previous systems for performing general-purpose taint tracking in the JVM required specialized research JVMs, Phosphor works with standard off-the-shelf JVMs (such as Oracle's HotSpot and OpenJDK's IcedTea). Phosphor also differs from previous portable JVM taint tracking systems that were not general purpose (e.g. tracked only tags on Strings and no other type), in that it tracks tags on all variables. We have also made several enhancements to Phosphor, allowing it to track taint tags through control flow (in addition to data flow), as well as allowing it to track an arbitrary number of relationships between taint tags (rather than be limited to only 32 tags). In this demonstration, we show how developers writing testing tools can benefit from Phosphor, and explain briefly how to interact with it.	(pdf)
Hardware Synthesis from a Recursive Functional Language	Kuangya Zhai, Richard Townsend, Lianne Lairmore, Martha A. Kim, Stephen A. Edwards	2015-04-01	Abstraction in hardware description languages stalled at the register-transfer level decades ago, yet few alternatives have had much success, in part because they provide only modest gains in expressivity. We propose to make a much larger jump: a compiler that synthesizes hardware from behavioral functional specifications. Our compiler translates general Haskell programs into a restricted intermediate representation before applying a series of semantics-preserving transformations, concluding with a simple syntax-directed translation to SystemVerilog. Here, we present the overall framework for this compiler, focusing on the IRs involved and our method for translating general recursive functions into equivalent hardware. We conclude with experimental results that depict the performance and resource usage of the circuitry generated with our compiler.	(pdf)
M2: Multi-Mobile Computing	Naser AlDuaij, Alexander Van't Hof, Jason Nieh	2015-03-31	With the widespread use of mobile systems, there is a growing demand for apps that can enable users to collaboratively use multiple mobile systems, including hardware device features such as cameras, displays, speakers, microphones, sensors, and input. We present M2, a system for multi-mobile computing by enabling remote sharing and combining of devices across multiple mobile systems. M2 leverages higher-level device abstractions and encoding and decoding hardware in mobile systems to define a cross-platform interface for remote device sharing to operate seamlessly across heterogeneous mobile hardware and software. M2 can be used to build new multi-mobile apps as well as make existing unmodified apps multi-mobile aware through the use of fused devices, which transparently combine multiple devices into a more capable one. We have implemented an M2 prototype on Android that operates across heterogeneous hardware and software, including using Android and iOS remote devices, the latter allowing iOS users to also run Android apps. Our results using unmodified apps from Google Play show that M2 can enable even display-intensive 2D and 3D games to use remote devices across multiple mobile systems with modest overhead and qualitative performance indistinguishable from using local device hardware.	(pdf)
Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing	Fang-Hsiang Su, Jonathan Bell, Christian Murphy, Gail Kaiser	2015-02-27	Metamorphic testing is an advanced technique to test programs and applications without a test oracle such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, KABU, which can dynamically infer properties that describe the characteristics of a program before and after transforming its input at the method level. Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort do identify them. This paper also proposes a new testing concept, Metamorphic Differential Testing (MDT). By comparing the MPs between different versions of the same application, KABU can detect potential bugs in the program. We have performed a preliminary evaluation of KABU by comparing the MPs detected by humans with the MPs detected by KABU. Our preliminary results are very promising: KABU can find more MPs as human developers, and its differential testing mechanism is effective at detecting function changes in programs.	(pdf)
DisCo: Displays that Communicate	Kensei Jo, Mohit Gupta, Shree Nayar	2014-12-16	We present DisCo, a novel display-camera communication system. DisCo enables displays and cameras to communicate with each other, while also displaying and capturing images for human consumption. Messages are transmitted by temporally modulating the display brightness at high frequencies so that they are imperceptible to humans. Messages are received by a rolling shutter camera which converts the temporally modulated incident light into a spatial flicker pattern. In the captured image, the flicker pattern is superimposed on the pattern shown on the display. The flicker and the display pattern are separated by capturing two images with different exposures. The proposed system performs robustly in challenging real-world situations such as occlusion, variable display size, defocus blur, perspective distortion and camera rotation. Unlike several existing visible light communication methods, DisCo works with off-the-shelf image sensors. It is compatible with a variety of sources (including displays, single LEDs), as well as reflective surfaces illuminated with light sources. We have built hardware prototypes that demonstrate DisCoâ€™s performance in several scenarios. Because of its robustness, speed, ease of use and generality, DisCo can be widely deployed in several CHI applications, such as advertising, pairing of displays with cell-phones, tagging objects in stores and museums, and indoor navigation.	(pdf)
The Internet is a Series of Tubes	Henning Schulzrinne	2014-11-28	This is a contribution for the November 2014 Dagstuhl workshop on affordable Internet access. The contribution describes the issues of availability, affordability and relevance, with a particular focus on the experience with providing universal broadband Internet access in the United States.	(pdf)
Making Lock-free Data Structures Verifiable with Artificial Transactions	Xinhao Yuan, David Williams-King, Junfeng Yang, Simha Sethumadhavan	2014-11-11	Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization. However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations. This paper address this fundamental problem between performance and verifiability of lock-free data structures. We present TXIT, a system that greatly reduces the set of possible interleavings by inserting transactions into the implementation of a lock-free data structure. We leverage hardware transactional memory support from Intel Haswell processors to enforce these artificial transactions. Evaluation on six popular lock-free data structures shows that TXIT makes it easy to verify lock-free data structures while incurring acceptable runtime overhead. Further analysis shows that two inefficiencies in Haswell are the largest contributors to this overhead.	(pdf)
Metamorphic Runtime Checking of Applications Without Test Oracles	Jonathan Bell, Chris Murphy, Gail Kaiser	2014-10-20	For some applications, it is impossible or impractical to know what the correct output should be for an arbitrary input, making testing difficult. Many machine-learning applications for â€œbig dataâ€, bioinformatics and cyberphysical systems fall in this scope: they do not have a test oracle. Metamorphic Testing, a simple testing technique that does not require a test oracle, has been shown to be effective for testing such applications. We present Metamorphic Runtime Checking, a novel approach that conducts metamorphic testing of both the entire application and individual functions during a programâ€™s execution. We have applied Metamorphic Runtime Checking to 9 machine-learning applications, finding it to be on average 170% more effective than traditional metamorphic testing at only the full application level.	(pdf)
Repeatable Reverse Engineering for the Greater Good with PANDA	Brendan Dolan-Gavitt, Josh Hodosh, Patrick Hulin, Tim Leek, Ryan Whelan	2014-10-01	We present PANDA, an open-source tool that has been purpose-built to support whole system reverse engineering. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. Further, PANDA leverages QEMU's support of thirteen different CPU architectures to make analyses of those diverse instruction sets possible within the LLVM IR. In this way, PANDA can have a single dynamic taint analysis, for example, that precisely supports many CPUs. PANDA analyses are written in a simple plugin architecture which includes a mechanism to share functionality between plugins, increasing analysis code re-use and simplifying complex analysis development. We demonstrate PANDA's effectiveness via a number of use cases, including enabling an old but legitimate version of Starcraft to run despite a lost CD key, in-depth diagnosis of an Internet Explorer crash, and uncovering the censorship activities and mechanisms of a Chinese IM client.	(pdf)
Detecting, Isolating and Enforcing Dependencies Between and Within Test Cases	Jonathan Bell	2014-07-06	Testing stateful applications is challenging, as it can be difficult to identify hidden dependencies on program state. These dependencies may manifest between several test cases, or simply within a single test case. When it's left to developers to document, understand, and respond to these dependencies, a mistake can result in unexpected and invalid test results. Although current testing infrastructure does not currently leverage state dependency information, we argue that it could, and that by doing so testing can be improved. Our results thus far show that by recovering dependencies between test cases and modifying the popular testing framework, JUnit, to utilize this information, we can optimize the testing process, reducing time needed to run tests by 62% on average. Our ongoing work is to apply similar analyses to improve existing state of the art test suite prioritization techniques and state of the art test case generation techniques. This work is advised by Professor Gail Kaiser.	(pdf)
Phasor Imaging: A Generalization of Correlation-Based Time-of-Flight Imaging	Mohit Gupta, Shree Nayar, Matthias Hullin, Jaime Martin	2014-06-26	In correlation-based time-of-flight (C-ToF) imaging systems, light sources with temporally varying intensities illuminate the scene. Due to global illumination, the temporally varying radiance received at the sensor is a combination of light received along multiple paths. Recovering scene properties (e.g., scene depths) from the received radiance requires separating these contributions, which is challenging due to the complexity of global illumination and the additional temporal dimension of the radiance. We propose phasor imaging, a framework for performing fast inverse light transport analysis using C-ToF sensors. Phasor imaging is based on the idea that by representing light transport quantities as phasors and light transport events as phasor transformations, light transport analysis can be simplified in the temporal frequency domain. We study the effect of temporal illumination frequencies on light transport, and show that for a broad range of scenes, global radiance (multi-path interference) vanishes for frequencies higher than a scene-dependent threshold. We use this observation for developing two novel scene recovery techniques. First, we present Micro ToF imaging, a ToF based shape recovery technique that is robust to errors due to global illumination. Second, we present a technique for separating the direct and global components of radiance. Both techniques require capturing as few as 3-4 images and minimal computations. We demonstrate the validity of the presented techniques via simulations and experiments performed with our hardware prototype.	(pdf)
Schur complement trick for positive semi-definite energies	Alec Jacobson	2014-06-12	The â€œSchur complement trickâ€ appears sporadically in numerical optimization methods [Schur 1917; Cottle 1974]. The trick is especially useful for solving Lagrangian saddle point problems when minimizing quadratic energies subject to linear equality constraints [Gill et al. 1987]. Typically, to apply the trick, the energyâ€™s Hessian is assumed positive definite. I generalize this technique for positive semi-definite Hessians.	(pdf)
Model Aggregation for Distributed Content Anomaly Detection	Sean Whalen, Nathaniel Boggs, Salvatore J. Stolfo	2014-06-02	Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically-pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure.	(pdf)
Vernam, Mauborgne, and Friedman: The One-Time Pad and the Index of Coincidence	Steven M. Bellovin	2014-05-13	The conventional narrative for the invention of the AT&T one-time pad was related by David Kahn. Based on the evidence available in the AT&T patent files and from interviews and correspondence, he concluded that Gilbert Vernam came up with the need for randomness, while Joseph Mauborgne realized the need for a non-repeating key. Examination of other documents suggests a different narrative. It is most likely that Vernam came up with the need for non-repetition; Mauborgne, though, apparently contributed materially to the invention of the two-tape variant. Furthermore, there is reason to suspect that he suggested the need for randomness to Vernam. However, neither Mauborgne, Herbert Yardley, nor anyone at AT&T really understood the security advantages of the true one-time tape. Col. Parker Hitt may have; William Friedman definitely did. Finally, we show that Friedman's attacks on the two-tape variant likely led to his invention of the index of coincidence, arguably the single most important publication in the history of cryptanalysis.	(pdf)
Exploring Societal Computing based on the Example of Privacy	Swapneel Sheth	2014-04-25	Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis will consist of the following four projects that aim to address the issues of privacy and software engineering. First, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives. Second, a related challenge to above, is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd sourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17 month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach we conducted an online survey with closed and open questions and collected 59 valid responses after which we conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowd sourced data useful for understanding privacy settings, and 80% preferred a crowd sourced tool for configuring their privacy settings over current privacy controls. Third, as software evolves over time, this might introduce bugs that breach usersâ€™ privacy. Further, there might be system-wide policy changes that could change usersâ€™ settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective. Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable â€” systems where privacy could be achieved â€œfor free,â€ i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free â€” an accidental and beneficial side effect of doing some existing computation â€” in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.	(pdf)
A Convergence Study of Multimaterial Mesh-based Surface Tracking	Fang Da, Christopher Batty, Eitan Grinspun	2014-04-14	We report the results from experiments on the convergence of the multimaterial mesh-based surface tracking method introduced by the same authors. Under mesh refinement, approximately first order convergence or higher in L1 and L2 is shown for vertex positions, face normals and non-manifold junction curves in a number of scenarios involving the new operations proposed in the method.	(pdf)
The Economics of Cyberwar	Steven M. Bellovin	2014-04-11	Cyberwar is very much in the news these days. It is tempting to try to understand the economics of such an activity, if only qualitatively. What effort is required? What can such attacks accomplish? What does this say, if anything, about the likelihood of cyberwar?	(pdf)
Energy Exchanges: Internal Power Oversight for Applications	Melanie Kambadur, Martha A. Kim	2014-04-08	This paper introduces energy exchanges, a set of abstractions that allow applications to help hardware and operating systems manage power and energy consumption. Using annotations, energy exchanges dictate when, where, and how to trade performance or accuracy for power in ways that only an applicationâ€™s developer can decide. In particular, the abstractions offer audits and budgets which watch and cap the power or energy of some piece of the application. The interface also exposes energy and power usage reports which an application may use to change its behavior. Such information complements existing system-wide energy management by operating systems or hardware, which provide global fairness and protections, but are unaware of the internal dynamics of an application. Energy exchanges are implemented as a user-level C++ library. The library employs an accounting technique to attribute shares of system-wide energy consumption (provided by system-wide hardware energy meters available on newer hardware platforms) to individual application threads. With these per-thread meters and careful tracking of an applicationâ€™s activity, the library exposes energy and power usage for program regions of interest via the energy exchange abstractions with negligible runtime or power overhead. We use the library to demonstrate three applications of energy exchanges: (1) the prioritization of a mobile gameâ€™s energy use over third-party advertisements, (2) dynamic adaptations of the framerate of a video tracking benchmark that maximize performance and accuracy within the confines of a given energy allotment, and (3) the triggering of computational sprints and corresponding cooldowns, based on time, system TDP, and power consumption.	(pdf)
Phosphor: Illuminating Dynamic Data Flow in the JVM	Jonathan Bell, Gail Kaiser	2014-03-25	Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to inputs, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, accuracy, and portability. Performance can be critical, as these systems are typically intended to be deployed with software, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be accurate and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, accuracy, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two versions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its performance to be impressive: as low as 3% (53% on average), using the DaCapo macro benchmark suite. This paper describes the approach that Phosphor uses to achieve portable taint tracking in the JVM.	(pdf)
Enhancing Security by Diversifying Instruction Sets	Kanad Sinha, Vasileios Kemerlis, Vasilis Pappas, Simha Sethumadhavan, Angelos Keromytis	2014-03-20	Despite the variety of choices regarding hardware and software, to date a large number of computer systems remain identical. Characteristic examples of this trend are Windows on x86 and Android on ARM. This homogeneity, sometimes referred to as â€œcomputing oligoculture", provides a fertile ground for malware in the highly networked world of today. One way to counter this problem is to diversify systems so that attackers cannot quickly and easily compromise a large number of machines. For instance, if each system has a different ISA, the attacker has to invest more time in developing exploits that run on every system manifestation. It is not that each individual attack gets harder, but the spread of malware slows down. Further, if the diversified ISA is kept secret from the attacker, the bar for exploitation is raised even higher. In this paper, we show that system diversification can be realized by enabling diversity at the lowest hardware/software interface, the ISA, with almost zero performance overhead. We also describe how prac- tical development and deployment problems of diversified systems can be handled easily in the context of popular software distrbution models, such as the mobile app store model. We demonstrate our proposal with an OpenSPARC FPGA prototype	(pdf)
Teaching Microarchitecture through Metaphors	Julianna Eum, Simha Sethumadhavan	2014-03-19	Students traditionally learn microarchitecture by studying textual descriptions with diagrams but few analogies. Several popular textbooks on this topic introduce concepts such as pipelining and caching in the context of simple paper-only architectures. While this instructional style allows important concepts to be covered within a given class period, students have difficulty bridging the gap between what is covered in classes and real-world implementations. Discussing concrete implementations and complications would, however, take too much time. In this paper, we propose a technique of representing microarchitecture building blocks with animated metaphors to accelerate the process of learning about complex microarchitectures. We represent hardware implementations as road networks that include specific patterns of traffic flow found in microarchitectural behavior. Our experiences indicate an 83% improvement to understanding memory system microarchitecture. We believe the mental models developed by these students will serve them in remembering microarchitectural behavior and extend to learning new microarchitectures more easily.	(pdf)
A Red Team/Blue Team Assessment of Functional Analysis Methods for Malicious Circuit Identification	Adam Waksman, Jeyavijayan Rajendran, Matthew Suozzo, Simha Sethumadhavan	2014-03-05	Recent advances in hardware security have led to the development of FANCI (Functional Analysis for Nearly-Unused Circuit Identification), an analysis tool that identifies stealthy, malicious circuits within hardware designs that can perform malicious backdoor behavior. Evaluations of such tools against benchmarks and academic attacks are not always equivalent to the dynamic attack scenarios that can arise in the real world. For this reason, we apply a red team/blue team approach to stress-test FANCI's abilities to efficiently detect malicious backdoor circuits within hardware designs. In the Embedded Systems Challenge (ESC) 2013, teams from research groups from multiple continents created designs with malicious backdoors hidden in them as part of a red team effort to circumvent FANCI. Notably, these backdoors were not placed into a priori known designs. The red team was allowed to create arbitrary, unspecified designs. Two interesting results came out of this effort. The first was that FANCI was surprisingly resilient to this wide variety of attacks and was not circumvented by any of the stealthy backdoors created by the red teams. The second result is that frequent-action backdoors, which are backdoors that are not made stealthy, were often successful. These results emphasize the importance of combining FANCI with a reasonable degree of validation testing. The blue team efforts also exposed some aspects of the FANCI prototype that make analysis time-consuming in some cases, which motivates further development of the prototype in the future.	(pdf) (ps)
Unsupervised Anomaly-based Malware Detection using Hardware Features	Adrian Tang, Simha Sethumadhavan, Salvatore Stolfo	2014-03-01	Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to a class of detectors known as signature-based detectors as they catch malware by comparing a program's execution pattern (signature) to execution patterns of known malware programs. In this work, we propose a new class of detectors - anomaly-based hardware malware detectors - that do not require signatures for malware detection, and thus can catch a wider range of malware including potentially novel ones. We use unsupervised machine learning to build profiles of normal program execution based on data from performance counters, and use these profiles to detect significant deviations in program behavior that occur as a result of malware exploitation. We show that real-world exploitation of popular programs such as IE and Adobe PDF Reader on a Windows/x86 platform can be detected with nearly perfect certainty. We also examine the limits and challenges in implementing this approach in face of a sophisticated adversary attempting to evade anomaly-based detection. The proposed detector is complementary to previously proposed signature-based detectors and can be used together to improve security.	(pdf)
Enabling the Virtual Phones to remotely sense the Real Phones in real-time ~ A Sensor Emulation initiative for virtualized Android-x86 ~	Raghavan Santhanam	2014-02-13	Smartphones nowadays have the ground-breaking features that were only a figment of oneâ€™s imagination. For the ever-demanding cellphone users, the exhaustive list of features that a smartphone supports just keeps getting more exhaustive with time. These features aid oneâ€™s personal and professional uses as well. Extrapolating into the future the features of a present-day smartphone, the lives of us humans using smartphones are going to be unimaginably agile. With the above said emphasis on the current and future potential of a smartphone, the ability to virtualize smartphones with all their real-world features into a virtual platform, is a boon for those who want to rigorously experiment and customize the virtualized smartphone hardware without spending an extra penny. Once virtualizable independently on a larger scale, the idea of virtualized smartphones with all the virtualized pieces of hardware takes an interesting turn with the sensors being virtualized in a way thatâ€™s closer to the real-world behavior. When accessible remotely with the real-time responsiveness, the above mentioned real-world behavior will be a real dealmaker in many real-world systems, namely, the life-saving systems like the ones that instantaneously get alerts about harmful magnetic radiations in the deep mining areas, etc. And these life-saving systems would be installed on a large scale on the desktops or large servers as virtualized smartphones having the added support of virtualized sensors which remotely fetch the real hardware sensor readings from a real smartphone in real-time. Based on these readings the lives working in the affected areas can be alerted and thus saved by the people who are operating the at the desktops or large servers hosting the virtualized smartphones. In addition, the direct and one of the biggest advantages of such a real hardware sensor driven Sensor Emulation in an emulated Android(-x86) environment is that the Android applications that use sensors can now run on the emulator and act under the influence of real hardware sensorsâ€™ due to the emulated sensors. The current work of Sensor Emulation is quite unique when compared to the existing and past sensor-related works. The uniqueness comes from the full-fledged sensoremulation in a virtualized smartphone environment as opposed to building some sophisticated physical systems that usually aggregate the sensor readings from the real hardware sensors, might be in a remote manner and in real-time. For example, wireless sensor networks based remote-sensing systems that install real hardware sensors in remote places and have the sensor readings from all those sensors at a centralized server or something similar, for the necessary real-time or offline analysis. In these systems, apart from collecting mere real hardware sensor readings into a centralized entity, nothing more is being achieved unlike in the current work of Sensor Emulation wherein the emulated sensors behave exactly like the remote real hardware sensors. The emulated sensors can be calibrated, speeded up or slowed down(in terms of their sampling frequency), influence the sensor-based application running inside the virtualized smartphone environment exactly as the real hardware sensors of a real phone would do to the sensor-based application running in that real phone. In essence, the current work is more about generalizing the sensors with all its real-world characteristics as far as possible in a virtualized platform than just a framework to send and receive sensor readings over the network between the real and virtual phones. Realizing the useful advantages of Sensor Emulation which is about adding virtualized sensors support to emulated environments, the current work emulates a total of ten sensors present in the real smartphone, Samsung Nexus S, an Android device. Virtual phones run Android-x86 while real phones run Android. The real reason behind choosing Android-x86 for virtual phone is that x86-based Android devices are feature-rich over ARM based ones, for example a full-fledged x86 desktop or a tablet has more features than a relatively small smartphone. Out of the ten, five are real sensors and the rest are virtual or synthetic ones. The real ones are Accelerometer, Magnetometer, Light, Proximity, and Gyroscope whereas the virtual ones are Corrected Gyroscope, Linear Acceleration, Gravity, Orientation, and Rotation Vector. The emulated Android-x86 is of Android release version Jelly Bean 4.3.1 which differs only very slightly in terms of bug fixes from Android Jelly Bean 4.3 running on the real smartphone. One of the noteworthy aspects of the Sensor Emulation accomplished is being demand-less - exactly the same sensor-based Android applications will be able to use the sensors on the real and virtual phones, with absolutely no difference in terms of their sensor-based behavior. The emulationâ€™s core idea is the socket-communication between two modules of Hardware Abstraction Layer(HAL) which is driver-agnostic, remotely over a wireless network in real-time. Apart from a Paired real-device scenario from which the real hardware sensor readings are fetched, the Sensor Emulation also is compatible with a Remote Server Scenario wherein the artificially generated sensor readings are fetched from a remote server. Due to the Sensor Emulation having been built on mereend-to-end socket-communication, itâ€™s logical and obvious to see that the real and virtual phones can run different Android(-x86) releases with no real impact on the Sensor Emulation being accomplished. Sensor Emulation once completed was evaluated for each of the emulated sensors using applications from Android Market as well as Amazon Appstore. The applications category include both the basic sensor-test applications that show raw sensor readings, as well as the advanced 3D sensor-driven games which are emulator compatible, especially in terms of the graphics. The evaluations proved the current work of Sensor Emulation to be generic, efficient, robust, fast, accurate, and real. As of this writing i.e., January 2014, the current work of Sensor Emulation is the sole system-level sensor virtualization work that embraces remoteness in real-time for the emulated Android-x86 systems. It is important to note that though the current work is targeted for Android-x86, the code written for the current work makes no assumptions about underlying platform to be an x86 one. Hence, the work is also logically seen as compatible with ARM based emulated Android environment though not actually tested.	(pdf)
Towards A Dynamic QoS-aware Over-The-Top Video Streaming in LTE	Hyunwoo Nam, Kyung Hwa Kim, Bong Ho Kim, Doru Calin, Henning Schulzrinne	2014-01-16	We present a study of traffic behavior of two popular over-the-top (OTT) video streaming services (YouTube and Netflix). Our analysis is conducted on different mobile devices (iOS and Android) over various wireless networks (Wi-Fi, 3G and LTE) under dynamic network conditions. Our measurements show that the video players frequently discard a large amount of video content although it is successfully delivered to a client. We first investigate the root cause of this unwanted behavior. Then, we propose a Quality-of-Service (QoS)-aware video streaming architecture in Long Term Evolution (LTE) networks to reduce the waste of network resource and improve user experience. The architecture includes a selective packet discarding mechanism, which can be placed in packet data network gateways (P-GW). In addition, our QoS-aware rules assist video players in selecting an appropriate resolution under a fluctuating channel condition. We monitor network condition and configure QoS parameters to control availability of the maximum bandwidth in real time. In our experimental setup, the proposed platform shows up to 20.58% improvement in saving downlink bandwidth and improves user experience by reducing buffer underflow period to an average of 32 seconds.	(pdf)
Towards Dynamic Network Condition-Aware Video Server Selection Algorithms over Wireless Networks	Hyunwoo Nam, Kyung-Hwa Kim, Doru Calin, Henning Schulzrinne	2014-01-16	We investigate video server selection algorithms in a distributed video-on-demand system. We conduct a detailed study of the YouTube Content Delivery Network (CDN) on PCs and mobile devices over Wi-Fi and 3G networks under varying network conditions. We proved that a location-aware video server selection algorithm assigns a video content server based on the network attachment point of a client. We found out that such distance-based algorithms carry the risk of directing a client to a less optimal content server, although there may exist other better performing video delivery servers. In order to solve this problem, we propose to use dynamic network information such as packet loss rates and Round Trip Time (RTT)between an edge node of an wireless network (e.g., an Internet Service Provider (ISP) router in a Wi-Fi network and a Radio Network Controller (RNC) node in a 3G network) and video content servers, to find the optimal video content server when a video is requested. Our empirical study shows that the proposed architecture can provide higher TCP performance, leading to better viewing quality compared to location-based video server selection algorithms.	(pdf)
Approximating the Bethe partition function	Adrian Weller, Tony Jebara	2013-12-30	When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy $\F$, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced for attractive binary pairwise MRFs which is guaranteed to return an $\eps$-approximation to the global minimum of $\F$ in polynomial time provided the maximum degree $\Delta=O(\log n)$, where $n$ is the number of variables. Here we significantly improve this algorithm and derive several results including a new approach based on analyzing first derivatives of $\F$, which leads to performance that is typically far superior and yields a fully polynomial-time approximation scheme (FPTAS) for attractive models without any degree restriction. Further, the method applies to general (non-attractive) models, though with no polynomial time guarantee in this case, leading to the important result that approximating $\log$ of the Bethe partition function, $\log Z_B=-\min \F$, for a general model to additive $\epsilon$-accuracy may be reduced to a discrete MAP inference problem. We explore an application to predicting equipment failure on an urban power network and demonstrate that the Bethe approximation can perform well even when BP fails to converge.	(pdf)
A Gameful Approach to Teaching Software Design and Software Testing	Swapneel Sheth, Jonathan Bell, Gail Kaiser	2013-12-13	Introductory computer science courses traditionally focus on exposing students to basic programming and computer science theory, leaving little or no time to teach students about software testing. A lot of studentsâ€™ mental model when they start learning programming is that â€œif it compiles and runs without crashing, it must work fine.â€ Thus exposure to testing, even at a very basic level, can be very beneficial to the students. In the short term, they will do better on their assignments as testing before submission might help them discover bugs in their implementation that they hadnâ€™t realized. In the long term, they will appreciate the importance of testing as part of the software development life cycle.	(pdf)
A Gameful Approach to Teaching Software Design and Software Testing --- Assignments and Quests	Swapneel Sheth, Jonathan Bell, Gail Kaiser	2013-12-11	We describe how we used HALO in a CS2 classroom and include the assignments and quests created.	(pdf)
Heterogeneous Access: Survey and Design Considerations	Amandeep Singh, Gaston Ormazabal, Sateesh Addepalli, Henning Schulzrinne	2013-10-25	As voice, multimedia, and data services are converging to IP, there is a need for a new networking architecture to support future innovations and applications. Users are consuming Internet services from multiple devices that have multiple network interfaces such as Wi-Fi, LTE, Bluetooth, and possibly wired LAN. Such diverse network connectivity can be used to increase both reliability and performance by running applications over multiple links, sequentially for seamless user experience, or in parallel for bandwidth and performance enhancements. The existing networking stack, however, offers almost no support for intelligently exploiting such network, device, and location diversity. In this work, we survey recently proposed protocols and architectures that enable heterogeneous networking support. Upon evaluation, we abstract common design patterns and propose a unified networking architecture that makes better use of a heterogeneous dynamic environment, both in terms of networks and devices. The architecture enables mobile nodes to make intelligent decisions about how and when to use each or a combination of networks, based on access policies. With this new architecture, we envision a shift from current applications, which support a single network, location, and device at a time to applications that can support multiple networks, multiple locations, and multiple devices.	(pdf)
Functioning Hardware from Functional Programs	Stephen A. Edwards	2013-10-08	To provide high performance at practical power levels, tomorrow's chips will have to consist primarily of application-specific logic that is only powered on when needed. This paper discusses synthesizing such logic from the functional language Haskell. The proposed approach, which consists of rewriting steps that ultimately dismantle the source program into a simple dialect that enables a syntax-directed translation to hardware, enables aggressive parallelization and the synthesis of application-specific distributed memory systems. Transformations include scheduling arithmetic operations onto specific data paths, replacing recursion with iteration, and improving data locality by inlining recursive types. A compiler based on these principles is under development.	(pdf)
N Heads are Better than One	Morris Hopkins, Mauricio Castaneda, Swapneel Sheth, Gail Kaiser	2013-10-04	Social network platforms have transformed how people communicate and share information. However, as these platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd sourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17 month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach we conducted an online survey with closed and open questions and collected 50 valid responses after which we conducted follow-up interviews with 10 respondents. Our results showed that 80% of users found visualizations using crowd sourced data useful for understanding privacy settings, and 70% preferred a crowd sourced tool for configuring their privacy settings over current privacy controls.	(pdf)
Us and Them --- A Study of Privacy Requirements Across North America, Asia, and Europe	Swapneel Sheth, Gail Kaiser, Walid Maalej	2013-09-15	Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. However, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and personal data such as location than their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.	(pdf)
Unit Test Virtualization with VMVM	Jonathan Bell, Gail Kaiser	2013-09-13	Testing large software packages can become very time intensive. To address this problem, researchers have investigated techniques such as Test Suite Minimization. Test Suite Minimization reduces the number of tests in a suite by removing tests that appear redundant, at the risk of a reduction in fault-finding ability since it can be difficult to identify which tests are truly redundant. We take a completely different approach to solving the same problem of long running test suites by instead reducing the time needed to execute each test, an approach that we call Unit Test Virtualization. With Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight virtualization container. We describe the empirical analysis that grounds our approach and provide an implementation of Unit Test Virtualization targeting Java applications. We evaluated our implementation, VMVM, using 20 real-world Java applications and found that it reduces test suite execution time by up to 97% (on average, 62%) when compared to traditional unit test execution. We also compared VMVM to a well known Test Suite Minimization technique, finding the reduction provided by VMVM to be four times greater, while still executing every test with no loss of fault-finding ability.	(pdf)
Metamorphic Runtime Checking of Applications without Test Oracles	Christian Murphy, Gail Kaiser, Jonathan Bell, Fang-Hsiang Su	2013-09-13	Challenges arise in testing applications that do not have test oracles, i.e., for which it is impossible or impractical to know what the correct output should be for general input. Metamorphic testing, introduced by Chen et al., has been shown to be a simple yet effective technique in testing these types of applications: test inputs are transformed in such a way that it is possible to predict the expected change to the output, and if the output resulting from this transformation is not as expected, then a fault must exist. Here, we improve upon previous work by presenting a new technique called Metamorphic Runtime Checking, which automatically conducts metamorphic testing of both the entire application and individual functions during a program's execution. This new approach improves the scope, scale, and sensitivity of metamorphic testing by allowing for the identification of more properties and execution of more tests, and increasing the likelihood of detecting faults not found by application-level properties. We also present the results of new mutation analysis studies that demonstrate that Metamorphic Runtime Checking can kill an average of 170% more mutants than traditional, application-level metamorphic testing alone, and advances the state of the art in testing applications without oracles.	(pdf)
Effectiveness of Teaching Metamorphic Testing, Part II	Kunal Swaroop Mishra, Gail E. Kaiser, Swapneel K. Sheth	2013-07-31	We study the ability of students in a senior/graduate software engineering course to understand and apply metamorphic testing, a relatively recently invented advance in software testing research that complements conventional approaches such as equivalence partitioning and boundary analysis. We previously reported our investigation of the fall 2011 offering of the Columbia University course COMS W4156 Advanced Software Engineering, and here report on the fall 2012 offering and contrast it to the previous year. Our main findings are: 1) Although the students in the second offering did not do very well on the newly added individual assignment specifically focused on metamorphic testing, thereafter they were better able to find metamorphic properties for their team projects than the students from the previous year who did not have that preliminary homework and, perhaps most significantly, did not have the solution set for that homework. 2) Students in the second offering did reasonably well using the relatively novel metamorphic testing technique vs. traditional black box testing techniques in their projects (such comparison data is not available for the first offering). 3) Finally, in both semesters, the majority of the student teams were able to apply metamorphic testing to their team projects after only minimal instruction, which would imply that metamorphic testing is a viable strategy for student testers.	(pdf)
On the Effectiveness of Traffic Analysis Against Anonymity Networks Using Flow Records	Sambuddho Chakravarty, Marco V. Barbera, Georgios Portokalidis, Michalis Polychronakis, Angelos D. Keromytis	2013-07-18	Low-latency anonymous communication networks, such as Tor, are geared towards web browsing, instant messaging, and other semi-interactive applications. To achieve acceptable quality of service, these systems attempt to preserve packet inter-arrival characteristics, such as inter-packet delay. Consequently, a powerful adversary can mount traffic analysis attacks by observing similar traffic patterns at various points of the network, linking together otherwise unrelated network connections. Previous research has shown that having access to a few Internet exchange points is enough for monitoring a significant percentage of the network paths from Tor nodes to destination servers. Although the capacity of current networks makes packet-level monitoring at such a scale quite challenging, adversaries could potentially use less accurate but readily available traffic monitoring functionality, such as Cisco's NetFlow, to mount large-scale traffic analysis attacks. In this paper, we assess the feasibility and effectiveness of practical traffic analysis attacks against the Tor network using NetFlow data. We present an active traffic analysis method based on deliberately perturbing the characteristics of user traffic at the server side, and observing a similar perturbation at the client side through statistical correlation. We evaluate the accuracy of our method using both in-lab testing, as well as data gathered from a public Tor relay serving hundreds of users. Our method revealed the actual sources of anonymous traffic with 100% accuracy for the in-lab tests, and achieved an overall accuracy of about 81.4% for the real-world experiments, with an average false positive rate of 6.4%.	(pdf)
A Mobile Video Traffic Analysis: Badly Designed Video Clients Can Waste Network Bandwidth	Hyunwoo Nam, Bong Ho Kim, Doru Calin, Henning Schulzrinne	2013-07-08	Video streaming on mobile devices is on the rise. According to recent reports, mobile video streaming traffic accounted for 52.8% of total mobile data traffic in 2011, and it is forecast to reach 66.4% in 2015. We analyzed the network traffic behaviors of the two most popular HTTP-based video streaming services: YouTube and Netflix. Our research indicates that the network traffic behavior depends on factors such as the type of device, multimedia applications in use and network conditions. Furthermore, we found that a large part of the downloaded video content can be unaccepted by a video player even though it is successfully delivered to a client. This unwanted behavior often occurs when the video player changes the resolution in a fluctuating network condition and the playout buffer is full while downloading a video. Some of the measurements show that the discarded data may exceed 35% of the total video content.	(pdf)
Energy Secure Architecture: A wish list	Simha Sethumadhavan	2013-06-23	Energy optimizations are being aggressively pursued today. Can these optimizations open up security vulnerabilities? In this invited talk at the Energy Secure System Architectures Workshop (run by Pradip Bose from IBM Watson research center) I discussed security implications of energy optimizations, capabilities of attackers, ease of exploitation, and potential payoff to the attacker. I presented a mini tutorial on security for computer architects, and a personal research wish list for this emerging topic.	(pdf)
Principles and Techniques of Schlieren Imaging Systems	Amrita Mazumdar	2013-06-19	This paper presents a review of modern-day schlieren optics system and its application. Schlieren imaging systems provide a powerful technique to visualize changes or nonuniformities in refractive index of air or other transparent media. With the popularization of computational imaging techniques and widespread availability of digital imaging systems, schlieren systems provide novel methods of viewing transparent fluid dynamics. This paper presents a historical background of the technique, describes the methodology behind the system, presents a mathematical proof of schlieren fundamentals, and lists various recent applications and advancements in schlieren studies.	(pdf)
WiSlow: A WiFi Network Performance Troubleshooting Tool for End Users	Kyung Hwa Kim, Hyunwoo Nam, Henning Schulzrinne	2013-05-29	The increasing number of 802.11 APs and wireless devices results in more contention, which causes unsatisfactory WiFi network performance. In addition, non-WiFi devices sharing the same spectrum with 802.11 networks such as microwave ovens, cordless phones, and baby monitors severely interfere with WiFi networks. Although the problem sources can be easily removed in many cases, it is difficult for end users to identify the root cause. We introduce WiSlow, a software tool that diagnoses the root causes of poor WiFi performance with user-level network probes and leverages peer collaboration to identify the location of the causes. We elaborate on two main methods: packet loss analysis and 802.11 ACK pattern analysis.	(pdf)
Connecting the Physical World with Arduino in SECE	Hyunwoo Nam, Jan Janak, Henning Schulzrinne	2013-05-23	The Internet of Things (IoT) enables the physical world to be connected and controlled over the Internet. This paper presents a smart gateway platform that connects everyday objects such as lights, thermometers, and TVs over the Internet. The proposed hardware architecture is implemented on an Arduino platform with a variety of off the shelf home automation technologies such as Zigbee and X10. Using the microcontroller-based platform, the SECE (Sense Everything, Control Everything) system allows users to create various IoT services such as monitoring sensors, controlling actuators, triggering action events, and periodic sensor reporting. We give an overview of the Arduino-based smart gateway architecture and its integration into SECE.	(pdf)
Chameleon: Multi-Persona Binary Compatibility for Mobile Devices	Jeremy Andrus, Alexander Van't Hof, Naser AlDuaij, Christoffer Dall, Nicolas Viennot, Jason Nieh	2013-04-08	Mobile devices are vertically integrated systems that are powerful, useful platforms, but unfortunately limit user choice and lock users and developers into a particular mobile ecosystem, such as iOS or Android. We present Chameleon, a multi-persona binary compatibility architecture that allows mobile device users to run applications built for different mobile ecosystems together on the same smartphone or tablet. Chameleon enhances the domestic operating system of a device with personas to mimic the application binary interface of a foreign operating system to run unmodified foreign binary applications. To accomplish this without reimplementing the entire foreign operating system from scratch, Chameleon provides four key mechanisms. First, a multi-persona binary interface is used that can load and execute both domestic and foreign applications that use different sets of system calls. Second, compile-time code adaptation makes it simple to reuse existing unmodified foreign kernel code in the domestic kernel. Third, API interposition and passport system calls make it possible to reuse foreign user code together with domestic kernel facilities to support foreign kernel functionality in user space. Fourth, schizophrenic processes allow foreign applications to use domestic libraries to access proprietary software and hardware interfaces on the device. We have built a Chameleon prototype and demonstrate that it imposes only modest performance overhead and can run iOS applications from the Apple App Store together with Android applications from Google Play on a Nexus 7 tablet running the latest version of Android.	(pdf)
KVM/ARM: Experiences Building the Linux ARM Hypervisor	Christoffer Dall, Jason Nieh	2013-04-05	As ARM CPUs become increasingly common in mobile devices and servers, there is a growing demand for providing the benefits of virtualization for ARMbased devices. We present our experiences building the Linux ARM hypervisor, KVM/ARM, the first full system ARM virtualization solution that can run unmodified guest operating systems on ARM multicore hardware. KVM/ARM introduces split-mode virtualization, allowing a hypervisor to split its execution across CPU modes to take advantage of CPU mode-specific features. This allows KVM/ARM to leverage Linux kernel services and functionality to simplify hypervisor development and maintainability while utilizing recent ARM hardware virtualization extensions to run application workloads in guest operating systems with comparable performance to native execution. KVM/ARM has been successfully merged into the mainline Linux 3.9 kernel, ensuring that it will gain wide adoption as the virtualization platform of choice for ARM. We provide the first measurements on real hardware of a complete hypervisor using ARM hardware virtualization support. Our results demonstrate that KVM/ARM has modest virtualization performance and power costs, and can achieve lower performance and power costs compared to x86-based Linux virtualization on multicore hardware.	(pdf)
FARE: A Framework for Benchmarking Reliability of Cyber-Physical Systems	Leon Wu, Gail Kaiser	2013-04-01	A cyber-physical system (CPS) is a system featuring a tight combination of, and coordination between, the systemâ€™s computational and physical elements. System reliability is a critical requirement of cyber-physical systems. An unreliable CPS often leads to system malfunctions, service disruptions, financial losses and even human life. Improving CPS reliability requires an objective measurement, estimation and comparison of the CPS system reliability. This paper describes FARE (Failure Analysis and Reliability Estimation), a framework for benchmarking reliability of cyber-physical systems. Some prior researches have proposed reliability benchmark for some specific CPS such as wind power plant and wireless sensor networks. There were also some prior researches on the components of CPS such as software and some specific hardware. But according to the best of our knowledge, there isnâ€™t any reliability benchmark framework for CPS in general. FARE framework provides a CPS reliability model, a set of methods and metrics on the evaluation environment selection, failure analysis and reliability estimation for benchmarking CPS reliability. It not only provides a retrospect evaluation and estimation of the CPS system reliability using the past data, but also provides a mechanism for continuous monitoring and evaluation of CPS reliability for runtime enhancement. The framework is extensible for accommodating new reliability measurement techniques and metrics. It is also generic and applicable to a wide range of CPS applications. For empirical study, we applied the FARE framework on a smart building management system for a large commercial building in New York City. Our experiments showed that FARE is easy to implement, accurate for comparison and can be used for building useful industry benchmarks and standards after accumulating enough data.	(pdf)
Additional remarks on designing category-level attributes for discriminative visual recognition	Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang	2013-03-10	This is the supplementary material for the paper Designing Category-Level Attributes for Discriminative Visual Recognition.	(pdf)
Make Parallel Programs Reliable with Stable Multithreading	Junfeng Yang, Heming Cui, Jingyue Wu, John Gallagher, Chia-Che Tsai, Huayang Guo	2013-02-20	Our accelerating computational demand and the rise of multicore hardware have made parallel programs increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. In this article, we provide our view on why parallel programs, specifically multithreaded programs, are difficult to get right.We present a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four yearsâ€™ research on building and applying stable multithreading systems.	(pdf)
A Finer Functional Fibonacci on a Fast FPGA	Stephen A. Edwards	2013-02-13	Through a series of mechanical, semantics-preserving transformations, I show how a three-line recursive Haskell program (Fibonacci) can be transformed to a hardware description language -- Verilog -- that can be synthesized on an FPGA. This report lays groundwork for a compiler that will perform this transformation automatically.	(pdf)
Cost and Scalability of Hardware Encryption Techniques	Adam Waksman, Simha Sethumadhavan	2013-02-06	We discuss practical details and basic scalability for two recent ideas for hardware encryption for trojan prevention. The broad idea is to encrypt the data used as inputs to hardware circuits to make it more difficult for malicious attackers to exploit hardware trojans. The two methods we discuss are data obfuscation and fully homomorphic encryption (FHE). Data obfuscation is a technique wherein specific data inputs are encrypted so that they can be operated on within a hardware module without exposing the data itself to the hardware. FHE is a technique recently discovered to be theoretically possible. With FHE, not only the data but also the operations and the entire circuit are encrypted. FHE primarily exists as a theoretical construct currently. It has been shown that it can theoretically be applied to any program or circuit. It has also been applied in a limited respect to some software. Some initial algorithms for hardware applications have been proposed. We find that data obfuscation is efficient enough to be immediately practical, while FHE is not yet in the practical realm. There are also scalability concerns regarding current algorithms for FHE.	(pdf)
Societal Computing - Thesis Proposal	Swapneel Sheth	2013-01-30	As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, \emph{Societal Computing}, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole. This thesis will consist of the following four projects that aim to address the issues of Societal Computing. First, privacy in the context of ubiquitous social computing systems has become a major concern for society at large. As the number of online social computing systems that collect user data grows, concerns with privacy are further exacerbated. Examples of such online systems include social networks, recommender systems, and so on. Approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable --- systems where privacy could be achieved ``for free,'' \ie without having to spend extra computational effort. We describe how privacy can indeed be achieved for free --- an accidental and beneficial side effect of doing some existing computation --- in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Second, we aim to understand what the expectations and needs to end-users and software developers are, with respect to privacy in social systems. Some questions that we want to answer are: Do end-users care about privacy? What aspects of privacy are the most important to end-users? Do we need different privacy mechanisms for technical vs. non-technical users? Should we customize privacy settings and systems based on the geographic location of the users? We have created a large scale user study using an online questionnaire to gather privacy requirements from a variety of stakeholders. We also plan to conduct follow-up semi-structured interviews. This user study will help us answer these questions. Third, a related challenge to above, is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. Our approach is to use crowdsourcing to find out how other users deal with privacy and what settings are commonly used to give users feedback on aspects like how public/private their settings are, what common settings are typically used by others, where do a certain users' settings differ from a trusted group of friends, etc. We have a large dataset of privacy settings for over 500 users on Facebook and we plan to create a user study that will use the data to make privacy settings more understandable. Finally, end-users of such systems find it increasingly hard to understand complex privacy settings. As software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by \emph{end-users} for detecting changes in privacy, \ie regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective.	(pdf)
Finding 9-1-1 Callers in Tall Buildings	Wonsang Song, Jae Woo Lee, Byung Suk Lee, Henning Schulzrinne	2013-01-23	Accurately determining a user's floor location is essential for minimizing delays in emergency response. This paper presents a floor localization system intended for emergency calls. We aim to provide floor-level accuracy with minimum infrastructure support. Our approach is to use multiple sensors, all available in today's smartphones, to trace a user's vertical movements inside buildings. We make three contributions. First, we present a hybrid architecture for floor localization with emergency calls in mind. The architecture combines beacon-based infrastructure and sensor-based dead reckoning, striking the right balance between accurately determining a user's location and minimizing the required infrastructure. Second, we present the elevator module for tracking a user's movement in an elevator. The elevator module addresses three core challenges that make it difficult to accurately derive displacement from acceleration. Third, we present the stairway module which determines the number of floors a user has traveled on foot. Unlike previous systems that track users' foot steps, our stairway module uses a novel landing counting technique.	(pdf)
Effective Dynamic Detection of Alias Analysis Errors	Jingyue Wu, Gang Hu, Yang Tang, Junfeng Yang	2013-01-23	Alias analysis is perhaps one of the most crucial and widely used analyses, and has attracted tremendous research efforts over the years. Yet, advanced alias analyses are extremely difficult to get right, and the bugs in these analyses are most likely the reason that they have not been adopted to production compilers. This paper presents NEONGOBY, a system for effectively detecting errors in alias analysis implementations, improving their correctness and hopefully widening their adoption. NEONGOBY works by dynamically observing pointer addresses during the execution of a test program and then checking these addresses against an alias analysis for errors. It is explicitly designed to (1) be agnostic to the alias analysis it checks for maximum applicability and ease of use and (2) detect alias analysis errors that manifest on real-world programs and workloads. It reduces false positives and performance overhead using a practical selection of techniques. Evaluation on three popular alias analyses and real-world programs Apache and MySQL shows that NEONGOBY effectively finds 29 alias analysis bugs with only 2 false positives and reasonable overhead. To enable alias analysis builders to start using NEONGOBY today, we have released it open-source at https://github.com/wujingyue/neongoby, along with our error detection results and proposed patches.	(pdf)
Bethe Bounds and Approximating the Global Optimum	Adrian Weller, Tony Jebara	2012-12-31	Inference in general Markov random fields (MRFs) is NP-hard, though identifying the maximum a posteriori (MAP) configuration of pairwise MRFs with submodular cost functions is efficiently solvable using graph cuts. Marginal inference, however, even for this restricted class, is in \#P. We prove new formulations of derivatives of the Bethe free energy, provide bounds on the derivatives and bracket the locations of stationary points, introducing a new technique called Bethe bound propagation. Several results apply to pairwise models whether associative or not. Applying these to discretized pseudo-marginals in the associative case we present a polynomial time approximation scheme for global optimization provided the maximum degree is $O(\log n)$, and discuss several extensions.	(pdf)
Reconstructing Pong on an FPGA	Stephen A. Edwards	2012-12-27	I describe in detail the circuitry of the original 1972 Pong video arcade game and how I reconstructed it on an FPGA -- a modern-day programmable logic device. In the original circuit, I discover some sloppy timing and a previously unidentified bug that subtly affected gameplay. I emulate the quasi-synchronous behavior of the original circuit by running a synchronous ``simulation'' circuit with a 2X clock and replacing each flip-flop with a circuit that effectively simulates one. The result is an accurate reproduction that exhibits many idiosyncracies of the original.	(pdf)
Focal Sweep Camera for Space-Time Refocusing	Changyin Zhou, Daniel Miau, Shree K. Nayar	2012-11-29	A conventional camera has a limited depth of field (DOF), which often results in defocus blur and loss of image detail. The technique of image refocusing allows a user to interactively change the plane of focus and DOF of an image after it is captured. One way to achieve refocusing is to capture the entire light field. But this requires a significant compromise of spatial resolution. This is because of the dimensionality gap - the captured information (a light field) is 4-D, while the information required for refocusing (a focal stack) is only 3-D. In this paper, we present an imaging system that directly captures a focal stack by physically sweeping the focal plane. We first describe how to sweep the focal plane so that the aggregate DOF of the focal stack covers the entire desired depth range without gaps or overlaps. Since the focal stack is captured in a duration of time when scene objects can move, we refer to the captured focal stack as a duration focal stack. We then propose an algorithm for computing a space-time in-focus index map from the focal stack, which represents the time at which each pixel is best focused. The algorithm is designed to enable a seamless refocusing experience, even for textureless regions and at depth discontinuities. We have implemented two prototype focal-sweep cameras and captured several duration focal stacks. Results obtained using our method can be viewed at www.focalsweep.com.	(pdf)
Effectiveness of Teaching Metamorphic Testing	Kunal Swaroop Mishra, Gail Kaiser	2012-11-15	This paper is an attempt to understand the effectiveness of teaching metamorphic properties in a senior/graduate software engineering course classroom environment through gauging the success achieved by students in identifying these properties on the basis of the lectures and materials provided in class. The main findings were: (1) most of the students either misunderstood what metamorphic properties are or fell short of identifying all the metamorphic properties in their respective projects, (2) most of the students that were successful in finding all the metamorphic properties in their respective projects had incorporated certain arithmetic rules into their project logic, and (3) most of the properties identified were numerical metamorphic properties. A possible reason for this could be that the two relevant lectures given in class cited examples of metamorphic properties that were based on numerical properties. Based on the findings of the case study, pertinent suggestions were made in order to improve the impact of lectures provided for Metamorphic Testing.	(pdf)
A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class	Swapneel Sheth, Jonathan Bell, Gail Kaiser	2012-11-05	Introductory Computer Science (CS) classes are typically competitive in nature. The cutthroat nature of these classes comes from students attempting to get as high a grade as possible, which may or may not correlate with actual learning. Further, there is very little collaboration allowed in most introductory CS classes. Most assignments are completed individually since many educators feel that students learn the most, especially in introductory classes, by working alone. In addition to completing ``normal'' individual assignments, which have many benefits, we wanted to expose students to collaboration early (via, for example, team projects). In this paper, we describe how we leveraged competition and collaboration in a CS2 to help students learn aspects of computer science better --- in this case, good software design and software testing --- and summarize student feedback.	(pdf)
Increasing Student Engagement in Software Engineering with Gamification	Swapneel Sheth, Jonathan Bell, Gail Kaiser	2012-11-05	Gamification, or the use of game elements in non-game contexts, has become an increasingly popular approach to increasing end-user engagement in many contexts, including employee productivity, sales, recycling, and education. Our preliminary work has shown that gamification can be used to boost student engagement and learning in basic software testing. We seek to expand our gamified software engineering approach to motivate other software engineering best practices. We propose to build a game layer on top of traditional continuous integration technologies to increase student engagement in development, documentation, bug reporting, and test coverage. This poster describes to our approach and presents some early results showing feasibility.	(pdf)
Improving Vertical Accuracy of Indoor Positioning for Emergency Communication	Wonsang Song, Jae Woo Lee, Byung Suk Lee, Henning Schulzrinne	2012-10-30	The emergency communication systems are undergoing a transition from the PSTN-based legacy system to an IP-based next generation system. In the next generation system, GPS accurately provides a user's location when the user makes an emergency call outdoors using a mobile phone. Indoor positioning, however, presents a challenge because GPS does not generally work indoors. Moreover, unlike outdoors, vertical accuracy is critical indoors because an error of few meters will send emergency responders to a different floor in a building. This paper presents an indoor positioning system which focuses on improving the accuracy of vertical location. We aim to provide floor-level accuracy with minimal infrastructure support. Our approach is to use multiple sensors available in today's smartphones to trace users' vertical movements inside buildings. We make three contributions. First, we present the elevator module for tracking a user's movement in elevators. The elevator module addresses three core challenges that make it difficult to accurately derive displacement from acceleration. Second, we present the stairway module which determines the number of floors a user has traveled on foot. Unlike previous systems that track users' foot steps, our stairway module uses a novel landing counting technique. Third, we present a hybrid architecture that combines the sensor-based components with minimal and practical infrastructure. The infrastructure provides initial anchor and periodic corrections of a user's vertical location indoors. The architecture strikes the right balance between the accuracy of location and the feasibility of deployment for the purpose of emergency communication.	(pdf)
An Autonomic Reliability Improvement System for Cyber-Physical Systems	Leon Wu, Gail Kaiser	2012-09-17	System reliability is a fundamental requirement of cyber-physical systems. Unreliable systems can lead to disruption of service, financial cost and even loss of human life. Typical cyber-physical systems are designed to process large amounts of data, employ software as a system component, run online continuously and retain an operator-in-the-loop because of human judgment and accountability requirements for safety-critical systems. This paper describes a data-centric runtime monitoring system named ARIS (Autonomic Reliability Improvement System) for improving the reliability of these types of cyber-physical systems. ARIS employs automated online evaluation, working in parallel with the cyber-physical system to continuously conduct automated evaluation at multiple stages in the system workflow and provide real-time feedback for reliability improvement. This approach enables effective evaluation of data from cyber-physical systems. For example, abnormal input and output data can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop, who can then take actions and make changes to the system based on these alerts in order to achieve minimal system downtime and higher system reliability. We have implemented ARIS in a large commercial building cyber-physical system in New York City, and our experiment has shown that it is effective and efficient in improving building system reliability.	(pdf)
Hardware-Accelerated Range Partitioning	Lisa Wu, Raymond J. Barker, Martha A. Kim, Kenneth A. Ross	2012-09-05	With global pool of data growing at over 2.5 quinitillion bytes per day and over 90% of all data in existence created in the last two years alone [23], there can be little doubt that we have entered the big data era. This trend has brought database performance to the forefront of high throughput, low energy system design. This paper explores targeted deploy- ment of hardware accelerators to improve the throughput and efficiency of database processing. Partitioning, a critical operation when manipulating large data sets, is often the limiting factor in database performance, and represents a significant amount of the overall runtime of database processing workloads. This paper describes a hardware-software streaming framework and a hardware accelerator for range partitioning, or HARP. The streaming framework offers seamless execution environment for database processing elements such as HARP. HARP offers performance, as well as orders of magnitude gains in power and area efficiency. A detailed analysis of a 32nm physical design shows 9.3 times the throughput of a highly optimized and optimistic software implementation, while consuming just 3.6% of the area and 2.6% of the power of a single Xeon core in the same technology generation.	(pdf)
End-User Regression Testing for Privacy	Swapneel Sheth, Gail Kaiser	2012-08-25	Privacy in social computing systems has become a major concern. End-users of such systems find it increasingly hard to understand complex privacy settings. As software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective.	(pdf)
Chronicler: Lightweight Recording to Reproduce Field Failures	Jonathan Bell, Nikhil Sarda, Gail Kaiser	2012-08-23	When programs fail in the field, developers are often left with limited information to diagnose the failure. Automated error reporting tools can assist in bug report generation but without precise steps from the end user it is often difficult for developers to recreate the failure. Advanced remote debugging tools aim to capture sufficient information from field executions to recreate failures in the lab but often have too much overhead to practically deploy. We present CHRONICLER, an approach to remote debugging that captures non-deterministic inputs to applications in a lightweight manner, assuring faithful reproduction of client executions. We evaluated CHRONICLER by creating a Java implementation, CHRONICLERJ, and then by using a set of benchmarks mimicking real world applications and workloads, showing its runtime overhead to be under 10% in most cases (worst case 86%), while an existing tool showed overhead over 100% in the same cases (worst case 2,322%).	(pdf)
Statically Unrolling Recursion to Improve Opportunities for Parallelism	Neil Deshpande, Stephen A. Edwards	2012-07-13	We present an algorithm for unrolling recursion in the Haskell functional language. Adapted from a similar algorithm proposed by Rugina and Rinard for imperative languages, it essentially inlines a function in itself as many times as requested. This algorithm aims to increase the available parallelism in recursive functions, with an eye toward its eventual application in a Haskell-to-hardware compiler. We first illustrate the technique on a series of examples, then describe the algorithm, and finally show its Haskell source, which operates as a plug-in for the Glasgow Haskell Compiler.	(pdf)
Functional Fibonacci to a Fast FPGA	Stephen A. Edwards	2012-06-16	Through a series of mechanical transformation, I show how a three-line recursive Haskell function (Fibonacci) can be translated into a hardware description language -- VHDL -- for efficient execution on an FPGA. The goal of this report is to lay the groundwork for a compiler that will perform these transformations automatically, hence the development is deliberately pedantic.	(pdf)
High Throughput Heavy Hitter Aggregation	Orestis Polychroniou, Kenneth A. Ross	2012-05-15	Heavy hitters are data items that occur at high frequency in a data set. Heavy hitters are among the most important items for an organization to summarize and understand dur- ing analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache memory. We design cache-resident, shared-nothing structures that hold only the most frequent elements from the table. Our approach works in three phases. It first samples and picks heavy hitter candidates. It then builds a hash table and computes the exact aggregates of these candidates. Finally, if necessary, a validation step identifies the true heavy hitters from among the candidates based on the query specification. We identify trade-offs between the hash table capacity and performance. Capacity determines how many candidates can be aggregated. We optimize performance by the use of perfect hashing and SIMD instructions. SIMD instructions are utilized in novel ways to minimize cache accesses, be- yond simple vectorized operations. We use bucketized and cuckoo hash tables to increase capacity, to adapt to different datasets and query constraints. The performance of our method is an order of magnitude faster than in-memory aggregation over a complete set of items if those items cannot be cache resident. Even for item sets that are cache resident, our SIMD techniques enable significant performance improvements over previous work.	(pdf)
Improving Efficiency and Reliability of Building Systems Using Machine Learning and Automated Online Evaluation	Leon Wu, Gail Kaiser, David Solomon, Rebecca Winter, Albert Boulanger, Roger Anderson	2012-05-04	A high percentage of newly-constructed commercial office buildings experience energy consumption that exceeds specifications and system failures after being put into use. This problem is even worse for older buildings. We present a new approach, ‘predictive building energy optimization’, which uses machine learning (ML) and automated online evaluation of historical and real-time building data to improve efficiency and reliability of building operations without requiring large amounts of additional capital investment. Our ML approach uses a predictive model to generate accurate energy demand forecasts and automated analyses that can guide optimization of building operations. In parallel, an automated online evaluation system monitors efficiency at multiple stages in the system workflow and provides building operators with continuous feedback. We implemented a prototype of this application in a large commercial building in Manhattan. Our predictive machine learning model applies Support Vector Regression (SVR) to the building’s historical energy use and temperature and wet-bulb humidity data from the building’s interior and exterior in order to model performance for each day. This predictive model closely approximates actual energy usage values, with some seasonal and occupant-specific variability, and the dependence of the data on day-of-the-week makes the model easily applicable to different types of buildings with minimal adjustment. In parallel, an automated online evaluator monitors the building’s internal and external conditions, control actions and the results of those actions. Intelligent real-time data quality analysis components quickly detect anomalies and automatically transmit feedback to building management, who can then take necessary preventive or corrective actions. Our experiments show that this evaluator is responsive and effective in further ensuring reliable and energy-efficient operation of building systems.	(pdf)
Aperture Evaluation for Defocus Deblurring and Extended Depth of Field	Changyin Zhou, Shree Nayar	2012-04-17	For a given camera setting, scene points that lie outside of depth of field (DOF) will appear defocused (or blurred). Defocus causes the loss of image details. To recover scene details from a defocused region, deblurring techniques must be employed. It is well known that the deblurring quality is closely related to the defocus kernel or point-spread-function (PSF), whose shape is largely determined by the aperture pattern of the camera. In this paper, we propose a comprehensive framework of aperture evaluation for the purpose of defocus deblurring, which takes the effects of image noise, deblurring algorithm, and the structure of natural images into account. By using the derived evaluation criterion, we are able to solve for the optimal coded aperture patterns. Extensive simulations and experiments are then conducted to compare the optimized coded apertures with previously proposed ones. The proposed framework of aperture evaluation is further extended to evaluate and optimize extended depth of field (EDOF) cameras. EDOF cameras (e.g., focal sweep and wavefront coding camera) are designed to produce PSFs which are less sensitive to depth variation, so that people can deconvolve the whole image using a single PSF without knowing scene depth. Different choices of camera parameters or the PSF to deconvolve with lead to different deblurring qualities. With the derived evaluation criterion, we are able to derive the optimal PSF to deconvolve with in a closed-form and optimize camera parameters for the best deblurring results.	(pdf)
Partitioned Blockmap Indexes for Multidimensional Data Access	Kenneth Ross, Evangelia Sitaridi	2012-04-16	Given recent increases in the size of main memory in modern machines, it is now common to to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term ``blockmap.'' The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to evenly divide the data if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU processor. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead.	(pdf)
Experiments of Image Retrieval Using Weak Attributes	Felix X. Yu, Rongrong Ji, Ming-Hen Tsai, Guangnan Ye, Shih-Fu Chang	2012-04-06	Searching images based on descriptions of image attributes is an intuitive process that can be easily understood by humans and recently made feasible by a few promising works in both the computer vision and multimedia communities. In this report, we describe some experiments of image retrieval methods that utilize weak attributes.	(pdf)
When Does Computational Imaging Improve Performance?	Oliver Cossairt, Mohit Gupta, Shree Nayar	2012-03-24	A number of computational imaging techniques have been introduced to improve image quality by increasing light throughput. These techniques use optical coding to measure a stronger signal level. However, the performance of these techniques is limited by the decoding step, which amplifies noise. While it is well understood that optical coding can increase performance at low light levels, little is known about the quantitative performance advantage of computational imaging in general settings. In this paper, we derive the performance bounds for various computational imaging techniques. We then discuss the implications of these bounds for several real-world scenarios (illumination conditions, scene properties and sensor noise characteristics). Our results show that computational imaging techniques provide a significant performance advantage in a surprisingly small set of real world settings. These results can be readily used by practitioners to design the most suitable imaging systems given the application at hand.	(pdf)
High Availability for Carrier-Grade SIP Infrastructure on Cloud Platforms	Jong Yul Kim, Henning Schulzrinne	2012-03-19	SIP infrastructure on cloud platforms has the potential to be both scalable and highly available. In our previous project, we focused on the scalability aspect of SIP services on cloud platforms; the focus of this project is on the high availability aspect. We investigated the effects of component fault on service availability with the goal of understanding how high availability can be guaranteed even in the face of component faults. The experiments were conducted empirically on a real system that runs on Amazon EC2. Our analysis shows that most component faults are masked with a simple automatic failover technique. However, we have also identified fundamental problems that cannot be addressed by simple failover techniques; a problem involving DNS cache in resolvers and a problem involving static failover configurations. Recommendations on how to solve these problems are included in the report.	(pdf)
Automatic Detection of Metamorphic Properties of Software	Sahar Hasan	2012-03-14	The goal of this project is to demonstrate the feasibility of automatic detection of metamorphic properties of individual functions. Properties of interest here, as described in Murphy et al.’s SEKE 2008 paper “Properties of Machine Learning Applications for Use in Metamorphic Testing”, include: 1. Permutation of the order of the input data 2. Addition of numerical values by a constant 3. Multiplication of numerical values by a constant 4. Reversal of the order of the input data 5. Removal of part of the data 6. Addition of data to the dataset While focusing on permutative, additive, and multiplicative properties in functions and applications, I have sought to identify common programming constructs and code fragments that strongly indicate that these properties will hold, or fail to hold, along an execution path in which the code is evaluated. I have constructed a syntax for expressions representing these common constructs and have also mapped a collection of these expressions to the metamorphic properties they uphold or invalidate. I have then developed a general framework to evaluate these properties for programs as a whole.	(pdf)
CloudFence: Enabling Users to Audit the Use of their Cloud-Resident Data	Vasilis Pappas, Vasileios P. Kemerlis, Angeliki Zavou, Michalis Polychronakis, Angelos D. Keromytis	2012-01-24	One of the primary concerns of users of cloud-based services and applications is the risk of unauthorized access to their private information. For the common setting in which the infrastructure provider and the online service provider are different, end users have to trust their data to both parties, although they interact solely with the service provider. This paper presents CloudFence, a framework that allows users to independently audit the treatment of their private data by third-party online services, through the intervention of the cloud provider that hosts these services. CloudFence is based on a fine-grained data flow tracking platform exposed by the cloud provider to both developers of cloud-based applications, as well as their users. Besides data auditing for end users, CloudFence allows service providers to confine the use of sensitive data in well-defined domains using data tracking at arbitrary granularity, offering additional protection against inadvertent leaks and unauthorized access. The results of our experimental evaluation with real-world applications, including an e-store platform and a cloud-based backup service, demonstrate that CloudFence requires just a few changes to existing application code, while it can detect and prevent a wide range of security breaches, ranging from data leakage attacks using SQL injection, to personal data disclosure due to missing or erroneously implemented access control checks.	(pdf)
Failure Analysis of the New York City Power Grid	Leon Wu, Roger Anderson, Albert Boulanger, Cynthia Rudin, Gail Kaiser	2012-01-12	As U.S. power grid transforms itself into Smart Grid, it has become less reliable in the past years. Power grid failures lead to huge financial cost and affect peopleâ€™s life. Using a statistical analysis and holistic approach, this paper analyzes the New York City power grid failures: failure patterns and climatic effects. Our findings include: higher peak electrical load increases likelihood of power grid failure; increased subsequent failures among electrical feeders sharing the same substation; underground feeders fail less than overhead feeders; cables and joints installed during certain years are more likely to fail; higher weather temperature leads to more power grid failures. We further suggest preventive maintenance, intertemporal consumption, and electrical load optimization for failure prevention. We also estimated that the predictability of the power grid component failures correlates with the cycles of the North Atlantic Oscillation (NAO) Index.	(pdf)
NetServ: Reviving Active Networks	Jae Woo Lee, Roberto Francescangeli, Wonsang Song, Emanuele Maccherani, Jan Janak, Suman Srinivasan	2012-01-05	In 1996, Tennenhouse and Wetherall proposed active net- works, where users can inject code modules into network nodes. The proposal sparked intense debate and follow- on research, but ultimately failed to win over the net- working community. Fifteen years later, the problems that motivated the active networks proposal persist. We call for a revival of active networks. We present NetServ, a fully integrated active network system that provides all the necessary functionality to be deployable, addressing the core problems that prevented the practical success of earlier approaches. We make the following contributions. We present a hybrid approach to active networking, which combines the best qualities from the two extreme approachesï¿½ integrated and discrete. We built a working system that strikes the right balance between security and perfor- mance by leveraging current technologies. We suggest an economic model based on NetServ between content providers and ISPs. We built four applications to illus- trate the model.	(pdf)
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraft	Jonathan Bell, Swapneel Sheth, Gail Kaiser	2011-12-29	We present a survey of usage of the popular Massively Multiplayer Online Role Playing Game, World of Warcraft. Players within this game often self-organize into communities with similar interests and/or styles of play. By mining publicly available data, we collected a dataset consisting of the complete player history for approximately six million characters, with partial data for another six million characters. The paper provides a thorough description of the distributed approach used to collect this massive community data set, and then focuses on an analysis of player achievement data in particular, exposing trends in play from this highly successful game. From this data, we present several findings regarding player profiles. We correlate achievements with motivations based upon a previously-defined motivation model, and then classify players based on the categories of achievements that they pursued. Experiments show players who fall within each of these buckets can play differently, and that as players progress through game content, their play style evolves as well.	(pdf)
GRAND: Git Revisions As Named Data	Jan Janak, Jae Woo Lee, Henning Schulzrinne	2011-12-12	GRAND is an experimental extension of Git, a distributed revision control system, which enables the synchronization of Git repositories over Content-Centric Networks (CCN). GRAND brings some of the benefits of CCN to Git, such as transparent caching, load balancing, and the ability to fetch objects by name rather than location. Our implementation is based on CCNx, a reference implementation of content router. The current prototype consists of two components: git-daemon-ccnx allows a node to publish its local Git repositories to CCNx Content Store; git-remote-ccnx implements CCNx transport on the client side. This adds CCN to the set of transport protocols supported by Git, alongside HTTP and SSH.	(pdf)
ActiveCDN: Cloud Computing Meets Content Delivery Networks	Suman Srinivasan, Jae Woo Lee, Dhruva Batni, Henning Schulzrinne	2011-11-15	Content delivery networks play a crucial role in today’s Internet. They serve a large portion of the multimedia on the Internet and solve problems of scalability and indirectly network congestion (at a price). However, most content delivery networks rely on a statically deployed configuration of nodes and network topology that makes it hard to grow and scale dynamically. We present ActiveCDN, a novel CDN architecture that allows a content publisher to dynamically scale their content delivery services using network virtualization and cloud computing techniques.	(pdf)
libdft: Practical Dynamic Data Flow Tracking for Commodity Systems	Vasileios P. Kemerlis, Georgios Portokalidis, Kangkook Jee, Angelos D. Keromytis	2011-10-27	Dynamic data flow tracking (DFT) deals with the tagging and tracking of "interesting" data as they propagate during program execution. DFT has been repeatedly implemented by a variety of tools for numerous purposes, including protection from zero-day and cross-site scripting attacks, detection and prevention of information leaks, as well as for the analysis of legitimate and malicious software. We present libdft, a dynamic DFT framework that unlike previous work is at once fast, reusable, and works with commodity software and hardware. libdft provides an API, which can be used to painlessly deliver DFT-enabled tools that can be applied on unmodified binaries, running on common operating systems and hardware, thus facilitating research and rapid prototyping. We explore different approaches for implementing the low-level aspects of instruction-level data tracking, introduce a more efficient and 64-bit capable shadow memory, and identify (and avoid) the common pitfalls responsible for the excessive performance overhead of previous studies. We evaluate libdft using real applications with large codebases like the Apache and MySQL servers, and the Firefox web browser. We also use a series of benchmarks and utilities to compare libdft with similar systems. Our results indicate that it performs at least as fast, if not faster, than previous solutions, and to the best of our knowledge, we are the first to evaluate the performance overhead of a fast dynamic DFT implementation in such depth. Finally, our implementation is freely available as open source software.	(pdf)
Money for Nothing and Privacy for Free?	Swapneel Sheth, Tal Malkin, Gail Kaiser	2011-10-10	Privacy in the context of ubiquitous social computing systems has become a major concern for the society at large. As the number of online social computing systems that collect user data grows, this privacy threat is further exacerbated. There has been some work (both, recent and older) on addressing these privacy concerns. These approaches typically require extra computational resources, which might be beneficial where privacy is concerned, but when dealing with Green Computing and sustainability, this is not a great option. Spending more computation time results in spending more energy and more resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable - systems where privacy could be achieved ``for free,'' i.e., without having to spend extra computational effort. In this paper, we describe how privacy can be achieved for free - an accidental and beneficial side effect of doing some existing computation - and what types of privacy threats it can mitigate. More precisely, we describe a ``Privacy for Free'' design pattern and show its feasibility, sustainability, and utility in building complex social computing systems.	(pdf)
Forecasting Energy Demand in Large Commercial Buildings Using Support Vector Machine Regression	David Solomon, Rebecca Winter, Albert Boulanger, Roger Anderson, Leon Wu	2011-09-24	As our society gains a better understanding of how humans have negatively impacted the environment, research related to reducing carbon emissions and overall energy consumption has become increasingly important. One of the simplest ways to reduce energy usage is by making current buildings less wasteful. By improving energy efficiency, this method of lowering our carbon footprint is particularly worthwhile because it reduces energy costs of operating the building, unlike many environmental initiatives that require large monetary investments. In order to improve the efficiency of the heating, ventilation, and air conditioning (HVAC) system of a Manhattan skyscraper, 345 Park Avenue, a predictive computer model was designed to forecast the amount of energy the building will consume. This model uses Support Vector Machine Regression (SVMR), a method that builds a regression based purely on historical data of the building, requiring no knowledge of its size, heating and cooling methods, or any other physical properties. SVMR employs time-delay coordinates as a representation of the past to create the feature vectors for SVM training. This pure dependence on historical data makes the model very easily applicable to different types of buildings with few model adjustments. The SVM regression model was built to predict a week of future energy usage based on past energy, temperature, and dew point temperature data.	(pdf)
Privacy Enhanced Access Control for Outsourced Data Sharing	Mariana Raykova, Hang Zhao, Steven Bellovin	2011-09-20	Traditional access control models often assume that the entity enforcing access control policies is also the owner of data and resources. This assumption no longer holds when data is outsourced to a third-party storage provider, such as the \emph{cloud}. Existing access control solutions mainly focus on preserving confidentiality of stored data from unauthorized access and the storage provider. However, in this setting, access control policies as well as users' access patterns also become privacy sensitive information that should be protected from the cloud. We propose a two-level access control scheme that combines coarse-grained access control enforced at the cloud, which allows to get acceptable communication overhead and at the same time limits the information that the cloud learns from his partial view of the access rules and the access patterns, and fine-grained cryptographic access control enforced at the user's side, which provides the desired expressiveness of the access control policies. Our solution handles both \emph{read} and \emph{write} access control.	(pdf)
Stable Flight and Object Tracking with a Quadricopter using an Android Device	Benjamin Bardin, William Brown, Paul S. Blaer	2011-09-09	We discuss a novel system architecture for quadricopter control, the Robocopter platform, in which the quadricopter can behave near-autonomously and processing is handled by an Android device on the quadricopter. The Android device communicates with a laptop, receiving commands from the host and sending imagery and sensor data back. We also discuss the results of a series of tests of our platform on our first hardware iteration, named Jabberwock.	(pdf)
NetServ on OpenFlow 1.0	Emanuele Maccherani, Jae Woo Lee, Mauro Femminella, Gianluca Reali, Henning Schulzrinne	2011-09-03	We describe the initial prototype implementation of OpenFlow-based NetServ.	(pdf)
Improving System Reliability for Cyber-Physical Systems	Leon Wu	2011-08-31	System reliability is a fundamental requirement of Cyber-Physical System, i.e., a system featuring a tight combination of, and coordination between, the systems computational and physical elements. Cyber-physical system includes systems ranging from the critical infrastructure such as power grid and transportation system to the health and biomedical devices. An unreliable system often leads to disruption of service, financial cost and even loss of human life. This thesis aims to improve system reliability for cyber-physical systems that meet following criteria: processing large amount of data; employing software as a system component; running online continuously; having operator-in-the-loop because of human judgment and accountability requirement for safety critical systems. The reason that I limit the system scope to this type of cyber-physical system is that this type of cyber-physical systems are important and becoming more prevalent. To improve system reliability for this type of cyber-physical systems, I propose a system evaluation approach named automated online evaluation. It works in parallel with the cyber-physical system to conduct automated evaluation at the multiple stages along the workflow of the system continuously and provide operator-in-the-loop feedback on reliability improvement. It is an approach whereby data from cyber-physical system is evaluated. For example, abnormal input and output data can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop. The operator can then take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and higher system reliability. To implement the proposed approach, I further propose a system architecture named ARIS (Autonomic Reliability Improvement System). One technique used by the approach is data quality analysis using computational intelligence that applies computational intelligence in evaluating data quality in some automated and efficient way to ensure data quality and make sure the running system to perform as expected reliably. The computational intelligence is enabled by machine learning, data mining, statistical and probabilistic analysis, and other intelligent techniques. In a cyber-physical system, the data collected from the system, e.g., software bug reports, system status logs and error reports, are stored in some databases. In my approach, these data are analyzed via data mining and other intelligent techniques so that useful information on system reliability including erroneous data and abnormal system state can be concluded. These reliability related information are directed to operators so that proper actions can be taken, sometimes proactively based on the predictive results, to ensure the proper and reliable execution of the system. Another technique used by the approach is self-tuning that automatically self-manages and self-configures the evaluation system to ensure it adapts itself based on the changes in the system and feedback from the operator. The self-tuning adapts the evaluation system to ensure its proper functioning, which leads to a more robust evaluation system and improved system reliability. For feasibility study of the proposed approach, I first present NOVA (Neutral Online Visualization-aided Autonomic) system, a data quality analysis system for improving system reliability for power grid cyber-physical system. I then present a feasibility study on effectiveness of some self-tuning techniques, including data classification, redundancy checking and trend detection. The self-tuning leads to an adaptive evaluation system that works better under system changes and operator feedback, which will lead to improved system reliability. The contribution of the work is an automated online evaluation approach that is able to improve system reliability for cyber-physical systems in the domain of interest as indicated above. It enables online reliability assurance of the deployed systems that are not possible to perform robust tests prior to actual deployment.	(pdf)
Describable Visual Attributes for Face Images	Neeraj Kumar	2011-08-01	We introduce the use of describable visual attributes for face images. Describable visual attributes are labels that can be given to an image to describe its appearance. This thesis focuses mostly on images of faces and the attributes used to describe them, although the concepts also apply to other domains. Examples of face attributes include gender, age, jaw shape, nose size, etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category. We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the current effectiveness and explore the future potential of using attributes for image search, automatic face replacement in images, and face verification, via both human and computational experiments. To aid other researchers in studying these problems, we introduce two new large face datasets, named FaceTracer and PubFig, with labeled attributes and identities, respectively. Finally, we also show the effectiveness of visual attributes in a completely different domain: plant species identification. To this end, we have developed and publicly released the Leafsnap system, which has been downloaded by over half a million users. The mobile phone application is a flexible electronic field guide with high-quality images of the tree species in the Northeast US. It also gives users instant access to our automatic recognition system, greatly simplifying the identification process.	(pdf)
ICOW: Internet Access in Public Transit Systems	Se Gi Hong, SungHoon Seo, Henning Schulzrinne, Prabhakar Chitrapu	2011-07-27	When public transportation stations have access points to provide Internet access to passengers, public transportation becomes a more attractive travel and commute option. However, the Internet connectivity is intermittent because passengers can access the Internet only when a bus or a train is within the networking coverage of an access point at a stop. To efficiently handle this intermittent network for the public transit system, we propose Internet Cache on Wheels (ICOW), a system that provides a low-cost way for bus and train operators to offer access to Internet content. Each bus or train car is equipped with a smart cache that serves popular content to passengers. The cache updates its content based on passenger requests when it is within range of Internet access points placed at bus stops, train stations or depots. We have developed a system architecture and built a prototype of the ICOW system. Our evaluation and analysis show that ICOW is significantly more efficient than having passengers contact Internet access points individually and ensures continuous availability of content throughout the journey.	(pdf)
Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid	Leon Wu, Gail Kaiser, Cynthia Rudin, Roger Anderson	2011-07-01	Ensuring reliability as the electrical grid morphs into the "smart grid" will require innovations in how we assess the state of the grid, for the purpose of proactive maintenance, rather than reactive maintenance; in the future, we will not only react to failures, but also try to anticipate and avoid them using predictive modeling (machine learning and data mining) techniques. To help in meeting this challenge, we present the Neutral Online Visualization-aided Autonomic evaluation framework (NOVA) for evaluating machine learning and data mining algorithms for preventive maintenance on the electrical grid. NOVA has three stages provided through a unified user interface: evaluation of input data quality, evaluation of machine learning and data mining results, and evaluation of the reliability improvement of the power grid. A prototype version of NOVA has been deployed for the power grid in New York City, and it is able to evaluate machine learning and data mining systems effectively and efficiently.	(pdf)
Columbia University WiMAX Campus Deployment and Installation	SungHoon Seo, Jan Janak, Henning Schulzrinne	2011-06-27	This report describes WiMAX campus deployment and installation at Columbia University.	(pdf)
The Benefits of Using Clock Gating in the Design of Networks-on-Chip	Michele Petracca, Luca P. Carloni	2011-06-21	Networks-on-chip (NoC) are critical to the design of complex multi-core system-on-chip (SoC) architectures. Since SoCs are characterized by a combination of high performance requirements and stringent energy constraints, NoCs must be realized with low-power design techniques. Since the use of semicustom design flow based on standard-cell technology libraries is essential to cope with the SoC design complexity challenges under tight time-to-market constraints, NoC must be implemented using logic synthesis. In this paper we analyze the major power reduction that clock gating can deliver when applied to the synthesis of a NoC in the context of a semi-custom automated design flow.	(pdf)
Secret Ninja Testing with HALO Software Engineering	Jonathan Bell, Swapneel Sheth, Gail Kaiser	2011-06-21	Software testing traditionally receives little attention in early computer science courses. However, we believe that if exposed to testing early, students will develop positive habits for future work. As we have found that students typically are not keen on testing, we propose an engaging and socially-oriented approach to teaching software testing in introductory and intermediate computer science courses. Our proposal leverages the power of gaming utilizing our previously described system HALO. Unlike many previous approaches, we aim to present software testing in disguise - so that students do not recognize (at first) that they are being exposed to software testing. We describe how HALO could be integrated into course assignments as well as the benefits that HALO creates.	(pdf)
Markov Models for Network-Behavior Modeling and Anonymization	Yingbo Song, Salvatore J. Stolfo, Tony Jebara	2011-06-15	Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks -- particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably models the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity which fits together naturally with network-data modeling.	(pdf)
Concurrency Attacks	Junfeng Yang, Ang Cui, John Gallagher, Sal Stolfo, Simha Sethumadhavan	2011-06-02	Just as errors in sequential programs can lead to security exploits, errors in concurrent programs can lead to concurrency attacks. In this paper, we present an in-depth study of concurrency attacks and how they may affect existing defenses. Our study yields several interesting findings. For instance, we find that concurrency attacks can corrupt non-pointer data, such as user identifiers, which existing memory-safety defenses cannot handle. Inspired by our findings, we propose new defense directions and fixes to existing defenses.	(pdf)
Constructing Subtle Concurrency Bugs Using Synchronization-Centric Second-Order Mutation Operators	Leon Wu, Gail Kaiser	2011-06-01	Mutation testing applies mutation operators to modify program source code or byte code in small ways, and then runs these modified programs (i.e., mutants) against a test suite in order to evaluate the quality of the test suite. In this paper, we first describe a general fault model for con- current programs and some limitations of previously developed sets of first-order concurrency mutation operators. We then present our new mutation testing approach, which em- ploys synchronization-centric second-order mutation operators that are able to generate subtle concurrency bugs not represented by the first-order mutation. These operators are used in addition to the synchronization-centric first-order mutation operators to form a small set of effective concurrency mutation operators for mutant generation. Our empirical study shows that our set of operators is effective in mutant generation with limited cost and demonstrates that this new approach is easy to implement.	(pdf)
BUGMINER: Software Reliability Analysis Via Data Mining of Bug Reports	Leon Wu, Boyi Xie, Gail Kaiser, Rebecca Passonneau	2011-06-01	Software bugs reported by human users and automatic error reporting software are often stored in some bug track- ing tools (e.g., Bugzilla and Debbugs). These accumulated bug reports may contain valuable information that could be used to improve the quality of the bug reporting, reduce the quality assurance effort and cost, analyze software re- liability, and predict future bug report trend. In this paper, we present BUGMINER, a tool that is able to derive useful information from historic bug report database using data mining, use these information to do completion check and redundancy check on a new or given bug report, and to estimate the bug report trend using statistical analysis. Our empirical studies of the tool using several real-world bug report repositories show that it is effective, easy to implement, and has relatively high accuracy despite low quality data.	(pdf)
Evaluating Machine Learning for Improving Power Grid Reliability	Leon Wu, Gail Kaiser, Cynthia Rudin, David Waltz, Roger Anderson, Albert Boulanger	2011-06-01	Ensuring reliability as the electrical grid morphs into the “smart grid” will require innovations in how we assess the state of the grid, for the purpose of proactive maintenance, rather than reactive maintenance – in the future, we will not only react to failures, but also try to anticipate and avoid them using predictive modeling (ma- chine learning) techniques. To help in meeting this challenge, we present the Neutral Online Visualization-aided Autonomic evaluation framework (NOVA) for evaluating machine learning algorithms for preventive maintenance on the electrical grid. NOVA has three stages provided through a unified user interface: evaluation of input data quality, evaluation of machine learning results, and evaluation of the reliability improvement of the power grid. A prototype version of NOVA has been deployed for the power grid in New York City, and it is able to evaluate machine learning systems effectively and efficiently.	(pdf)
Entropy, Randomization, Derandomization, and Discrepancy	Michael Gnewuch	2011-05-24	Abstract The star discrepancy is a measure of how uniformly distributed a finite point set is in the d-dimensional unit cube. It is related to high-dimensional numerical integration of certain function classes as expressed by the Koksma-Hlawka inequality. A sharp version of this inequality states that the worst-case error of approximating the integral of functions from the unit ball of some Sobolev space by an equal-weight cubature is exactly the star discrepancy of the set of sample points. In many applications, as, e.g., in physics, quantum chemistry or finance, it is essential to approximate high-dimensional integrals. Thus with regard to the Koksma- Hlawka inequality the following three questions are very important: (i) What are good bounds with explicitly given dependence on the dimension d for the smallest possible discrepancy of any n-point set for moderate n? (ii) How can we construct point sets efficiently that satisfy such bounds? (iii) How can we calculate the discrepancy of given point sets efficiently? We want to discuss these questions and survey and explain some approaches to tackle them relying on metric entropy, randomization, and derandomization.	(pdf) (ps)
A NEW RANDOMIZED ALGORITHM TO APPROXIMATE THE STAR DISCREPANCY BASED ON THRESHOLD ACCEPTING	MICHAEL GNEWUCH, MAGNUS WAHLSTROM, CAROLA WINZEN	2011-05-24	Abstract. We present a new algorithm for estimating the star discrepancy of arbitrary point sets. Similar to the algorithm for discrepancy approximation of Winker and Fang [SIAM J. Numer. Anal. 34 (1997), 2028{2042] it is based on the optimization algorithm threshold accepting. Our improvements include, amongst others, a non-uniform sampling strategy which is more suited for higher-dimensional inputs and additionally takes into account the topological characteristics of given point sets, and rounding steps which transform axis-parallel boxes, on which the discrepancy is to be tested, into critical test boxes. These critical test boxes provably yield higher discrepancy values, and contain the box that exhibits the maximum value of the local discrepancy. We provide comprehensive experiments to test the new algorithm. Our randomized algorithm computes the exact discrepancy frequently in all cases where this can be checked (i.e., where the exact discrepancy of the point set can be computed in feasible time). Most importantly, in higher dimension the new method behaves clearly better than all previously known methods.	(pdf) (ps)
Cells: A Virtual Mobile Smartphone Architecture	Jeremy Andrus, Christoffer Dall, Alexander Van't Hoff, Oren Laadan, Jason Nieh	2011-05-24	Cellphones are increasingly ubiquitous, so much so that many users are inconveniently forced to carry multiple cellphones to accommodate work, personal, and geographic mobility needs. We present Cells, a virtualization architecture for enabling multiple virtual smartphones to run simultaneously on the same physical cellphone device in a securely isolated manner. Cells introduces a usage model of having one foreground virtual phone and multiple background virtual phones. This model enables a new device namespace mechanism and novel device proxies that integrate with lightweight operating system virtualization to efficiently and securely multiplex phone hardware devices across multiple virtual phones while providing native hardware device performance to all applications. Virtual phone features include fully-accelerated graphics for gaming, complete power management features, and full telephony functionality with separately assignable telephone numbers and caller ID support. We have implemented a Cells prototype that supports multiple Android virtual phones on the same phone hardware. Our performance results demonstrate that Cells imposes only modest runtime and memory overhead, works seamlessly across multiple hardware devices including Google Nexus 1 and Nexus S phones and an NVIDIA tablet, and transparently runs all existing Android applications without any modifications.	(pdf)
Towards Diversity in Recommendations using Social Networks	Swapneel Sheth, Jonathan Bell, Nipun Arora, Gail Kaiser	2011-05-17	While there has been a lot of research towards improving the accuracy of recommender systems, the resulting systems have tended to become increasingly narrow in suggestion variety. An emerging trend in recommendation systems is to actively seek out diversity in recommendations, where the aim is to provide unexpected, varied, and serendipitous recommendations to the user. Our main contribution in this paper is a new approach to diversity in recommendations called ``Social Diversity,'' a technique that uses social network information to diversify recommendation results. Social Diversity utilizes social networks in recommender systems to leverage the diverse underlying preferences of different user communities to introduce diversity into recommendations. This form of diversification ensures that users in different social networks (who may not collaborate in real life, since they are in a different network) share information, helping to prevent siloization of knowledge and recommendations. We describe our approach and show its feasibility in providing diverse recommendations for the MovieLens dataset.	(pdf)
Combining a Baiting and a User Search Profiling Techniques for Masquerade Detection	Malek Ben Salem, Salvatore J. Stolfo	2011-05-06	Masquerade attacks are characterized by an adversary stealing a legitimate user's credentials and using them to impersonate the victim and perform malicious activities, such as stealing information. Prior work on masquerade attack detection has focused on proling legitimate user behavior and detecting abnormal behavior indicative of a masquerade attack. Like any anomaly-detection based techniques, detecting masquerade attacks by proling user behavior suers from a signicant number of false positives. We extend prior work and provide a novel integrated detection approach in this paper. We combine a user behavior proling technique with a baiting technique in order to more accurately detect masquerade activity. We show that using this integrated approach reduces the false positives by 36% when compared to user behavior proling alone, while achieving almost perfect detection results.We also show how this combined detection approach serves as a mechanism for hardening the masquerade attack detector against mimicry attacks.	(pdf)
DYSWIS: Collaborative Network Fault Diagnosis - Of End-users, By End-users, For End-users	Kyung Hwa Kim, Vishal Singh, Henning Schulzrinne	2011-05-05	With increase in application complexity, the need for network faults diagnosis for end-users has increased. However, existing failure diagnosis techniques fail to assist the endusers in accessing the applications and services. We present DYSWIS, an automatic network fault detection and diagnosis system for end-users. The key idea is collaboration of end-users; a node requests multiple nodes to diagnose a network fault in real time to collect diverse information from different parts of the networks and infer the cause of failure. DYSWIS leverages DHT network to search the collaborating nodes with appropriate network properties required to diagnose a failure. The framework allows dynamic updating of rules and probes into a running system. Another key aspect is contribution of expert knowledge (rules and probes) by application developers, vendors and network administrators; thereby enabling crowdsourcing of diagnosis strategy for growing set of applications. We have implemented the framework and the software and tested them using our test bed and PlanetLab to show that several complex commonly occurring failures can be detected and diagnosed successfully using DYSWIS, while single-user probe with traditional tools fails to pinpoint the cause of such failures. We validate that our base modules and rules are sufficient to detect infrastructural failures causing majority of application failures.	(pdf)
NetServ Framework Design and Implementation 1.0	Jae Woo Lee, Roberto Francescangeli, Wonsang Song, Jan Janak, Suman Srinivasan, Michael S. Kester	2011-05-04	Eyeball ISPs today are under-utilizing an important asset: edge routers. We present NetServ, a programmable node architecture aimed at turning edge routers into distributed service hosting platforms. This allows ISPs to allocate router resources to content publishers and application service pro\-vi\-ders motivated to deploy content and services at the network edge. This model provides important benefits over currently available solutions like CDN. Content and services can be brought closer to end users by dynamically installing and removing custom modules as needed throughout the network. Unlike previous programmable router proposals which focused on customizing features of a router, NetServ focuses on deploying content and services. All our design decisions reflect this change in focus. We set three main design goals: a wide-area deployment, a multi-user execution environment, and a clear economic benefit. We built a prototype using Linux, NSIS signaling, and the Java OSGi framework. We also implemented four prototype applications: ActiveCDN provides publisher-specific content distribution and processing; KeepAlive Responder and Media Relay reduce the infrastructure needs of telephony providers; and Overload Control makes it possible to deploy more flexible algorithms to handle excessive traffic.	(pdf)
Estimation of System Reliability Using a Semiparametric Model	Leon Wu, Timothy Teravainen, Gail Kaiser, Roger Anderson, Albert Boulanger, Cynthia Rudin	2011-04-20	An important problem in reliability engineering is to predict the failure rate, that is, the frequency with which an engineered system or component fails. This paper presents a new method of estimating failure rate using a semiparametric model with Gaussian process smoothing. The method is able to provide accurate estimation based on historical data and it does not make strong a priori assumptions of failure rate pattern (e.g., constant or monotonic). Our experiments of applying this method in power system failure data compared with other models show its efficacy and accuracy. This method can be used in estimating reliability for many other systems, such as software systems or components.	(pdf)
Beyond Trending Topics: Real-World Event Identification on Twitter	Hila Becker, Mor Naaman, Luis Gravano	2011-03-25	User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, earlier than other social media sites such as Flickr or YouTube, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters, including temporal, social, topical, and Twitter-centric features. Our large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.	(pdf)
Efficient, Deterministic and Deadlock-free Concurrency	Nalini Vasudevan	2011-03-25	Concurrent programming languages are growing in importance with the advent of multicore systems. Two major concerns in any concurrent program are data races and deadlocks. Each are potentially subtle bugs that can be caused by nondeterministic scheduling choices in most concurrent formalisms. Unfortunately, traditional race and deadlock detection techniques fail on both large programs, and small programs with complex behaviors. We believe the solution is model-based design, where the programmer is presented with a constrained higher-level language that prevents certain unwanted behavior. We present the SHIM model that guarantees the absence of data races by eschewing shared memory. This dissertation provides SHIM based techniques that aid determinism - models that guarantee determinism, compilers that generate deterministic code and libraries that provide deterministic constructs. Additionally, we avoid deadlocks, a consequence of improper synchronization. A SHIM program may deadlock if it violates a communication protocol. We provide efficient techniques for detecting and deterministically breaking deadlocks in programs that use the SHIM model. We evaluate the efficiency of our techniques with a set of benchmarks. We have also extended our ideas to other languages. The ultimate goal is to provide deterministic deadlock-free concurrency along with efficiency. Our hope is that these ideas will be used in the future while designing complex concurrent systems.	(pdf)
Implementing Zeroconf in Linphone	Abhishek Srivastava, Jae Woo Lee, Henning Schulzrinne	2011-03-05	This report describes the motivation behind implementing Zeroconf in a open source SIP phone(Linphone) and the architecture of the solution implemented. It also describes the roadblocks encountered and how they were tackled in the implementation. It concludes with a few mentions about future enhancements that may be implemented on a later date.	(pdf)
Frank Miller: Inventor of the One-Time Pad	Steven M. Bellovin	2011-03-01	The invention of the one-time pad is generally credited to Gilbert S. Vernam and Joseph O. Mauborgne. We show that it was invented about 35 years earlier by a Sacramento banker named Frank Miller. We provide a tentative identification of which Frank Miller it was, and speculate on whether or not Mauborgne might have known of Miller's work, especially via his colleague Parker Hitt.	(pdf)
The Failure of Online Social Network Privacy Settings	Michelle Madejski, Maritza Johnson, Steven M. Bellovin	2011-02-23	Increasingly, people are sharing sensitive personal information via online social networks (OSN). While such networks do permit users to control what they share with whom, access control policies are notoriously difficult to configure correctly; this raises the question of whether OSN users' privacy settings match their sharing intentions. We present the results of an empirical evaluation that measures privacy attitudes and intentions and compares these against the privacy settings on Facebook. Our results indicate a serious mismatch: every one of the 65 participants in our study confirmed that at least one of the identified violations was in fact a sharing violation. In other words, OSN users' privacy settings are incorrect. Furthermore, a majority of users cannot or will not fix such errors. We conclude that the current approach to privacy settings is fundamentally flawed and cannot be fixed; a fundamentally different approach is needed. We present recommendations to ameliorate the current problems, as well as provide suggestions for future research.	(pdf)
HALO (Highly Addictive, sociaLly Optimized) Software Engineering	Swapneel Sheth, Jonathan Bell, Gail Kaiser	2011-02-08	In recent years, computer games have become increasingly social and collaborative in nature. Massively multiplayer online games, in which a large number of players collaborate with each other to achieve common goals in the game, have become extremely pervasive. By working together towards a common goal, players become more engrossed in the game. In everyday work environments, this sort of engagement would be beneficial, and is often sought out. We propose an approach to software engineering called HALO that builds upon the properties found in popular games, by turning work into a game environment. Our proposed approach can be viewed as a model for a family of prospective games that would support the software development process. Utilizing operant conditioning and flow theory, we create an immersive software development environment conducive to increased productivity. We describe the mechanics of HALO and how it could fit into typical software engineering processes.	(pdf)
On Effective Testing of Health Care Simulation Software	Christian Murphy, M.S. Raunak, Andrew King, Sanjien Chen, Christopher Imbriano, Gail Kaiser	2011-02-04	Health care professionals rely on software to simulate anatomical and physiological elements of the human body for purposes of training, prototyping, and decision making. Software can also be used to simulate medical processes and protocols to measure cost effectiveness and resource utilization. Whereas much of the software engineering research into simulation software focuses on validation (determining that the simulation accurately models real-world activity), to date there has been little investigation into the testing of simulation software itself, that is, the ability to effectively search for errors in the implementation. This is particularly challenging because often there is no test oracle to indicate whether the results of the simulation are correct. In this paper, we present an approach to systematically testing simulation software in the absence of test oracles, and evaluate the effectiveness of the technique.	(pdf)
Protocols and System Design, Reliability, and Energy Efficiency in Peer-to-Peer Communication Systems	Salman Abdul Baset	2011-02-04	Modern Voice-over-IP (VoIP) communication systems provide a bundle of services to their users. These services range from the most basic voice-based services such as voice calls and voicemail to more advanced ones such as conferencing, voicemail-to-text, and online address books. Besides voice, modern VoIP systems provide video calls and video conferencing, presence, instant messaging (IM), and even desktop sharing services. These systems also let their users establish a voice, video, or a text session with devices in cellular, public switched telephone network (PSTN), or other VoIP networks. The peer-to-peer (p2p) paradigm for building VoIP systems involves minimal or no use of managed servers and is therefore attractive from an administrative and economic perspective. However, the benefits of using p2p paradigm in VoIP systems are not without their challenges. First, p2p communication (VoIP) systems can be deployed in environ- ments with varying requirements of scalability, connectivity, security, interoperability, and performance. These requirements bring forth the question of designing open and standardized protocols for diverse deployments. Second, the presence of restrictive network address translators (NATs) and firewalls prevents machines from directly exchanging packets and is problematic from the perspective of establishing direct media sessions. The p2p communication systems address this problem by using an intermediate peer with unrestricted connectivity to relay the session or by preferring the use of TCP. This technique for addressing connectivity problems raises questions about the reliability and session quality of p2p communication systems compared with the traditional client-server VoIP systems. Third, while administrative overheads are likely to be lower in running p2p communication systems as compared to client-server, can the same be said about the energy efficiency? Fourth, what type of techniques can be used to gain insights into the performance of a deployed p2p VoIP system like Skype? The thesis addresses the challenges in designing, building, and analyzing peer-to-peer communication systems. The thesis presents Peer-to-Peer Protocol (P2PP), an open protocol for building p2p communication systems with varying operational requirements. P2PP is now part of the IETF's P2PSIP protocol and is on track to become an RFC. The thesis describes the design and implementation of OpenVoIP, a proof-of-concept p2p communication system to demonstrate the feasibility of P2PP and to explore issues in building p2p communication systems. The thesis introduces a simple and novel analytical model for analyzing the reliability of peer-to-peer communication systems and analyzes the feasibility of TCP for sending real-time traffic. The thesis then analyzes the energy efficiency of peer-to-peer and client-server VoIP systems and shows that p2p VoIP systems are less energy efficient than client-server even if the peers consume a small amount of energy for running the p2p network. Finally, the thesis presents an analysis of the Skype protocol which indicates that Skype is free-riding on the network bandwidth of universities.	(pdf)
Detecting Traffic Snooping in Anonymity Networks Using Decoys	Sambuddho Chakravarty, Georgios Portokalidis, Michalis Polychronakis, Angelos D. Keroymtis	2011-02-03	Anonymous communication networks like Tor partially protect the confidentiality of their users' traffic by encrypting all intra-overlay communication. However, when the relayed traffic reaches the boundaries of the overlay network towards its actual destination, the original user traffic is inevitably exposed. At this point, unless end-to-end encryption is used, sensitive user data can be snooped by a malicious or compromised exit node, or by any other rogue network entity on the path towards the actual destination. We explore the use of decoy traffic for the detection of traffic interception on anonymous proxying systems. Our approach is based on the injection of traffic that exposes bait credentials for decoy services that require user authentication. Our aim is to entice prospective eavesdroppers to access decoy accounts on servers under our control using the intercepted credentials. We have deployed our prototype implementation in the Tor network using decoy IMAP and SMTP servers. During the course of six months, our system detected eight cases of traffic interception that involved eight different Tor exit nodes. We provide a detailed analysis of the detected incidents, discuss potential improvements to our system, and outline how our approach can be extended for the detection of HTTP session hijacking attacks.	(pdf)
POWER: Parallel Optimizations With Executable Rewriting	Nipun Arora, Jonathan Bell, Martha Kim, Vishal Singh, Gail Kaiser	2011-02-01	The hardware industryï¿½s rapid development of multicore and many core hardware has outpaced the software industryï¿½s transition from sequential to parallel programs. Most applications are still sequential, and many cores on parallel machines remain unused. We propose a tool that uses data-dependence profiling and binary rewriting to parallelize executables without access to source code. Our technique uses Bernsteinï¿½s conditions to identify independent sets of basic blocks that can be executed in parallel, introducing a level of granularity between fine-grained instruction level and coarse grained task level parallelism. We analyze dynamically generated control and data dependence graphs to find independent sets of basic blocks which can be parallelized. We then propose to parallelize these candidates using binary rewriting techniques. Our technique aims to demonstrate the parallelism that remains in serial application by exposing concrete opportunities for parallelism.	(pdf)
Decoy Document Deployment for Effective Masquerade Attack Detection	Malek Ben Salem, Salvatore J. Stolfo	2011-01-30	Masquerade attacks pose a grave security problem that is a consequence of identity theft. Detecting masqueraders is very hard. Prior work has focused on proling legitimate user behavior and detecting deviations from that normal behavior that could potentially signal an ongoing masquerade attack. Such approaches suffer from high false positive rates. Other work investigated the use of trap-based mechanisms as a means for detecting insider attacks in general. In this paper, we investigate the use of such trap-based mechanisms for the detection of masquerade at tacks. We evaluate the desirable properties of decoys deployed within a user's file space for detection.We investigate the trade-os between these properties through two user studies, and propose recommendations for effective masquerade detection using decoy documents based on findings from our user studies.	(pdf)
Data Collection and Analysis for Masquerade Attack Detection: Challenges and Lessons Learned	Malek Ben Salem, Salvatore J. Stolfo	2011-01-30	Real-world large-scale data collection poses an important challenge in the security field. Insider and masquerader attack data collection poses even a greater challenge. Very few organizations acknowledge such breaches because of liability concerns and potential implications on their market value. This caused the scarcity of real-world data sets that could be used to study insider and masquerader attacks. In this paper, we present the design, technical, and procedural challenges encountered during our own masquerade data gathering project. We also share some lessons learned from this several-year project related to the Institutional Review Board process and to user study design.	(pdf)
On Accelerators: Motivations, Costs, and Outlook	Simha Sethumadhavan	2011-01-30	Some notes on accelerators.	(pdf)
Computational Cameras: Appraoches, Benefits and Limits	Shree K. Nayar	2011-01-15	A computational camera uses a combination of optics and software to produce images that cannot be taken with traditional cameras. In the last decade, computational imaging has emerged as a vibrant field of research. A wide variety of computational cameras have been demonstrated - some designed to achieve new imaging functionalities and others to reduce the complexity of traditional imaging. In this article, we describe how computational cameras have evolved and present a taxonomy for the technical approaches they use. We explore the benefits and limits of computational imaging, and describe how it is related to the adjacent and overlapping fields of digital imaging, computational photography and computational image sensors.	(pdf)
Weighted Geometric Discrepancies and Numerical Integration on Reproducing Kernel Hilbert Spaces	Michael Gnewuch	2010-12-22	We extend the notion of L2-B-discrepancy introduced in [E. Novak, H. Wo´zniakowski, L2 discrepancy and multivariate integration, in: Analytic number theory. Essays in honour of Klaus Roth. W. W. L. Chen, W. T. Gowers, H. Halberstam, W. M. Schmidt, and R. C. Vaughan (Eds.), Cambridge University Press, Cambridge, 2009, 359 – 388] to what we want to call weighted geometric L2-discrepancy. This extended notion allows us to consider weights to moderate the importance of different groups of variables, and additionally volume measures different from the Lebesgue measure as well as classes of test sets different from measurable subsets of Euclidean spaces. We relate the weighted geometric L2-discrepancy to numerical integration defined over weighted reproducing kernel Hilbert spaces and settle in this way an open problem posed by Novak and Wo´zniakowski. Furthermore, we prove an upper bound for the numerical integration error for cubature formulas that use admissible sample points. The set of admissible sample points may actually be a subset of the integration domain of measure zero. We illustrate that particularly in infinite dimensional numerical integration it is crucial to distinguish between the whole integration domain and the set of those sample points that actually can be used by algorithms.	(pdf) (ps)
A Comprehensive Survey of Voice over IP Security Research	Angelos D. Keromytis	2010-12-22	We present a comprehensive survey of Voice over IP security academic research, using a set of 245 publications forming a closed cross-citation set. We classify these papers according to an extended version of the VoIP Security Alliance (VoIPSA) Threat Taxonomy. Our goal is to provide a roadmap for researchers seeking to understand existing capabilities and to identify gaps in addressing the numerous threats and vulner- abilities present in VoIP systems. We discuss the implications of our findings with respect to vulnerabilities reported in a variety of VoIP products. We identify two specific problem areas (denial of service, and service abuse) as requiring significant more attention from the research community. We also find that the overwhelming majority of the surveyed work takes a black box view of VoIP systems that avoids examining their internal structure and implementation. Such an approach may miss the mark in terms of addressing the main sources of vulnerabilities, i.e., implementation bugs and misconfigurations. Finally, we argue for further work on understanding cross-protocol and cross-mechanism vulnerabilities (emergent properties), which are the byproduct of a highly complex system-of-systems and an indication of the issues in future large-scale systems.	(pdf)
Modeling User Search Behavior for Masquerade Detection	Malek Ben Salem, Salvatore J. Stolfo	2010-12-13	Masquerade attacks are a common security problem that is a consequence of identity theft. Masquerade detection may serve as a means of building more secure and dependable systems that authenticate legitimate users by their behavior. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another userï¿½s desktop, and would likely search more extensively and broadly in a manner that is different than the victim user being impersonated. We devise a taxonomy of Windows applications and user commands that are used to abstract sequences of user actions and identify actions linked to search activities. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 1.1%, far better than prior published results. The limited set of features used for search behavior modeling also results in large performance gains over the same modeling techniques that use larger sets of features.	(pdf)
The Tradeoffs of Societal Computing	Swapneel Sheth, Gail Kaiser	2010-12-10	As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, \emph{Societal Computing}, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.	(pdf)
NetServ: Early Prototype Experiences	Michael S. Kester, Eric Liu, Jae Woo Lee, Henning Schulzrinne	2010-12-03	This paper describes a work-in-progress to demonstrate the feasibility of integrating services in the Internet core. The project aims to reduce or eliminate so called ossification of the Internet. Here we discuss the recent contributions of two of the team members at Columbia University. We will describe experiences setting up a Juniper router, running packet forwarding tests, preparing for the GENI demo, and starting prototype 2 of NetServ.	(pdf)
Towards using Cached Data Mining for Large Scale Recommender Systems	Swapneel Sheth, Gail Kaiser	2010-11-01	Recommender systems are becoming increasingly popular. As these systems become commonplace and the number of users increases, it will become important for these systems to be able to cope with a large and diverse set of users whose recommendation needs may be very different from each other. In particular, large scale recommender systems will need to ensure that users' requests for recommendations can be answered with low response times and high throughput. In this paper, we explore how to use caches and cached data mining to improve the performance of recommender systems by improving throughput and reducing response time for providing recommendations. We describe the structure of our cache, which can be viewed as a prefetch cache that prefetches all types of supported recommendations, and how it is used in our recommender system. We also describe the results of our simulation experiments to measure the efficacy of our cache.	(pdf)
Automatic Detection of Defects in Applications without Test Oracles	Christian Murphy, Gail Kaiser	2010-10-29	In application domains that do not have a test oracle, such as machine learning and scientific computing, quality assurance is a challenge because it is difficult or impossible to know in advance what the correct output should be for general input. Previously, metamorphic testing has been shown to be a simple yet effective technique in detecting defects, even without an oracle. In metamorphic testing, the application's ``metamorphic properties'' are used to modify existing test case input to produce new test cases in such a manner that, when given the new input, the new output can easily be computed based on the original output. If the new output is not as expected, then a defect must exist. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, and errors can occur in comparing the outputs when they are very complex. In this paper, we present a tool called Amsterdam that automates metamorphic testing by allowing the tester to easily set up and conduct metamorphic tests with little manual intervention, merely by specifying the properties to check, configuring the framework, and running the software. Additionally, we describe an approach called Heuristic Metamorphic Testing, which addresses issues related to false positives and non-determinism, and we present the results of new empirical studies that demonstrate the effectiveness of metamorphic testing techniques at detecting defects in real-world programs without test oracles.	(pdf)
Bypassing Races in Live Applications with Execution Filters	Jingyue Wu, Heming Cui, Junfeng Yang	2010-09-30	Deployed multithreaded applications contain many races because these applications are difficult to write, test, and debug. Worse, the number of races in deployed applications may drastically increase due to the rise of multicore hardware and the immaturity of current race detectors. LOOM is a “live-workaround” system designed to quickly and safely bypass application races at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an evacuation algorithm to safely install the filters to live applications to avoid races. It reduces its performance overhead using hybrid instrumen- tation that combines static and dynamic instrumentation. We evaluated LOOM on nine real races from a diverse set of six applications, including MySQL and Apache. Our results show that (1) LOOM can safely fix all evaluated races in a timely manner, thereby increasing application availability; (2) LOOM incurs little performance overhead; (3) LOOM scales well with the number of application threads; and (4) LOOM is easy to use.	(pdf)
Baseline: Metrics for setting a baseline for web vulnerability scanners	Huning Dai, Michael Glass, Gail Kaiser	2010-09-22	As web scanners are becoming more popular because they are faster and cheaper than security consultants, the trend of relying on these scanners also brings a great hazard: users can choose a weak or outdated scanner and trust incomplete results. Therefore, benchmarks are created to both evaluate and compare the scanners. Unfortunately, most existing benchmarks suffer from various drawbacks, often by testing against inappropriate criteria that does not reflect the user's needs. To deal with this problem, we present an approach called Baseline that coaches the user in picking the minimal set of weaknesses (i.e., a baseline) that a qualified scanner should be able to detect and also helps the user evaluate the effectiveness and efficiency of the scanner in detecting those chosen weaknesses. Baseline's goal is not to serve as a generic ranking system for web vulnerability scanners, but instead to help users choose the most appropriate scanner for their specific needs.	(pdf)
Tractability of the Fredholm problem of the second kind	Arthur G. Werschulz, Henryk Wozniakowski	2010-09-21	We study the tractability of computing $\varepsilon$-approximations of the Fredholm problem of the second kind: given $f\in F_d$ and $q\in Q_{2d}$, find $u\in L_2(I^d)$ satisfying \[ u(x) - \int_{I^d} q(x,y)u(y)\,dy = f(x) \qquad\forall\,x\in I^d=[0,1]^d. \] Here, $F_d$ and $Q_{2d}$ are spaces of $d$-variate right hand functions and $2d$-variate kernels that are continuously embedded in~$L_2(I^d)$ and~$L_2(I^{2d})$, respectively. We consider the worst case setting, measuring the approximation error for the solution $u$ in the $L_2(I^d)$-sense. We say that a problem is tractable if the minimal number of information operations of $f$ and $q$ needed to obtain an $\varepsilon$-approximation is sub-exponential in $\varepsilon^{-1}$ and~$d$. One information operation corresponds to the evaluation of one linear functional or one function value. The lack of sub-exponential behavior may be defined in various ways, and so we have various kinds of tractability. In particular, the problem is strongly polynomially tractable if the minimal number of information operations is bounded by a polynomial in $\varepsilon^{-1}$ for all~$d$. We show that tractability (of any kind whatsoever) for the Fredholm problem is equivalent to tractability of the $L_2$-approximation problems over the spaces of right-hand sides and kernel functions. So (for example) if both these approximation problems are strongly polynomially tractable, so is the Fredholm problem. In general, the upper bound provided by this proof is essentially non-constructive, since it involves an interpolatory algorithm that exactly solves the Fredholm problem (albeit for finite-rank approximations of~$f$ and~$q$). However, if linear functionals are permissible and that $F_d$ and~$Q_{2d}$ are tensor product spaces, we are able to surmount this obstacle; that is, we provide a fully-constructive algorithm that provides an approximation with nearly-optimal cost, i.e., one whose cost is within a factor $\ln\,\varepsilon^{-1}$ of being optimal.	(pdf)
Trade-offs in Private Search	Vasilis Pappas, Mariana Raykova, Binh Vo, Steven M. Bellovin, Tal Malkin	2010-09-17	Encrypted search --- performing queries on protected data --- is a well researched problem. However, existing solutions have inherent inefficiency that raises questions of practicality. Here, we step back from the goal of achieving maximal privacy guarantees in an encrypted search scenario to consider efficiency as a priority. We propose a privacy framework for search that allows tuning and optimization of the trade-offs between privacy and efficiency. As an instantiation of the privacy framework we introduce a tunable search system based on the SADS scheme and provide detailed measurements demonstrating the trade-offs of the constructed system. We also analyze other existing encrypted search schemes with respect to this framework. We further propose a protocol that addresses the challenge of document content retrieval in a search setting with relaxed privacy requirements.	(pdf)
Simple-VPN: Simple IPsec Configuration	Shreyas Srivatsan, Maritza Johnson, Steven M. Bellovin	2010-07-12	The IPsec protocol promised easy, ubiquitous encryption. That has never happened. For the most part, IPsec usage is confined to VPNs for road warriors, largely due to needless configuration complexity and incompatible implementations. We have designed a simple VPN configuration language that hides the unwanted complexities. Virtually no options are necessary or possible. The administrator specifies the absolute minimum of information: the authorized hosts, their operating systems, and a little about the network topology; everything else, including certificate generation, is automatic. Our implementation includes a multitarget compiler, which generates implementation-specific configuration files for three different platforms; others are easy to add.	(pdf)
Infinite-Dimensional Integration on Weighted Hilbert Spaces	Michael Gnewuch	2010-05-21	We study the numerical integration problem for functions with infinitely many variables. The functions we want to integrate are from a reproducing kernel Hilbert space which is endowed with a weighted norm. We study the worst case $\epsilon$-complexity which is defined as the minimal cost among all algorithms whose worst case error over the Hilbert space unit ball is at most $\epsilon$. Here we assume that the cost of evaluating a function depends polynomially on the number of active variables. The infinite-dimensional integration problem is (polynomially) tractable if the $\epsilon$-complexity is bounded by a constant times a power of $1/\epsilon$. The smallest such power is called the exponent of tractability. First we study finite-order weights. We provide improved lower bounds for the exponent of tractability for general finite-order weights and improved upper bounds for three newly defined classes of finite-order weights. The constructive upper bounds are obtained by multilevel algorithms that use for each level quasi-Monte Carlo integration points whose projections onto specific sets of coordinates exhibit a small discrepancy. The newly defined finite-intersection weights model the situation where each group of variables interacts with at most $\rho$ other groups of variables, where $\rho$ is some fixed number. For these weights we obtain a sharp upper bound. This is the first class of weights for which the exact exponent of tractability is known for any possible decay of the weights and for any polynomial degree of the cost function. For the other two classes of finite-order weights our upper bounds are sharp if, e.g., the decay of the weights is fast or slow enough. We extend our analysis to the case of arbitrary weights. In particular, from our results for finite-order weights, we conclude a lower bound on the exponent of tractability for arbitrary weights and a constructive upper bound for product weights. Although we confine ourselves for simplicity to explicit upper bounds for four classes of weights, we stress that our multilevel algorithm together with our default choice of quasi-Monte Carlo points is applicable to any class of weights.	(pdf) (ps)
Huning Dai's Master's Thesis	Huning Dai	2010-05-13	Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing that feeds randomly generated inputs to the software and witnesses its failures. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants'' that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework called ConFu (CONfiguration FUzzing testing framework) implementation. We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance.	(pdf)
Modeling User Search-Behavior for Masquerade Detection	Malek Ben Salem, Shlomo Hershkop, Salvatore J Stolfo	2010-05-12	Masquerade attacks are a common security problem that is a consequence of identity theft. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own le system well enough to search in a limited, targeted and unique fashion in order to nd information germane to their current task. Masqueraders, on the other hand, will likely not know the le system and layout of another user's desktop, and would likely search more extensively and broadly in a manner that is dierent than the victim user being impersonated. We extend prior research by devising taxonomies of UNIX commands and Windows applications that are used to abstract sequences of user commands and actions. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 0.13%, far better than prior published results. The limited set of features used for search behavior modeling also results in large performance gains over the same modeling techniques that use larger sets of features.	(pdf)
The weHelp Reference Architecture for Community-Driven Recommender Systems	Swapneel Sheth, Nipun Arora, Christian Murphy, Gail Kaiser	2010-05-11	Recommender systems have become increasingly popular. Most research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce weHelp - a reference architecture for social recommender systems. Our architecture is designed to be application and domain agnostic, but we briefly discuss here how it applies to recommender systems for software engineering.	(pdf)
Comparing Speed of Provider Data Entry: Electronic Versus Paper Methods	Kevin M. Jackson, Gail Kaiser, Lyndon Wong, Daniel Rabinowitz, Michael F. Chiang	2010-05-11	Electronic health record (EHR) systems have significant potential advantages over traditional paper-based systems, but they require that providers assume responsibility for data entry. One significant barrier to adoption of EHRs is the perception of slowed data-entry by providers. This study compares the speed of data-entry using computer-based templates vs. paper for a large eye clinic, using 10 subjects and 10 simulated clinical scenarios. Dataentry into the EHR was significantly slower (p<0.01) than traditional paper forms.	(pdf)
Empirical Study of Concurrency Mutation Operators for Java	Leon Wu, Gail Kaiser	2010-04-29	Mutation testing is a white-box fault-based software testing technique that applies mutation operators to modify program source code or byte code in small ways and then runs these modified programs (i.e., mutants) against a test suite in order to measure its effectiveness and locate the weaknesses either in the test data or in the program that are seldom or never exposed during normal execution. In this paper, we describe our implementation of a generic mutation testing framework and the results of applying three sets of concurrency mutation operators on four example Java programs through empirical study and analysis.	(pdf)
Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles	Christian Murphy	2010-04-27	Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be "non-testable programs" because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program's correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as "Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known." The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects or anomalies in software in these domains. Without a test oracle, it is impossible to know in general what the expected output should be for a given input, but it may be possible to predict how changes to the input should effect changes in the output, and thus identify expected relations among a set of inputs and among the set of their respective outputs. This approach, introduced by Chen et al., is known as "metamorphic testing". In metamorphic testing, if test case input x produces an output f(x), the function's so-called "metamorphic properties" can then be used to guide the creation of a transformation function t, which can then be applied to the input to produce t(x); this transformation then allows us to predict the expected output f(t(x)), based on the (already known) value of f(x). If the new output is as expected, it is not necessarily right, but any violation of the property indicates a defect. That is, though it may not be possible to know whether an output is correct, we can at least tell whether an output is incorrect. This thesis investigates three hypotheses. First, I claim that an automated approach to metamorphic testing will advance the state of the art in detecting defects in programs without test oracles, particularly in the domains of machine learning, simulation, and optimization. To demonstrate this, I describe a tool for test automation, and present the results of new empirical studies comparing the effectiveness of metamorphic testing to that of other techniques for testing applications that do not have an oracle. Second, I suggest that conducting function-level metamorphic testing in the context of a running application will reveal defects not found by metamorphic testing using system-level properties alone, and introduce and evaluate a new testing technique called Metamorphic Runtime Checking. Third, I hypothesize that it is feasible to continue this type of testing in the deployment environment (i.e., after the software is released), with minimal impact on the user, and describe a generalized approach called In Vivo Testing. Additionally, this thesis presents guidelines for identifying metamorphic properties, explains how metamorphic testing fits into the software development process, and discusses suggestions for both practitioners and researchers who need to test software without the help of a test oracle.	(pdf)
Robust, Efficient, and Accurate Contact Algorithms	David Harmon	2010-04-26	Robust, efficient, and accurate contact response remains a challenging problem in the simulation of deformable materials. Contact models should robustly handle contact between geometry by preventing interpenetrations. This should be accomplished while respecting natural laws in order to maintain physical correctness. We simultaneously desire to achieve these criteria as efficiently as possible to minimize simulation runtimes. Many methods exist that partially achieve these properties, but none yet fully attain all three. This thesis investigates existing methodologies with respect to these attributes, and proposes a novel algorithm for the simulation of deformable materials that demonstrate them all. This new method is analyzed and optimized, paving the way for future work in this simplified but powerful manner of simulation.	(pdf)
A Real-World Identity Management System with Master Secret Revocation	Elli Androulaki, Binh Vo, Steven Bellovin	2010-04-21	Cybersecurity mechanisms have become increasingly important as online and offline worlds converge. Strong authentication and accountability are key tools for dealing with online attacks, and we would like to realize them through a token-based, centralized identity management system. In this report, we present aprivacy-preserving group of protocols comprising a unique per user digital identity card, with which its owner is able to authenticate himself, prove possession of attributes, register himself to multiple online organizations (anonymously or not) and provide proof of membership. Unlike existing credential-based identity management systems, this card is revocable, i.e., its legal owner may invalidate it if physically lost, and still recover its content and registrations into a new credential. This card will protect an honest individual's anonymity when applicable as well as ensure his activity is known only to appropriate users.	(pdf)
Quasi-Polynomial Tractability	Michael Gnewuch, Henryk Wozniakowski	2010-04-09	Tractability of multivariate problems has become nowadays a popular research subject. Polynomial tractability means that the solution of a d-variate problem can be solved to within $\varepsilon$ with polynomial cost in $\varepsilon^{-1}$ and d. Unfortunately, many multivariate problems are not polynomially tractable. This holds for all non-trivial unweighted linear tensor product problems. By an unweighted problem we mean the case when all variables and groups of variables play the same role. It seems natural to ask what is the ``smallest'' non-exponential function $T:[1,\infty)\times [1,\infty)\to[1,\infty)$ for which we have T-tractability of unweighted linear tensor product problems. That is, when the cost of a multivariate problem can be bounded by a multiple of a power of $T(\varepsilon^{-1},d)$. Under natural assumptions, it turns out that this function is $T^{qpol}(x,y):=\exp((1+\ln\,x)(1+\ln y))$ for all $x,y\in[1,\infty)$. The function $T^{qpol}$ goes to infinity faster than any polynomial although not ``much'' faster, and that is why we refer to $T^{qpol}$-tractability as quasi-polynomial tractability. The main purpose of this paper is to promote quasi-polynomial tractability especially for the study of unweighted multivariate problems. We do this for the worst case and randomized settings and for algorithms using arbitrary linear functionals or only function values. We prove relations between quasi-polynomial tractability in these two settings and for the two classes of algorithms.	(pdf) (ps)
BotSwindler: Tamper Resistant Injection of Believable Decoys in VM-Based Hosts for Crimeware Detection	Brian M. Bowen, Pratap Prabhu, Vasileios P. Kemerlis, Stelios Sidiroglou-Douskos, Angelos D. Keromytis, Salvatore J. Stolfo	2010-04-09	We introduce BotSwindler, a bait injection system designed to delude and detect crimeware by forcing it to reveal itself during the exploitation of monitored information. Our implementation of BotSwindler relies upon an out-of-host software agent to drive user-like interactions in a virtual machine, seeking to convince malware residing within the guest OS that it has captured legitimate credentials. To aid in the accuracy and realism of the simulations, we introduce a low overhead approach, called virtual machine verification, for verifying whether the guest OS is in one of a predefined set of states. We provide empirical evidence to show that BotSwindler can be used to induce malware into performing observable actions and demonstrate how this approach is superior to that used in other tools. We present results from a user study to illustrate the believability of the simulations and show that financial bait information can be used to effectively detect compromises through experimentation with real credential-collecting malware.	(pdf)
Privacy-Preserving, Taxable Bank Accounts	Elli Androulaki, Binh Vo, Steven Bellovin	2010-04-07	Current banking systems do not aim to protect user privacy. Purchases made from a single bank account can be linked to each other by many parties. This could be addressed in a straight-forward way by generating unlinkable credentials from a single master credential using Camenisch and Lysyanskaya's algorithm; however, if bank accounts are taxable, some report must be made to the tax authority about each account. Using unlinkable credentials, digital cash, and zero knowledge proofs of knowledge, we present a solution that prevents anyone, even the tax authority, from knowing which accounts belong to which users, or from being able to link any account to another or to purchases or deposits.	(pdf)
CONFU: Configuration Fuzzing Testing Framework for Software Vulnerability Detection	Huning Dai, Christian Murphy, Gail Kaiser	2010-02-19	Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants'' that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework called ConFu (CONfiguration FUzzing testing framework) for implementation. We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance.	(pdf)
Empirical Evaluation of Approaches to Testing Applications without Test Oracles	Christian Murphy, Gail Kaiser	2010-02-05	Software testing of applications in fields like scientific omputation, simulation, machine learning, etc. is particularly challenging because many applications in these domains have no reliable "test oracle" to indicate whether the program's output is correct when given arbitrary input. A common approach to testing such applications has been to use a "pseudo-oracle", in which multiple independently-developed implementations of an algorithm process an input and the results are compared. Other approaches include the use of program invariants, formal specification languages, trace and log file analysis, and metamorphic testing. In this paper, we present the results of two empirical studies in which we compare the effectiveness of some of these approaches, including metamorphic testing, pseudo-oracles, and runtime assertion checking. We also analyze the results in terms of the software development process, and discuss suggestions for practitioners and researchers who need to test software without a test oracle.	(pdf)
Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis	Christian Murphy, Moses Vaughan, Waseem Ilahi, Gail Kaiser	2010-01-19	For large, complex software systems, it is typically impossible in terms of time and cost to reliably test the application in all possible execution states and configurations before releasing it into production. One proposed way of addressing this problem has been to continue testing and analysis of the application in the field, after it has been deployed. The theory behind this "perpetual testing" approach is that over time, defects will reveal themselves given that multiple instances of the same application may be run globally with different configurations, in different environments, under different patterns of usage, and in different system states. A practical limitation of many automated approaches to deployment environment testing and analysis is the potentially high performance overhead incurred by the necessary instrumentation. However, it may be possible to reduce this overhead by selecting test cases and performing analysis only in previously-unseen application states, thus reducing the number of redundant tests and analyses that are run. Solutions for fault detection, model checking, security testing, and fault localization in deployed software may all benefit from a technique that ignores application states that have already been tested or explored. In this paper, we apply such a technique to a testing methodology called "In Vivo Testing", which conducts tests in deployed applications, and present a solution that ensures that tests are only executed in states that the application has not previously encountered. In addition to discussing our implementation, we present the results of an empirical study that demonstrates its effectiveness, and explain how the new approach can be generalized to assist other automated testing and analysis techniques.	(pdf)
Testing and Validating Machine Learning Classifiers by Metamorphic Testing	Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail Kaiser, Baowen Xu, Tsong Yueh Chen	2010-01-11	Machine Learning algorithms have provided important core functionality to support solutions in many scientific computing applications - such as computational biology, computational linguistics, and others. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. Also presented is a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has very high effectiveness in killing mutants, and that observing expected cross-validation result alone is not sufficient to test for the correctness of a supervised classification program. Metamorphic testing is strongly recommended as a complementary approach. Finally we discuss how our findings can be used in other areas of computational science and engineering.	(pdf)
ONEChat: Enabling Group Chat and Messaging in Opportunistic Networks	Heming Cui, Suman Srinivasan, Henning Schulzrinne	2010-01-04	Opportunistic networks, which are wireless network "islands" formed when transient and highly mobile nodes meet for a short period of time, are becoming commonplace as wireless devices become more and more popular. It is thus imperative to develop communication tools and applications that work well in opportunistic networks. In particular, group chat and instant messaging applications are particularly lacking for such opportunistic networks today. In this paper, we present ONEChat, a group chat and instant messaging program that works in such opportunistic networks. ONEChat uses message multicasting on top of service discovery protocols in order to support group chat and reduce bandwidth consumption in opportunistic networks. ONEChat does not require any pre-configuration, a fixed network infrastructure or a client-server architecture in order to operate. In addition, it supports features such as group chat, private rooms, line-by-line or character-by-character messaging, file transfer, etc. We also present our quantitative analysis of ONEChat, which we believe indicates that the ONEChat architecture is an efficient group collaboration platform for opportunistic networks.	(pdf)
Exploiting Local Logic Structures to Optimize Multi-Core SoC Floorplanning	Cheng-Hong Li, Sampada Sonalkar, Luca P. Carloni	2009-12-10	We present a throughput-driven partitioning and a throughput-preserving merging algorithm for the high-level physical synthesis of latency-insensitive (LI) systems. These two algorithms are integrated along with a published floorplanner in a new iterative physical synthesis flow to optimize system throughput and reduce area occupation. The synthesis flow iterates a floorplanning-partitioning-floorplanning-merging sequence of operations to improve the system topology and the physical locations of cores. The partitioning algorithm performs bottom-up clustering of the internal logic of a given IP core to divide it into smaller ones, each of which has no combinational path from input to output and thus is legal for LI-interface encapsulation. Applying this algorithm to cores on critical feedback loops optimizes their length and in turn enables throughput optimization via the subsequent floorplanning. The merging algorithm reduces the number of cores on non-critical loops, lowering the overall area taken by LI interfaces without hurting the system throughput. Experimental results on a large system-on-chip design show a 16.7% speedup in system throughput and a 2.1% reduction in area occupation.	(pdf)
ConFu: Configuration Fuzzing Framework for Software Vulnerability Detection	Huning Dai, Gail E. Kaiser	2009-12-08	Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations of the software and certain inputs together with its particular runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds a range of randomly modified inputs to a software application while monitoring it for failures. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, in this proposal we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability; however, the fuzzing is performed in a duplicated copy of the original process, so that it does not affect the state of the running application. Configuration Fuzzing uses a covering array algorithm when fuzzing the configuration which guarantees a certain degree of coverage of the configuration space in the lifetime of the program-under-test. In addition, Configuration Fuzzing tests that are run after the software is released ensure representative real-world user inputs to test with. In addition to discussing the approach and describing a prototype framework for implementation, we also present the results of case studies to prove the approach's feasibility and evaluate its performance. In this thesis, we will continue developing the framework called ConFu (CONfiguration FUzzi ng framework) that supports the generation of test functions, parallel sandboxed execution and vulnerability detection. Given the initial ConFu, we will optimize the way that configurations get mutated, define more security invariants and conduct additional empirical studies of ConFu's effectiveness in detecting vulnerabilities. At the conclusion of this work, we want to prove that ConFu is efficient and effective in detecting common vulnerabilities and tests executed by ConFu can ensure reasonable degree of coverage of both the configuration and user input space in the lifetime of the software.	(pdf)
Record and Transplay: Partial Checkpointing for Replay Debugging	Dinesh Subhraveti, Jason Nieh	2009-11-21	Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application’s replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead.	(pdf)
On TCP-based SIP Server Overload Control	Charles Shen, Henning Schulzrinne	2009-11-10	SIP server overload management has attracted interest recently as SIP becomes the core signaling protocol for Next Generation Networks. Yet virtually all existing SIP overload control work is focused on SIP-over-UDP, despite the fact that TCP is increasingly seen as the more viable choice of SIP transport. This paper answers the following questions: is the existing TCP flow control capable of handling the SIP overload problem? If not, why and how can we make it work? We provide a comprehensive explanation of the default SIP-over-TCP overload behavior through server instrumentation. We also propose and implement novel but simple overload control algorithms without any kernel or protocol level modification. Experimental evaluation shows that with our mechanism the overload performance improves from its original zero throughput to nearly full capacity. Our work also leads to the important high level insight that the traditional notion of TCP flow control alone is incapable of managing overload for time-critical session based applications, which would be applicable not only to SIP, but also to a wide range of other common applications such as database servers.	(pdf)
PBS: Signaling architecture for network traffic authorization	Se Gi Hong, Henning Schulzrinne, Swen Weiland	2009-10-27	We present a signaling architecture for network traffic authorization, Permission-Based Sending (PBS). This architecture aims to prevent Denial-of-Service (DoS) attacks and other forms of unauthorized traffic. Towards this goal, PBS takes a hybrid approach: a proactive approach of explicit permissions and a reactive approach of monitoring and countering attacks. On-path signaling is used to configure the permission state stored in routers for a data flow. The signaling approach enables easy installation and management of the permission state, and its use of soft-state improves robustness of the system. For secure permission state setup, PBS provides security for signaling in two ways: signaling messages are encrypted end-to-end using public key encryption and TLS provides hop-by-hop encryption of signaling paths. In addition, PBS uses IPsec for data packet authentication. Our analysis and performance evaluation show that PBS is an effective and scalable solution for preventing various kinds of attack scenarios, including Byzantine attacks.	(pdf)
A Secure and Privacy-Preserving Targeted Ad System	Elli Androulaki, Steven Bellovin	2009-10-22	Thanks to its low product-promotion cost and its efficiency, targeted online advertising has become very popular. Unfortunately, being profile-based, online advertising methods violate consumers' privacy, which has engendered resistance to the ads. However, protecting privacy through anonymity seems to encourage click-fraud. In this paper, we define consumer's privacy and present a privacy-preserving, targeted ad system (PPOAd) which is resistant towards click fraud. Our scheme is structured to provide financial incentives to to all entities involved.	(pdf)
Rank-Aware Subspace Clutering for Structured Datasets	Julia Stoyanovich, Sihem Amer-Yahia	2009-10-21	In online applications such as Yahoo! Personals and Trulia.com users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in ranked lists are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths). However, not all groups will be of interest to the user given the ranking criteria. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop a novel rank-aware bottom-up subspace clustering algorithm. We evaluate the performance of our algorithm over large datasets from a leading online dating site, and present an experimental evaluation of its effectiveness.	(pdf)
Metamorphic Runtime Checking of Non-Testable Programs	Christian Murphy, Gail Kaiser	2009-10-20	Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is impossible to know what the correct output should be for arbitrary input. Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these "non-testable programs". In metamorphic testing, if test input x produces output f(x), specified "metamorphic properties" are used to create a transformation function t, which can be applied to the input to produce t(x); this transformation then allows the output f(t(x)) to be predicted based on the already-known value of f(x). If the output is not as expected, then a defect must exist. Previously we investigated the effectiveness of testing based on metamorphic properties of the entire application. Here, we improve upon that work by presenting a new technique called Metamorphic Runtime Checking, a testing approach that automatically conducts metamorphic testing of individual functions during the program's execution. We also describe an implementation framework called Columbus, and discuss the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact.	(pdf)
Configuration Fuzzing for Software Vulnerability Detection	Huning Dai, Christian Murphy, Gail Kaiser	2009-10-07	Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations of the software together with its particular runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds a range of randomly modified inputs to a software application while monitoring it for failures. However, fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, in this paper we present a new testing methodology called configuration fuzzing. Configuration fuzzing is a technique whereby the configuration of the running application is randomly modified at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability; however, the fuzzing is performed in a duplicated copy of the original process, so that it does not affect the state of the running application. In addition to discussing the approach and describing a prototype framework for implementation, we also present the results of a case study to demonstrate the approach’s efficiency.	(pdf)
Curtailed Online Boosting	Raphael Pelossof, Michael Jones	2009-09-28	The purpose of this work is to lower the average number of features that are evaluated by an online algorithm. This is achieved by merging Sequential Analysis and Online Learning. Many online algorithms use the example's margin to decide whether the model should be updated. Usually, the algorithm's model is updated when the margin is smaller than a certain threshold. The evaluation of the margin for each example requires the algorithm to evaluate all the model's features. Sequential Analysis allows us to early stop the computation of the margin when uninformative examples are encountered. It is desirable to save computation on uninformative examples since they will have very little impact on the final model. We show the successful speedup of Online Boosting while maintaining accuracy on a synthetic and the MNIST data sets.	(pdf)
Using a Model Checker to Determine Worst-case Execution Time	Sungjun Kim, Hiren D. Patel, Stephen A. Edwards	2009-09-03	Hard real-time systems use worst-case execution time (WCET) estimates to ensure that timing requirements are met. The typical approach for obtaining WCET estimates is to employ static program analysis methods. While these approaches provide WCET bounds, they struggle to analyze programs with loops whose iteration counts depend on input data. Such programs mandate user-guided annotations. We propose a hybrid approach by augmenting static program analysis with model-checking to analyze such programs and derive the loop bounds automatically. In addition, we use model-checking to guarantee repeatable timing behaviors from segments of program code. Our target platform is a precision timed architecture: a SPARC-based architecture promising predictable and repeatable timing behaviors. We use CBMC and illustrate our approach on Euclidean greatest common divisor algorithm (for WCET analysis) and a VGA controller (for repeatable timing validation).	(pdf)
Smashing the Stack with Hydra: The Many Heads of Advanced Polymorphic Shellcode	Pratap V. Prabhu, Yingbo Song, Salvatore J. Stolfo	2009-08-31	Recent work on the analysis of polymorphic shellcode engines suggests that modern obfuscation methods would soon eliminate the usefulness of signature-based network intrusion detection methods and supports growing views that the new generation of shellcode cannot be accurately and efficiently represented by the string signatures which current IDS and AV scanners rely upon. In this paper, we expand on this area of study by demonstrating never before seen concepts in advanced shellcode polymorphism with a proof-of-concept engine which we call Hydra. Hydra distinguishes itself by integrating an array of obfuscation techniques, such as recursive NOP sleds and multi-layer ciphering into one system while offering multiple improvements upon existing strategies. We also introduce never before seen attack methods such as byte-splicing statistical mimicry, safe-returns with forking shellcode and syscall-time-locking. In total, Hydra simultaneously attacks signature, statistical, disassembly, behavioral and emulation-based sensors, as well as frustrates ofﬂine forensics. This engine was developed to present an updated view of the frontier of modern polymorphic shellcode and provide an effective tool for evaluation of IDS systems, Cyber test ranges and other related security technologies.	(pdf)
On the Learnability of Monotone Functions	Homin K. Lee	2009-08-13	A longstanding lacuna in the field of computational learning theory is the learnability of succinctly representable monotone Boolean functions, i.e., functions that preserve the given order of the input. This thesis makes significant progress towards understanding both the possibilities and the limitations of learning various classes of monotone functions by carefully considering the complexity measures used to evaluate them. We show that Boolean functions computed by polynomial-size monotone circuits are hard to learn assuming the existence of one-way functions. Having shown the hardness of learning general polynomial-size monotone circuits, we show that the class of Boolean functions computed by polynomial-size depth-3 monotone circuits are hard to learn using statistical queries. As a counterpoint, we give a statistical query learning algorithm that can learn random polynomial-size depth-2 monotone circuits (i.e., monotone DNF formulas). As a preliminary step towards a fully polynomial-time, proper learning algorithm for learning polynomial-size monotone decision trees, we also show the relationship between the average depth of a monotone decision tree, its average sensitivity, and its variance. Finally, we return to monotone DNF formulas, and we show that they are teachable (a different model of learning) in the average case. We also show that non-monotone DNF formulas, juntas, and sparse GF2 formulas are teachable in the average case.	(pdf)
Mouth-To-Ear Latency in Popular VoIP Clients	Chitra Agastya, Dan Mechanic, Neha Kothari	2009-08-06	Most popular instant messaging clients are now offering Voiceover- IP (VoIP) technology. The many options running on similar platforms, implementing common audio codecs and encryption algorithms offers the opportunity to identify what factors affect call quality. We measure call quality objectively based on mouthto- ear latency. Based on our analysis we determine that the mouth-to-ear latency can be influenced by operating system (process priority and interrupt handling), the VoIP client implementation and network quality.	(pdf)
Apiary: Easy-to-use Desktop Application Fault Containment on Commodity Operating Systems	Shaya Potter, Jason Nieh	2009-08-05	Desktop computers are often compromised by the interaction of untrusted data and buggy software. To address this problem, we present Apiary, a system that provides transparent application fault containment while retaining the ease of use of a traditional integrated desktop environment. Apiary accomplishes this with three key mechanisms. It isolates applications in containers that integrate in a controlled manner at the display and file system. It introduces ephemeral containers that are quickly instantiated for single application execution and then removed, to prevent any exploit that occurs from persisting and to protect user privacy. It introduces the virtual layered file system to make instantiating containers fast and space efficient, and to make managing many containers no more complex than having a single traditional desktop. We have implemented Apiary on Linux without any application or operating system kernel changes. Our results from running real applications, known exploits, and a 24-person user study show that Apiary has modest performance overhead, is effective in limiting the damage from real vulnerabilities to enable quick recovery, and is as easy to use as a traditional desktop while improving desktop computer security and privacy.	(pdf)
Source Prefix Filtering in ROFL	Hang Zhao, Maritza Johnson, Chi-Kin Chau, Steven M. Bellovin	2009-07-26	Traditional firewalls have the ability to allow or block traffic based on source address as well as destination address and port number. Our original ROFL scheme implements firewalling by layering it on top of routing; however, the original proposal focused just on destination address and port number. Doing route selection based in part on source addresses is a form of policy routing, which has started to receive increased amounts of attention. In this paper, we extend the original ROFL (ROuting as the Firewall Layer) scheme by including source prefix constraints in route announcement. We present algorithms for route propagation and packet forwarding, and demonstrate the correctness of these algorithms using rigorous proofs. The new scheme not only accomplishes the complete set of filtering functionality provided by traditional firewalls, but also introduces a new direction for policy routing.	(pdf)
Semantic Ranking and Result Visualization for Life Sciences Publications	Julia Stoyanovich, William Mee, Kenneth A. Ross	2009-06-23	An ever-increasing amount of data and semantic knowledge in the domain of life sciences is bringing about new data management challenges. In this paper we focus on adding the semantic dimension to literature search, a central task in scientific research. We focus our attention on PubMed, the most significant bibliographic source in life sciences, and explore ways to use high-quality semantic annotations from the MeSH vocabulary to rank search results. We start by developing several families of ranking functions that relate a search query to a document's annotations. We then propose an efficient adaptive ranking mechanism for each of the families. We also describe a two-dimensional Skyline-based visualization that can be used in conjunction with the ranking to further improve the user's interaction with the system, and demonstrate how such Skylines can be computed adaptively and efficiently. Finally, we present a user study that demonstrates the effectiveness of our ranking. We use the full PubMed dataset and the complete MeSH ontology in our experimental evaluation.	(pdf)
A Software Checking Framework Using Distributed Model Checking and Checkpoint/Resume of Virtualized PrOcess Domains	Nageswar Keetha, Leon Wu, Gail Kaiser, Junfeng Yang	2009-06-18	Complexity and heterogeneity of the deployed software applications often result in a wide range of dynamic states at runtime. The corner cases of software failure during execution often slip through the traditional software checking. If the software checking infrastructure supports the transparent checkpoint and resume of the live application states, the checking system can preserve and replay the live states in which the software failures occur. We introduce a novel software checking framework that enables application states including program behaviors and execution contexts to be cloned and resumed on a computing cloud. It employs (1) EXPLODE’s model checking engine for a lightweight and general purpose software checking (2) ZAP system for faster, low overhead and transparent checkpoint and resume mechanism through virtualized PODs (PrOcess Domains), which is a collection of host-independent processes, and (3) scalable and distributed checking infrastructure based on Distributed EXPLODE. Efficient and portable checkpoint/resume and replay mechanism employed in this framework enables scalable software checking in order to improve the reliability of software products. The evaluation we conducted showed its feasibility, efficiency and applicability.	(pdf)
Serving Niche Video-on-Demand Content in a Managed P2P Environment	Eli Brosh, Chitra Agastya, John Morales, Vishal Misra, Dan Rubenstein	2009-06-17	A limitation of existing P2P VoD services is their inability to support efficient streamed access to niche content that has relatively small demand. This limitation stems from the poor performance of P2P when the number of peers sharing the content is small. In this paper, we propose a new provider-managed P2P VoD framework for efficient delivery of niche content based on two principles: reserving small portions of peers' storage and upload resources, as well as using novel, weighed caching techniques. We demonstrate through analytical analysis, simulations, and experiments on planetlab that our architecture can provide high streaming quality for niche content. In particular, we show that our architecture increases the catalog size by up to $40\%$ compared to standard P2P VoD systems, and that a weighted cache policy can reduce the startup delay for niche content by a factor of more than three.	(pdf)
Flexible Filters: Load Balancing through Backpressure for Stream Programs	Rebecca Collins, Luca Carloni	2009-06-16	Stream processing is a promising paradigm for programming multi-core systems for high-performance embedded applications. We propose flexible filters as a technique that combines static mapping of the stream program tasks with dynamic load balancing of their execution. The goal is to improve the system-level processing throughput of the program when it is executed on a distributed-memory multi-core system as well as the local (core-level) memory utilization. Our technique is distributed and scalable because it is based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters can be applied to stream applications that present large dynamic variations in the computational load of their tasks and the dimension of the stream data tokens. In order to demonstrate the practicality of our technique, we present the performance improvements for the case study of a JPEG encoder running on the IBM Cell multi-core processor.	(pdf)
Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating	Gabriela Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo	2009-06-11	The deployment and use of Anomaly Detection (AD) sensors often requires the intervention of a human expert to manually calibrate and optimize their performance. Depending on the site and the type of traffic it receives, the operators might have to provide recent and sanitized training data sets, the characteristics of expected traffic (i.e. outlier ratio), and exceptions or even expected future modifications of system’s behavior. In this paper, we study the potential performance issues that stem from fully automating the AD sensors’ day-to-day maintenance and calibration. Our goal is to remove the dependence on human operator using an unlabeled, and thus potentially dirty, sample of incoming traffic. To that end, we propose to enhance the training phase of AD sensors with a self-calibration phase, leading to the automatic determination of the optimal AD parameters. We show how this novel calibration phase can be employed in conjunction with previously proposed methods for training data sanitization resulting in a fully automated AD maintenance cycle. Our approach is completely agnostic to the underlying AD sensor algorithm. Furthermore, the self-calibration can be applied in an online fashion to ensure that the resulting AD models reflect changes in the system’s behavior which would otherwise render the sensor’s internal state inconsistent. We verify the validity of our approach through a series of experiments where we compare the manually obtained optimal parameters with the ones computed from the self-calibration phase. Modeling traffic from two different sources, the fully automated calibration shows a 7.08% reduction in detection rate and a 0.06% increase in false positives, in the worst case, when compared to the optimal selection of parameters. Finally, our adaptive models outperform the statically generated ones retaining the gains in performance from the sanitization process over time.	(pdf)
Masquerade Attack Detection Using a Search-Behavior Modeling Approach	Malek Ben Salem, Salvatore J. Stolfo	2009-06-10	Masquerade attacks are unfortunately a familiar security problem that is a consequence of identity theft. Detecting masqueraders is very hard. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by presenting one-class Hellinger distance-based and one-class SVM modeling techniques that use a set of novel features to reveal user intent. The specic objective is to model user search proles and detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own le system well enough to search in a limited, targeted and unique fashion in order to nd information germane to their current task. Masqueraders, on the other hand, will likely not know the le system and layout of another user's desktop, and would likely search more extensively and broadly in a manner that is dierent than the victim user being impersonated. We extend prior research that uses UNIX command sequences issued by users as the audit source by relying upon an abstraction of commands. We devise taxonomies of UNIX commands and Windows applications that are used to abstract sequences of user commands and actions. We also gathered our own normal and masquerader data sets captured in a Windows environment for evaluation. The datasets are publicly available for other researchers who wish to study masquerade attack rather than author identication as in much of the prior reported work. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 0.1%, far better than prior published results. The limited set of features used for search behavior modeling also results in huge performance gains over the same modeling techniques that use larger sets of features.	(pdf)
Self-monitoring Monitors	Salvatore Stolfo, Isaac Greenbaum, Simha Sethumadhavan	2009-06-03	Many different monitoring systems have been created to identify system state conditions to detect or prevent a myriad of deliberate attacks, or arbitrary faults inherent in any complex system. Monitoring systems are also vulnerable to attack. A stealthy attacker can simply turn off or disable these monitoring systems without being detected; he would thus be able to perpetrate the very attacks that these systems were designed to stop. For example, many examples of virus attacks against antivirus scanners have appeared in the wild. In this paper, we present a novel technique to “monitor the monitors” in such a way that (a) unauthorized shutdowns of critical monitors are detected with high probability, (b) authorized shutdowns raise no alarm, and (c) the proper shutdown sequence for authorized shutdowns cannot be inferred from reading memory. The techniques proposed to prevent unauthorized shut down (turning off) of monitoring systems was inspired by the duality of safety technology devised to prevent unauthorized discharge (turning on) of nuclear weapons.	(pdf)
Thwarting Attacks in Malcode-Bearing Documents by Altering Data Sector Values	Wei-Jen Li, Salvatore J. Stolfo	2009-06-01	Embedding malcode within documents provides a convenient means of attacking systems. Such attacks can be very targeted and difficult to detect to stop due to the multitude of document-exchange vectors and the vulnerabilities in modern document processing applications. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. To detect stealthy embedded malcode in documents, we develop an arbitrary data transformation technique that changes the value of data segments in documents in such a way as to purposely damage any hidden malcode that may be embedded in those sections. Consequently, the embedded malcode will not only fail but also introduce a system exception that would be easily detected. The method is intended to be applied in a safe sandbox, the transformation is reversible after testing a document, and does not require any learning phase. The method depends upon knowledge of the structure of the document binary format to parse a document and identify the specific sectors to which the method can be safely applied for malcode detection. The method can be implemented in MS Word as a security feature to enhance the safety of Word documents.	(pdf)
weHelp: A Reference Architecture for Social Recommender Systems	Swapneel Sheth, Nipun Arora, Christian Murphy, Gail Kaiser	2009-05-15	Recommender systems have become increasingly popular. Most of the research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce \textit{weHelp}: a reference architecture for social recommender systems - systems where recommendations are derived automatically from the aggregate of logged activities conducted by the system's users. Our architecture is designed to be application and domain agnostic. We feel that a good reference architecture will make designing a recommendation system easier; in particular, weHelp aims to provide a practical design template to help developers design their own well-modularized systems.	(pdf)
The Zodiac Policy Subsystem: a Policy-Based Management System for a High-Security MANET	Yuu-Heng Cheng, Scott Alexander, Alex Poylisher, Mariana Raykova, Steven M. Bellovin	2009-05-07	Zodiac (Zero Outage Dynamic Intrinsically Assurable Communities) is an implementation of a high-security MANET, resistant to multiple types of attacks, including Byzantine faults. The Zodiac architecture poses a set of unique system security, performance, and usability requirements to its policy-based management system (PBMS). In this paper, we identify theses requirements, and present the design and implementation of the Zodiac Policy Subsystem (ZPS), which allows administrators to securely specify, distribute and evaluate network control and system security policies to customize Zodiac behaviors. ZPS uses Keynote language for specifying all authorization policies with simple extension to support obligation policies.	(pdf)
The Impact of TLS on SIP Server Performance	Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright	2009-05-05	This report studies the performance impact of using TLS as a transport protocol for SIP servers. We evaluate the cost of TLS experimentally using a testbed with OpenSIPS, OpenSSL, and Linux running on an Intel-based server. We analyze TLS costs using application, library, and kernel profiling, and use the profiles to illustrate when and how different costs are incurred, such as bulk data encryption, public key encryption, private key decryption, and MAC-based verification. We show that using TLS can reduce performance by up to a factor of 20 compared to the typical case of SIP over UDP. The primary factor in determining performance is whether and how TLS connection establishment is performed, due to the heavy costs of RSA operations used for session negotiation. This depends both on how the SIP proxy is deployed (e.g., as an inbound or outbound proxy) and what TLS options are used (e.g., mutual authentication, session reuse). The cost of symmetric key operations such as AES or 3DES, in contrast, tends to be small. Network operators deploying SIP over TLS should attempt to maximize the persistence of secure connections, and will need to assess the server resources required. To aid them, we provide a measurement-driven cost model for use in provisioning SIP servers using TLS. Our cost model predicts performance within 15 percent on average.	(pdf)
COMPASS: A Community-driven Parallelization Advisor for Sequential Software	Simha Sethumadhavan, Gail E. Kaiser	2009-04-22	The widespread adoption of multicores has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism thus delaying a quick and smooth transition to the concurrency era. In this paper, we describe the research being conducted at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers while they reengineer their code for parallelism. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some similar code. The utility of COMPASS rests, not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the task at hand, i.e., the code the user is considering parallelizing and the environment in which the optimized program is planned to execute. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization – widely, and on diverse hardware and software. By leveraging the “wisdom of crowds” model [26], which has been conjectured to scale exponentially and which has successfully worked for wikis, COMPASS aims to enable rapid propagation of knowledge about code parallelization in the context of the actual parallelization reengineering, and thus continue to extend the benefits of Moore’s law scaling to science and society.	(pdf)
Have I Met You Before? Using Cross-Media Relations to Reduce SPIT	Kumiko Ono, Henning Schulzrinne	2009-04-14	Most legitimate calls are from persons or organizations with strong social ties such as friends. Some legitimate calls, however, are from those with weak social ties such as a restaurant the callee booked a table on-line. Since a callee's contact list usually contains only the addresses of persons or organizations with strong social ties, filtering out unsolicited calls using the contact list is prone to false positives. To reduce these false positives, we first analyzed call logs and identified that legitimate calls are initiated from persons or organizations with weak social ties through transactions over the web or email exchanges. This paper proposes two approaches to label incoming calls by using cross-media relations to previous contact mechanisms which initiate the calls. One approach is that potential callers offer the callee their contact addresses which might be used in future correspondence. Another is that a callee provides potential callers with weakly-secret information that the callers should use in future correspondence in order to identify them as someone the callee has contacted before through other means. Depending on previous contact mechanisms, the callers use either customized contact addresses or message identifiers. The latter approach enables a callee to label incoming calls even without caller identifiers. Reducing false positives during filtering using our proposed approaches will contribute to the reduction in SPIT (SPam over Internet Telephony).	(pdf)
F3ildCrypt: End-to-End Protection of Sensitive Information in Web Services	Matthew Burnside, Angelos D. Keromytis	2009-03-30	The frequency and severity of recent intrusions involving data theft and leakages has shown that online users' trust, voluntary or not, in the ability of third parties to protect their sensitive data is often unfounded. Data may be exposed anywhere along a corporation's web pipeline, from the outward-facing web servers to the back-end databases. Additionally, in service-oriented architectures (SOAs), data may also be exposed as they transit between SOAs. For example, credit card numbers may be leaked during transmission to or handling by transaction-clearing intermediaries. We present F3ildCrypt, a system that provides end-to-end protection of data across a web pipeline and between SOAs. Sensitive data are protected from their origin (the user's browser) to their legitimate final destination. To that end, F3ildCrypt exploits browser scripting to enable application- and merchant-aware handling of sensitive data. Such techniques have traditionally been considered a security risk; to our knowledge, this is one of the first uses of web scripting that enhances overall security. F3ildCrypt uses proxy re-encryption to re-target messages as they enter and cross SOA boundaries, and uses XACML, the XML-based access control language, to define protection policies. Our approach scales well in the number of public key operations required for web clients and does not reveal proprietary details of the logical enterprise network (because of the application of proxy re-encryption). We evaluate F3ildCrypt and show an additional cost of 40 to 150 ms when making sensitive transactions from the web browser, and a processing rate of 100 to 140 XML fields/second on the server. We believe such costs to be a reasonable tradeoff for increased sensitive-data confidentiality.	(pdf)
Baiting Inside Attackers using Decoy Documents	Brian M. Bowen, Shlomo Hershkop, Angelos D. Keromytis, Salvatore J. Stolfo	2009-03-30	The insider threat remains one of the most vexing problems in computer security. A number of approaches have been proposed to detect nefarious insider actions including user modeling and profiling techniques, policy and access enforcement techniques, and misuse detection. In this work we propose trap-based defense mechanisms for the case where insiders attempt to exfiltrate and use sensitive information. Our goal is to confuse and confound the attacker requiring far more effort to identify real information from bogus information and to provide a means of detecting when an inside attacker attempts to exploit sensitive information. ``Decoy Documents" are automatically generated and stored on a file system with the aim of enticing a malicious insider to open and review the contents of the documents. The decoy documents contain several different types of bogus credentials that when used, trigger an alert. We also embed ``stealthy beacons" inside the documents that cause a signal to be emitted to a server indicating when and where the particular decoy was opened. We evaluate decoy documents on honeypots penetrated by attackers demonstrating the feasibility of the method.	(pdf)
Metamorphic Runtime Checking of Non-Testable Programs	Christian Murphy, Gail Kaiser	2009-03-16	Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is difficult or impossible to know that the correct output should be for arbitrary input. Recently, metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these so-called "non-testable programs". In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the function should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. Previously we have presented an approach called "Automated Metamorphic System Testing", in which metamorphic testing is conducted automatically as the program executes. In the approach, metamorphic properties of the entire application are specified, and then checked after execution is complete. Here, we improve upon that work by presenting a technique in which the metamorphic properties of individual functions are used, allowing for the specification of more complex properties and enabling finer-grained runtime checking. Our goal is to demonstrate that such an approach will be more effective than one based on specifying metamorphic properties at the system level, and is also feasible for use in the deployment environment. This technique, called Metamorphic Runtime Checking, is a system testing approach in which the metamorphic properties of individual functions are automatically checked during the program's execution. The tester is able to easily specify the functions' properties so that metamorphic testing can be conducted in a running application, allowing the tests to execute using real input data and in the context of real system states, without affecting those states. We also describe an implementation framework called Columbus, and present the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact.	(pdf)
An Anonymous Credit Card System	Elli Androulaki, Steven Bellovin	2009-02-27	Credit cards have many important benets; however, these same benefits often carry with them many privacy concerns. In particular, the need for users to be able to monitor their own transactions, as well as bank's need to justify its payment requests from cardholders, entitle the latter to maintain a detailed log of all transactions its credit card customers were involved in. A bank can thus build a profile of each cardholder even without the latter's consent. In this technical report, we present a practical and accountable anonymous credit system based on ecash , with a privacy preserving mechanism for error correction and expense-reporting.	(pdf)
Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue	Agustin Gravano	2009-02-23	As interactive voice response systems spread at a rapid pace, providing an increasingly more complex functionality, it is becoming clear that the challenges of such systems are not solely associated to their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user, or the correct generation and understanding of words that may convey multiple meanings, appear to play an important role in system usability. This thesis explores those two issues in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues -- prosodic, acoustic and syntactic events strongly associated with conversational turn endings -- and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues conjointly displayed by the speaker. We present similar results related to six backchannel-inviting cues -- events that invite the interlocutor to produce a short utterance conveying continued attention. Additionally, we describe a series of studies of affirmative cue words -- a family of cue words such as 'okay' or 'alright' that speakers use frequently in conversation for several purposes: for acknowledging what the interlocutor has said, or for cueing the start of a new topic, among others. We find differences in the acoustic/prosodic realization of such functions, but observe that contextual information figures prominently in human disambiguation of these words. We also conduct machine learning experiments to explore the automatic classification of affirmative cue words. Finally, we examine a novel measure of speaker entrainment related to the usage of these words, showing its association with task success and dialogue coordination.	(pdf)
Automatic System Testing of Programs without Test Oracles	Christian Murphy, Kuang Shen, Gail Kaiser	2009-01-30	Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of applications that do not have test oracles, i.e., for which it is difficult or impossible to know what the correct output should be for arbitrary input. In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can be easily be computed based on the original output. That is, if input x produces output f (x), then we create input x' such that we can predict f (x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f (x) or f (x') (or both) is wrong. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, or practically impossible for input that is not in human-readable format. Similarly, comparing the outputs can be error-prone for large result sets, especially when slight variations in the results are not actually indicative of errors (i.e., are false positives), for instance when there is non-determinism in the application and multiple outputs can be considered correct. In this paper, we present an approach called Automated Metamorphic System Testing. This involves the automation of metamorphic testing at the system level by checking that the metamorphic properties of the entire application hold after its execution. The tester is able to easily set up and conduct metamorphic tests with little manual intervention, and testing can continue in the field with minimal impact on the user. Additionally, we present an approach called Heuristic Metamorphic Testing which seeks to reduce false positives and address some cases of non-determinism. We also describe an implementation framework called Amsterdam, and present the results of empirical studies in which we demonstrate the effectiveness of the technique on real-world programs without test oracles.	(pdf)
Example application under PRET environment -- Programming a MultiMediaCard	Devesh Dedhia	2009-01-22	PRET philosophy proposes the temporal characteristics to be made predictable. However for various applications the PRET processor will have to interact with a non predictable environment. In this paper an example of one such environment, an MultiMediaCard (MMC) is considered. This paper illustrates a method to make the response of the MMC predictable.	(pdf)
Improving the Quality of Computational Science Software by Using Metamorphic Relations to Test Machine Learning Applications	Xiaoyuan Xie, Joshua Ho, Christian Murphy, Gail Kaiser, Baowen Xu, T.Y. Chen	2009-01-19	Many applications in the field of scientific computing - such as computational biology, computational linguistics, and others - depend on Machine Learning algorithms to provide important core functionality to support solutions in the particular problem domains. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. In addition to presenting our technique, we describe a case study we performed on a real-world machine learning application framework, and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also discuss how our findings can be of use to other areas of computational science and engineering.	(pdf)
Multi-perspective Evaluation of Self-Healing Systems Using Simple Probabilistic Models	Rean Griffith, Gail Kaiser, Javier Alonso Lopez	2009-01-19	Quantifying the efficacy of self-healing systems is a challenging but important task, which has implications for increasing designer, operator and end-user confidence in these systems. During design system architects benefit from tools and techniques that enhance their understanding of the system, allowing them to reason about the tradeoffs of proposed or existing self-healing mechanisms and the overall effectiveness of the system as a result of different mechanism-compositions. At deployment time, system integrators and operators need to understand how the selfhealing mechanisms work and how their operation impacts the system’s reliability, availability and serviceability (RAS) in order to cope with any limitations of these mechanisms when the system is placed into production. In this paper we construct an evaluation framework for selfhealing systems around simple, yet powerful, probabilistic models that capture the behavior of the system’s selfhealing mechanisms from multiple perspectives (designer, operator, and end-user). We combine these analytical models with runtime fault-injection to study the operation of VM-Rejuv – a virtual machine based rejuvenation scheme for web-application servers. We use the results from the fault-injection experiments and model-analysis to reason about the efficacy of VM-Rejuv, its limitations and strategies for managing/mitigating these limitations in systemdeployments. Whereas we use VM-Rejuv as the subject of our evaluation in this paper, our main contribution is a practical evaluation approach that can be generalized to other self-healing systems.	(pdf)
A Case Study in Distributed Deployment of Embedded Software for Camera NetworksA Case Study in Distributed Deployment of Embedded Software for Camera Networks	Francesco Leonardi, Alessandro Pinto, Luca P. Carloni	2009-01-15	We present an embedded software application for the real-time estimation of building occupancy using a network of video cameras. We analyze a series of alternative decompositions of the main application tasks and profile each of them by running the corresponding embedded software on three different processors. Based on the profiling measures, we build various alternative embedded platforms by combining different embedded processors, memory modules and network interfaces. In particular, we consider the choice of two possible network technologies: ARCnet and Ethernet. After deriving an analytical model of the network costs, we use it to complete an exploration of the design space as we scale the number of video cameras in an hypothetical building. We compare our results with those obtained for two real buildings of different characteristics. We conclude discussing the results of our case study in the broader context of other camera-network applications.	(pdf)
Improving Virtual Appliance Management through Virtual Layered File Systems	Shaya Potter, Jason Nieh	2009-01-15	Managing many computers is difficult. Recent virtualization trends exacerbate this problem by making it easy to create and deploy multiple virtual appliances per physical machine, each of which can be configured with different applications and utilities. This results in a huge scaling problem for large organizations as management overhead grows linearly with the number of appliances. To address this problem, we present Strata, a system that introduces the Virtual Layered File System (VLFS) and integrates it with virtual appliances to simplify system management. Unlike a traditional file system, which is a monolithic entity, a VLFS is a collection of individual software layers composed together to provide the traditional file system view. Individual layers are maintained in a central repository and shared across all VLFSs that use them. Layer changes and upgrades only need to be done once in the repository and are then automatically propagated to all VLFSs, resulting in management overhead independent of the number of virtual appliances. We have implemented a Strata Linux prototype without any application or operating system kernel changes. Using this prototype, we demonstrate how Strata enables fast system provisioning, simplifies system maintenanc and upgrades, speeds system recovery from security exploits, and incurs only modest performance overhead.	(pdf)
Retrocomputing on an FPGA: Reconstructing an 80's-Era Home Computer with Programmable Logic	Stephen A. Edwards	2009-01-12	The author reconstructs a computer of his childhood, an Apple II+.	(pdf) (ps)
A MPEG Decoder in SHIM	Keerti Joshi, Delvin Kellebrew	2008-12-23	The emergence of world-wide standards for video compression has created a demand for design tools and simulation resources to support algorithm research and new product development. Because of the need for subjective study in the design of video compression algorithms it is essential that flexible yet computationally efficient tools be developed. For this project, we plan to implement a MPEG standard using the SHIM programming language. The SHIM is a software/hardware integration language whose aim is to provide communication between hardware and software while providing deterministic concurrency. The focus of this project will be to emphasize the efficiency of the SHIM language in embedded applications as compared to other existing implementations.	(pdf)
Using Metamorphic Testing at Runtime to Detect Defects in Applications without Test Oracles	Christian Murphy	2008-12-22	First, we will present an approach called Automated Metamorphic System Testing. This will involve automating system-level metamorphic testing by treating the application as a black box and checking that the metamorphic properties of the entire application hold after execution. This will allow for metamorphic testing to be conducted in the production environment without affecting the user, and will not require the tester to have access to the source code. The tests do not require an oracle upon their creation; rather, the metamorphic properties act as built-in test oracles. We will also introduce an implementation framework called Amsterdam. Second, we will present a new type of testing called Metamorphic Runtime Checking. This involves the execution of metamorphic tests from within the application, i.e., the application launches its own tests, within its current context. The tests execute within the application’s current state, and in particular check a function’s metamorphic properties. We will also present a system called Columbus that supports the execution of the Metamorphic Runtime Checking from within the context of the running application. Like Amsterdam, it will conduct the tests with acceptable performance overhead, and will ensure that the execution of the tests does not affect the state of the original application process from the users’ perspective; however, the implementation of Columbus will be more challenging in that it will require more sophisticated mechanisms for conducting the tests without pre-empting the rest of the application, and for comparing the results which may conceivably be in different processes or environments. Third, we will describe a set of metamorphic testing guidelines that can be followed to assist in the formulation and specification of metamorphic properties that can be used with the above approaches. These will categorize the different types of properties exhibited by many applications in the domain of machine learning and data mining in particular (as a result of the types of applications we will investigate), but we will demonstrate that they are also generalizable to other domains as well. This set of guidelines will also correlate to the different types of defects that we expect the approaches will be able to find.	(pdf)
Static Deadlock Detection in SHIM with an Automata Type Checking System	Dave Aaron Smith, Nalini Vasudevan, Stephen Edwards	2008-12-21	With the advent of multicores, concurrent programming languages are become more prevelant. Data Races and Deadlocks are two major problems with concurrent programs. SHIM is a concurrent programming language that guarantees absence of data races through its semantics. However, a program written in SHIM can deadlock if not carefully written. In this paper, we present a divide-and-merge technique to statically detect deadlocks in SHIM. SHIM is asynchronous, but we can greatly reduce its state space without loosing precision because of its semantics.	(pdf) (ps)
SHIM Optimization: Elimination Of Unstructured Loops	Ravindra Babu Ganapathi, Stephen A. Edwards	2008-12-21	The SHIM compiler for the IBM CELL processor generates distinct code for the two processing units, PPE (Power Processor Element) and SPE (Synergistic Processor Elements). The SPE is specialized to give high throughput with computation intensive application operating on dense data. We propose mechanism to tune the code generated by the SHIM compiler to enable optimizing compilers to generate structured code. Although, the discussion here is related to optimizing SHIM IR (Intermediate Representation) code, the techniques discussed here can be incorporated into compilers to convert unstructured loops consisting of goto statements to structured loops such as while and do-while statements to ease back end compiler optimizations. Our research based SHIM compiler takes the code written in SHIM language and performs various static analysis and finally transforms it into C code. This generated code is compiled to binary using standard compilers available for IBM cell processor such as GCC and IBM XL compiler.	(pdf) (ps)
uClinux on the Altera DE2	David Lariviere, Stephen A. Edwards	2008-12-21	This technical report provides an introduction on how to compile and run uClinux and third-party programs to be run on a Nios II CPU core instantiated within the FPGA on the Altera DE2. It is based on experiences working with the OS and development board while teaching the Embedded Systems course during the springs of 2007 and 2008.	(pdf)
Memory Issues in PRET Machines	Nishant R. Shah	2008-12-21	In a processor design the premier issues with memory are (1) main memory allocation and (2) interprocess communication. These two mainly affect the performance of the memory system. The goal of this paper is to formulate a deterministic model for memory systems of PRET, taking into account all the intertwined parallelism of modern memory chips. Studying existing memory models is necessary to understand the implications of these factors to realize a perfectly time predictable memory system.	(pdf)
Analysis of Clocks in X10 Programs (Extended)	Nalini Vasudevan, Olivier Tardieu, Julian Dolby, Stephen A. Edwards	2008-12-19	Clocks are a mechanism for providing synchronization barriers in concurrent programming languages. They are usually implemented using primitive communication mechanisms and thus spare the programmer from reasoning about low-level implementation details such as remote procedure calls and error conditions. Clocks provide flexibility, but programs often use them in specific ways that do not require their full implementation. In this paper, we describe a tool that mitigates the overhead of general-purpose clocks by statically analyzing how programs use them and choosing optimized implementations when available. We tackle the clock implementation in the standard library of the X10 programming language---a parallel, distributed object-oriented language. We report our findings for a small set of analyses and benchmarks. Our tool only adds a few seconds to analysis time, making it practical to use as part of a compilation chain.	(pdf)
Classifying High-Dimensional Text and Web Data using Very Short Patterns	Hassan Malik, John Kender	2008-12-17	In this paper, we propose the "Democratic Classifier", a simple, democracy-inspired pattern-based classification algorithm that uses very short patterns for classification, and does not rely on the minimum support threshold. Borrowing ideas from democracy, our training phase allows each training instance to vote for an equal number of candidate size-2 patterns. Similar to the usual democratic election process, where voters select candidates by considering their qualifications, prior contributions at the constituency and territory levels, as well as their own perception about candidates, the training instances select patterns by effectively balancing between local, class, and global significance of patterns. In addition, we respect "each voter's opinion" by simultaneously adding shared patterns to all applicable classes, and then apply a novel power law based weighing scheme, instead of making binary decisions on these patterns. Results of experiments performed on 121 common text and web datasets show that our algorithm almost always outperforms state of the art classification algorithms, without requiring any dataset-specific parameter tuning. On 100 real-life, noisy, web datasets, the average absolute classification accuracy improvement was as great as 10% over SVM, Harmony, C4.5 and KNN. Also, our algorithm ran about 3.5 times faster than the fastest existing pattern-based classification algorithm.	(pdf)
Distributed eXplode: A High-Performance Model Checking Engine to Scale Up State-Space Coverage	Nageswar Keetha, Leon Wu, Gail Kaiser, Junfeng Yang	2008-12-10	Model checking the state space (all possible behaviors) of software systems is a promising technique for verification and validation. Bugs such as security vulnerabilities, file storage issues, deadlocks and data races can occur anywhere in the state space and are often triggered by corner cases; therefore, it becomes important to explore and model check all runtime choices. However, large and complex software systems generate huge numbers of behaviors leading to ‘state explosion’. eXplode is a lightweight, deterministic and depth-bound model checker that explores all dynamic choices at runtime. Given an application-specific test-harness, eXplode performs state search in a serialized fashion - which limits its scalability and performance. This paper proposes a distributed eXplode engine that uses multiple host machines concurrently in order to achieve more state space coverage in less time, and is very helpful to scale up the software verification and validation effort. Test results show that Distributed eXplode runs several times faster and covers more state space than the standalone eXplode.	(pdf)
Generalized Assorted Pixel Camera: Post-Capture Control of Resolution, Dynamic Range and Spectrum	Fumihito Yasuma, Tomoo Mitsunaga, Daisuke Iso, Shree K. Nayar	2008-11-24	We propose the concept of a generalized assorted pixel (GAP) camera, which enables the user to capture a single image of a scene and, after the fact, control the trade-off between spatial resolution, dynamic range and spectral detail. The GAP camera uses a complex array (or mosaic) of color filters. A major problem with using such an array is that the captured image is severely under-sampled for at least some of the filter types. This leads to reconstructed images with strong aliasing. We make three contributions in this paper: (a) We present a comprehensive optimization method to arrive at the spatial and spectral layout of the color filter array of a GAP camera. (b) We develop a novel anti-aliasing algorithm for reconstructing the under-sampled channels of the image with minimal aliasing. (c) We demonstrate how the user can capture a single image and then control the trade-off of spatial resolution to generate a variety of images, including monochrome, high dynamic range (HDR) monochrome, RGB, HDR RGB, and multispectral images. Finally, the performance of our GAP camera has been verified using extensive simulations that use multispectral images of real world scenes. A large database of these multispectral images is being made publicly available for use by the research community.	(pdf)
Measurements of Multicast Service Discovery in a Campus Wireless Network	Se Gi Hong, Suman Srinivasan, Henning Schulzrinne	2008-11-14	Applications utilizing multicast service discovery protocols, such as iTunes, have become increasingly popular. However, multicast service discovery protocols are considered to generate network traffic overhead, especially in a wireless network. Therefore, it becomes important to evaluate the traffic and overhead caused by multicast service discovery packets in real-world networks. We measure and analyze the traffic of one of the mostly deployed multicast service discovery protocols, multicast DNS (mDNS) service discovery, in a campus wireless network that forms a single multicast domain of large users. We also analyze different service discovery models in terms of packet overhead and service discovery delay under different network sizes and churn rates. Our measurement shows that mDNS traffic consumes about 13 percent of the total bandwidth.	(pdf)
Improving the Dependability of Machine Learning Applications	Christian Murphy, Gail Kaiser	2008-10-10	As machine learning (ML) applications become prevalent in various aspects of everyday life, their dependability takes on increasing importance. It is challenging to test such applications, however, because they are intended to learn properties of data sets where the correct answers are not already known. Our work is not concerned with testing how well an ML algorithm learns, but rather seeks to ensure that an application using the algorithm implements the specification correctly and fulfills the users' expectations. These are critical to ensuring the application's dependability. This paper presents three approaches to testing these types of applications. In the first, we create a set of limited test cases for which it is, in fact, possible to predict what the correct output should be. In the second approach, we use random testing to generate large data sets according to parameterization based on the application’s equivalence classes. Our third approach is based on metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output can easily be predicted based on the original output. Here we discuss these approaches, and our findings from testing the dependability of three real-world ML applications.	(pdf)
Opportunistic Use of Client Repeaters to Improve Performance of WLANs	Victor Bahl, Ranveer Chandra, Patrick Pak-Ching Lee, Vishal Misra, Jitendra Padhye, Dan Rubenstein	2008-10-09	Currently deployed IEEE 802.11WLANs (Wi-Fi networks) share access point (AP) bandwidth on a per-packet basis. However, the various stations communicating with the AP often have different signal qualities, resulting in different transmission rates. This induces a phenomenon known as the rate anomaly problem, in which stations with lower signal quality transmit at lower rates and consume a significant majority of airtime, thereby dramatically reducing the throughput of stations transmitting at high rates. We propose a practical, deployable system, called SoftRepeater, in which stations cooperatively address the rate anomaly problem. Specifically, higher-rate Wi-Fi stations opportunistically transformthemselves into repeaters for stations with low data-rates when transmitting to/from the AP. The key challenge is to determine when it is beneficial to enable the repeater functionality. In this paper, we propose an initiation protocol that ensures that repeater functionality is enabled only when appropriate. Also, our system can run directly on top of today’s 802.11 infrastructure networks. We also describe a novel, zero-overhead network coding scheme that further alleviates undesirable symptoms of the rate anomaly problem. We evaluate our system using simulation and testbed implementation, and find that SoftRepeater can improve cumulative throughput by up to 200%.	(pdf)
The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models	Rean Griffith	2008-10-06	Renewed interest in developing computing systems that meet additional non-functional requirements such as reliability, high availability and ease-of-management/self-management (serviceability) has fueled research into developing systems that exhibit enhanced reliability, availability and serviceability (RAS) capabilities. This research focus on enhancing the RAS capabilities of computing systems impacts not only the legacy/existing systems we have today, but also has implications for the design and development of next generation (self- managing/self-) systems, which are expected to meet these non-functional requirements with minimal human intervention. To reason about the RAS capabilities of the systems of today or the self- systems of tomorrow, there are three evaluation-related challenges to address. First, developing (or identifying) practical fault-injection tools that can be used to study the failure behavior of computing systems and exercise any (remediation) mechanisms the system has available for mitigating or resolving problems. Second, identifying techniques that can be used to quantify RAS deficiencies in computing systems and reason about the eﬃcacy of individual or combined RAS-enhancing mechanisms (at design-time or after system deployment). Third, developing an evaluation methodology that can be used to objectively compare systems based on the (expected or actual) beneﬁts of RAS-enhancing mechanisms. This thesis addresses these three challenges by introducing the 7U Evaluation Methodology, a complementary approach to traditional performance-centric evaluations that identifies criteria for comparing and analyzing existing (or yet-to-be-added) RAS-enhancing mechanisms, is able to evaluate and reason about combinations of mechanisms, exposes under-performing mechanisms and highlights the lack of mechanisms in a rigorous, objective and quantitative manner. The development of the 7U Evaluation Methodology is based on the following three hypotheses. First, that runtime adaptation provides a platform for implementing eﬃcient and ﬂexible fault-injection tools capable of in-situ and in-vivo interactions with computing systems. Second, that mathematical models such as Markov chains, Markov reward networks and Control theory models can successfully be used to create simple, reusable templates for describing speciﬁc failure scenarios and scoring the system’s responses, i.e., studying the failure-behavior of systems, and the various facets of its remediation mechanisms and their impact on system operation. Third, that combining practical fault-injection tools with mathematical modeling techniques based on Markov Chains, Markov Reward Networks and Control Theory can be used to develop a benchmarking methodology for evaluating and comparing the reliability, availability and serviceability (RAS) characteristics of computing systems. This thesis demonstrates how the 7U Evaluation Method can be used to evaluate the RAS capabilities of real-world computing systems and in so doing makes three contributions. First, a suite of runtime fault-injection tools (Kheiron tools) able to work in a variety of execution environments is developed. Second, analytical tools that can be used to construct mathematical models (RAS models) to evaluate and quantify RAS capabilities using appropriate metrics are discussed. Finally, the results and insights gained from conducting fault-injection experiments on real-world systems and modeling the system responses (or lack thereof) using RAS models are presented. In conducting 7U Evaluations of real-world systems, this thesis highlights the similarities and differences between traditional performance-oriented evaluations and RAS-oriented evaluations and outlines a general framework for conducting RAS evaluations.	(pdf)
Quality Assurance of Software Applications using the In Vivo Testing Approach	Christian Murphy, Gail Kaiser, Ian Vo, Matt Chu	2008-10-02	Software products released into the field typically have some number of residual defects that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, incorrect assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system; these defects may also be due to application states that were not considered during lab testing, or corrupted states that could arise due to a security violation. One approach to this problem is to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present a testing methodology we call in vivo testing, in which tests are continuously executed in the deployment environment. We also describe a type of test we call in vivo tests that are specifically designed for use with such an approach: these tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state from the perspective of the end-user. We discuss the approach and the prototype testing framework for Java applications called Invite. We also provide the results of case studies that demonstrate Invite’s effectiveness and efficiency.	(pdf)
Using JML Runtime Assertion Checking to Automate Metamorphic Testing in Applications without Test Oracles	Christian Murphy, Kuang Shen, Gail Kaiser	2008-10-02	It is challenging to test applications and functions for which the correct output for arbitrary input cannot be known in advance, e.g. some computational science or machine learning applications. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing: existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can be easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application or function does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. By using metamorphic testing, we are able to provide built-in "pseudo-oracles" for these so-called "nontestable programs" that have no test oracles. In this paper, we describe an approach in which a function's metamorphic properties are specified using an extension to the Java Modeling Language (JML), a behavioral interface specification language that is used to support the "design by contract" paradigm in Java applications. Our implementation, called Corduroy, pre-processes these specifications and generates test code that can be executed using JML runtime assertion checking, for ensuring that the specifications hold during program execution. In addition to presenting our approach and implementation, we also describe our findings from case studies in which we apply our technique to applications without test oracles.	(pdf)
VoIP-based Air Traffic Controller Training	Supreeth Subramanya, Xiaotao Wu, Henning Schulzrinne	2008-09-26	Extending VoIP beyond the Internet telephony, we propose a case study of applying the technology outside of its intended domain, to solve a real-world problem. This work is an attempt to understand an analog hardwired communication system of the U.S. Federal Aviation Administration (FAA), and effectively translate it into a generic, standards-based VoIP system that runs on their existing data network. We develop insights into the air traffic training and weigh on the design choices for building a soft real-time data communication system. We also share our real-world deployment and maintenance experiences, as the FAA Academy has been successfully using this VoIP system in five training rooms since 2006 to train the future air traffic controllers of the U.S. and the world.	(pdf)
A Better Approach than Carrier-Grade-NAT	Olaf Maennel, Randy Bush, Luca Cittadini, Steven M. Bellovin	2008-09-24	We are facing the exhaustion of newly assignable IPv4 addresses. Unfortunately, IPv6 is not yet deployed widely enough to fully replace IPv4, and it is unrealistic to expect that this is going to change before we run out of IPv4 addresses. Letting hosts seamlessly communicate in an IPv4-world without assigning a unique globally routable IPv4 address to each of them is a challenging problem, for which many solutions have been proposed. Some prominent ones target towards carrier-grade-NATs (CGN), which we feel is a bad idea. Instead, we propose using specialized NATs at the edge that treat some of the port number bits as part of the address.	(pdf)
Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic	Yingbo Song, Angelos D. Keromytis, Salvatore J. Stolfo	2008-09-15	We present Spectrogram, a mixture of Markov-chains sensor for anomaly detection (AD) against web-layer (port 80) code-injection attacks such as PHP file inclusion, SQL-injection, cross-site-scripting, as well as memory layer buffer overflows. Port 80 is the gateway to many application level services and a large array of attacks are channeled through this vector, servers cannot easily firewall this port. Signature-based sensors are effective in filtering known exploits but can’t detect 0-day vulnerabilities or deal with polymorphism and statistical AD approaches have mostly been limited to network layer, protocol-agnostic modeling, weakening their effectiveness. N -gram based modeling approaches have recently demonstrated success but the ill-posed nature of modeling large grams have thus far prevented exploration of higher order statistical models. In this paper, we provide a solution to this problem based on a factorization into Markov-chains and aim to model higher order structure as well as content for web requests. Spectrogram is implemented in a protocol-aware, passive, network-situated, but CGI-layered, AD architecture and we show in our evaluation that this model demonstrates significant detection results on an array of real world web-layer attacks, achieving at least 97% detection rates on all but one dataset and comparing favorably against other AD sensors.	(pdf)
Retina: Helping Students and Instructors Based on Observed Programming Activities	Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan	2008-08-28	It is difficult for instructors of CS1 and CS2 courses to get accurate answers to such critical questions as "how long are students spending on programming assignments?", or "what sorts of errors are they making?" At the same time, students often have no idea of where they stand with respect to the rest of the class in terms of time spent on an assignment or the number or types of errors that they encounter. In this paper, we present a tool called Retina, which collects information about students' programming activities, and then provides useful and informative reports to both students and instructors based on the aggregation of that data. Retina can also make real-time recommendations to students, in order to help them quickly address some of the errors they make. In addition to describing Retina and its features, we also present some of our initial ndings during two trials of the tool in a real classroom setting.	(pdf)
Approximating a Global Passive Adversary Against Tor	Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis	2008-08-18	We present a novel, practical, and effective mechanism for identifying the IP address of Tor clients. We approximate an almost-global passive adversary (GPA) capable of eavesdropping anywhere in the network by using LinkWidth, a novel bandwidth-estimation technique. LinkWidth allows network edge-attached entities to estimate the available bandwidth in an arbitrary Internet link without a cooperating peer host, router, or ISP. By modulating the bandwidth of an anonymous connection (e.g., when the destination server or its router is under our control), we can observe these fluctuations as they propagate through the Tor network and the Internet to the end-user’s IP address. Our technique exploits one of the design criteria for Tor (trading off GPA-resistance for improved latency/bandwidth over MIXes) by allowing well-provisioned (in terms of bandwidth) adversaries to effectively become GPAs. Although timing-based attacks have been demonstrated against non-timing-preserving anonymity networks, they have depended either on a global passive adversary or on the compromise of a substantial number of Tor nodes. Our technique does not require compromise of any Tor nodes or collaboration of the end-server (for some scenarios). We demonstrate the effectiveness of our approach in tracking the IP address of Tor users in a series of experiments. Even for an underprovisioned adversary with only two network vantage points, we can identify the end user (IP address) in many cases.	(pdf)
Deux: Autonomic Testing System for Operating System Upgrades	Leon Wu, Gail Kaiser, Jason Nieh, Christian Murphy	2008-08-15	Operating system upgrades and patches sometimes break applications that worked fine on the older version. We present an autonomic approach to testing of OS updates while minimizing downtime, usable without local regression suites or IT expertise. Deux utilizes a dual-layer virtual machine architecture, with lightweight application process checkpoint and resume across OS versions, enabling simultaneous execution of the same applications on both OS versions in different VMs. Inputs provided by ordinary users to the production old version are also fed to the new version. The old OS acts as a pseudo-oracle for the update, and application state is automatically re-cloned to continue testing after any output discrepancies (intercepted at system call level) - all transparently to users. If all differences are deemed inconsequential, then the VM roles are switched with the application state already in place. Our empirical evaluation with both LAMP and standalone applications demonstrates Deux’s efficiency and effectiveness.	(pdf)
Predictive Models of Gene Regulation	Anshul Kundaje	2008-08-15	The regulation of gene expression plays a central role in the development and function of a living cell. A complex network of interacting regulatory proteins bind specific sequence elements in the genome to control the amount and timing of gene expression. The abundance of genome-scale datasets from different organisms provides an opportunity to accelerate our understanding of the mechanisms of gene regulation. Developing computational tools to infer gene regulation programs from high-throughput genomic data is one of the central problems in computational biology. In this thesis, we present a new predictive modeling framework for studying gene regulation. We formulate the problem of learning regulatory programs as a binary classification task: to accurately predict the the condition-specific activation (up-regulation) and repression (down-regulation) of gene expression. The gene expression response is measured by microarray expression data. Genes are represented by various genomic regulatory sequence features. Experimental conditions are represented by the gene expression levels of various regulatory proteins. We use this combination of features to learn a prediction function for the regulatory response of genes under different experimental conditions. The core computational approach is based on boosting. Boosting algorithms allow us to learn high-accuracy, large-margin classifiers and avoid overfitting. We describe three applications of our framework to study gene regulation: - In the GeneClass algorithm, we use a compendium of known transcription factor binding sites and gene expression data to learn a global context-specific regulation program that accurately predicts differential expression. GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. We introduce a novel robust variant of boosting that improves stability and biological interpretability in the presence of correlated features. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the framework. - In several organisms, the DNA binding sites of many transcription factors are unknown. Hence, automatic discovery of regulatory sequence motifs is required. In the MEDUSA algorithm, we integrate raw promoter sequence data and gene expression data to simultaneously discover cis regulatory motifs ab initio and learn predictive regulatory programs. MEDUSA automatically learns probabilistic representations of motifs and their corresponding target genes. We show that we are able to accurately learn the binding sites of most known transcription factors in yeast. - We also design new techniques for extracting biologically and statistically significant information from the learned regulatory models. We use a margin-based score to extract global condition-specific regulomes as well as cluster-specific and gene-specific regulation programs. We develop a post-processing framework for interpreting and visualizing biological information encapsulated in our models. We show the utility of our framework in analyzing several interesting biological contexts (environmental stress responses, DNA-damage response and hypoxia-response) in the budding yeast Saccharomyces cerevisiae. We also show that our methods can learn regulatory programs and cis regulatory motifs in higher eukaryotes such as worms and humans. Several hypotheses generated by our methods are validated by our collaborators using biochemical experiments. Experimental results demonstrate that our framework is quantitatively and qualitatively predictive. We are able to achieve high prediction accuracy on test data and also generate specific, testable hypotheses.	(pdf)
Using Runtime Testing to Detect Defects in Applications without Test Oracles	Christian Murphy, Gail Kaiser	2008-08-07	It is typically infeasible to test a large, complex software system in all its possible configurations and system states prior to deployment. Moreover, some such applications have no test oracles to indicate their correctness. In my thesis, we will address these problems in two ways. First, we suggest that executing tests within the context of an application running in the field can reveal defects that would not ordinarily otherwise be found. Second, we believe that this approach can further be extended to applications for which there is no test oracle by using a variant of metamorphic testing at runtime.	(pdf)
Towards the Quality of Service for VoIP traffic in IEEE 802.11 Wireless Networks	Sangho Shin, Henning Schulzrinne	2008-07-09	The usage of voice over IP (VoIP) traffic in IEEE 802.11 wireless networks is expected to increase in the near future due to widely deployed 802.11 wireless networks and VoIP services on fixed lines. However, the quality of service (QoS) of VoIP traffic in wireless networks is still unsatisfactory. In this thesis, I identify several sources for the QoS problems of VoIP traffic in IEEE 802.11 wireless networks and propose solutions for these problems. The QoS problems discussed can be divided into three categories, namely, user mobility, VoIP capacity, and call admission control. User mobility causes network disruptions during handoffs. In order to reduce the handoff time between Access Points (APs), I propose a new handoff algorithm, Selective Scanning and Caching, which finds available APs by scanning a minimum number of channels and furthermore allows clients to perform handoffs without scanning, by caching AP information. I also describe a new architecture for the client and server side for seamless IP layer handoffs, which are caused when mobile clients change the subnet due to layer 2 handoffs. I also present two methods to improve VoIP capacity for 802.11 networks, Adaptive Priority Control (APC) and Dynamic Point Coordination Function (DPCF). APC is a new packet scheduling algorithm at the AP and improves the capacity by balancing the uplink and downlink delay of VoIP traffic, and DPCF uses a polling based protocol and minimizes the bandwidth wasted from unnecessary polling, using a dynamic polling list. Additionally, I estimated the capacity for VoIP traffic in IEEE 802.11 wireless networks via theoretical analysis, simulations, and experiments in a wireless test-bed and show how to avoid mistakes in the measurements and comparisons. Finally, to protect the QoS for existing VoIP calls while maximizing the channel utilization, I propose a novel admission control algorithm called QP-CAT (Queue size Prediction using Computation of Additional Transmission), which accurately predicts the impact of new voice calls by virtually transmitting virtual new VoIP traffic.	(pdf)
genSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work	Christian Murphy, Swapneel Sheth, Gail Kaiser, Lauren Wilcox	2008-06-13	Many collaborative applications, especially in scientific research, focus only on the sharing of tools or the sharing of data. We seek to introduce an approach to scientific collaboration that is based on knowledge sharing. We do this by automatically building organizational memory and enabling knowledge sharing by observing what users do with a particular tool or set of tools in the domain, through the addition of activity and usage monitoring facilities to standalone applications. Once this knowledge has been gathered, we apply social networking models to provide collaborative features to users, such as suggestions on tools to use, and automatically-generated sequences of actions based on past usage amongst the members of a social network or the entire community. In this work, we investigate social networking models as an approach to scientific knowledge sharing, and present an implementation called genSpace, which is built as an extension to the geWorkbench platform for computational biologists. Last, we discuss the approach from the viewpoint of social software engineering.	(pdf)
Application Layer Feedback-based SIP Server Overload Control	Charles Shen, Henning Schulzrinne, Erich Nahum	2008-06-06	A SIP server may be overloaded by emergency-induced call volume, "American Idol" style flash crowd effects or denial of service attacks. The SIP server overload problem is interesting especially because the costs of serving or rejecting a SIP session can be similar. For this reason, the built-in SIP overload control mechanism based on generating rejection messages cannot prevent the server from entering congestion collapse under heavy load. The SIP overload problem calls for a pushback control solution in which the potentially overloaded receiving server may notify its upstream sending servers to have them send only the amount of load within the receiving server's processing capacity. The pushback framework can be achieved by SIP application layer rate-based feedback or window-based feedback. The centerpiece of the feedback mechanism is the algorithm used to generate load regulation information. We propose three new window-based feedback algorithms and evaluate them together with two existing rate-based feedback algorithms. We compare the different algorithms in terms of the number of tuning parameters and performance under both steady and variable load. Furthermore, we identify two categories of fairness requirements for SIP overload control, namely, user-centric and provider-centric fairness. With the introduction of a new double-feed SIP overload control architecture, we show how the algorithms meet those fairness criteria.	(pdf)
CPU Torrent -- CPU Cycle Offloading to Reduce User Wait Time and Provider Resource Requirements	Swapneel Sheth, Gail Kaiser	2008-06-04	Developers of novel scientific computing systems are often eager to make their algorithms and databases available for community use, but their own computational resources may be inadequate to fulfill external user demand -- yet the system's footprint is far too large for prospective user organizations to download and run locally. Some heavyweight systems have become part of designated ``centers'' providing remote access to supercomputers and/or clusters supported by substantial government funding; others use virtual supercomputers dispersed across grids formed by massive numbers of volunteer Internet-connected computers. But public funds are limited and not all systems are amenable to huge-scale divisibility into independent computation units. We have identified a class of scientific computing systems where ``utility'' sub-jobs can be offloaded to any of several alternative providers thereby freeing up local cycles for the main proprietary jobs, implemented a proof-of-concept framework enabling such deployments, and analyzed its expected throughput and response-time impact on a real-world bioinformatics system (Columbia's PredictProtein) whose present users endure long wait queues.	(pdf)
FairTorrent: Bringing Fairness to Peer-to-Peer Systems	Alex Sherman, Jason Nieh, Clifford Stein	2008-05-27	The lack of fair bandwidth allocation in Peer-to-Peer systems causes many performance problems, including users being disincentivized from contributing upload bandwidth, free riders taking as much from the system as possible while contributing as little as possible, and a lack of quality-of-service guarantees to support streaming applications. We present FairTorrent, a simple distributed scheduling algorithm for Peer-to-Peer systems that fosters fair bandwidth allocation among peers. For each peer, FairTorrent maintains a deficit counter which represents the number of bytes uploaded to a peer minus the number of bytes downloaded from it. It then uploads to the peer with the lowest deficit counter. FairTorrent automatically adjusts to variations in bandwidth among peers and is resilient to exploitation by free-riding peers. We have implemented FairTorrent inside a BitTorrent client without modifications to the BitTorrent protocol, and compared its performance on PlanetLab against other widely-used BitTorrent clients. Our results show that FairTorrent can provide up to two orders of magnitude better fairness and up to five times better download performance for high contributing peers. It thereby gives users an incentive to contribute more bandwidth, and improve overall system performance.	(pdf)
IEEE 802.11 in the Large: Observations at an IETF Meeting	Andrea G. Forte, Sangho Shin, Henning Schulzrinne	2008-05-05	We observed wireless network traffic at the 65th IETF Meeting in Dallas, Texas in March of 2006, attended by approximately 1200 engineers. The event was supported by a very large number of 802.11a and 802.11b access points, often seeing hundreds of simultaneous users. We were particularly interested in the stability of wireless connectivity, load balancing and loss behavior, rather than just traffic.We observed distinct differences among client implementations and saw a number of factors that made the overall system less than optimal, pointing to the need for better design tools and automated adaptation mechanisms.	(pdf)
ROFL: Routing as the Firewall Layer	Hang Zhao, Chi-Kin Chau, Steven M. Bellovin	2008-05-03	We propose a firewall architecture that treats port numbers as part of the IP address. Hosts permit connectivity to a service by advertising the IPaddr:port/48 address; they block connectivity by ensuring that there is no route to it. This design, which is especially well-suited to MANETs, provides greater protection against insider attacks than do conventional firewalls, but drops unwanted traffic far earlier than distributed firewalls do.	(pdf)
Stored Media Streaming in BitTorrent-like P2P Networks	Kyung-Wook Hwang, Vishal Misra, Dan Rubenstein	2008-05-01	Peer-to-peer (P2P) networks exist on the Internet today as a popular means of data distribution. However, conventional uses of P2P networking involve distributing stored files for use after the entire file has been downloaded. In this work, we investigate whether P2P networking can be used to provide real-time playback capabilities for stored media. For real-time playback, users should be able to start playback immediately, or almost immediately, after requesting the media and have uninterrupted playback during the download. To achieve this goal, it is critical to efficiently schedule the order in which pieces of the desired media are downloaded. Simply downloading pieces in sequential (earliest-first) order is prone to bottlenecks. Consequently we propose a hybrid of earliest-first and rarest-first scheduling - ensuring high piece diversity while at the same time prioritizing pieces needed to maintain uninterrupted playback. We consider an approach to peer-assisted streaming that is based on BitTorrent. In particular, we show that dynamic adjustment of the probabilities of earliest-first and rarest-first strategies along with utilization of coding techniques promoting higher data diversity, can offer noticeable improvements for real-time playback.	(pdf)
ReoptSMART: A Learning Query Plan Cache	Julia Stoyanovich, Kenneth A. Ross, Jun Rao, Wei Fan, Volker Markl, Guy Lohman	2008-04-24	The task of query optimization in modern relational database systems is important but can be computationally expensive. Parametric query optimization(PQO) has as its goal the prediction of optimal query execution plans based on historical results, without consulting the query optimizer. We develop machine learning techniques that can accurately model the output of a query optimizer. Our algorithms handle non-linear boundaries in plan space and achieve high prediction accuracy even when a limited amount of data is available for training. We use both predicted and actual query execution times for learning, and are the first to demonstrate a total net win of a PQO method over a state-of-the-art query optimizer for some workloads. ReoptSMART realizes savings not only in optimization time, but also in query execution time, for an over-all improvement by more than an order of magnitude in some cases.	(pdf)
Masquerade Detection Using a Taxonomy-Based Multinomial Modeling Approach in UNIX Systems	Malek Ben Salem, Salvatore J. Stolfo	2008-04-14	This paper presents one-class Hellinger distance-based and one-class SVM modeling techniques that use a set of features to reveal user intent. The specific objective is to model user command profiles and detect deviations indicating a masquerade attack. The approach aims to model user intent, rather than only modeling sequences of user issued commands. We hypothesize that each individual user will search in a targeted and limited fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly. Hence, modeling a user search behavior to detect deviations may more accurately detect masqueraders. To that end, we extend prior research that uses UNIX command sequences issued by users as the audit source by relying upon an abstraction of commands. We devised a taxonomy of UNIX commands that is used to abstract command sequences. The experimental results show that the approach does not lose information and performs comparably to or slightly better than the modeling approach based on simple UNIX command frequencies.	(pdf)
Approximating the Permanent with Belief Propagation	Bert Huang, Tony Jebara	2008-04-05	This work describes a method of approximating matrix permanents efficiently using belief propagation. We formulate a probability distribution whose partition function is exactly the permanent, then use Bethe free energy to approximate this partition function. After deriving some speedups to standard belief propagation, the resulting algorithm requires $(n^2)$ time per iteration. Finally, we demonstrate the advantages of using this approximation.	(pdf)
Behavior-Based Network Access Control: A Proof-of-Concept	Vanessa Frias-Martinez	2008-03-27	Current NAC technologies implement a pre-connect phase where the status of a device is checked against a set of policies before being granted access to a network, and a post-connect phase that examines whether the device complies with the policies that correspond to its role in the network. In order to enhance current NAC technologies, we propose a new architecture based on behaviors rather than roles or identity, where the policies are automatically learned and updated over time by the members of the network in order to adapt to behavioral changes of the devices. Behavior proﬁles may be presented as identity cards that can change over time. By incorporating an Anomaly Detector (AD) to the NAC server or to each of the hosts, their behavior proﬁle is modeled and used to determine the type of behaviors that should be accepted within the network. These models constitute behavior-based policies. In our enhanced NAC architecture, global decisions are made using a group voting process. Each host’s behavior proﬁle is used to compute a partial decision for or against the acceptance of a new proﬁle or trafﬁc. The aggregation of these partial votes amounts to the model-group decision. This voting process makes the architecture more resilient to attacks. Even after accepting a certain percentage of malicious devices, the enhanced NAC is able to compute an adequate decision. We provide proof-of-concept experiments of our architecture using web trafﬁc from our department network. Our results show that the model-group decision approach based on behavior proﬁles has a 99% detection rate of anomalous trafﬁc with a false positive rate of only 0.005%. Furthermore, the architecture achieves short latencies for both the pre- and post-connect phases.	(pdf)
Path-based Access Control for Enterprise Networks	Matthew Burnside, Angelos D. Keromytis	2008-03-27	Enterprise networks are ubiquitious and increasingly complex. The mechanisms for defining security policies in these networks have not kept up with the advancements in networking technology. In most cases, system administrators must define policies on a per-application basis, and subsequently, these policies do not interact. For example, there is no mechanism that allows a firewall to communicate decisions based on its ruleset to a web server behind it, even though decisions being made at the firewall may be relevant to decisions made at the web server. In this paper, we describe a path-based access control system which allows applications in a network to pass access-control-related information to neighboring applications, as the applications process requests from outsiders and from each other. This system defends networks against a class of attacks wherein individual applications may make correct access control decisions but the resulting network behavior is incorrect. We demonstrate the system on service-oriented architecture (SOA)-style networks, in two forms, using graph-based policies, and leveraging the KeyNote trust management system.	(pdf)
Tractability of multivariate approximation over a weighted unanchored Sobolev space: Smoothness sometimes hurts	Arthur G. Werschulz, Henryk Wozniakowski	2008-03-25	We study $d$-variate approximation for a weighted unanchored Sobolev space having smoothness $m\ge1$. Folk wisdom would lead us to believe that this problem should become easier as its smoothness increases. This is true if we are only concerned with asymptotic analysis: the $n$th minimal error is of order~$n^{-(m-\delta)}$ for any $\delta>0$. However, it is unclear how long we need to wait before this asymptotic behavior kicks in. How does this waiting period depend on $d$ and~$m$? We prove that no matter how the weights are chosen, the waiting period is at least~$m^d$, even if the error demand~$\varepsilon$ is arbitrarily close to~$1$. Hence, for $m\ge2$, this waiting period is exponential in~$d$, so that the problem suffers from the curse of dimensionality and is intractable. In other words, the fact that the asymptotic behavior improves with~$m$ is irrelevant when $d$~is large. So, we will be unable to vanquish the curse of dimensionality unless $m=1$, i.e., unless the smoothness is minimal. We then show that our problem \emph{can} be tractable if $m=1$. That is, we can find an $\varepsilon$-approximation using polynomially-many (in $d$ and~$\varepsilon^{-1}$) information operations, even if only function values are permitted. When $m=1$, it is even possible for the problem to be \emph{strongly} tractable, i.e., we can find an $\varepsilon$-approximation using polynomially-many (in~$\varepsilon^{-1}$) information operations, independent of~$d$. These positive results hold when the weights of the Sobolev space decay sufficiently quickly or are bounded finite-order weights, i.e., the $d$-variate functions we wish to approximate can be decomposed as sums of functions depending on at most~$\omega$ variables, where $\omega$ is independent of~$d$.	(pdf)
Spreadable Connected Autonomic Networks (SCAN)	Joshua Reich, Vishal Misra, Dan Rubestein, Gil Zussman	2008-03-24	A Spreadable Connected Autonomic Network (SCAN) is a mobile network that automatically maintains its own connectivity as nodes move. We envision SCANs to enable a diverse set of applications such as self-spreading mesh networks and robotic search and rescue systems. This paper describes our experiences developing a prototype robotic SCAN built from commercial, off-the-shelf hardware, to support such applications. A major contribution of our work is the development of a protocol, called SCAN1, which maintains network connectivity by enabling individual nodes to determine when they must constrain their mobility in order to avoid disconnecting the network. SCAN1 achieves its goal through an entirely distributed process in which individual nodes utilize only local (2-hop) knowledge of the network's topology to periodically make a simple decision: move, or freeze in place. Along with experimental results from our hardware testbed, we model SCAN1's performance, providing both supporting analysis and simulation for the efficacy of SCAN1 as a solution to enable SCANs. While our evaluation of SCAN1 in this paper is limited to systems whose capabilities match those of our testbed, SCAN1 can be utilized in conjunction with a wide-range of potential applications and environments, as either a primary or backup connectivity maintenance mechanism.	(pdf)
Leveraging Local Intra-Core Information to Increase Global Performance in Block-Based Design of Systems-on-Chip	Cheng-Hong Li, Luca P. Carloni	2008-03-18	Latency-insensitive design is a methodology for system-on-chip (SoC) design that simplifies the reuse of intellectual property cores and the implementation of the communication among them. This simplification is based on a system-level protocol that decouples the intra-core logic design from the design of the inter-core communication channels. Each core is encapsulated within a shell, a synthesized logic block that dynamically controls its operation to interface it with the rest of the SoC and to absorb any latency variations on its I/O signals. In particular, a shell stalls a core whenever new valid data are not available on the input channels or a down-link core has requested a delay in the data production on the output channels. We study how knowledge about the internal logic structure of a core can be applied to the design of its shell to improve the overall system-level performance by avoiding unnecessary local stalling. We introduce the notion of functional independence conditions (FIC) and present a novel circuit design of a generic shell template that can leverage FIC. We propose a procedure for the logic synthesis of a FIC-shell instance that is only based on the analysis of the intra-core logic and does not require any input from the designers. Finally, we present a comprehensive experimental analysis that shows the performance benefits and limited design overhead of the proposed technique. This includes the semi-custom design of an SoC, an ultra-wideband baseband transmitter, using a 90nm industrial standard cell library.	(pdf)
The Delay-Friendliness of TCP	Eli Brosh, Salman Baset, Vishal Misra, Dan Rubenstein, Henning Schulzrinne	2008-03-10	TCP has been traditionally considered unfriendly for real-time applications. Nonetheless, popular applications such as Skype use TCP due to the deployment of NATs and firewalls that prevent UDP traffic. Motivated by this observation we study the delay performance of TCP for real-time media flows. We develop an analytical performance model for the delay of TCP. We use extensive experiments to validate the model and to evaluate the impact of various TCP mechanisms on its delay performance. Based on our results, we derive the working region for VoIP and live video streaming applications and provide guidelines for delay-friendly TCP settings. Our research indicates that simple application-level schemes, such as packet splitting and parallel connections, can reduce the delay of real-time TCP flows by as much as 30\% and 90\%, respectively.	(pdf) (ps)
Properties of Machine Learning Applications for Use in Metamorphic Testing	Christian Murphy, Gail Kaiser, Lifeng Hu	2008-02-28	It is challenging to test machine learning (ML) applications, which are intended to learn properties of data sets where the correct answers are not already known. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output will be unchanged or can easily be predicted based on the original output; if the output is not as expected, then a defect must exist in the application. Here, we seek to enumerate and classify the metamorphic properties of some machine learning algorithms, and demonstrate how these can be applied to reveal defects in the applications of interest. In addition to the results of our testing, we present a set of properties that can be used to define these metamorphic relationships so that metamorphic testing can be used as a general approach to testing machine learning applications.	(pdf)
The Impact of SCTP on Server Scalability and Performance	Kumiko Ono, Henning Schulzrinne	2008-02-28	The Stream Control Transmission Protocol (SCTP) is a newer transport protocol, having additional features to TCP. Although SCTP is an alternative transport protocol for the Session Initiation Protocol (SIP), we do not know how SCTP features influence SIP server scalability and performance. To estimate this, we measured the scalability and performance of two servers, an echo server and a simplified SIP server on Linux, comparing to TCP. Our measurements found that using SCTP does not significantly affect on data latency: approximately 0.3 ms longer for the handshake than that for TCP. However, server scalability in terms of the number of sustainable associations drops to 17-21%, or to 43% of TCP if we adjust the acceptable gap size of unordered data delivery.	(pdf)
Optimal Splitters for Database Partitioning with Size Bounds	Kenneth A. Ross, John Cieslewicz	2008-02-27	Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity O(k lg_2 N), where k is the number of splitters requested and N is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters.	(pdf)
One Server Per City: Using TCP for Very Large SIP Servers	Kumiko Ono, Henning Schulzrinne	2008-02-26	The transport protocol for SIP can be chosen based on the requirements of services and network conditions. How does the choice of TCP affect the scalability and performance compared to UDP? We experimentally analyze the impact of using TCP as a transport protocol for a SIP server. We first investigate scalability of a TCP echo server, then compare performance of a SIP server for three TCP connection lifetimes: transaction, dialog, and persistent. Our results show that a Linux machine can establish 450,000+ TCP connections and maintaining connections does not affect the transaction response time. Additionally, the transaction response times using the three TCP connection lifetimes and UDP show no significant difference at 2,500 registration requests/second and at 500 call requests/second. However, sustainable request rate is lower for TCP than for UDP, since using TCP requires more message processing. More message processing causes longer delays at the thread queue for the server implementing a thread-pool model. Finally, we suggest how to reduce the impact of TCP for a scalable SIP server especially under overload control. This is applicable to other servers with very large connection counts.	(pdf)
Newspeak: A Secure Approach for Designing Web Applications	Kyle Dent, Steven M. Bellovin	2008-02-16	Internet applications are being used for more and more important business and personal purposes. Despite efforts to lock down web servers and isolate databases, there is an inherent problem in the web application architecture that leaves databases necessarily exposed to possible attack from the Internet. We propose a new design that removes the web server as a trusted component of the architecture and provides an extra layer of protection against database attacks. We have created a prototype system that demonstrates the feasibility of the new design.	(pdf)
Summary-Based Pointer Analysis Framework for Modular Bug Finding	Marcio O. Buss	2008-02-07	Modern society is irreversibly dependent on computers and, consequently, on software. However, as the complexity of programs increase, so does the number of defects within them. To alleviate the problem, automated techniques are constantly used to improve software quality. Static analysis is one such approach in which violations of correctness properties are searched and reported. Static analysis has many advantages, but it is necessarily conservative because it symbolically executes the program instead of using real inputs, and it considers all possible executions simultaneously. Being conservative often means issuing false alarms, or missing real program errors. Pointer variables are a challenging aspect of many languages that can force static analysis tools to be overly conservative. It is often unclear what variables are affected by pointer-manipulating expressions, and aliasing between variables is one of the banes of program analysis. To alleviate that, a common solution is to allow the programmer to provide annotations such as declaring a variable as unaliased in a given scope, or providing special constructs such as the ``never-null'' pointer of Cyclone. However, programmers rarely keep these annotations up-to-date. The solution is to provide some form of pointer analysis, which derives useful information about pointer variables in the program. An appropriate pointer analysis equips the static tool so that it is capable of reporting more errors without risking too many false alarms. This dissertation proposes a methodology for pointer analysis that is specially tailored for ``modular bug finding.'' It presents a new analysis space for pointer analysis, defined by finer-grain ``dimensions of precision,'' which allows us to explore and evaluate a variety of different algorithms to achieve better trade-offs between analysis precision and efficiency. This framework is developed around a new abstraction for computing points-to sets, the Assign-Fetch Graph, that has many interesting features. Empirical evaluation shows promising results, as some unknown errors in well-known applications were discovered.	(pdf)
SPARSE: A Hybrid System to Detect Malcode-Bearing Documents	Wei-Jen Li, Salvatore J. Stolfo	2008-01-31	Embedding malcode within documents provides a convenient means of penetrating systems which may be unreachable by network-level service attacks. Such attacks can be very targeted and difficult to detect compared to the typical network worm threat due to the multitude of document-exchange vectors. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. We introduce a hybrid system that integrates static and dynamic techniques to detect the presence and location of malware embedded in documents. The system is designed to automatically update its detection models to improve accuracy over time. The overall hybrid detection system with a learning feedback loop is demonstrated to achieve a 99.27% detection rate and 3.16% false positive rate on a corpus of 6228 Word documents.	(pdf)
The In Vivo Approach to Testing Software Applications	Christian Murphy, Gail Kaiser, Matt Chu	2008-01-31	Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call in vivo testing, in which unit tests are continuously executed inside a running application in the deployment environment. These tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach can reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach and the testing framework called Invite that we have developed for Java applications. We also enumerate the classes of bugs our approach can discover, and provide the results of a case study on a publicly-available application, as well as the results of experiments to measure the added overhead.	(pdf)
Mitigating the Effect of Free-Riders in BitTorrent using Trusted Agents	Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein	2008-01-25	Even though Peer-to-Peer (P2P) systems present a cost-effective and scalable solution to content distribution, most entertainment, media and software, content providers continue to rely on expensive, centralized solutions such as Content Delivery Networks. One of the main reasons is that the current P2P systems cannot guarantee reasonable performance as they depend on the willingness of users to contribute bandwidth. Moreover, even systems like BitTorrent, which employ a tit-for-tat protocol to encourage fair bandwidth exchange between users, are prone to free-riding (i.e. peers that do not upload). Our experiments on PlanetLab extend previous research (e.g. LargeViewExploit, BitTyrant) demonstrating that such selfish behavior can seriously degrade the performance of regular users in many more scenarios beyond simple free-riding: we observed an overhead of upto 430\% for 80\% of free-riding identities easily generated by a small set of selfish users. To mitigate the effects of selfish users, we propose a new P2P architecture that classifies peers with the help of a small number of {\em trusted nodes} that we call Trusted Auditors (TAs). TAs participate in P2P download like regular clients and detect free-riding identities by observing their neighbors' behavior. Using TAs, we can separate compliant users into a separate service pool resulting in better performance. Furthermore, we show that TAs are more effective ensuring the performance of the system than a mere increase in bandwidth capacity: for 80\% of free-riding identities a single-TA system has a 6\% download time overhead while without the TA and three times the bandwidth capacity we measure a 100\% overhead.	(pdf)
A Distance Learning Approach to Teaching eXtreme Programming	Christian Murphy, Dan Phung, Gail Kaiser	2008-01-23	As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present the results of a three-year study of such an online software engineering course targeted to graduate students, and describe some of the specific challenges faced, such as students’ aversion to aspects of XP and difficulties in scheduling. We discuss our findings in terms of the course’s educational objectives, and present suggestions to other educators who may face similar situations.	(pdf)
Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems	Rebecca Collins, Luca Carloni	2008-01-15	Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions.	(pdf)
LinkWidth: A Method to Measure Link Capacity and Available Bandwidth using Single-End Probes	Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis	2008-01-05	We introduce LinkWidth, a method for estimating capacity and available bandwidth using single-end controlled TCP packet probes. To estimate capacity, we generate a train of TCP RST packets ``sandwiched'' between trains of TCP SYN packets. Capacity is computed from the end-to-end packet dispersion of the received TCP RST/ACK packets corresponding to the TCP SYN packets going to closed ports. Our technique is significantly different from the rest of the packet-pair based measurement techniques, such as {\em CapProbe,} {\em pathchar} and {\em pathrate,} because the long packet trains minimize errors due to bursty cross-traffic. Additionally, TCP RST packets do not generate additional ICMP replies, thus avoiding cross-traffic due to such packets from interfering with our probes. In addition, we use TCP packets for all our probes to prevent QoS-related traffic shaping (based on packet types) from affecting our measurements (eg. CISCO routers by default are known have to very high latency while generating to ICMP TTL expired replies). We extend the {\it Train of Packet Pairs} technique to approximate the available link capacity. We use a train of TCP packet pairs with variable intra-pair delays and sizes. This is the first attempt to implement this technique using single-end TCP probes, tested on a range of networks with different bottleneck capacities and cross traffic rates. The method we use for measuring from a single point of control uses TCP RST packets between a train of TCP SYN packets. The idea is quite similar to the technique for measuring the bottleneck capacity. We compare our prototype with {\em pathchirp,} {\em pathload,} {\em IPERF,} which require control of both ends as well as another single end controlled technique {\em abget}, and demonstrate that in most cases our method gives approximately the same results if not better.	(pdf)
Autotagging to Improve Text Search for 3D Models	Corey Goldfeder, Peter Allen	2008-01-02	Text search on 3D models has traditionally worked poorly, as text annotations on 3D models are often unreliable or incomplete. In this paper we attempt to improve the recall of text search by automatically assigning appropriate tags to models. Our algorithm finds relevant tags by appealing to a large corpus of partially labeled example models, which does not have to be preclassified or otherwise prepared. For this purpose we use a copy of Google 3DWarehouse, a database of user contributed models which is publicly available on the Internet. Given a model to tag, we find geometrically similar models in the corpus, based on distances in a reduced dimensional space derived from Zernike descriptors. The labels of these neighbors are used as tag candidates for the model with probabilities proportional to the degree of geometric similarity. We show experimentally that text based search for 3D models using our computed tags can work as well as geometry based search. Finally, we demonstrate our 3D model search engine that uses this algorithm and discuss some implementation issues.	(pdf)
Schema Polynomials and Applications	Kenneth A. Ross, Julia Stoyanovich	2007-12-17	Conceptual complexity is emerging as a new bottleneck as database developers, application developers, and database administrators struggle to design and comprehend large, complex schemas. The simplicity and conciseness of a schema depends critically on the idioms available to express the schema. We propose a formal conceptual schema representation language that combines different design formalisms, and allows schema manipulation that exposes the strengths of each of these formalisms. We demonstrate how the schema factorization framework can be used to generate relational, object-oriented, and faceted physical schemas, allowing a wider exploration of physical schema alternatives than traditional methodologies. We illustrate the potential practical benefits of schema factorization by showing that simple heuristics can significantly reduce the size of a real-world schema description. We also propose the use of schema polynomials to model and derive alternative representations for complex relationships with constraints.	(pdf)
A Recursive Data-Driven Approach to Programming Multicore Systems	Rebecca Collins, Luca Carloni	2007-12-05	In this paper, we propose a method to program divide-and-conquer problems on multicore systems that is based on a data-driven recursive programming model. Data intensive programs are difficult to program on multicore architectures because they require efficient utilization of inter-core communication. Models for programming multicore systems available today generally lack the ability to automatically extract concurrency from a sequential style program and map concurrent tasks to efficiently leverage data and temporal locality. For divide-and-conquer algorithms, a recursive programming model can address both of these problems. Furthermore, since a recursive function has the same behavior patterns at all granularities of a problem, the same recursive model can be used to implement a multicore program at all of its levels: 1. the operations of a single core, 2. how to distribute tasks among several cores, and 3. in what order to schedule tasks on a multicore system when it is not possible to schedule all of the tasks at the same time. We present a novel selective execution technique that can enable automatic parallelization and task mapping of a recursive program onto a multicore system. To verify the practicality of this approach, we perform a case-study of bitonic sort on the Cell BE processor.	(pdf)
Speech Enabled Avatar from a Single Photograph	Dmitri Bitouk, Shree K. Nayar	2007-11-25	This paper presents a complete framework for creating speech-enabled 2D and 3D avatars from a single image of a person. Our approach uses a generic facial motion model which represents deformations of the prototype face during speech. We have developed an HMM-based facial animation algorithm which takes into account both lexical stress and coarticulation. This algorithm produces realistic animations of the prototype facial surface from either text or speech. The generic facial motion model is transformed to a novel face geometry using a set of corresponding points between the generic mesh and the novel face. In the case of a 2D avatar, a single photograph of the person is used as input. We manually select a small number of features on the photograph and these are used to deform the prototype surface. The deformed surface is then used to animate the photograph. In the case of a 3D avatar, we use a single stereo image of the person as input. The sparse geometry of the face is computed from this image and used to warp the prototype surface to obtain the complete 3D surface of the person's face. This surface is etched into a glass cube using sub-surface laser engraving (SSLE) technology. Synthesized facial animation videos are then projected onto the etched glass cube. Even though the etched surface is static, the projection of facial animation onto it results in a compelling experience for the viewer. We show several examples of 2D and 3D avatars that are driven by text and speech inputs.	(pdf)
Partial Evaluation for Code Generation from Domain-Specific Languages	Jia Zeng	2007-11-20	Partial evaluation has been applied to compiler optimization and generation for decades. Most of the successful partial evaluators have been designed for general-purpose languages. Our observation is that domain-specific languages are also suitable targets for partial evaluation. The unusual computational models in many DSLs bring challenges as well as optimization opportunities to the compiler. To enable aggressive optimization, partial evaluation has to be specialized to fit the specific paradigm of a DSL. In this dissertation, we present three such specialized partial evaluation techniques designed for specific languages that address a variety of compilation concerns. The first algorithm provides a low-cost solution for simulating concurrency on a single-threaded processor. The second enables a compiler to compile modest-sized synchronous programs in pieces that involve communication cycles. The third statically elaborates recursive function calls that enable programmers to dynamically create a system's concurrent components in a convenient and algorithmic way. Our goal is to demonstrate the potential of partial evaluation to solve challenging issues in code generation for domain-specific languages. Naturally, we do not cover all DSL compilation issues. We hope our work will enlighten and encourage future research on the application of partial evaluation to this area.	(pdf)
Distributed In Vivo Testing of Software Applications	Matt Chu, Christian Murphy, Gail Kaiser	2007-11-16	The in vivo software testing approach focuses on testing live applications by executing unit tests throughout the lifecycle, including after deployment. The motivation is that the “known state” approach of traditional unit testing is unrealistic; deployed applications rarely operate under such conditions, and it may be more informative to perform the testing in live environments. One of the limitations of this approach is the high performance cost it incurs, as the unit tests are executed in parallel with the application. Here we present distributed in vivo testing, which focuses on easing the burden by sharing the load across multiple instances of the application of interest. That is, we elevate the scope of in vivo testing from a single instance to a community of instances, all participating in the testing process. Our approach is different from prior work in that we are actively testing during execution, as opposed to passively monitoring the application or conducting tests in the user environment prior to execution. We discuss new extensions to the existing in vivo testing framework (called Invite) and present empirical results that show the performance overhead improves linearly with the number of clients.	(pdf)
Tractability of the Helmholtz equation with non-homogeneous Neumann boundary conditions: Relation to $L_2$-approximation	Arthur G. Werschulz	2007-11-08	We want to compute a worst case $\varepsilon$-approximation to the solution of the Helmholtz equation $-\Delta u+qu=f$ over the unit $d$-cube~$I^d$, subject to Neumann boundary conditions $\partial_\nu u=g$ on~$\partial I^d$. Let $\mathop{\rm card}(\varepsilon,d)$ denote the minimal number of evaluations of $f$, $g$, and~$q$ needed to compute an absolute or normalized $\varepsilon$-approximation, assuming that $f$, $g$, and~$q$ vary over balls of weighted reproducing kernel Hilbert spaces. This problem is said to be weakly tractable if $\mathop{\rm card}(\varepsilon,d)$ grows subexponentially in~$\varepsilon^{-1}$ and $d$. It is said to be polynomially tractable if $\mathop{\rm card}(\varepsilon,d)$ is polynomial in~$\varepsilon^{-1}$ and~$d$, and strongly polynomially tractable if this polynomial is independent of~$d$. We have previously studied tractability for the homogeneous version $g=0$ of this problem. In this paper, we investigate the tractability of the non-homogeneous problem, with general~$g$. First, suppose that we use product weights, in which the role of any variable is moderated by its particular weight. We then find that if the weight sum is sublinearly bounded, then the problem is weakly tractable; moreover, this condition is more or less necessary. We then show that the problem is polynomially tractable if the weight sum is logarithmically or uniformly bounded, and we estimate the exponents of tractability for these two cases. Next, we turn to finite-order weights of fixed order~$\omega$, in which a $d$-variate function can be decomposed as sum, each term depending on at most $\omega$~variables. We show that the problem is always polynomially tractable for finite-order weights, and we give estimates for the exponents of tractability. Since our results so far have established nothing stronger than polynomial tractability, we look more closely at whether strong polynomial tractability is possible. We show that our problem is never strongly polynomially tractable for the absolute error criterion. Moreover, we believe that the same is true for the normalized error criterion, but we have been able to prove this lack of strong tractability only when certain conditions hold on the weights. Finally, we use the Korobov- and min-kernels, along with product weights, to illustrate our results.	(pdf)
High Level Synthesis for Packet Processing Pipelines	Cristian Soviani	2007-10-30	Packet processing is an essential function of state-of-the-art network routers and switches. Implementing packet processors in pipelined architectures is a well-known, established technique, albeit different approaches have been proposed. The design of packet processing pipelines is a delicate trade-off between the desire for abstract specifications, short development time, and design maintainability on one hand and very aggressive performance requirements on the other. This thesis proposes a coherent design flow for packet processing pipelines. Like the design process itself, I start by introducing a novel domain-specific language that provides a high-level specification of the pipeline. Next, I address synthesizing this model and calculating its worst-case throughput. Finally, I address some specific circuit optimization issues. I claim, based on experimental results, that my proposed technique can dramatically improve the design process of these pipelines, while the resulting performance matches the expectations of hand-crafted design. The considered pipelines exhibit a pseudo-linear topology, which can be too restrictive in the general case. However, especially due to its high performance, such an architecture may be suitable for applications outside packet processing, in which case some of my proposed techniques could be easily adapted. Since I ran my experiments on FPGAs, this work has an inherent bias towards that technology; however, most results are technology-independent.	(pdf) (ps)
Generalized Tractability for Multivariate Problems	Michael Gnewuch, Henryk Wozniakowski	2007-10-30	\usepackage{amssymb} \begin{document} We continue the study of generalized tractability initiated in our previous paper ``Generalized tractability for multivariate problems, Part I: Linear tensor product problems and linear information'', J. Complexity, 23, 262-295 (2007). We study linear tensor product problems for which we can compute linear information which is given by arbitrary continuous linear functionals. We want to approximate an operator $S_d$ given as the $d$-fold tensor product of a compact linear operator $S_1$ for $d=1,2,\dots\,$, with $\\|S_1\\|=1$ and $S_1$ has at least two positive singular values. Let $n(\varepsilon,S_d)$ be the minimal number of information evaluations needed to approximate $S_d$ to within $\varepsilon\in[0,1]$. We study \emph{generalized tractability} by verifying when $n(\varepsilon,S_d)$ can be bounded by a multiple of a power of $T(\varepsilon^{-1},d)$ for all $(\varepsilon^{-1},d)\in\Omega \subseteq[1,\infty)\times \mathbb{N}$. Here, $T$ is a \emph{tractability} function which is non-decreasing in both variables and grows slower than exponentially to infinity. We study the \emph{exponent of tractability} which is the smallest power of $T(\varepsilon^{-1},d)$ whose multiple bounds $n(\varepsilon,S_d)$. We also study \emph{weak tractability}, i.e., when $\lim_{\varepsilon^{-1}+d\to\infty,(\varepsilon^{-1},d)\in\Omega} \ln\,n(\varepsilon,S_d)/(\varepsilon^{-1}+d)=0$. In our previous paper, we studied generalized tractability for proper subsets $\Omega$ of $[1,\infty)\times\mathbb{N}$, whereas in this paper we take the unrestricted domain $\Omega^{\rm unr}=[1,\infty)\times\mathbb{N}$. We consider the three cases for which we have only finitely many positive singular values of $S_1$, or they decay exponentially or polynomially fast. Weak tractability holds for these three cases, and for all linear tensor product problems for which the singular values of $S_1$ decay slightly faster that logarithmically. We provide necessary and sufficient conditions on the function~$T$ such that generalized tractability holds. These conditions are obtained in terms of the singular values of $S_1$ and mostly limiting properties of $T$. The tractability conditions tell us how fast $T$ must go to infinity. It is known that $T$ must go to infinity faster than polynomially. We show that generalized tractability is obtained for $T(x,y)=x^{1+\ln\,y}$. We also study tractability functions $T$ of product form, $T(x,y) =f_1(x)f_2(x)$. Assume that $a_i=\liminf_{x\to\infty}(\ln\,\ln f_i(x))/(\ln\,\ln\,x)$ is finite for $i=1,2$. Then generalized tractability takes place iff $$a_i>1 \ \ \mbox{and}\ \ (a_1-1)(a_2-1)\ge1,$$ and if $(a_1-1)(a_2-1)=1$ then we need to assume one more condition given in the paper. If $(a_1-1)(a_2-1)>1$ then the exponent of tractability is zero, and if $(a_1-1)(a_2-1)=1$ then the exponent of tractability is finite. It is interesting to add that for $T$ being of the product form, the tractability conditions as well as the exponent of tractability depend only on the second singular eigenvalue of $S_1$ and they do \emph{not} depend on the rate of their decay. Finally, we compare the results obtained in this paper for the unrestricted domain $\Omega^{\rm unr}$ with the results from our previous paper obtained for the restricted domain $\Omega^{\rm res}= [1,\infty)\times\{1,2,\dots,d^\}\,\cup\,[1,\varepsilon_0^{-1})\times\mathbb{N}$ with $d^\ge1$ and $\varepsilon_0\in(0,1)$. In general, the tractability results are quite different. We may have generalized tractability for the restricted domain and no generalized tractability for the unrestricted domain which is the case, for instance, for polynomial tractability $T(x,y)=xy$. We may also have generalized tractability for both domains with different or with the same exponents of tractability. \end{document}	(pdf)
Optimizing Frequency Queries for Data Mining Applications	Hassan Malik, John Kender	2007-10-27	Data mining algorithms use various Trie and bitmap-based representations to optimize the support (i.e., frequency) counting performance. In this paper, we compare the memory requirements and support counting performance of FP Tree, and Compressed Patricia Trie against several novel variants of vertical bit vectors. First, borrowing ideas from the VLDB domain, we compress vertical bit vectors using WAH encoding. Second, we evaluate the Gray code rank-based transaction reordering scheme, and show that in practice, simple lexicographic ordering, obtained by applying LSB Radix sort, outperforms this scheme. Led by these results, we propose HDO, a novel Hamming-distance-based greedy transaction reordering scheme, and aHDO, a linear-time approximation to HDO. We present results of experiments performed on 15 common datasets with varying degrees of sparseness, and show that HDO- reordered, WAH encoded bit vectors can take as little as 5% of the uncompressed space, while aHDO achieves similar compression on sparse datasets. Finally, with results from over a billion database and data mining style frequency query executions, we show that bitmap-based approaches result in up to hundreds of times faster support counting, and HDO-WAH encoded bitmaps offer the best space-time tradeoff.	(pdf)
Automated Social Hierarchy Detection through Email Network Analysis	Ryan Rowe, German Creamer, Shlomo Heshkop, Sal Stolfo	2007-10-17	We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of communications between entities to rank relationships. The advantage is that the analysis can be done in an automatic fashion and can adopt itself to organizational changes over time. We illustrate the algorithms over real world data using the Enron corporation's email archive. The results show great promise when compared to the corporations work chart and judicial proceeding analyzing the major players.	(pdf)
A New Framework for Unsupervised Semantic Discovery	Barry Schiffman	2007-10-16	This paper presents a new framework for the unsupervised discovery of semantic information, using a divide-and-conquer approach to take advantage of contextual regularities and to avoid problems of polysemy and sublanguages. Multiple sets of documents are formed and analyzed to create multiple sets of frames. The overall procedure is wholly unsupervised and domain independent. The end result will be a collection of sets of semantic frames that will be useful in a wide range of applications, including question-answering, information extraction, summarization and text generation.	(pdf)
Towards In Vivo Testing of Software Applications	Christian Murphy, Gail Kaiser, Matt Chu	2007-10-15	Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call “in vivo testing”, in which unit tests are continuously executed inside a running application in the deployment environment. In this novel approach, unit tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach has been shown to reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach, the testing framework we have developed for Java applications, classes of bugs our approach can discover, and the results of experiments to measure the added overhead.	(pdf)
Experiences in Teaching eXtreme Programming in a Distance Learning Program	Christian Murphy, Dan Phung, Gail Kaiser	2007-10-12	As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present our experiences and observations from managing such an online software engineering course, and describe some of the specific challenges we faced, such as students’ aversion to using XP and difficulties in scheduling. We also present some suggestions to other educators who may face similar situations.	(pdf)
BARTER: Profile Model Exchange for Behavior-Based Access Control and Communication Security in MANETs	Vanessa Frias-Martinez, Salvatore J. Stolfo, Angelos D. Keromytis	2007-10-10	There is a considerable body of literature and technology that provides access control and security of communication for Mobile Ad-hoc Networks (MANETs) based on cryptographic authentication technologies and protocols. We introduce a new method of granting access and securing communication in a MANET environment to augment, not replace, existing techniques. Previous approaches grant access to the MANET, or to its services, merely by means of an authenticated identity or a qualified role. We present BARTER, a framework that, in addition, requires nodes to exchange a model of their behavior to grant access to the MANET and to assess the legitimacy of their subsequent communication. This framework forces the nodes not only to say who or what they are, but also how they behave. BARTER will continuously run membership acceptance and update protocols to give access to and accept traffic only from nodes whose behavior model is considered ``normal'' according to the behavior model of the nodes in the MANET. We implement and experimentally evaluate the merger between BARTER and other cryptographic technologies and show that BARTER can implement a fully distributed automatic access control and update with small cryptographic costs. Although the methods proposed involve the use of content-based anomaly detection models, the generic infrastructure implementing the methodology may utilize any behavior model. Even though the experiments are implemented for MANETs, the idea of model exchange for access control can be applied to any type of network.	(pdf)
Post-Patch Retraining for Host-Based Anomaly Detection	Michael E. Locasto, Gabriela F. Cretu, Shlomo Hershkop, Angelos Stavrou	2007-10-05	Applying patches, although a disruptive activity, remains a vital part of software maintenance and defense. When host-based anomaly detection (AD) sensors monitor an application, patching the application requires a corresponding update of the sensor's behavioral model. Otherwise, the sensor may incorrectly classify new behavior as malicious (a false positive) or assert that old, incorrect behavior is normal (a false negative). Although the problem of ``model drift'' is an almost universally acknowledged hazard for AD sensors, relatively little work has been done to understand the process of re-training a ``live'' AD model --- especially in response to legal behavioral updates like vendor patches or repairs produced by a self-healing system. We investigate the feasibility of automatically deriving and applying a ``model patch'' that describes the changes necessary to update a ``reasonable'' host-based AD behavioral model ({\it i.e.,} a model whose structure follows the core design principles of existing host--based anomaly models). We aim to avoid extensive retraining and regeneration of the entire AD model when only parts may have changed --- a task that seems especially undesirable after the exhaustive testing necessary to deploy a patch.	(pdf) (ps)
Privacy-Enhanced Searches Using Encrypted Bloom Filters	Steven M. Bellovin, William R. Cheswick	2007-09-25	It is often necessary for two or more or more parties that do not fully trust each other to share data selectively. For example, one intelligence agency might be willing to turn over certain documents to another such agency, but only if the second agency requests the specific documents. The problem, of course, is finding out that such documents exist when access to the database is restricted. We propose a search scheme based on Bloom filters and group ciphers such as Pohlig-Hellman encryption. A semi-trusted third party can transform one party's search queries to a form suitable for querying the other party's database, in such a way that neither the third party nor the database owner can see the original query. Furthermore, the encryption keys used to construct the Bloom filters are not shared with this third party. Multiple providers and queriers are supported; provision can be made for third-party ``warrant servers'', as well as ``censorship sets'' that limit the data to be shared.	(pdf)
Service Composition in a Global Service Discovery System	Knarig Arabshian, Christian Dickmann, Henning Schulzrinne	2007-09-17	GloServ is a global service discovery system which aggregates information about different types of services in a globally distributed network. GloServ classifies services in an ontology and maps knowledge obtained by the ontology onto a scalable hybrid hierarchical peer-to-peer network. The network mirrors the semantic relationships of service classes and as a result, reduces the number of message hops across the global network due to the domain-specific way services are distributed. Also, since services are described in greater detail, due to the ontology representation, greater reasoning is applied when querying and registering services. In this paper, we describe an enhancement to the GloServ querying mechanism which allows GloServ servers to process and issue subqueries between servers of different classes. Thus, information about different service classes may be queried for in a single query and issued directly from the front end, creating an extensible platform for service composition. The results are then aggregated and presented to the user such that services which share an attribute are categorized together. We have built and evaluated a location-based web service discovery prototype which demonstrates the flexibility of service composition in GloServ and discuss the design and evaluation of this system. Keywords: service discovery, ontologies, OWL, CAN, peer-to-peer, web service composition	(pdf)
Using boosting for automated planning and trading systems	German Creaer	2007-09-15	The problem: Much of finance theory is based on the efficient market hypothesis. According to this hypothesis, the prices of financial assets, such as stocks, incorporate all information that may affect their future performance. However, the translation of publicly available information into predictions of future performance is far from trivial. Making such predictions is the livelihood of stock traders, market analysts, and the like. Clearly, the efficient market hypothesis is only an approximation which ignores the cost of producing accurate predictions. Markets are becoming more efficient and more accessible because of the use of ever faster methods for communicating and analyzing financial data. Algorithms developed in machine learning can be used to automate parts of this translation process. In other words, we can now use machine learning algorithms to analyze vast amounts of information and compile them to predict the performance of companies, stocks, or even market analysts. In financial terms, we would say that such algorithms discover inefficiencies in the current market. These discoveries can be used to make a profit and, in turn, reduce the market inefficiencies or support strategic planning processes. Relevance: Currently, the major stock exchanges such as NYSE and NASDAQ are transforming their markets into electronic financial markets. Players in these markets must process large amounts of information and make instantaneous investment decisions. Machine learning techniques help investors and corporations recognize new business opportunities or potential corporate problems in these markets. With time, these techniques help the financial market become better regulated and more stable. Also, corporations could save significant amount of resources if they can automate certain corporate finance functions such as planning and trading. Results: This dissertation offers a novel approach to using boosting as a predictive and interpretative tool for problems in finance. Even more, we demonstrate how boosting can support the automation of strategic planning and trading functions. Many of the recent bankruptcy scandals in publicly held US companies such as Enron and WorldCom are inextricably linked to the conflict of interest between shareholders (principals) and managers (agents). We evaluate this conflict in the case of Latin American and US companies. In the first part of this dissertation, we use Adaboost to analyze the impact of corporate governance variables on performance. In this respect, we present an algorithm that calculates alternating decision trees (ADTs), ranks variables according to their level of importance, and generates representative ADTs. We develop a board Balanced Scorecard (BSC) based on these representative ADTs which is part of the process to automate the planning functions. In the second part of this dissertation we present three main algorithms to improve forecasting and automated trading. First, we introduce a link mining algorithm using a mixture of economic and social network indicators to forecast earnings surprises, and cumulative abnormal return. Second, we propose a trading algorithm for short-term technical trading. The algorithm was tested in the context of the Penn-Lehman Automated Trading Project (PLAT) competition using the Microsoft stock. The algorithm was profitable during the competition. Third, we present a multi-stock automated trading system that includes a machine learning algorithm that makes the prediction, a weighting algorithm that combines the experts, and a risk management layer that selects only the strongest prediction and avoids trading when there is a history of negative performance. This algorithm was tested with 100 randomly selected S&P 500 stocks. We find that even an efficient learning algorithm, such as boosting, still requires powerful control mechanisms in order to reduce unnecessary and unprofitable trades that increase transaction costs.	(pdf)
Oblivious Image Matching	Shai Avidan, Ariel Elbaz, Tal Malkin, Ryan Moriarty	2007-09-13	We present the problem of Oblivious Image Matching, where two parties want to determine whether they have images of the same object or scene, without revealing any additional information. While image matching has attracted a great deal of attention in the computer vision community, it was never treated in a cryptographic sense. In this paper we study the private version of the problem, oblivious image matching, and provide an efficient protocol for it. In doing so, we design a novel image matching algorithm, and a few private protocols that may be of independent interest. Specifically, we first show how to reduce the image matching problem to a two-level version of the fuzzy set matching problem, and then present a novel protocol to privately compute this (and several other) matching problems.	(pdf)
OpenTor: Anonymity as a Commodity Service	Elli Androulaki, Mariana Raykova, Angelos Stavrou, Steven Bellovin	2007-09-13	Despite the growth of the Internet and the increasing concern for privacy of online communications, current deployments of anonymization networks depends on a very small set of nodes that volunteer their bandwidth. We believe that the main reason is not disbelief in their ability to protect anonymity, but rather the practical limitations in bandwidth and latency that stem from limited participation. This limited participation, in turn, is due to a lack of incentives. We propose providing economic incentives, which historically have worked very well. In this technical report, we demonstrate a payment scheme that can be used to compensate nodes which provide anonymity in Tor, an existing onion routing, anonymizing network. We show that current anonymous payment schemes are not suitable and introduce a hybrid payment system based on a combination of the Peppercoin Micropayment system and a new type of ``one use'' electronic cash. Our system claims to maintain users' anonymity, although payment techniques mentioned previously --- when adopted individually --- provably fail.	(pdf)
Reputation Systems for Anonymous Networks	Elli Androulaki, Seung Geol Choi, Steven M. Bellovin, Tal G. Malkin	2007-09-12	We present a reputation scheme for a pseudonymous peer-to-peer (P2P) system in an anonymous network. Misbehavior is one of the biggest problems in pseudonymous P2P systems, where there is little incentive for proper behavior. In our scheme, using ecash for reputation points, the reputation of each user is closely related to his real identity rather than to his current pseudonym. Thus, our scheme allows an honest user to switch to a new pseudonym keeping his good reputation, while hindering a malicious user from erasing his trail of evil deeds with a new pseudonym.	(pdf)
A Study of Malcode-Bearing Documents	Wei-Jen Li, Salvatore Stolfo, Angelos Stavrou, Elli Androulaki, Angelos D. Keromytis	2007-09-07	By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party applications that may harbor exploitable vulnerabilities otherwise unreachable by network-level service attacks. Such attacks can be very selective and difficult to detect compared to the typical network worm threat, owing to the complexity of these applications and data formats, as well as the multitude of document-exchange vectors. As a case study, this paper focuses on Microsoft Word documents as malcode carriers. We investigate the possibility of detecting embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical document content, and run-time dynamic tests on diverse platforms. The experiments demonstrate these approaches can not only detect known malware, but also most zero-day attacks. We identify several problems with both approaches, representing both challenges in addressing the problem and opportunities for future research.	(pdf)
Backstop: A Tool for Debugging Runtime Errors	Christian Murphy, Eunhee Kim, Gail Kaiser, Adam Cannon	2007-09-06	The errors that Java programmers are likely to encounter can roughly be categorized into three groups: compile-time (semantic and syntactic), logical, and runtime (exceptions). While much work has focused on the first two, there are very few tools that exist for interpreting the sometimes cryptic messages that result from runtime errors. Novice programmers in particular have difficulty dealing with uncaught exceptions in their code and the resulting stack traces, which are by no means easy to understand. We present Backstop, a tool for debugging runtime errors in Java applications. This tool provides more user-friendly error messages when an uncaught exception occurs, but also provides debugging support by allowing users to watch the execution of the program and the changes to the values of variables. We also present the results of two studies conducted on introductory-level programmers using the two different features of the tool.	(pdf)
RAS-Models: A Building Block for Self-Healing Benchmarks	Rean Griffith, Ritika Virmani, Gail Kaiser	2007-09-01	To evaluate the efficacy of self-healing systems a rigorous, objective, quantitative benchmarking methodology is needed. However, developing such a benchmark is a non-trivial task given the many evaluation issues to be resolved, including but not limited to: quantifying the impacts of faults, analyzing various styles of healing (reactive, preventative, proactive), accounting for partially automated healing and accounting for incomplete/imperfect healing. We posit, however,that it is possible to realize a self-healing benchmark using a collection of analytical techniques and practical tools as building blocks. This paper highlights the flexibility of one analytical tool, the Reliability, Availability and Serviceability (RAS) model, and illustrates its power and relevance to the problem of evaluating self-healing mechanisms/systems, when combined with practical tools for fault-injection.	(pdf)
A Precomputed Polynomial Representation for Interactive BRDF Editing with Global Illumination	Aner Ben-Artzi, Kevin Egan, Fredo Durand, Ravi Ramamoorthi	2007-07-31	The ability to interactively edit BRDFs in their final placement within a computer graphics scene is vital to making informed choices for material properties. We significantly extend previous work on BRDF editing for static scenes (with fixed lighting and view), by developing a precomputed polynomial representation that enables interactive BRDF editing with global illumination. Unlike previous recomputation based rendering techniques, the image is not linear in the BRDF when considering interreflections. We introduce a framework for precomputing a multi-bounce tensor of polynomial coefficients, that encapsulates the nonlinear nature of the task. Significant reductions in complexity are achieved by leveraging the low-frequency nature of indirect light. We use a high-quality representation for the BRDFs at the first bounce from the eye, and lower-frequency (often diffuse) versions for further bounces. This approximation correctly captures the general global illumination in a scene, including color-bleeding, near-field object reflections, and even caustics. We adapt Monte Carlo path tracing for precomputing the tensor of coefficients for BRDF basis functions. At runtime, the high-dimensional tensors can be reduced to a simple dot product at each pixel for rendering. We present a number of examples of editing BRDFs in complex scenes, with interactive feedback rendered with global illumination.	(pdf)
Parameterizing Random Test Data According to Equivalence Classes	Christian Murphy, Gail Kaiser, Marta Arias	2007-07-12	We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case of the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called “parameterized random data generation”. Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness but still have control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications.	(pdf)
The Delay-Friendliness of TCP	Salman Abdul Baset, Eli Brosh, Vishal Misra, Dan Rubenstein, Henning Schulzrinne	2007-06-30	Traditionally, TCP has been considered unfriendly for real-time applications. Nonetheless, popular applications such as Skype use TCP due to the deployment of NATs and firewalls that prevent UDP traffic. This observation motivated us to study the delay performance of TCP for real-time media flows using an analytical model and experiments. The results obtained yield the working region for VoIP and live video streaming applications and guidelines for delay-friendly TCP settings. Further, our research indicates that simple application-level schemes, such as packet splitting and parallel connections, can significantly improve the delay performance of real-time TCP flows.	(pdf)
STAND: Sanitization Tool for ANomaly Detection	Gabriela F. Cretu, Angelos Stavrou, Slavatore J. Stolfo, Angelos D. Keromytis	2007-05-30	The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Arti- ficial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these “micro-models” to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as “attack-free” and “regular” as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.	(pdf)
The Role of Reliability, Availability and Serviceability (RAS) Models in the Design and Evaluation of Self-Healing Systems	Rean Griffith, Ritika Virmani, Gail Kaiser	2007-04-10	In an idealized scenario, self-healing systems predict, prevent or diagnose problems and take the appropriate actions to mitigate their impact with minimal human intervention. To determine how close we are to reaching this goal we require analytical techniques and practical approaches that allow us to quantify the effectiveness of a system’s remediations mechanisms. In this paper we apply analytical techniques based on Reliability, Availability and Serviceability (RAS) models to evaluate individual remediation mechanisms of select system components and their combined effects on the system. We demonstrate the applicability of RAS-models to the evaluation of self-healing systems by using them to analyze various styles of remediations (reactive, preventative etc.), quantify the impact of imperfect remediations, identify sub-optimal (less effective) remediations and quantify the combined effects of all the activated remediations on the system as a whole.	(pdf)
Aequitas: A Trusted P2P System for Paid Content Delivery	Alex Sherman, Japinder Chawla, Jason Nieh, Cliff Stein, Justin Sarma	2007-03-30	P2P file-sharing has been recognized as a powerful and efficient distribution model due to its ability to leverage users' upload bandwidth. However, companies that sell digital content on-line are hesitant to rely on P2P models for paid content distribution due to the free file-sharing inherent in P2P models. In this paper we present Aequitas, a P2P system in which users share paid content anonymously via a layer of intermediate nodes. We argue that with the extra anonymity in Aequitas, vendors could leverage P2P bandwidth while effectively maintaining the same level of trust towards their customers as in traditional models of paid content distribution. As a result, a content provider could reduce its infrastructure costs and subsequently lower the costs for the end-users. The intermediate nodes are incentivized to contribute their bandwidth via electronic micropayments. We also introduce techniques that prevent the intermediate nodes from learning the content of the files they help transmit. In this paper we present the design of our system, an analysis of its properties and an implementation and experimental evaluation. We quantify the value of the intermediate nodes, both in terms of efficiency and their effect on anonoymity. We argue in support of the economic and technological merits of the system.	(pdf)
Can P2P Replace Direct Download for Content Distribution	Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein, Angelos Keromytis	2007-03-30	While peer-to-peer (P2P) file-sharing is a powerful and cost-effective content distribution model, most paid-for digital-content providers (CPs) rely on direct download to deliver their content. CPs such as Apple iTunes that command a large base of paying users are hesitant to use a P2P model that could easily degrade their user base into yet another free file-sharing community. We present TP2, a system that makes P2P file sharing a viable delivery mechanism for paid digital content by providing the same security properties as the currently used direct-download model.} introduces the novel notion of trusted auditors (TAs) -- P2P peers that are controlled by the system operator. TAs monitor the behavior of other peers and help detect and prevent formation of illegal file-sharing clusters among the CP's user base. TAs both complement and exploit the strong authentication and authorization mechanisms that are used in TP2 to control access to content. It is important to note that TP2 does not attempt to solve the out-of-band file-sharing or DRM problems, which also exist in the direct-download systems currently in use. We analyze TP2 by modeling it as a novel game between misbehaving users who try to form unauthorized file-sharing clusters and TAs who curb the growth of such clusters. Our analysis shows that a small fraction of TAs is sufficient to protect the P2P system against unauthorized file sharing. In a system with as many as 60\% of misbehaving users, even a small fraction of TAs can detect 99\% of unauthorized cluster formation. We developed a simple economic model to show that even with such a large fraction of malicious nodes, TP2 can improve CP's profits (which could translate to user savings) by 62 to 122\%, even while assuming conservative estimates of content and bandwidth costs. We implemented TP2 as a layer on top of BitTorrent and demonstrated experimentally using PlanetLab that our system provides trusted P2P file sharing with negligible performance overhead.	(pdf)
Policy Algebras for Hybrid Firewalls	Hang Zhao, Steven M. Bellovin	2007-03-21	Firewalls are a effective means of protecting a local system or network of systems from network-based security threats. In this paper, we propose a policy algebra framework for security policy enforcement in hybrid firewalls, ones that exist both in the network and on end systems. To preserve the security semantics, the policy algebras provide a formalism to compute addition, conjunction, subtraction, and summation on rule sets; it also defines the cost and risk functions associated with policy enforcement. Policy outsourcing triggers global cost minimization. We show that our framework can easily be extended to support packet filter firewall policies. Finally, we discuss special challenges and requirements for applying the policy algebra framework to MANETs.	(pdf)
The PBS Policy: Some Properties and Their Proofs	Hanhua Feng, Vishal Misra, Dan Rubenstein	2007-03-20	In this report we analyze a configurable blind scheduler containing a continuous, tunable parameter. After the definition of this policy, we prove the property of no surprising interruption, the property of no permanent starvation, and two theorems about monotonicity of this policy. This technical report contains supplemental materials for the following publication: Hanhua Feng, Vishal Misra, and Dan Rubenstein, "PBS: A unified priority-based scheduler", Proceedings of ACM SIGMETRICS '07, 2007.	(pdf) (ps)
An Approach to Software Testing of Machine Learning Applications	Christian Murphy, Gail Kaiser, Marta Arias	2007-03-19	Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test such ML software, because there is no reliable test oracle. We describe a software testing approach aimed at addressing this problem. We present our findings from testing implementations of two different ML ranking algorithms: Support Vector Machines and MartiRank.	(pdf)
Design, Implementation, and Validation of a New Class of Interface Circuits for Latency-Insensitive Design	Cheng-Hong Li, Rebecca Collins, Sampada Sonalkar, Luca P. Carloni	2007-03-05	With the arrival of nanometer technologies wire delays are no longer negligible with respect to gate delays, and timing-closure becomes a major challenge to System-on-Chip designers. Latency-insensitive design (LID) has been proposed as a "correct-by-construction" design methodology to cope with this problem. In this paper we present the design and implementation of a new and more efficient class of interface circuits to support LID. Our design offers substantial improvements in terms of logic delay over the design originally proposed by Carloni et al. [1] as well as in terms of both logic delay and processing throughput over the synchronous elastic architecture (SELF) recently proposed by Cortadella et al. [2]. These claims are supported by the experimental results that we obtained completing semi-custom implementations of the three designs with a 90nm industrial standard-cell library. We also report on the formal verification of our design: using the NuSMV model checker we verified that the RTL synthesizable implementations of our LID interface circuits (relay stations and shells) are correct refinements of the corresponding abstract specifications according to the theory of LID [3].	(pdf)
Evaluating Software Systems via Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models	Rean Griffith	2007-02-28	The most common and well-understood way to evaluate and compare computing systems is via performance-oriented benchmarks. However, numerous other demands are placed on computing systems besides speed. Current generation and next generation computing systems are expected to be reliable, highly available, easy to manage and able to repair faults and recover from failures with minimal human intervention. The extra-functional requirements concerned with reliability, high availability, and serviceability (manageability, repair and recovery) represent an additional set of high-level goals the system is expected to meet or exceed. These goals govern the system’s operation and are codified using policies and service level agreements (SLAs). To satisfy these extra-functional requirements, system-designers explore or employ a number of mechanisms geared towards improving the system’s reliability, availability and serviceability (RAS) characteristics. However, to evaluate these mechanisms and their impact, we need something more than performance metrics. Performance-measures are suitable for studying the feasibility of the mechanisms i.e. they can be used to conclude that the level of performance delivered by the system with these mechanisms active does not preclude its usage. However, performance numbers convey little about the efficacy of the systems RAS-enhancing mechanisms. Further, they do not allow us to analyze the (expected or actual) impact of individual mechanisms or make comparisons/discuss tradeoffs between mechanisms. What is needed is an evaluation methodology that is able to analyze the details of the RAS-enhancing mechanisms – the micro-view as well as the high-level goals, expressed as policies, SLAs etc., governing the system’s operation – the macro-view. Further, we must establish a link between the details of the mechanisms and their impact on the high-level goals. This thesis is concerned with developing the tools and applying analytical techniques to enable this kind of evaluation. We make three contributions. First, we contribute to a suite of runtime fault-injection tools with Kheiron. Kheiron demonstrates a feasible, low-overhead, transparent approach to performing system-adaptations in a variety of execution environments at runtime. We use Kheiron’s runtime-adaptation capability to inject faults into running programs. We present three implementations of Kheiron, each targeting a different execution environment. Kheiron/C manipulates compiled C-programs running in an unmanaged execution environment – comprised of the operating system and the underlying processor. Kheiron/CLR manipulates programs running in Microsoft’s Common Language Runtime (CLR) and Kheiron/JVM manipulates programs running in Sun Microsystems’ Java Virtual Machine (JVM). Kheiron’s operation is transparent to both the application and the execution environment. Further, the overheads imposed by Kheiron on the application and the execution environment are negligible, <5%, when no faults are being injected. Second, we describe analytical techniques based on RAS-models, represented as Markov chains and Markov reward models, to demonstrate their power in evaluating RAS-mechanisms and their impact on the high-level goals governing system-operation. We demonstrate the flexibility of these models in evaluating reactive, proactive and preventative mechanisms as well as their ability to explore the feasibility of yet-to-be-implemented mechanisms. Our analytical techniques focus on remediations rather than observed mean time to failures (MTTF). Unlike hardware, where the laws of physics govern the failure rates of mechanical and electrical parts, there are no such guarantees for software failure rates. Software failure-rates can however be influenced using fault-injection, which we employ in our experiments. In our analysis we consider a number of facets of remediations, which include, but go beyond mean time to recovery (MTTR). For example we consider remediation success rates, the (expected) impact of preventative-maintenance and the degradation-impact of remediations in our efforts to establish a framework for reasoning about the tradeoffs (the costs versus the benefits) of various remediation mechanisms. Finally, we distill our experiences developing runtime fault-injection tools, performing fault-injection experiments and constructing and analyzing RAS-models into a 7-step process for evaluating computing systems – the 7U-evaluation methodology. Our evaluation method succeeds in establishing the link between the details of the low-level mechanisms and the high-level goals governing the system’s operation. It also highlights the role of environmental constraints and policies in establishing meaningful criteria for scoring and comparing these systems and their RAS-enhancing mechanisms.	(pdf)
Privacy-Preserving Distributed Event Corroboration	Janak J. Parekh	2007-02-26	Event correlation is a widely-used data processing methodology for a broad variety of applications, and is especially useful in the context of distributed monitoring for software faults and vulnerabilities. However, most existing solutions have typically been focused on "intra-organizational" correlation; organizations typically employ privacy policies that prohibit the exchange of information outside of the organization. At the same time, the promise of "inter-organizational" correlation is significant given the broad availability of Internet-scale communications, and its potential role in both software fault maintenance and software vulnerability detection. In this thesis, I present a framework for reconciling these opposing forces via the use of privacy preservation integrated into the event processing framework. I introduce the notion of event corroboration, a reduced yet flexible form of correlation that enables collaborative verification, without revealing sensitive information. By accommodating privacy policies, we enable the corroboration of data across different organizations without actually releasing sensitive information. The framework supports both source anonymity and data privacy, yet allows for temporal corroboration of a broad variety of data. The framework is designed as a lightweight collection of components to enable integration with existing COTS platforms and distributed systems. I also present an implementation of this framework: Worminator, a collaborative Intrusion Detection System, based on an earlier platform, XUES (XML Universal Event Service), an event processor used as part of a software monitoring platform called KX (Kinesthetics eXtreme). KX comprised a series of components, connected together with a publish-subscribe content-based routing event subsystem, for the autonomic software monitoring, reconfiguration, and repair of complex distributed systems. Sensors were installed in legacy systems; XUES' two modules then performed event processing on sensor data: information was collected and processed by the Event Packager, and correlated using the Event Distiller. While XUES itself was not privacy-preserving, it laid the groundwork for this thesis by supporting event typing, the use of publish-subscribe and extensibility support via pluggable event transformation modules. I also describe techniques by which corroboration and privacy preservation could optionally be "retrofitted" onto XUES without breaking the correlation applications and scenarios described. Worminator is a ground-up rewrite of the XUES platform to fully support privacy-preserving event types and algorithms in the context of a Collaborative Intrusion Detection System (CIDS), whereby sensor alerts can be exchanged and corroborated without revealing sensitive information about a contributor's network, services, or even external sources, as required by privacy policies. Worminator also fully anonymizes source information, allowing contributors to decide their preferred level of information disclosure. Worminator is implemented as a monitoring framework on top of a collection of non-collaborative COTS and in-house IDS sensors, and demonstrably enables the detection of not only worms but also "broad and stealthy" scans; traditional single-network sensors either bury such scans in large volumes or miss them entirely. Worminator supports corroboration for packet and flow headers (metadata), packet content, and even aggregate models of network traffic using a variety of techniques. The contributions of this thesis include the development of a cross-application-domain event processing framework with native privacy-preserving types, the use and validation of privacy-preserving corroboration, and the establishment of a practical deployed collaborative security system. The thesis also quantifies Worminator's effectiveness at attack detection, the overhead of privacy preservation and the effectiveness of our approach against adversaries, be they "honest-but-curious" or actively malicious.	(pdf)
Distributed Algorithms for Secure Multipath Routing in Attack-Resistant Networks	Patrick Pak-Ching Lee, Vishal Misra, Dan Rubenstein	2007-02-16	To proactively defend against intruders from readily jeopardizing single-path data sessions, we propose a {\em distributed secure multipath solution} to route data across multiple paths so that intruders require much more resources to mount successful attacks. Our work exhibits several important properties that include: (1) routing decisions are made locally by network nodes without the centralized information of the entire network topology, (2) routing decisions minimize throughput loss under a single-link attack with respect to different session models, and (3) routing decisions address multiple link attacks via lexicographic optimization. We devise two algorithms termed the {\em Bound-Control algorithm} and the {\em Lex-Control algorithm}, both of which provide provably optimal solutions. Experiments show that the Bound-Control algorithm is more effective to prevent the worst-case single-link attack when compared to the single-path approach, and that the Lex-Control algorithm further enhances the Bound-Control algorithm by countering severe single-link attacks and various types of multi-link attacks. Moreover, the Lex-Control algorithm offers prominent protection after only a few execution rounds, implying that we can sacrifice minimal routing protection for significantly improved algorithm performance. Finally, we examine the applicability of our proposed algorithms in a specialized defensive network architecture called the attack-resistant network and analyze how the algorithms address resiliency and security in different network settings.	(pdf)
MutaGeneSys: Making Diagnostic Predictions Based on Genome-Wide Genotype Data in Association Studies	Julia Stoyanovich, Itsik Pe'er	2007-02-16	Summary: We present MutaGeneSys: a system that uses genomewide genotype data for disease prediction. Our system integrates three data sources: the International HapMap project, whole-genome marker correlation data and the Online Mendelian Inheritance in Man (OMIM) database. It accepts SNP data of individuals as query input and delivers disease susceptibility hypotheses even if the original set of typed SNPs is incomplete. Our system is scalable and flexible: it operates in real time and can be configured on the fly to produce population, technology, and confidence-specific predictions. Availability: Efforts are underway to deploy our system as part of the NCBI Reference Assembly. Meanwhile, the system may be obtained from the authors. Contact: jds1@cs.columbia.edu	(pdf)
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems	Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis	2007-02-15	Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection performance of all learning-based ADs depends heavily on the quality of the training data. In this paper, we extend the training phase of an AD to include a sanitization phase. This phase significantly improves the quality of unlabeled training data by making them as ”attack-free” as possible in the absence of absolute ground truth. Our approach is agnostic to the underlying AD, boosting its performance based solely on training-data sanitization. Our approach is to generate multiple AD models for content-based AD sensors trained on small slices of the training data. These AD “micro-models” are used to test the training data, producing alerts for each training input. We employ voting techniques to determine which of these training items are likely attacks. Our preliminary results show that sanitization increases 0-day attack detection while in most cases reducing the false positive rate. We analyze the performance gains when we deploy sanitized versus unsanitized AD systems in combination with expensive hostbased attack-detection systems. Finally, we show that our system incurs only an initial modest cost, which can be amortized over time during online operation.	(pdf)
Accelerating Service Discovery in Ad-Hoc Zero Configuration Networking	Se Gi Hong, Suman Srinivasan, Henning Schulzrinne	2007-02-12	Zero Configuration Networking (Zeroconf) assigns IP addresses and host names, and discovers service without a central server. Zeroconf can be used in wireless mobile ad-hoc networks which are based on IEEE 802.11 and IP. However, Zeroconf has problems in mobile ad-hoc networks as it cannot detect changes in the network topology. In highly mobile networks, Zeroconf causes network overhead while discovering new services. In this paper, we propose an algorithm to accelerate service discovery for mobile ad-hoc networks. Our algorithm involves the monitoring of network interface changes that occur when a device with IEEE 802.11 enabled joins a new network area. This algorithm allows users to discover network topology changes and new services in real-time while minimizing network overhead.	(pdf)
From STEM to SEAD: Speculative Execution for Automated Defense	Michael Locasto, Angelos Stavrou, Gabriela F. Cretu, Angelos D. Keromytis	2007-02-10	Most computer defense systems crash the process that they protect as part of their response to an attack. In contrast, self-healing software recovers from an attack by automatically repairing the underlying vulnerability. Although recent research explores the feasibility of the basic concept, self-healing faces four major obstacles before it can protect legacy applications and COTS software. Besides the practical issues involved in applying the system to such software (<i>e.g.</i>, not modifying source code), self-healing has encountered a number of problems: knowing when to engage, knowing how to repair, and handling communication with external entities. <p> Our previous work on a self-healing system, STEM, left these challenges as future work. STEM provides self-healing by speculatively executing ``slices'' of a process. This paper improves STEM's capabilities along three lines: (1) applicability of the system to COTS software (STEM does not require source code, and it imposes a roughly 73% performance penalty on Apache's normal operation), (2) semantic correctness of the repair (we introduce <i>virtual proxies</i> and <i>repair policy</i> to assist the healing process), and (3) creating a behavior profile based on aspects of data and control flow.	(pdf)
Topology-Based Optimization of Maximal Sustainable Throughput in a Latency-Insensitive System	Rebecca Collins, Luca Carloni	2007-02-06	We consider the problem of optimizing the performance of a latency-insensitive system (LIS) where the addition of backpressure has caused throughput degradation. Previous works have addressed the problem of LIS performance in different ways. In particular, the insertion of relay stations and the sizing of the input queues in the shells are the two main optimization techniques that have been proposed. We provide a unifying framework for this problem by outlining which approaches work for different system topologies, and highlighting counterexamples where some solutions do not work. We also observe that in the most difficult class of topologies, instances with the greatest throughput degradation are typically very amenable to simplifications. The contributions of this paper include a characterization of topologies that maintain optimal throughput with fixed-size queues and a heuristic for sizing queues that produces solutions close to optimal in a fraction of the time.	(pdf)
On the infeasibility of Modeling Polymorphic Shellcode for Signature Detection	Yingbo Song, Michael E. Locasto, Angelos Stavrou, Angelos D. Keromytis, Salvatore J. Stolfo	2007-02-04	POlymorphic malcode remains one of the most troubling threats for information security and intrusion defense systems. The ability for malcode to be automatically transformed into a semantically equivalent variant frustrates attemtps to construct a single, simple, easily verifiable representation. We present a quantitative analysis of the strentghs and limitations of shellcode polymorphism and consider the impact of this analysis on the current practices in intrusion detection. Our examination focuses on the nature of shellcode "decoding routines", and the empirical evidence we gather illustrate our mail result: that the challenge of modeling the class of self-modifying code is likely intractable - even when the size of the instruction sequence (i.e. the decoder) is relatively small. We develop metrics to gauge the power of polymorphic engines and use them to provide insight into the strengths and weaknesses of some popular engines. We believe this analysis supplies a novel and useful way to understand the limitations of the current generation of signature-based techniques. We analyze some contemporary polymorphic techniques, explore ways to improve them in order to forecast the nature of future threats, and present our suggestions for countermeasures. Our resulsts indicate that the class of polymorphic behavior is too greatly spread and varied to model effectively. We conclude that modeling normal content is ulatimately a more promising defense mechanism than modeling malicious or abnormal content.	(pdf)
Combining Ontology Queries with Text Search in Service Discovery	Knarig Arabshian, Henning Schulzrinne	2007-01-21	We present a querying mechanism for service discovery which combines ontology queries with text search. The underlying service discovery architecture used is GloServ. GloServ uses the Web Ontology Language (OWL) to classify services in an ontology and map knowledge obtained by the ontology onto a hierarchical peer-to-peer network. Initially, an ontology-based first order predicate logic query is issued in order to route the query to the appropriate server and to obtain exact and related service data. Text search further enhances querying by allowing services to be described not only with ontology attributes, but with plain text so that users can query for them using key words. Currently, querying is limited to either simple attribute-value pair searches, ontology queries or text search. Combining ontology queries with text search enhances current service discovery mechanisms.	(pdf)
A Model for Automatically Repairing Execution Integrity	Michael Locasto, Gabriela F. Cretu, Angelos Stavrou, Angelos D. Keromytis	2007-01-20	Many users value applications that continue execution in the face of attacks. Current software protection techniques typically abort a process after an intrusion attempt ({\it e.g.}, a code injection attack). We explore ways in which the security property of integrity can support availability. We extend the Clark-Wilson Integrity Model to provide primitives and rules for specifying and enforcing repair mechanisms and validation of those repairs. Users or administrators can use this model to write or automatically synthesize \emph{repair policy}. The policy can help customize an application's response to attack. We describe two prototype implementations for transparently applying these policies without modifying source code.	(pdf)
Using Functional Independence Conditions to Optimize the Performance of Latency-Insensitive Systems	Cheng-Hong Li, Luca Carloni	2007-01-11	In latency-insensitive design shell modules are used to encapsulate system components (pearls) in order to interface them with the given latency-insensitive protocol and dynamically control their operations. In particular, a shell stalls a pearl whenever new valid data are not available on its input channels. We study how functional independence conditions (FIC) can be applied to the performance optimization of a latency-insensitive system by avoiding unnecessary stalling of their pearls. We present a novel circuit design of a generic shell template that can exploit FICs. We also provide an automatic procedure for the logic synthesis of a shell instance that is only based on the particular local characteristics of its corresponding pearl and does not require any input from the designers. We conclude reporting on a set of experimental results that illustrate the beneits and overhead of the proposed technique.	(pdf)
Whitepaper: The Value of Improving the Separation of Concerns	Marc Eaddy, Alan Cyment, Pierre van de Laar, Fabian Schmied, Wolfgang Schult	2007-01-09	Microsoft's enterprise customers are demanding better ways to modularize their software systems. They look to the Java community, where these needs are being met with language enhancements, improved developer tools and middleware, and better runtime support. We present a business case for why Microsoft should give priority to supporting better modularization techniques, also known as advanced separation of concerns (ASOC), for the .NET platform, and we provide a roadmap for how to do so.	(pdf)
An Implementation of a Renesas H8/300 Microprocessor with a Cycle-Level Timing Extension	Chen-Chun Huang, Javier Coca, Yashket Gupta, Stephen A. Edwards	2006-12-30	We describe an implementation of the Renesas H8/300 16-bit processor in VHDL suitable for synthesis on an FPGA. We extended the ISA slightly to accomodate cycle-accurate timers accessible from the instruction set, designed to provide more precise real-time control. We describe the architecture of our implementation in detail, describe our testing strategy, and finally show how to built a cross compilation toolchain under Linux.	(pdf)
Embedded uClinux, the Altera DE2, and the SHIM Compiler	Wei-Chung Hsu, David Lariviere, Stephen A. Edwards	2006-12-28	SHIM is a concurrent deterministic language focused on embedded system. Although SHIM has undergone substantial evolution, it currently does not have a code generator for a true embedded environment. In this project, we built an embedded environment that we intend to use as a target for the SHIM compiler. We add the uClinux operating system between hardware devices and software programs. Our long-term goal is to have the SHIM compiler generate both user-space and kernel/module programs for this environment. This project is a first step: we manually explored what sort of code we ultimately want the SHIM compiler to produce. In this report, we provide instructions on how to build and install uClinux into an Altera DE2 board and example programs, including a user-space program, a kernel module, and a simple device driver for the buttons on the DE2 board that includes an interrupt handler.	(pdf) (ps)
A JPEG Decoder in SHIM	Nalini Vasudevan, Stephen A. Edwards	2006-12-25	Image compression plays an important role in multimedia systems, digital systems, handheld systems and various other devices. Efficient image processing techniques are needed to make images suitable for use in embedded systems. This paper describes an implementation of a JPEG decoder in the SHIM programming language. SHIM is a software/hardware integration language whose aim is to provide communication between hardware and software while providing deterministic concurrency. The paper shows that a JPEG decoder is a good application and reasonable test case for the SHIM language and illustrates the ease with which conventional sequential decoders can be modified to achieve concurrency.	(pdf) (ps)
Arrays in SHIM: A Proposal	Smridh Thapar, Olivier Tardieu, Stephen A. Edwards	2006-12-23	The use of multiprocessor configurations over uniprocessor is rapidly increasing to exploit parallelism instead of frequency scaling for better compute capacity. The multiprocessor architectures being developed will have a major impact on existing software. Current languages provide facilities for concurrent and distributed programming, but are prone to races and non-determinism. SHIM, a deterministic concurrent language, guarantees the behavior of its programs are independent of the scheduling of concurrent operations. The language currently supports atomic arrays only, i.e., parts of arrays cannot be sent to concurrent processes for evaluation (and edition). In this report, we propose a way to add non-atomic arrays to SHIM and describe the semantics that should be considered while allowing concurrent processes to edit parts of the same array.	(pdf) (ps)
High Quality, Efficient Hierarchical Document Clustering using Closed Interesting Itemsets	Hassan Malik, John Kender	2006-12-18	High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of “closed interesting” itemsets (i.e. closed itemsets with high interestingness). We provide heuristics such as “super item” to efficiently mine these itemsets and show that they provide significant dimensionality reduction over closed frequent itemsets. Using “closed interesting” itemsets, we propose a new hierarchical document clustering method that outperforms state of the art agglomerative, partitioning and frequent-itemset based methods both in terms of FScore and Entropy, without requiring dataset specific parameter tuning. We evaluate twenty interestingness measures on nine standard datasets and show that when used to generate “closed interesting” itemsets, and to select parent nodes, Mutual Information, Added Value, Yule’s Q and Chi-Square offers best clustering performance, regardless of the characteristics of underlying dataset. We also show that our method is more scalable, and results in better run-time performance as compare to leading approaches. On a dual processor machine, our method scaled sub-linearly and was able to cluster 200K documents in about 40 seconds.	(pdf)
LinkWidth: A Method to measure Link Capacity and Available Bandwidth using Single-End Probes	Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis	2006-12-15	We introduce LinkWidth, a method for estimating capacity and available bandwidth using single-end controlled TCP packet probes. To estimate capacity, we generate a train of TCP RST packets “sandwiched” between two TCP SYN packets. Capacity is obtained by end-to-end packet dispersion of the received TCP RST/ACK packets corresponding to the TCP SYN packets. Our technique is significantly different from the rest of the packet-pair-based measurement techniques, such as CapProbe, pathchar and pathrate, because the long packet trains minimize errors due to bursty cross-traffic. TCP RST packets do not generate additional ICMP replies preventing cross-traffic interference with our probes. In addition, we use TCP packets for all our probes to prevent some types of QoS-related traffic shaping from affecting our measurements. We extend the Train of Packet Pairs technique to approximate the available link capacity. We use pairs of TCP packets with variable intra-pair delays and sizes. This is the first attempt to implement this technique using single-end TCP probes, tested on a wide range of real networks with variable cross-traffic. We compare our prototype with pathchirp and pathload, which require control of both ends, and demonstrate that in most cases our method gives approximately the same results.	(pdf)
Deriving Utility from a Self-Healing Benchmark Report	Ritika Virmani, Rean Griffith, Gail Kaiser	2006-12-15	Autonomic systems, specifically self-healing systems, currently lack an objective and relevant methodology for their evaluation. Due to their focus on problem detection, diagnosis and remediation any evaluation methodology should facilitate an objective evaluation and/or comparison of these activities. Measures of “raw” performance are easily quantified and hence facilitate measurement and comparison on the basis of numbers. However, classifying a system better at problem detection, diagnosis and remediation purely on the basis of performance measures is not useful. The proposed evaluation methodology devised will differ from traditional benchmarks, which are primarily concerned with measures of performance. In order to develop this methodology we rely on a set of experiments which will enable us to compare the self-healing capabilities of one system versus another. As currently we do not have available “real” self-healing systems, we will simulate the behavior of some target self-healing systems, system faults and the operational and repair activities of target systems. Further, we will use the results derived from the simulation experiments to answer questions relevant to the utility of a benchmark report.	(pdf)
Measurements of DNS Stability	Omer Boyaci, Henning Schulzrinne	2006-12-14	In this project, we measured the stability of DNS servers based on the most popular 500 domains. In the first part of the project, DNS server replica counts and maximum DNS server separation are found for each domain. In the second part, these domains are queried for a one-month period in order to find their uptime percentages.	(pdf)
Cooperation Between Stations in Wireless Networks	Andrea G. Forte, Henning Schulzrinne	2006-12-07	In a wireless network, mobile nodes (MNs) repeatedly perform tasks such as layer 2 (L2) handoff, layer 3 (L3) handoff and authentication. These tasks are critical, particularly for real-time applications such as VoIP. We propose a novel approach, namely Cooperative Roaming (CR), in which MNs can collaborate with each other and share useful information about the network in which they move. We show how we can achieve seamless L2 and L3 handoffs regardless of the authentication mechanism used and without any changes to either the infrastructure or the protocol. In particular, we provide a working implementation of CR and show how, with CR, MNs can achieve a total L2+L3 handoff time of less than 16 ms in an open network and of about 21 ms in an IEEE 802.11i network. We consider behaviors typical of IEEE 802.11 networks, although many of the concepts and problems addressed here apply to any kind of mobile network.	(pdf)
Throughput and Fairness in CSMA/CA Wireless Networks	Hoon Chang, Vishal Misra, Dan Rubenstein	2006-12-07	While physical layer capture has been observed in real implementations of wireless devices accessing the channel like 802.11, log-utility fair allocation algorithms based on accurate channel models describing the phenomenon have not been developed. In this paper, using a general physical channel model, we develop an allocation algorithm for log-utility fairness. To maximize the aggregate utility, our algorithm determines channel access attempt probabilities of nodes using partial derivatives of the utility. Our algorithm is verified through extended simulations. The results indicate that our algorithm could quickly achieve allocations close to the optimum with 8.6% accuracy error on average.	(pdf)
A Case for P2P Delivery of Paid Content	Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein, Angelos D. Keromytis	2006-11-28	P2P file sharing provides a powerful content distribution model by leveraging users' computing and bandwidth resources. However, companies have been reluctant to rely on P2P systems for paid content distribution due to their inability to limit the exploitation of these systems for free file sharing. We present \sname, a system that combines the more cost-effective and scalable distribution capabilities of P2P systems with a level of trust and control over content distribution similar to direct download content delivery networks. \sname\ uses two key mechanisms that can be layered on top of existing P2P systems. First, it provides strong authentication to prevent free file sharing in the system. Second, it introduces a new notion of trusted auditors to detect and limit malicious attempts to gain information about participants in the system to facilitate additional out-of-band free file sharing. We analyze \sname\ by modeling it as a novel game between malicious users who try to form free file sharing clusters and trusted auditors who curb the growth of such clusters. Our analysis shows that a small fraction of trusted auditors is sufficient to protect the P2P system against unauthorized file sharing. Using a simple economic model, we further show that \sname\ provides a more cost-effective content distribution solution, resulting in higher profits for a content provider even in the presence of a large percentage of malicious users. Finally, we implemented \sname\ on top of BitTorrent and use PlanetLab to show that our system can provide trusted P2P f	(pdf)
Presence Traffic Optimization Techniques	Vishal Singh, Henning Schulzrinne, Markus Isomaki, Piotr Boni	2006-11-02	With the growth of presence-based services, it is important to provision the network to support high traffic and load generated by presence services. Presence event distribution systems amplify a single incoming PUBLISH message into possibly numerous outgoing NOTIFY messages from the server. This can increase the network load on inter-domain links and can potentially disrupt other QoS-sensitive applications. In this document, we present existing as well as new techniques that can be used to reduce presence traffic both in inter-domain and intra-domain scenarios. Specifically, we propose two new techniques: sending common NOTIFY for multiple watchers and batched notifications. We also propose some generic heuristics that can be used to reduce network traffic due to presence.	(pdf)
A Common Protocol for Implementing Various DHT Algorithms	Salman Abdul Baset, Henning Schulzrinne, Eunsoo Shim	2006-10-22	This document defines DHT-independent and DHT-dependent features of DHT algorithms and presents a comparison of Chord, Pastry and Kademlia. It then describes key DHT operations and their information requirements.	(pdf)
A survey on service creation by end users	Xiaotao Wu, Henning Schulzrinne	2006-10-15	We conducted a survey on end users’ willingness and capability to create their desired communication services. The survey is based on the graphical service creation tool we implemented for the Language for End System Services (LESS). We call the tool CUTE, which stands for Columbia University Telecommunication service Editor. This report demonstrates our survey result and shows that relatively inexperienced users are willing and capable to create their desired communication services, and CUTE fits their needs.	(pdf)
A VoIP Privacy Mechanism and its Application in VoIP Peering for Voice Service Provider Topology and Identity Hiding	Charles Shen, Henning Schulzrinne	2006-10-03	Voice Service Providers (VSPs) participating in VoIP peering frequently want to withhold their identity and related privacy-sensitive information from other parties during the VoIP communication. A number of existing documents on VoIP privacy exist, but most of them focus on end user privacy. By summarizing and extending existing work, we present a unified privacy mechanism for both VoIP users and service providers. We also show a case study on how VSPs can use this mechanism for identity and topology hiding in VoIP peering.	(pdf)
Evaluation and Comparison of BIND, PDNS and Navitas as ENUM Server	Charles Shen, Henning Schulzrinne	2006-09-27	ENUM is a protocol standard developed by the Internet Engineering Task Force (IETF) for translating the E.164 phone numbers into Internet Universal Resource Identifiers (URIs). It plays an increasingly important role as the bridge between Internet and traditional telecommunications services. ENUM is based on the Domain Name System (DNS), but places unique performance requirements on DNS server. In particular, ENUM server needs to host a huge number of records, provide high query throughput for both existing and non-existing records in the server, maintain high query performance under update load, and answer queries within a tight latency budget. In this report, we evaluate and compare performance of serving ENUM queries by three servers, namely BIND, PDNS and Navitas. Our objective is to answer whether and how these servers can meet the unique performance requirements of ENUM. Test results show that the ENUM query response time on our platform has always been on the order of a few milliseconds or less, so this is likely not a concern. Throughput then becomes the key. The throughput of BIND degrades linearly as the record set size grows, so BIND is not suitable for ENUM. PDNS delivers higher performance than BIND in most cases, while the commercial Navitas server presents even better ENUM performance than PDNS. Under our 5M-record set test, Navitas server with its default configuration consumes one tenth to one sixth the memory of PDNS, achieves six times higher throughput for existing records and an order of two magnitudes higher throughput for non-existing records than the bottom line PDNS server without caching. The throughput of Navitas is also the highest among the tested servers when the database is being updated in the background. We investigated ways to improve PDNS performance. For example, doubling CPU processing power by putting PDNS and its backend database in two separate machines can increase PDNS throughput for existing records by 45% and that for nonexisting records by 40%. Since PDNS is open source, we also instrumented the source code to obtain a detailed profile of contributions of various systems components to the overall latency. We found that when the server is within its normal load range, the main component of server processing latency is caused by backend database lookup operations. Excessive number of backend database lookups is the reason that makes PDNS throughput for non-existing records its key weakness. We studied using PDNS caching to reduce the number of database lookups. With a full packet cache and a modified cache maintenance mechanism, the PDNS throughput for existing records can be improved by 100%. This brings the value to one third of its Navitas counterpart. After enabling the PDNS negative query cache, we improved PDNS throughput for non-existing records to the level comparable to its throughput for existing records, but this result is still an order of magnitude lower than the corresponding value in Navitas. Further improvements of PDNS throughput for non-existing records will require optimization of related processing mechanism in its implementation.	(pdf)
Specifying Confluent Processes	Olivier Tardieu, Stephen A. Edwards	2006-09-22	We address the problem of specifying concurrent processes that can make local nondeterministic decisions without affecting global system behavior---the sequence of events communicated along each inter-process communication channel. Such nondeterminism can be used to cope with unpredictable execution rates and communication delays. Our model resembles Kahn's, but does not include unbounded buffered communication, so it is much simpler to reason about and implement. After formally characterizing these so-called confluent processes, we propose a collection of operators, including sequencing, parallel, and our own creation, confluent choice, that guarantee confluence by construction. The result is a set of primitive constructs that form the formal basis of a concurrent programming language for both hardware and software systems that gives deterministic behavior regardless of the relative execution rates of the processes. Such a language greatly simplifies the verification task because any correct implementation of such a system is guaranteed to have the same behavior, a property rarely found in concurrent programming environments.	(pdf) (ps)
MacShim: Compiling MATLAB to a Scheduling-Independent Concurrent Language	Neesha Subramaniam, Ohan Oda, Stephen A. Edwards	2006-09-22	Nondeterminism is a central challenge in most concurrent models of computation. That programmers must worry about races and other timing-dependent behavior is a key reason that parallel programming has not been widely adopted. The SHIM concurrent language, intended for hardware/software codesign applications, avoids this problem by providing deterministic (race-free) concurrency, but does not support automatic parallelization of sequential algorithms. In this paper, we present a compiler able to parallelize a simple MATLAB-like language into concurrent SHIM processes. From a user-provided partitioning of arrays to processes, our compiler divides the program into coarse-grained processes and schedules and synthesizes inter-process communication. We demonstrate the effectiveness of our approach on some image-processing algorithms.	(pdf) (ps)
SHIM: A Deterministic Approach to Programming with Threads	Olivier Tardieu, Stephen A. Edwards	2006-09-21	Concurrent programming languages should be a good fit for embedded systems because they match the intrinsic parallelism of their architectures and environments. Unfortunately, most concurrent programming formalisms are prone to races and nondeterminism, despite the presence of mechanisms such as monitors. In this paper, we propose SHIM, the core of a concurrent language with disciplined shared variables that remains deterministic, meaning the behavior of a program is independent of the scheduling of concurrent operations. SHIM does not sacrifice power or flexibility to achieve this determinism. It supports both synchronous and asynchronous paradigms---loosely and tightly synchronized threads---the dynamic creation of threads and shared variables, recursive procedures, and exceptions. We illustrate our programming model with examples including breadth-first-search algorithms and pipelines. By construction, they are race-free. We provide the formal semantics of SHIM and a preliminary implementation.	(pdf) (ps)
Debugging Woven Code	Marc Eaddy, Alfred Aho, Weiping Hu, Paddy McDonald, Julian Burger	2006-09-20	The ability to debug woven programs is critical to the adoption of Aspect Oriented Programming (AOP). Nevertheless, many AOP systems lack adequate support for debugging, making it difficult to diagnose faults and understand the program's structure and control flow. We discuss why debugging aspect behavior is hard and how harvesting results from related research on debugging optimized code can make the problem more tractable. We also specify general debugging criteria that we feel all AOP systems should support. We present a novel solution to the problem of debugging aspect-enabled programs. Our Wicca system is the first dynamic AOP system to support full source-level debugging of woven code. It introduces a new weaving strategy that combines source weaving with online byte-code patching. Changes to the aspect rules, or base or aspect source code are rewoven and recompiled on-the-fly. We present the results of an experiment that show how these features provide the programmer with a powerful interactive debugging experience with relatively little overhead.	(pdf)
A Framework for Quality Assurance of Machine Learning Applications	Christian Murphy, Gail Kaiser, Marta Arias	2006-09-15	Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test and debug such ML software, because there is no reliable test oracle. We describe a framework and collection of tools aimed to assist with this problem. We present our findings from using the testing framework with three implementations of an ML ranking algorithm (all of which had bugs).	(pdf)
Throughput and Fairness in Random Access Networks	Hoon Chang, Vishal Misra, Dan Rubenstein	2006-08-24	This paper present an throughput analysis of log-utility and max-min fairness. Assuming all nodes interfere with each other, completely or partially, log-utility fairness significantly enhances the total throughput compared to max-min fairness since the nodes should have the same throughput in max-min fairness. The improvement is enlarged especially when the effect of cumulated interference from multiple senders cannot be ignored.	(pdf)
Linear Approximation of Optimal Attempt Rate in Random Access Networks	HOON CHANG, VISHAL MISRA, DAN RUBENSTEIN	2006-08-01	While packet capture has been observed in real implementations of wireless devices randomly accessing shared channels, fair rate control algorithms based on accurate channel models that describe the phenomenon have not been developed. In this paper, using a general physical channel model, we develop the equation for the optimal attemp rate to maximize the aggregate log utility. We use the least squares method to approximate the equation to a linear function of the attempt rate. Our analysis on the approximation error shows that the linear function obtained is close enough to the original with the square of the residuals more than 0.9.	(pdf)
Complexity and tractability of the heat equation	Arthur G. Werschulz	2006-07-27	In a previous paper, we developed a general framework for establishing tractability and strong tractability for quasilinear multivariate problems in the worst case setting. One important example of such a problem is the solution of the heat equation $u_t = \Delta u - qu$ in $I^d\times(0,T)$, where $I$ is the unit interval and $T$ is a maximum time value. This problem is to be solved subject to homogeneous Dirichlet boundary conditions, along with the initial conditions $u(\cdot,0)=f$ over~$I^d$. The solution~$u$ depends linearly on~$f$, but nonlinearly on~$q$. Here, both $f$ and~$q$ are $d$-variate functions from a reproducing kernel Hilbert space with finite-order weights of order~$\omega$. This means that, although~$d$ can be arbitrary large, $f$ and~$q$ can be decomposed as sums of functions of at most $\omega$~variables, with $\omega$ independent of~$d$. In this paper, we apply our previous general results to the heat equation. We study both the absolute and normalized error criteria. For either error criterion, we show that the problem is \emph. That is, the number of evaluations of $f$ and~$q$ needed to obtain an $\varepsilon$-approximation is polynomial in~$\varepsilon$ and~$d$, with the degree of the polynomial depending linearly on~$\omega$. In addition, we want to know when the problem is \emph{strongly tractable}, meaning that the dependence is polynomial only in~$\varepsilon$, independently of~$d$. We show that if the sum of the weights defining the weighted reproducing kernel Hilbert space is uniformly bounded in~$d$ and the integral of the univariate kernel is positive, then the heat equation is strongly tractable.	(pdf)
Projection Volumetric Display using Passive Optical Scatterers	Shree K. Nayar, Vijay N. Anand	2006-07-25	In this paper, we present a new class of volumetric displays that can be used to display 3D objects. The basic approach is to trade-off the spatial resolution of a digital projector (or any light engine) to gain resolution in the third dimension. Rather than projecting an image onto a 2D screen, a depth-coded image is projected onto a 3D cloud of passive optical scatterers. The 3D point cloud is realized using a technique called Laser Induced Damage (LID), where each scatterer is a physical crack embedded in a block of glass or plastic. We show that when the point cloud is randomized in a specific manner, a very large fraction of the points are visible to the viewer irrespective of his/her viewing direction. We have developed an orthographic projection system that serves as the light engine for our volumetric displays. We have implemented several types of point clouds, each one designed to display a specific class of objects. These include a cloud with uniquely indexable points for the display of true 3D objects, a cloud with an independently indexable top layer and a dense extrusion volume to display extruded objects with arbitrarily textured top planes and a dense cloud for the display of purely extruded objects. In addition, we show how our approach can be used to extend simple video games to 3D. Finally, we have developed a 3D avatar in which videos of a face with expression changes are projected onto a static surface point cloud of the face.	(pdf)
Practical Preference Relations for Large Data Sets	Kenneth Ross, Peter Stuckey, Amelie Marian	2006-06-16	User-defined preferences allow personalized ranking of query results. A user provides a declarative specification of his/her preferences, and the system is expected to use that specification to give more prominence to preferred answers. We study constraint formalisms for expressing user preferences as base facts in a partial order. We consider a language that allows comparison and a limited form of arithmetic, and show that the transitive closure computation required to complete the partial order terminates. We consider various ways of composing partial orders from smaller pieces, and provide results on the size of the resulting transitive closures. We introduce the notion of ``covering composition,'' which solves some semantic problems apparent in previous notions of composition. Finally, we show how preference queries within our language can be supported by suitable index structures for efficient evaluation over large data sets. Our results provide guidance about when complex preferences can be efficiently evaluated, and when they cannot.	(pdf)
Feasibility of Voice over IP on the Internet	Alex Sherman, Jason Nieh, Yoav Freund	2006-06-09	VoIP (Voice over IP) services are using the Internet infrastructure to enable new forms of communication and collaboration. A growing number of VoIP service providers such as Skype, Vonage, Broadvoice, as well as many cable services are using the Internet to offer telephone services at much lower costs. However, VoIP services rely on the user's Internet connection, and this can often translate into lower quality communication. Overlay networks offer a potential solution to this problem by improving the default Internet routing and overcome failures. To assess the feasibility of using overlays to improve VoIP on the Internet, we have conducted a detailed experimental study to evaluate the benefits of using an overlay on PlanetLab nodes for improving voice communication connectivity and performance around the world. Our measurements demonstrate that an overlay architecture can significantly improve VoIP communication across most regions and provide their greatest benefit for locations with poorer default Internet connectivity. We explore overlay topologies and show that a small number of well-connected intermediate nodes is sufficient to improve VoIP performance. We show that there is significant variation over time in the best overlay routing paths and argue for the need for adaptive routing to account for this variation to deliver the best performance.	(pdf)
Exploiting Temporal Coherence for Pre-computation Based Rendering	Ryan Overbeck	2006-05-23	Precomputed radiance transfer (PRT) generates impressive images with complex illumi- nation, materials and shadows with real-time interactivity. These methods separate the scene’s static and dynamic components allowing the static portion to be computed as a preprocess. In this work, we hold geometry static and allow either the lighting or BRDF to be dynamic. To achieve real-time performance, both static and dynamic components are compressed by exploiting spatial and angular coherence. Temporal coherence of the dynamic component from frame to frame is an important, but unexplored additional form of coherence. In this thesis, we explore temporal coherence of two forms of all-frequency PRT: BRDF material editing and lighting design. We develop incremental methods for approximating the differences in the dynamic component between consecutive frames. For BRDF editing, we find that a pure incremental approach allows quick convergence to an exact solution with smooth real-time response. For relighting, we observe vastly differing degrees of temporal coherence accross levels of the lighting’s wavelet hierarchy. To address this, we develop an algorithm that treats each level separately, adapting to available coherence. The proposed methods are othogonal to other forms of coherence, and can be added to almost any PRT algorithm with minimal implementation, computation, or memory overhead. We demonstrate our technique within existing codes for nonlinear wavelet approximation, changing view with BRDF factorization, and clustered PCA. Exploiting temporal coherence of dynamic lighting yields a 3×–4× per- formance improvement, e.g., all-frequency effects are achieved with 30 wavelet coefficients, about the same as low-frequency spherical harmonic methods. Distinctly, our algorithm smoothly converges to the exact result within a few frames of the lighting becoming static.	(pdf)
Speculative Execution as an Operating System Service	Michael Locasto, Angelos Keromytis	2006-05-12	Software faults and vulnerabilities continue to present significant obstacles to achieving reliable and secure software. In an effort to overcome these obstacles, systems often incorporate self-monitoring and self-healing functionality. Our hypothesis is that internal monitoring is not an effective long-term strategy. However, monitoring mechanisms that are completely external lose the advantage of application-specific knowledge available to an inline monitor. To balance these tradeoffs, we present the design of VxF, an environment where both supervision and automatic remediation can take place by speculatively executing "slices" of an application. VxF introduces the concept of an endolithic kernel by providing execution as an operating system service: execution of a process slice takes place inside a kernel thread rather than directly on the system microprocessor.	(pdf)
Privacy-Preserving Payload-Based Correlation for Accurate Malicious Traffic Detection	Janak Parekh, Ke Wang, Salvatore Stolfo	2006-05-09	With the increased use of botnets and other techniques to obfuscate attackers' command-and-control centers, Distributed Intrusion Detection Systems (DIDS) that focus on attack source IP addresses or other header information can only portray a limited view of distributed scans and attacks. Packet payload sharing techniques hold far more promise, as they can convey exploit vectors and/or malcode used upon successful exploit of a target system, irrespective of obfuscated source addresses. However, payload sharing has had minimal success due to regulatory or business-based privacy concerns of transmitting raw or even sanitized payloads. The currently accepted form of content exchange has been limited to the exchange of known-suspicious content, e.g., packets captured by honeypots; however, signature generation assumes that each site receives enough traffic in order to correlate a meaningful set of payloads from which common content can be derived, and places fundamental and computationally stressful requirements on signature generators that may miss particularly stealthy or carefully-crafted polymorphic malcode. Instead, we propose a new approach to enable the sharing of suspicious payloads via privacy-preserving technologies. We detail the work we have done with two example payload anomaly detectors, PAYL and Anagram, to support generalized payload correlation and signature generation without releasing identifiable payload data and without relying on single-site signature generation. We present preliminary results of our approaches and suggest how such deployments may practically be used for not only cross-site, but also cross-domain alert sharing and its implications for profiling threats.	(pdf)
PBS: A Unified Priority-Based CPU Scheduler	Hanhua Feng, Vishal Misra, Dan Rubenstein	2006-05-01	A novel CPU scheduling policy is designed and implemented. It is a configurable policy in the sense that a tunable parameter is provided to change its behavior. With different settings of the parameter, this policy can emulate the first-come first-serve, the processing sharing, or the feedback policies, as well as different levels of their mixtures. This policy is implemented in the Linux kernel as a replacement of the default scheduler. The drastic changes of behaviors as the parameter changes are analyzed and simulated. Its performance is measured with the real systems by the workload generators and benchmarks.	(pdf) (ps)
A First Order Analysis of Lighting, Shading, and Shadows	Ravi Ramamoorthi, Dhruv Mahajan, Peter Belhumeur	2006-04-30	The shading in a scene depends on a combination of many factors---how the lighting varies spatially across a surface, how it varies along different directions, the geometric curvature and reflectance properties of objects, and the locations of soft shadows. In this paper, we conduct a complete first order or gradient analysis of lighting, shading and shadows, showing how each factor separately contributes to scene appearance, and when it is important. Gradients are well suited for analyzing the intricate combination of appearance effects, since each gradient term corresponds directly to variation in a specific factor. First, we show how the spatial {\em and} directional gradients of the light field change, as light interacts with curved objects. This extends the recent frequency analysis of Durand et al.\ to gradients, and has many advantages for operations, like bump-mapping, that are difficult to analyze in the Fourier domain. Second, we consider the individual terms responsible for shading gradients, such as lighting variation, convolution with the surface BRDF, and the object's curvature. This analysis indicates the relative importance of various terms, and shows precisely how they combine in shading. As one practical application, our theoretical framework can be used to adaptively sample images in high-gradient regions for efficient rendering. Third, we understand the effects of soft shadows, computing accurate visibility gradients. We generalize previous work to arbitrary curved occluders, and develop a local framework that is easy to integrate with conventional ray-tracing methods. Our visibility gradients can be directly used in practical gradient interpolation methods for efficient rendering.	(pdf) (ps)
Quantifying Application Behavior Space for Detection and Self-Healing	Michael Locasto, Angelos Stavrou, Gabriela G. Cretu, Angelos D. Keromytis, Salvatore J. Stolfo	2006-04-08	The increasing sophistication of software attacks has created the need for increasingly finer-grained intrusion and anomaly detection systems, both at the network and the host level. We believe that the next generation of defense mechanisms will require a much more detailed dynamic analysis of application behavior than is currently done. We also note that the same type of behavior analysis is needed by the current embryonic attempts at self-healing systems. Because such mechanisms are currently perceived as too expensive in terms of their performance impact, questions relating to the feasibility and value of such analysis remain unexplored and unanswered. We present a new mechanism for profiling the behavior space of an application by analyzing all function calls made by the process, including regular functions and library calls, as well as system calls. We derive behavior from aspects of both control and data flow. We show how to build and check profiles that contain this information at the binary level -- that is, without making changes to the application's source, the operating system, or the compiler. This capability makes our system, Lugrind, applicable to a variety of software, including COTS applications. Profiles built for the applications we tested can predict behavior with 97% accuracy given a context window of 15 functions. Lugrind demonstrates the feasibility of combining binary-level behavior profiling with detection and automated repair.	(pdf)
Seamless Layer-2 Handoff using Two Radios in IEEE 802.11 Wireless Networks	Sangho Shin, Andrea G. Forte, Henning Schulzrinne	2006-04-08	We propose a layer-2 handoff using two radios and achieves seamless handoff. Also, We reduces the false handoff probability signicantly by introducing selective passive scanning.	(pdf)
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack	Ke Wang, Janak Parekh, Salvatore Stolfo	2006-04-07	In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and ^Ósuspicious^Ô network packet payloads. By using higher- order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly efficient Bloom filters, reducing space requirements and enabling privacy-preserving cross-site correlation. The sensor models the distinct content flow of a network or host using a semi- supervised training regimen. Previously known exploits, extracted from the signatures of an IDS, are likewise modeled in a Bloom filter and are used during training as well as detection time. We demon- strate that Anagram can identify anomalous traffic with high accuracy and low false positive rates. Anagram^Òs high-order n-gram analysis technique is also resil-ient against simple mimicry attacks that blend exploits with ^Ónormal^Ô appearing byte padding, such as the blended polymorphic attack recently demonstrated in [1]. We discuss randomized n-gram models, which further raises the bar and makes it more difficult for attackers to build precise packet structures to evade Anagram even if they know the distribution of the local site content flow. Finally, Ana-gram^Òs speed and high detection rate makes it valuable not only as a standalone sensor, but also as a network anomaly flow classifier in an instrumented fault-tolerant host-based environment; this enables significant cost amortization and the possibility of a ^Ósymbiotic^Ô feedback loop that can improve accuracy and reduce false positive rates over time.	(pdf)
Bloodhound: Searching Out Malicious Input in Network Flows for Automatic Repair Validation	Michael Locasto, Matthew Burnside, Angelos D. Keromytis	2006-04-05	Many current systems security research efforts focus on mechanisms for Intrusion Prevention and Self-Healing Software. Unfortunately, such systems find it difficult to gain traction in many deployment scenarios. For self-healing techniques to be realistically employed, system owners and administrators must have enough confidence in the quality of a generated fix that they are willing to allow its automatic deployment. In order to increase the level of confidence in these systems, the efficacy of a 'fix' must be tested and validated after it has been automatically developed, but before it is actually deployed. Due to the nature of attacks, such verification must proceed automatically. We call this problem Automatic Repair Validation (ARV). As a way to illustrate the difficulties faced by ARV, we propose the design of a system, Bloodhound, that tracks and stores malicious network flows for later replay in the validation phase for self-healing software.	(pdf)
Streak Seeding Automation Using Silicon Tools	Atanas Georgiev, Sergey Vorobiev, William Edstrom, Ting Song, Andrew Laine, John Hunt	2006-03-31	This report presents an approach to automation of a protein crystallography task called streak seeding. The approach is based on novel and unique custom-designed silicon microtools, which we experimentally verified to produce results similar to the results from traditionally used boar bristles. The advantage to using silicon is that it allows the employment of state-of-the-art micro-electro-mechanical-systems (MEMS) technology to produce microtools of various shapes and sizes and thatit is rigid and can be easily adopted as an accurately calibrated end-effector on a microrobotic system. A working prototype of an automatic streak seeding system is presented, which has been successfully applied for protein crystallization.	(pdf)
PalProtect: A Collaborative Security Approach to Comment Spam	Benny Wong, Michael Locasto, Angelos D. Keromytis	2006-03-22	Collaborative security is a promising solution to many types of security problems. Organizations and individuals often have a limited amount of resources to detect and respond to the threat of automated attacks. Enabling them to take advantage of the resources of their peers by sharing information related to such threats is a major step towards automating defense systems. In particular, comment spam posted on blogs as a way for attackers to do Search Engine Optimization (SEO) is a major annoyance. Many measures have been proposed to thwart such spam, but all such measures are currently enacted and operate within one administrative domain. We propose and implement a system for cross-domain information sharing to improve the quality and speed of defense against such spam.	(pdf)
Using Angle of Arrival (Bearing) Information in Network Localization	Tolga Eren, Walter Whiteley, Peter N. Belhumeur	2006-03-18	In this paper, we consider using angle of arrival information (bearing) for network localization and control in two different fields of multi-agent systems: (i) wireless sensor networks; (ii) robot networks. The essential property we require in this paper is that a node can infer heading information from its neighbors. We address the uniqueness of network localization solutions by the theory of globally rigid graphs. We show that while the parallel rigidity problem for formations with bearings is isomorphic to the distance case, the global rigidity of the formation is simpler (in fact identical to the simpler rigidity case) for a network with bearings, compared to formations with distances. We provide the conditions of localization for networks in which the neighbor relationship is not necessarily symmetric.	(pdf) (ps)
A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency	Dhruv Mahajan, Ravi Ramamoorthi, Brian Curless	2006-03-17	We develop new mathematical results based on the spherical harmonic convolution framework for reflection from a curved surface. We derive novel identities, which are the angular frequency domain analogs to common spatial domain invariants such as reflectance ratios. They apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing.	(pdf)
Passive Duplicate Address Detection for Dynamic Host Configuration Protocol (DHCP)	Andrea G. Forte, Sangho Shin, Henning Schulzrinne	2006-03-07	During a layer-3 handoff, address acquisition via DHCP is often the dominant source of handoff delay, duplicate address detection (DAD) being responsible for most of the delay. We propose a new DAD algorithm, passive DAD (pDAD), which we show to be effective, yet introduce only a few milliseconds of delay. Unlike traditional DAD, pDAD also detects the unauthorized use of an IP address before it is assigned to a DHCP client.	(pdf)
Evaluating an Evaluation Method: The Pyramid Method Applied to 2003 Document Understanding Conference (DUC) Data	Rebecca Passonneau	2006-03-03	A pyramid evaluation dataset was created for DUC 2003 in order to compare results with DUC 2005, and to provide an independent test of the evaluation metric. The main differences between DUC 2003 and 2005 datasets pertain to the document length, cluster sizes, and model summary length. For five of the DUC 2003 document sets, two pyramids each were constructed by annotators working independently. Scores of the same peer using different pyramids were highly correlated. Sixteen systems were evaluated on eight document sets. Analysis of variance using Tukey's Honest Significant Difference method showed significant differences among all eight document sets, and more significant differences among the sixteen systems than for DUC 2005.	(pdf)
Rigid Formations with Leader-Follower Architecture	Tolga Eren, Walter Whiteley, Peter N. Belhumeur	2006-02-25	This paper is concerned with information structures used in rigid formations of autonomous agents that have leader-follower architecture. The focus of the paper is on sensor/network topologies to secure control of rigidity. This papers extends the previous rigidity based approaches for formations with symmetric neighbor relations to include formations with leader-follower architecture. We provide necessary and sufficient conditions for rigidity of directed formations, with or without cycles. We present the directed Henneberg constructions as a sequential process for all guide rigid digraphs. We refine those results for acyclic formations, where guide rigid formations had a simple construction. The analysis in this paper confirms that acyclicity is not a necessary condition for stable rigidity. The cycles are not the real problem, but rather the lack of guide freedom is the reason behind why cycles have been seen as a problematic topology. Topologies that have cycles within a larger architecture can be stably rigid, and we conjecture that all guide rigid formations are stably rigid for internal control. We analyze how the external control of guide agents can be integrated into stable rigidity of a larger formation. The analysis in the paper also confirms the inconsistencies that result from noisy measurements in redundantly rigid formations. An algorithm given in the paper establishes a sequential way of determining the directions of links from a given undirected rigid formation so that the necessary and sufficient conditions are fulfilled.	(pdf) (ps)
Using an External DHT as a SIP Location Service	Kundan Singh, Henning Schulzrinne	2006-02-22	Peer-to-peer Internet telephony using the Session Initiation Protocol (P2P-SIP) can exhibit two different architectures: an existing P2P network can be used as a replacement for lookup and updates, or a P2P algorithm can be implemented using SIP messages. In this paper, we explore the first architecture using the OpenDHT service as an externally managed P2P network. We provide design details such as encryption and signing using pseudo-code and examples to provide P2P-SIP for various deployment components such as P2P client, proxy and adaptor, based on our implementation. The design can be used with other distributed hash tables (DHTs) also.	(pdf) (ps)
Synthesis of On-Chip Interconnection Structures:From Point-toPoint Links to Networks-on-Chip	Alessandro Pinto, Luca P. Carloni, Alberto L. Sangiovanni-Vincentelli	2006-02-20	Packet-switched networks-on-chip (NOC) have been advocated as the solution to the challenge of organizing efficient and reliable communication structures among the components of a system-on-chip (SOC). A critical issue in designing a NOC is to determine its topology given the set of point-to-point communication requirements among these components. We present a novel approach to on-chip communication synthesis that is based on the iterative combination of two efficient computational steps: (1) an application of the k-Median algorithm to coarsely determine the global communication structure (which may turned out not be a network after all), and a (2) a variation of the shortest-path algorithm in order to finely tune the data flows on the communication channels. The application of our method to case studies taken from the literature shows that we can automatically synthesize optimal NOC topologies for multi-core on-chip processors and it offers new insights on why NOC are not necessarily a value proposition for some classes of applcation-specific SOCs.	(pdf)
Theoretical Bounds on Control-Plane Self-Monitoring in Routing Protocols	Raj Kumar Rajendran, Vishal Misra, Dan Rubenstein	2006-02-15	Routing protocols rely on the cooperation of nodes in the network to both forward packets and to select the forwarding routes. There have been several instances in which an entire network's routing collapsed simply because a seemingly insignificant set of nodes reported erroneous routing information to their neighbors. It may have been possible for other nodes to trigger an automated response and prevent the problem by analyzing received routing information for inconsistencies that revealed the errors. Our theoretical study seeks to understand when nodes can detect the existence of errors in the implementation of route selection elsewhere in the network through monitoring their own routing states for inconsistencies. We start by constructing a methodology, called Strong-Detection, that helps answer the question. We then apply Strong-Detection to three classes of routing protocols: distance-vector, path-vector, and link-state. For each class, we derive low-complexity, self-monitoring algorithms that use the routing state created by these routing protocols to identify any detectable anomalies. These algorithms are then used to compare and contrast the self-monitoring power these various classes of protocols possess. We also study the trade-off between their state-information complexity and ability to identify routing anomalies.	(pdf) (ps)
A Survey of Security Issues and Solutions in Presence	Vishal Kumar Singh, Henning Schulzrinne	2006-02-10	With the growth of presence based services, it is important to securely manage and distribute sensitive presence information such as user location. We survey techniques that are used for security and privacy of presence information. In particular, we describe the SIMPLE based presence specific authentication, integrity and confidentiality. We also discuss the IETF’s common policy for geo-privacy, presence authorization for presence information privacy and distribution of different levels of presence information to different watchers. Additionally, we describe an open problem of getting the aggregated presence from the trusted server without the server knowing the presence information, and propose a solution. Finally, we discuss denial of service attacks on the presence system and ways to mitigate them.	(pdf)
SIMPLEstone - Benchmarking Presence Server Performance	Vishal Kumar Singh, Henning Schulzrinne	2006-02-10	Presence is an important enabler for communication in Internet telephony systems. Presence-based services depend on accurate and timely delivery of presence information. Hence, presence systems need to be appropriately dimensioned to meet the growing number of users, varying number of devices as presence sources, the rate at which they update presence information to the network and the rate at which network distributes the user’s presence information to the watchers. SIMPLEstone is a set of metrics for benchmarking the performance of presence systems based on SIMPLE. SIMPLEstone benchmarks a presence server by generating requests based on a work load specification. It measures server capacity in terms of request handling capacity as an aggregate of all types of requests as well as individual request types. The benchmark treats different configuration modes in which presence server interoperates with the Session Initiation protocol (SIP) server as one block.	(pdf)
Grouped Distributed Queues: Distributed Queue, Proportional Share Multiprocessor Scheduling	Bogdan Caprita, Jason Nieh, Clifford Stein	2006-02-07	We present Grouped Distributed Queues (GDQ), the first proportional share scheduler for multiprocessor systems that, by using a distributed queue architecture, scales well with a large number of processors and processes. GDQ achieves accurate proportional fairness scheduling with only O(1) scheduling overhead. GDQ takes a novel approach to distributed queuing: instead of creating per-processor queues that need to be constantly balanced to achieve any measure of proportional sharing fairness, GDQ uses a simple grouping strategy to organize processes into groups based on similar processor time allocation rights, and then assigns processors to groups based on aggregate group shares. Group membership of processes is static, and fairness is achieved by dynamically migrating processors among groups. The set of processors working on a group use simple, low-overhead round-robin queues, while processor reallocation among groups is achieved using a new multiprocessor adaptation of the well-known Weighted Fair Queuing algorithm. By commoditizing processors and decoupling their allocation from process scheduling, GDQ provides, with only constant scheduling cost, fairness within a constant of the ideal generalized processor sharing model for process weights with a fixed upper bound. We have implemented GDQ in Linux and measured its performance. Our experimental results show that GDQ has low overhead and scales well with the number of processors.	(pdf)
W3Bcrypt: Encryption as a Stylesheet	Angelos Stavrou, Michael Locasto, Angelos D. Keromytis	2006-02-06	While web communications are increasingly protected by transport layer cryptographic operations (SSL/TLS), there are many situations where even the communications infrastructure provider cannot be trusted. The end-to-end (E2E) encryption of data becomes increasingly important in these trust models to protect the confidentiality and integrity of the data against snooping and modification by the communications provider. We introduce W3Bcrypt, an extension to the Mozilla Firefox web platform that enables application-level cryptographic protection for web content. In effect, we view cryptographic operations as a type of style to be applied to web content along with layout and coloring operations. Among the main benefits of using encryption as a stylesheet are $(a)$ reduced workload on a web server, $(b)$ targeted content publication, and $(c)$ greatly increased privacy. This paper discusses our implementation for Firefox, but the core ideas are applicable to most current browsers.	(pdf)
A Runtime Adaptation Framework for Native C and Bytecode Applications	Rean Griffith, Gail Kaiser	2006-01-27	The need for self-healing software to respond with a reactive, proactive or preventative action as a result of changes in its environment has added the non-functional requirement of adaptation to the list of facilities expected in self-managing systems. The adaptations we are concerned with assist with problem detection, diagnosis and remediation. Many existing computing systems do not include such adaptation mechanisms, as a result these systems either need to be re-designed to include them or there needs to be a mechanism for retro-fitting these mechanisms. The purpose of the adaptation mechanisms is to ease the job of the system administrator with respect to managing software systems. This paper introduces Kheiron, a framework for facilitating adaptations in running programs in a variety of execution environments without requiring the redesign of the application. Kheiron manipulates compiled C programs running in an unmanaged execution environment as well as programs running in Microsoft’s Common Language Runtime and SunMicrosystems’ Java VirtualMachine. We present case-studies and experiments that demonstrate the feasibility of using Kheiron to support self-healing systems. We also describe the concepts and techniques used to retro-fit adaptations onto existing systems in the various execution environments.	(pdf)
Binary-level Function Profiling for Intrusion Detection and Smart Error Virtualization	Michael Locasto, Angelos Keromytis	2006-01-26	Most current approaches to self-healing software (SHS) suffer from semantic incorrectness of the response mechanism. To support SHS, we propose Smart Error Virtualization (SEV), which treats functions as transactions but provides a way to guide the program state and remediation to be a more correct value than previous work. We perform runtime binary-level profiling on unmodified applications to learn both good return values and error return values (produced when the program encounters ``bad'' input). The goal is to ``learn from mistakes'' by converting malicious input to the program's notion of ``bad'' input. We introduce two implementations of this system that support three major uses: function profiling for regression testing, function profiling for host-based anomaly detection (envinroment-specialized fault detection), and function profiling for automatic attack remediation via SEV. Our systems do not require access to the source code of the application to enact a fix. Finally, this paper is, in part, a critical examination of error virtualization in order to shed light on how to approach semantic correctness.	(pdf) (ps)
Converting from Spherical to Parabolic Coordinates	Aner Ben-Artzi	2006-01-20	A reference for converting directly between Spherical Coordinates and Parabolic Coordinates without using the intermediate Cartesian Coordinates.	(pdf)
Multi Facet Learning in Hilbert Spaces	Imre Risi Kondor, Gabor Csanyi, Sebastian E. Ahnert, Tony Jebara	2005-12-31	We extend the kernel based learning framework to learning from linear functionals, such as partial derivatives. The learning problem is formulated as a generalized regularized risk minimization problem, possibly involving several different functionals. We show how to reduce this to conventional kernel based learning methods and explore a specific application in Computational Condensed Matter Physics.	(pdf) (ps)
A Lower Bound for the Sturm-Liouville Eigenvalue Problem on a Quantum Computer	Arvid J. Bessen	2005-12-14	We study the complexity of approximating the smallest eigenvalue of a univariate Sturm-Liouville problem on a quantum computer. This general problem includes the special case of solving a one-dimensional Schroedinger equation with a given potential for the ground state energy. The Sturm-Liouville problem depends on a function q, which, in the case of the Schroedinger equation, can be identified with the potential function V. Recently Papageorgiou and Wozniakowski proved that quantum computers achieve an exponential reduction in the number of queries over the number needed in the classical worst-case and randomized settings for smooth functions q. Their method uses the (discretized) unitary propagator and arbitrary powers of it as a query ("power queries"). They showed that the Sturm-Liouville equation can be solved with O(log(1/e)) power queries, while the number of queries in the worst-case and randomized settings on a classical computer is polynomial in 1/e. This proves that a quantum computer with power queries achieves an exponential reduction in the number of queries compared to a classical computer. In this paper we show that the number of queries in Papageorgiou's and Wozniakowski's algorithm is asymptotically optimal. In particular we prove a matching lower bound of log(1/e) power queries, therefore showing that log(1/e) power queries are sufficient and necessary. Our proof is based on a frequency analysis technique, which examines the probability distribution of the final state of a quantum algorithm and the dependence of its Fourier transform on the input.	(pdf) (ps)
Dynamic Adaptation of Temporal Event Correlation Rules	Rean Griffith, Gail Kaiser, Joseph Hellerstein, Yixin Diao	2005-12-10	Temporal event correlation is essential to realizing self-managing distributed systems. Autonomic controllers often require that events be correlated across multiple components using rule patterns with timer-based transitions, e.g., to detect denial of service attacks and to warn of staging problems with business critical applications. This short paper discusses automatic adjustment of timer values for event correlation rules, in particular compensating for the variability of event propagation delays due to factors such as contention for network and server resources. We describe a corresponding Management Station architecture and present experimental studies on a testbed system that suggest that this approach can produce results at least as good as an optimal fixed setting of timer values.	(pdf)
Qubit Complexity of Continuous Problems	Anargyros Papageorgiou, Joseph Traub	2005-12-09	The number of qubits used by a quantum algorithm will be a crucial computational resource for the foreseeable future. We show how to obtain the classical query complexity for continuous problems. We then establish a simple formula for a lower bound on the qubit complexity in terms of the classical query complexity.	(pdf)
An Event System Architecture for Scaling Scale-Resistant Services	Philip Gross	2005-12-09	Large organizations are deploying ever-increasing numbers of networked compute devices, from utilities installing smart controllers on electricity distribution cables, to the military giving PDAs to soldiers, to corporations putting PCs on the desks of employees. These computers are often far more capable than is needed to accomplish their primary task, whether it be guarding a circuit breaker, displaying a map, or running a word processor. These devices would be far more useful if they had some awareness of the world around them: a controller that resists tripping a switch, knowing that it would set off a cascade failure, a PDA that warns its owner of imminent danger, a PC that exchanges reports of suspicious network activity to its peers to identify stealthy computer crackers. In order to provide these higher-level services, the devices need a model of their environment. The controller needs a model of the distribution grid, the PDA needs a model of the battlespace, and the PC needs a model of the network and of normal network and user behavior. Unfortunately, not only might models such as these require substantial computational resources, but generating and updating them is even more demanding. Modelbuilding algorithms tend to be bad in three ways: requiring large amounts of CPU and memory to run, needing large amounts of data from the outside to stay up to date, and running so slowly that can’t keep up with any fast changes in the environment that might occur. We can solve these problems by reducing the scope of the model to the immediate locale of the device, since reducing the size of the model makes the problem of model generation much more tractable. But such models are also much less useful, having no knowledge of the wider system. This thesis proposes a better solution to this problem called Level of Detail, after the computer graphics technique of the same name. Instead of simplifying the representation of distant objects, however, we simplify less-important data. Compute devices in the system receive streams of data that is a mixture of detailed data from devices that directly affect them and data summaries (aggregated data) from less directly influential devices. The degree to which the data is aggregated (i.e., how much it is reduced) is determined by calculating an influence metric between the target device and the remote device. The smart controller thus receives a continuous stream of raw data from the adjacent transformer, but only an occasional small status report summarizing all the equipment in a neighborhood in another part of the city. This thesis describes the data distribution system, the aggregation functions, and the influence metrics that can be used to implement such a system. I also describe my current towards establishing a test environment and validating the concepts, and describe the next steps in the research plan.	(pdf)
Tree Dependent Identically Distributed Learning	Tony Jebara, Philip M. Long	2005-12-06	We view a dataset of points or samples as having an underlying, yet unspecified, tree structure and exploit this assumption in learning problems. Such a tree structure assumption is equivalent to treating a dataset as being tree dependent identically distributed or tdid and preserves exchange-ability. This extends traditional iid assumptions on data since each datum can be sampled sequentially after being conditioned on a parent. Instead of hypothesizing a single best tree structure, we infer a richer Bayesian posterior distribution over tree structures from a given dataset. We compute this posterior over (directed or undirected) trees via the Laplacian of conditional distributions between pairs of input data points. This posterior distribution is efficiently normalized by the Laplacian's determinant and also facilitates novel maximum likelihood estimators, efficient expectations and other useful inference computations. In a classification setting, tdid assumptions yield a criterion that maximizes the determinant of a matrix of conditional distributions between pairs of input and output points. This leads to a novel classification algorithm we call the Maximum Determinant Machine. Unsupervised and supervised experiments are shown.	(pdf) (ps)
Micro-speculation, Micro-sandboxing, and Self-Correcting Assertions: Support for Self-Healing Software and Application Communities	Michael Locasto	2005-12-05	Software faults and vulnerabilities continue to present significant obstacles to achieving reliable and secure software. The critical problem is that systems currently lack the capability to respond intelligently and automatically to attacks -- especially attacks that exploit previously unknown vulnerabilities or are delivered by previously unseen inputs. Therefore, the goal of this thesis is to provide an environment where both supervision and automatic remediation can take place. Also provided is a mechanism to guide the supervision environment in detection and repair activities. This thesis supports the notion of Self-Healing Software by introducing three novel techniques: \emph{micro-sandboxing}, \emph{micro-speculation}, and \emph{self-correcting assertions}. These techniques are combined in a kernel-level emulation framework to speculatively execute code that may contain faults or vulnerabilities and automatically repair such faults or exploited vulnerabilities. The framework, VPUF, introduces the concept of computation as an operating system service by providing control for an array of virtual processors in the Linux kernel (creating the concept of an \emph{endolithic} kernel). This thesis introduces ROAR (Recognize, Orient, Adapt, Respond) as a conceptual workflow for Self-healing Software systems. This thesis proposal outlines a 17 month program for developing the major components of the proposed system, implementing them on a COTS operating system and programming language, subjecting them to a battery of evaluations for performance and efficacy, and publishing the results. In addition, this proposal looks forward to several areas of follow-on work, including implementing some of the proposed techniques in hardware and leveraging the general kernel-level framework to support Application Communities.	(pdf) (ps)
A Control Theory Foundation for Self-Managing Computing Systems	Yixin Diao, Joseph Hellerstein, Sujay Parekh, Rean Griffith, Gail Kaiser, Dan Phung	2005-12-05	The high cost of operating large computing installations has motivated a broad interest in reducing the need for human intervention by making systems self-managing. This paper explores the extent to which control theory can provide an architectural and analytic foundation for building self-managing systems. Control theory provides a rich set of methodologies for building automated self-diagnosis and self-repairing systems with properties such as stability, short settling times, and accurate regulation. However, there are challenges in applying control theory to computing systems, such as developing effective resource models, handling sensor delays, and addressing lead times in effector actions. We propose a deployable testbed for autonomic computing (DTAC) that we believe will reduce the barriers to addressing research problems in applying control theory to computing systems. The initial DTAC architecture is described along with several problems that it can be used to investigate.	(pdf)
A New Routing Metric for High Throughput in Dense Ad Hoc Networks	Hoon Chang, Vishal Misra, Dan Rubenstein	2005-12-01	Routing protocols in most ad hoc networks use the length of paths as the routing metric. Recent findings have revealed that the minimum-hop metric can not achieve the maximum throughput because it tries to reduce the number of hops by containing long range links, where packets need to be transmitted at the lowest transmission rate. In this paper, we investigate the tradeoff between transmission rates and throughputs and show that in dense networks with uniform-distributed traffic, there exists the optimal rate that may not be the lowest rate. Based on our observation, we propose a new routing metric, which measures the expected capability of a path assuming the per-node fairness. We develop a routing protocol based on DSDV and demonstrate that the routing metric enhances the system throughput by 20% compared to the original DSDV.	(pdf)
Effecting Runtime Reconfiguration in Managed Execution Environments	Rean Griffith, Giuseppe Valetto, Gail Kaiser	2005-11-21	Managed execution environments such as Microsoft’s Common Language Runtime (CLR) and Sun Microsystems’ Java Virtual Machine (JVM) provide a number of services – including but not limited to application isolation, security sandboxing, garbage collection and structured exception handling – that are aimed primarily at enhancing the robustness of managed applications. However, none of these services directly enables performing reconfigurations, repairs or diagnostics on the managed applications and/or its constituent subsystems and components. In this paper we examine how the facilities of a managed execution environment can be leveraged to support runtime system adaptations, such as reconfigurations and repairs. We describe an adaptation framework we have developed, which uses these facilities to dynamically attach/detach an engine capable of performing reconfigurations and repairs on a target system while it executes. Our adaptation framework is lightweight, and transparent to the application and the managed execution environment: it does not require recompilation of the application nor specially compiled versions of the managed execution runtime. Our prototype was implemented for the CLR. To evaluate our framework beyond toy examples, we searched on SourceForge for potential target systems already implemented on the CLR that might benefit from runtime adaptation. We report on our experience using our prototype to effect runtime reconfigurations in a system that was developed and is in use by others: the Alchemi enterprise Grid Computing System developed at the University of Melbourne, Australia.	(pdf) (ps)
Adaptive Synchronization of Semantically Compressed Instructional Videos for Collaborative Distance Learning	Dan Phung, Giuseppe Valetto, Gail Kaiser, Tiecheng Liu, John Kender	2005-11-21	The increasing popularity of online courses has highlighted the need for collaborative learning tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources available to students. We present an e-Learning architecture and adaptation model called AI2TV (Adaptive Interactive Internet Team Video), which allows groups of students to collaboratively view a video in synchrony. AI2TV upholds the invariant that each student will view semantically equivalent content at all times. A semantic compression model is developed to provide instructional videos at different level-of-details to accommodate dynamic network conditions and users’ system requirements. We take advantage of the semantic compression algorithm’s ability to provide different layers of semantically equivalent video by adapting the client to play at the appropriate layer that provides the client with the richest possible viewing experience. Video player actions, like play, pause and stop, can be initiated by any group member and and the results of those actions are synchronized with all the other students. These features allow students to review a lecture video in tandem, facilitating the learning process. Experimental trials show that AI2TV successfully synchronizes instructional videos for distributed students while concurrently optimizing the video quality, even under conditions of fluctuating bandwidth, by adaptively adjusting the quality level for each student while still maintaining the invariant.	(pdf)
A Genre-based Clustering Approach to Content Extraction	Suhit Gupta, Hila Becker, Gail Kaiser, Salvatore Stolfo	2005-11-11	The content of a webpage is usually contained within a small body of text and images, or perhaps several articles on the same page; however, the content may be lost in the clutter (defined as cosmetic features such as animations, menus, sidebars, obtrusive banners). Automatic content extraction has many applications, including browsing on small cell phone and PDA screens, speech rendering for the visually impaired, and reducing noise for information retrieval systems. We have developed a framework, Crunch, which employs various heuristics for content extraction in the form of filters applied to the webpage's DOM tree; the filters aim to prune or transform the clutter, leaving only the content. Crunch allows users to tune what we call "settings", consisting of thresholds for applying a particular filter and/or for toggling a filter on/off, because the HTML components that characterize clutter can vary significantly from website to website. However, we have found that the same settings tend to work well across different websites of the same genre, e.g., news or shopping, since the designers often employ similar page layouts. In particular, Crunch could obtain the settings for a previously unknown website by automatically classifying it as sufficiently similar to a cluster of known websites with previously adjusted settings. We present our approach to clustering a large corpus of websites into genres, using their pre-extraction textual material augmented by the snippets generated by searching for the website's domain name in web search engines. Including these snippets increases the frequency of function words needed for clustering. We use existing Manhattan distance measure and hierarchical clustering techniques, with some modifications, to pre-classify the corpus into genres offline. Our method does not require prior knowledge of the set of genres that websites fit into, but to be useful a priori settings must be available for some member of each cluster or a nearby cluster (otherwise defaults are used). Crunch classifies newly encountered websites online in linear-time, and then applies the corresponding filter settings, with no noticeable delay added by our content-extracting web proxy.	(pdf)
Privacy-Preserving Distributed Event Correlation	Janak Parekh	2005-11-07	Event correlation is a widely-used data processing methodology for a broad variety of applications, and is especially useful in the context of distributed monitoring for software faults and vulnerabilities. However, most existing solutions have typically been focused on "intra-organizational" correlation; organizations typically employ privacy policies that prohibit the exchange of information outside of the organization. At the same time, the promise of "inter-organizational" correlation is significant given the broad availability of Internet-scale communications, and its potential role in both software maintenance and software vulnerability exploits. In this proposal, I present a framework for reconciling these opposing forces in event correlation via the use of privacy preservation integrated into the event processing framework. By integrating flexible privacy policies, we enable the correlation of organizations' data without actually releasing sensitive information. The framework supports both source anonymity and data privacy, yet allows for the time-based correlation of a broad variety of data. The framework is designed as a lightweight collection of components to enable integration with existing COTS platforms and distributed systems. I also present two different implementations of this framework: XUES (XML Universal Event Service), an event processor used as part of a software monitoring platform called KX (Kinesthetics eXtreme), and Worminator, a collaborative Intrusion Detection System. KX comprised a series of components, connected together with a publish-subscribe content-based routing event subsystem, for the autonomic software monitoring of complex distributed systems. Sensors were installed in legacy systems. XUES' two modules then performed event processing on sensor data: information was collected and processed by the Event Packager, and correlated using the Event Distiller. While XUES itself was not privacy-preserving, it laid the groundwork for this thesis by supporting event typing, the use of publish-subscribe and extensibility support via pluggable event transformation modules. Worminator, the second implementation, extends the XUES platform to fully support privacy-preserving event types and algorithms in the context of a Collaborative Intrusion Detection System (CIDS), whereby sensor alerts can be exchanged and corroborated--a reduced form of correlation that enables collaborative verification--without revealing sensitive information about a contributor's network, services, or even external sources as required. Worminator also fully anonymizes source information, allowing contributors to decide their preferred level of information disclosure. Worminator is implemented as a monitoring framework on top of a COTS IDS sensor, and demonstrably enables the detection of not only worms but also "broad and stealthy" scans; traditional single-network sensors either bury such scans in large volumes or miss them entirely. Worminator has been successfully deployed at 5 collaborating sites and work is under way to scale it up further. The contributions of this thesis include the development of a cross-application-domain event correlation framework with native privacy-preserving types, the use and validation of privacy-preserving corroboration, and the establishment of a practical deployed collaborative security system. I also outline the next steps in the thesis research plan, including the development of evaluation metrics to quantify Worminator's effectiveness at long-term scan detection, the overhead of privacy preservation and the effectiveness of our approach against adversaries, be they "honest-but-curious" or actively malicious. This thesis has broad future work implications, including privacy-preserving signature detection and distribution, distributed stealthy attacker profiling, and "application community"-based software vulnerability detection.	(pdf)
Tractability of quasilinear problems. II: Second-order elliptic problems	A. G. Werschulz, H. Wozniakowski	2005-11-01	In a previous paper, we developed a general framework for establishing tractability and strong tractability for quasilinear multivariate problems in the worst case setting. One important example of such a problem is the solution of the Poisson equation $-\Delta u + qu = f$ in the $d$-dimensional unit cube, in which $u$ depends linearly on~$f$, but nonlinearly on~$q$. Here, both $f$ and~$q$ are $d$-variate functions from a reproducing kernel Hilbert space with finite-order weights of order~$\omega$. This means that, although~$d$ can be arbitrary large, $f$ and~$q$ can be decomposed as sums of functions of at most $\omega$~variables, with $\omega$ independent of~$d$. In this paper, we apply our previous general results to the Poisson equation, subject to either Dirichlet or Neumann homogeneous boundary conditions. We study both the absolute and normalized error criteria. For all four possible combinations of boundary conditions and error criteria, we show that the problem is \emph{tractable}. That is, the number of evaluations of $f$ and~$q$ needed to obtain an $\e$-approximation is polynomial in~$\e^{-1}$ and~$d$, with the degree of the polynomial depending linearly on~$\omega$. In addition, we want to know when the problem is \emph{strongly tractable}, meaning that the dependence is polynomial only in~$\e^{-1}$, independently of~$d$. We show that if the sum of the weights defining the weighted reproducing kernel Hilbert space is uniformly bounded in~$d$ and the integral of the univariate kernel is positive, then the Poisson equation is strongly tractable for three of the four possible combinations of boundary conditions and error criterion, the only exception being the Dirichlet boundary condition under the normalized error criterion.	(pdf)
TCP-Friendly Rate Control with Token Bucket for VoIP Congestion Control	Miguel Maldonado, Salman Abdul Baset, Henning Schulzrinne	2005-10-17	TCP Friendly Rate Control (TFRC) is a congestion control algorithm that provides a smooth transmission rate for real-time network applications. TFRC refrains from halving the sending rate on every packet drop, instead it is adjusted as a function of the loss rate during a single round trip time. TFRC has been proven to be fair when competing with TCP flows over congested links, but it lacks quality-of-service parameters to improve the performance of real-time traffic. A problem with TFRC is that it uses additive increase to adjust the sending rate during periods with no congestion. This leads to short term congestion that can degrade the quality of voice applications. We propose two changes to TFRC that improve the performance of VoIP applications. Our implementation, TFRC with Token Bucket (TFRC-TB), uses discrete calculated bit rates based on audio codec bandwidth usage to increase the sending rate. Also, it uses a token bucket to control the sending rate during congestion periods. We have used ns2, the network simulator, to compare our implementation to TFRC in a wide range of network conditions. Our results suggest that TFRC-TB can provide a quality of service (QoS) mechanism to voice applications while competing fairly with other traffic over congested links.	(pdf) (ps)
Performance and Usability Analysis of Varying Web Service Architectures	Michael Lenner, Henning Schulzrinne	2005-10-14	We tested the performance of four web application architectures, namely CGI, PHP, Java servlets, and Apache Axis SOAP. All four architectures implemented a series of typical web application tasks. Our findings indicated that PHP produced the smallest delay, while the SOAP implementation produces the largest.	(pdf)
Square Root Propagation	Andrew Howard, Tony Jebara	2005-10-07	We propose a message propagation scheme for numerically stable inference in Gaussian graphical models which can otherwise be susceptible to errors caused by finite numerical precision. We adapt square root algorithms, popular in Kalman filtering, to graphs with arbitrary topologies. The method consists of maintaining potentials and generating messages that involve the square root of precision matrices. Combining this with the machinery of the junction tree algorithm leads to an efficient and numerically stable algorithm. Experiments are presented to demonstrate the robustness of the method to numerical errors that can arise in complex learning and inference problems.	(ps)
Approximating the Reflection Integral as a Summation: Where did the delta go?	Aner Ben-Artzi	2005-10-07	In this note, I explore why the the common approximation of the reflection integral is not written with a delta omega-in ( ) to replace the differential omega-in ( ). After that, I go on to discover what really happens when the sum over all directions is reduced to a sum over a small number of directions. In the final section, I make recommendations for correctly approximating the reflection sum, and briefly suggest a possible framework for multiple importance sampling on both lighting and brdf.	(pdf)
DotSlash: Providing Dynamic Scalability to Web Applications with On-demand Distributed Query Result Caching	Weibin Zhao, Henning Schulzrinne	2005-09-29	Scalability poses a significant challenge for today's web applications, mainly due to the large population of potential users. To effectively address the problem of short-term dramatic load spikes caused by web hotspots, we developed a self-configuring and scalable rescue system called DotSlash. The primary goal of our system is to provide dynamic scalability to web applications by enabling a web site to obtain resources dynamically, and use them autonomically without any administrative intervention. To address the database server bottleneck, DotSlash allows a web site to set up on-demand distributed query result caching, which greatly reduces the database workload for read mostly databases, and thus increases the request rate supported at a DotSlash-enabled web site. The novelty of our work is that our query result caching is on demand, and operated based on load conditions. The caching remains inactive as long as the load is normal, but is activated once the load is heavy. This approach offers good data consistency during normal load situations, and good scalability with relaxed data consistency for heavy load periods. We have built a prototype system for the widely used LAMP configuration, and evaluated our system using the RUBBoS bulletin board benchmark. Experiments show that a DotSlash-enhanced web site can improve the maximum request rate supported by a factor of 5 using 8 rescue servers for the RUBBoS submission mix, and by a factor of 10 using 15 rescue servers for the RUBBoS read-only mix.	(pdf) (ps)
The Pseudorandomness of Elastic Block Ciphers	Debra Cook, Moti Yung, Angelos Keromytis	2005-09-28	We investigate elastic block ciphers, a method for constructing variable length block ciphers, from a theorectical perspective. We view the underlying structure of an elastic block cipher as a network, which we refer to as an elastic network, and analyze the network in a manner similar to the analysis performed by Luby and Rackoff on Fesitel networks. We prove that a three round elastic network is a pseudorandom permutation and a four round network is a strong pseudorandom permutation when the round functions are pseudorandom permutations.	(pdf) (ps)
A General Analysis of the Security of Elastic Block Ciphers	Debra Cook, Moti Yung, Angelos Keromytis	2005-09-28	We analyze the security of elastic block ciphers in general to show that an attack on an elastic version of block cipher implies a polynomial time related attack on the fixed-length version of the block cipher. We relate the security of the elastic version of a block cipher to the fixed-length version by forming a reduction between the versions. Our method is independent of the specific block cipher used. The results imply that if the fixed-length version of a block cipher is secure against attacks which attempt key recovery then the elastic version is also secure against such attacks.	(pdf) (ps)
On Elastic Block Ciphers and Their Differential and Linear Cryptanalyses	Debra Cook, Moti Yung, Angelos Keromytis	2005-09-28	Motivated by applications such as databases with nonuniform field lengths, we introduce the concept of an elastic block cipher, a new approach to variable length block ciphers which incorporates fixed sized cipher components into a new network structure. Our scheme allows us to dynamically "stretch" the supported block size of a block cipher up to a length double the original block size, while increasing the computational workload proportionally to the block size. We show that traditional attacks against an elastic block cipher are impractical if the original cipher is secure. In this paper we focus on differential and linear attacks. Specifically, we employ an elastic version of Rijndael supporting block sizes of 128 to 256 bits as an example, and show it is resistant to both differential and linear attacks. In particular, employing a different method than what is employed in Rijndael design, we show that the probability of any differential characteristic for the elastic version of Rijndael is <= 2^-(block size). We further prove that both linear and nonlinear attacks are computationally infeasible for any elastic block cipher if the original cipher is not subject to such an attack and involves a block size for which an exhaustive plaintext search is computationally infeasible (as is the case for Rijndael).	(pdf) (ps)
PachyRand: SQL Randomization for the PostgreSQL JDBC Driver	Michael Locasto, Angelos D. Keromytis	2005-08-26	Many websites are driven by web applications that deliver dynamic content stored in SQL databases. Such systems take input directly from the client via HTML forms. Without proper input validation, these systems are vulnerable to SQL injection attacks. The predominant defense against such attacks is to implement better input validation. This strategy is unlikely to succeed on its own. A better approach is to protect systems against SQL injection automatically and not rely on manual supervision or testing strategies (which are incomplete by nature). SQL randomization is a technique that defeats SQL injection attacks by transforming the language of SQL statements in a web application such that an attacker needs to guess the transformation in order to successfully inject his code. We present PachyRand, an extension to the PostgreSQL JDBC driver that performs SQL randomization. Our system is easily portable to most other JDBC drivers, has a small performance impact, and makes SQL injection attacks infeasible.	(pdf) (ps)
Parsing Preserving Techniques in Grammar Induction	Smaranda Muresan	2005-08-20	In this paper we present the theoretical foundation of the search space for learning a class of constraint-based grammars, which preserve the parsing of representative examples. We prove that under several assumptions the search space is a complete grammar lattice, and the lattice top element is a grammar that can always be learned from a set of representative examples and a sublanguage used to reduce the grammar semantics. This complete grammar lattice guarantees convergence of solutions of any learning algorithm that obeys the given assumptions.	(pdf) (ps)
Generic Models for Mobility Management in Next Generation Networks	Maria Luisa Cristofano, Andrea G. Forte, Henning Schulzrinne	2005-08-08	In the network community different mobility management techniques have been proposed over the years. However, many of these techniques share a surprisingly high number of similarities. In this technical report we analyze and evaluate the most relevant mobility management techniques, pointing out differences and similarities. For macro-mobility we consider Mobile IP (MIP), the Session Initiation Protocol (SIP) and mobility management techniques typical of a GSM network; for micro-mobility we describe and analyze several protocols such as: Hierarchical MIP, TeleMIP, IDMP, Cellular IP and HAWAII.	(pdf)
Pointer Analysis for C Programs Through AST Traversal	Marcio Buss, Stephen Edwards, Bin Yao, Daniel Waddington	2005-08-04	We present a pointer analysis algorithm designed for source-to-source transformations. Existing techniques for pointer analysis apply a collection of inference rules to a dismantled intermediate form of the source program, making them difficult to apply to source-to-source tools that generally work on abstract syntax trees to preserve details of the source program. Our pointer analysis algorithm operates directly on the abstract syntax tree of a C program and uses a form of standard dataflow analysis to compute the desired points-to information. We have implemented our algorithm in a source-to-source translation framework and experimental results show that it is practical on real-world examples.	(pdf)
Adaptive Interactive Internet Team Video	Dan Phung, Giuseppe Valetto, Gail Kaiser	2005-08-04	The increasing popularity of distance learning and online courses has highlighted the lack of collaborative tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources used by students. We present an e-Learning architecture and adaptation model called AI2TV (Adaptive Internet Interactive Team Video), a system that allows borderless, virtual students, possibly some or all disadvantaged in network resources, to collaboratively view a video in synchrony. AI2TV upholds the invariant that each student will view semantically equivalent content at all times. Video player actions, like play, pause and stop, can be initiated by any of the students and the results of those actions are seen by all the other students. These features allow group members to review a lecture video in tandem to facilitate the learning process. We show in experimental trials that our system can successfully synchronize video for distributed students while, at the same time, optimizing the video quality given actual (fluctuating) bandwidth by adaptively adjusting the quality level for each student.	(pdf)
Tractability of Quasilinear Problems I: General Results	Arthur Werschulz, Henryk Wozniakowski	2005-08-04	The tractability of multivariate problems has usually been studied only for the approximation of linear operators. In this paper we study the tractability of quasilinear multivariate problems. That is, we wish to approximate nonlinear operators~$S_d(\cdot,\cdot)$ that depend linearly on the first argument and satisfy a Lipschitz condition with respect to both arguments. Here, both arguments are functions of $d$~variables. Many computational problems of practical importance have this form. Examples inlude the solution of specific Dirichlet, Neumann, and Schr\"odinger problems. We show, under appropriate assumptions, that quasilinear problems, whose domain spaces are equipped with product or finite-order weights, are tractable or strongly tractable in the worst case setting. This paper is the first part in a series of papers. Here, we present tractability results for quasilinear problems under general assumptions on quasilinear operators and weights. In future papers, we shall verify these assumptions for quasilinear problems such as the solution of specific Dirichlet, Neumann, and Schr\"odinger problems.	(pdf)
Agnostically Learning Halfspaces	Adam Kalai, Adam Klivans, Yishay Mansour, Rocco A. Servedio	2005-08-02	We consider the problem of learning a halfspace in the agnostic framework of Kearns et al., where a learner is given access to a distribution on labelled examples but the labelling may be arbitrary. The learner's goal is to output a hypothesis which performs almost as well as the optimal halfspace with respect to future draws from this distribution. Although the agnostic learning framework does not explicitly deal with noise, it is closely related to learning in worst-case noise models such as malicious noise. We give the first polynomial-time algorithm for agnostically learning halfspaces with respect to several distributions, such as the uniform distribution over the $n$-dimensional Boolean cube {0,1}^n or unit sphere in n-dimensional Euclidean space, as well as any log-concave distribution in n-dimensional Euclidean space. Given any constant additive factor eps>0, our algorithm runs in poly(n) time and constructs a hypothesis whose error rate is within an additive eps of the optimal halfspace. We also show this algorithm agnostically learns Boolean disjunctions in time roughly 2^{\sqrt{n}} with respect to any distribution; this is the first subexponential-time algorithm for this problem. Finally, we obtain a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere which can tolerate the highest level of malicious noise of any algorithm to date. Our main tool is a polynomial regression algorithm which finds a polynomial that best fits a set of points with respect to a particular metric. We show that, in fact, this algorithm is an arbitrary-distribution generalization of the well known ``low-degree'' Fourier algorithm of Linial, Mansour, & Nisan and has excellent noise tolerance properties when minimizing with respect to the L_1 norm. We apply this algorithm in conjunction with a non-standard Fourier transform (which does not use the traditional parity basis) for learning halfspaces over the uniform distribution on the unit sphere; we believe this technique is of independent interest.	(pdf) (ps)
Learning mixtures of product distributions over discrete domains	Jon Feldman, Ryan O'Donnell, Rocco A. Servedio	2005-07-28	We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. We give a $\poly(n/\eps)$ time algorithm for learning a mixture of $k$ arbitrary product distributions over the $n$-dimensional Boolean cube $\{0,1\}^n$ to accuracy $\eps$, for any constant $k$. Previous polynomial time algorithms could only achieve this for $k = 2$ product distributions; our result answers an open question stated independently by Cryan and by Freund and Mansour. We further give evidence that no polynomial time algorithm can succeed when $k$ is superconstant, by reduction from a notorious open problem in PAC learning. Finally, we generalize our $\poly(n/\eps)$ time algorithm to learn any mixture of $k = O(1)$ product distributions over $\{0,1, \dots, b\}^n$, for any $b = O(1)$.	(pdf) (ps)
Incremental Algorithms for Inter-procedural Analysis of Safety Properties	Christopher L. Conway, Kedar Namjoshi, Dennis Dams, Stephen A. Edwards	2005-07-10	Automaton-based static program analysis has proved to be an effective tool for bug finding. Current tools generally re-analyze a program from scratch in response to a change in the code, which can result in much duplicated effort. We present an inter-procedural algorithm that analyzes incrementally in response to program changes and present experiments for a null-pointer dereference analysis. It shows a substantial speed-up over re-analysis from scratch, with a manageable amount of disk space used to store information between analysis runs.	(pdf) (ps)
Lexicalized Well-Founded Grammars: Learnability and Merging	Smaranda Muresan, Tudor Muresan, Judith Klavans	2005-06-30	This paper presents the theoretical foundation of a new type of constraint-based grammars, Lexicalized Well-Founded Grammars, which are adequate for modeling human language and are learnable. These features make the grammars suitable for developing robust and scalable natural language understanding systems. Our grammars capture both syntax and semantics and have two types of constraints at the rule level: one for semantic composition and one for ontology-based semantic interpretation. We prove that these grammars can always be learned from a small set of semantically annotated, ordered representative examples, using a relational learning algorithm. We introduce a new semantic representation for natural language, which is suitable for an ontology-based interpretation and allows us to learn the compositional constraints together with the grammar rules. Besides the learnability results, we give a principle for grammar merging. The experiments presented in this paper show promising results for the adequacy of these grammars in learning natural language. Relatively simple linguistic knowledge is needed to build the small set of semantically annotated examples required for the grammar induction.	(pdf) (ps)
A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems	Giuseppe Valetto, Gail Kaiser, Dan Phung	2005-06-05	Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back end enactment of runtime adaptations -- but usually employing an effector technology peculiar to one type of target system. While completely generic "one size fits all" effector technologies seem implausible, we propose a general purpose programming model and interaction layer that abstractsaway from the peculiarities of target specific effectors,enabling a uniform approach to controlling and coordinatingthe low-level execution of reconfigurations, repairs,micro-reboots, etc	(pdf)
The Appearance of Human Skin	Takanori Igarashi, Ko Nishino, Shree K. Nayar	2005-05-31	Skin is the outer most tissue of the human body. As a result, people are very aware of, and very sensitive to, the appearance of their skin. Consequently, skin appearance has been a subject of great interest in various fields of science and technology. Research on skin appearance has been intensely pursued in the fields of medicine, cosmetology, computer graphics and computer vision. Since the goals of these fields are very different, each field has tended to focus on specific aspects of the appearance of skin. The goal of this work is to present a comprehensive survey that includes the most prominent results related to skin in these different fields and show how these seemingly disconnected studies are related to one another.	(pdf)
Time-Varying Textures	Sebastian Enrique, Melissa Koudelka, Peter Belhumeur, Julie Dorsey, Shree Nayar, Ravi Ramamoorthi	2005-05-25	Essentially all computer graphics rendering assumes that the reflectance and texture of surfaces is a static phenomenon. Yet, there is an abundance of materials in nature whose appearance varies dramatically with time, such as cracking paint, growing grass, or ripening banana skins. In this paper, we take a significant step towards addressing this problem, investigating a new class of time-varying textures. We make three contributions. First, we describe the carefully controlled acquisition of datasets of a variety of natural processes including the growth of grass, the accumulation of snow, and the oxidation of copper. Second, we show how to adapt quilting-based methods to time-varying texture synthesis, addressing the important challenges of maintaining temporal coherence, efficient synthesis on large time-varying datasets, and reducing visual artifacts specific to time-varying textures. Finally, we show how simple procedural techniques can be used to control the evolution of the results, such as allowing for a faster growth of grass in well lit (as opposed to shadowed) areas.	(pdf) (ps)
Merging Globally Rigid Formations of Mobile Autonomous Agents	Tolga Eren, Brian Anderson, Walter Whiteley, Stephen Morse, Peter Belhumeur	2005-05-19	This paper is concerned with merging globally rigid formations of mobile autonomous agents. A key element in all future multi-agent systems will be the role of sensor and communication networks as an integral part of coordination. Network topologies are critically important for autonomous systems involving mobile underwater, ground and air vehicles and for sensor networks. This paper focuses on developing techniques and strategies for the analysis and design of sensor and network topologies required to merge globally rigid formations for cooperative tasks. Central to the development of these techniques and strategies will be the use of tools from rigidity theory, and graph theory.	(pdf)
Optimal State-Free, Size-aware Dispatching for Heterogeneous $M/G/$-type systems	Hanhua Fengand Vishal Misra, Dan Rubenstein, Dan Rubenstein	2005-05-04	We consider a cluster of heterogeneous servers, modeled as $M/G/1$ queues with different processing speeds. The scheduling policies for these servers can be either processor-sharing or first-come first-serve. Furthermore, a dispatcher that assigns jobs to the servers takes as input only the size of the arriving job and the overall job-size distribution. This general model captures the behavior of a variety of real systems, such as web server clusters. Our goal is to identify assignment strategies that the dispatcher can perform to minimize expected completion time and waiting time. We show that there exist optimal strategies that are deterministic, fixing the server to which jobs of particular sizes are always sent. We prove that the optimal strategy for systems with identical servers assigns a non-overlapping interval range of job sizes to each server. We then prove that when server processing speeds differ, it is necessary to assign each server a distinct set of intervals of job sizes in order to minimize expected waiting or response times. We explore some of the practical challenges of identifying the optimal strategy, and also study a related problem that uses our model of how to provision server processing speeds to minimize waiting and completion time given a job size distribution and fixed aggregate processing power.	(pdf) (ps)
A Hybrid Approach to Topological Mobile Robot Localization	Paul Blaer, Peter K. Allen	2005-04-27	We present a hybrid method for localizing a mobile robot in a complex environment. The method combines the use of multiresolution histograms with a signal strength analysis of existing wireless networks. We tested this localization procedure on the campus of Columbia University with our mobile robot, the Autonomous Vehicle for Exploration and Navigation of Urban Environments. Our results indicate that localization accuracy is significantly improved when five levels of resolution are used instead of one in color histogramming. We also find that incorporating wireless signal strengths into the method further improves reliability and helps to resolve ambiguities which arise when different regions have similar visual appearances.	(pdf)
Classical and Quantum Complexity of the Sturm-Liouville Eigenvalue Problem	A. Papageorgiou, H. Wozniakowski	2005-04-22	We study the approximation of the smallest eigenvalue of a Sturm-Liouville problem in the classical and quantum settings. We consider a univariate Sturm-Liouville eigenvalue problem with a nonnegative function $q$ from the class $C^2([0,1])$ and study the minimal number $n(\e)$ of function evaluations or queries that are necessary to compute an $\e$-approximation of the smallest eigenvalue. We prove that $n(\e)=\Theta(\e^{-1/2})$ in the (deterministic) worst case setting, and $n(\e)=\Theta(\e^{-2/5})$ in the randomized setting. The quantum setting offers a polynomial speedup with {\it bit} queries and an exponential speedup with {\it power} queries. Bit queries are similar to the oracle calls used in Grover's algorithm appropriately extended to real valued functions. Power queries are used for a number of problems including phase estimation. They are obtained by considering the propagator of the discretized system at a number of different time moments. They allow us to use powers of the unitary matrix $\exp(\tfrac12 {\rm i}M)$, where $M$ is an $n\times n$ matrix obtained from the standard discretization of the Sturm-Liouville differential operator. The quantum implementation of power queries by a number of elementary quantum gates that is polylog in $n$ is an open issue. We study the approximation of the smallest eigenvalue of a Sturm-Liouville problem in the classical and quantum settings. We consider a univariate Sturm-Liouville eigenvalue problem with a nonnegative function $q$ from the class $C^2([0,1])$ and study the minimal number $n(\e)$ of function evaluations or queries that are necessary to compute an $\e$-approximation of the smallest eigenvalue. We prove that $n(\e)=\Theta(\e^{-1/2})$ in the (deterministic) worst case setting, and $n(\e)=\Theta(\e^{-2/5})$ in the randomized setting. The quantum setting offers a polynomial speedup with {\it bit} queries and an exponential speedup with {\it power} queries. Bit queries are similar to the oracle calls used in Grover's algorithm appropriately extended to real valued functions. Power queries are used for a number of problems including phase estimation. They are obtained by considering the propagator of the discretized system at a number of different time moments. They allow us to use powers of the unitary matrix $\exp(\tfrac12 {\rm i}M)$, where $M$ is an $n\times n$ matrix obtained from the standard discretization of the Sturm-Liouville differential operator. The quantum implementation of power queries by a number of elementary quantum gates that is polylog in $n$ is an open issue.	(pdf) (ps)
Improving Database Performance on Simultaneous Multithreading Processors	Jingren Zhou, John Cieslewicz, Kenneth A. Ross, Mihir Shah	2005-04-18	Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based techniques to exploit SMT architectures on memory-resident data. First, we consider running independent operations in separate threads, a technique applied to conventional multiprocessor systems. Second, we describe a novel implementation strategy in which individual operators are implemented in a multi-threaded fashion. Finally, we introduce a new data-structure called a work-ahead set that allows us to use one of the threads to aggressively preload data into the cache for use by the other thread. We evaluate each method with respect to its performance, implementation complexity, and other measures. We also provide guidance regarding when and how to best utilize the various threading techniques. Our experimental results show that by taking advantage of SMT technology we achieve a 30\% to 70\% improvement in throughput over single threaded implementations on in-memory database operations.	(pdf)
Quantum algorithms and complexity for certain continuous and related discrete problems	Marek Kwas	2005-04-14	The thesis contains an analysis of two computational problems. The first problem is discrete quantum Boolean summation. This problem is a building block of quantum algorithms for many continuous problems, such as integration, approximation, di®erential equations and path integration. The second problem is continuous multivariate Feynman-Kac path integration, which is a special case of path integration. The quantum Boolean summation problem can be solved by the quantum summation (QS) algorithm of Brassard, Høyer, Mosca and Tapp, which approximates the arithmetic mean of a Boolean function. We improve the error bound of Brassard et al. for the worst-probabilistic setting. Our error bound is sharp. We also present new sharp error bounds in the average-probabilistic and worst-average settings. Our average-probabilistic error bounds prove the optimality of the QS algorithm for a certain choice of its parameters. The study of the worst-average error shows that the QS algorithm is not optimal in this setting; we need to use a certain number of repetitions to regain its optimality. The multivariate Feynman-Kac path integration problem for smooth multivariate functions su®ers from the provable curse of dimensionality in the worst-case deterministic setting, i.e., the minimal number of function evaluations needed to compute an approximation depends exponentially on the number of variables. We show that in both the randomized and quantum settings the curse of dimensionality is vanquished, i.e., the minimal number of function evaluations and/or quantum queries required to compute an approximation depends only polynomially on the reciprocal of the desired accuracy and has a bound independent of the number of variables. The exponents of these polynomials are 2 in the randomized setting and 1 in the quantum setting. These exponents can be lowered at the expense of the dependence on the number of variables. Hence, the quantum setting yields exponential speedup over the worst-case deterministic setting, and quadratic speedup over the randomized setting.	(pdf)
A Hybrid Hierarchical and Peer-to-Peer Ontology-based Global Service Discovery System	Knarig Arabshian, Henning Schulzrinne	2005-04-06	Current service discovery systems fail to span across the globe and they use simple attribute-value pair or interface matching for service description and querying. We propose a global service discovery system, GloServ, that uses the description logic Web Ontology Language (OWL DL). The GloServ architecture spans both local and wide area networks. It maps knowledge obtained by the service classification ontology to a structured peer-to-peer network such as a Content Addressable Network (CAN). GloServ also performs automated and intelligent registration and querying by exploiting the logical relationships within the service ontologies.	(pdf) (ps)
Multi-Language Edit-and-Continue for the Masses	Marc Eaddy, Steven Feiner	2005-04-05	We present an Edit-and-Continue implementation that allows regular source files to be treated like interactively updatable, compiled scripts, coupling the speed of compiled na-tive machine code, with the ability to make changes without restarting. Our implementa-tion is based on the Microsoft .NET Framework and allows applications written in any .NET language to be dynamically updatable. Our solution works with the standard ver-sion of the Microsoft Common Language Runtime, and does not require a custom com-piler or runtime. Because no application changes are needed, it is transparent to the appli-cation developer. The runtime overhead of our implementation is low enough to support updating real-time applications (e.g., interactive 3D graphics applications).	(pdf) (ps)
Similarity-based Multilingual Multi-Document Summarization	David Kirk Evans, Kathleen McKeown, Judi, Kathleen McKeown, Judith L. Klavans	2005-03-31	We present a new approach for summarizing clusters of documents on the same event, some of which are machine translations of foreign-language documents and some of which are English. Our approach to multilingual multi-document summarization uses text similarity to choose sentences from English documents based on the content of the machine translated documents. A manual evaluation shows that 68\% of the sentence replacements improve the summary, and the overall summarization approach outperforms first-sentence extraction baselines in automatic ROUGE-based evaluations.	(pdf) (ps)
802.11b Throughput with Link Interference	Hoon Chang, Vishal Misra	2005-03-29	IEEE 802.11 MAC is a CSMA/CA protocol and uses RTS/CTS exchanges to avoid the hidden terminal problem. Recent findings have revealed that the carriersensing range set in current major implementations does not detect and prevent all interference signals even using RTS/CTS access method together. In this paper, we investigate the effect of interference and develop a mathematical model for it. We demonstrate that the 802.11 DCF does not properly act on the interference channel due to the small size and the exponential increment of backoff windows. The accuracy of our model is verified via simulations. Based on an insight from our model, we present a simple protocol that operates on the top of 802.11 MAC layer and achieves more throughput than rate-adjustment schemes.	(pdf) (ps)
A Lower Bound for Quantum Phase Estimation	Arvid J. Bessen	2005-03-22	We obtain a query lower bound for quantum algorithms solving the phase estimation problem. Our analysis generalizes existing lower bound approaches to the case where the oracle Q is given by controlled powers Q^p of Q, as it is for example in Shor's order finding algorithm. In this setting we will prove a log (1/epsilon) lower bound for the number of applications of Q^p1, Q^p2, ... This bound is tight due to a matching upper bound. We obtain the lower bound using a new technique based on frequency analysis.	(pdf) (ps)
The Power of Various Real-Valued Quantum Queries	Arvid J. Bessen	2005-03-22	The computation of combinatorial and numerical problems on quantum computers is often much faster than on a classical computer in numbers of queries. A query is a procedure by which the quantum computer gains information about the specific problem. Different query definitions were given and our aim is to review them and to show that these definitions are not equivalent. To achieve this result we will study the simulation and approximation of one query type by another. While approximation is easy in one direction, we will show that it is hard in the other direction by a lower bound for the numbers of queries needed in the simulation. The main tool in this lower bound proof is a relationship between quantum algorithms and trigonometric polynomials that we will establish.	(pdf) (ps)
Rigid Formations with Leader-Follower Architecture	Tolga Eren, Walter Whiteley, Peter N. Belhumeur	2005-03-14	This paper is concerned with information structures used in rigid formations of autonomous agents that have leader-follower architecture. The focus of this paper is on sensor/network topologies to secure control of rigidity. We extend our previous approach for formations with symmetric neighbor relations to include formations with leader-follower architecture. Necessary and sufficient conditions for stably rigid directed formations are given including both cyclic and acyclic directed formations. Some useful steps for creating topologies of directed rigid formations are developed. An algorithm to determine the directions of links to create stably rigid directed formations from rigid undirected formations is presented. It is shown that k-cycles (k > 2) do not cause inconsistencies when measurements are noisy, while 2-cycles do. Simulation results are presented for (i) a rigid acyclic formation, (i) a flexible formation, and (iii) a rigid formation with cycles.	(pdf) (ps)
P2P Video Synchronization in a Collaborative Virtual Environment	Suhit Gupta, Gail Kaiser	2005-02-25	We have previously developed a collaborative virtual environment (CVE) for small-group virtual classrooms, intended for distance learning by geographically dispersed students. The CVE employs a peer-to-peer approach to the frequent real-time updates to the 3D virtual worlds required by avatar movements (fellow students in the same room are depicted by avatars). This paper focuses on our extension to the P2P model to support group viewing of lecture videos, called VECTORS, for Video Enhanced Collaboration for Team Oriented Remote Synchronization. VECTORS supports synchronized viewing of lecture videos, so the students all see "the same thing at the same time", and can pause, rewind, etc. in synchrony while discussing the lecture material via "chat". We are particularly concerned with the needs of the technologically disenfranchised, e.g., whose only Web/Internet access if via dialup or other relatively low-bandwidth networking. Thus VECTORS employs semantically compressed videos with meager bandwidth requirements. Further, the videos are displayed as a sequence of JPEGs on the walls of a 3D virtual room, requiring fewer local multimedia resources than full motion MPEGs.	(pdf)
A Study on NSIS Interaction with Internet Route Changes	Charles Shen, Henning Schulzrinne, Sung-Hyuck Lee, Jong Ho Bang	2005-02-24	Design of Next Step In Signaling (NSIS) protocol and IP routing interaction requires a good understanding of today's Internet routing behavior. In this report we present a routing measurement experiment to characterize current Internet dynamics, including routing pathology, routing prevalence and routing persistence. The focus of our study is route change. We look at the types, duration and likely causes of different route changes and discuss their impact to the design of NSIS. We also review common route change detection methods and investigate rules to determine whether a route change happened in a node's forward-looking or backward-looking direction is detectable. We introduce typical NSIS deployment models and discuss specific categories of route changes that should be considered in each of these models. With the NSIS deployment models in mind, we further give experimental evaluation of two route change detection methods - the packet TTL monitoring method and a new delay variation monitoring method.	(pdf) (ps)
Adding Self-healing capabilities to the Common Language Runtime	Rean Griffith, Gail Kaiser	2005-02-23	Self-healing systems require that repair mechanisms are available to resolve problems that arise while the system executes. Managed execution environments such as the Common Language Runtime (CLR) and Java Virtual Machine (JVM) provide a number of application services (application isolation, security sandboxing, garbage collection and structured exception handling) which are geared primarily at making managed applications more robust. However, none of these services directly enables applications to perform repairs or consistency checks of their components. From a design and implementation standpoint, the preferred way to enable repair in a self-healing system is to use an externalized repair/adaptation architecture rather than hardwiring adaptation logic inside the system where it is harder to analyze, reuse and extend. We present a framework that allows a repair engine to dynamically attach and detach to/from a managed application while it executes essentially adding repair mechanisms as another application service provided in the execution environment.	(pdf) (ps)
Manipulating Managed Execution Runtimes to Support Self-Healing Systems	Rean Griffith, Gail Kaiser	2005-02-23	Self-healing systems require that repair mechanisms are available to resolve problems that arise while the system executes. Managed execution environments such as the Common Language Runtime (CLR) and Java Virtual Machine (JVM) provide a number of application services (application isolation, security sandboxing, garbage collection and structured exception handling) which are geared primarily at making managed applications more robust. However, none of these services directly enables applications to perform repairs or consistency checks of their components. From a design and implementation standpoint, the preferred way to enable repair in a self-healing system is to use an externalized repair/adaptation architecture rather than hardwiring adaptation logic inside the system where it is harder to analyze, reuse and extend. We present a framework that allows a repair engine to dynamically attach and detach to/from a managed application while it executes essentially adding repair mechanisms as another application service provided in the execution environment.	(pdf) (ps)
Genre Classification of Websites Using Search Engine Snippets	Suhit Gupta, Gail Kaiser, Salvatore Stolfo, Hila Becker	2005-02-03	Web pages often contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Automatic extraction of "useful and relevant" content from web pages has many applications, including browsing on small cell phone and PDA screens, speech rendering for the visually impaired, and reducing noise for information retrieval systems. Prior work has led to the development of Crunch, a framework which employs various heuristics in the form of filters and filter settings for content extraction. Crunch allows users to tune these settings, essentially the thresholds for applying each filter. However, in order to reduce human involvement in selecting these heuristic settings, we have extended this work to utilize a website's classification, defined by its genre and physical layout. In particular, Crunch would then obtain the settings for a previously unknown website by automatically classifying it as sufficiently similar to a cluster of known websites with previously adjusted settings - which in practice produces better content extraction results than a single one-size-fits-all set of setting defaults. In this paper, we present our approach to clustering a large corpus of websites by their genre, utilizing the snippets generated by sending the website's domain name to search engines as well as the website's own text. We find that exploiting these snippets not only increased the frequency of function words that directly assist in detecting the genre of a website, but also allow for easier clustering of websites. We use existing techniques like Manhattan distance measure and Hierarchical clustering, with some modifications, to pre-classify websites into genres. Our clustering method does not require prior knowledge of the set of genres that websites fit into, but instead discovers these relationships among websites. Subsequently, we are able to classify newly encountered websites in linear-time, and then apply the corresponding filter settings, with no noticeable delay introduced for the content-extracting web proxy.	(pdf) (ps)
A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems	Giuseppe Valetto, Gail Kaiser	2005-01-30	Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back end enactment of runtime adaptations ^V but usually employing an effector technology peculiar to one type of target system. While completely generic ^Sone size fits all^T effector technologies seem implausible, we propose a general-purpose programming model and interaction layer that abstracts away from the peculiarities of target-specific effectors, enabling a uniform approach to controlling and coordinating the low-level execution of reconfigurations, repairs, micro-reboots, etc.	(pdf)
Dynamic Adaptation of Rules for Temporal Event Correlation in Distributed Systems	Rean Griffith, Joseph L. Hellerstein, Yixin Diao, Gail Kaiser	2005-01-30	Event correlation is essential to realizing self-managing distributed systems. For example, distributed systems often require that events be correlated from multiple systems using temporal patterns to detect denial of service attacks and to warn of problems with business critical applications that run on multiple servers. This paper addresses how to specify timer values for temporal patterns so as to manage the trade-off between false alarms and undetected alarms. A central concern is addressing the variability of event propagation delays due to factors such as contention for network and server resources. To this end, we develop an architecture and an adaptive control algorithm that dynamically compensate for variations in propagation delays. Our approach makes Management Stations more autonomic by avoiding the need for manual adjustments of timer values in temporal rules. Further, studies we conducted of a testbed system suggest that our approach produces results that are at least as good as an optimal fixed setting of timer values.	(pdf)
The Virtual Device: Expanding Wireless Communication Services Through Service Discovery and Session Mobility	Ron Shacham, Henning Schulzrinne, Srisakul Thakolsri, Wolfgang Kellerer	2005-01-12	We present a location-based, ubiquitous service architecture, based on the Session Initiation Protocol (SIP) and a service discovery protocol that enables users to enhance the multimedia communications services available on their mobile devices by discovering other local devices, and including them in their active sessions, creating a "virtual device." We have implemented our concept based on Columbia University's multimedia environment and we show its feasibility by a performance analysis.	(pdf) (ps)
Autonomic Control for Quality Collaborative Video Viewing	Dan Phung, Giuseppe Valetto, Gail Kaiser	2004-12-31	We present an autonomic controller for quality collaborative video viewing, which allows groups of geographically dispersed users with different network and computer resources to view a video in synchrony while optimizing the video quality experienced. The autonomic controller is used within a tool for enhancing distance learning with synchronous group review of online multimedia material. The autonomic controller monitors video state at the clients' end, and adapts the quality of the video according to the resources of each client in (soft) real time. Experimental results show that the autonomic controller successfully synchronizes video for small groups of distributed clients and, at the same time, enhances the video quality experienced by users, in conditions of fluctuating bandwidth and variable frame rate.	(pdf)
Sequential Challenges in Synthesizing Esterel	Cristian Soviani, Jia Zeng, Stephen A. Edwards	2004-12-20	State assignment is a formidable task. As designs written in a hardware description language such as Esterel inherently carry more high level information that a register transfer level model, such information can be used to guide the encoding process. A question arises if the high level information alone is strong enough to suggest an efficient state assignment, allowing low-level details to be ignored. This report suggests that with Esterel's flexibility, most optimization potential is not within the high-level structure. It appears effective state assignment cannot rely solely on high level information.	(pdf) (ps)
Determining Interfaces using Type Inference	Stephen A. Edwards, Chun Li	2004-12-20	Porting software usually requires understanding what library functions the program being ported uses since this functionality must be either found or reproduced in the ported program's new environment. This is usually done manually through code inspections. We propose a type inference algorithm able to infer basic information about the library functions a particular C program uses in the absence of declaration information for the library (e.g., without header files). Based on a simple but efficient inference algorithm, we were able to infer declarations for much of the PalmOS API from the source of a twenty-seven-thousand-line C program. Such a tool will aid in the problem of program understanding when porting programs, especially from poorly-documented or lost legacy environments.	(pdf) (ps)
Remotely Keyed CryptoGraphics - Secure Remote Display Access Using (Mostly) Untrusted Hardware - Extended Version	Debra L. Cook, Ricardo Baratto, Angelos D. Keromytis	2004-12-11	Software that covertly monitors user actions, also known as {\it spyware,} has become a first-level security threat due to its ubiquity and the difficulty of detecting and removing it. Such software may be inadvertently installed by a user that is casually browsing the web, or may be purposely installed by an attacker or even the owner of a system. This is particularly problematic in the case of utility computing, early manifestations of which are Internet cafes and thin-client computing. Traditional trusted computing approaches offer a partial solution to this by significantly increasing the size of the trusted computing base (TCB) to include the operating system and other software. We examine the problem of protecting a user accessing specific services in such an environment. We focus on secure video broadcasts and remote desktop access when using any convenient, and often untrusted, terminal as two example applications. We posit that, at least for such applications, the TCB can be confined to a suitably modified graphics processing unit (GPU). Specifically, to prevent spyware on untrusted clients from accessing the user's data, we restrict the boundary of trust to the client's GPU by moving image decryption into GPUs. We use the GPU in order to leverage existing capabilities as opposed to designing a new component from scratch. We discuss the applicability of GPU-based decryption in these two sample scenarios and identify the limitations of the current generation of GPUs. We propose straightforward modifications to future GPUs that will allow the realization of the full approach.	(pdf) (ps)
Obstacle Avoidance and Path Planning Using a Sparse Array of Sonars	Matei Ciocarlie	2004-12-08	This paper proposes an exploration method for robots equipped with a set of sonar sensors that does not allow for complete coverage of the robot's close surroundings. In such cases, there is a high risk of collision with possible undetected obstacles. The proposed method, adapted for use in urban outdoors environments, minimizes such risks while guiding the robot towards a predefined target location. During the process, a compact and accurate representation of the environment can be obtained.	(pdf)
End System Service Examples	Xiaotao Wu, Henning Schulzrinne	2004-12-07	This technical report investigates services suitable for end systems. We look into ITU Q.1211 services, AT&T 5ESS switch services, services defined in CSTA Phase III, and new services integrating other Internet services, such as presence information. We also explore how to use the Language for End System Services (LESS) to program the services.	(pdf) (ps)
WebPod: Persistent Web Browsing Sessions with Pocketable Storage Devices	Shaya Potter, Jason Nieh	2004-11-19	We present WebPod, a portable device for managing web browsing sessions. WebPod leverages capacity improvements in portable solid state memory devices to provide a consistent environment to access the web. WebPod provides a thin virtualization layer that decouples a user's web session from any particular end-user device, allowing users freedom to move their work environments around. We have implemented a prototype in Linux that works with existing unmodified applications and operating system kernels. Our experimental results demonstrate that WebPod has very low virtualization overhead and can provide a full featured web browsing experience, including support for all helper applications and plug-ins one expects. WebPod is able to efficiently migrate a user's web session. This enables improved user mobility while maintaining a consistent work environment.	(pdf)
Design and Verification Languages	Stephen A. Edwards	2004-11-17	After a few decades of research and experimentation, register-transfer dialects of two standard languages---Verilog and VHDL---have emerged as the industry standard starting point for automatic large-scale digital integrated circuit synthesis. Writing RTL descriptions of hardware remains a largely human process and hence the clarity, precision, and ease with which such descriptions can be coded correctly has a profound impact on the quality of the final product and the speed with which the design can be created. While the efficiency of a design (e.g., the speed at which it can run or the power it consumes) is obviously important, its correctness is usually the paramount issue, consuming the majority of the time (and hence money) spent during the design process. In response to this challenge, a number of so-called verification languages have arisen. These have been designed to assist in a simulation-based or formal verification process by providing mechanisms for checking temporal properties, generating pseudorandom test cases, and for checking how much of a design's behavior has been exercised by the test cases. Through examples and discussion, this report describes the two main design languages---VHDL and Verilog---as well as SystemC, a language currently used to build large simulation models; SystemVerilog, a substantial extension of Verilog; and OpenVera, e, and PSL, the three leading contenders for becoming the main verification language.	(pdf) (ps)
Extracting Context To Improve Accuracy For HTML Content Extraction	Suhit Gupta, Gail Kaiser, Salvatore Stolfo	2004-11-08	Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, reducing noise for information retrieval systems and to generally improve the web browsing experience. In our previous work [16], we developed a framework that employed an easily extensible set of techniques that incorporated results from our earlier work on content extraction [16]. Our insight was to work with DOM trees, rather than raw HTML markup. We present here filters that reduce human involvement in applying heuristic settings for websites and instead automate the job by detecting and utilizing the physical layout and content genre of a given website. We also present work we have done towards improving the usability and performance of our content extraction proxy as well as the quality and accuracy of the heuristics that act as filters for inferring the context of a webpage.	(pdf) (ps)
Peer-to-Peer Internet Telephony using SIP	Kundan Singh, Henning Schulzrinne	2004-10-31	P2P systems inherently have high scalability, robustness and fault tolerance because there is no centralized server and the network self-organizes itself. This is achieved at the cost of higher latency for locating the resources of interest in the P2P overlay network. Internet telephony can be viewed as an application of P2P architecture where the participants form a self-organizing P2P overlay network to locate and communicate with other participants. We propose a pure P2P architecture for the Session Initiation Protocol (SIP)-based IP telephony systems. Our P2P-SIP architecture supports basic user registration and call setup as well as advanced services such as offline message delivery, voice/video mails and multi-party conferencing. We also provide an overview of practical challenges for P2P-SIP such as firewall, Network Address Translator (NAT) traversal and security.	(pdf) (ps)
Service Learning in Internet Telephony	Xiaotao Wu, Henning Schulzrinne	2004-10-29	Internet telephony can introduce many novel communication services, however, novelty puts learning burden on users. It will be a great help to users if their desired services can be created automatically. We developed an intelligent communication service creation environment which can handle automatic service creation by learning from users' daily communication behaviors. The service creation environment models communication services as decision trees and uses the Incremental Tree Induction (ITI) algorithm for decision tree learning. We use Language for End System Services (LESS) scripts to represent learned results and implemented a simulation environment to verify the learning algorithm. We also noticed that when users get their desired services, they may not be aware of unexpected behaviors that the serivces could introduce, for example, mistakenly rejecting expected calls. In this paper, we also did a comprehensive analysis on communication service fail-safe handling and propose several approaches to create fail-safe services.	(pdf) (ps)
A Microrobotic System For Protein Streak Seeding	Atanas Georgiev, Peter K. Allen, Ting Song, Andrew Laine, William Edstrom, John Hunt	2004-10-28	We present a microrobotic system for protein crystal micromanipulation tasks. The focus in this report is on a task called streak seeding, which is used by crystallographers to entice certain protein crystals to grow. Our system features a set of custom designed micropositioner end-effectors we call microshovels to replace traditional tools used by crystallographers for this task. We have used micro-electrical mechanical system (MEMS) techniques to design and manufacture various shapes and quantities of microshovels. Visual feedback from a camera mounted on the microscope is used to control the micropositioner as it lowers a microshovel into the liquid containing the crystals for poking and streaking. We present experimental results that illustrate the applicability of our approach.	(pdf)
Preventing Spam For SIP-based Instant Messages and Sessions	Kumar Srivastava, Henning Schulzrinne	2004-10-28	As IP telephony becomes more widely deployed and used, tele-marketers or other spammers are bound to start using SIP-based calls and instant messages as a medium for sending spam. As is evident from the fate of email, protection against spam has to be built into SIP systems otherwise they are bound to fall prey to spam. Traditional approaches used to prevent spam in email such as content-based filtering and access lists are not applicable to SIP calls and instant messages in their present form. We propose Domain-based Authentication and Policy-Enforced for SIP (DAPES): a system that can be easily implemented and deployed in existing SIP networks. Our system is capable of determining in real time, whether an incoming call or instant message is likely to be spam or not, while at the same time, supporting communication between both known and unknown parties. DAPES includes the deployment of reputation systems in SIP networks to enable real-time transfer of reputation information between parties to allow communication between entities unknown to each other.	(pdf) (ps)
Programmable Conference Server	Henning Schulzrinne, Kundan Singh, Xiaotao Wu	2004-10-15	Conferencing services for Internet telephony and multimedia can be enhanced by the integration of other Internet services, such as instant messaging, presence notification, directory lookups, location sensing, email and web. These services require a service programming architecture that can easily incorporate new Internet services into the existing conferencing functionalities, such as voice-enabled conference control. W3C has defined the Call Control eXtensible Markup Language (CCXML), along with its VoiceXML, for telephony call control services in a point-to-point call. However, it cannot handle other Internet service events such as presence enabled conferences. In this paper, we propose an architecture combining VoiceXML with our Language for End System Services (LESS) and the Common Gateway Interface (CGI) for multi-party conference service programming that integrates existing Internet services. VoiceXML provides the voice interface to LESS and CGI scripts. Our architecture enables many novel services such as conference setup based on participant location and presence status. We give some examples of the new services and describe our on-going implementation.	(pdf) (ps)
An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol	Salman A. Baset, Henning Schulzrinne	2004-10-11	Skype is a peer-to-peer VoIP client developed by KaZaa in 2003. Skype claims that it can work almost seamlessly across NATs and firewalls and has better voice quality than the MSN and Yahoo IM applications. It encrypts calls end-to-end, and stores user information in a decentralized fashion. Skype also supports instant messaging and conferencing. This report analyzes key Skype functions such as login, NAT and firewall traversal, call establishment, media transfer, codecs, and conferencing under three different network setups. Analysis is performed by careful study of Skype network traffic.	(pdf) (ps)
Building a Reactive Immune System for Software Services	Stelios Sidiroglou, Michael E. Locasto, Stephen W. Boyd, Angelos D. Keromytis	2004-10-10	We propose a new approach for reacting to a wide variety of software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination (e.g., illegal memory dereference). Our emphasis is in creating "self-healing" software that can protect itself against a recurring fault until a more comprehensive fix is applied. Our system consists of a set of sensors that monitor applications for various types of failure and an instruction-level emulator that is invoked for selected parts of a program's code. Use of such an emulator allows us to predict recurrences of faults, and recover program execution to a safe control flow. Using the emulator for small pieces of code, as directed by the sensors, allows us to minimize the performance impact on the immunized application. We discuss the overall system architecture and a prototype implementation for the x86 platform. We evaluate the efficacy of our approach against a range of attacks and other software failures and investigate its performance impact on several server-type applications. We conclude that our system is effective in preventing the recurrence of a wide variety of software failures at a small performance cost.	(pdf) (ps)
Live CD Cluster Performance	Haronil Estevez, Stephen A. Edwards	2004-10-04	In this paper, we present a performance comparison of two linux live CD distributions, Knoppix (v.3.3) and Quantian (v 0.4.96). The library used for performance evaluation is the Parallel Image Processing Toolkit (PIPT), a software library that contains several parallel image processing routines. A set of images was chosen and a batch job of PIPT routines were run and timed using both live CD distributions. The point of comparison between the two live CDs was the total time the batch job required for completion.	(pdf) (ps)
Information Structures to Secure Control of Rigid Formations with Leader-Follower Structure	Tolga Eren, Walter Whiteley, Brian D.O. Anderson, A. Stephen Morse, Peter N. Belhumeur	2004-09-29	This paper is concerned with rigid formations of mobile autonomous agents using a leader-follower structure. A formation is a group of agents moving in real 2- or 3- dimensional space. A formation is called rigid if the distance between each pair of agents does not change over time under ideal conditions. Sensing/communication links between agents are used to maintain a rigid formation. Two agents connected by a sensing/communication link are called neighbors. There are two types of neighbor relations in rigid formations. In the first type, the neighbor relation is symmetric. In the second type, the neighbor relation is asymmetric. Rigid formations with a leader-follower structure have the asymmetric neighbor relation. A framework to analyze rigid formations with symmetric neighbor relations is given in our previous work. This paper suggests an approach to analyze rigid formations that have a leader-follower structure.	(pdf) (ps)
An Investigation Into the Detection of New Information	Barry Schiffman, Kathleen R. McKeown	2004-09-29	This paper explores new-information detection, describing a strategy for filtering a stream of documents to present only information that is fresh. We focus on multi-document summarization and seek to efficiently use more linguistic information than is often seen in such systems. We experimented with our linguistic system and with a more traditional sentence-based, vector-space system and found that a combination of the two approaches boosted performance over each one alone.	(pdf) (ps)
Machine Learning and Text Segmentation in Novelty Detection	Barry Schiffman, Kathleen R. McKeown	2004-09-29	This paper explores a combination of machine learning, approximate text segmentation and a vector-space model to distinguish novel information from repeated information. In experiments with the data from the Novelty Track at the Text Retrieval Conference, we show improvements over a variety of approaches, in particular in raising precision scores on this data, while maintaining a reasonable amount of recall.	(pdf) (ps)
Voice over TCP and UDP	Xiaotang Zhang, Henning Schulzrinne	2004-09-28	We compare UDP and TCP when transmitting voice data using PlanetLab where we can do experiments globally. For TCP, we also do experiments using TCP NODELAY which sends out requests immediately. We compare the performance of different protocols by their 90th percentile delay and jitter. The performance of UDP is better than that of TCP NODELAY and the performance TCP NODELAY is better than that of TCP. We also explore the relation between TCP delay time minus the transmission time and the packet loss rate and find there is a linear relationship between them.	(pdf) (ps)
Using Execution Transactions To Recover From Buffer Overflow Attacks	Stelios Sidiroglou, Angelos D. Keromytis	2004-09-13	We examine the problem of containing buffer overflow attacks in a safe and efficient manner. Briefly, we automatically augment source code to dynamically catch stack and heap-based buffer overflow and underflow attacks, and recover from them by allowing the program to continue execution. Our hypothesis is that we can treat each code function as a transaction that can be aborted when an attack is detected, without affecting the application's ability to correctly execute. Our approach allows us to selectively enable or disable components of this defensive mechanism in response to external events, allowing for a direct tradeoff between security and performance. We combine our defensive mechanism with a honeypot-like con guration to detect previously unknown attacks and automatically adapt an application's defensive posture at a negligible performance cost, as well as help determine a worm's signature. The main benefits of our scheme are its low impact on application performance, its ability to respond to attacks without human intervention, its capacity to handle previously unknown vulnerabilities, and the preservation of service availability. We implemented a stand-alone tool, DYBOC, which we use to instrument a number of vulnerable applications. Our performance benchmarks indicate a slow-down of 20% for Apache in full-protection mode, and 1.2% with partial protection. We validate our transactional hypothesis via two experiments: first, by applying our scheme to 17 vulnerable applications, successfully fixing 14 of them; second, by examining the behavior of Apache when each of 154 potentially vulnerable routines are made to fail, resulting in correct behavior in 139 of cases.	(pdf) (ps)
A Theoretical Analysis of the Conditions for Unambiguous Node Localization in Sensor Networks	Tolga Eren, Walter Whiteley, Peter N. Belhumeur	2004-09-13	In this paper we provide a theoretical foundation for the problem of network localization in which some nodes know their locations and other nodes determine their locations by measuring distances or bearings to their neighbors. Distance information is the separation between two nodes connected by a sensing/communication link. Bearing is the angle between a sensing/communication link and the x-axis of a node's local coordinate system. We construct grounded graphs to model network localization and apply graph rigidity theory and parallel drawings to test the conditions for unique localizability and to construct uniquely localizable networks. We further investigate partially localizable networks.	(pdf) (ps)
Modeling and Managing Content Changes in Text Databases	Panagiotis G. Ipeirotis, Alexandros Ntoulas, Junghoo Cho, Luis Gravano	2004-08-12	Large amounts of (often valuable) information are stored in web-accessible text databases. ``Metasearchers'' provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not need to change over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this paper, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use ``survival analysis'' techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases.	(pdf) (ps)
Cross-Dimensional Gestural Interaction Techniques for Hybrid Immersive Environments	Hrvoje Benko, Edward W. Ishak, Steven Feiner	2004-08-09	We present a set of interaction techniques for a hybrid user interface that integrates existing 2D and 3D visualization and interaction devices. Our approach is built around one- and two-handed gestures that support the seamless transition of data between co-located 2D and 3D contexts. Our testbed environment combines a 2D multi-user, multi-touch, projection surface with 3D head-tracked, see-through, head-worn displays and 3D tracked gloves to form a multi-display augmented reality. We also address some of the ways in which we can interact with private data in a collaborative, heterogeneous workspace.	(pdf) (ps)
Group Ratio Round-Robin: O(1) Proportional Share Scheduling for Uniprocessor and Multiprocessor Systems	Bogdan Caprita, Wong Chun Chan, Jason Nieh, Clifford Stein, Haoqiang Zheng	2004-07-30	Proportional share resource management provides a flexible and useful abstraction for multiplexing time-shared resources. We present Group Ratio Round-Robin ($GR^3$), the first proportional share scheduler that combines accurate proportional fairness scheduling behavior with $O(1)$ scheduling overhead on both uniprocessor and multiprocessor systems. $GR^3$ uses a novel client grouping strategy to organize clients into groups of similar processor allocations which can be more easily scheduled. Using this grouping strategy, $GR^3$ combines the benefits of low overhead round-robin execution with a novel ratio-based scheduling algorithm. $GR^3$ can provide fairness within a constant factor of the ideal generalized processor sharing model for client weights with a fixed upper bound and preserves its fairness properties on multiprocessor systems. We have implemented $GR^3$ in Linux and measured its performance against other schedulers commonly used in research and practice, including the standard Linux scheduler, Weighted Fair Queueing, Virtual-Time Round-Robin, and Smoothed Round-Robin. Our experimental results demonstrate that $GR^3$ can provide much lower scheduling overhead and much better scheduling accuracy in practice than these other approaches.	(pdf)
THINC: A Remote Display Architecture for Thin-Client Computing	Ricardo A. Baratto, Jason Nieh, Leo Kim	2004-07-29	Rapid improvements in network bandwidth, cost, and ubiquity combined with the security hazards and high total cost of ownership of personal computers have created a growing market for thin-client computing. We introduce THINC, a remote display system architecture for high-performance thin-client computing in both LAN and WAN environments. THINC transparently maps high-level application display calls to a few simple low-level commands which can be implemented easily and efficiently. THINC introduces a number of novel latency-sensitive optimization techniques, including offscreen drawing awareness, command buffering and scheduling, non-blocking display operation, native video support, and server-side screen scaling. We have implemented THINC in an XFree86/Linux environment and compared its performance with other popular approaches, including Citrix MetaFrame, Microsoft Terminal Services, SunRay, VNC, and X. Our experimental results on web and video applications demonstrate that THINC can be as much as five times faster than traditional thin-client systems in high latency network environments and is capable of playing full-screen video at full frame rate.	(pdf)
The Simplicity and Safety of the Language for End System Services (LESS)	Xiaotao Wu, Henning Schulzrinne	2004-07-20	This paper analyzes the simplicity and safety of the Language for End System Services (LESS).	(pdf) (ps)
Efficient Shadows from Sampled Environment Maps	Aner Ben-Artzi, Ravi Ramamoorthi, , Maneesh Agrawala	2004-06-10	This paper addresses the problem of efficiently calculating shadows from environment maps. Since accurate rendering of shadows from environment maps requires hundreds of lights, the expensive computation is determining visibility from each pixel to each light direction, such as by ray-tracing. We show that coherence in both spatial and angular domains can be used to reduce the number of shadow rays that need to be traced. Specifically, we use a coarse-to-fine evaluation of the image, predicting visibility by reusing visibility calculations from four nearby pixels that have already been evaluated. This simple method allows us to explicitly mark regions of uncertainty in the prediction. By only tracing rays in these and neighboring directions, we are able to reduce the number of shadow rays traced by up to a factor of 20 while maintaining error rates below 0.01\%. For many scenes, our algorithm can add shadowing from hundreds of lights at twice the cost of rendering without shadows.	(pdf) (ps)
Efficient Algorithms for the Design of Asynchronous Control Circuits	Michael Theobald	2004-05-27	Asynchronous (or ``clock-less'') digital circuit design has received much attention over the past few years, including its introduction into consumer products. One major bottleneck to the further advancement of clock-less design is the lack of optimizing CAD (computer-aided design) algorithms and tools. In synchronous design, CAD packages have been crucial to the advancement of the microelectronics industry. In fact, automated methods seem to be even more crucial for asynchronous design, which is widely considered as being much more error-prone. This thesis proposes several new efficient CAD techniques for the design of asynchronous control circuits. The contributions include: (i) two new and very efficient algorithms for hazard-free two-level logic minimization, including a heuristic algorithm, ESPRESSO-HF, and an exact algorithm based on implicit data structures, IMPYMIN; and (ii) a new synthesis and optimization method for large-scale asynchronous systems, which starts from a Control-Dataflow Graph (CDFG), and produces highly-optimized distributed control. As a case study, this latter method is applied to a differential equation solver; the resulting synthesized circuit is comparable in quality to a highly-optimized manual design.	(ps)
On decision trees, influences, and learning monotone decision trees	Ryan O'Donnell, Rocco A. Servedio	2004-05-26	In this note we prove that a monotone boolean function computable by a decision tree of size $s$ has average sensitivity at most $\sqrt{\log_2 s}$. As a consequence we show that monotone functions are learnable to constant accuracy under the uniform distribution in time polynomial in their decision tree size.	(pdf) (ps)
Orchestrating the Dynamic Adaptation of Distributed Software with Process Technology	Giuseppe Valetto	2004-05-24	Software systems are becoming increasingly complex to develop, understand, analyze, validate, deploy, configure, manage and maintain. Much of that complexity is related to ensuring adequate quality levels to services provided by software systems after they are deployed in the field, in particular when those systems are built from and operated as a mix of proprietary and non-proprietary components. That translates to increasing costs and difficulties when trying to operate large-scale distributed software ensembles in a way that continuously guarantees satisfactory levels of service. A solution can be to exert some form of dynamic adaptation upon running software systems: dynamic adaptation can be defined as a set of automated and coordinated actions that aim at modifying the structure, behavior and performance of a target software system, at run time and without service interruption, typically in response to the occurrence of some condition(s). To achieve dynamic adaptation upon a given target software system, a set of capabilities, including monitoring, diagnostics, decision, actuation and coordination, must be put in place. This research addresses the automation of decision and coordination in the context of an end-to-end and externalized approach to dynamic adaptation, which allows to address as its targets legacy and component-based systems, as well as new systems developed from scratch. In this approach, adaptation provisions are superimposed by a separate software platform, which operates from the outside of and orthogonally to the target application as a whole; furthermore, a single adaptation possibly spans concerted interventions on a multiplicity of target components. To properly orchestrate those interventions, decentralized process technology is employed for describing, activating and coordinating the work of a cohort of software actuators, towards the intended end-to-end dynamic adaptation. The approach outlined above, has been implemented in a prototype, code-named Workflakes, within the Kinesthetics eXtreme project investigating externalized dynamic adaptation, carried out by the Programming Systems Laboratory of Columbia University, and has been employed in a set of diverse case studies. This dissertation discusses and evaluates the concept of process-based orchestration of dynamic adaptation and the Workflakes prototype on the basis of the results of those case studies.	(pdf)
Elastic Block Ciphers: The Feistel Cipher Case	Debra L. Cook, Moti Yung, , Angelos Keromytis	2004-05-19	We discuss the elastic versions of block ciphers whose round function processes subsets of bits from the data block differently, such as occurs in a Feistel network and in MISTY1. We focus on how specific bits are selected to be swapped after each round when forming the elastic version, using an elastic version of MISTY1 and differential cryptanalysis to illustrate why this swap step must be carefully designed. We also discuss the benefit of adding initial and final key dependent permutations in all elastic block ciphers. The implementation of the elastic version of MISTY1 is analyzed from a performance perspective.	(pdf) (ps)
Exploiting the Structure in DHT Overlays for DoS Protection	Angelos Stavrou, Angelos Keromytis, Dan	2004-04-30	Peer to Peer (P2P) systems that utilize Distributed Hash Tables (DHTs) provide a scalable means to distribute the handling of lookups. However, this scalability comes at the expense of increased vulnerability to specific types of attacks. In this paper, we focus on insider denial of service (DoS) attacks on such systems. In these attacks, nodes that are part of the DHT system are compromised and used to flood other nodes in the DHT with excessive request traffic. We devise a distributed lightweight protocol that detects such attacks, implemented solely within nodes that participate in the DHT. Our approach exploits inherent structural invariants of DHTs to ferret out attacking nodes whose request patterns deviate from ``normal'' behavior. We evaluate our protocol's ability to detect attackers via simulation within a Chord network. The results show that our system can detect a simple attacker whose attack traffic deviates by as little as 5\% from a normal request traffic. We also demonstrate the resiliency of our protocol to coordinated attacks by up to as many as 25\% of nodes. Our work shows that DHTs can protect themselves from insider flooding attacks, eliminating an important roadblock to their deployment and use in untrusted environments.	(pdf) (ps)
Host-based Anomaly Detection Using Wrapping File Systems	Shlomo Hershkop, Linh H. Bui, Ryan Ferst, Salvatore J. Stolfo	2004-04-24	We describe an anomaly detector, called FWRAP, for a Host-based Intrusion Detection System that monitors file system calls to detect anomalous accesses. The system is intended to be used not as a standalone detector but one of a correlated set of host-based sensors. The detector has two parts, a sensor that audits file systems accesses, and an unsupervised machine learning system that computes normal models of those accesses.We report on the architecture of the file system sensor implemented on Linux using the FiST file wrapper technology and results of the anomaly detector applied to experimental data acquired from this sensor. FWRAP employs the Probabilistic Anomaly Detection (PAD) algorithm previously reported in our work on Windows Registry Anomaly Detection. The detector is first trained by operating the host computer for some amount of time and a model specific to the target machine is automatically computed by PAD, intended to be deployed to a real-time detector. In this paper we describe the feature set used to model file system accesses, and the performance results of a set of experiments using the sensor while attacking a Linux host with a variety of malware exploits. The PAD detector achieved impressive detection rates in some cases over 95\% and about a 2\% false positive rate when alarming on anomalous processes.	(pdf) (ps)
Self-Managing Systems: A Control Theory Foundation	Yixin Diao, Joseph L. Hellerstein, Sujay Parekh, Rean Griffith, Gail Kaiser, Dan Phung	2004-04-01	The high cost of ownership of computing systems has resulted in a number of industry initiatives to reduce the burden of operations and management. Examples include IBM's Autonomic Computing, HP's Adaptive Infrastructure, and Microsoft's Dynamic Systems Initiative. All of these efforts seek to reduce operations costs by increased automation, ideally to have systems be self-managing without any human intervention (since operator error has been identified as a major source of system failures). While the concept of automated operations has existed for two decades, as a way to adapt to changing workloads, failures and (more recently) attacks, the scope of automation remains limited. We believe this is in part due to the absence of a fundamental understanding of how automated actions affect system behavior, especially system stability. Other disciplines such as mechanical, electrical, and aeronautical engineering make use of control theory to design feedback systems. This paper uses control theory as a way to identify a number of requirements for and challenges in building self-managing systems, either from new components or layering on top of existing components.	(pdf)
Blurring of Light due to Multiple Scattering by the Medium, a Path Integral Approach	Michael Ashikhmin, Simon Premoze, Ravi R, Shree Nayar	2004-03-31	Volumetric light transport effects are significant for many materials like skin, smoke, clouds or water. In particular, one must consider the multiple scattering of light within the volume. Recently, we presented a path integral-based approach to this problem which identifies the most probable path light takes in the medium and approximates energy transport over all paths by only those surrounding this most probable one. In this report we use the same approach to derive useful expressions for the amount of spacial and angular blurring light experiences as it travels through a medium.	(pdf) (ps)
Jitter-Camera: High Resolution Video from a Low Resolution Detector	Moshe Ben-Ezra, Assaf Zomet, , Shree K. Nayar	2004-03-25	Video cameras must produce images at a reasonable frame-rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with a very low resolution. The resolution of videos can be computationally enhanced by moving the camera and applying super-resolution reconstruction algorithms. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super resolution. The conclusion is, that in order to achieve the highest resolution, motion blur should be avoided. Motion blur can be minimized by sampling the space-time volume of the video in a specific manner. We have developed a novel camera, called the jitter camera, that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has a significantly higher resolution than the captured one.	(pdf) (ps)
Improved Controller Synthesis from Esterel	Cristian Soviani, Jia Zeng, , Stephen A. Edwards	2004-03-22	We present a new procedure for automatically synthesizing controllers from high-level Esterel specifications. Unlike existing \textsc{rtl} synthesis approaches, this approach frees the designer from tedious bit-level state encoding and certain types of inter-machine communication. Experimental results suggest that even with a fairly primitive state assignment heuristic, our compiler consistently produces smaller, slightly faster circuits that the existing Esterel compiler. We mainly attribute this to a different style of distributing state bits throughout the circuit. Initial results are encouraging, but some hand-optimized encodings suggest room for a better state assignment algorithm. We are confident that such improvements will make our technique even more practical.	(pdf) (ps)
MobiDesk: Mobile Virtual Desktop Computing	Ricardo Baratto, Shaya Potter, Gong Su, , Jason Nieh	2004-03-19	We present MobiDesk, a mobile virtual desktop computing hosting infrastructure that leverages continued improvements in network speed, cost, and ubiquity to address the complexity, cost, and mobility limitations of today's personal computing infrastructure. MobiDesk transparently virtualizes a user's computing session by abstracting underlying system resources in three key areas: display, operating system and network. MobiDesk provides a thin virtualization layer that decouples a user's computing session from any particular end user device and moves all application logic from end user devices to hosting providers. MobiDesk virtualization decouples a user's computing session from the underlying operating system and server instance, enabling high availability service by transparently migrating sessions from one server to another during server maintenance or upgrades. We have implemented a MobiDesk prototype in Linux that works with existing unmodified applications and operating system kernels. Our experimental results demonstrate that MobiDesk has very low virtualization overhead, can provide a full-featured desktop experience including full-motion video support, and is able to migrate users' sessions efficiently and reliably for high availability, while maintaining existing network connections.	(pdf) (ps)
When one Sample is not Enough: Improving Text Database Selection Using Shrinkage	Panagiotis G. Ipeirotis, Luis Gravano	2004-03-17	Database selection is an important step when searching over large numbers of distributed text databases. The database selection task relies on statistical summaries of the database contents, which are not typically exported by databases. Previous research has developed algorithms for constructing an approximate content summary of a text database from a small document sample extracted via querying. Unfortunately, Zipf's law practically guarantees that content summaries built this way for any relatively large database will fail to cover many low-frequency words. Incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To improve the coverage of approximate content summaries, we build on the observation that topically similar databases tend to have related vocabularies. Therefore, the approximate content summaries of topically related databases can complement each other and increase their coverage. Specifically, we exploit a (given or derived) hierarchical categorization of the databases and adapt the notion of ``shrinkage'' --a form of smoothing that has been used successfully for document classification-- to the content summary construction task. A thorough evaluation over 315 real web databases as well as over TREC data suggests that the shrinkage-based content summaries are substantially more complete than their ``unshrunk'' counterparts. We also describe how to modify existing database selection algorithms to adaptively decide --at run-time-- whether to apply shrinkage for a query. Our experiments, which rely on TREC data sets, queries, and the associated ``relevance judgments,'' show that our shrinkage-based approach is significantly more accurate than state-of-the-art database selection algorithms, including a recently proposed hierarchical strategy that also exploits database classification.	(pdf) (ps)
Collaborative Distributed Intrusion Detection	Michael E. Locasto, Janak J. Parekh, Sal, Vishal Misra	2004-03-08	The rapidly increasing array of Internet-scale threats is a pressing problem for every organization that utilizes the network. Organizations often have limited resources to detect and respond to these threats. The sharing of information related to probes and attacks is a facet of an emerging trend toward "collaborative security." Collaborative security mechanisms provide network administrators with a valuable tool in this increasingly hostile environment. The perceived benefit of a collaborative approach to intrusion detection is threefold: greater clarity about attacker intent, precise models of adversarial behavior, and a better view of global network attack activity. While many organizations see value in adopting such a collaborative approach, several critical problems must be addressed before intrusion detection can be performed on an inter-organizational scale. These obstacles to collaborative intrusion detection often go beyond the merely technical; the relationships between cooperating organizations impose additional constraints on the amount and type of information to be shared. We propose a completely decentralized system that can efficiently distribute alerts to each collaborating peer. The system is composed of two major components that embody the main contribution of our research. The first component, named Worminator, is a tool for extracting relevant information from alert streams and encoding it in Bloom filters. The second component, Whirlpool, is a software system for scheduling correlation relationships between peer nodes. The combination of these systems accomplishes alert distribution in a scalable manner and without violating the privacy of each administrative domain.	(pdf) (ps)
Failover and Load Sharing in SIP Telephony	Kundan Singh, Henning Schulzrinne	2004-03-01	We apply some of the existing web server redundancy techniques for high service availability and scalability to the relatively new IP telephony context. The paper compares various failover and load sharing methods for registration and call routing servers based on the Session Initiation Protocol (SIP). In particular, we consider the SIP server failover techniques based on the clients, DNS (Domain Name Service), database replication and IP address takeover, and the load sharing techniques using DNS, SIP identifiers, network address translators and servers with same IP addresses. Additionally, we present an overview of the failover mechanism we implemented in our test-bed using our SIP proxy and registration server and the open source MySQL database.	(pdf) (ps)
Virtual Environment for Collaborative Distance Learning With Video Synchronization	Suhit Gupta, Gail Kaiser	2004-02-25	We present a 3D collaborative virtual environment, CHIME, in which geographically dispersed students can meet together in study groups or to work on team projects. Conventional educational materials from heterogeneous backend data sources are reflected in the virtual world through an automated metadata extraction and projection process that structurally organizes container materials into rooms and interconnecting doors, with atomic objects within containers depicted as furnishings and decorations. A novel in-world authoring tool makes it easy for instructors to design environments, with additional in-world modification afforded to the students themselves, in both cases without programming. Specialized educational services can also be added to virtual environments via programmed plugins. We present an example plugin that supports synchronized viewing of lecture videos by groups of students with widely varying bandwidths.	(pdf)
Optimizing Quality for Collaborative Video Viewing	Dan Phung, Giuseppe Valetto, Gail Kaiser, Suhit Gupta	2004-02-25	The increasing popularity of distance learning and online courses has highlighted the lack of collaborative tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources used by the students. We present an architecture and adaptation model called AI2TV (Adaptive Internet Interactive Team Video), a system that allows geographically dispersed participants, possibly some or all disadvantaged in network resources, to collaboratively view a video in synchrony. AI2TV upholds the invariant that each participant will view semantically equivalent content at all times. Video player actions, like play, pause and stop, can be initiated by any of the participants and the results of those actions are seen by all the members. These features allow group members to review a lecture video in tandem to facilitate the learning process. We employ an autonomic (feedback loop) controller that monitors clients' video status and adjusts the quality of the video according to the resources of each client. We show in experimental trials that our system can successfully synchronize video for distributed clients while, at the same time, optimizing the video quality given actual (fluctuating) bandwidth by adaptively adjusting the quality level for each participant.	(pdf) (ps)
Elastic Block Ciphers	Debra L. Cook, Moti Yung, , Angelos Keromytis	2004-02-25	We introduce a new concept of elastic block ciphers, symmetric-key encryption algorithms that for a variable size input do not expand the plaintext, (i.e., do not require plaintext padding), while maintaining the diffusion property of traditional block ciphers and adjusting their computational load proportionally to the size increase. Elastic block ciphers are ideal for applications where length-preserving encryption is most beneficial, such as protecting variable-length database entries or network packets. We present a general algorithmic schema for converting a traditional block cipher, such as AES, to its elastic version, and analyze the security of the resulting cipher. Our approach allows us to ``stretch'' the supported block size of a block cipher up to twice the original length, while increasing the computational load proportionally to the block size. Our approach does not allow us to use the original cipher as a ``black box'' (i.e., as an ideal cipher or a pseudorandom permutation as is used in constructing modes of encryption). Nevertheless, under some reasonable conditions on the cipher's structure and its key schedule, we reduce the security of the elastic version to that of the fixed size block cipher. This schema and the security reduction enable us to capitalize on secure ciphers and their already established security properties in elastic designs. We note that we are not aware of previous ``reduction type'' proofs of security in the area of concrete (i.e., non ``black-box'') block cipher design. Our implementation of the elastic version of AES, which accepts blocks of all sizes in the range 128 to 255 bits, was measured to be almost twice as fast when encrypting plaintext that is only a few bits longer than a full block (A128 bits), when compared to traditional ``pad and block-encrypt'' approach.	(pdf) (ps)
DotSlash: A Scalable and Efficient Rescue System for Handling Web Hotspots	Weibin Zhao, Henning Schulzrinne	2004-02-07	This paper describes DotSlash, a scalable and efficient rescue system for handling web hotspots. DotSlash allows different web sites to form a mutual-aid community, and use spare capacity in the community to relieve web hotspots experienced by any individual site. As a rescue system, DotSlash intervenes when a web site becomes heavily loaded, and is phased out once the workload returns to normal. It aims to complement existing web server infrastructure such as CDNs to handle short-term load spikes effectively, but is not intended to support a request load constantly higher than a web site's planned capacity. DotSlash is scalable, cost-effective, easy to use, self-configuring, and transparent to clients. It targets small web sites, although large web site can also benefit from it. We have implemented a prototype of DotSlash on top of Apache. Experiments show that DotSlash can provide an order of magnitude improvement for a web server in terms of the request rate supported and the data rate delivered to clients even if only HTTP redirect is used. Parts of this work may be applicable to other services such as the Grid computational services and media streaming.	(pdf) (ps)
Asymptotic bounds for $M^X/G/1$ processor sharing queues	Hanhua Feng, Vishal Misra	2004-02-03	This paper analyzed the asymptotic bounds of an $M/G/1$ processor sharing queue with bulk arrivals.	(pdf) (ps)
Secure Isolation and Migration of Untrusted Legacy Applications	Shaya Potter, Jason Nieh, , Dinesh Subhraveti	2004-01-27	Sting applications often contain security holes that are not patched until after the system has already been compromised. Even when software updates are applied to address security issues, they often result in system services being unavailable for some time. To address these system security and availability issues, we have developed peas and pods. A pea provides a least privilege environment that can restrict processes to the minimal subset of system resources needed to run. This mechanism enables the creation of environments for privileged program execution that can help with intrusion prevention and containment. A pod provides a group of processes and associated users with a consistent, machine-independent virtualized environment. Pods are coupled with a novel checkpoint-restart mechanism which allows processes to be migrated across minor operating system kernel versions with different security patches. This mechanism allows system administrators the flexibility to patch their operating systems immediately without worrying over potential loss of data or needing to schedule system downtime. We have implemented peas and pods in Linux without requiring any application or operating system kernel changes. Our measurements on real world desktop and server applications demonstrate that peas and pods impose little overhead and enable secure isolation and migration of untrusted applications.	(pdf) (ps)
Feature Interactions in Internet Telephony End Systems	Xiaotao Wu, Henning Schulzrinne	2004-01-24	Internet telephony end systems can offer many services. Different services may interfere with each other, a problem which is known as feature interaction. The feature interaction problem has existed in telecommunication systems for many years. The introduction of Internet telephony helps to solve some interaction problems due to the richness of its signaling information. However, many new feature interaction problems are also introduced in Internet telephony systems, especially in end systems, which are usually dumb in PSTN systems, but highly functional in Internet telephony systems. Internet telephony end systems, such as SIP soft-agents, can run on personal computers. The soft-agents can then perform call control and many other functions, such as presence information handling, instant messaging, and network appliance control. These new functionalities make the end system feature interaction problems more complicated. In this paper, we investigate ways features interact in Internet telephony end systems and propose a potential solution for detecting and avoiding feature interactions. Our solutions are based on the Session Initiation Protocol (SIP) and the Language for End System Services (LESS), which is a markup language specifically for end system service creation.	(pdf) (ps)
The Complexity of Fredholm Equations of the Second Kind: Noisy Information About Everything	Arthur G. Werschulz	2004-01-21	We study the complexity of Fredholm problems of the second kind $u - \int_\Omega k(\cdot,y)u(y)\,dy = f$. Previous work on the complexity of this problem has assumed that $\Omega$ was the unit cube~$I^d$. In this paper, we allow~$\Omega$ to be part of the data specifying an instance of the problem, along with~$k$ and~$f$. More precisely, we assume that $\Omega$ is the diffeomorphic image of the unit $d$-cube under a $C^{r_1}$ mapping~$\rho\:I^d\to I^l$. In addition, we assume that $k\in C^{r_2}(I^{2l})$ and $f\in W^{r_3,p}(I^l)$ with $r_3>l/p$. Our information about the problem data is contaminated by $\delta$-bounded noise. Error is measured in the $L_p$-sense. We find that the $n$th minimal error is bounded from below by $\Theta(n^{-\mu_1}+\delta)$ and from above by $\Theta(n^{-\mu_2}+\delta)$, where $$\mu_1 = \min\left\{\frac{r_1}{d}, \frac{r_2}{2d}, \frac{r_3-(d-l)/p}d\right\} \qquad\text{and}\qquad \mu_2 = \min\left\{\frac{r_1-\nu}d, \frac{r_2}{2d}, \frac{r_3-(l-d)/p}d\right\},$$ with $$\nu = \begin{cases} \displaystyle\frac{d}p & \text{if $r_1\ge 2$, $r_2\ge2$, and $d\le p$},\\ & \\ 1 & \text{otherwise}. \end{cases}$$ In particular, the $n$th minimal error is proportional to $\Theta(n^{-\mu_1}+\delta)$ when $p=\infty$. The upper bound is attained by a noisy modified Galerkin method, which can be efficiently implemented using multigrid techniques. We thus find bounds on the $\varepsilon$-complexity of the problem, these bounds depending on the cost $\mathbf{c}(\delta)$ of calculating a $\delta$-noisy function value. As an example, if $\mathbf{c}(\delta)=\delta^{-b}$, we find that the $\varepsilon$-complexity is between $(1/\varepsilon)^{b+1/\mu_1}$ and $(1/\varepsilon)^{b+1/\mu_2}$.	(pdf) (ps)
Secret Key Cryptography Using Graphics Cards	Debra L. Cook, John Ioannidis, Angelos D, Jake Luck	2004-01-13	One frequently cited reason for the lack of wide deployment of cryptographic protocols is the (perceived) poor performance of the algorithms they employ and their impact on the rest of the system. Although high-performance dedicated cryptographic accelerator cards have been commercially available for some time, market penetration remains low. We take a different approach, seeking to exploit {\it existing system resources,} such as Graphics Processing Units (GPUs) to accelerate cryptographic processing. We exploit the ability for GPUs to simultaneously process large quantities of pixels to offload cryptographic processing from the main processor. We demonstrate the use of GPUs for stream ciphers, which can achieve 75\% the performance of a fast CPU. We also investigate the use of GPUs for block ciphers, discuss operations that make certain ciphers unsuitable for use with a GPU, and compare the performance of an OpenGL-based implementation of AES with implementations utilizing general CPUs. In addition to offloading system resources, the ability to perform encryption and decryption within the GPU has potential applications in image processing by limiting exposure of the plaintext to within the GPU.	(pdf) (ps)
Automating Content Extraction of HTML Documents	Suhit Gupta, Gail Kaiser, Peter Grimm, M, Justin Starren	2004-01-06	Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction of "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike "Content Reformatting", which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses "Content Extraction". We have developed a framework that employs an easily extensible set of techniques. It incorporates advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy to extract content from HTML web pages. This proxy can be used both centrally, administered for groups of users, as well as by individuals for personal browsers. We have also, after receiving feedback from users about the proxy, created a revised version with improved performance and accessibility in mind.	(pdf) (ps)
AIM Encrypt: A Case Study of the Dangers of Cryptographic Urban Legends	Michael E. Locasto	2003-11-26	Like e--mail, instant messaging (IM) has become an integral part of life in a networked society. Until recently, IM software has been lax about providing confidentiality and integrity of these conversations. With the introduction of AOL's version 5.2.3211 of the AIM client, users can optionally encrypt and protect the integrity of their conversation. Taking advantage of the encryption capabilities of the AIM client requires that signed certificates for both parties be available. AIM (through VeriSign) makes such certificates available for purchase. However, in a ``public service'' effort to defray the cost of purchasing personal certificates to protect IM conversations, a website (www.aimencrypt.com) is offering a certificate free of cost for download. Unfortunately, the provided certificate is the same for everyone; this mistake reveals the dangers of a public undereducated about computer security, especially public key cryptography.	(pdf) (ps)
Countering Network Worms Through Automatic Patch Generation	Stelios Sidiroglou, Angelos D. Keromytis	2003-11-19	The ability of worms to spread at rates that effectively preclude human-directed reaction has elevated them to a first-class security threat to distributed systems. We propose an architecture for automatically repairing software flaws that are exploited by zero-day worms.Our approach relies on source code transformations to quickly apply automatically-created (and tested) localized patches to vulnerable segments of the targeted application. To determine these susceptible portions, we use a sandboxed instance of the application as a ``clean room'' laboratory that runs in parallel with the production system and exploit the fact that a worm must reveal its infection vector to achieve its goal ( i.e., further infection). We believe our approach to be the first end-point solution to the problem of malicious self-replicating code. The primary benefits of our approach are (a) its low impact on application performance, (b) its ability to respond to attacks without human intervention, and (c) its capacity to deal with zero-day worms (for which no known patches exist). Furthermore, our approach does not depend on a centralized update repository, which can be the target of a concerted attack similar to the Blaster worm. Finally, our approach can also be used to protect against lower intensity attacks, such as intrusion (``hack-in'') attempts. To experimentally evaluate the efficacy of our approach, we use our prototype implementation to test a number of applications with known vulnerabilities. Our preliminary results indicate a success rate of 82\%, and a maximum repair time of 8.5 seconds.	(pdf) (ps)
Dynamical Systems Trees	Tony Jebara, Andrew Howard	2003-11-05	We propose dynamical systems trees (DSTs) as a fexible model for describing multiple processes that interact via a hierarchy of aggregating processes. DSTs extend nonlinear dynamical systems to an interactive group scenario. Various individual processes interact as communities and sub-communities in a tree structure that is un-rolled in time. To accommodate nonlinear temporal activity, each individual leaf process is modeled as a dynamical system containing discrete and/or continuous hidden states with discrete and/or Gaussian emissions. Subsequent, higher level parent processes act like hidden Markov models that mediate the interaction between leaf processes or between other parent processes in the hierarchy. Aggregator chains are parents of the child processes the combine and mediate, yielding a compact overall parameterization. We provide tractable inference and learning algorithms for arbitrary DSTs topologies via structured mean field. Experiments are shown for real trajectory data of tracked American football plays where a DST tracks players as dynamical systems mediated by their team processes mediated in turn by a top-level game process.	(pdf) (ps)
Comprehensive Multi-platform Collaboration	Kundan Singh, Xiaotao Wu, Jonathan Lenno, Henning Schulzrinne	2003-10-26	We describe the architecture and implementation of our comprehensive multi-platform collaboration framework known as Columbia InterNet Extensible Multimedia Architecture (CINEMA). It provides a distributed architecture for collaboration using synchronous communications like multimedia conferencing, instant messaging, shared web-browsing, and asynchronous communications like discussion forums, shared files, voice and video mails. It allows seamless integration with various communication means like telephones, IP phones, web and electronic mail. In addition, it provides value-added services such as call handling based on location information and presence status. The paper discusses the media services needed for collaborative environment, the components provided by CINEMA and the interaction among those components.	(pdf) (ps)
Retrofitting Autonomic Capabilities onto Legacy Systems	Janak Parekh, Gail Kaiser, Philip Gross, , Giuseppe Valetto	2003-10-16	Autonomic computing - self-configuring, self-healing, self-optimizing applications, systems and networks - is a promising solution to ever-increasing system complexity and the spiraling costs of human management as systems scale to global proportions. Most results to date, however, suggest ways to architect new software constructed from the ground up as autonomic systems, whereas in the real world organizations continue to use stovepipe legacy systems and/or build "systems of systems" that draw from a gamut of disparate technologies from numerous vendors. Our goal is to retrofit autonomic computing onto such systems, externally, without any need to understand, modify or even recompile the target system's code. We present an autonomic infrastructure that operates similarly to active middleware, to explicitly add autonomic services to pre-existing systems via continual monitoring and a feedback loop that performs, as needed, reconfiguration and/or repair. Our lightweight design and separation of concerns enables easy adoption of individual components, independent of the rest of the full infrastructure, for use with a large variety of target systems. This work has been validated by several case studies spanning multiple application domains.	(pdf) (ps)
Evaluating Content Selection in Human- or Machine-Generated Summaries: The Pyramid Scoring Method	Rebecca J. Passonneau, Ani Nenkova	2003-09-03	From the outset of automated generation of summaries, the difficulty of evaluation has been widely discussed. Despite many promising attempts, we believe it remains an unsolved problem. Here we present a method for scoring the content of summaries of any length against a weighted inventory of content units, which we refer to as a pyramid. Our method is derived from empirical analysis of human-generated summaries, and provides an informative metric for human or machine-generated summaries.	(pdf) (ps)
Security testing of SIP implementations	Christian Wieser, Marko Laakso, , Henning Schulzrinne	2003-08-25	The Session Initiation Protocol (SIP) is a signaling protocol for Internet telephony, multimedia conferencing and instant messaging. Although SIP implementations have not yet been widely deployed, the product portfolio is expanding rapidly. We describe a method to assess the robustness of SIP implementation by describing a tool to find vulnerabilities. We prepared the test material and carried out tests against a sample set of existing implementations. Results were reported to the vendors and the test suite was made publicly available. Many of the implementations available for evaluation failed to perform in a robust manner under the test. Some failures had information security implications, and should be considered vulnerabilities.	(pdf) (ps)
Integrating Categorization, Clustering, and Summarization for Daily News Browsing	Regina Barzilay, David Evans, Vasileios, Sergey Sigelman	2003-08-12	Recently, there have been significant advances in several areas of language technology, including clustering, text categorization, and summarization. However, efforts to combine technology from these areas in a practical system for information access have been limited. In this paper, we present a system that integrates cutting-edge technology in these areas to automatically collect news articles from multiple sources, organize them and present them in both hierarchical and text summary form. Our system is publicly available and runs daily over real data. Through a sizable user evaluation, we show that users strongly prefer using the advanced features incorporated in our system, and that these features help users achieve more efficient browsing of news.	(pdf) (ps)
9-1-1 Calls for Voice-over-IP	Henning Schulzrinne	2003-08-10	This document enumerates some of the major opportunities and challenges for providing emergency call (9-1-1) services using IP technology. In particular, all VoIP devices are effectively mobile. The same IP telephony device works anywhere in the Internet, keeping the same external identifier such as an E.164 number or URL. (Note: This was also submitted as an ex-parte filing to the Federal Communications Commission.)	(pdf) (ps)
A Holistic Approach to Service Survivability	Angelos D. Keromytis, Janak Parekh, Phil, Sal Stolfo	2003-07-10	We present SABER (Survivability Architecture: Block, Evade, React), a proposed survivability architecture that blocks, evades and reacts to a variety of attacks by using several security and survivability mechanisms in an automated and coordinated fashion. Contrary to the ad hoc manner in which contemporary survivable systems are built--using isolated, independent security mechanisms such as firewalls, intrusion detection systems and software sandboxes--SABER integrates several different technologies in an attempt to provide a unified framework for responding to the wide range of attacks malicious insiders and outsiders can launch. This coordinated multi-layer approach will be capable of defending against attacks targeted at various levels of the network stack, such as congestion-based DoS attacks, software-based DoS or code-injection attacks, and others. Our fundamental insight is that while multiple lines of defense are useful, most conventional, uncoordinated approaches fail to exploit the full range of available responses to incidents. By coordinating the response, the ability to survive even in the face of successful security breaches increases substantially. We discuss the key components of SABER, how they will be integrated together, and how we can leverage on the promising results of the individual components to improve survivability in a variety of coordinated attack scenarios. SABER is currently in the prototyping stages, with several interesting open research topics.	(pdf) (ps)
Using Process Technology to Control and Coordinate Software Adaptation	Giuseppe Valetto, Gail Kaiser	2003-07-09	We have developed an infrastructure for end-to-end run-time monitoring, behavior/performance analysis, and dynamic adaptation of distributed software. This infrastructure is primarily targeted to pre-existing systems and thus operates outside the target application, without making assumptions about the target's implementation, internal communication/computation mechanisms, source code availability, etc. This paper assumes the existence of the monitoring and analysis components, presented elsewhere, and focuses on the mechanisms used to control and coordinate possibly complex repairs/reconfigurations to the target system. These mechanisms require lower level effectors somehow attached to the target system, so we briefly sketch one such facility (elaborated elsewhere). Our main contribution is the model, architecture, and implementation of Workflakes, the decentralized process engine we use to tailor, control, coordinate, etc. a cohort of such effectors. We have validated the Workflakes approach with case studies in several application domains. Due to space restrictions we concentrate primarily on one case study, briefly discuss a second, and only sketch others.	(pdf) (ps)
Kinesthetics eXtreme: An External Infrastructure for Monitoring Distributed Legacy Systems	Gail Kaiser, Janak Parekh, Philip Gross, Giuseppe Valetto	2003-07-06	Autonomic computing - self-configuring, self-healing, self-optimizing applications, systems and networks - is widely believed to be a promising solution to ever-increasing system complexity and the spiraling costs of human system management as systems scale to global proportions. Most results to date, however, suggest ways to architect new software constructed from the ground up as autonomic systems, whereas in the real world organizations continue to use stovepipe legacy systems and/or build "systems of systems" that draw from a gamut of new and legacy components involving disparate technologies from numerous vendors. Our goal is to retrofit autonomic computing onto such systems, externally, without any need to understand or modify the code, and in many cases even when it is impossible to recompile. We present a meta-architecture implemented as active middleware infrastructure to explicitly add autonomic services via an attached feedback loop that provides continual monitoring and, as needed, reconfiguration and/or repair. Our lightweight design and separation of concerns enables easy adoption of individual components, as well as the full infrastructure, for use with a large variety of legacy, new systems, and systems of systems. We summarize several experiments spanning multiple domains.	(pdf) (ps)
Group Round Robin: Improving the Fairness and Complexity of Packet Scheduling	Bogdan Caprita, Wong Chun Chan, Jason Nieh	2003-07-02	We introduce Group Round-Robin (GRR) scheduling, a hybrid scheduling framework based on a novel grouping strategy that narrows down the traditional tradeoff between fairness and computational complexity. GRR combines its grouping strategy with a specialized round-robin scheduling algorithm that utilizes the properties of GRR groups to schedule flows within groups in a manner that provides O(1) bounds on fairness with only O(1) time complexity. Under the practical assumption that GRR employs a small constant number of groups, we apply GRR to popular fair queueing scheduling algorithms and show how GRR can be used to achieve constant bounds on fairness and time complexity for these algorithms.	(pdf)
A General Framework for Designing Catadioptric Imaging and Projection Systems	Rahul Swaminathan, Michael D. Grossberg, Shree K. Nayar	2003-07-01	New vision applications have been made possible and old ones improved through the creation and design of novel catadioptric systems. Critical to the design of catadioptric imaging is determining the shape of one or more mirrors in the system. Almost all the previously designed mirrors for catadioptric systems used case specific tools and considerable effort on the part of the designer. Recently some new general methods have been proposed to automate the design process. However, all the methods presented so far determine the mirror shape by optimizing its geometric properties, such as surface normal orientations. A more principled approach is to determine a mirror that reduces image errors. In this paper we present a method for finding mirror shapes which meet user determined specifications while minimizing image error. We accomplish this by deriving a first order approximation of the image error. This permits us to compute the mirror shape using a linear approach that provides good results efficiently while avoiding the numerical problems associated with non-linear optimization. Since the design of mirrors can also be applied to projection systems, we also provide a method to approximate projection errors in the scene. We demonstrate our approach on various catadioptric systems and show our approach to provide much more accurate imaging characteristics. In some cases we achieved reduction in image error up to 80 percent.	(pdf)
Using Prosodic Features of Speech and Audio Localization in Graphical User Interfaces	Alex Olwal, Steven Feiner	2003-06-26	We describe several approaches for using prosodic features of speech and audio localization to control inter-active applications. This information can be used for parameter control, as well as for disambiguating speech recognition. We discuss how characteristics of the spoken sentences can be exploited in the user interface; for example, by considering the speed with which the sentence was spoken and the presence of extraneous utterances. We also show how coarse audio localization can be used for low-fidelity gesture tracking, by inferring the speaker's head position.	(pdf) (ps)
Statistical Acquisition of Content Selection Rules for Natural Language Generation	Pablo A. Duboue, Kathleen R. McKeown	2003-05-30	A Natural Language Generation system produces text using as input semantic data. One of its very first tasks is to decide which pieces of information to convey in the output. This task, called Content Selection, is quite domain dependent, requiring considerable re-engineering to transport the system from one scenario to another. In (Duboue and McKeown, 2003), we presented a method to acquire content selection rules automatically from a corpus of text and associated semantics. Our proposed technique was evaluated by comparing its output with information selected by human authors in unseen texts, where we were able to filter half the input data set without loss of recall. This report contains additional technical information about our system.	(pdf) (ps)
A platform for multilingual news summarization	David Kirk Evans, Judith L. Klavans	2003-05-23	We have developed a multilingual version of Columbia Newsblaster as a testbed for multilingual multi-document summarization. The system collects, clusters, and summarizes news documents from sources all over the world daily. It crawls news sites in many different countries, written in different languages, extracts the news text from the HTML pages, uses a variety of methods to translate the documents for clustering and summarization, and produces an English summary for each cluster. The system is robust, running daily over real-world data. The multilingual version of Columbia Newsblaster provides a platform for testing different strategies for multilingual document clustering, and approaches for multilingual multi-document summarization.	(pdf) (ps)
Querying Faceted Databases	Kenneth Ross, Angel Janevski	2003-05-21	Faceted classification allows one to model applications with complex classification hierarchies using orthogonal dimensions. Recent work has examined the use of faceted classification for browsing and search. In this paper, we go further by developing a general query language, called the entity algebra, for hierarchically classified data. The entity algebra is compositional, with query inputs and outputs being sets of entities. Our language has linear data complexity in terms of space and quadratic data complexity in terms of time. We compare the entity algebra with the relational algebra in terms of expressiveness. We also describe an implementation of the language in the context of two application domains, one for an archeological database, and another for a human anatomy database.	(pdf) (ps)
Group Ratio Round-Robin: An O(1) Proportional Share Scheduler	Wong Chun Chan, Jason Nieh	2003-05-15	Proportional share resource management provides a flexible and useful abstraction for multiplexing time-shared resources. However, previous proportional share mechanisms have either weak proportional sharing accuracy or high scheduling overhead. We present Group Ratio Round-Robin (GR3), a proportional share scheduler that can provide high proportional sharing accuracy with O(1) scheduling overhead. Unlike many other schedulers, a low-overhead GR3 implementation is easy to build using simple data structures. We have implemented GR3 in Linux and measured its performance against other schedulers commonly used in research and practice, including the standard Linux scheduler, Weighted Fair Queueing, Virtual-Time Round-Robin, and Smoothed Round-Robin. Our experimental results demonstrate that GR3 can provide much lower scheduling overhead and better scheduling accuracy in practice than these other approaches for large numbers of clients.	(pdf) (ps)
Approximate String Joins in a Database (Almost) for Free -- Erratum	Luis Gravano, Panagiotis G. Ipeirotis, H, Divesh Srivastava	2003-05-14		(pdf) (ps)
Newsblaster Russian-English Clustering Performance Analysis	Lawrence J. Leftin	2003-05-12	The Natural Language Group is developing a multi-language version of Columbia Newsblaster, a program that generates summaries of news articles collected from web sites. Newsblaster currently processes articles in Arabic, Japanese,Portuguese, Spanish, and Russian, as well as English. This report outlines the Russian language processing software,focusing on machine translation and document clustering. Russian-English clustering results are analyzed and indicate encouraging inter-language and intra-language performance.	(pdf) (ps)
Design Languages for Embedded Systems	Stephen A. Edwards	2003-05-11	Embedded systems are application-specific computers that interact with the physical world. Each has a diverse set of tasks to perform, and although a very flexible language might be able to handle all of them, instead a variety of problem-domain-specific languages have evolved that are easier to write, analyze, and compile. This paper surveys some of the more important languages, introducing their central ideas quickly without going into detail. A small example of each is included.	(pdf) (ps)
Parallel Probing of Web Databases for Top-k Query Processing	Am?lie Marian, Luis Gravano	2003-05-09	A top-k query specifies a set of preferred values for the attributes of a relation and expects as a result the k objects that are closest to the given preferences according to some distance function. In many web applications, the relation attributes are only available via probes to autonomous web-accessible sources. Probing these sources sequentially to process a top-k query is inefficient, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. These characteristics of web sources motivate the introduction of parallel top-k query processing strategies, which are the focus of this paper. We present efficient techniques that maximize source-access parallelism to minimize query response time, while satisfying source access constraints. A thorough experimental evaluation over both synthetic and real web sources shows that our techniques can be significantly more efficient than previously proposed sequential strategies. In addition, we adapt our parallel algorithms for the alternate optimization goal of minimizing source load while still exploiting source-access parallelism.	(pdf) (ps)
A Multi-resolution Block Storage Model for Database Design	Jingren Zhou, Kenneth A. Ross	2003-05-05	We propose a new storage model called MBSM (Multi-resolution Block Storage Model) for laying out tables on disks. MBSM is intended to speed up operations such as scans that are typical of data warehouse workloads. Disk blocks are grouped into ``super-blocks,'' with a single record stored in a partitioned fashion among the blocks in a super-block. The intention is that a scan operation that needs to consult only a small number of attributes can access just those blocks of each super-block that contain the desired attributes. To achieve good performance given the physical characteristics of modern disks, we organize super-blocks on the disk into fixed-size ``mega-blocks.'' Within a mega-block, blocks of the same type (from various super-blocks) are stored contiguously. We describe the changes needed in a conventional database system to manage tables using such a disk organization. We demonstrate experimentally that MBSM outperforms competing approaches such as NSM (N-ary Storage Model), DSM (Decomposition Storage Model) and PAX (Partition Attributes Across), for I/O bound decision-support workloads consisting of scans in which not all attributes are required. This improved performance comes at the expense of single-record insert and delete performance; we quantify the trade-offs involved. Unlike DSM, the cost of reconstructing a record from its partitions is small. MBSM stores attributes in a vertically partitioned manner similar to PAX, and thus shares PAX's good CPU cache behavior. We describe methods for mapping attributes to blocks within super-blocks in order to optimize overall performance, and show how to tune the super-block and mega-block sizes.	(pdf) (ps)
A Hybrid Approach for Answering Definitional Questions	Sasha Blair-Goldensohn, Kathleen R. McKe, Andrew Hazen Schlaikjer	2003-04-30	We present DefScriber, a fully implemented system that combines knowledge-based and statistical methods in forming multi-sentence answers to open-ended definitional questions of the form, ``What is X?'' We show how a set of definitional predicates proposed as the knowledge-based side of our approach can be used to guide the selection of definitional sentences. Finally, we present results of an evaluation of definitions generated by DefScriber from Internet documents.	(pdf) (ps)
SWAP: A Scheduler With Automatic Process Dependency Detection	Haoqiang Zheng, Jason Nieh	2003-04-18	Cooperating processes are increasingly used to structure modern applications in common client-server computing environments. This cooperation among processes often results in dependencies such that a certain process cannot proceed until other processes finish some tasks. Despite the popularity of using cooperating processes in application design, operating systems typically ignore process dependencies and schedule processes independently. This can result in poor system performance due to the actual scheduling behavior contradicting the desired scheduling policy. To address this problem, we have developed SWAP, a system that automatically detects process dependencies and accounts for such dependencies in scheduling. SWAP uses system call history to determine possible resource dependencies among processes in an automatic and fully transparent fashion. Because some dependencies cannot be precisely determined, SWAP associates confidence levels with dependency information that are dynamically adjusted using feedback from process blocking behavior. SWAP can schedule processes using this imprecise dependency information in a manner that is compatible with existing scheduling mechanisms and ensures that actual scheduling behavior corresponds to the desired scheduling policy in the presence of process dependencies. We have implemented SWAP in Linux and measured its effectiveness on microbenchmarks and real applications. Our experiment results show that SWAP has low overhead and can provide substantial improvements in system performance in scheduling processes with dependencies.	(pdf) (ps)
Projecting XML Documents	Amelie Marian, Jerome Simeon	2003-04-14	XQuery is not only useful to query XML in databases, but also to applications that must process XML documents as files or streams. These applications suffer from the limitations of current main-memory XQuery processors which break for rather small documents. In this paper we propose techniques, based on a notion of projection for XML, which can be used to drastically reduce memory requirements in XQuery processors. The main contribution of the paper is a static analysis technique that can identify at compile time which parts of the input document are needed to answer an arbitrary XQuery. We present a loading algorithm that takes the resulting information to build a projected document, which is smaller than the original document, and on which the query yields the same result. We implemented projection in the Galax XQuery processor. Our experiments show that projection reduces memory requirements by a factor of 20 on average, and is effective for a wide variety of queries. In addition, projection results in some speedup during query evaluation.	(pdf) (ps)
Personalized Search of the Medical Literature: An Evaluation	Vasileios Hatzivassililoglou, Simone Teu, Sergey Sigelman	2003-03-31	We describe a system for personalizing a set of medical journal articles (possibly created as the output of a search engine) by selecting those documents that specifically match a patient under care. Key element in our approach is the use of targeted parts of the electronic patient record to serve as a readily available user model for the personalization task. We discuss several enhancements to a TF*IDF based approach for measuring the similarity between articles and the patient record. We also present the results of an experiment involving almost 3,000 relevance judgments by medical doctors. Our evaluation establishes that the automated system surpasses in performance alternative methods for personalizing the set of articles, including keyword-based queries manually constructed by medical experts for this purpose.	(pdf) (ps)
A Framework for 3D Pushbroom Imaging	Naoyuki Ichimura, Shree K. Nayar	2003-03-14	Pushbroom cameras produce one-dimensional images of a scene with high resolution at a high frame-rate. As a result, they provide superior data compared to conventional two-dimensional cameras in cases where the scene of interest can be temporally scanned. In this paper, we consider the problem of recovering the structure of a scene using a set of pushbroom cameras. Although pushbroom cameras have been used to recover scene structure in the past, the algorithms for recovery were developed separately for different camera motions such as translation and rotation. In this paper, we present a general framework of structure recovery for pushbroom cameras with 6 degree-of-freedom motion. We analyze the translation and rotation cases using our framework and demonstrate that several previous results are really special cases of our result. Using this framework, we also show that three or more pushbroom cameras can be used to compute scene structure as well as motion of translation or rotation. We conclude with a set of experiments that demonstrate the use of pushbroom imaging to recover structure from unknown motion.	(pdf) (ps)
Improving the Coherence of Multi-document Summaries: a Corpus Study for Modeling the Syntactic Realization of Entities	Ani Nenkova, Kathleen McKeown	2003-03-04	References included in multi-document summaries are often problematic. In this paper, we present a corpus study performed to derive statistical models for the syntactic realization of referential expressions. Our work shows how the syntactic realization of entities can influence the coherence of the text and provides a model for rewriting references in multi-document summaries to smooth disfluencies. It shows how the syntactic realization of entities can influence the coherence of the text and how rewrite change s can smooth the disfluencies. A large corpus study is conducted in order to derive initial models for syntactic realization.	(pdf) (ps)
Protecting SNMP Through MarketNet	Marcela Jackowski	2002-12-19	As dependency on information technology becomes more critical so does the need for network computer security. Because of the distributed nature of networks, large-scale information systems are highly vulnerable to negative elements such as intruders and attackers. The types of attack on a system can be diverse and from different sources. Some of the factors contributing to creating an insecure system are the relentless pace of technology, the need for information processing, and the heterogeneity of hardware and software. In addition to these insecurities, the growth and success of e-commerce make networks a desirable target for intruders to steal credit card numbers, bank account balances, and other valuable information. This paper looks at two different security technologies, SNMP v3 and MarketNet, their architectures and how they have been developed to protect network resources and services, such as, internet applications, devices, and other services, against attacks.	(pdf) (ps)
Survivor: An Approach for Adding Dependability to Legacy Workflow Systems	Jean-Denis Gr??ze, Gail E. Kaiser, Gaura	2002-12-02	Although they often provide critical services, most workflow systems are not dependable. There has been much literature on dependable/survivable distributed systems; most is concerned with developing new architectures, not adapting pre-existing ones. Additionally, the literature is focused on hardening, security-based defense, as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructures; what is more pragmatic are ways in which existing distributed systems can be adapted to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modified an existing workflow system provided by the Naval Research Lab.	(pdf) (ps)
CASPER: Compiler-Assisted Securing of Programs at Runtime	Gaurav S. Kc, Stephen A. Edwards, Gail E	2002-11-19	Ensuring the security and integrity of computer systems deployed on the Internet is growing harder. This is especially true in the case of server systems based on open source projects like Linux, Apache, Sendmail, etc. since it is easier for a hacker to get access to the binary format of deployed applications if the source code to the software is publicly accessible. Often, having a binary copy of an application program is enough to help locate security vulnerabilities in the program. In the case of legacy systems where the source code is not available, advanced reverse-engineering and decompilation techniques can be used to construct attacks. This paper focuses on measures that reduce the effectiveness of hackers at conducting large-scale, distributed attacks. The first line of defense involves additional runtime checks that are able to counteract the majority of hacking attacks. Introducing diversity in deployed systems to severely diminish the probability of propagation to other systems helps to prevent effective attacks like the DDOS attack against the DNS root servers in October 21, 2002.	(pdf) (ps)
DOM-based Content Extraction of HTML Documents	Suhit Gupta, Gail Kaiser, David Neistadt	2002-11-15	Web pages often contain clutter around the body of the article as well as distracting features that take away from the true information that the user is pursuing. This can range from pop-up ads to flashy banners to unnecessary images and links scattered around the screen. Extraction of "useful and relevant" content from web pages, has many applications ranging from lightweight environments, like cell phone and PDA browsing, to speech rendering for the visually impaired, to text summarization Most approaches to removing the clutter or making the content more readable involves either changing the size of the font or simply removing certain HTML-denoted components like images, thus taking away from the webpage's inherent look and feel. Unlike Content Reformatting, which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses Content Extraction. We have developed a framework that employs an easily extensible set of techniques that incorporate advantages of previous work on content extraction while limiting the disadvantages. Our key insight is to work with the Document Object Model tree (after parsing and correcting the HTML), rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy that anyone can use to extract content from HTML web pages for their own purposes.	(pdf) (ps)
On Buffered Clos Switches	Santosh Krishnan, Henning Schulzrinne	2002-11-13	There is a widespread interest in switching architectures that can scale in capacity with increasing interface transmission rates and higher port counts. Furthermore, packet switches that provide Quality of Service (QoS), such as bandwidth and delay guarantees, to the served user traffic are also highly desired. This report addresses the issue of constructing a high-capacity QoS-capable, multi-module switching node. Output queued switches provide the best performance in terms of throughput as well as QoS but do not scale. Input queued switches, on the other hand, require complex arbitration procedures to achieve the same level of performance. We enumerate the design constraints in the construction of a packet switch and present several approaches to build a system composed of lower-capacity memory and space elements, and analyze their performance. Towards this goal, we establish a new taxonomy for a class of switches, which we call Buffered Clos switches, and present a formal framework for optimal packet switching performance, in terms of both throughput and QoS. Within the taxonomy, we augment the existing combined input-output queueing (CIOQ) systems with Aggregation and Pipelining techniques. Furthermore, we present the design and analysis of a novel parallel packet switch architecture. For the items in the taxonomy, we present algorithms that provide optimal throughput and QoS, in accordance with the above performance framework. While some of the presented ideas are still in the investigative stage, we believe that the current state of the work, especially the formal treatment of switching, will be beneficial to the ongoing research in the field.	(pdf) (ps)
An Overview of Information-Based Complexity	Arthur G. Werschulz	2002-10-16	\emph{Computational complexity} has two goals: finding the inherent cost of some problem, and finding optimal algorithms for solving this problem. \emph{Information-based complexity} (IBC) studies the complexity of problems for which the available information is partial, contaminated by error, and priced. Examples of such problems include integration, approximation, ordinary and partial differential equations, integral equations, and the solution of nonlinear problems such as root-finding and optimization. In this talk, we give a brief overview of IBC. We focus mainly on the integration problem (which is a simple, yet important, problem that can be used to illustrate the main ideas of IBC) and the approximation problem (which will be of most interest to specialists in learning theory). One important issue that we discuss is the ``curse of dimension''---the depressing fact that the worst case complexity of many problems depends exponentially on dimension, rendering them intractable. We explore IBC-based techniques for vanquishing the curse of dimension. In particular, we find that randomization beats intractability for the integration problem but not for the approximation problem; on the other hand, both these problems are tractable in the average case setting under a Wiener sheet measure.	(pdf) (ps)
An Approach to Autonomizing Legacy Systems	Gail Kaiser, Phil Gross, Gaurav Kc, Janak Parekh, Guiseppe Valetto	2002-09-14	Adding adaptation capabilities to existing distributed systems is a major concern. The question addressed here is how to retrofit existing systems with self-healing, adaptation and/or self management capabilities. The problem is obviously intensified for "systems of systems" composed of components, whether new or legacy, that may have been developed by different vendors, mixing and matching COTS and "open source" components. This system composition model is expected to be increasingly common in high performance computing. The usual approach is to train technicians to understand the complexities of these components and their connections, including performance tuning parameters, so that they can then manually monitor and reconfigure the system as needed. We envision instead attaching a "standard" feedback loop infrastructure to existing distributed systems for the purposes of continual monitoring and dynamically adapting their activities and performance. (This approach can also be applied to "new" systems, as an alternative to "building in" adaptation facilities, but we do not address that here.) Our proposed infrastructure consists of multiple layers with the objectives of probing, measuring and reporting of activity and state within the execution of the legacy system among its components and connectors; gauging, analysis and interpretation of the reported events; and possible feedback to focus the probes and gauges to drill deeper, or when necessary - direct but automatic reconfiguration of the running system.	(pdf) (ps)
Using Process Technology to Control and Coordinate Software Adaptation	Giuseppe Valetto, Gail Kaiser	2002-09-14	We have developed an infrastructure for end-to-end run-time monitoring, behavior / performance analysis, and dynamic adaptation of distributed software applications. This feedback-loop infrastructure is primarily targeted to pre-existing systems and thus operates outside the application itself without making assumptions about the target system's internal communication/computation mechanisms, implementation language/framework, availability of source code, etc. This paper assumes the existence of the monitoring and analysis components, presented elsewhere, and focuses on the mechanisms used to control and coordinate possibly complex repairs/reconfigurations to the target system. These mechanisms require lower-level actuators or effectors somehow attached to the target system, so we briefly sketch one such facility (elaborated elsewhere). The core of the paper is the model, architecture, and implementation of Workflakes, the decentralized process engine we use to tailor, control, coordinate, respond to contingencies, etc. regarding a cohort of such actuators. We have validated our approach and the Workflakes prototype in several case studies, related to different application domains. Due to space restrictions we concentrate primarily on one case study, elaborate with some details a second, and only sketch others.	(pdf) (ps)
Evaluating Top-k Queries over Web-Accessible Databases	Luis Gravano, Amelie Marian, Nicolas Bruno	2002-09-09	A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing such top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this paper, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present several new algorithms for processing such queries, and adapt existing techniques to our scenario as well. We also study the execution time of our algorithms analytically and present experimental results using both synthetic and real web-accessible data.	(pdf) (ps)
A Case Study In Software Adaptation	Guiseppe Valetto, Gail Kaiser	2002-09-09	We attach a feedback-control-loop infrastructure to an existing target system, to continually monitor and dynamically adapt its activities and performance. (This approach could also be applied to "new" systems, as an alternative to "building in" adaptation facilities, but we do not address that here.) Our infrastructure consists of multiple layers with the objectives of 1. probing, measuring and reporting of activity and state during the execution of the target system among its components and connectors; 2. gauging, analysis and interpretation of the reported events; and 3. whenever necessary, feedback onto the probes and gauges, to focus them (e.g., drill deeper), or onto the running target system, to direct its automatic adjustment and reconfiguration. We report on our successful experience using this approach in dynamic adaptation of a large-scale commercial application that requires both coarse and fine grained modifications.	(pdf) (ps)
Optimizing Top-K Selection Queries over Multimedia Repositories	Surajit Chaudhuri, Luis Gravano, Amelie Marian	2002-08-20	Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries, for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.	(pdf) (ps)
A Flexible and Efficient Protocol for Multi-Scope Service Registry Replication	Weibin Zhao, Henning Schulzrinne	2002-07-29	Service registries play an important role in service discovery systems by accepting service registrations and answering service queries; they can serve a wide range of purposes, such as membership services, lookup services, and search services. To provide fault tolerant, and enhance scalability, availability and performance, service registries often need to be replicated. In this paper, we present Swift (Selective anti-entropy WIth FasT update propagation), a flexible and efficient protocol for multi-scope service registry replication. As consistency is a less of concern compared with availability in service registry replication, we choose to build Swift on top of anti-entropy to support high availability replication. Swift makes two contributions as follows. First, it defines a more general and flexible form of anti-entropy called selective anti-entropy, which extends the applicability of anti-entropy from full replication to partial replication by selectively reconciling inconsistent states between two replicas, and improves anti-entropy efficiency by fine controlling update propagation within each subset. Selective anti-entropy is the first that we are aware of in using anti-entropy to support generic partial replication. Secondly, Swift integrates service registry overlay networks with selective anti-entropy. Different topologies, such as full mesh and spanning tree, can be used for constructing service registry overlay networks. These overlay networks are used to propagate new updates quickly so as to minimize inconsistency among replicas. We have implemented Swift for replicating multi-scope Directory Agents in the Service Location Protocol. Our experience shows that Swift is flexible, efficient, and lightweight.	(pdf) (ps)
The Design and Implementation of Elastic Quotas: A System for Flexible File System Management	Ozgur Can Leonard, Jason Nieh, Erez Zadok, Jeffrey Osborn, Ariye Shater, Charles Wright	2002-06-19	We introduce elastic quotas, a disk space management technique that makes disk space an elastic resource like CPU and memory. Elastic quotas allow all users to use unlimited amounts of available disk space while still providing system administrators the ability to control how the disk space is allocated among users. Elastic quotas maintain existing persistent file semantics while supporting user-controlled policies for removing files when the file system becomes too full. We have implemented an elastic quota system in Solaris and measured its performance. The system is simple to implement, requires no kernel modifications, and is compatible with existing disk space management methods. Our results show that elastic quotas are an effective, low-overhead solution for flexible file system management.	(pdf) (ps)
Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection	Panagiotis G. Ipeirotis, Luis Gravano	2002-06-19	Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query e??ciently and eRectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. Our content summaries are the first to include absolute document frequency estimates for the database words. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to compensate for potentially incomplete content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases. Our experiments indicate that our new content-summary construction technique is effcient and produces more accurate summaries than those from previously proposed strategies. Also, our hierarchical database selection algorithm exhibits significantly higher precision than its flat counterparts.	(pdf) (ps)
Requirements for Scalable Access Control and Security Management Architectures	Angelos D. Keromytis, Jonathan M. Smith	2002-05-16	Maximizing local autonomy has led to a scalable Internet. Scalability and the capacity for distributed control have unfortunately not extended well to resource access control policies and mechanisms. Yet management of security is becoming an increasingly challenging problem, in no small part due to scaling up of measures such as number of users, protocols, applications, network elements, topological constraints, and functionality expectations. In this paper we discuss scalability challenges for traditional access control mechanisms and present a set of fundamental requirements for authorization services in large scale networks. We show why existing mechanisms fail to meet these requirements, and investigate the current design options for a scalable access control architecture. We argue that the key design options to achieve scalability are the choice of the representation of access control policy, the distribution mechanism for policy and the choice of access rights revocation scheme.	(pdf) (ps)
Using Density Estimation to Improve Text Categorization	Carl Sable, Kathleen McKeown, Vassilis Hatzivassiloglou	2002-05-01	This paper explores the use of a statistical technique known as density estimation to potentially improve the results of text categorization systems which label documents by computing similarities between documents and categories. In addition to potentially improving a system's overall accuracy, density estimation converts similarity scores to probabilities. These probabilities provide confidence measures for a system's predictions which are easily interpretable and could potentially help to combine results of various systems. We discuss the results of three complete experiments on three separate data sets applying density estimation to the results of a TF*IDF/Rocchio system, and we compare these results to those of many competing approaches.	(pdf) (ps)
CINEMA: Columbia InterNet Extensible Multimedia Architecture	Kundan Singh, Wenyu Jiang, Jonathan Lennox, Sankaran Narayanan, Henning Schulzrinne	2002-04-26	We describe the architecture and implementation of our Internet telephony system (CINEMA: Columbia InterNet Extensible Multimedia Architecture) intended to replace the departmental PBX (telephone switch). It interworks with the traditional telephone networks via a PSTN/IP gateway. It also serves as a corporate or campus infrastructure for existing and future services like web, email, video and streaming media. Initially intended for a few users, it will eventually replace the plain old telephones from our offices, due to the cost benefit and new services it offers. We also discuss common inter-operability problems between the PBX and the gateway. This paper is intended as a design document of the overall system.	(pdf) (ps)
Analysis of Routing Algorithms for Secure Overlay Service	Debra Cook	2002-04-23	The routing of packets through an overlay network designed to limit DDoS attacks is analyzed. The overlay network structure includes special purpose nodes which affect the routes taken through the overlay. Two main factors are considered: the routing algorithm utilized for the overlay and the method for selecting the special purpose nodes. The routing algorithms considered are based on methods originally defined for peer-to-peer services. A model was developed for the overlay network which allowed altering the routing algorithm, method for selection of special purpose nodes and the underlying ISP structure. The model was used to assess the impact of specific routing algorithms and selection methods on latency and path length. The implications of utilizing a specific method for node selection on the probability of a successful DDoS attack is briefly discussed.	(pdf) (ps)
Session-Aware Popularity-based Resource Allocation Across Several Differentiated Service Domains	Paulo Mendes, Henning Schulzrinne, Edmundo Monteiro	2002-04-17	The Differentiated Services model (DS) maps traffic into network services that have different quality levels. However, inside each service flows can be treated unfairly, since the DS model has no policy to distribute the service bandwidth between all sessions that compose the service aggregate traffic. Our goal is to study a signaling protocol that fairly distributes the resources reserved for each DS service between scalable multimedia sessions in a multicast network environment, where scalable sources divide session traffic in hierarchical layers, sending each layer to different multicast groups. We present a signaling protocol called Session-Aware Popularity Resource Allocation (SAPRA) that allows the distribution of DS services resources along sessions path, based upon the receiver population of each session. Simulations show that SAPRA protocol has small bandwidth overhead, is efficient updating the resources allocated to each session and also supplying receivers with reports about the quality level of their session.	(pdf) (ps)
Querying Large Text Databases for Efficient Information Extraction	Eugene Agichtein, Luis Gravano	2002-03-22	A wealth of data is hidden within unstructured text. This data is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database. This exhaustive approach is not practical, or sometimes even feasible, for large databases. In this paper, we develop an efficient query-based technique to identify documents that are potentially useful for the extraction of a target relation. We start by sampling the database to characterize the documents from which an information extraction system manages to extract relevant tuples. Then, we apply machine learning and information retrieval techniques to derive queries likely to match additional useful documents in the database. Finally, we issue these queries to the database to retrieve documents from which the information extraction system can extract the final relation. Our technique requires that databases support only a minimal boolean query interface, and is independent of the choice of the underlying information extraction system. We report a thorough experimental evaluation over more than one million documents that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents. Our proposed technique could be used to query a standard web search engine, hence providing a building block for efficient information extraction over the web at large.	(pdf) (ps)
Combining Mobile Agents and Process-based Coordination to Achieve Software Adaptation	Guiseppe Valetto, Gail Kaiser	2002-03-17	We have developed a model and a platform for end-to-end run-time monitoring, behavior and performance analysis, and consequent dynamic adaptation of distributed applications. This paper concentrates on how we coordinate and actuate the potentially multi-part adaptation, operating externally to the target systems, that is, without requiring any a priori built-in adaptation facilities on the part of said target systems. The actual changes are performed on the fly onto the target by communities of mobile software agents, coordinated by a decentralized process engine. These changes can be coarse-grained, such as replacing entire components or rearranging the connections among components, or fine-grained, such as changing the operational parameters, internal state and functioning logic of individual components. We discuss our successful experience using our approach in dynamic adaptation of a large-scale commercial application, which requires both coarse and fine grained modifications.	(pdf) (ps)
Holistic Twig Joins: Optimal XML Pattern Matching	Nicolas Bruno, Nick Koudas, Divesh Srivastava	2002-03-12	XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i)~using structural join algorithms to match the binary relationships against the XML database, and (ii)~stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable. In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.	(pdf) (ps)
SIPstone - Benchmarking SIP Server Performance	Henning Schulzrinne, Sankaran Narayanan, Jonathan Lennox, Michael Doyle	2002-03-11	SIP-based Internet telephony systems need to be appropriately dimensioned, as the call and registration rate can reach several thousand requests a second. This draft proposes an initial simple set of metrics for evaluating and benchmarking the performance of SIP proxy, redirect and registrar servers. The benchmark SIPstone-A expresses a weighted average of these metrics.	(pdf) (ps)
Performance of information discovery and message relaying in mobile ad hoc networks	Maria Papadopouli, Henning Schulzrinne	2002-03-09	This paper presents 7DS, a novel peer-to-peer resource sharing system. 7DS is an architecture, a set of protocols and an implementation enabling the exchange of data among peers that are not necessarily connected to the Internet. Peers can be either mobile or stationary. We focus on three different facets of cooperation, namely, data sharing, message relaying and network connection sharing. 7DS enables wireless devices to discover, disseminate, relay information among each other to increase the data access. We evaluate via extensive simulations the effectiveness of our system for data dissemination and message relaying among mobile devices with a large number of user mobility scenarios. We model several general data dissemination approaches and investigate the effect of the wireless coverage range, 7DS host density, and cooperation strategy among the mobile hosts as a function of time. We also present a power conservation mechanism that is beneficial, since it increases the power savings, without degrading the data dissemination. Using theory from random walks, random environments and diffusion of controlled processes, we model one of these data dissemination schemes and show that the analysis confirms the simulation results for this scheme.	(pdf) (ps)
Signalling Transport Protocols	Gonzalo Camarillo, Henning Schulzrinne, Raimo Kantola	2002-02-18	SCTP is a newly developed transport protocol tailored for signalling transport. Whereas in theory SCTP is supposed to achieve a much better performance than TCP and UDP, at present there are no ex-perimental results showing SCTP s real benefits. This paper analyzes SCTP s strengths and weaknesses and provides simulation results. We implemented SIP on top of UDP, TCP and SCTP in the network simulator and compared the three transport protocols under different network conditions.	(pdf) (ps)
Mobile Communication with Virtual Network Address Translation	Gong Su, Jason Nieh	2002-02-18	Virtual Network Address Translation (VNAT) is a novel architecture that allows transparent migration of end-to-end live network connections associated with various computation units. Such computation units can be either a single process, or a group of processes of an application, or an entire host. VNAT virtualizes network connections perceived by transport protocols so that identification of network connections is decoupled from stationary hosts. Such virtual connections are then remapped into physical connections to be carried on the physical network using network address translation. VNAT requires no modification to existing applications, operating systems, or protocol stacks. Furthermore, it is fully compatible with the existing communication infrastructure; virtual and normal connections can coexist without interfering each other. VNAT functions entirely within end systems and requires no third party proxies. We have implemented a VNAT prototype with the Linux 2.4 kernel and demonstrated its functionality on a wide range of popular real-world network applications. Our performance results show that VNAT has essentially no overhead except when connections are migrated, in which case the overhead of our Linux prototype is less than 7 percent.	(pdf) (ps)
Extending SDARTS: Extracting Metadata from Web Databases and Interfacing with the Open Archives Initiative	Panagiotis G. Ipeirotis, Tom Barry, Luis Gravano	2002-02-01	SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines two complementary existing protocols, SDLIP and STARTS, to define a uniform interface that collections should support for searching and exporting metasearch-related metadata. SDARTS also includes a toolkit with wrappers that are easily customized to make both local and remote document collections SDARTS-compliant. This paper describes two significant ways in which we have extended the SDARTS toolkit. First, we have added a tool that automatically builds rich content summaries for remote web collections by probing the collections with appropriate queries. These content summaries can then be used by a metasearcher to select over which collections to evaluate a given query. Second, we have enhanced the SDARTS toolkit so that all SDARTS-compliant collections export their metadata under the emerging Open Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also allows all OAI-compliant collections to be made SDARTS-compliant with minimal effort. As a result, we implemented a bridge between SDARTS and OAI, which will facilitate easy interoperability among a potentially large number of collections. The SDARTS toolkit, with all related documentation and source code, is publicly available at http://sdarts.cs.columbia.edu.	(pdf) (ps)
The Design of High-Throughput Asynchronous Pipelines	Montek Singh	2001-12-31	Clocked or synchronous design has traditionally been used for nearly all digital systems. However, it is now facing significant challenges as clock rates reach several GigaHertz, chip sizes increase, and the demand for low power and modular design become paramount. An alternative paradigm is clockless or asynchronous design, which has several potential advantages towards meeting these challenges. This thesis focuses on the design of very high-speed asynchronous systems. A more specific focus of this thesis is on high-throughput asynchronous pipelines, since pipelining is at the heart of most high-performance systems. This thesis contributes four new asynchronous pipeline styles: "lookahead," "high-capacity," "MOUSETRAP" and "dynamic GasP" pipelines. The styles differ from each other in many aspects, such as protocols, storage capacity, implementation style, and timing assumptions. The new styles are capable of multi-GigaHertz throughputs in today's silicon technology (e.g., 0.13-0.18 micron), yet each style has a simple implementation. High throughputs are obtained through efficient pipelining of systems at a fine granularity, though the pipeline styles are also useful for coarser-grain applications. The basic pipeline styles are extended to address several issues that arise in practice while designing real-world systems. In particular, the styles are generalized to handle a greater variety of architectures (e.g., datapaths with forks and joins), and to robustly interface with arbitrary-speed environments. Finally, the approaches of this thesis are validated by designing and fabricating real VLSI subsystems, including: simple FIFO's, pipelined adders, and an experimental digital FIR filter chip. All chips were tested to be fully functional; throughputs of over 2.4 GHz for the FIFO's, and up to 1.3 GHz for the FIR filter, were obtained in an IBM 0.18 micron technology.	(pdf) (ps)
G.729 Error Recovery for Internet Telephony	Jonathan Rosenberg	2001-12-19	This memorandum discusses the use of the ITU G.729 CS-ACELP speech coder on the Internet for telephony applications. In particular, the memo explores issues of error resiliency and recovery. In particular, the paper considers two questions. First, given N consecutive frame erasures (due to packet losses), how long does the decoder take to resynchronize its state with the encoder? What is the strength of the resulting error signal, both in terms of objective and subjective measures? The second issue explores which particular factors contribute to the strength of the error signal: the distortion of the speech due to the incorrect concealment of the erasures, or the subsequent distortion due to the loss of state synchronization, even though correct packets are being received. Both objective and subjective measures are used to characterize the importance of each of these two factors.	(pdf) (ps)
Where Does Smoothness Count the Most For Fredholm Equations of the Second Kind With Noisy Information?	Arthur G. Werschulz	2001-12-19	We study the complexity of Fredholm problems $(I-T_k)u=f$ of the second kind on the $I^d=[0,1]^d$. Previous work on the complexity of this problem has assumed either that we had complete information about the kernel~$k$ or that the kernel~$k$ and the right-hand side~$f$ had the same smoothness; moreover, this information is usually exact. In this paper, we suppose that $f\in W^{r,p}(I^d)$ and $k\in W^{s,p}(I^d)$. We also assume that $\delta$-noisy standard information is available. We find that the $n$th minimal error is $\Theta(n^{-\mu}+\delta)$, where $\mu = \min\{r/d, s/(2d)\}$, and that a noisy modified finite element method (MFEM) has nearly minimal error. This noisy modified FEM can be efficiently implemented using multigrid techniques. Using these results, we find tight bounds on the $\varepsilon$-complexity for this problem, said bounds depending on the cost~$c(\delta)$ of calculating a $\delta$-noisy information value. As an example, if the cost of a $\delta$-noisy evaluation is proportional to $\delta^{-t}$, then the $\varepsilon$-complexity is roughly $(1/\varepsilon)^{1/\mu+t}$.	(pdf) (ps)
Surface Approximation May Be Easier Than Surface Integration	Arthur G. Werschulz, Henryk Wozniakowski	2001-12-19	The approximation and integration problems consist of finding an approximation to a function~$f$ or its integral over some fixed domain~$\Sigma$. For the classical version of these problems, we have partial information about the functions~$f$ and complete information about the domain~$\Sigma$; for example, $\Sigma$ might be a cube or ball in~$\mathbb R^d$. When this holds, it is generally the case that integration is not harder than approximation; moreover, integration can be much easier than approximation. What happens if we have partial information about~$\Sigma$? This paper studies the surface approximation and surface integration problems, in which $\Sigma=\Sigma_g$ for functions~$g$. More specifically, our functions~$f$ are $r$~times continuously differentiable scalar functions of $l$~variables, and $g$ are $s$~times continuously differentiable injective functions of~$d$ variables with $l$~components. Error for the surface approximation problem is measured in the $L_q$-sense. Our problems are well-defined, provided that $d\le l$, $r\ge 0$, and $s\ge 1$. Information consists of function evaluations of~$f$ and~$g$. We show that the $\varepsilon$-complexity of surface approximation is proportional to $(1/\varepsilon)^{1/\mu}$ with $\mu=\mrs/d$. We also show that if $s\ge 2$, then the $\varepsilon$-complexity of surface integration is proportional to $(1/\varepsilon)^{1/\nu}$ with $$\nu=\min\left\{ \frac{r}{d},\frac{s-\delta_{s,1}(1-\delta_{d,l})}{\min\{d,l-1\}}\right\}.$$ (This bound holds as well for several subcases of $s=1$; we conjecture that it holds for all $r\ge0$, $s\ge1$, and $d\le l$.) Using these results, we determine when surface approximation is easier than, as easy as, or harder than, surface integration; all three possibilities can happen. In particular, we find that if $s=1$ and $d<l$, then $\mu=1/d$ and $\nu=0$, so that surface integration is unsolvable and surface approximation is solvable; this is an extreme case for which surface approximation is easier than surface integration.	(pdf) (ps)
Summarizing and Searching Hidden-Web Databases Hierarchically Using Focused Probes	Panagiotis G. Ipeirotis, Luis Gravano	2001-12-08	Many valuable text databases on the web have non-crawlable contents that are ``hidden'' behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from ``uncooperative'' databases by using ``focused query probes,'' which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. The content summaries that result from this algorithm are efficient to derive and more accurate than those from previously proposed probing techniques for content-summary extraction. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to produce accurate results even for imperfect content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases.	(pdf) (ps)
Compiling Concurrent Languages for Sequential Processors	Stephen A.Edwards	2001-10-28	Embedded systems often include a traditional processor capable of executing sequential code, but both control and data-dominated tasks are often more naturally expressed using one of the many domain-specific concurrent specification languages. This paper surveys a variety of techniques for translating these concurrent specifications into sequential code. The techniques address compiling a wide variety of languages, ranging from dataflow to Petri nets. Each uses a different technique, to some degree chosen to match the semantics of concurrent language. Each technique is considered to consist of a partial evaluator operating on an interpreter. This combination provides a clearer picture of how parts of each technique could be used in a different setting.	(pdf) (ps)
Combining pairwise sequence similarity and support vector machines for remote protein homology detection	Li Liao, William Stafford Noble	2001-10-16	One key element in understanding the molecular machinery of the cell is to understand the meaning, or function, of each protein encoded in the genome. A very successful means of inferring the function of a previously unannotated protein is via sequence similarity with one or more proteins whose functions are already known. Currently, one of the most powerful such homology detection methods is the SVM-Fisher method of Jaakkola, Diekhans and Haussler (ISMB 2000). This method combines a generative, profile hidden Markov model (HMM) with a discriminative classification algorithm known as a support vector machine (SVM). The current work presents an alternative method for SVM-based protein classification. The method, SVM-pairwise, uses a pairwise sequence similarity algorithm such as Smith-Waterman in place of the HMM in the SVM-Fisher method. The resulting algorithm, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better remote protein homology detection than SVM-Fisher, profile HMMs and PSI-BLAST.	(pdf) (ps)
An Active Events Model for Systems Monitoring	Philip N. Gross, Suhit Gupta, Gail E. Kaiser, Gaurav S. Kc, Janak J. Parekh	2001-10-02	We present an interaction model enabling data-source probes and action-based gauges to communicate using an intelligent event model known as ActEvents. ActEvents build on conventional event concepts by associating structural and semantic information with raw data, thereby allowing recipients to be able to dynamically understand the content of new kinds of events. Two submodels of ActEvents are proposed: SmartEvents, which are XML-structured events containing references to their syntactic and semantic models, and Gaugents, which are heavier but more flexible intelligent mobile software agents. This model is presented in light of DARPA's DASADA program, where ActEvents are used in a larger-scale subsystem, called KX, which supports continual validation of distributed, component-based systems. ActEvents are emitted by probes in this architecture, and propagated to gauges, where "measurements" of the raw data associated with probes are made, thereby continually determining updated target-system properties. ActEvents are also proposed as solutions for a number of other applications, including a distributed collaborative virtual environment (CVE) known as CHIME.	(pdf) (ps)
QProber: A System for Automatic Classification of Hidden-Web Resources	Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami	2001-08-18	The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web ``crawlers.'' Recently, commercial web sites have started to manually organize web-accessible databases into Yahoo!-like hierarchical classification schemes. Here, we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.	(pdf) (ps)
Multi-Layer Utilization Maximal Fairness for Multi-Rate Multimedia Sessions	Paulo Mendes, Henning Schulzrinne, Edmundo Monteiro	2001-08-12	We present a fairness model, called Multi-Layer Utilization Maximal Fairness (MLUM). The motivation of the MLUM model is to accomplish intra-session and inter-session fairness in the presence of multi-rate (multi-layer) multimedia sessions, distributing bandwidth between multicast trees of different sessions, considering the number of receivers in each session, and improving bandwidth efficiency. To accomplish this goal, the model is divided in three components: a fairness definition,a policy and a protocol. The MLUM definition guarantees fairness between receivers in a session and fairness between different sessions, considering the number of receivers in each session. The MLUM policy implements the MLUM definition in multicast aware border routers of Autonomous Systems (AS). It's functionality is divided in the control plane and the data plane. In the control plane, sessions fair rates are estimated based upon the MLUM fairness definition. In the data plane, a queueing discipline will fairly distribute bandwidth between sessions, considering their fair rates. The MLUM protocol allows the exchange of control information (sessions number of receivers and fair rates) between MLUM policy routers in order to accomplish a fair distribution of bandwidth between concurrent sessions. This paper aim to present the MLUM fairness definition, to describe the MLUM fairness policy control and data plane functionality and to evaluate how the policy data plane can fairly distribute bandwidth. We present some simulations that evaluate the performance of the MLUM policy data plane in different scenarios and that compare its performance to other queueing disciplines. We also briefly describe the MLUM protocol, due to its close relation with of the MLUM policy functionality.	(pdf) (ps)
Perceived Quality of Packet Audio under Bursty Losses	Wenyu Jiang, Henning Schulzrinne	2001-08-12	We examine the impact of bursty losses on the perceived quality of packet audio, and investigate the effectiveness various approaches to improve the quality. Because the degree of burstiness depends on the packet interval, we first derive a formula to re-compute the conditional loss probability of a Gilbert loss model when the packet interval changes. We find that FEC works better at a larger packet interval under bursty losses. In our MOS-based (Mean Opinion Score) listening tests, we did not find a consistent trend in MOS when burstiness increases if FEC is not used. That is, In some occasions MOS can be higher with a higher burstiness. With FEC, our results confirms the analytical results that quality is better with a larger packet interval, but $T$ should not be too large to avoid severe penalty on a single packet loss. We also find that low bit-rate redundancy generally produces lower perceived quality than FEC, if the main codec is already a low bit-rate codec. Finally, we compare our MOS results with objective quality estimation algorithms (PESQ, PSQM/PSQM+, MNB and EMBSD). We find PESQ has the best linear correlation with MOS, but the value is still less than commonly cited, implying they cannot be used in isolation to predict MOS.	(pdf) (ps)
Distributed Data Mining: The JAM system architecture	Andreas L. Prodromidis, Sal J. Stolfo, Shelley Tselepis, Terrance Truta, Jeffrey Sherwin, David Kalina	2001-08-05	This paper describes the system architecture ofJAM (Java Agents for Meta-learning), a distributed data mining systemthat scales up to large and physically separated data sets. An earlyversion of the JAM system was described inStolfo-et-al-97-KDD-JAM. Since then, JAM has evolved botharchitecturally and functionally and here we present the final designand implementation details of this system architecture. JAM is anextensible agent-based distributed data mining system that supportsthe remote dispatch and exchange of agents among participating datasites and employs meta-learning techniques to combine the multiplemodels that are learned. One of JAM's target applications is fraud andintrusion detection in financial information systems. A briefdescription of this learning task and JAM's applicability and summaryresults are also discussed.	(pdf) (ps)
Effects of power conservation, wireless coverage and cooperation on data dissemination among mobile devices	Maria Papadopouli, Henning Schulzrinne	2001-06-21	This paper presents 7DS, a novel peer-to-peer datasharing system. Peers can be either mobile or stationary. 7DS is anarchitecture, a set of protocols and an implementation enabling theexchange of data among peers that are not necessarily connected to theInternet. It runs as an application complementary to other dataaccess approaches (such as via base stations or info-stations). Itanticipates the information needs of users and fulfills them bysearching for information among peers. We evaluate via extensivesimulations the effectiveness of our system in data dissemination formobile devices. We model several general data disseminationapproaches and investigate the effect of the wireless coverage range,network size, query mechanism, cooperation strategy among the mobilehosts and power conservation with a large number of user mobilityscenarios. Using theory from random walks and random environments anddiffusion of controlled processes, we model one of these datadissemination schemes and show that the analysis confirms thesimulation results for this scheme.	(pdf) (ps)
Non-Single Viewpoint Catadioptric Cameras: Geometry and Analysis	Rahul Swaminathan, Michael Grossberg, Shree K. Nayar	2001-02-07	Conventional vision systems and algorithms assume the camera to have a single viewpoint. However, sensors need not always maintain a single viewpoint. For instance, an incorrectly aligned system could cause non-single viewpoints. Also, systems could be designed to specifically deviate from a single viewpoint to trade-off image characteristics such as resolution and field of view. In these cases, a locus of viewpoints is formed, called a caustic. In this paper, we present an in-depth analysis of the viewpoint loci for catadioptric cameras with conic reflectors. Properties of these viewpoint loci with regard to field of view, resolution and other geometric properties are presented. In addition, we present a simple technique to calibrate such non-single viewpoint catadioptric cameras and estimate their viewpoint loci (caustics) from known camera motion.	(pdf) (ps)
Combining visual layout and lexical cohesion features for text segmentation	Min-Yen Kan	2001-01-30	We propose integrating features from lexicalcohesion with elements from layout recognition to build a compositeframework. We use supervised machine learning on this compositefeature set to derive discourse structure on the topic level. Wedemonstrate a system based on this principle and use both an intrinsicevaluation as well as the task of genre classification to assess itsperformance.	(pdf) (ps)
Synthesizing composite topic structure trees for multiple domain specific documents	Min-Yen Kan, Kathleen R. McKeown, Judith L. Klavans	2001-01-30	Domain specific texts often have implicit rules oncontent and organization. We introduce a novel method forsynthesizing this topical structure. The system uses corpus examplesand recursively merges their topics to build a hierarchical tree. Asubjective cross domain evaluation showed that the system performedwell in combining related topics and in highlighting important ones.	(pdf) (ps)
Process-based Software Tweaking with Mobile Agents	Giuseppe Valetto, Gail Kaiser	2001-01-23	We describe an approach based upon software process technology to on-the-fly monitoring, redeployment, reconfiguration, and in general adaptation of distributed software applications, in short "software tweaking". We choose the term tweaking to refer to modifications in structure and behavior that can be made to individual components, as well as sets thereof, or the overall target system configuration, such as adding, removing or substituting components, while the system is running and without bringing it down. The goal of software tweaking is manifold: supporting run-time software composition, enforcing adherence to requirements, ensuring uptime and quality of service of mission-critical systems, recovering from and preventing faults, seamless system upgrading, etc. Our approach involves dispatching and coordinating software agents - named Worklets - via a process engine, since successful tweaking of a complex distributed software system often requires the concerted action of multiple agents on multiple components. The software tweaking process must incorporate and decide upon knowledge about the specifications and architecture of the target software, as well as Worklets capabilities. Software tweaking is correlated to a variety of other software processes - such as configuration management, deployment, validation and evolution - and allows to address at run time a number of related concerns that are normally dealt with only at development time.	(pdf) (ps)
What is the complexity of volume calculation?	A. G. Werschulz, H. Wozniakowski	2000-12-21	We study the worst case complexity of computing $\varepsilon$-approximations of volumes of $d$-dimensional regions~$g([0,1]^d)$, by sampling the function~$g$. Here, $g$ is an $s$ times continuously differentiable injection from~$[0,1]^d$ to~$\reals^d$, where we assume that $s\ge1$. Since the problem can be solved exactly when $d=1$, we concentrate our attention on the case $d\ge 2$. This problem is a special case of the surface integration problem we studied previously. Let $\cc$ be the cost of one function evaluation. The results of our previous work might suggest that the $\varepsilon$-complexity of volume calculation should be proportional to $\cc(1/\varepsilon)^{d/s}$ when $s\ge 2$. However, using integration by parts to reduce the dimension, we show that if $s\ge 2$, then the complexity is proportional to $\cc(1/\varepsilon)^{(d-1)/s}$. Next, we consider the case $s=1$, which is the minimal smoothness for which the volume problem is well-defined. We show that when $s=1$, an $\varepsilon$-approximation can be computed with cost proportional to at most $\cc(1/\varepsilon)^{(d-1)d/2}$. Since a lower bound proportional to $\cc(1/\varepsilon)^{d-1}$ holds when $s=1$, it follows that the complexity in the minimal smoothness case is proportional to $\cc(1/\varepsilon)$ when $d=2$, and that there is a gap between the lower and upper bounds when $d\ge 3$.	(pdf) (ps)
Performance of Size-Changing Algorithms in Stackable File Systems	Erez Zadok, Johan M. Andersen, Ion Badulescu, Jason Nieh	2000-11-21	Stackable file systems can provide extensible file system functionality with minimal performance overhead and development cost. However, previous approaches are limited in the functionality they provide. In particular, they do not support size-changing algorithms, which are important and useful for many applications, such as compression and security. We propose fast index files, a technique for efficient support of size-changing algorithms in stackable file systems. Fast index files provide a page mapping between file system layers in a way that can be used with any size-changing algorithm. Index files are designed to be recoverable if lost and add less than 0.1\% disk space overhead. We have implemented fast indexing using portable stackable templates, and we have used this system to build several example file systems with size-changing algorithms. We demonstrate that fast index files have very low overhead for typical workloads, only 2.3\% over other stacked file systems. Our system can deliver much better performance on size-changing algorithms than user-level applications, as much as five times faster.	(pdf) (ps)
A Comparison of Thin-Client Computing Architectures	Jason Nieh, S. Jae Yang, Naomi Novik	2000-11-14	Thin-client computing offers the promise of easier-to-maintain computational services with reduced total cost of ownership. The recent and growing popularity of thin-client systems makes it important to develop techniques for analyzing and comparing their performance, to assess the general feasibility of the thin-client computing model, and to determine the factors that govern the performance of these architectures. To assess the viability of the thin-client computing model, we measured the performance of five popular thin-client platforms running over a wide range of network access bandwidths. Our results show that current thin-client solutions generally work well in a LAN environment, but their performance degrades significantly when they are used in today's broadband environments. We also find that the efficiency of the thin-client protocols varies widely. In some cases, the efficiency of the thin client protocol for web applications is within a factor of two of standard web protocols, while others are 30 times more inefficient. We analyze the differences in the various approaches and explain the impact of the underlying remote display protocols on overall performance.	(pdf) (ps)
Performance of Multiattribute Top-K Queries on Relational Systems	Nicolas Bruno, Surajit Chaudhuri, Luis Gravano	2000-11-03	In many applications, users specify target valuesfor the attributes of a relation, and expect in return the k tuplesthat best match these values. Traditional RDBMSs do not process these``top-k queries'' efficiently. In our previous work, we outlined afamily of strategies to map a top-k query into a traditional selectionquery that a RDBMS can process efficiently. The goal of such mappingstrategies is to get all needed tuples (but minimize the number ofretrieved tuples) and thus avoid ``restarts'' to get additionaltuples. Unfortunately, no single mapping strategy performedconsistently the best under all data distributions. In this paper, wedevelop a novel mapping technique that leverages information about thedata distribution and adapts itself to the local characteristics ofthe data and the histograms available to do the mapping. We alsoreport the first experimental evaluation of the new and old mappingstrategies over a real RDBMS, namely over Microsoft's SQL Server7.0. The experiments show that our new techniques are robust andsignificantly more efficient than previously known strategiesrequiring at least one sequential scan of the data sets.	(pdf) (ps)
Unified Messaging using SIP and RTSP	Kundan Singh, Henning Schulzrinne	2000-10-16	Traditional answering machines and voice mailservices are closed systems, tightly coupled to a single end system,the local PBX or local exchange carrier. Even simple services, such asforwarding voice mail to another user outside the local system, arehard to provide. With the advent of Internet telephony, we need toprovide voice and video mail services. This also offers theopportunity to address some of the shortcomings of existing voice mailsystems. We list general requirements for a multimedia mail system for Internettelephony. We then propose an architecture using SIP (SessionInitiation Protocol) and RTSP (Real-Time Streaming Protocol) andcompare various alternative approaches to solving call forwarding,reclaiming and retrieval of messages. We also briefly describe ourprototype implementation.	(pdf) (ps)
PF_IPOPTION: A Kernel Extension for IP Option Packet Processing	Ping Pan, Henning Schulzrinne	2000-08-09	Existing UNIX kernels cannot easily deliver IP packets containing IP options to applications. To address this problem, we have defined and implemented a new protocol family called {\PFIPOPTION} for the FreeBSD kernel. We have verified the implementation with some of the commonly used IP options. Measurements in kernel and user space showed that BSD socket I/O is the performance bottleneck.	(pdf) (ps)
Lightweight Resource Reservation Signaling: Design, Performance and Implementation	Ping Pan, Henning Schulzrinne	2000-08-09	Recent studies take two different approaches to admission control.Some argue that due scalability limitations, using a signaling protocol to setup reservations is too costly and CPU-intensive for routers. Instead, endusers should apply various end-to-end measurement-based mechanisms to runadmission control. Several other proposals have recommended to reduce thenumber of reservations in the network by using aggregation algorithms, and,thus, reduce the number of signaling messages and states. We study the signaling cost factors, propose several solutions that achievegood performance with reduced processing cost, and evaluate an implementationof a lightweight signaling protocol that incorporates these solutions. First,we identify some the protocol design issues that determine protocol complexityand efficiency, namely the choice of a two-pass vs. one-pass reservationmodel, partial reservation, and the effect of reservation fragmentation. Wealso explore several design options that can speed up reservation setup andquickly recover from reservation fragmentation. Based on the conclusion ofthese studies, we developed a lightweight signaling protocol that can achievegood performance with low processing cost. We also show that with carefulimplementation and by using some of basic hashing techniques to manage flowstates, we can support up to 10,000 flow setups per second (or about 300,000active flows) on a commodity 700 MHz Pentium PC.	(pdf) (ps)
Process-Orchestrated Software: Towards a Workflow Approach to the Coordination of Distributed Systems	Giuseppe Valetto	2000-05-17	Distributed workflow supports collaborative processes composed ofactivities, in which the synchronization and coordination of the activities andthe people having a part in them (the stakeholders of the process) is anessential characteristic. Workflow is based on the concept of a process model,describing the process to be followed, and on facilities (collectively termedthe process enactment engine) for supporting and guiding the work ofstakeholders according to that model. Distributed workflow technology nowadaysprovides many of the necessary paradigms, techniques and tools to support themanagement of complex, dynamic and decentralized business practices, theirstakeholders, and their processes. There are a number of dimensions concurringto workflow distribution, which interact - and sometimes conflict - with oneother in various, complex ways, such as distribution of the workflowinformation, the workflow actors, tools employed to carry out the work, and thework itself Complete decentralization of WF along all of its distributiondimensions is still a challenge. On the other hand approaches and techniquesthat have been established for and have become typical of distributed workflowshow great potential towards the coordination of various kinds of distributedsoftware systems, which require the execution of some kind of process: twoimportant domains are the run-time monitoring and control of functional andnon-functional properties of the components of a distributed system, and thedynamically determined cooperation within a group of software agents towardssome common goal. This proposal intends to investigate the characteristics ofthose software coordination problems and how they can be tackled withdistributed workflow models and techniques. We intend to carry out ourinvestigation in the light of a case study directed towards process-awareprovision of multimedia services and information to groups of dispersed userscarrying out teamwork, including the automated workflow-based management of anadvanced architecture for broadcasting and streaming video and audio, on thebasis of available network and system resources. The case study can be alsoseen as a particular example within a more generic distributed system scenario,i.e. the continual validation of generic distributed applications. The scaleof distribution is that of global dispersion of processes, software components,stakeholders, and data. Thus, we are going to experiment with a set oftechniques and mechanisms, which address those aspects at the global networkingscale, and whose integration within a globally decentralized workflowmanagement system will help reconciling the various distributed dimensions ofworkflow, as well as resolving diverse distributed coordination problems.	(pdf) (ps)
Interworking Between SIP/SDP and H.323	Kundan Singh, Henning Schulzrinne	2000-05-08	There are currently two standards for signaling and control of Internet telephone calls, namely ITU-T Recommendation H.323 and the IETF Session Initiation Protocol (SIP). We describe how an interworking function (IWF) can allow SIP user agents to call H.323 terminals and vice versa. Our solution addresses user registration, call sequence mapping and session description. We also describe and compare various approaches for multi-party conferencing and call tranfer.	(pdf) (ps)
mSLP - Mesh-enhanced Service Location Protocol	Weibin Zhao, Henning Schulzrinne	2000-05-02	The Service Location Protocol (SLP) is a proposed standard from IETF. It provides a flexible and scalable service discovery framework in IP network, and it can work with or without a directory service. This paper presents mSLP - Mesh-enhanced Service Location Protocol. mSLP proposes to use a fully meshed peering Directory Agent (DA) architecture. Peer DAs exchange service registration information, and keep the same consistent data for the shared scopes. mSLP provides a reliable directory service for an SLP system. It also greatly simplifies SLP service registration leading to a thin-client Service Agent (SA) implementation. mSLP is backward compatible with SLPv2, and incremental deployment is supported.	(pdf) (ps)
Serving datacube Tuples from Main Memory	Kenneth A. Ross, Kazi A. Zaman	2000-05-02	Datacubes compute aggregates of a set of database records at a variety of different granularities. For large datasets with many dimensions, the complete datacube may be very large. In order to support on-line access to datacube results, one would like to perform some precomputation to enhance query performance. Existing schemes materialize selected datacube tuples on disk, choosing the most beneficial cuboids (i.e., combinations of dimensions) to materialize given a space limit. However, in the context of a data-warehouse receiving frequent ``append'' updates to the database, the cost of keeping these disk-resident cuboids up-to-date can be high. In this paper, we propose a main memory based framework which provides rapid response to queries and requires considerably less maintenance cost than a disk based scheme in an append-only environment. We materialize in main memory (a) selected coarse-granularity tuples of the datacube, and (b) all tuples at the finest level of granularity of the cube. Our approach is limited to applications in which the finest granularity tuples of the datacube fit in main memory. We argue that there are important applications that meet this requirement. Further, as main memory sizes grow over the coming years, more and more applications will meet this requirement. For a given datacube query, we first look among our coarse-granularity tuples to see if we have a direct answer for the query. If so, that answer is returned directly. If not, we use a hash based scheme reminiscent of partial match retrieval to rapidly compute the answer to the query from the finest-level data without having to scan all of the base tuples. Our in-memory data structures allow for rapid updates in response to the appearance of new base data. We show how to choose which coarse-level tuples to precompute, and how to select partial-match keys to minimize the effects of skew. We present analytical and experimental results demonstrating the benefits of our approach.	(pdf) (ps)
Seven Degrees of Separation in Mobile Ad Hoc Networks	Maria Papadopouli, Henning Schulzrinne	2000-05-01	We present an architecture that enables the sharing of information among mobile, wireless, collaborating hosts that experience intermittent connectivity to the Internet. Participants in the system obtain data objects from Internet-connected servers, cache them and exchange them with others who are interested in them. The system exploits the fact that there is a high locality of information access within a geographic area. It aims to increase the data availability to participants with lost connectivity to the Internet. We discuss the main components of the system and possible applications. Finally, we present simulation results that show that the ad hoc networks can be very effective in distributing popular information.	(pdf) (ps)
Combining microarray expression data and phylogenetic profiles to learn gene functional categories using support vector machines	Paul Pavlidis, William Noble Grundy	2000-04-23	A primary goal in biology is to understand the molecular machinery of the cell. The sequencing projects currently underway provide one view of this machinery. A complementary view is provided by data from DNA microarray hybridization experiments. Synthesizing the information from these disparate types of data requires the development of improved computational techniques. We demonstrate how to apply a machine learning algorithm called support vector machines to a heterogeneous data set consisting of expression data as well as phylogenetic profiles derived from sequence similarity searches against a collection of complete genomes. The two types of data provide accurate pictures of overlapping subsets of the gene functional categories present in the cell. Combining the expression data and phylogenetic profiles within a single learning algorithm frequently yields superior classification performance compared to using either data set alone. However, the improvement is not uniform across functional classes. For the data sets investigated here, 23-element phylogenetic profiles typically provide more information than 79-element expression vectors. Often, adding expression data to the phylogenetic profiles introduces more noise than information. Thus, these two types of data should only be combined when there is evidence that the functional classification of interest is clearly reflected in both data sets.	(pdf) (ps)
Power-Pipelining for Enhanced Query Performance	Jun Rao, Kenneth A. Ross	2000-04-19	As random access memory gets cheaper, it becomes increasingly affordable to build computers with large main memories. In this paper, we consider processing queries within the context of a main memory database system and try to enhance the query execution engine of such a system. An execution plan is usually represented as an operator tree. Traditional query execution engines evaluate a query by recursively iterating each operator and returning exactly one tuple result for each iteration. This generates a large number of function calls. In a main-memory database system, the cost of a function call is relatively high. We propose a new evaluation method called power-pipelining. Each operator processes and passes many tuples instead of one tuple each time. We keep the number of matches generated in a join in a local counter array. We reconstitute the run-length encoding of the join result (broken down by input tables) at the end of the join processing or when necessary. We describe an overflow handling mechanism that allows us to always evaluate a query without using too much space. Besides the benefit of reducing the number of function calls, power-pipelining compresses the join result automatically and thus can reduce the transmission cost. However, power-pipelining may not always outperform the traditional pipelining method when the join selectivity is low. To get the benefit of both techniques, we propose to use them together in an execution engine and let the optimizer choose the preferred execution. We discuss how to incorporate this in a cost-based query optimizer. We implemented power-pipelining and the reconstitution algorithm in the Columbia main memory database system. We present a performance study of power-pipelining and the traditional pipelining method. We show that the improvement of using power-pipelining can be very significant. As a result, we believe that power-pipelining is a useful and feasible technique and should be incorporated into main memory database query execution engines.	(pdf) (ps)
Combining Strategies for Extracting Relations from Text Collections	Eugene Agichtein, Eleazar Eskin, Luis Gravano	2000-04-05	Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. Our Snowball system extracts these relations from document collections starting with only a handful of user-provided example tuples. Based on these tuples, Snowball generates patterns that are used, in turn, to find more tuples. In this paper we introduce a new pattern and tuple generation scheme for Snowball, with different strengths and weaknesses than those of our original system. We also show preliminary results on how we can combinethe two versions of Snowball to extract tuples more accurately.	(pdf) (ps)
IP Multicast Fault Recovery in PIM over OSPF	Xin Wang, Chienming Yu, Henning Schulzrinne, Paul Stirpe, Wei Wu	2000-03-11	Relatively little attention has been given to understanding the fault recovery characteristics and performance tuning of native IP multicast networks. This paper focuses on the interaction of the component protocols to understand their behavior in network failure and recovery scenarios. We consider a multicast environment based on the Protocol Independent Multicast (PIM) routing protocol, the Internet Group Management Protocol (IGMP) and the Open Shortest Path First (OSPF) protocol. Analytical models are presented to describe the interplay of all of these protocols in various multicast channel recovery scenarios. Quantitative results for the recovery time of IP multicast channels are given as references for network configurations, and protocol development. Simulation models are developed using the OPNET simulation tool to measure the fault recovery time and the associated protocol control overhead, and study the influence of important protocol parameters. A testbed with five Cisco routers is configured with PIM, OSPF, and IGMP to measure the multicast channel failure and recovery times for a variety of different link and router failures. In general, the failure recovery is found to be light-weight in terms of control overhead and recovery time. Failure recovery time in a WAN is found to be dominated by the unicast protocol recovery process. Failure recovery in a LAN is more complex, and strongly influenced by protocol interactions and implementation specifics. Suggestions for improvement of the failure recovery time via protocol enhancements, parameter tuning, and network configuration are provided.	(pdf)
Automatic Classification of Text Databases Through Query Probing	Panagiotis Ipeirotis, Luis Gravano, Mehran Sahami	2000-03-07	Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments that show that our approach is promising to automatically characterize the contents of text databases accessible on the web.	(pdf) (ps)
Internet Quality of Service: an Overview	Weibin Zhao, David Olshefski, Henning Schulzrinne	2000-02-11	This paper presents an overview of Internet QoS, covering motivation and considerations for adding QoS to Internet, the definition of Internet QoS, traffic and service specifications, IntServ and DiffServ frameworks, data path operations including packet classification, shaping and policing, basic router mechanisms for supporting QoS, including queue management and scheduling, control path mechanisms such as admission control, policy control and bandwidth brokers, the merging of routing and QoS, traffic engineering, constraint-based routing and multiprotocol label switching (MPLS), as well as end host support for QoS. We identify some important design principles and open issues for Internet QoS.	(pdf) (ps)
Computing Geographical Scopes of Web Resources	Junyan Ding, Luis Gravano	2000-01-28	Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are relevant primarily to web users in geographical proximity to these locations. In contrast, other information resources are relevant to a broader geographical community. For instance, an on-line newspaper may be relevant to users across the United States. Unfortunately, most current web search engines largely ignore the geographical scope of web resources. In this paper, we introduce techniques for automatically computing the geographical scope of web resources, based on the textual content of the resources, as well as on the geographical distribution of hyperlinks to them. We report an extensive experimental evaluation of our strategies using real web data. Finally, we describe a geographically-aware search engine that we have built using our techniques for determining the geographical scope of web resources.	(pdf) (ps)
Toward Cost-Sensitive Modeling for Intrusion Detection	Wenke Lee, Matthew Miller, Sal Stolfo, Kahil Jallad, Christoper Park, Erez Zadok, , Vijay Prabhakar	2000-01-28	Intrusion detection systems need to maximize security while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models. We examine the major cost factors: development costs, operational costs, damage costs incurred due to intrusions, and the costs involved in responding to intrusions. We propose cost-sensitive machine learning techniques to produce models that are optimized for user-defined cost metrics. We describe an automated approach for generating efficient run-time versions of these models. Empirical experiments in off-line analysis and real-time detection show that our cost-sensitive modeling and deployment techniques are effective in reducing the overall cost of intrusion detection.	(pdf) (ps)
A Lock-Free Multiprocessor OS Kernel	H. Massalin, C. Pu	2000-01-01	Typical shared-memory multiprocessor OS kernels use interlocking, implemented as spin-locks or waiting semaphores. We have implemented a complete multiprocessor OS kernel (including threads, virtual memory, and I/O including a window system and a file system) using only lock-free synchronization methods based on Compare-and-Swap. Lock-free synchronization avoids many serious problems caused by locks: considerable overhead, concurrency bottlenecks, deadlocks, and priority inversion in real-time scheduling. Measured numbers show the low overhead of our implementation, competitive with user-level thread management systems.	(ps)
Learning Cost-Sensitive Classification Rules for Network Intrusion Detection using RIPPER	Matthew Miller	1999-12-31	A system for automating the process of network intrusiondetection is currently underway as part of the JAM Project. This systemutilizes many data mining methods to build classifiers of network intrusionswhich can be used to test live network stream input in order to detectintrusions. This is done by using Link Analysis and Sequence Analysis methodsto determine statistical attributes of network connections to build a set ofconnection profile records that can be useful in detection. These statisticalattributes have various costs associated with their computation in a liveenvironment. This paper studies the problem of building rule-sets with asensitivity to the cost of computing each attribute. Low-cost attributes wouldbe biased wherever possible, using high-cost attributes only when needed forreliable classification.	(pdf) (ps)
QoS measurement of Real Time Multimedia services on the Internet	Wenyu Jiang, Henning Schulzrine	1999-12-20	Real-time applications such as IP telephony, Internet radio stationsand video conferencing tools require certain levels of QoS (Quality ofService). Because the Internet is still a best-effort network, the QoSof these applications must be measured and monitored in order toprovide feedbacks to applications for rate and/or error control, andto both end-users and service providers. A standardized objectivemeasurement technology makes it possible to compare between serviceproviders in a fair way. We address the problems of packet delay andloss measurement, since they are the major determining factor ofmultimedia quality. We first describe the problems and techniques inobtaining good measurement results. Then, we discuss the modeling andanalysis of delay and loss. Our goal is to establish feasible metricsthat can reliably predict perceived quality. We find that the extendedGilbert model (2-state being a special case) is a suitable loss model,and the inter-loss distance metric is useful in capturing theburstiness between loss runs. For delay, besides the autocorrelationmetric, a conditional cumulative distribution function may beuseful. We apply these models to some of our Internet packet traces,and find that losses are generally bursty, and that delays usuallyhave a strong temporal dependency component. We also find that thefinal loss pattern after applying playout delay adjustment (and FEC ifused) still corresponds well to the extended Gilbert model.	(pdf) (ps)
BGRP: A Tree-Based Aggregation Protocol for Inter-domain Reservations	P.Pan, E. Hahne, H. Schurlzrinne	1999-12-06	Resource reservation needs to accommodate the rapidly growing size andincreasing service diversity of the Internet. Recently, hierarchicalarchitectures have been proposed that provide domain-level reservation.However, it is not clear that these proposals can set up and maintainreservations in an efficient and scalable fashion. In this paper, we describe a distributed architecture and protocol,called the Border Gateway Reservation Protocol (BGRP), for inter-domain resource reservation that can scale in terms of messageprocessing load. state storage and bandwidth. Each stub or transitdomain may use its own intra-domain resource reservation protocol. BGRPbuilds a sink tree for each of the stub domains. Each sink treeaggregates bandwidth reservations from all data sources in the network.Since backbone routers only maintain the sink tree information, thetotal number of reservation states at each router scales, in the worstcase, linearly with the number of domains in the Internet. BGRP relies on differentiated services for data forwarding. As aresult, the number of packet classifier entries is small, not the numberof micro-flows.To reduce the protocol message traffic, routers mayreserve domain bandwidth beyond the current load so that sources canjoin or leave the tree or change their reservation without having tosend messages all the way to the root for every such change. We use``soft state'' to maintain reservations. In contrast to RSVP, refreshmessages are delivered reliably, allowing us to reduce the refreshfrequency.	(pdf)
FiST: A Language for Stackable File Systems	Erez Zadok, Jason Nieh	1999-12-06	Stackable file systems promise to ease thedevelopment of file systems. Operating system vendors, however,resist making extensive changes to support stacking, because of theimpact on performance and stability. Existing file system interfacesdiffer from system to system and they support extensibilitypoorly. Consequently, extending file system functionality acrossplatforms is difficult. We propose a new language, FiST, to describe stackable file systems. FiSTuses operations common to file system interfaces. From a singledescription, FiST's compiler produces file system modules for multipleplatforms. The generated code handles many kernel details, freeingdevelopers to concentrate on the main issues of their file systems. This paper describes the design, implementation, and evaluation of FiST. Weextended file system functionality in a portable way without changingexisting kernels. We built several file systems using FiST on Solaris,FreeBSD, and Linux. Our experiences with these examples shows the followingbenefits of FiST: average code size over other stackable file systems isreduced ten times; average development time is reduced seven times;performance overhead of stacking is 1-2\%.	(pdf) (ps)
Extracting Relations from Large Plain-Text Collections	Eugene Agichtein, Luis Gravano	1999-12-02	Text documents often contain valuable structured data thatis hidden in regular English sentences. This data is best exploited ifavailable as a relational table that we could use for answering precise queriesor for running data mining tasks. We explore a technique for extracting suchtables from document collections that requires only a handful of trainingexamples from users. These examples are used to generate extraction patterns,that in turn result in new tuples being extracted from the documentcollection. We build on this idea and present our Snowball system. Snowballintroduces novel strategies for generating patterns and extracting tuples fromplain-text documents. At each iteration of the extraction process, Snowballevaluates the quality of these patterns and tuples without human intervention,In this paper we also develop a scalable evaluation methodology and metrics forour task, and present a thorough experimental evaluation of Snowball andcomparable techniques over a collection of more than 300,000 newspaperdocuments.	(pdf) (ps)
Information Extraction and Summarization: Domain Independence through Focus Types	Min-Yen Kan, Kathleen R. McKeown	1999-11-01	We show how information extraction (IE) andsummarization can be merged in a sequential pipeline, resulting in anew approach to domain-independent summarization. IE finds thedocument's terms and entities, that when processed by the methodsshown, result in a more informative treatment of the document'stopics.	(pdf) (ps)
Generating Natural Language Summaries from Multiple On-Line Sources: Language Reuse and Regeneration	Dragomir R. Radev	1999-10-05	The abundance of newswire on the World-Wide Webhas resulted in at least four major problems, which seem to presentthe most interesting challenges to users and researchers alike: size,heterogeneity, change, and conflicting information. Size: several hundred newspapers and news agencies maintain their Web siteswith thousands of news stories in each. Heterogeneity: some of the data related to news is in structuredformat (e.g., tables); more exists in semi-structured format (e.g.,Web pages, encyclopedias, textual databases); while the rest of thedata is in textual form (e.g., newswire). Change: most Web sites and certainly all news sources change on a daily basis. Disagreement: different sources present conflicting or at least different viewsof the same event. We have approached the second, third, and fourth of these four problems fromthe point of view of text generation. We have developed a system, {\scsummons}, which when coupled with appropriate information extractiontechnology, generates a specific genre of natural language summaries of aparticular event (which we call briefings) in a restricted domain. Thebriefings are concise, they contain facts from multiple and heterogeneoussources, and incorporate evolving information, highlighting agreements andcontradictions among sources on the same topic. We have developed novel techniques and algorithms for combining data frommultiple sources at the conceptual level (using natural languageunderstanding), for identifying new information on a given topic; and forpresenting the information in natural language form to the user. We named theframework that we have developed for these problems {\em language reuse andregeneration} (LRR). Its novelty lies in the ability to produce text bycollating together text already written by humans on the Web. The main features of LRR are: increased robustness through a simplifiedparsing/generation component, leverage on text already written by humans, andfacilities for the inclusion of structured data in computer-generated text. The present thesis contains an introduction to LRR and its use inmulti-document summarization. We have paid special attention to the techniquesfor producing conceptual summaries of multiple sources, to the creation and useof a LRR-based lexicon for text generation, to a methodology used to identifynew and old information in threads of documents, and to the generation offluent natural language text using all the components above. The thesis contains evaluations of the different components of {\sc summons} aswell as certain aspects of LRR as a methodology. A review of the relevantliterature is included as a separate chapter.	(pdf) (ps)
Topic Shift Detection - finding new information in threaded news	Dragomir R. Radev	1999-10-05	On-line sources of news typically follow aparticular pattern when presenting updates on a news event overtime. First, they produce a preliminary report on the event, and latersend out updates as the story evolves. There are two classes ofreaders accessing the latter stories - these who have read theoriginal announcement and are familiar with the story background andthose who are ``joining'' the thread at a later point in time. Becauseof the existence of the two clases of readers, news sources typicallyinclude in consequent stories some information that was alreadypresent in earlier stories. We discuss our approach to identifyingsuch repeated pieces of information in news threads and show how thisknowledge can help in generating user-specific summaries of entirethreads of articles.	(pdf) (ps)
Frequently Asked Questions about Natural Language Processing - Second Edition	Dragomir R. Radev	1999-10-05	This is an attempt to put together a list of frequently (and not sofrequently) asked questions about Natural Language Processing andtheir answers. This document is in no way perfect or complete or 100\%accurate. In no way should the maintainer be responsible for damageresulting directly or indirectly from using information in this FAQ. The FAQ originated from Mark Kantrowitz's FAQ on AI. Some questions inthe present document come directly from Mark's original FAQ (availableat http://www.faqs.org). This FAQ is maintained by Dragomir R. Radev from Columbia University.Please send me all your comments, suggestions, corrections, additions,and such to my e-mail address: radev@cs.columbia.edu	(pdf) (ps)
Proceedings of the Workshop on Algorithmic Aspects of Advanced Programming Languages (WAAAPL'99)	Chris Okasaki	1999-09-17	The first Workshop on Algorithmic Aspects of Advanced Programming Languages was held on September 30, 1999, in Paris, France, in conjunction with the PLT99 conferences and workshops. The choice of programming languages has a huge effec t on the algorithms and data structures that are to be implemented in that language. Traditionally, algorithms and data structures have been studied in the context of imperative languages. This workshop considers the algorithmic implications of choosing a n advanced functional or logic programming language instead. A total of eight papers were selected for presentation at the workshop, together with an invited lecture by Robert Harper. We would like to thank Dider Remv, general chair of PLT'99, for his ass istance in organizing this workshop.	(pdf) (ps)
CHIME: A Metadata-Based Distributed Software Environment	Stephen E. Dossick, Gail E. Kaiser	1999-09-09	We introduce CHIME, the Columbia Hypermedia IMmersion Environment, a metadata-based information environment, and describe its potential applications for internet and intranet-based distributed software development. CHIME derives many of its concepts from Multi-User Domains (MUDs), placing users in a semi-automatically generated 3D virtual world representing the software system. Users interact with project artifacts by "walking around" the virtual world, where they potentially encounter and collaborate with other users' avatars. CHIME aims to support large software development projects, in which team members are often geographically and temporally dispersed, through novel use of virtual environment technology. We describe the mechanisms through which CHIME worlds are populated with project artifacts, as well as our initial experiments with CHIME and our future goals for the system.	(pdf) (ps)
An Optimal Algorithm for Scheduling Reservations in a Bandwidth Market	David Olshefski	1999-09-08	An algorithm is presented which provides anoptimal solution to the problem of scheduling non-relocatable timeintervals of bandwidth in differentiated services networks.Simulations found an asymmetry between valuing an interval^?s lengthand bandwidth requirement. Longer intervals requiring less resourceare favored over shorter intervals requiring more resource. Theoptimal algorithm is shown to respond appropriately to price as amechanism of control whereas the offline greedy and the online FCFSalgorithms do not. The solution uses integer programming, and it isshown that, except in the general case, the problem can be solved inpolynomial time.	(pdf) (ps)
A Mobile Agent Approach to Lightweight Process Workflow	Gail Kaiser, Adam Stone, , Stephen E. Dossick	1999-07-29		(pdf)
Transaction Management in Collaborative Virtual Environments	Jack J. Yang, Gail E. Kaiser, , Stephen E. Dossick	1999-07-26	Collaborative Virtual Environments (CVEs) are groupware systems in whichdata are represented as objects in a virtual world. CVE supportsactivities ranging from software design and development to health caremanagement where user operations are long in duration and intensive inhuman interaction. We found the advanced database transaction modeling andmanagement technologies that have developed in the database community veryuseful in the design and development of CVE. In this paper, we firstpresent a dynamically customizable transaction manager, JPern, thendescribe how a CVE could benefit from integrating with such a transactionmanagement service. Finally, we describe an instance of CVE that wedeveloped, CHIME, and how to integrate JPern with it.	(pdf) (ps)
MINIMALIST: An Environment for the Synthesis, Verification and Testability of Burst-Mode Asynchronous Machines	Robert M. Fuhrer, Steven M. Nowick, Mich	1999-07-26	MINIMALIST is a new extensible environment for the synthesis andverification of burst-mode asynchronous finite-state machines. MINIMALISTembodies a complete technology-independent synthesis path, withstate-of-the-art exact and heuristic asynchronous synthesis algorithms, e.g.optimal state assignment (CHASM), two-level hazard-free logic minimization(HFMIN, ESPRESSO-HF, and IMPYMIN), and synthesis-for-testability. Unlikeother asynchronous synthesis packages, MINIMALIST also offers many options:literal vs. product optimization, single- vs. multi-output logicminimization, using vs. not using fed-back outputs as state variables, andexploring varied code lengths during state assignment, thus allowing thedesigner to explore trade-offs and select the implementation style whichbest suits the application. MINIMALIST benchmark results demonstrate itsability to produce implementations with an average of 34\% and up to 48\% lessarea, and an average of 11\% and up to 37\% better performance, than the bestexisting package. Our synthesis-for-testability method guarantees 100\%testability under both stuck-at and robust path delay fault models,requiring little or no overhead. MINIMALIST also features both command-lineand graphic user interfaces, and supports extension via well-definedinterfaces for adding new tools. As such, it is easily augmented to form acomplete path to technology-dependent logic.	(pdf) (ps)
Programming and Problem Solving: A Transcript of the Spring 1999 Class	Kenneth A. Ross, Simon R. Shamoun	1999-05-24	This report contains edited transcripts of the discussion held inColumbia's Programming and Problem-Solving course, taught as W4995-01during Spring 1999. The class notes were taken by the teachingassistant so that students could focus on the class material. As aservice to both the students and to others who would like to get someinsight into the class experience, we have drawn all of the classhandouts, discussion, and some of the results into this technicalreport.	(pdf) (ps)
Agent-Based Distributed Learning Applied to Fraud Detection	Andreas L. Prodromidis, Salvatore J. Stolfo	1999-05-03	Inductive learning and classification techniqueshave been applied in many problems in diverse areas. In this paper wedescribe an AI-based approach that combines inductive learningalgorithms and meta-learning methods as a means to compute accurateclassification models for detecting electronic fraud. Inductivelearning algorithms are used to compute detectors of anomalous orerrant behavior over inherently distributed data sets andmeta-learning methods integrate their collective knowledge into higherlevel classification models or ``meta-classifiers''. By supporting theexchange of models or ``classifier agents'' among data sites, ourapproach facilitates the cooperation between financial organizationsand provides unified and cross-institution protection mechanismsagainst fraudulent transactions. Through experiments performed onactual credit card transaction data supplied by two differentfinancial institutions, we evaluate this approach and we demonstrateits utility.	(pdf) (ps)
Polycameras: Camera Clusters for Wide Angle Imaging	Rahul Swaminathan, Shree K. Nayar	1999-04-29	We present the idea of a polycamera which is defined as a tightlypacked camera cluster. The cluster is arranged so as to minimize theoverlap between adjecent views. The objective of such clusters is tobe able to image a very large field of view without loss ofresolution. Since these clusters do not have a single viewpoint,analysis is provided on the effects of such non-singularities. We alsopresent certain conbfigurations for polycameras which cover varyingfields of view. We would like to minimizze the number of sensorsrequired to capture a given field of view. Therefore we recommend theuse of wide-angle sensors as opposed to traditional long focal lengthsensors. However, such wide-angle sensors tend to have severedistortions which pull points towards the optical center. This paperalso proposes a method for recovering the distortion parameterswithout the use of any calibration objects. Since distortions causestraight lines in the scene to appear as curves in the image, ouralgorithm seeks to find the distortion parameters that would map theimage curves to straight lines. The user selects a small set of pointsalong the image curves. Recovery of the distortion parameters isformulated as the minimization of an objective function which isdesigned to explicitly account for noise in the selected imagepoints. Experimental results are presented for synthetic data withdifferent noise levels as well as for real images. Once calibrated,the image stream from a wide angle camera can be undistorted in realtime using look up tables. Finally, we apply our distortion correctiontechnique to a polycamera made of four wide-angle cameras to create ahigh resolution 360 degree panorama in real-time.	(pdf) (ps)
Pararover: A Remote Controlled Vehicle with Omnidirectional Sensors	Simon Lok, Shree K. Nayar	1999-03-22	The process of teleoperation can be described as allowing a remote user to control a vehicle by interpretting sensor information captured by the vehicle. One method that is frequently used to implement teleoperation is to provide the user with a real-time video display of a perspective camera mounted on the vehicle. This method limits the remote user to seeing the environment in which is vehicle is present through the fixed viewpoint with which the camera is mounted. Having a fixed viewpoint is extremely limiting and significantly impedes the ability of the remote user to properly navigate. One way to address this problem is to mount the perspective camera on a pan-tilt device. This is rarely done because it is expensive and introduces a significant increase in implementation complexity from both the mechanical and electrical point of view. With the advent of omnidirectional camera technology, there is now a second more attractive alternative. This paper describes the \rover, a remote controlled vehicle constructed in the summer of 1998 to demonstrate the use of omnidirectional camera technology and a virtual reality display for vehicular teleoperation, audio-video surveillance and forward reconnaissance.	(pdf) (ps)
An Experimental Hybrid User Interface for Collaboration	Andreas Butz, Tobias Hollerer, Clifford Beshers, Steven Feiner, Blair MacIntyre	1999-03-22	We present EMMIE (Environment Management for Multi-userInformation Environments), an experimental user interface to acollaborative augmented environment. Users share a 3D virtual spaceand manipulate virtual objects representing information to bediscussed. This approach not only allows for cooperation in a sharedphysical space, but also addresses tele-collaboration in physicallyseparate but virtually shared spaces. We refer to EMMIE as a hybriduser interface because it combines a variety of different technologiesand techniques, including virtual elements such as 3D widgets, andphysical objects such as tracked displays and inputdevices. See-through head-worn displays overlay the virtualenvironment on the physical environment. Our research prototypeincludes additional 2D and 3D displays, ranging from palm-sized towall-sized, allowing the most appropriate one to be used for any task.Objects can be moved among displays (including acrossdimensionalities) through drag \& drop. In analogy to 2D windowmanagers, we describe a prototype implementation of a shared 3Denvironment manager that is distributed across displays, machines, andoperating systems. We also discuss two methods we are exploring forhandling information privacy in such an environment."	(pdf) (ps)
What is the complexity of surface integration?	H. Wozniakowski, A. G. Werschulz	1999-03-18	We study the worst case complexity of computing $\varepsilon$-approximations of surface integrals. This problem has two sources of partial information: the integrand~$f$ and the function~$g$ defining the surface. The problem is nonlinear in its dependence on~$g$. Here, $f$ is an $r$~times continuously differentiable scalar function of $l$~variables, and $g$ is an $s$~times continuously differentiable injective function of $d$ variables with $l$~components. We must have $d\le l$ and $s\ge1$ for surface integration to be well-defined. Surface integration is related to the classical integration problem for functions of $d$~variables that are $\min\{r,s-1\}$ times continuously differentiable. This might suggest that the complexity of surface integration should be $\Theta((1/\varepsilon)^{d/\min\{r,s-1\}})$. Indeed, this holds when $d<l$ and $s=1$, in which case the surface integration problem has infinite complexity. However, if $d\le l$ and $s\ge 2$, we prove that the complexity of surface integration is $O((1/\varepsilon)^{d/\min\{r,s\}})$. Furthermore, this bound is sharp whenever $d<l$.	(pdf) (ps)
Information-based complexity and information-based optimization	J.F. Traub, A. G. Werschulz	1999-03-18	This is an article that will appear in the {\em Encyclopedia ofOptimization\/} (Kluwer, 2000). It concerns optimization in twosenses. The first is that informaion-based complexity (IBC) is thestudy of the minimal computational resources to solve continuousmathematical problems. The second is that the computationalcomplexity of optimization problems is one of the areas studied inIBC. We discuss IBC and information-based optimization in turn.	(pdf) (ps)
Programming Internet Telephony Services	Jonathan Rosenberg, Jonathan Lennox, Henning Schulzrinne	1999-03-14	Internet telephony enables a wealth of new servicepossibilities. Traditional telephony services, such as callforwarding, transfer, and 800 number services, can be enhanced byinteraction with email, web, and directory services. Additional mediatypes, like video and interactive chat, can be added as well. One ofthe challenges in providing these services is how to effectivelyprogram them. Programming these services requires decisions regardingwhere the code executes, how it interfaces with the protocols thatdeliver the services, and what level of control the code has. In thispaper, we consider this problem in detail. We develop requirements forprogramming Internet telephony services, and we show that at least twosolutions are required --- one geared for service creation by trustedusers (such as administrators), and one geared for service creation byuntrusted users (such as consumers). We review existing techniques forservice programmability in the Internet and in the telephone network,and extract the best components of both. The result is a CommonGateway Interface (CGI) that allows trusted users to developservices, and the Call Processing Language (CPL) that allows untrustedusers to develop services.	(pdf) (ps)
Detecting and Measuring Asymmetric Links in an IP Network	Wenyu Jiang	1999-03-08	The rapid growth of the Web has caused a lot of congestion in theInternet. Pinpointing bottlenecks in a network is very helpful incongestion control and performance tuning. Measuring link bandwidthscan help identifying such bottlenecks. Existing tools for bandwidthmeasurement like Pathchar, Bing and Bprobe assume symmetriclinks. Hence their results will be incomplete or incorrect in thepresence of asymmetric links. It becomes important to considerasymmetric links, as they are gaining popularity in recentyears. Examples are ADSL lines, Cable modems, Satellite links, and 56Kmodems. In this paper, we present an algorithm that can measure eachhop's link bandwidth in both directions in an IP network. Therefore,it is trivial to detect asymmetry of a link. We performed severalexperiments to validate our algorithm. We also discuss some factorsthat can adversely affect the precision and/or correctness ofbandwidth measurement, and suggest possible solutions.	(pdf) (ps)
Contributions to the Design of Asynchronous Macromodular Systems	Luis A. Plana	1999-02-09	In this thesis, I advocate the use of macromodules to design andbuild robust and performance-competitive asynchronous systems. The contributions of the work relate to different aspects ofthe design of asynchronous macromodular systems. First, an architectural optimization for 4-phase systems isintroduced. The goal of the optimization is to increase theperformance of a system by increasing the level of concurrent activityin the sequencing of data processing stages. In particular, three newasynchronous sequencers are designed, which increase the throughputof the system. Existing asynchronous datapaths do not operatecorrectly at this increased level of concurrency: data hazardsmay result. Interlock mechanisms are introduced to insure correctoperation. The technique can also be regarded as a low-poweroptimization: The increased throughput can be traded for a significantreduction in the power consumption of the entire system. SPICEsimulation results show that the new sequencers allow roughly twicethe throughput of non-concurrent sequencers. The simulations also show	(pdf) (ps)
Implementing Intelligent Network Services with the Session Initiation Protocol	Jonathan Lennox, Henning Schulzrinne, Thomas F. La Porta	1999-01-21	Internet telephony is receiving increasing interest as an alternative totraditional telephone networks. This article shows how the IETF's SessionInitiation Protocol (SIP) can be used to perform the services of traditionalIntelligent Network protocols, as well as additional services.	(pdf) (ps)
Cache Conscious Indexing for Decision-Support in Main Memory	Jun Rao, Ken Ross	1998-12-04	As random access memory gets cheaper, it becomesincreasingly affordable to build computers with large main memories.We consider decision support workloads within the context of a mainmemory database system, and consider ways to enhance the performanceof query evaluation. Indexes can potentially speed up a variety of operations in a databasesystem. In our context, we are less concerned about incrementalupdating of the index, since we can rebuild indexes in response tobatch updates relatively quickly. Our primary concerns are the timetaken for a lookup, and the space occupied by the index structure. We study indexing techniques for main memory, including hash indexes,binary search trees, T-trees, B+-trees, interpolation search, andbinary search on arrays. At one extreme, hash-indexes provide fastlookup but require a lot of space. At another extreme, binary searchon an array requires no additional space beyond the array, butperforms relatively poorly. An analysis of binary search on an arrayshows that it has poor reference locality, leading to many cachemisses and slow lookup times. Our goal is to provide faster lookup times than binary search bypaying better attention to reference locality and cache behavior,without using substantial extra space. We propose a new indexingtechnique called "Cache-Sensitive Search Trees" (CSS-trees). Ourtechnique stores a directory structure on top of a sorted array. Thedirectory represents a balanced search tree stored itself as an array.Nodes in this search tree are designed to have size matching thecache-line size of the machine. Because we store the tree in an arrayNodes in this search tree are designed to have size matching thecache-line size of the machine. Because we store the tree in an arraystructure, we do not have to store internal-node pointers; child nodescan be found by performing arithmetic on array offsets. We provide an analytic comparison of the algorithms based on theirtime and space requirements. We have implemented all of thetechniques, and present a performance study on two popular modernmachines. We demonstrate that with a small space overhead, we canreduce the cost of binary search on the array by more than a factor oftwo. We also show that our technique dominates B+-trees, T-trees, andbinary search trees in terms of both space and time. Our performancegraphs show significantly different rankings of algorithms than shownin a 1986 study by Lehman and Carey. The explanation of thedifference is that during the last twelve years, the relative cost ofa cache miss to that of a CPU instruction cycle has increased by twoorders of magnitude. As a result, it is now much more important todesign techniques with good cache behavior.	(pdf) (ps)
Optimizing Selections over Data Cubes	Kenneth A. Ross, Kazi A. Zaman	1998-11-24	Datacube queries compute aggregates over databaserelations at a variety of granularities, and they constitute animportant class of decision support queries. Often one wants onlydatacube output tuples whose aggregate value satisfies a certaincondition, such as exceeding a given threshold. For example, onemight ask for all combinations of model, color, and year of cars(including the special value ``ALL'' for each of the dimensions) forwhich the total sales exceeded a given amount of money. Computing a selection over a datacube can naively be done by computingthe entire datacube and checking if the selection condition holds foreach tuple in the result. However, it is often the case thatselections are relatively restrictive, meaning that a lot of workcomputing datacube tuples is ``wasted'' since those tuples don't satisfythe selection condition. Our approach is to develop algorithms for processing a datacube queryusing the selection condition internally during the computation. Bymaking use of the selection condition within the datacube computation,we can safely prune parts of the computation and end up with a moreefficient computation of the answer. Our first technique, called``specialization'', uses the fact that a tuple in the datacube does notmeet the given threshold to infer that all finer level aggregatescannot meet the threshold. We propose a scheme of specializationtransformations on the underlying data sets, using properties of theaggregates and threshold functions. Our second technique is called ``generalization'', and applies in the casewhere the actual value of the aggregate is not needed in the output,but used just to compare with the threshold. Generalization uses thefact that a tuple meets the given threshold to infer that all coarserlevel aggregates also meet the threshold. We also propose a scheme ofgeneralization transformations. We demonstrate the efficiency of these techniques by implementing themwithin the sparse datacube algorithm of Ross and Srivastava. Wepresent a performance study using synthetic and real-world data sets.Our results indicate substantial performance improvements for querieswith selective conditions.	(pdf) (ps)
What is the complexity of Stieltjes integration?	Arthur G. Werschulz	1998-11-02	We study the complexity of approximating the Stieltjes integral$\int_0^1 f(x)\,dg(x)$ for functions $f$ having $r$ continuousderivatives and functions $g$ whose $s$th derivative has boundedvariation. Let $r(n)$ denote the $n$th minimal error attainable byapproximations using at most~$n$ evaluations of $f$ and~$g$, and let$\mathop{\rm comp}(\varepsilon)$ denote the $\varepsilon$-complexity(the minimal cost of computing an $\varepsilon$-approximation). Weshow that $r(n)\asymp n^{-\min\{r,s+1\}}$ and that $\mathop{\rm comp}(\varepsilon)\asymp\varepsilon^{-1/\min\{r,s+1\}}$. We also present an algorithm that computes an$\varepsilon$-approimation at nearly-minimal cost.	(pdf) (ps)
Pruning Classifiers in a Distributed Meta-Learning System	Andreas Prodromidis, Salvatore Stolfo	1998-07-18	JAM is a powerful and portable agent-baseddistributed data mining system that employs meta-learning techniquesto integrate a number of independent classifiers (models) derived inparallel from independent and (possibly) inherently distributeddatabases. Although meta-learning promotes scalability and accuracy ina simple and straightforward manner, brute force meta-learningtechniques can result in large, redundant, inefficient and some timesinaccurate meta-classifier hierarchies. In this paper we exploreseveral methods for evaluating classifiers and composingmeta-classifiers, we expose ther limitations and we demonstrate thatmeta-learning combined with certain pruning methods has the potentialto achieve similar or even better performance results in a much morecost effective manner.	(pdf) (ps)
Integrating Transaction Services into Web-based Software Development Environments	Jack Jingshuang Yang, Gail Kaiser, Steve Dossick, Wenyu Jiang	1998-04-20	Software Development Environments (SDE) requiresophisticated database transaction models due to the long-duration,interactive, and cooperative nature of the software engineeringactivities. Such Extended Transaction Models (ETM) have been proposedand implemented by building application-specific databases for theSDEs. With the development of World Wide Web (WWW), there have been a numberof efforts to build SDEs on top of the WWW. Using web serversas the databases to store the software artifacts provided us with a newchallenge: how to implement the ETMs in such web-based SDEs without requiringthe web servers to be customized specifically according to the applicationdomains of the SDEs. This paper presents our experiences of integrating transactionservices into web based SDEs. We evolved from the traditional approachof building a transaction management component that operated on top ofa dedicated database to the external transaction server approach. Atransaction server, called JPernLite, was built to operateindependently of the web servers and provide the necessaryextensibility for SDEs to implement their ETMs. The transaction servercan be integrated into the SDE via a number of interfaces, and wediscuss the pros and cons of each alternative in detail.	(pdf) (ps)
JPernLite: Extensible Transaction Services for WWW	Jack Jingshuang Yang, Gail E. Kaiser	1998-03-26	Concurrency control is one of the key problems in design and implementation of collaborative systems such as hypertext/hypermedia systems, CAD/CAM systems and software development environments. Most existing systems store data in specialized databases with built-in concurrency control policies, usually implemented via locking. It is desirable to construct such collaborative systems on top of the World Wide Web, but most web servers do not support even conventional transactions, let alone distributed (multi-website) transactions or flexible concurrency control mechanisms oriented towards teamwork - such as event notification, shared locks and fine granularity locks. We present a transaction server that operates independently of web servers or the collaborative systems, to fill the concurrency control gap. The transaction server by default enforces the conventional atomic transaction model, where sets of operations are performed in an all-or-nothing fashion and isolated from concurrent users. The server can be tailored dynamically to apply more sophisticated concurrency control policies appropriate for collaboration. The transaction server also supports applications employing information resources other than web servers, such as legacy databases, CORBA objects, and other hypermedia systems. Our implementation permits a wide range of system architecture styles.	(pdf) (ps)
Signaling for Internet Telephony	Henning Schulzrinne, Jonathan Rosenberg	1998-02-10	Internet telephony must offer the standard telephony services.However, the transition to Internet-based telephony services alsoprovides an opportunity to create new services more rapidly and withlower complexity than in the existing public switched telephone network(PSTN). The Session Initiation Protocol (SIP) is a signaling protocolthat creates, modifies and terminates associations between Internet endsystems, including conferences and point-to-point calls. SIP supportsunicast, mesh and multicast conferences, as well as combinations ofthese modes. SIP implements services such as call forwarding andtransfer, placing calls on hold, camp-on and call queueing by a smallset of call handling primitives. SIP implementations can re-use partsof other Internet service protocols such as HTTP and the Real-TimeStream Protocol (RTSP). In this paper, we describe SIP, and show howits basic primitives can be used to construct a wide range of telephonyservices.	(pdf) (ps)
Synthesis for Logical Initializability of Synchronous Finite State Machines	Montek Singh, Steven M. Nowick	1998-01-15	A new method is introduced for the synthesis forlogical initializability of synchronous state machines. The goal isto synthesize a gate-level implementation that is initializable whensimulated by a 3-valued (0,1,X) simulator. The method builds on anexisting approach of Cheng and Agrawal, which uses constrained stateassignment to translate functional initializability into logicalinitializability. Here, a different state assignment method isproposed which, unlike the method of Cheng and Agrawal, is guaranteedsafe and yet is not as conservative. Furthermore, it is demonstratedthat certain new constraints on combinational logic synthesis are bothnecessary and sufficient to insure that the resulting gate-levelcircuit is 3-valued simulatable. Interestingly, these constraints aresimilar to those used for hazard-free synthesis of asynchronouscombinational circuits. Using the above constraints, we present acomplete synthesis for initializability method, targeted to bothtwo-level and multi-level circuits.	(pdf) (ps)
Fast Heuristic and Exact Algorithms for Two-Level Hazard-Free Logic Minimization	Michael Theobald, Steven M. Nowick	1998-01-09	None of the available minimizers for 2-levelhazard-free logic minimization can synthesizevery large circuits. This limitation has forcedresearchers to resort to manual and automatedcircuit partitioning techniques. This paperintroduces two new 2-level logic minimizers:ESPRESSO-HF, a heuristic method which is looselybased on ESPRESSO-II, and IMPYMIN, an exact methodbased on implicit data structures. Both minimizers can solve all currently availableexamples, which range up to 32 inputs and 33 outputs.These include examples that have never been solved before.For examples that can be solved by other minimizers ourmethods are several orders of magnitude faster. As by-products of these algorithms, we also presenttwo additional results. First, we introduce a fast newalgorithm to check if a hazard-free covering problemcan feasibly be solved. Second, we introduce a novelformulation of the 2-level hazard-free logic minimizationproblem by capturing hazard-freedom constraints withina synchronous function by adding new variables.	(pdf) (ps)
Cryptfs: A Stackable Vnode Level Encryption File System	Erez Zadok, Ion Badulescu, and Alex Shender	1998-01-01	Data encryption has become an increasinglyimportant factor in everyday work. Users seek a method of securingtheir data with maximum comfort and minimum additional requirements ontheir part; they want a security system that protects any files usedby any of their applications, without resorting toapplication-specific encryption methods. Performance is an importantfactor to users since encryption can be time consuming. Operatingsystem vendors want to provide this functionality but withoutincurring the large costs of developing a new file system. This paper describes the design and implementation of Cryptfs -- afile system that was designed as a stackable Vnode layer loadablekernel module. Cryptfs operates by "encapsulating" a client filesystem with a layer of encryption transparent to the user. Being kernel resident, Cryptfs performs better than user-level or NFSbased file servers such as CFS and TCFS. It is 2 to 37 times faster onmicro-benchmarks such as read and write; this translates to 12-52\%application speedup, as exemplified by a large build. Cryptfs offersstronger security by basing its keys on process session IDs as well asuser IDs, and by the fact that kernel memory is harder toaccess. Working at and above the vnode level, Cryptfs is more portablethan a file system which works directly with native media such asdisks and networks. Cryptfs can operate on top of any other nativefile system such as UFS/FFS and NFS. Finally, Cryptfsrequires no changes to client file systems or remote servers.	(pdf) (ps)
Usenetfs: A Stackable File System for Large Article Directories	Erez Zadok, Ion Badulescu	1998-01-01	The Internet has grown much in popularity in thepast few years. Numerous users read USENET newsgroups daily forentertainment, work, study, and more. USENET News servers have seen agradual increase in the traffic exchanged between them, to a pointwhere the hardware and software supporting the servers is no longercapable of meeting demand, at which point the servers begin "dropping"articles they could not process. The rate of this increase has beenfaster than software or hardware improvements were able to keep up,resulting in much time and effort spent by administrators upgradingtheir news systems. One of the primary reasons for the slowness of news servers has beenthe need to process many articles in very large flat directoriesrepresenting newsgroups such as control. cancel andmisc.jobs.offered. A large portion of the resources is spent onprocessing articles in these few newsgroups. Most Unix directoriesare organized as a linear unsorted sequence of entries. Largenewsgroups can have hundreds of thousands of articles in onedirectory, resulting in significant delays processing any singlearticle. Usenetfs is a file system that rearranges the directory structure frombeing flat to one with small directories containing fewer articles. Bybreaking the structure into smaller directories, it improves theperformance of looking for, creating, or deleting files, since theseoperations occur on smaller directories. Usenetfs takes advantage ofarticle numbers; knowing that file names representing articles arecomposed of digits helps to bound the size of the smallerdirectories. Usenetfs improves overall performance by at least 22\%for average news servers; common news server operations such aslooking up, adding, and deleting articles are sped up by as much asseveral orders of magnitude. Usenetfs was designed and implemented as a stackable Vnode layerloadable kernel module. It operates by "encapsulating" a client filesystem with a layer of directory management. To the processperforming directory operations through a mounted Usenetfs, alldirectories appear flat; but when inspecting the underlying storagethat it manages, small directories are visible. Usenetfs is small and is transparent to the user. It requires nochange to News software, to other file systems, or to the rest of theoperating system. Usenetfs is more portable than other nativekernel-based file systems because it interacts with the Vnodeinterface which is similar on many different platforms.	(pdf) (ps)
On the management of distributed learning agents	Andreas L. Prodromidis	1997-12-19	This thesis research concentrates on the problem of managing a distributed collection of intelligent learning agents across large and distributed databases. The main challenge is to identify and address the issues related to the efficiency, scalability,adaptivity and compatibility of these agents and the design andimplemention of a complete and coherent distributed meta-learning systemfor large scale data mining applications. The resulting system should beable to scale with many large databases and make effective use of the available system resources. Furthemore, it should be capable to adapt tochanges in its computational environment and be flexible enough to circumvent variances in database schema definitions. In this thesis proposal we present the architecture of JAM(Java Agents for Meta-learning),a distributed data mining system, and we describe in detail several methods to cope with the issues of scalability, efficiency, adaptivityand compatibility. Through experiments, performed on actual credit cardand other public domain data sets, we evaluate the effectiveness and performance of our approaches and we demonstrate their potential.	(ps)
Automated Visual Discourse Synthesis: Coherence, Versatility, and Interactivity (Ph.D. Thesis Proposal)	Michelle X. Zhou	1997-12-17	In this proposal, we present comprehensive andsystematic approaches of building systems that can automaticallygenerate coherent visual discourse for interactive envirornments. Avisual discourse refers to a series of connected visual displays. Acoherent visual discourse requires smooth transitions betweendisplays, consistent designs within and among displays, and effectiveintegration of various components. While our main research goal is todevelop approaches to automatically create coherent, versatile, andinteractive visual discourse, we also emphasize integrating theseapproaches into a general framework to provide a reference model inwhich a specific system is considered an instantiation of theframework. In other words, the framework should consist of a knowledgebase, an inference engine, a visual realizer, and an interactionhandler. As a consequence, not only can a general framework serve as atemplate from which a specific generation system can be instantiated,but the framework also can be used as a base for comparing orevaluating different systems. We concentrate on the basic issuesinvolved in developing comprehensive and systematic approaches toensure a visual discourse's coherence, versatility, andinteractivity. In particular, To ensure coherence, we have establisheda set of comprehensive design criteria to measure both expressivenessand effectiveness of a visual discourse. To provide versatility, weaddress the design of visual discourse for heterogeneousinformation. Within such discussions, heterogeneous information refersto both quantitative and qualitative, or static and dynamicinformation. In addition, we are also concerned with characterizingand employing different visual media and a wide variety of visualtechniques. To support interaction, we integrate many conventionaluser interface metaphors and styles into visual discourse design andexplore reactive planning approaches to provide proper response touser interactions. To establish the framework, we identify variousknowledge sources and determine effective knowledge representationparadigms in constructing the knowledge base. We emphasize theefficiency, usability, and flexibility issues in modeling theinference engine. We are concerned with portability andparallelization issues in building the visual realizer, and we alsotake into account interaction capabilities for interactiveenvironments. We describe a system called IMPROVISE (IllustrativeMetaphor Production in Reactive Object-oriented VISual Environments)that serves as a proof-of-concept prototype. IMPROVISE is built basedon our framework, aiming to automatically generate coherent visualdiscourse for various application domains in interactiveenvironments. IMPROVISE has been used in two testbed applicationdomains to demonstrate its generality and flexibility. Examples fromboth domains will be given to illustrate IMPROVISE's generationprocess and to identify the future research areas.	(pdf)
Building a Rich Large-scale Lexical Base for Generation	Hongyan Jing, Kathleen Mcheown, Rebecca	1997-09-26		(ps)
Where does smoothness count the most for two-point boundary-value problems?	Arthur G. Werschulz	1997-09-12	We are concerned with the complexity of $2m$thorder elliptic two-point boundary-value problems $Lu=f$. Previouswork on the complexity of these problems has generally assumed that wehad partial information about the right-hand side~$f$ and completeinformation about the coefficients of~$L$, often making unrealisticassumptions about the smoothness of the coefficients of~$L$. In thispaper, we study problems in which $f$ has $r$ derivatives in the$L_p$-sense and for~$L$ having the usual divergence form $$Lv =\sum_{0\le i,j\le m} (-1)^i D^i(a_{i,j}D^jv),$$ with $a_{i,j}$ being$r_{i,j}$-times continuously differentiable. We find that ifcontinuous information is permissible, then the$\varepsilon$-complexity is proportional to$(1/\varepsilon)^{1/(\tilde r+m)}$, where $$\tilder=\min\{\,r,\min_{0\le i,j\le m} \{ r_{i,j}-i \}\,\},$$ and show thata finite element method (FEM) is optimal. If only standardinformation (consisting of function and/or derivative evaluations) isavailable, we find that the complexity is proportional to$(1/\varepsilon)^{1/{r_{\mathop{\rm min}}}}$, where $$r_{\mathop{\rmmin}}=\min\{\,r,\min_{0\le i,j\le m} \{ r_{i,j} \}\,\},$$ and we showthat a modified FEM (which uses only function evaluations, and notderivatives) is optimal.	(ps)
The Competitive Analysis of Risk Taking with Application to Online Trading	Sabah al-Binali	1997-08-05	Competitive analysis is concerned with minimizinga relative measure of performance. When applied to financial tradingstrategies, competitive analysis leads to the development ofstrategies with minimum relative performance risk. This approach istoo inflexible. Many investors are interested in managing their risk:they may be willing to increase their risk for some form ofreward. They may also have some forecast of the future. In this paper,we extend competitive analysis to provide a framework in whichinvestors can develop optimal trading strategies based on their risktolerance and forecast. We first define notions of risk and rewardthat are smooth extensions of classical competitive analysis. We thenillustrate our ideas using the ski-rental problem, and analyze afinancial game using the risk-reward framework.	(ps)
Applying Reliability Metrics to Co-Reference Annotation	Rebecca J. Passonneau	1997-06-05	Studies of the contextual and linguistic factorsthat constrain discourse phenomena such as reference are coming todepend increasingly on annotated language corpora. In preparing thecorpora, it is important to evaluate the reliability of theannotation, but methods for doing so have not been readily available.In this report, I present a method for computing reliability ofcoreference annotation. First I review a method for applying theinformation retrieval metrics of recall and precision to coreferenceannotation proposed by Marc Vilain and his collaborators. I show howthis method makes it possible to construct contingency tables forcomputing Cohen's Kappa, a familiar reliability metric. By comparingrecall and precision to reliability on the same data sets, I also showthat recall and precision can be misleadingly high. Because Kappafactors out chance agreement among coders, it is a preferable measurefor developing annotated corpora where no pre-existing targetannotation exists.	(ps)
Generating Natural Language Summaries from Multiple On-Line Sources	Dragomir Radev	1997-05-13	We present a methodology for summarization of newson current events. Our approach is included in a system, calledSUMMONS which presents news summaries to the user in a naturallanguage form along with appropriate background (historical)information from both textual (newswire) and structured (database)knowledge sources. The system presents novel approaches to severalproblems: summarization of multiple sources, summarization of multiplearticles, symbolic summarization through text understanding andgeneration, asynchronous summarization and generation of textualupdates. We pay specific attention to the generation of summaries that includedescriptions of entities such as people and places. We show how certainclasses of lexical resources can be automatically extracted from on-linecorpora and used in the generation of textual summaries. We describe ourapproach to solving the interoperability problem of the various componentsby wrapping all system modules with facilitators which effect thecommunication between the components using a standardized language. Wepresent a plan for completion of the research as well as a set of metricsthat can be used in measuring the performance of the system.	(ps)
A Risk-Reward Framework for the Competitive Analysis of Adaptive Trading Strategies (Thesis Proposal)	Sabah al-Binali	1997-05-08	Competitive analysis is concerned with minimizinga relative measure of performance. When applied to financial tradingstrategies, competitive analysis leads to the development ofstrategies with minimum relative performance risk. This approach istoo inflexible. Many investors are interested in managing their risk:they may be willing to increase their risk for some form ofreward. They may also have some forecast of the future. We propose to extend competitive analysis to provide a framework inwhich investors may develop trading strategies based on their risktolerance and forecast. We introduce a new, nonstochastic, measure ofthe risk of an online algorithm, and a reward metric that is in thespirit of competitive analysis. We then show how investors can selecta strategy that maximizes their reward should their forecast becorrect, whilst still respecting their risk tolerance.	(ps)
A Theory of Catadioptric Image Formation	Shree K. Nayar, Simon Baker	1997-04-23	Conventional video cameras have limited fields ofview which make them restrictive for certain applications incomputational vision. A catadioptric sensor uses a combination oflenses and mirrors placed in a carefully arranged configuration tocapture a much wider field of view. When designing a catadioptricsensor, the shape of the mirror(s) should ideally be selected toensure that the complete catadioptric system has a single effectiveviewpoint. The reason a single viewpoint is so desirable is that it isa requirement for the generation of pure perspective images from thesensed image(s). In this paper, we derive and analyze the completeclass of single-lens single-mirror catadioptric sensors which satisfythe fixed viewpoint constraint. Some of the solutions turn out to bedegenerate with no practical value, while other solutions lead torealizable sensors. We also derive an expression for the spatialresolution of a catadioptric sensor, and include a preliminaryanalysis of the defocus blur caused by the use of a curved mirror.	(ps)
The Unknown and the Unknowable	Joseph F. Traub	1997-04-21		(ps)
Do Negative Results from Formal Systems Limit Scientific Knowledge?	Joseph F. Traub	1997-04-15		(ps)
Non-Computability and Intractability: Does it Matter to Physics?	Joseph F. Traub	1997-04-15	Should the impossibility results of theoreticalcomputer science be of concern to physics?	(ps)
On the Parameter Estimation Accuracy of Model-Matching Feature Detectors	Simon Baker	1997-04-04	The performance of model-fitting feature detectorsis critically dependent upon the function used to measure the degreeof fit between the feature model and the image data. In this paper, weconsider the class of weighted $\mathrm{L}^{2}$ norms as potentialfitting functions and study the effect which the choice of fittingfunction has on one particular aspect of performance, namely parameterestimation accuracy. We first derive an optimality criterion basedupon how far an ideal feature instance is perturbed around the featuremanifold when noise is added to it. We then show that a first-order(linear) approximation to the feature manifold results in theEuclidean $\mathrm{L}^{2}$ norm being optimal. We next showempirically that for non-linear manifolds the Euclidean$\mathrm{L}^{2}$ norm is no longer, in general, optimal. Finally, wepresent the results of several experiments comparing the performanceof various weighting functions on a number of ubiquitous features.	(ps)
The complexity of indefinite elliptic problems with noisy data	Arthur G. Werschulz	1997-03-26	We study the complexity of second-order indefiniteelliptic problems $-\mathop{\rm div}(a\nabla u) + bu=f$ (withhomogeneous Dirichlet boundary conditions) over a $d$-dimensionaldomain~$\Omega$, the error being measured in the $H^m(\Omega)$-norm.The problem elements~$f$ belong to the unit ball of~$W^{r,p}(\Omega)$,where $p\in[2,\infty]$ and $r>d/p$. Information consists of(possibly-adaptive) noisy evaluations of $f$, $a$, or~$b$ (or theirderivatives). The absolute error in each noisy evaluation is atmost~$\delta$. We find that the $n$th minimal radius for this problemis proportional to $n^{-r/d}+\delta$, and that a noisy finite elementmethod with quadrature (FEMQ), which uses only function values, andnot derivatives, is a minimal error algorithm. This noisy FEMQ can beefficiently implemented using multigrid techniques. Using theseresults, we find tight bounds on the $\varepsilon$-complexity (minimalcost of calculating an $\varepsilon$-approximation) for this problem,said bounds depending on the cost~$c(\delta)$ of calculating a$\delta$-noisy information value. As an example, if the cost of a$\delta$-noisy evaluation is $c(\delta)=\delta^{-s}$ (for $s>0$), thenthe complexity is proportional to $(1/\varepsilon)^{d/r+s}$.	(ps)
Adapting Materialized Views after Redefinitions: Techniques and a Performance Study	Ashish Gupta, Inderpal S. Mumick, Jun Rao, Kenneth A. Ross	1997-03-14	We consider a variant of the view maintenance problem: How does one keep a materialized view up-to-date when the view definition itself changes? Can one do better than recomputing the view from the base relations? Traditional view maintenance tries to maintain the materialized view in response to modifications to the base relations; we try to ``adapt'' the view in response to changes in the view definition. Such techniques are needed for applications where the user can changequeries dynamically and see the changes in the results fast. Dataarchaeology, data visualization, and dynamic queries are examples ofsuch applications. We consider all possible redefinitions of SQL\Select-\From-\Where-\Groupby-\Having, \Union, and \Minus\ views, and show howthese views can be adapted using the old materialization for the caseswhere it is possible to do so. We identify extra informationthat can be kept with a materialization to facilitate redefinition.Multiple simultaneous changesto a view can be handledwithout necessarily materializing intermediate results. Weidentify guidelines for users and database administrators that can beused to facilitate efficient view adaptation. We perform a systematic experimental evaluation of our proposedtechniques. Our evaluation indicates that adaptation is more efficient thanrematerialization in most cases. Certain adaptation techniques can be up to1,000 times better. We also point out the physical layouts that can benefitadaptation.	(ps)
Federating Process-Centered Environments: the Oz Experience	Israel Z. Ben-Shaul, Gail E. Kaiser	1997-03-10		(ps)
WWW-based Collaboration Environments with Distributed Tool Services	Gail E. Kaiser, Stephen E. Dossick, Wenyu Jiang, Jack Jingshuang Yang, Sonny Xi Ye	1997-03-03	We have developed an architecture and realizationof a framework for hypermedia collaboration environments that supportpurposeful work by orchestrated teams. The hypermedia represents allplausible multimedia artifacts concerned with the collaborativetask(s) at hand that can be placed or generated on-line, fromapplication-specific materials (e.g., source code, chip layouts,blueprints) to formal documentation to digital library resources toinformal email and chat transcripts. The environment capabilitiesinclude both internal (hypertext) and external (link server) linksamong these artifacts, which can be added incrementally as usefulconnections are discovered; project-specific hypermedia search andbrowsing; automated construction of artifacts and hyperlinks accordingto the semantics of the group and individual tasks and the overallprocess workflow; application of tools to the artifacts; andcollaborative work for geographically dispersed teams. We present a general architecture for what we call hypermedia {\emsubwebs}, and imposition of {\em groupspace} services operating onshared subwebs, based on World Wide Web technology --- which could beapplied over the Internet and/or within an organizational intranet.We describe our realization in OzWeb, which reuses object-orienteddata management for application-specific subweb organization, andworkflow enactment and cooperative transactions as built-in groupspaceservices, which were originally developed for the Oz process-centeredsoftware development environment framework. Further, we present ageneral architecture for a WWW-based distributed tool launchingservice. This service is implemented by the generic Rivendellcomponent, which could be employed in a stand-alone manner, but hasbeen integrated into OzWeb as an example ``foreign'' (i.e., add-on)groupspace service.	(ps)
FiST: A File System Component Compiler (Ph.D. Thesis Proposal)	Erez Zadok	1997-01-01		(ps)
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning	Philip Chan	1996-10-18	Much of the research in inductive learning concentrateson problems with relatively small amounts of data. With the coming age ofubiquitous network computing, it is likely that orders of magnitudemore data in databases will be available for various learning problemsof real world importance. Some learning algorithms assume that theentire data set fits into main memory, which is not feasible formassive amounts of data, especially for applications in data mining.One approach to handling a large data set is to partition the data setinto subsets, run the learning algorithm on each of the subsets, andcombine the results. Moreover, data can be inherently distributedacross multiple sites on the network and merging all the data in onelocation can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a {\itmeta-learning} approach to integrating the results of multiplelearning processes. Our approach utilizes machine learning to guidethe integration. We identified two main meta-learning strategies:{\it combiner} and {\it arbiter}. Both strategies are independent tothe learning algorithms used in generating the classifiers. Thecombiner strategy attempts to reveal relationships among the learnedclassifiers' prediction patterns. The arbiter strategy tries todetermine the correct prediction when the classifiers have differentopinions. Various schemes under these two strategies have beendeveloped. Empirical results show that our schemes can obtainaccurate classifiers from inaccurate classifiers trained from datasubsets. We also implemented and analyzed the schemes in a paralleland distributed environment to demonstrate their scalability.	(ps)
Distributed Tool Services via the World Wide Web	Steve Dossick, Gail Kaiser, Jack Jingshuang Yang	1996-10-15	We present an architecture for a distributed toolservice which operates over HTTP, the underlying protocol of the WorldWide Web. This allows unmodified Web browsers to request toolexecutions from the server as well as making integration with existingsystems easier. We describe Rivendell, a prototype implementation ofthe architecture.	(ps)
The complexity of definite elliptic problems with noisy data	Arthur G. Werschulz	1996-09-11	We study the complexity of $2m$th order definiteelliptic problems $Lu=f$ (with homogeneous Dirichlet boundaryconditions) over a $d$-dimensional domain~$\Omega$, error beingmeasured in the $H^m(\Omega)$-norm. The problem elements~$f$ belongto the unit ball of~$W^{r,p}(\Omega)$, where $p\in[2,\infty]$ and$r>d/p$. Information consists of (possibly-adaptive) noisyevaluations of~$f$ or the coefficients of~$L$. The absolute error ineach noisy evaluation is at most~$\delta$. We find that the $n$thminimal radius for this problem is proportional to $n^{-r/d}+\delta$,and that a noisy finite element method with quadrature (FEMQ), whichuses only function values, and not derivatives, is a minimal erroralgorithm. This noisy FEMQ can be efficiently implemented usingmultigrid techniques. Using these results, we find tight bounds onthe $\varepsilon$-complexity (minimal cost of calculating an$\varepsilon$-approximation) for this problem, said bounds dependingon the cost~$c(\delta)$ of calculating a $\delta$-noisy informationvalue. As an example, if the cost of a $\delta$-noisy evaluation is$c(\delta)=\delta^{-s}$ (for $s>0$), then the complexity isproportional to $(1/\varepsilon)^{d/r+s}$.	(ps)
An Architecture for WWW-based Hypercode Environments	Gail E. Kaiser, Stephen E. Dossick, Wenyu Jiang, Jack Jingshuang Yang	1996-08-09	A hypercode software engineering environmentrepresents all plausible multimedia artifacts concerned with softwaredevelopment and evolution that can be placed or generated on-line,from source code to formal documentation to digital library resourcesto informal email and chat transcripts. A hypercode environmentsupports both internal (hypertext) and external (link server) linksamong these artifacts, which can be added incrementally as usefulconnections are discovered; project-specific hypermedia search andbrowsing; automated construction of artifacts and hyperlinks accordingthe software process; application of tools to the artifacts accordingto the process workflow; and collaborative work for geographicallydispersed teams. We present a general architecture for what we callhypermedia subwebs, and groupspace services operating on sharedsubwebs, based on World Wide Web technology which could be appliedover the Internet or within an intranet. We describe our realizationin OzWeb.	(ps)
Reflectance and Texture of Real-World Surfaces: Summary Report	Kristin J. Dana, Bram van Ginneken, Shree K. Nayar, Jan J. Koenderink	1996-07-20	In this work, we investigate the visual appearanceof real-world surfaces and the dependence of appearance on scale,viewing direction and illumination direction. At fine scale, surfacevariations cause local intensity variation or image texture. Theappearance of this texture depends on both illumination and viewingdirection and can be characterized by the BTF (bidirectional texturefunction). At sufficiently coarse scale, local image texture is notresolvable and local image intensity is uniform. The dependence ofthis image intensity on illumination and viewing direction isdescribed by the BRDF (bidirectional reflectance distributionfunction). We simultaneously measure the BTF and BRDF of over 60different rough surfaces, each observed with over 200 differentcombinations of viewing and illumination direction. The resulting BTFdatabase is comprised of over 12,000 image textures. To enableconvenient use of the BRDF measurements, we fit the measurements totwo recent models and obtain a BRDF parameter database. Theseparameters can be used directly in image analysis and synthesis of awide variety of surfaces. The BTF, BRDF, and BRDF parameter databaseshave important implications for computer vision and computer graphicsand each is made publicly available.	(ps)
An Analytical Approach to File Prefetching	Hui Lei, Dan Duchamp	1996-06-26	File prefetching is an effective technique forimproving file access performance. In this paper, we present a fileprefetching mechanism that is based on on-line analytic modeling ofinteresting system events and is transparent to higher levels. Themechanism, incorporated into a client's file cache manager, seeks tobuild semantic structures, called access trees, that capture thecorrelations between file accesses. It then heuristically uses thesestructures to represent distinct file usage patterns and exploits themto prefetch files from a file server. We show results of a simulationstudy and of a working implementation. Measurements suggest that ourmethod can predict future file accesses with an accuracy around 90\%,that it can reduce cache miss rate by up to 47\% and applicationlatency by up to 40\%. Our method imposes little overhead, even underantagonistic circumstances.	(ps)
Fast Joins Using Join Indices	Zhe Li, Kenneth Ross	1996-06-26	Two new algorithms, ``Jive-join'' and``Slam-join,'' are proposed for computing the join of two relationsusing a join index. The algorithms are duals: Jive-joinrange-partitions input relation tuple-ids then processes eachpartition, while Slam-join forms ordered runs of input relationtuple-ids and then merges the results. Each algorithm has featuresthat make it preferable to the other depending on the context in whichit is being used. Both algorithms make a single sequential passthrough each input relation, in addition to one pass through the joinindex and two passes through a temporary file whose size is half thatof the join index. Both algorithms perform this efficiently even whenthe relations are much larger than main memory, as long as the numberof blocks in main memory is of the order of the square root of thenumber of blocks in the smaller relation. By storing intermediate andfinal join results in a vertically partitioned fashion, our algorithmsneed to manipulate less data in memory at a given time than otheralgorithms. Almost all the I/O of our algorithms is sequential, thusminimizing the impact of seek and rotational latency. The algorithmsare resistant to data skew and adaptive to memory fluctuations. Theycan be extended to handle joins of multiple relations by usingmultidimensional partitioning while still making only a single passover each input relation. They can also be extended to handle joinsof relations that do not satisfy the memory bound by recursivelyapplying the algorithms. We also show how selection conditions can beincorporated into the algorithms. Using a detailed cost model, thealgorithms are analyzed and compared with competing algorithms. Forlarge input relations, our algorithms perform significantly betterthan Valduriez's algorithm and hash join algorithms. An experimentalstudy is also conducted to validate the analytical results and todemonstrate the performance characteristics of each algorithm inpractice.	(ps)
Reflectance and Texture of Real-World Surfaces	Kristin J. Dana, Bram van Ginneken, Shree K. Nayar, Jan J. Koenderink	1996-06-24		(ps)
New methodologies for valuing derivatives	Spassimir Paskov	1996-06-10	High-dimensional integrals are usually solved withMonte Carlo algorithms although theory suggests that low discrepancyalgorithms are sometimes superior. We report on numerical testingwhich compares low discrepancy and Monte Carlo algorithms on theevaluation of financial derivatives. The testing is performed on aCollateralized Mortgage Obligation (CMO) which is formulated as thecomputation of ten integrals of dimension up to 360. We tested two low discrepancy algorithms (Sobol and Halton) and tworandomized algorithms (classical Monte Carlo and Monte Carlo combinedwith antithetic variables). We conclude that for this CMO the Sobolalgorithm is always superior to the other algorithms. We believe thatit will be advantageous to use the Sobol algorithm for many othertypes of financial derivatives. Our conclusion regarding the superiority of the Sobol algorithm alsoholds when a rather small number of sample points are used, animportant case in practice. We have built a software system called FINDER for computing highdimensional integrals. Routines for computing Sobol points have beenpublished. However, we incorporated major improvements in FINDER andwe stress that the results reported here were obtained using thissoftware. The software system FINDER runs on a network of heterogeneousworkstations under PVM 3.2 (Parallel Virtual Machine). Sinceworkstations are ubiquitous, this is a cost-effective way to do verylarge computations fast. The measured speedup is at least $.9N$ for$N$ workstations, $N \leq 25$. The software can also be used tocompute high dimensional integrals on a single workstation.	(ps)
Faster valuation of financial derivatives	Spassimir Paskov	1996-06-10	Monte Carlo simulation is widely used to valuecomplex financial instruments. An alternative to Monte Carlo is touse ``low discrepancy'' methods. Theory suggests that low discrepancymethods might be superior to the Monte Carlo method. We compared theperformance of low discrepancy methods with Monte Carlo on aCollateralized Mortgage Obligation (CMO) with ten tranches. We foundthat a particular low discrepancy method based on Sobol pointsconsistently outperforms Monte Carlo. Although our tests were for aCMO, we believe it will be advantageous to use the Sobol method formany other types of instruments. We have made major improvements inpublished routines for generating Sobol points which we have embeddedin a software system called FINDER.	(ps)
New Results on Deterministic Pricing of Financial Derivatives	A. Papageorgiou, J.F. Traub	1996-06-01		(ps)
Closed Terminologies and Temporal Reasoning in Description Logic for Concept and Plan Recognition (phd thesis)	Robert Anthony Weida	1996-05-29	Description logics are knowledge representationformalisms in the tradition of frames and semantic networks, but withan emphasis on formal semantics. A terminology contains descriptionsof concepts, such as UNIVERSITY, which are automatically classified ina taxonomy via subsumption inferences. Individuals such as COLUMBIAare described in terms of those concepts. This thesis enhances thescope and utility of description logics by exploiting new completenessassumptions during problem solving and by extending the expressivenessof descriptions. First, we introduce a predictive concept recognition methodology basedon a new closed terminology assumption (CTA). The terminology isdynamically partitioned by modalities (necessary, optional, andimpossible) with respect to individuals as they are specified. In ourinteractive configuration application, a user incrementally specifiesan individual computer system and its components in collaboration witha configuration engine. Choices can be made in any order and at anylevel of abstraction. We distinguish between abstract and concreteconcepts to formally define when an individual's description may beconsidered finished. We also exploit CTA, together with theterminology's subsumption-based organization, to efficiently track thetypes of systems and components consistent with current choices, inferadditional constraints on current choices, and appropriately restrictfuture choices. Thus, we can help focus the efforts of both user andconfiguration engine. This work is implemented in the K-REP system. Second, we show that a new class of complex descriptions can be formedvia constraint networks over standard descriptions. For example, wemodel plans as constraint networks whose nodes represent actions.Arcs represent qualitative and metric temporal constraints, plusco-reference constraints, between actions. By combiningterminological reasoning with constraint satisfaction techniques,subsumption is extended to constraint networks, allowing automaticclassification of a plan library. This work is implemented in theT-REX system, which integrates and builds upon an existing descriptionlogic system (K-REP or CLASSIC) and temporal reasoner (MATS). Finally, we combine the preceding, orthogonal results to conductpredictive recognition of constraint network concepts. As an example,this synthesis enables a new approach to deductive plan recognition,illustrated with travel plans. This work is also realized in T-REX.	(ps)
An intractability result for multiple integration	Henryk Wozniakowski, I. H. Sloan	1996-05-20		(ps)
Explicit cost bounds of algorithms for multivariate tensor product problems	Henryk Wozniakowski, Greg Wasilkowski	1996-05-20		(ps)
Strong tractability of weighted tensor products	Henryk Wozniakowski	1996-05-20		(ps)
On tractability of path integration	Henryk Wozniakowski, Greg Wasilkowski	1996-05-20		(ps)
Estimating a largest eigenvector by polynomial algorithms with a random start	Henryk Wozniakowski, Z. Leyk	1996-05-20		(ps)
Overview of information-based complexity	Henryk Wozniakowski	1996-05-20		(ps)
Computational complexity of continuous problems	Henryk Wozniakowski	1996-05-20		(ps)
The exponent of discrepancy is at most 1.4778...	Henryk Wozniakowski, Greg Wasilkowski	1996-05-20		(ps)
A Metalinguistic Approach to Process Enactment Extensibility	Gail E. Kaiser, Israel Z. Ben-Shaul, Steven S. Popovich, Stephen E. Dossick	1996-05-14	We present a model for developing rule-basedprocess servers with extensible syntax and semantics. New processenactment directives can be added to the syntax of the processmodeling language, in which the process designer may specifyspecialized behavior for particular tasks or task segments. Theprocess engine is peppered with callbacks to instance-specific code inorder to implement any new directives and to modify the defaultenactment behavior and the kind of assistance that theprocess-centered environment provides to process participants. Werealized our model in the Amber process server, and describe how weexploited Amber's extensibility to replace Oz's native process enginewith Amber and to integrate the result with a mockup of TeamWare.	(ps)
Architectures for Federation of Process-Centered Environments	Israel Z. Ben-Shaul, Gail E. Kaiser	1996-05-07	We describe two models for federatingprocess-centered environments, homogeneous federation where theinteroperability is among distinct process models enacted by differentcopies of the same system and heterogeneous federation withinteroperability among distinct process enactment systems. Weidentify the requirements and possible architectures for eachmodel. The bulk of the paper presents the specific architecture andinfrastructure for homogeneous federation we realized in the \ozsystem. We briefly consider how \oz might be integrated into aheterogeneous federation to serve as one of its interoperating PCEs.	(ps)
On Reality and Models	Joseph F. Traub	1996-05-05		(ps)
From Infoware to Infowar	Joseph F. Traub	1996-04-16		(ps)
OzCare: A Workflow Automation System for Care Plans	Wenke Lee, Gail E. Kaiser, Paul D. Clayton, Eric H Sherman	1996-03-29		(ps)
Incremental Process Support for Code Reengineering: An Update (Experience Report)	Gail E. Kaiser, George T. Heineman, Peter D. Skopp, Jack J. Yang	1996-02-15	{\em Componentization} is an important, emergingapproach to software modernization whereby a stovepipe system isrestructured into components that can be reused in other systems.More significantly from the system maintenance perspective, selectedcomponents in the original system can be completely replaced, e.g.,the database or user interface. In some cases, a new architecture canbe developed, for example to convert a monolithic system to theclient/server paradigm, and the old components plugged into placealong with some new ones. We update a 1994 publication in thisconference series, where we proposed using process modeling andenactment technology to support both construction of systems fromcomponents and re-engineering of systems to permit componentreplacement. This paper describes our experience following thatapproach through two generations of component-oriented process models.	(ps)
An Architecture for Integrating OODBs with WWW	Jack Jingshuang Yang, Gail E. Kaiser	1996-02-13		(ps)
Concurrency-Oriented Optimization for Low-Power Asynchronous Systems	Luis A. Plana, Steven M. Nowick	1996-01-20	We introduce new architectural optimizations forasynchronous systems. These optimizations allow application ofvoltage scaling, to reduce power consumption while maintaining systemthroughput. In particular, three new asynchronous sequencer designsare introduced, which increase the concurrent activity of the system.We show that existing datapaths will not work correctly at theincreased level of concurrency. To insure correct operation, modifiedlatch and multiplexer designs are presented, for both dual-rail andsingle-rail implementations. The increased concurrency allows theopportunity for substantial system-wide power savings throughapplication of voltage scaling.	(ps)
A 3-level Atomicity Model for Decentralized Workflow Management Systems	Israel Z. Ben-Shaul, George T. Heineman	1996-01-16	Decentralized workflow management systems (WFMSs) provide an architecture for multiple, heterogeneous WFMSs to interoperate. Atomicity is a standard correctness model for guaranteeing that a set of operations occurs as an atomic unit, or none of them occur at all. Within a single WFMS, atomicity is the concern of its transaction manager. In a decentralized environment, however, the autonomous transaction managers must find ways to cooperate if an atomic unit is split between multiple WFMSs. This paper describes a flexible atomicity model that enables workflow administrators to specify the scope of multi-site atomicity based upon the desired semantics of multi-site tasks in the decentralized WFMS.	(ps)
Automatic Generation of RBF Networks	Shayan Mukherjee, Shree K. Nayar	1995-01-01	Learning can be viewed as mapping from an input space to an output space. Examples of these mappings are used to construct a continuous function that approximates the given data and generalizes for intermediate instances. Radial basis function (RBF) networks are used to formulate this approximating function. A novel method is introduced that automatically constructs a RBF network for a given mapping and error bound. This network is shown to be the smallest network within the error bound for the given mapping. The integral wavelet transform is used to determine the parameters of the network. Simple one-dimensional examples are used to demonstrate how the network constructed using the transform is superior to one constructed using standard ad hoc optimization techniques. The paper concludes with the automatic generation of a network for a multidimensional problem, namely, object recognition and pose estimation. The results of this application are favorable.	(ps)
Integrating Groupware Activities into Workflow Management Systems	Israel Z. Ben-Shaul, Gail E. Kaiser	1995-01-01	Computer supported cooperative work (CSCW) has been recognized as a crucial enabling technology for multi-user computer-based systems, particularly in cases where synchronous human-human interaction is required between geographically dispersed users. Workflow is an emerging technology that supports complex business processes in modern corporations by allowing to explicitly define the process, and by supporting its execution in a workflow management system (WFMS). Since workflow inherently involves humans carrying out parts of the process, it is only natural to explore how to synergize these two technologies. We analyze the relationships between groupware and workflow management, present our general approach to integrating synchronous groupware tools into a WFMS, and conclude with an example process that was implemented in the WFMS and integrated such tools. Our main contribution lies in the integration and synchronization of individual groupware activities into modeled workflow processes, as opposed to being a built-in part of the workflow WFMS.	(ps)
Coerced Markov Models for Cross-lingual Lexical-Tag Relations	Pascale Fung, Dekai Wu	1995-01-01	We introduce the {\it Coerced Markov Model\/} (CMM) to model the relationship between the lexical sequence of a source language and the tag sequence of a target language, with the objective of constraining search in statistical transfer-based machine translation systems. CMMs differ from Hidden Markov Models in that state sequence assignments can take on values coerced from external sources. Given a Chinese sentence, a CMM can be used to predict the corresponding English tag sequence, thus constraining the English lexical sequence produced by a translation model. The CMM can also be used to score competing translation hypotheses in N-best models. Three fundamental problems for CMM designed are discussed. Their solutions lead to the training and testing stages of CMM.	(ps)
Generalization of Band-Joins and the Merge/Purge Problem	Mauricio A. Hernandez	1995-01-01	Many commercial organizations routinely gather large numbers of databases for various marketing and business analysis functions. The task is to correlate information from different databases by identifying distinct individuals that appear in a number of different databases typically in an inconsistent and often incorrect fashion. The problem we study here is the task of merging data from multiple sources in as efficient manner as possible, while maximizing the accuracy of the result. We call this the m merge/purge problem. The key to successful merging requires a means of identifying "equivalent" data from diverse sources. The determination that two pieces of information are equivalent, and that they represent some aspect of the same domain entity, depends on sophisticated inference techniques and knowledge of the domain. We introduce the use of a rule program that declaratively specifies an "equational theory" for this purpose. Furthermore, the number and size of the data sets involved may be so large that parallel and distributed computing systems may be the only hope for achieving accurate results in a reasonable amount of time with acceptable cost. In this paper we detail the "sorted neighborhood" method that is used by some commercial organizations to solve merge/purge and present experimental results that demonstrates this approach works well in practice but at great expense. An alternative method based upon clustering is also presented with a comparative evaluation to the sorted neighborhood method. We show a means of improving the accuracy of the results based upon a "multi-pass" approach that succeeds by computing the Transitive Closure over the results of independent runs considering alternative primary key attributes in each pass. The moral is that a multiplicity of "cheap" passes produces higher accuracy than one "expensive" pass over the data.	(ps)
The gigabit per second Isochronet switch	Danilo Florissi, Yechiam Yemini	1995-01-01	ABSTRACT = { This paper overviews an electronic design and implementation of a scaleable gigabit per second multi-protocol switch based on the Isochronets high-speed network architecture. Isochronets do not require frame header processing in the switches and, consequently, they are scaleable with link speeds, support multiple frame structures, and are suitable for all-optical implementations. The electronic switch has low cost and uses simple off-the-shelf components. The switch interface is simple and provides novel services such as guaranteed Quality of Service (QoS) and propagation of synchronization signals to upper protocol layers, including applications. The switch allows the negotiation of QoS while promoting flexible resource sharing. This paper also proposes an all-optical switch design that can be realized with current optical devices. The modular switch designs are scaleable in number of nodes and link speed. Using faster implementation technology, the electronic design can reach scores of gigabits per second, while the all-optical design can potentially operate at terabits per second.	(ps)
An Overview of the Isochronets architecture for high speed networks	Yechiam Yemini, Danilo Florissi	1995-01-01	This paper overviews the novel Route Switching (RS) technique. In RS, network nodes switch different routing trees over time. Traffic moves along routing trees to the roots during the tree activation period or band. Contrary to circuits, routes do not change frequently, allowing RS to pre-established routes for prolonged network operations. Contrary to packets, routes can be switched over time without frame header processing, allowing RS to scale to any network speed and to be implemented using all-optical technology. RS provide flexible quality of service control and multicasting through allocation of bands to trees. They can be tuned to span a spectrum of performance behaviors, outperforming both Circuit or Packet Switching, as evidenced by the performance study in this paper. RS is the basis for the novel Isochronets high-speed network architecture. Isochronets operate at the media-access layer and can support multiple framing protocols simultaneously. Internetworking is reduced to a simple media-layer bridging. A gigabit-per-second Isochronet switch prototype has been implemented to demonstrate the feasibility of RS.	(ps)
Integrating, Customizing and Extending Environments with a Message-Based Architecture	John E. Arnold, Steven S. Popovich	1995-01-01	Message-based architectures have typically been used for integrating an engineer's set of tools as in FIELD and SoftBench. This paper presents our experience using a message-based architecture to integrate complex, multi-user environments. Where this style of control integration has been effective for encapsulating independent tools within an environment, we show that these techniques are also useful for integrating environments themselves. Our experience comes from our integration of two types of process-centered software development environments: a groupware application that implements a Fagan-style code inspection process and a software development process environment where code inspection is a single step in the overall process. We use a message-based mechanism to federate the two process engines such that the two process formalisms complement rather than compete with each other. Moreover, we see that the two process engines can provide some synergy when used in a single, integrated software process environment, Specifically, the integrated environment uses the process modeling and enactment services of one process engine to customize and extend the code inspection process implemented in a different process engine. The customization and extension of the original collaborative application was accomplished without modifying the application. This was possible because the integration mechanism was designed for multi-user, distributed evironments and encouraged the use of an environment's services by other environments. The results of our study indicate that the message-based architecture originally conceived for tool-oriented control integration is equally well-suited for environment integration.	(ps)
What is the complexity of solution-restricted operator equations?	Arthur G. Werschulz	1995-01-01	We study the worst case complexity of operator equations $Lu=f$, where $L\colon G\to X$ is a bounded linear injection, $G$ is a Hilbert space, and $X$ is a normed linear space. Past work on the complexity of such problems has generally assumed that the class~$F$ of problem elements~$f$ to be the unit ball of~$X$. However, there are many problems for which this choice of~$F$ yields unsatisfactory results. Mixed elliptic-hyperbolic problems are one example, the difficulty being that our technical tools are not strong enoguh to give good complexity bounds. Ill-posed problems are another example, because we know that the complexity of computing finite-error approximations is infinite if $F$ is a ball in~$X$. In this paper, we pursue another idea. Rather than directly restrict the class~$F$ of problem elements~$f$, we will consider problems that are solution-restricted, i.e., we restrict the class$~U$ of solution elements~$u$. In particular, we assume that~$U$ is the unit ball of a Hilbert space~$W$ continuously embedded in~$G$. The main idea is that our problem can be reduced to the standard {\it approximation problem\/} of approximating the embedding of $W$ into~$G$. This allows us to characterize optimal information and algorithms for our problem. Then, we consider specific applications. The first application we consider is any problem for which $G$ and~$W$ are standard Sobolev Hilbert spaces; we call this the ``standard problem'' since it includes many problems of practical interest. We show that finite element information and generalized Galerkin methods are nearly optimal for standard problems. We then look at elliptic boundary-value problems, Fredholm integral equations of the second kind, the Tricomi problem (a mixed hyperbolic-elliptic problem arising in the study of transonic flow), the inverse finite Laplace transform, and the backwards heat equation. (Note that with the exception of the backwards heat equation, all of these are standard problems. Moreover, the inverse finite Laplace transform and the backwards heat equation are ill-posed problems.) We determine the problem complexity and derive nearly optimal algorithms for all these problems.	(ps)
Adapting Materialized Views after Redefinitions	Ashish Gupta, Inderpal S. Mumick, Kenneth A. Ross	1995-01-01	We consider a variant of the view maintenance problem: How does one keep a materialized view up-to-date when the view definition itself changes? Can one do better than recomputing the view from the base relations? Traditional view maintenance tries to maintain the materialized view in response to modifications to the base relations; we try to ``adapt'' the view in response to changes in the view definition. Such techniques are needed for applications where the user can change queries dynamically and see the changes in the results fast. Data archaeology, data visualization, and dynamic queries are examples of such applications. We consider all possible redefinitions of SQL Select-From-Where-Groupby-Having, Union, and Except views, and show how these views can be adapted using the old materialization for the cases where it is possible to do so. We identify extra information that can be kept with a materialization to facilitate redefinition. Multiple simultaneous changes to a view can be handled without necessarily materializing intermediate results. We identify guidelines for users and database administrators that can be used to facilitate efficient view adaptation.	(ps)
A Tabular Approach to Transition Signaling	Stephen H. Unger	1995-01-01	Transition signaling is a popular technique used in the design of unclocked digital systems. A number of basic modules have been developed by various researchers for use in such systems, and they have presented ingenious circuits for implementing them. In this paper, a method is presented for describing the behavior of such modules clearly and concisely by means of transition flow tables. These tables can be used to generate Huffman type primitive flow tables. All of the formal (and informal) techniques developed for the design of asynchronous sequential logic circuits can then be applied to develop implementations with attention to problems of hazards, metastability, and the like. These transition tables may also be useful for describing larger subsystems.	(ps)
Interfacing Oz with PCTE OMS	Wenke Lee, Gail Kaiser	1995-01-01	This paper details our experiment interfacing Oz with the Object Management System (OMS) of PCTE. Oz is a process-centered multi-user software development environment. PCTE is a specification which defines a language independent interface providing support mechanisms for software engineering environments (SEE) populated with CASE tools. Oz is, in theory, a SEE that can be built (or extended) using the services provided by PCTE. Oz historically has had a native OMS component whereas the PCTE OMS is an open data repository with an API for external software tools. Our experiment focused on changing Oz to use the PCTE OMS. This paper describes how several Oz components were changed in order to make the Oz server interface with the PCTE OMS. The resulting system of our experiment is an environment that has process control and integration services provided by Oz, data integration services provided by PCTE, and tool integration services provided by both. We discusses in depth the concurrency control problems that arise in such an environment and their solutions. The PCTE implementation used in our experiment is the Emeraude PCTE V 12.5.1 supplied by Transtar Software Incorporation.	(ps)
A Theory of Pattern Rejection	Simon Baker, Shree K. Nayar	1995-01-01	The efficiency of pattern recognition is critical when there are a large number of classes to be discriminated, or when the recognition algorithm must be applied a large number of times. We propose and analyze a general technique, namely pattern rejection, that leads to great efficiency improvements in both cases. Rejectors are introduced as algorithms that very quickly eliminate from further consideration, most of the classes or inputs (depending on the setting). Importantly, a number of rejectors may be combined to form a composite rejector, which performs far more effectively than any of its component rejectors. Composite rejectors are analyzed, and conditions derived which guarantee both efficiency and practicality. A general technique is proposed for the construction of rejectors, based on a single assumption about the pattern classes. The generality is shown through a close relationship with the Karhunen-Lo\'{e}ve expansion. Further, a comparison with Fisher's discriminant analysis is included to illustrate the benefits of pattern rejection. Composite rejectors were constructed for two applications, namely, object recognition and local feature detection. In both cases, a substantial improvement in efficiency over existing techniques is demonstrated.	(ps)
A Paradigm for Decentralized Process Modeling and its Realization in the Oz Environment	Israel Z. Ben-Shaul	1995-01-01	This dissertation investigates decentralization of software processes and Process Centered Environments (PCEs), and addresses a wide range of issues concerned with supporting interoperability and collaboration among autonomous and heterogeneous processes, both in their definition and in their execution in possibly physically dispersed PCEs. Decentralization is addressed at three distinct levels of abstraction. The first proposes a generic conceptual model that is both language- and PCE-independent. The second level explores the realization of the model in a specific PCE, Oz, and its rule-based process modeling language. The third level addresses architectural issues in interconnecting autonomous PCEs as a basis for process interoperability. Two key concerns guide this research. The first is maximizing local autonomy, so as not to force a priori any global constraints on the definition and execution of local processes, unless explicitly and voluntarily specified by a particular process instance. The second concern is tailorability, dynamicity and fine-grained control over the degree of interoperability. The essence of the interoperability model lies in two abstraction mechanisms --- Treaty and Summit --- for inter-process definition and execution, respectively. Treaties enable to specify shared sub-processes while retaining the privacy of local sub-processes. To promote autonomy, Treaties are established by explicit and active participation of the involved processes. To promote fine granularity, Treaties are defined pairwise between two collaborating processes and formed over a possibly small sub-process unit, although multi-site Treaties over large shared sub-processes can be constructed, if desired. Finally, Treaties are superimposed on top of pre-existing instantiated processes, enabling their dynamic and incremental establishment and supporting a decentralized bottom-up approach. Summits are the execution counterparts of Treaties. They support ``global'' execution of shared sub-processes involving artifacts and/or users from multiple sites, as well as local execution of private sub-processes. Summits successively alternate between shared and private execution modes, where the former is used for synchronous execution of shared activities, and the latter for autonomous execution of any private subtasks emanating from the shared activities as defined in the local processes.	(ps)
Pattern Matching for Translating Domain-Specific Terms from Large Corpora	Pascale Fung	1995-01-01	Translating domain-specific terms is one significant component of machine translation and machine-aided translation systems. These terms are often not found in standard dictionaries. Human translators, not being experts in every technical or regional domain, cannot produce their translations effectively. Automatic translation of domain-specific terms is therefore highly desirable. Most other work on automatic term translation uses statistical information of words from parallel corpora. Parallel corpora of clean, translated texts are hard to come by whereas there are more noisy, translated texts and many more monolingual texts in various domains. We propose using noisy parallel texts and same-domain texts of a pair of languages to translate terms. In our work, we propose using a novel paradigm of pattern matching of statistical signals of word features. These features are robust to the syntactic structure, character sets, language of the text, and to the domain. We obtain statistical information which is related to the lexical properties of a word and its translation in any other language of the same domain. These lexical properties are extracted from the corpora and represented in vector form. We propose using signal processing techniques for matching these features vectors of a word to those of its translation. Another matching technique we propose is applying discriminative analysis of the word features. For each word, the various features are combined into a single vector which is then transformed into a smaller dimension eigenvector for matching. Since most domain specific terms are nouns and noun phrases, we concentrate on translating English nouns and noun phrases into other languages. We study the relationship between English noun phrases and their translations in Chinese, Japanese and French in parallel corpora. The result of this study is used in our system for translation of English noun phrases into these other languages from noisy parallel and non-parallel corpora.	(ps)
The complexity of the Poisson problem for spaces of bounded mixed derivatives	Art Werschulz	1995-01-01	We are interested in the complexity of the Poisson problem with homogeneous Dirichlet boundary conditions on the $d$-dimensional unit cube~$\Omega$. Error is measured in the energy norm, and only standard information (consisting of function evaluations) is available. In previous work on this problem, the standard assumption has been that the class~$F$ of problem elements has been the unit ball of a Sobolev space of fixed smoothness~$r$, in which case the $\e$-complexity is proportional to $\e^{-d/r}$. Given this exponential dependence on~$d$, the problem is intractable for such classes~$F$. In this paper, we seek to overcome this intractability by allowing $F$ to be the unit ball of a space $\circhrho$ of bounded mixed derivatives, with $\rho$ a fixed multi-index with positive entries. We find that the complexity is proportional to $c(d)(1/\e)^{1/\rhomin}[\ln(1/\e)]^b$, and we give bounds on~$b=b_{\rho,d}$. Hence, the problem is tractable in $1/\e$, with exponent at most~$1/\rhomin$. The upper bound on the complexity (which is close to the lower bound) is attained by a modified finite element method (MFEM) using discrete blending spline spaces; we obtain an explicit bound (with no hidden constants) on the cost of using this MFEM to compute $\e$-approximations. Finally, we show that for any positive multi-index ~$\rho$, the Poisson problem is strongly tractable, and that the MFEM using discrete blended piecewise polynomial splines of degree~$\rho$ is a strongly polynomial time algorithm. In particular, for the case $\rho={\bf 1}$, the MFEM using discrete blended piecewise linear splines produces an $\e$-approximation with cost at most $$0.839262\,(c(d)+2)\left({1\over\e}\right)^{5.07911}.$$	(ps)
Tractable Reasoning in Knowledge Representation Systems	Dalal Mukesh	1995-01-01	This document addresses some problems raised by the well-known intractability of deductive reasoning in even moderately expressive knowledge representation systems. Starting from boolean constraint propagation (BCP), a previously known linear-time incomplete reasoner for clausal propositional theories, we develop {\em fact propagation} (FP) to deal with non-clausal theories, after motivating the need for such an extension. FP is specified using a confluent rewriting systems, for which we present an algorithm that has quadratic-time complexity in general, but is still linear-time for clausal theories. FP is the only known tractable extension of BCP to non-clausal theories; we prove that it performs strictly more inferences than CNF-BCP, a previously-proposed extension of BCP to non-clausal theories. We generalize a refutation reasoner based on FP to a family of sound and tractable reasoners that are ``increasingly complete'' for propositional theories. These can be used for anytime reasoning, i.e., they provide partial answers even if they are stopped prematurely, and the ``completeness'' of the answer improves with the time used in computing it. A fixpoint construction based on FP gives an alternate characterization of the reasoners in this family, and is used to define a transformation of arbitrary theories into logically-equivalent ``vivid'' theories --- ones for which our FP algorithm is complete. Our final contribution is to the description of tractable classes of reasoning problems. Based on FP, we develop a new property, called bounded intricacy, which is shared by a variety of tractable classes that were previously presented, for example, in the areas of propositional satisfiability, constraint satisfaction, and OR-databases. Although proving bounded intricacy for these classes requires domain-specific techniques (which are based on the original tractability proofs), bounded intricacy is one more tool available for showing that a family of problems arising in some application is tractable. As we demonstrate in the case of constraint satisfaction and disjunctive logic programs, bounded intricacy can also be used to uncover new tractable classes.	(ps)
Isochronets: a High-speed Network Switching Architecture (Ph.D. Thesis	Danilo Florissi	1995-01-01	Traditional network architectures present two main limitations when applied to High-Speed Networks (HSNs): they do not scale with link speeds and they do not adequately support the Quality of Service (QoS) needs of high-performance applications. This thesis introduces the Isochronets architecture that overcomes both limitations. Isochronets view frame motions over links in analogy to motions on roads. In the latter, traffic lights can synchronize to create green waves of uninterrupted motion. Isochronets accomplish similar uninterrupted motion by periodically configuring network switches to create end-to-end routes in the network. Frames flow along these routes with no required header processing at intermediate switches. Isochronets offer several advantages. First, they are scaleable with respect to transmission speeds. Switches merely configure routes on a time scale that is significantly longer than and independent of the average frame transmission time. Isochronets do not require frame processing and thus avoid conversions from optical to electronic representations. They admit efficient optical transmissions under electronically controlled switches. Second, Isochronets ensure QoS for high-performance applications in terms of latency, jitter, loss, and other service qualities. Isochronet switches can give priority to frames arriving from selected links. At one extreme, they may give a source the right-of-way to the destination by assigning priority to all links in its path. Additionally, other sources may still transmit at lower priority. At the other extreme, they may give no priority to sources and frames en route to the same destination contend for intermediate links. In between, Isochronets can accomplish a myriad of priority allocations with diverse QoS. Third, Isochronets can support multiple protocols without adaptation between different frame structures. End nodes view the network as a media access layer that accepts frames of arbitrary structure. The main contributions of this thesis are: (1) design of the Isochronets architecture; (2) design and implementation of a gigabit per second Isochronet switch (Isoswitch); (3) definition of the Loosely-synchronous Transfer Mode (LTM) and the Synchronous Protocol Stack (SPS) that add synchronous and isochronous services to any existing protocol stack; and (4) performance evaluation of Isochronets.	(ps)
Enveloping Sophisticated Tools into Process-Centered Environments	Giuseppe Valetto, Gail E. Kaiser	1995-01-01	We present a tool integration strategy based on enveloping pre-existing tools without source code modifications or recompilation, and without assuming an extension language, application programming interface, or any other special capabilities on the part of the tool. This Black Box enveloping (or wrapping) idea has existed for a long time, but was previously restricted to relatively simple tools. We describe the design and implementation of, and experimentation with, a new Black Box enveloping facility intended for sophisticated tools --- with particular concern for the emerging class of groupware applications.	(ps)
The CORD approach to extensible concurrency control	George Heineman, Gail E. Kaiser	1995-01-01	Database management systems (DBMSs) have been increasingly used for advanced application domains, such as software development environments, network management, workflow management systems, computer-aided design and manufacturing, and managed healthcare. In these domains, the standard correctness model of serializability is often too restrictive. We introduce the notion of a Concurrency Control Language (CCL) that allows a database application designer to specify concurrency control policies to tailor the behavior of a transaction manager. A well-crafted set of policies defines an extended transaction model. The necessary semantic information required by the CCL run-time engine is extracted from a task manager, a (logical) module by definition included in all advanced applications. This module stores task models that encode the semantic information about the transactions submitted to the DBMS. We have designed a rule-based CCL, called CORD, and have implemented a run-time engine that can be hooked to a conventional transaction manager to implement the sophisticated concurrency control required by advanced database applications. We present an architecture for systems based on CORD and describe how we integrated the CORD engine with the Exodus Storage Manager to implement Altruistic Locking.	(ps)
Parametric feature detection	Shree K. Nayar, Simon Baker, Hiroshi Murase	1995-01-01	A large number of visual features are parametric in nature, including, edges, lines, corners, and junctions. We present a general framework for the design and implementation of detectors for parametrized features. For robustness, we argue in favor of elaborate modeling of features as they appear in the physical world. In addition, optical and sensing artifacts are incorporated to achieve realistic feature models in image domain. Each feature is represented as a densely sampled parameterized manifold in a low-dimensional subspace. During detection, the brightness distribution around each image pixel is projected to the subspace. If the projection lies close to the feature manifold, the exact location of the closest manifold point reveals the parameters of the feature. The concepts of parameter reduction by normalization, dimension reduction, pattern rejection, and efficient search are employed to achieve high efficiency. Detectors have been implemented for five specific features, namely, step edge (5 parameters), roof edge (5 parameters), line (6 parameters), corner (5 parameters), and circular disc (6 parameters). All five of these detectors were generated using the same technique by simply inputing different feature models. Detailed experiments are reported on the robustness of detection and the accuracy of parameter estimation. In the case of the step edge, our results are compared with those obtained using popular detectors. We conclude with a brief discussion on the use of relaxation to refine outputs from multiple feature detectors, and sketch a hardware architecture for a general feature detection machine.	(ps)
Playback and Jitter Control for Real-Time Video-conferencing	Sanjay K. Jha	1995-01-01	The purpose of this report is to examine the problems associated with display (playback) of live continuous media under varying conditions for an internetwork of workstations running general purpose operating system such as Unix. Under the assumption that the network cannot guarantee the required bounds on delay and jitter, there is a need to accommodate the delay jitter in the end systems. Various methods of jitter smoothing at the end systems and their suitability to audio as well as video transmission have been discussed. An experimental test bed which was used to investigate these problems is described. Some initial empirical results obtained using this test bed are also presented.	(ps)
An Interoperability Model for Process-Centered Software Engineering Environments and its Implementation in Oz	Israel Z. Ben-Shaul, Gail E. Kaiser	1995-01-01	A process-centered software engineering environment (PSEE) enables to model, evolve, and enact the process of software development and maintenance. This paper addresses the problem of process-interoperability among decentralized and autonomous PSEEs by presenting the generic International Alliance model, which consists of two elements, namely Treaty and Summit. The Treaty abstraction allows pairwise peer-peer definition of multi-site shared sub-processes that are integrated inside each of the participating sites, while retaining the definition- and evolution-autonomy of non-shared local sub-processes. Summits are the execution abstraction for Treaty-defined sub-processes. They enact Treaty sub-processes in multiple sites by successively alternating between shared and private execution modes: the former is used for the synchronous execution of the shared activities, and the latter is used for the autonomous execution of any private subtasks emanating from the shared activities. We describe the realization of the models in the Oz multi-site PSEE and evaluate the models and system based on experience gained from using Oz for production purposes. We also consider the application of the model to Petri net-based and grammar-based PSEEs.	(ps)
QoSME: QoS Management Environment (Ph.D. Thesis	Patricia Gomes Soares Florissi	1995-01-01	Distributed multimedia applications are sensitive to the Quality of Service (QoS) delivered by underlying communication networks. For example, a video conference exchange can be very sensitive to the effective network throughput. Network jitter can greatly disrupt a speech stream. The main question this thesis addresses is how to adapt multimedia applications to the QoS delivered by the network and vice versa. Such adaptation is especially important because current networks are unable to assure the QoS required by applications and the latter is usually unprepared for periods of QoS degradation. This work introduces the QoS Management Environment (QoSME) that provides mechanisms for such adaptation. The main contributions of this thesis are: \begin{itemize} \item Language level abstractions for QoS management. The Quality Assurance Language (QuAL) in QoSME enables the specification of how to allocate, monitor, analyze, and adapt to delivered QoS. Applications can express in QuAL their QoS needs and how to handle potential violations. \item Automatic QoS monitoring. QoSME automatically generates the instrumentation to monitor QoS when applications use QuAL constructs. The QoSME runtime scrutinizes interactions among applications, transport protocols, and Operating Systems (OS) and collects in QoS Management Information Bases (MIBs) statistics on the QoS delivered. \item Integration of QoS and standard network management. A Simple Network Management Protocol (SNMP) agent embedded in QoSME provides QoS MIB access to SNMP managers. The latter can use this feature to monitor end-to-end QoS delivery and adapt network resource allocation and operations accordingly. \end{itemize} A partial prototype of QoSME has been released for public access. It runs on SunOS 4.3 and Solaris 2.3 and supports communication on ATM adaptation layer, ST-II, UDP/IP, TCP/IP, and Unix internal protocols.	(ps)
Estimating an Eigenvector by the Power Method with a Random Start	Gianna M. Del Corso	1995-01-01	This paper addresses the problem of approximating an eigenvector belonging to the largest eigenvalue of a symmetric positive definite matrix by the power method. We assume that the starting vector is randomly chosen with uniform distribution over the unit sphere. This paper provides lower and upper as well as asymptotic bounds on the randomized error in the ${\cal L}_p$ sense, $p\in[1,+\infty]$. We prove that it is impossible to achieve bounds that are independent of the ratio between the two largest eigenvalues. This should be contrasted to the problem of approximating the largest eigenvalue for which Kuczy\'nski and Wo\'zniakowski in 1992 proved that it is possible to bound the randomized error at the $k$-th step with a quantity that depends only on $k$ and on the size of the matrix. We prove that the rate of convergence depends on the ratio of the two largest eigenvalues, on their multiplicities, and on the particular norm. The rate of convergence is at most linear in the ratio of the two largest eigenvalues.	(ps)
Topological visual navigation in large environments	Il-Pyung Park, John R. Kender	1994-01-01	In this paper, we investigate a new model for robot navigation in large unstructured environments. Our model consists of two parts, the map-maker and the navigator. Given a source and a goal, the map-maker derives a navigational path based on the topological relationships between landmarks. A navigational path is generated as a combination of ``parkway'' and ``trajectory'' paths, both of which are abstractions of the real world into topological data structures. Traversing within a parkway enables the navigator to follow visible landmarks. Traversing on a trajectory enables the navigator to move reliably towards a target, based on shapes formed by visible landmarks. Error detection and error recovery routines are also encoded into the path segments. The optimal path is further abstracted into a ``custom map,'' which consists of a list of directional instructions, the vocabulary of which is defined by our environmental description language. Based on the custom map generated by the map-maker, the navigating robot looks for events that are characterized by spatial properties of the environment. The map-maker and navigator are implemented using two cameras, an IBM 7575 robot arm, and a PIPE (Pipelined Image Processing Engine.)	(ps)
Error Detection and Recovery in Two Dimensional Topological Navigation	Il-Pyung Park, John R. Kender	1994-01-01	In this paper we describe error detection and error recovery methods applicable to navigation in large scale unstructured environmental navigation. We relax the assumption of error-free following of topological landmarks; the navigator is ``permitted'' to make mistakes during its journey. The error detection method involves the navigator observing its immediate surrounding and checking for one of several types of disparities. The error recovery method is based on a simple fixed set of movements which is triggered by the navigator's local observation. Alternately described, this work characterizes those environments in which robust topological navigation is possible, including those landmarks which, literally, ``can't be missed''. These methods have been implemented on our qualitative environmental navigation system consisting of a camera mounted IBM 7575 robot arm.	(ps)
Qualtative Environmental Navigation: Theory and Practice	Il-Pyung Park	1994-01-01	In this thesis we propose and investigate a new model for robot navigation in large unstructured environments. Current models which depend on metric information contain inherent mechanical and sensory uncertainties. Instead we supply the navigator with qualitative information. Our model consists of two parts, the map-maker and the navigator. Given a source and a goal, the map-maker derives a navigational path based on the topological relationships between landmarks. A navigational path is generated as a combination of ``parkway'' and ``trajectory'' paths, both of which are abstractions of the real world into topological data structures. Traversing within a parkway enables the navigator to follow visible landmarks. Traversing on a trajectory enables the navigator to move reliably into a homogeneous space, based on shapes formed by visible landmarks that are robust to positional and orientational errors. Reliability measures of parkway and trajectory traversals are defined by appropriate error models that account for the sensory errors of the navigator, the motor errors of the navigator, and the population of neighboring objects. Error detection and error recovery methods are also encoded into the generated path. The optimal path is further abstracted into a ``custom map,'' which consists of a list of verbal directional instructions, the vocabulary of which is defined by our environmental description language. Based on the custom map generated by the map-maker, the navigating robot looks for events that are characterized by spatial properties of the environment. The map-maker and the navigator are implemented using two cameras, an IBM 7575 robot arm and PIPE (Pipelined Image Processing Engine.) Various experiments show the effectiveness of navigation ``in the large'' using the proposed methods.	(ps)
Expanding the Repertoire of Process-based Tool Integration	Giuseppe Valetto	1994-01-01	The purpose of this thesis is to design and implement a new protocol for tool enveloping, in the context of the Oz Process Centered Environment. This new part of the system would be complementary to the already existing Black Box protocol for Oz and would deal with additional families of tools, whose character would be better serviced by a different approach, providing enhanced flexibility and a greater amount of interaction between the human operator, the tools and the environment during the execution of the wrapped activities. To achieve this, the concepts of persistent tool platforms, tool sessions and transaction-like activities will be introduced as the main innovative features of the protocol. We plan to be able to encapsulate and service conveniently classes of tools such as interpretive systems, databases, medium and large size applications that allow for incremental binding of parameters and partial retrieving of results, and possibly multi-user tools. Marginal modification and upgrading of the Oz general architecture and components will necessarily be performed.	(ps)
A flexible rule-based process engine	A. Z. Tong, Pr. G Kaiser	1994-01-01	The @marvel process-centered environment (PCE) modeled knowledge of the software development process in the form of rules, each representing a process step with its prerequisites and consequences. @marvel enforced that prerequisites were satisfied and used backward and forward chaining over the rule base to automate tool invocations. In order to support multiple users, the process engine ensured atomicity of those rule chains annotated as transactions. It has recently become clear, however, that these three modes of process assistance are not sufficient for all projects and all users. Guidance to users regarding the process steps they can perform, monitoring of divergences from the prerequisites and consequences of a process, delegation of process steps to other users, collaboration among multiple users working together on a step (or a process fragment), planning of a process in advance, simulation of processes for training or analysis purposes, and instrumentation of processes for statistics gathering may also be desirable, and there are probably other modes that we haven't considered. Yet neither @marvel nor any other existing PCE supports more than a handful of built-in process assistance modes. We present the design of the new @amber system, which will generalize @marvel and other rule-based PCEs to support enforcement, automation, atomicity, guidance, monitoring, delegation, collaboration, planning, simulation, instrumentation and potentially other applications. In particular, @amber will be fully knowledge-based, tailored by knowledge regarding the @i(process assistance policies) to be supported as well as the @i(process model)	(ps)
Large Flow Trees: a Lower Bound Computation Tool for Network Optimization	Bulent Yener	1994-01-01	This paper presents a new method for computing the lower bounds for multihop network design problems which is particularly well suited to lightwave networks. The lower bound can be computed in time polynomial in the network size. Consequently, the results in this work yields a tool which can be used in (i) evaluating the quality of heuristic design algorithms, and (ii) determining a termination criteria during minimization. More specifically, given $N$ stations each with at most $d$ transceivers, and pairwise average traffic values of the stations, the method provides a lower bound for the combined problem of finding optimum (i) allocation of wavelengths to the stations (configuration), and (ii) routing of the traffic on this configuration while minimizing {\em congestion} (i.e. minimizing the maximum flow on any link). The lower bound computation is based on first building {\em flow trees} to find a lower bound on the total flow, and then distributing the total flow over the links to minimize the congestion.	(ps)
QuAL: Quality Assurance Language	Patricia Gomes Soares Florissii	1994-01-01	Distributed multimedia applications are sensitive to the Quality of Services (QoS) provided by their computing and communication environment. For example, scheduling of processing activities or network queueing delays may cause excessive jitter in a speech stream, rendering it difficult to understand. It is thus important to establish effective technologies to ensure delivery of QoS required by distributed multimedia applications. This proposal presents a new language for the development of distributed multimedia applications: Quality Assurance Language (QuAL). QuAL abstractions allow the specification of QoS constraints expected from the underlying computing and communication environment. QuAL specifications are compiled into run time components that monitor the actual QoS delivered. Upon QoS violations, application provided exception handlers are signaled to act upon the faulty events. Language level abstractions of QoS shelter programs from the heterogeneity of underlying infrastructures. This simplifies the development and maintenance of multimedia applications and promotes their portability and reuse. QuAL generates Management Information Bases (MIBs) that contain QoS statistics per application. Such MIBs may be used to integrate application level QoS management into standard network management frameworks.	(ps)
Process Support for Incremental Code Re-engineering	George T. Heineman, Gail E. Kaiser	1994-01-01	Reengineering a large code base can be a monumental task, and the situation becomes even worse if the code is concomitantly being modified. For the past two years, we have been using the Marvel process centered environment (PCE) for all of our software development and are currently using it to develop the Oz PCE (Marvel's successor). Towards this effort, we are reengineering Oz's code base to isolate the process engine, transaction manager, and object management system as separate components that can be mixed and matched in arbitrary systems. In this paper, we show how a PCE can guide and assist teams of users in carrying out code reengineering while allowing them to continue their normal code development. The key features to this approach are its incremental nature and the ability of the PCE to automate most of the tasks necessary to maintain the consistency of the code base. Key words: Process Centered Environments, Componentization	(ps)
Better Semijoins Using Tuple Bit-Vectors	Zhe Li, Kenneth A. Ross	1994-01-01	This paper presents the idea of ``tuple bit-vectors'' for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially ``free'' backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing performance. We then focus on one particular distributed query processing algorithm, called the ``one-shot'' algorithm. We modify the one-shot algorithm by using 2SJ++ and demonstrate the improvements achieved in network transmission cost compared with the original one-shot technique. We use this improvement to demonstrate that equipped with the 2SJ++ technique, one can improve the performance of distributed query processing algorithms significantly without adding much complexity to the algorithms.	(ps)
Hamiltonian Decompositions of Regular Topology Networks for Convergence Routing	Bulent Yener, Terry Boult, Yoram Ofek	1994-01-01	This paper introduces embeddings of multiple into the hypercube and the circulant networks such that each virtual ring is hamiltonian, and the rings are mutually edge-disjoint. It is shown that multiple virtual rings improve (i) the bound on the length of routing, and (ii) the fault tolerance. The hamiltonian virtual rings are studied on the hypercube and the circulant graphs. On the circulant graphs, necessary and sufficient conditions for hamiltonian decomposition is established. On the hypercube three algorithms are designed for an hypercube with even dimension: (i) an O(N) time algorithm to find two edge-disjoint hamiltonian circuits, (ii) an $O(N \log N)$ time algorithm to find $\frac{\log N}{2}$ hamiltonian circuits with only epsilon leq $0.1$ common edges, and (iii) a recursive algorithm for the hamiltonian decomposition of the hypercube with dimension power of two. The routing algorithm on multiple virtual rings is which combines the actual routing decision with the internal flow control state. It is shown analytically, and verified by simulations on the circulants that with the d virtual ring embeddings, a bound of O(N/d) is established on the maximum length of routing.	(pdf) (ps)
Integrating a Transaction manager component with Process WeaverBetter	George T. Heineman, Gail E. Kaiser	1994-01-01	This paper details our experience using Process Weaver. We have been using Process Weaver for two distinct purposes. The first concerns a set of experiments we are performing to integrate a transaction manager component, called PERN with Process Weaver. The second is a prototype system for automatic translation of process modeling formalisms. We have already developed a compiler that translates a process model, created using the SEI-developed Statemate-approach, into a Marvel environment. We are currently designing a way of integrating this compiler with Process Weaver to take advantage of the best features of these three powerful applications.	(ps)
A Comparative Study of Divergence Control Algorithms	Akira Kawaguchi, Kui Mok, Calton Pu, Kun-Lung Wu, Philip S. Yu	1994-01-01	This paper evaluates and compares the performance of two-phase locking divergence control (2PLDC) and optimistic divergence control (ODC) algorithms using a comprehensive centralized database simulation model. We examine a system with multiclass workloads in which on-line update transactions and long-duration queries progress based on epsilon serializability (ESR). Our results demonstrate that significant performance enhancements can be achieved with a non-zero tolerable inconsistency ({\mbox{\Large$\epsilon$}}-spec). With sufficient {\mbox{\Large$\epsilon$}}-spec and limited system resources, both algorithms achieve comparable performance. However, with low resource contention, ODC performs significantly better than 2PLDC. Moreover, given a small {\mbox{\Large$\epsilon$}}-spec, ODC returns more accurate results on the committed queries then 2PLDC.	(ps)
A New Client-Server Architecture for Distributed Query Processing	Zhe Li, Kenneth A. Ross	1994-01-01	This paper presents the idea of ``tuple bit-vectors'' for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially ``free'' backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing performance. We then focus on one particular distributed query processing algorithm, called the ``one-shot'' algorithm. We modify the one-shot algorithm by using 2SJ++ and demonstrate the improvements achieved in network transmission cost compared with the original one-shot technique. We use this improvement to demonstrate that equipped with the 2SJ++ technique, one can improve the performance of distributed query processing algorithms significantly without adding much complexity to the algorithms.	(ps)
Statistical Augmentation of a {C}hinese Machine-Readable Dictionary	Pascale Fung, Dekai Wu	1994-01-01	We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.	(ps)
The complexity of multivariate elliptic problems with analytic data	Arthur G. Werschulz	1994-01-01	Let $F$ be a class of functions defined on a $d$-dimensional domain. Our task is to compute $H^m$-norm $\e$-approximations to solutions of $2m$th-order elliptic boundary-value problems $Lu=f$ for a fixed~$L$ and for $f\in F$. We assume that the only information we can compute about $f\in F$ is the value of a finite number of continuous linear functionals of~$f$, each evaluation having cost~$c(d)$. Previous work has assumed that $F$ was the unit ball of a Sobolev space $H^r$ of fixed smoothness~$r$, and it was found that the complexity of computing an $\e$-approximation was $\comp(\e,d)=\Theta(c(d)(1/\e)^{d/(r+m)})$. Since the exponent of $1/\e$ depends on~$d$, we see that the problem is intractable in~$1/\e$ for any such $F$ of fixed smoothness~$r$. In this paper, we ask whether we can break intractability by letting $F$ be the unit ball of a space of infinite smoothness. To be specific, we let $F$ be the unit ball of a Hardy space of analytic functions defined over a complex $d$-dimensional ball of radius greater than one. We then show that the problem is tractable in~$1/\e$. More precisely, we prove that $\comp(\e,d)=\Theta(c(d)(\ln 1/\e)^d)$, where the $\Theta$-constant depends on~$d$. Since for any $p>0$, there is a function $K(\cdot)$ such that $\comp(\e,d)\le c(d) K(d)(1/\e)^p$ for sufficiently small~$\e$, we see that the problem is tractable, with (minimal) exponent~$0$. Furthermore, we show how to construct a finite element $p$-method (in the sense of Babu\u{s}ka) that can compute an $\e$-approximation with cost $\Theta(c(d)(\ln 1/\e)^d)$. Hence this finite element method is a nearly optimal complexity algorithm for $d$-dimensional elliptic problems with analytic data.	(ps)
Protocols for Loosely Synchronous Networks	Danilo Florissi , Yechiam Yemini	1994-01-01	This paper overviews a novel transfer mode for B-ISDN: Loosely-synchronous Transfer Mode (LTM). LTM operates by signaling periphery nodes when destinations become available. No frame structure is imposed by LTM, thus avoiding adaptation layers. Additionally, LTM can deliver a spectrum of guaranteed quality of services. New Synchronous Protocol Stacks (SPSs) build on LTM by synchronizing their activities to LTM signals. Such signals can be delivered directly to applications that may synchronize its operations to transmissions, thus minimizing buffering due to synchronization mismatches. SPSs can use current transport protocols unchanged and, potentially, enhance them with the real-time capabilities made possible through LTM.	(ps)
An Architecture for Integrating Concurrency Control into Environment Frameworks	George T. Heineman, Gail E. Kaiser	1994-01-01	Research in layered and componentized systems shows the benefit of dividing the reponsibility of services into separate components. It is still an unresolved issue, however, how a system can be created from a set of existing (independently developed) components. This issue of integration is of immense concern to software architects since a proper solution would reduce duplicate implementation efforts and promote component reuse. In this paper we take a step towards this goal within the domain of software development environments (SDEs) by showing how to integrate an external concurrency control component, called PERN, with an environment framework. We discuss two experiments of integrating PERN with OZ, a decentralized process centered environment, and ProcessWEAVER, a commercial process server. We introduce an architecture for retrofitting an external concurrency control component into an environment. Keywords: Componentization, Transactions, Software Architecture, Collaborative Work	(ps)
Management of Application Quality of Service	Patricia Gomes Soares Florissi , Yechiam Yemini	1994-01-01	This paper proposes a new language for the development of distributed multimedia applications: Quality Assurance Language (QuAL). QuAL abstractions allow the specification of Quality of Service (QoS) constraints expected from the underlying computing and communication environment. QuAL specifications are compiled into run time components that monitor the actual QoS delivered. Upon QoS violations, application provided exception handlers are signaled to act upon the faulty events. Language level abstractions of QoS shelter programs from the heterogeneity of underlying infrastructures. This simplifies the development and maintenance of mul- timedia applications and promotes their portability and reuse. QuAL generates Management Information Bases (MIBs) that contain QoS statistics per application. Such MIBs may be used to integrate application level QoS management into standard network management frameworks.	(ps)
Computing High Dimensional Integrals with Applications to Finance	Spassimir H. Paskov	1994-01-01	High-dimensional integrals are usually solved with Monte Carlo algorithms although theory suggests that low discrepancy algorithms are sometimes superior. We report on numerical testing which compares low discrepancy and Monte Carlo algorithms on the evaluation of financial derivatives. The testing is performed on a Collateralized Mortgage Obligation (CMO) which is formulated as the computation of ten integrals of dimension up to 360. We tested two low discrepancy algorithms (Sobol and Halton) and two randomized algorithms (classical Monte Carlo and Monte Carlo combined with antithetic variables). We conclude that for this CMO the Sobol algorithm is always superior to the other algorithms. We believe that it will be advantageous to use the Sobol algorithm for many other types of financial derivatives. Our conclusion regarding the superiority of the Sobol algorithm also holds when a rather small number of sample points are used, an important case in practice. We built a software system which runs on a network of heterogeneous workstations under PVM 3.2 (Parallel Virtual Machine). Since workstations are ubiquitous, this is a cost-effective way to do very large computations fast. The measured speedup is at least .9N for N workstations, $N \leq 25$. The software can also be used to compute high dimensional integrals on a single workstation.	(ps)
Predictive Dynamic Load Balancing of Parallel and Distributed Rule and Query Processing	Hasanat M. Dewan, Salvatore J. Stolfo, Mauricio A. Hernandez, Jae-Jun Hwang	1994-01-01	Expert Databases are environments that support the processing of rule programs against a disk resident database. They occupy a position intermediate between active and deductive databases, with respect to the level of abstraction of the underlying rule language. The operational semantics of the rule language influ- ences the problem solving strategy, while the architecture of the processing environment determines efficiency and scalability. In this paper, we present elements of the PARADISER architecture and its kernel rule language, PARULEL. The PARADISER environment provides support for parallel and distributed evaluation of rule programs, as well as static and dynamic load balancing protocols that predictively balance a computation at runtime. This combi- nation of features results in a scalable database rule and com- plex query processing architecture. We validate our claims by analyzing the performance of the system for two realistic test cases. In particular, we show how the performance of a parallel implementation of transitive closure is significantly improved by predictive dynamic load balancing.	(ps)
Predictive Dynamic Load Balancing of Parallel Hash-Joins over Heterogeneous Processors in the Presence of Data Skew	Hasanat M. Dewan, Mauricio A. Hernandez, Kui W. Mok, Salvatore J. Stolfo	1994-01-01	In this paper, we present new algorithms to balance the computa- tion of parallel hash joins over heterogeneous processors in the presence of data skew and external loads. Heterogeneity in our model consists of disparate computing elements, as well as gen- eral purpose computing ensembles that are subject to external loading. Data skew appears as significant non-uniformities in the distribution of attribute values of underlying relations that are involved in a join. We develop cost models and predictive dynamic load balancing pro- tocols to detect imbalance during the computation of a single large join. Our algorithms can account for imbalance due to data skew as well as heterogeneity in the computing environment. Sig- nificant performance gains are reported for a wide range of test cases on a prototype implementation of the system. Keywords: Load Balancing, Parallel Join, Hash Join, Data Skew, Parallel Processing, Heterogeneity	(ps)
Expanding the Repertoire of Process-based Tool Integration	Giuseppe Valetto	1994-01-01	The purpose of this thesis is to design and implement a new protocol for Black Box tool enveloping, in the context of the Oz Process Centered Environment, as an auxiliary mechanism that deals with additional families of tools, whose character prevents a thoroughly satisfactory service by the current encapsulation method. We mean to address interpretive and query systems, multi-user collaborative and non-collaborative tools, and programs that allow incremental binding of parameters after start-up and storing of intermediate and/or partial results. Our goal is to support a greater amount of interaction between multiple human operators, the tools and the environment, in the context of complex software development and management tasks. During the realization of this project, we introduced several concepts related to integration of Commercial Off-The-Shelf tools into Software Development Environments: an approach based on multiple enveloping protocols, a categorization of tools according to their multi-tasking and multi-user capabilities, the ideas of loose wrapping (as opposed to the usual tight wrapping) and of persistent tools (with respect to the duration of a single task), and a functional extension of some intrinsically single-user applications to a (limited) form of collaboration.	(ps)
Enveloping Sophisticated Tools into Computer-Aided Software Engineering Environments	Giuseppe Valetto, Gail E. Kaiser	1994-01-01	We present a CASE-tool integration strategy based on enveloping pre-existing tools without source code modifications, recompilation, or assuming an extension language or any other special capabilities on the part of the tool. This Black Box enveloping (or wrapping) idea has been around for a long time, but was previously restricted to relatively simple tools. We describe the design and implementation of a new Black Box enveloping facility intended for sophisticated tools --- with particular concern for the emerging class of groupware applications.	(ps)
Stereo in the Presence of Specular Reflection	Dinkar N. Bhat, Shree K. Nayar	1994-01-01	The problem of accurate depth estimation using stereo in the presence of specular reflection is addressed. Specular reflection is viewpoint dependent and can cause large intensity differences at corresponding points. Hence, mismatches could result causing significant depth errors. Current stereo algorithms largely ignore specular reflection which is a fundamental reflection phenomenon from surfaces, both smooth and rough. We analyzed the physics of specular reflection and the geometry of stereopsis which led us to an interesting relationship between stereo vergence, surface roughness, and the likelihood of a correct match. Given the lower bound on surface roughness, an optimal binocular stereo configuration can be determined which maximizes precision in depth estimation despite specular reflection. However, surface roughness is difficult to estimate in unstructured environments. Therefore, multiple view configurations independent of surface roughness are determined such that at each scene point visible to all sensors, at least one stereo pair provides a correct depth estimate. We have developed a simple algorithm to reconstruct depth from the multiple view images. Experiments with real surfaces confirm the viability of our approach. A key feature of this approach is that we do not seek to eliminate or avoid specular reflection, but rather minimize its effect on stereo matching.	(ps)
Toward Scalable and Parallel Inductive Learning: A Case Study in Splice Junction Prediction	Philip K. Chan, Salvatore J. Stolfo	1994-01-01	Much of the research in inductive learning concentrates on problems with relatively small amounts of training data. With the steady progress of the Human Genome Project, it is likely that orders of magnitude more data in sequence databases will be available in the near future for various learning problems of biological importance. Thus, techniques that provide the means of {\it scaling} machine learning algorithms requires considerable attention. {\it Meta-learning} is proposed as a general technique to integrate a number of distinct learning processes that aims to provide a means of scaling to large problems. This paper details several meta-learning strategies for integrating independently learned classifiers on subsets of training data by the same learner in a parallel and distributed computing environment. Our strategies are particularly suited for massive amounts of data that main-memory-based learning algorithms cannot handle efficiently. The strategies are also independent of the particular learning algorithm used and the underlying parallel and distributed platform. Preliminary experiments using different learning algorithms in a simulated parallel environment demonstrate encouraging results: parallel learning by meta-learning can achieve comparable prediction accuracy in less space and time than serial learning.	(ps)
Essential-Hazard-Free State Minimization of Incompletely Specified Asynchronous Sequential Machines	Fu-Chiung J. Cheng, Luis Plana, Stephen H. Unger	1994-01-01	This paper describes a novel algorithm for essential-hazard-free state minimization of incompletely specified asynchronous sequential machines. Some novel techniques for elimination and avoidance of potential transient and steady state essential hazards under unbounded delay assumption are proposed and exploited in our algorithm. This paper illustrates that apparent steady state essential hazards can be removed from a flow table if at least two of the rows which contribute to the hazard can be merged. It also shows that the existing state merging algorithms introduce steady state and transient essential hazards that can be avoided. The algorithm has been implemented and applied to over a dozen asynchronous sequential machines. Results are compared with results of non-essential-hazard-free method SIS. Most of the tested cases can be reduced to essential hazard free flow tables.	(ps)
REVISION-BASED GENERATION OF NATURAL LANGUAGE SUMMARIES PROVIDING HISTORICAL BACKGROUND: Corpus-based Analysis, Design, Implementation and Evaluation	Jacques Robin	1994-01-01	Automatically summarizing vast amounts of on-line quantitative data with a short natural language paragraph has a wide range of real-world applications. However, this specific task raises a number of difficult issues that are quite distinct from the generic task of language generation: conciseness, complex sentences, floating concepts, historical background, paraphrasing power and implicit content. In this thesis, I address these specific issues by proposing a new generation model in which a first pass builds a draft containing only the essential new facts to report and a second pass incrementally revises this draft to opportunistically add as many background facts as can fit within the space limit. This model requires a new type of linguistic knowledge: revision operations, which specifyies the various ways a draft can be transformed in order to concisely accommodate a new piece of information. I present an in-depth corpus analysis of human-written sports summaries that resulted in an extensive set of such revision operations. I also present the implementation,based on functional unification grammars, of the system STREAK, which relies on these operations to incrementally generate complex sentences summarizing basketball games. This thesis also contains two quantitative evaluations. The first shows that the new revision-based generation model is far more robust than the one-shot model of previous generators. The second evaluation demonstrates that the revision operations acquired during the corpus analysis and implemented in STREAK are, for the most part, portable to at least one other quantitative domain (the stock market). STREAK is the first report generator that systematically places the facts which it summarizes in their historical perspective. It is more concise than previous systems thanks to its ability to generate more complex sentences and to opportunistically convey facts by adding a few words to carefully chosen draft constituents. The revision operations on which STREAK is based constitute the first set of corpus-based linguistic knowledge geared towards incremental generation. The evaluation presented in this thesis is also the first attempt to quantitatively assess the robustness of a new generation model and the portability of a new type of linguistic knowledge.	(ps)
PGMAKE: A Portable Distributed Make System	Andrew Lih, Erez Zadok	1994-01-01		(ps)
Discovery and Hot Replacement of Replicated Read-Only File Systems, with Application to Mobile Computing (M.S. Thesis	Erez Zadok	1994-01-01		(ps)
MeldC: A Reflective Object-Oriented Coordination Language	Gail E. Kaiser, Wenwey Hseush, James Lee, Felix Wu, Ester Woo, Eric Hilsdale	1993-01-01	We present a coordination language, MeldC, for open system programming. MeldC is a C-based, concurrent, object-oriented language built on a reflective architecture. The key of the reflective feature is that the metaclass supports {\it shadow} objects that implement {\it secondary behaviors} of objects. The behavior of an object can be extended by dynamically composing multiple secondary behaviors with the primary behavior of the object which is defined by its class. The mechanism is referred to as {\it dynamic composition}. Our focus is to study the language architecture with which programmers are able to construct -- without modifying the language internals -- new language features in a high-level and efficient way. We demonstrate that the MeldC reflective architecture is the right approach to building distributed and persistent systems.	(ps)
Extending Attribute Grammars to Support Programming-in-the-Large	Josephine Micallef, Gail E. Kaiser	1993-01-01	Attribute grammars add specification of static semantic properties to context-free grammars, which in turn describe the syntactic structure of program units. However, context-free grammars cannot express programming-in-the-large features common in modern programming languages, including unordered collections of units, included units and sharing of included units. We present extensions to context-free grammars, and corresponding extensions to attribute grammars, suitable for defining such features. We explain how batch and incremental attribute evaluation algorithms can be adapted to support these extensions, resulting in a uniform approach to intra-unit and inter-unit static semantic analysis and translation of multi-unit programs.	(ps)
On the Cost of Transitive Closures in Relational Databases	Zhe Li, Ken Ross	1993-01-01	We consider the question of taking transitive closures on top of pure relational systems (Sybase and Ingres in this case). We developed three kinds of transitive closure programs, one using a stored procedure to simulate a built-in transitive closure operator, one using the C language embedded with SQL statements to simulate the iterated execution of the transitive closure operation, and one using Floyd's matrix algorithm to compute the transitive closure of an input graph. By comparing and analyzing the respective performances of their different versions in terms of elapsed time spent on taking the transitive closure, we identify some of the bottlenecks that arise when defining the transitive closure operator on top of existing relational systems. The main purpose of the work is to estimate the costs of taking transitive closures on top of relational systems, isolate the different cost factors (such as logging, network transmission cost, etc.), and identify some necessary enhancements to existing relational systems in order to support transitive closure operation efficiently. We argue that relational databases should be augmented with efficient transitive closure operators if such queries are made frequently.	(ps)
crep: a regular expression-matching textual corpus tool	Darrin Duford	1993-01-01	crep is a UNIX tool which searches either a tagged or free textual corpus file and outputs each sentence that matches the specified regular expression provided by the user as a parameter. The expression consists of user-defined regular expressions and part-of-speech tags. The purpose of crep is to make the searches faster and easier than by either a) searching through corpora by hand; or b) constructing a lexical scanner for each specific search. crep achieves this facilitation by offering the user a simple expression syntax, from which it automatically constructs an appropriate scanner. The user therefore has the ability to execute a whole search in one command, invoking several modules explicity or implicitly, including a sentence delimiter, a part-of-speech tagger, an expression matcher, and various output filters.	(ps)
Cooperative Transactions for Multi-User Environments	Gail E. Kaiser	1993-01-01	We survey extended transaction models proposed to support long duration, interactive and/or cooperative activities in the context of multi-user software development and CAD/CAM environments. Many of these are variants of the checkout model, which addresses the long duration and interactive nature of the activities supported by environments but still isolates environment users so that they cannot (or at least are not supposed to) collaborate while their activities are in progress. However, a few cooperative transaction models have been proposed to facilitate collaboration, usually while maintaining some guarantees of consistency.	(ps)
There Exists a Problem Whose Computational Complexity is Any Given Function of the Information Complexity	Ming Chu	1993-01-01	We present an information-based complexity problem for which the computational complexity can be any given increasing function of the information complexity, and the information complexity can be any non-decreasing function of $\varepsilon^{-1}$, where $\varepsilon$ is the error parameter.	(pdf)
Logical Embeddings for Minimum Congestion Routing in Lightwave Networks	B\"ulent Yener, Terrance E. Boult"	1993-01-01	The problem considered in this paper is motivated by the independence between logical and physical topology in Wavelength Division Multiplexing (WDM) based local and metropolitan lightwave networks. This paper suggests logical embeddings of digraphs into multihop lightwave networks to maximize the throughput under nonuniform traffic conditions. Defining {\it congestion} as the maximum flow carried on any link, two perturbation heuristics are presented to find a {\it good logical} embedding on which the routing problem is solved with minimum congestion. A constructive proof for a lower bound of the problem is given, and obtaining an optimal solution for integral routing is shown to be NP-Complete. The performance of the heuristics is empirically analyzed on various traffic models. Simulation results show that our heuristics perform, on the average, $20\%$ from a computed lower bound. Since this lower bound is not quite tight, we suspect that the actual performance is better. In addition, we show that $5\% - 20\%$ performance improvements can be obtained over the previous work.	(ps)
Oz: A Decentralized Process Centered Environment (Ph.D. Thesis Proposal	Israel Z. Ben-Shaul	1993-01-01	This is a proposal for a model and an architecture for decentralized process centered environments, supporting collaboration and concerted efforts among geographically-dispersed teams -- each team with its own autonomous process -- with emphasis on flexible control over the degree of collaboration and autonomy provided. The focus is on decentralized process {\it modeling} and on decentralized process {\it enaction.	(ps)
The complexity of two-point boundary-value problems with piecewise analytic data	Arthur G. Werschulz	1993-01-01	Previous work on the $\e$-complexity of elliptic boundary-value problems $Lu=f$ assumed that the class~$F$ of problem elements~$f$ was the unit ball of a Sobolev space. In a recent paper, we considered the case of a model two-point boundary-value problem, with $F$ being a class of analytic functions. In this paper, we ask what happens if $F$ is a class of piecewise analytic functions. We find that the complexity depends strongly on how much a priori information we have about the breakpoints. If the location of the breakpoints is known, then the $\e$-complexity is proportional to $\ln(\e^{-1})$, and there is a finite element $p$-method (in the sense of Babu\u{s}ka) whose cost is optimal to within a constant factor. If we know neither the location nor the number of breakpoints, then the problem is unsolvable for $\e<\sqrt{2}$. If we know only that there are $b\ge 2$ breakpoints, but we don't know their location, then the $\e$-complexity is proportional to $b\e^{-1}$, and a finite element $h$-method is nearly optimal. In short, knowing the location of the breakpoints is as good as knowing that the problem elements are analytic, whereas only knowing the number of breakpoints is no better than knowing that the problem elements have a bounded derivative in the $L_2$ sense.	(pdf)
A Repository for a CARE Environment	Toni A. B\"unter"	1993-01-01	Repositories in CASE hold information about the development process and the structure of developing software. The migration or reuse of CASE repositories for CARE (Computer Aided Re-Engineering) is not adequate for the reengineering process. The main reasons for its inadequacy are the emptyness of such repositories, and the nature of the process itself. In the following report we will define a CARE architecture, from the reengineering point of view, and derive a structure of a repository appropriate to the reengineering process.	(ps)
A Non-Deterministic Approach to Restructuring Flow Graphs	Toni A. B\"unter"	1993-01-01	The history of programming is filled with works about the properties of program flow graphs. There are many approaches to defining the quality of such graphs, and to improving a given flow graph by restructuring the underlying source code. We present here a new, twofold approach to restructuring the control flow of arbitrary source code. The first part of the method is a classical deterministic algorithm; the second part is non-deterministic and involves user interaction. The method is based on node splitting, enabling it to satisfy the definition of the extended Nassi-Shneiderman diagrams.	(ps)
Isochronets: a High-Speed Network Switching Architecture	Danilo Florissi	1993-01-01	Traditional switching techniques need hundred- or thousand-MIPS processing power within switches to support Gbit/s transmission rates available today. These techniques anchor their decision-making on control information within transmitted frames and thus must resolve routes at the speed in which frames are being pumped into switches. Isochronets can potentially switch at any transmission rate by making switching decisions independent of frame contents. Isochronets divide network bandwidth among routing trees, a technique called Route Division Multiple Access (RDMA). Frames access network resources through the appropriate routing tree to the destination. Frame structures are irrelevant for switching decisions. Consequently, Isochronets can support multiple framing protocols without adaptation layers and are strong candidates for all-optical implementations. All network-layer functions are reduced to an admission control mechanism designed to provide quality of service (QOS) guarantees for multiple classes of traffic. The main results of this work are: (1) A new network architecture suitable for high-speed transmissions; (2) An implementation of Isochronets using cheap off-the-shelf components; (3) A comparison of RDMA with more traditional switching techniques, such as Packet Switching and Circuit Switching; (4) New protocols necessary for Isochronet operations; and (5) Use of Isochronet techniques at higher layers of the protocol stack (in particular, we show how Isochronet techniques may solve routing problems in ATM networks).	(ps)
A Configuration Process for a Distributed Software Development Environment	Israel Z. Ben-Shaul, Gail E. Kaiser	1993-01-01	This paper describes work-in-progress on a configuration facility for a multi-site software development environment. The environment supports collaboration among geographically-dispersed teams of software developers. Addition and deletion of local subenvironment sites to a global environment is performed interactively inside any one of the existing local subenvironments, with the same user interface normally employed for invoking software development tools. This registration process is defined and executed using the same notation and mechanisms, respectively, as for the software development process. Each remote site is represented by a root object in the distributed objectbase containing the software under development; each local subobjectbase can be displayed and queried at any site, but only its root is physically copied at every site. Everything described in this paper has been implemented and is working, but since we are in the midst of experimentation, we do not expect that the ``final'' system will be exactly as described here.	(ps)
Graphical Editing by Example	David Kurlander	1993-01-01	Constructing illustrations by computer can be both tedious and difficult. This doctoral thesis introduces five example-based techniques to facilitate the process. These techniques are independently useful, but also interrelate in interesting ways: * Graphical Search and Replace, the analogue to textual search and replace in text editors, is useful for making repetitive changes throughout graphical documents. * Constraint-Based Search and Replace, an extension to graphical search and replace, allows users to define their own illustration beautification rules and constraint inferencing rules by demonstration. * Constraint Inferencing from Multiple Snapshots facilitates constraint specification by automatically computing constraints that hold in multiple configurations of an illustration. * Editable Graphical Histories, a visual representation of commands in a graphical user interface, are useful for reviewing, undoing, and redoing sets of operations. * Graphical Macros By Example, based on this history representation, allow users to scroll through previously executed commands and encapsulate useful sequences into macros. These macros can be generalized into procedures, with arguments and flow of control using graphical and constraint-based search and replace. Individually and in combination, these techniques reduce repetition in graphical editing tasks, visually and by example, using the application's own interface. These techniques have been implemented in Chimera, an editor built to serve as a testbed for this research.	(ps)
DFLOPS:A Dataflow Machine for Production Systems	Fu-Chiung Cheng, Mei-Yi Wu	1993-01-01	Many production system machines have been proposed to speed up the execution of production system programs. Most of them are implemented in kinds of conventional control flow model of execution which is limited by "von Neumann bottleneck." In this paper we propose a new multiprocessor dataflow machine, called DFLOPS, for parallel processing of production systems. The rules programs are compiled into dataflow graphs and then map into DFLOPS processing elements. Our parallel execution model exploits not only matching rules in parallel but also firing rules in parallel. The design and implementation of DFLOPS is presented in detail. The distinguishing characteristics of this proposed machine lies in its simplicity, fully-pipelined processing and fine grain parallelism. MISD, SIMD and/or MIMD modes of execution can be exploited in this machine according to the properties of applications. The initial results reveal that the performance of production systems is greatly improved.	(ps)
Disconnected Operation in a Multi-User Software Development Environment	Peter D. Skopp, Gail E. Kaiser	1993-01-01	Software Development Environments have traditionally relied upon a central project database and file repository, accessible to a programmer's workstation via a local area network connection. The introduction of powerful mobile computers has demonstrated the need for a new model, which allows for machines with transient network connectivity to assist programmers in product development. We propose a process-based checkout model by which process and product files that may be needed during a planned period of dis-connectivity are pre-fetched with minimal user effort. Rather than selecting each file by hand, which is tedious and error-prone, the user only informs the environment of the portion of the software development process intended to be executed while disconnected. The environment is then responsible for pre-fetching the necessary files. We hope that this approach will enable programmers to continue working on a project without network access.	(ps)
Terminological Constraint Network Reasoning and its Application to Plan Recognition (Thesis Proposal	Robert Weida	1993-01-01	Terminological systems in the tradition of KL-ONE are widely used in AI to represent and reason with concept descriptions. They compute subsumption relations between concepts and automatically classify concepts into a taxonomy having well-founded semantics. Each concept in the taxonomy describes a set of possible instances which are a superset of those described by its descendants. One limitation of current systems is their inability to handle complex compositions of concepts, such as constraint networks where each node is described by an associated concept. For example, plans are often represented (in part) as collections of actions related by a rich variety of temporal and other constraints. The T-REX system integrates terminological reasoning with constraint network reasoning to classify such plans, producing a ``terminological'' plan library. T-REX also introduces a new theory of plan recognition as a deductive process which dynamically partitions the plan library by modalities, e.g., necessary, possible and impossible, while observations are made. Plan recognition is guided by the plan library's terminological nature. Varying assumptions about the accuracy and monotonicity of the observations are addressed. Although this work focuses on temporal constraint networks used to represent plans, terminological systems can be extended to encompass constraint networks in other domains as well.	(ps)
Hazards, Critical Races, and Metastability	Stephen H. Unger	1993-01-01	The various modes of failure of asynchronous sequential logic circuits due to timing problems are considered. These are hazards, critical races and metastable states. It is shown that there is a mechanism common to all forms of hazards and to metastable states. A similar mechanism, with added complications, is shown to characterize critical races. Means for defeating various types of hazards and critical races through the use of one-sided delay constraints are introduced. A method is described for determining from a flow table situations in which metastable states may be entered. A circuit technique for defeating metastability problems in self timed systems is presented. It is shown that the use of simulation for verifying the correctness of a circuit with given bounds on the branch delays cannot be relied upon to expose all timing problems. An example is presented that refutes the conjecture that replacing pure delays with inertial delays can only eliminate glitches.	(ps)
Optimization of the Binding Mechanism of the Characteristic Function in Marvel	Bunter, Toni A.	1993-01-01	The applied binding mechanism in {\sc Marvel} 3.1 for the characteristic function checks all instances of a given class against the binding formula, regardless of the actual structure of the formula and its predicates. This can cause unnecessary computation overhead while executing a rule. This report displays a more advanced mechanism considering relational information between objects, the structure of the binding formula and optimizing rewriting of the binding formula.	(ps)
An Approach for Distributed Query Processing in Marvel: Concepts and Implementation	Bunter, Toni A.	1993-01-01	This work displays an approach for the query processing of {\sc Marvel} rules upon a distributed {\sc Marvel} objectbase. Rules and rest rules run simultaneously on different subenvironments, synchronized by a coordinating subenvironment. Instead of transmitting objects, the showed method transmits images. The concept of lazy calling is introduced.	(ps)
New Lower Bounds on the Cost of Binary Search Trees	De Prisco, R., De Santis, A.	1993-01-01	In this paper we provide new lower bounds on the cost of binary search trees. The bounds are expressed in terms of the entropy of the probability distribution, the number of elements and the probability that a search is successfully. Most of our lower bounds are derived by means of a new technique which exploits the relation between trees and codes. Our lower bounds compare favorably with known limitations. We also provide an achievable upper bound on the Kraft sum generalized to the internal nodes of a tree. This improves on a previous result.	(ps)
Catastrophic Faults in Reconfigurable Linear Arrays of Processors	De Prisco, R., De Santis, A.	1993-01-01	In regular architectures of identical processing elements, a widely used technique to improve the reconfigurability of the system consists of providing redundant processing elements and mechanisms of reconfiguration. In this paper we consider linear arrays of processing elements, with unidirectional bypass links of length $g$. We count the number of particular sets of faulty processing elements. We show that the number of catastrophic faults of $g$ elements is equal to the $(g-1)$-th Catalan number. We also provide algorithms to rank and unrank all catastrophic sets of $g$ faults. Finally, we describe a linear time algorithm that generates all such sets of faults.	(ps)
Minimal Path Length of Trees with Known Fringe	Roberto De Prisco, Giuseppe Parlati, Giu	1993-01-01	In this paper we study the path length of binary trees with known number of leaves and known fringe, that is the difference between the longest and the shortest root-leaf path. We compute the path length of the minimal tree with given number of leaves $N$ and fringe $\Delta$ for the case $\Delta\ge N/2$. This complements a known tight lower bound that holds in the case $\Delta\le N/2$. Our method also yields a linear time algorithm for constructing the minimal tree valid for any $N$ and $\Delta$.	(ps)
Solution of Ulam's Problem on Binary Search with Four Lies	Vincenzo Auletta, Alberto Negro, Giusepp	1993-01-01	S.M. Ulam, in his autobiography (1976), suggested an interesting two--person search game which can be formalized as follows: a Responder chooses an element $x$ in $\{ 1,2,\ldots,1000000 \}$ unknown to a Questioner. The Questioner has to find it by asking queries of the form ``$x \in Q ?$", where $Q$ is an arbitrary subset of $\{ 1,2,\ldots, 1000000 \}$. The Responder provides ``yes" or ``no" answers, some of which may be {\it erroneous }. In this paper we determine the minimal number of yes-no queries needed to find the unknown integer $x$ between $1$ and $1000000$ if at most four of the answers may be erroneous.	(ps)
Process Centered Software Development on Mobile Hosts	Peter D. Skopp	1993-01-01	Software Development Environments have traditionally relied upon a central project database and file repository, accessible to a programmer's workstation via a high speed local area network connection. The introduction of powerful mobile computers has demonstrated the need for a new model, which allows for variable bandwidth machines as well as transient network connectivity to assist programmers in product development. A new client-server model is introduced which minimizes network traffic when bandwidth is limited. To support disconnected operation, I propose a em process-based checkout model by which process information and product files that may be needed during a planned period of dis-connectivity are pre-fetched with minimal user effort. Rather than selecting each file by hand, which is tedious and error-prone, the user only informs the environment of the portion of the software development process intended to be executed while disconnected. The environment is then responsible for pre-fetching the necessary files. It is hoped that these research efforts will enable programmers to continue working on a project without continuous high speed network access.	(ps)
Automatic Translation of Process Modeling Formalisms	George T. Heineman	1993-01-01	This paper demonstrates that the enaction of a software process can be separated from the formalism in which the process is modeled. To show this, we built a compiler capable of automatically translating a process model specified using Statemate into a MARVEL environment which then supports the enaction of the process.	(pdf) (ps)
Adaptive Remote Paging for Mobile Computers	Bill N. Schilit, Dan Duchamp	1991-01-01	There is a strong trend toward the production of small ``notebook'' computers. The small size of portable computers places inherent limits on their storage capacity, making remote paging desirable or necessary. Further, mobile computers can ``walk away from'' their servers, increasing load on network routing resources and/or breaking network connections altogether. Therefore, it is desirable to allow client-server matchups to be made dynamically and to vary over time, so that a client might always be connected to nearby servers. Accordingly, we have built a self-organizing paging service that adapts to changes in locale and that stores pages in remote memory if possible. We show empirically that there is no performance penalty for using our paging facility instead of a local disk. This suggests that portable computers need neither a hard disk nor an excessive amount of RAM, provided that they will operate in environments in which remote storage is plentiful. These are important facts because both a hard disk and large amounts of RAM are undesirable characteristics for very small portable computers.	(pdf) (ps)
{IP}-based Protocols for Mobile Internetworking	John Ioannidis, Dan Duchamp, Gerald Q. Maguire Jr.	1991-01-01	We consider the problem of providing network access to hosts whose physical location changes with time. Such hosts cannot depend on traditional forms of network connectivity and routing because their location, and hence the route to reach them, cannot be deduced from their IP address. We present protocols that seamlessly integrate mobile hosts into the current IP networking infrastructure. They are primarily targeted at supporting a campus environment with mobile computers, but also extend gracefully to accomodate hosts moving between different networks. The key feature is the dependence on ancillary machines to track the location of the mobile hosts. Our protocols are designed to react quickly to changing topologies, to scale well, and not to place an overwhelming burden on the network.	(ps)
Inferring Constraints from Multiple Snapshots	David Kurlander, Steven Feiner	1991-01-01	Many graphics tasks, such as the manipulation of graphical objects, and the construction of user-interface widgets, can be facilitated by geometric constraints. However, the difficulty of specify-ing constraints by traditional methods forms a barrier to their widespread use. In order to make constraints easier to declare, we have developed a method of specifying constraints implicitly, through multiple examples. Snapshots are taken of an initial scene configuration, and one or more additional snapshots are taken after the scene has been edited into other valid configurations. The constraints that are satisfied in all the snapshots are then applied to the scene objects. We discuss an efficient algorithm for inferring constraints from multiple snapshots. The algorithm has been incorporated into the Chimera editor, and several examples of its use are discussed.	(pdf) (ps)
Extending A Tool Integration Language	Mark A. Gisi, Gail E. Kaiser	1991-01-01	The {\sc Marvel} environment supports rule-based automation of software processes. {\sc Marvel} invokes external tools to carry out steps in a software process. One of the major objectives of this research is to invoke external tools to carry out steps in a software process without modifying the tools. This is achieved by encapsulating tools in {\it envelopes}, designed to abstract the details of a tool from the {\sc Marvel} kernel, thereby providing a ``black box'' interface. Initially we used the {\sc Unix} shell script language to write envelopes. However, due to several limitations of the shell language the black box abstraction could not be fully supported. We describe these limitations and discuss how we extended the shell language to obtain a new envelope language that fully supports the black box abstraction.	(ps)
Fully Dynamic Algorithms for 2-Edge Connectivity	Zvi Galil, Giuseppe F. Italiano	1991-01-01	We study the problem of maintaining the 2-edge-connected components of a graph undergoing repeated dynamic modifications, such as edge insertions and edge deletions. We show how to test at any time whether two vertices belong to the same 2-edge-connected component, and how to insert and delete an edge in $O(m^{2/3})$ time in the worst case, where $m$ is the current number of edges in the graph. This answers a question posed by Westbrook and Tarjan [WT89]. For planar graphs, we present algorithms that support all these operations in $O(\sqrt{n\log\log n}\,)$ worst-case time each, where $n$ is the total number of vertices in the graph.	(ps)
Maintaining the 3-Edge-Connected Components of a Graph On-Line	Zvi Galil, Giuseppe F. Italiano	1991-01-01	We study the problem of maintaining the 3-edge-connected components of a graph undergoing repeated dynamic modifications, such as edge and vertex insertions. We show how to answer whether two vertices belong to the same 3-edge-connected component of a connected graph that is undergoing only edge insertions. Any sequence of $q$ query and updates on an $n$-vertex graph can be performed in $O((n+q)\alpha(q,n))$ time.	(ps)
Maintaining Biconnected Components of Dynamic Planar Graphs	Zvi Galil, Giuseppe F. Italiano	1991-01-01	We present algorithms for maintaining the biconnected components of a planar graph undergoing repeated dynamic modifications, such as insertions and deletions of edges and vertices. We show how to test at any time whether two vertices belong to the same biconnected component, and how to insert and delete an edge in $O(n^{2/3})$ time in the worst case, where $n$ is the number of vertices in the graph. The data structure supports also insertions of new vertices and deletions of disconnected vertices in the same time bounds. This is the first sublinear-time algorithm known for this problem.	(ps)
Dynamic data structures for graphs	Giuseppe Francesco Italiano	1991-01-01	We consider dynamic algorithms and dynamic data structures for various graph problems, such as edge connectivity, vertex connectivity and shortest paths. We first show how to maintain efficiently a solution to the shortest-path problem during edge insertions and edge-cost decreases. We then present the first sublinear-time algorithms known for maintaining the 2-edge-connected components of a general graph and the 2-vertex-connected components of a planar graph during edge and vertex insertions and deletions. Finally, we introduce a fast algorithm for maintaining the 3-edge-connected components of a graph during edge insertions only. The solutions of all the above problems share many common features and combine a variety of graph properties, novel algorithmic techniques and new data structures.	(ps)
Control in Functional Unification Grammars for Text Generation	Michael Elhadad, Jacques Robin	1991-01-01	Standard Functional Unification Grammars (FUGs) provide a structurally guided top-down control regime for text generation that is not appropriate for handling non-structural and dynamic constraints. We introduce two control tools that we have implemented for FUGs to address these limitations: {\em bk-class}, a tool to limit search by using a form of dependency-directed backtracking and {\em external}, a co-routine mechanism allowing a FUG to cooperate with dynamic constraint sources. We show how these tools complement the top-down regime of FUGs to enhance lexical choice."	(ps)
MpD: A Multiprocessor Debugger (M.S. Thesis	Krish Ponamgi	1991-01-01	MpD is a multiprocessor C debugger designed for multithreaded applications running under the Mach operating system. MpD is built on top of gdb, an existing sequential debugger. The MpD layer utilizes the modeling languages Data Path Expressions developed by Hseush and Kaiser to provide a rich set of commands to trace sequential and parallel execution of a program. Associated with each DPE are actions that allow access to useful trace variables and I/O facilities. DPEs are useful for describing sequential and concurrent patterns of events to be verified during execution. The patterns include conditions such as synchronizations, race conditions, and wrongly classified sequential/concurrent behavior. We show in this thesis Data Path Expressions are a viable language for multiprocessor debuggers.	(pdf)
Rule Chaining in {\sc Marvel}: Dynamic Binding of Parameters	George T.~Heineman, Gail E.~Kaiser, Naser S.~Barghouti, Israel Z.~Ben-Shaul	1991-01-01	{\sc Marvel} is a rule-based development environment (RBDE) that assists in the development of software projects. {\sc Marvel} encapsulates each software development activity in a rule that specifies the condition for invoking the activity and its effects on the components of the project under development. These components are abstracted as objects and stored in a persistent object database. Each rule applies to a specific class of objects, which is specified as the parameter of the rule. Firing a rule entails binding its formal parameter to a specific object. If the rule changes the object in such a way that the conditions of other rules become satisfied, these other rules are automatically fired. A problem arises in this forward chaining model when the classes of the objects manipulated by the rules are different. {\sc Marvel} has to determine which object to bind to the parameter of each rule in the chain, based on the object manipulated by the original rule that initiated the chain. We describe a heuristic approach for solving this problem in the current {\sc Marvel} implementation and introduce an algorithmic approach that does better.	(pdf) (ps)
Incremental Attribute Evaluation for Multi-User Semantics-Based Editors	Josephine Micallef	1991-01-01	This thesis addresses two fundamental problems associated with performing incremental attribute evaluation in multi-user editors based on the attribute grammar formalism: (1) multiple asynchronous modifications of the attributed derivation tree, and (2) segmentation of the tree into separate modular units. Solutions to these problems make it possible to construct semantics-based editors for use by teams of programmers developing or maintaining large software systems. Multi-user semantics-based editors improve software productivity by reducing communication costs and snafus. The objectives of an incremental attribute evaluation algorithm for multiple asynchronous changes are that (a) all attributes of the derivation tree have correct values when evaluation terminates, and (b) the cost of evaluating attributes necessary to reestablish a correctly attributed derivation tree is minimized. We present a family of algorithms that differ in how they balance the tradeoff between algorithm efficiency and expressiveness of the attribute grammar. This is important because multi-user editors seem a practical basis for many areas of computer-supported cooperative work, not just programming. Different application areas may have distinct definitions of efficiency, and may impose different requirements on the expressiveness of the attribute grammar. The characteristics of the application domain can then be used to select the most efficient strategy for each particular editor. To address the second problem, we define an extension of classical attribute grammars that allows the specification of interface consistency checking for programs composed of many modules. Classical attribute grammars can specify the static semantics of monolithic programs or modules, but not inter-module semantics; the latter was done in the past using @i[ad hoc] techniques. Extended attribute grammars support programming-in-the-large constructs found in real programming languages, including textual inclusion, multiple kinds of modular units and nested modular units. We discuss attribute evaluation in the context of programming-in-the-large, particularly the separation of concerns between the local evaluator for each modular unit and the global evaluator that propagates attribute flows across module boundaries. The result is a uniform approach to formal specification of both intra-module and inter-module static semantic properties, with the ability to use attribute evaluation algorithms to carry out a complete static semantic analysis of a multi-module program.	(pdf)
Execution Autonomy in Distributed Transaction Processing	Calton Pu, Avraham Leff	1991-01-01	We study the feasibility of execution autonomy in systems with asynchronous transaction processing based on epsilon-serializability (ESR). The abstract correctness criteria defined by ESR are implemented by techniques such as asynchronous divergence control and asynchronous consistency restoration. Concrete application examples in a distributed environment, such as banking, are described in order to illustrate the advantages of using ESR to support execution autonomy.	(ps)
An Architectural Framework for Object Management Systems	Steven S. Popovich	1991-01-01	Much research has been done in the last decade in the closely related areas of object-oriented programming languages and databases. Both areas now seem to be working toward a common end, that of an @i(object management system), or OMS. An OMS is constructed similarly to an OODB but provides a general purpose concurrent object oriented programming language as well, sharing the object base with the OODB query facilities. In this paper, we will define several different types of object systems (object servers, persistent OOPL's, OODB's and OMS's) in terms of their interfaces and capabilities. We will examine the distinguishing features and general architecture of systems of each type in the light of a general model of OMS architecture.	(ps)
Implementing Activity Structures Process Modeling On Top Of The Marvel Environment Kernel	Gail E. Kaiser, Israel Ben-Shaul, Steven S. Popovich	1991-01-01	Our goal was to implement the activity structures model defined by Software Design \& Analysis on top of the {\sc Marvel} environment kernel. This involved further design of the activity structures process definition language and enaction model as well as translation and run-time support in terms of facilities provided by {\sc Marvel}. The result is an elegant declarative control language for multi-user software processes, with data and activities defined as classes and rules in the previously existing {\sc Marvel} Strategy Language. Semantics-based concurrency control is provided by a combination of the {\sc Marvel} kernel's lock and transaction managers and the send/receive synchronization primitives of the activity structures model.	(ps)
Service Interface and Replica Management Algorithm for Mobile File System Clients	Carl D. Tait, Dan Duchamp	1991-01-01	Portable computers are now common, a fact that raises the possibility that file service clients might move on a regular basis. This new development requires re-thinking some features of distributed file system design. We argue that existing approaches to file replica management would not cope well with the likely behavior of mobile clients, and we present our solution: a lazy ``server-based'' update operation. This operation facilitates fast, scalable, and highly fault-tolerant implementations of both read and write operations in the usual case. To cope with the weak semantics of the update operation, we propose a new file system service interface that allows applications to opt for ``UNIX semantics'' by use of a slower, less fault-tolerant read operation.	(ps)
An Architecture for Dynamic Reconfiguration in a Distributed Object-Based Programming Language	Brent Hailpern, Gail E. Kaiser	1991-01-01	Distributed applications ideally allow reconfiguration while the application is running, but changes are usually limited to adding new client and server processes and changing the bindings among such processes. In some application domains, such as real-time financial services, it is necessary to support finer grained reconfiguration at the level of entities smaller than processes, but desirable to reduce the overhead associated with ad hoc dynamic storage allocation. We present a scheme for special cases of fine-grained dynamic reconfiguration sufficient for our application domain and show how it can be used for practical changes. We introduce new language concepts to apply this scheme in the context of an object-based programming language that supports shared data in a distributed environment.	(ps)
Real-Time portfolio management and automatic extensions	Tushar M. Patel	1991-01-01	Voluminous amounts of rapidly changing data in financial markets create a challenging problem for portfolio managers attempting to exploit such changes to achieve their investment objectives. The SPLENDORS real time portfolio management system, built using the PROFIT language, is such a system. Changes in security prices are monitored in a real-time fashion and individual portfolios can be optimized by taking advantage of these changes within the constraints imposed by the investment philosophy of the investor. SPLENDORS allows great flexibility in creating portfolio managers for the sophisticated C programmer and considerable ease of programming for the PROFIT programmer. Additionally, the non-programming financial analyst may also create portfolio managers from parts of a predefined library without any knowledge of programming at all. Extensions to SPLENDORS made by the end user are available immediately without bringing the system down for re-compilation.	(ps)
Navigating the MeldC: The MeldC User's Manual	Howard Gershen, Erik Hilsdale	1991-01-01	MeldC is an attempt to create a computer language that is a watershed of developments in the fields of object-oriented programming, parallel programming, and distributed applications. This easy-to-understand manual provides readers with the details of how to write programs in MeldC, in addition to plain-English explanations of some of the theory (e.g., object/class hierarchies, race condition problems, and remote-file access problems) behind MeldC's features. Illustrated. Commented sample programs and BNF included.	(ps)
Marvel 3.0 Administrator's manual	Programming System Laboratory	1991-01-01	This manual is intended for system/project administrators of Marvel 3.0. It covers various aspects of managing a Marvel environment, including installation of the system, administrator built-in commands, and mainly, how to design and write a Marvel environment using Marvel Strategy Language (MSL) and Shell Envelope Language (SEL).	(pdf) (ps)
Marvel 3.0 User's manual	Programming System Laboratory	1991-01-01	This manual is intended for Marvel end-users, who want to learn how to use Marvel in a specific environment set-up by an administrator. It includes a tutorial, and explanations on the user-interface, messages, and the various built-in commands.	(pdf) (ps)
Taxonomic Plan Reasoning	Premkumar T. Devanbu, Diane J. Litman	1991-01-01	CLASP (ClAssification of Scenarios and Plans) is a knowledge representation system that extends the notion of subsumption from frame-based languages to plans. The CLASP representation language provides description forming operators that specify temporal and conditional relationships between actions represented in CLASSIC (a current subsumption-based knowledge representation language). CLASP supports subsumption inferences between plan concepts and other plan concepts, as well as between plan concepts and plan instances. These inferences support the automatic creation of a plan taxonomy. Subsumption in CLASP builds on term subsumption in CLASSIC and illustrates how term subsumption can be exploited to serve special needs. In particular, the CLASP algorithms for plan subsumption integrate work in automata theory with work in term subsumption. We are using CLASP to store and retrieve information about feature specifications and test scripts in the context of a large software development project.	(pdf) (ps)
FUF User Manual - Version 5.0	Michael Elhadad	1991-01-01	This report is the user manual for FUF version 5.0, a natural language generator program that uses the technique of unification grammars. The program is composed of two main modules: a unifier and a linearizer. The unifier takes as input a semantic description of the text to be generated and a unification grammar, and produces as output a rich syntactic description of the text. The linearizer interprets this syntactic description and produces an English sentence. This manual includes a detailed presentation of the technique of unification grammars and a reference manual for the current implementation (FUF 5.0). Version 5.0 now includes novel techniques in the unification allowing the specification of types and the expression of complete information. It also allows for procedural unification and supports sophisticated forms of control.	(ps)
Empirical Studies on the Disambiguation of Cue Phrases	Julia Hirschberg, Diane Litman	1991-01-01	Cue phrases are linguistic expressions such as @i(now) and @i(well) that function as explicit indicators of the structure of a discourse. For example, @i(now) may signal the beginning of a subtopic or a return to a previous topic, while @i(well) may mark subsequent material as a response to prior material, or as an explanatory comment. However, while cue phrases @b(may) convey discourse structure, each also has one or more alternate uses. While @i(incidentally) may be used @b(SENTENTIALLY) as an adverbial, for example, the @b(DISCOURSE) use initiates a digression. The question of how speakers and hearers distinguish between discourse and sentential uses of cue phrases is rarely addressed in discourse studies. This paper reports results of several empirical studies of discourse and sentential uses of cue phrases, in which both text-based and prosodic features were examined for disambiguating power. Based on these studies, we propose that discourse versus sentential usage may be distinguished by intonational features, specifically, @b(PITCH) @b(ACCENT) and @b(INTONATIONAL) @b(PHRASING). We identify a prosodic model that characterizes these distinctions. We associate this model with features identifiable from text analysis, including orthography and part-of-speech, to permit the application of our findings to the generation of appropriate intonational features for discourse and sentential uses of cue phrases in synthetic speech.	(pdf)
Parallel Dynamic Programming	Z. Galil, K. Park	1991-01-01	We study the parallel computation of dynamic programming. There are five important dynamic programming problems which have wide application, and that have been studied extensively in sequential computation. Two problems among the five have fast parallel algorithms; almost no work has been done for parallelizing the other three. We give fast parallel algorithms for four of these problems, including the three. We use two well-known methods as general paradigms for developing parallel algorithms. Combined with various techniques, they lead to a number of new results. At the heart of our new results is a processor reduction technique, which enables us to solve all four problems with a fewer number of operations than that needed for computing a matrix closure.	(ps)
Machine Learning in Molecular Biology Sequence Analysis	Philip K. Chan	1991-01-01	To investigate how human characteristics are inherited, molecular biologists have been analyzing chemical sequences from DNA, RNA, and proteins. To facilitate this process, sequence analysis knowledge has been encoded in computer programs. However, translating human knowledge to programs is known to be problematic. Machine Learning techniques allow these systems to be generated automatically. This article discusses the application of learning techniques to various analysis tasks. It is shown that the learned systems constructed to date are often more accurate than human-designed systems. Moreover, learning can form plausible new hypotheses, which potentially lead to discovering new knowledge.	(ps)
A Formal Characterization of Epsilon Serializability	Krithi Ramamritham, Calton Pu	1991-01-01	Epsilon Serializability (ESR) is a generalization of classic serializability (SR). ESR allows some limited amount of inconsistency in transaction processing (TP), through an interface called epsilon-transactions (ETs). For example, some query ETs may view inconsistent data due to non-SR interleaving with concurrent updates. In this paper, we restrict our attention to the situation where query-only ETs run concurrently with {\em consistent} update transactions that are SR without the ETs. This paper presents a {\em formal characterization} of ESR and ETs. Using the ACTA framework, the first part of this characterization formally expresses the inter-transaction conflicts that are recognized by ESR and, through that, defines ESR, analogous to the manner in which conflict-based serializability is defined. The second part of the paper is devoted to deriving expressions for: (1) the {\em inconsistency} in the values of data -- arising from ongoing updates, (2) the inconsistency of the results of a query -- arising from the inconsistency of the data read in order to process the query, and (3) the inconsistency exported by an update ET -- arising from ongoing queries reading uncommitted data produced by the update ET. These expressions are used to determine the preconditions that ET operations have to satisfy in order to maintain the limits on the inconsistency in the data read by query ETs, the inconsistency exported by update ETs, and the inconsistency in the results of queries. This determination suggests possible mechanisms that can be used to realize ESR.	(ps)
Knowledge Representation and Reasoning with Definitional Taxonomies	Robert A. Weida	1991-01-01	We provide a detailed overview of knowledge representation issues in general and terminological knowledge representation in particular. Terminological knowledge representation, which originated with KL-ONE, is an object-centered approach in the tradition of semantic networks and frames. Terminological systems share three distinguishing characteristics: (1) They are intended to support the definition of conceptual terms comprising a "terminology" and to facilitate reasoning about such terms. As such, they are explicitly distinguished from assertional systems which make statements of fact based on some terminology. (2) Their concepts are arranged in a taxonomy so that the attributes of a concept apply to its descendants without exception. Thus, the proper location of any concept within the taxonomy can be uniquely determined from the concept's definition by an automatic process known as classification. (3) They restrict the expressiveness of their language to achieve relatively efficient performance. We first survey important general issues in the field of knowledge representation, consider the semantics of concepts and their interrelationship, and examine the intertwined notions of taxonomy and inheritance. After discussing classification, we present a number of implemented terminological systems in detail, along with several hybrid systems which couple terminological and assertional reasoning components. We conclude by assessing the current state of the art in terminological knowledge representation.	(ps)
TLex User Manual	Steven M. Kearns	1990-01-01	The TLex Language, a regular expression-based language for finding and extracting information from text, is described. TLex features a pattern language that includes intersection, complement, and context sensitive operators. In addition, TLex automatically creates a parse tree from a successful match and offers convenient functions for manipulating it.	(ps)
Modeling Safe and Regular Concurrent Processes	Wenwey Hseush, Timothy S. Balraj, Gail E. Kaiser	1990-01-01	The authors have previously described the use of data path expressions and predecessor automata in debugging concurrent systems. In this paper we examine the relationship of these models to two traditional models of concurrent processes: pomset languages and k-safe Petri net systems. We explore the regularity and safety of the concurrent languages described by each of the four models. Our main result is the equivalence of regular safe pomset languages and the languages described by safe data path expressions, safe predecessor automata and k-safe Petri net systems.	(ps)
Lexical Choice in Natural Language Generation	Jacques Robin	1990-01-01	In this paper we survey the issue of lexical choice in natural language generation. We first define lexical choice as the choice of {\em open-class} lexical items (whether phrasal patterns or individual words) appropriate to express the content units of the utterance to generate in a given situation of enunciation. We then distinguish between five major classes of constraints on lexical choice: grammatical, inter-lexical, encyclopedic, interpersonal and discursive. For each class we review how they have been represented and used in existing generators. We conclude by pointing out the common limitations of these generators with respect to lexical choice as well as some directions for future research in the field.	(ps)
Testing Object-Oriented Programs by Mutually Suspicious Parties	Travis Lee Winfrey	1990-01-01	Testing object-oriented programs has been studied primarily in terms of paradigms that apply to all programs, i.e., white-box and black-box testing. We describe a new testing method for object-oriented programs that specifically exploits encapsulation properties. Individual object classes or even individual instances of objects may be instrumented for testing. At the heart of the method is the systematic renaming and duplication of object classes.	(ps)
Porting AIX onto the Student Electronic Notebook	John Ioannidis, Gerald Q. Maguire Jr., Israel Ben-Shaul, Marios Levedopoulos, Micky Liu	1990-01-01	We describe the Student Electronic Notebook and the process of porting IBM's AIX 1.1 to run on it. We believe that portable workstation-class machines connected by wireless networks and dependent on a computational and informational infrastructure raise a number of important issues in operating systems and distributed computation (e.g., the partitioning of tasks between workstations and infrastructure), and therefore the development of such machines and their software is important. We conclude by summarizing our activites, itemizing the lessons we learned and identifying the key criteria for the design of the successor machines.	(ps)
The Coherent Trivial File Transfer Protocol	John Ioannidis, Gerald Q. Maguire Jr.	1990-01-01	Coherent TFTP is a protocol that takes advantage of the broadcast nature of CSMA networks to speed up simultaneous one-to-many file transfers (e.g., when booting diskless workstations). The CTFTP server lisens and services request for entire files or portions thereof. CTFTP clients first determine whether the file they are interested in is already being transferred, in which case they ``eavesdrop'' and load as much of it as they can, or they initiate a new transfer. The clients timeout when the server stops transmitting, and if they are still missing parts of the file they request them with a block-transfer request. CTFTP is a back-end protocol; a front end is needed to handle naming and security issues.	(ps)
A Methodology for Measuring Applications over the Internet	Calton Pu, Frederick M. Korz, Robert C. Lehman	1990-01-01	Measuring the performance of applications running over live wide area networks such as the Internet is difficult. Many of the important variables (path, network load, etc) cannot be controlled. Monitoring the entire Internet to follow individual messages is impractical. We developed the Layered Refinement methodology to meet these challenges. Our method uses generic software tools to collect data simultaneously at important software layers. Data are analyzed and refined to reduce variance, and correlated to give us a good understanding of each layer. Sample data spanning seven days illustrate the method, which narrows the confidence interval by an order of magnitude to show the end-to-end performance of applications that run over Internet.	(ps)
Automated Sensor Planning and Modeling for Robotic Vision Tasks	Kostantinos Tarabanis	1990-01-01	In this paper we present new results on the automatic determination of the loci of camera poses and optical settings that satisfy the machine vision task requirements of visibility, field-of-view, depth-of-field and resolution for given features of interest. It is important to determine the entire admissible domain of sensor locations, orientations, and settings for each task constraint, so that these component results can be combined in order to find globally admissible sensor parameter values. This work is part of more extensive research that we are pursuing, as part of our "MVP" (Machine Vision Planner) system, on the problem of sensor planning for satisfaction of several ge neric machine vision requirements. When designing a vision system that will satisfy the requirements of the machine vision task at hand, it is necessary to properly select the specifications of the image sensor (e.g. pixel size), as well as decide the image sensor placement and settings. We present techniques to determine the latter two using a synthesis approach to the problem. This approach provides significant advantages over the generate-and-test techniques currently employed, in which, sensor configurations are generated and then tested for satisfaction of the task criteria. Compared to the previous papers we published on "MVP", the new results presented in this paper include techniques and implementation for determining the constraint satisfaction loci when all degrees of freedom of camera placement are taken into account; obtaining globally feasible sensor configurations by posing the constraint satisfaction problem in an optimization setting; analyzing and formulating a new constraint, that of depth-of-field, for the determination of sensor placement and setting that satisfy this constraint; providing a more efficient approach to the visibility constraint; plus presenting the new test results of the above mentioned techniques. Camera placement experiments are shown that demonstrate the method in an actual robotic setup for a general three-dimensional viewing configuration. A camera mounted on a robotic arm is placed and set according to the results of the new technique and camera views are taken to verify that the features of interest are visible, within the camera field-of-view and depth-of-field, and resolvable to the given specification. Results of this research will help automate the vision system design process, assist in programming the vision system itself and lead to intelligent automated robot imaging systems.	(ps)
Scaling Up Rule-Based Development Environments	Naser S. Barghouti, Gail E. Kaiser	1990-01-01	Rule-based development environments (RBDEs) model the software development process in terms of rules that encapsulate development activities, and execute forward and backward chaining on the rules to provide assistance in carrying out the development process. We investigate the scaling up of RBDEs along two dimensions. The first dimension covers the nature of the assistance provided by the RBDE, and the second spans the functionality. There is a spectrum of assistance models implemented by RBDEs. We consider three models: pure automation, strict consistency preservation, and a maximalist assistance model that integrates consistency and automation. The choice of assistance model impacts the functionality of the RBDE, more precisely the solutions to three problems that arise in scaling up RBDEs: (1) multiple views; (2) evolution; and (3) concurrency control. We explore the solutions to the three problems with respect to each of the three assistance models. Throughout the paper, we use the Marvel RBDE as an example.	(ps)
An Information Retrieval Approach for Automatically Constructing Software Libraries	Yoelle S. Maarek, Daniel M. Berry, Gail E. Kaiser	1990-01-01	Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically assembling large software libraries that promote software reuse by helping the user locate the components closest to her/his needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using a new indexing scheme based on the notions of lexical affinities and quantity of information. Then, a hierarchy for browsing is automatically generated using a clustering technique that draws only on the information provided by the attributes. Thanks to the free-text indexing scheme, tools following this approach can accept free-style natural language queries. This technology has been implemented in the {\sc Guru} system, which has been applied to construct an organized library of {\sc Aix} utilities. An experiment was conducted in order to evaluate the retrieval effectiveness of {\sc Guru} as compared to {\sc InfoExplorer} a hypertext library system for {\sc Aix} 3 on the IBM RISC System/6000 series. We followed the usual evaluation procedure used in information retrieval, based upon recall and precision measures, and determined that our system performs 15\% better on a random test set, while being much less expensive to build than {\sc InfoExplorer	(pdf)
Detection and Exploitation of File Working Sets	Carl Tait, Dan Duchamp	1990-01-01	The work habits of most individuals yield file access patterns that are quite pronounced and can be regarded as defining working sets of files used for particular applications. This paper describes a client-side cache management technique for detecting these patterns and then exploiting them to successfully prefetch files from servers. Trace-driven simulations show the technique substantially increases the hit rate of a client file cache in an environment in which a client workstation is dedicated to a single user. Successful file prefetching carries three major advantages: (1) applications run faster, (2) there is less ``burst'' load placed on the network, and (3) properly-loaded client caches can better survive network outages. Our technique requires little extra code, and DASH because it is simply an augmentation of the standard LRU client cache management algorithm DASH is easily incorporated into existing software.	(ps)
An Object-Based Approach to Implementing Distributed Concurrency Control	Steven S. Popovich, Gail E. Kaiser, S.F. Wu	1990-01-01	We have added distributed concurrency control to the MELD object system by representing in-progress transactions as simulated objects. {\em Transaction objects} exploit MELD's normal message passing facilities to support the concurrency control mechanism. We have completed the implementation of an optimistic mechanism using transaction objects and have designed a two-phase locking mechanism based on the same paradigm. We discuss the tradeoffs made and lessons learned, dealing both with transactions {\em on} objects and with transactions {\em as} objects.	(ps)
Applications of Epsilon-Serializability in Federated Databases	Calton Pu, Avraham Leff	1990-01-01	We apply the concept of {\em epsilon-serializability} (ESR) to data replication. ESR converges to global serializability when the replicated system reachs a quiescent state, i.e., all the inter-site messages arrive their destination. ESR integrates many previous proposals for increasing availability for replicated data and bring new insights.	(pdf) (ps)
Epsilon-Serializability	Calton Pu, Avraham Leff	1990-01-01	We introduce and formalize the concept of espsilon-serializability (ESR). ESR converges to serializability (SR) when the database reachs a quiescent state, i.e., all the update messages are processed. Divergence control methods make ESR practical by restricting the temporary inconsistency in the database. ESR is useful since it allows more concurrency than SR and can be integrated with SR.	(ps)
PIP-1: A Personal Information Portal with wireless access to an information infrastructure	John Ioannidis, Gerald Q. Maguire Jr.	1990-01-01	We discuss our ideas for the Personal Information Portal, its hardware and software platform, our visions for its use and its impact on the student and the professional community.	(ps)
A distributed algorithm for adaptive replication of data	Ouri Wolfson	1990-01-01	We present a distributed algorithm for replication of a data-item in a set of processors interconnected by a tree network. The algorithm is adaptive in the sense that the replication scheme of the item (i.e. the set of processors, each of which stores a replica of the data-item), changes as the read-write pattern of the processors in the network changes. The algorithm is optimal in the sense that when the replication scheme stabilizes, the total number of messages required for the reads and writes is minimal. The algorithm is extremely simple.	(pdf)
Incremental Evaluation of Rules and its Relationship to Parallelism	Ouri Wolfson, Hasanat Dewan, Salvatore Stolfo, Yechiam Yemini	1990-01-01	Rule interpreters usually start with an initial database and perform the inference procedure in cycles, ending with a final database. In a real time environment it is possible to receive updates to the initial database after the inference procedure has started or even after it has ended. We present an algorithm for incremental maintenance of the inference database in the presence of such updates. Interestingly, the same algorithm is useful for parallel and distributed rule processing. When the processors evaluating a program operate asynchronously, then they may have different {\em views} of the database. The incremental maintenance procedure we present can be used to synchronize these views.	(pdf)
Active databases for communication network management	Ouri Wolfson, Soumitra Sengupta, Yechiam Yemini	1990-01-01	This paper has two purposes. First is to propose new database language features for systems used in real time management. These features enable the specification of change-traces, events and correlation among events, and they do so in a declarative set-oriented fashion. Second is to introduce network management as an important and interesting application of active distributed databases.	(pdf)
Getting and Keeping the Center of Attention	Rebecca J. Passonneau	1990-01-01	The present work investigates the contrastive discourse functions of a definite and a demonstrative pronoun in similar contexts of use. It thus provides an opportunity to examine the separate contributions to attentional state [15] of two linguistic features---definiteness and demonstrativity---independently of pronominalization per se. The two pronouns, "it" and "that", have clearly contrastive contexts of use, explained here in terms of distinct pragmatic functions. Certain uses of "it" are claimed to perform a distinctive cohesive function, namely to establish a "local center" (which modifies rather than replaces the notion of a center). The crucial distinction between a local center and the Cb of the centering framework (cf. [38] [13] [14] [21]) is that there is only a single potential local center rather than an ordered set of Cfs. The local center is argued to constitute a reference point in the model of the speech situation in a manner analogous to 1st and 2nd person pronouns. In contrast, a deictic function is posited for apparently anaphoric uses of "that" whereby the attentional status of a discourse entity is changed, or a new discourse entity is constructed based on non-referential constituents of the linguistic structure. Since it is impossible to observe attentional processes directly, I present an empirical method for investigating discourse coherence relations. I analyze statistically significant distributional models in terms of three types of transitions in the cognitive states of conversational participants: expected transitions, unexpected transitions, and transitions with no relevant effect.	(ps)
Complexity of Mortgage Pool Allocation	Salvatore J. Stolfo , Xiangdong Yu	1990-01-01	A new class of NP-hard problems are proved and this result is applied to the complexity analysis of the mortgage pool allocation problem in American banks. A boundary on pool sizes is shown to sharply distinguish hard instances from easy ones in this kind of combinational optimazition problems.	(ps)
Dynamic Programming with Convexity, Concavity, and Sparsity	Zvi Galil, Kunsoo Park	1990-01-01	Dynamic programming is a general problem-solving technique that has been widely used in various fields such as control theory, operations research, biology, and computer science. In many applications dynamic programming problems satisfy additional conditions of convexity, concavity, and sparsity. This paper presents a classification of dynamic programming problems and surveys efficient algorithms based on the three conditions.	(ps)
Inversion of Toeplitz Matrices with only Two Standard Equations	George Labahn, Tamir Shalom	1990-01-01	It is shown that the invertibility of a Toeplitz matrix can be determined through the solvability of two standard equations. The inverse matrix is represented by two of its columns, (which are the solutions of the two standard equations) and the entries of the original Toeplitz matrix.	(ps)
Incremental Algorithms for Minimal Length Paths	Giorgio Ausiello, Giuseppe F. Italiano, Alberto Marchetti Spaccamela, Umberto Nanni	1990-01-01	We consider the problem of maintaining on-line a solution to the All Pairs Shortest Paths Problem in a directed graph $G=(V,E)$ where edges may be dynamically inserted or have their cost decreased. For the case of integer edge costs in a given range $[1\ldots C]$, we introduce a new data structure which is able to answer queries concerning the length of the shortest path between any two vertices in constant time, and to trace out the shortest path between any two vertices in time linear in the number of edges reported. The total time required to maintain the data structure under a sequence of at most $O(n^2)$ edge insertions and at most $O(Cn^2)$ edge cost decreases is $O(Cn^3\log (nC))$ in the worst case, where $n$ is the total number of vertices in $G$. For the case of unit edge costs, the total time required to maintain the data structure under a sequence of at most $O(n^2)$ insertions of edges becomes $O(n^3\log n)$ in the worst case. Both data structures can be adapted to solve the problem of maintaining on-line maximal length paths in directed acyclic graphs. All our algorithms improve on previously known algorithms and are only a logarithmic factor away from the best possible bounds.	(ps)
onstraint-based Text Generation: Using local constraints and argumentation to generate a turn in conversation	Michael Elhadad	1990-01-01	This report describes work on the problem of generating a turn within an ongoing conversation. Three main points are advanced: (1) Previous discourse determines the form and content of a new turn; this work proposes techniques to take the influence of previous discourse into account when generating text. (2) The connection between previous discourse and a new turn can be described as the interaction between five local constraints; local constraints are relations between one discourse segment of the previous discourse and the new turn. This multi-dimensional description of discursive relations allows a flexible definition of complex rhetorical relations. (3) Argumentation is one class of local constraints which has important effects on the form of the language produced and interacts closely with the other types of local constraints used in this work. The problem of generating a turn is then posed as the problem of finding a set of local constraints compatible with each other, with the previous discourse, and with the information to convey. The application of the local constraints determines the value of pragmatic features of the new turn. A sophisticated grammar can then translate these decisions into appropriate lexical and syntactic decisions. The approach is applied to the implementation of an explanation module for the {\sc ADVISOR} expert system. The implementation is based on an extended form of functional unification; this extended formalism is a good choice for handling the problem of constraint interaction.	(ps)
An Object-Oriented Approach to Content Planning for Text Generation	Ursula Wolz	1990-01-01	This paper describes Genie, an object-oriented architecture that generates text with the intent of extending user expertise in interactive environments. Instead of generating text based solely on either discourse goals, intentions, or the domain, we found a need to combine techniques from each. We have developed an object- oriented architecture in which the concepts about which we talk (domain entities), the goals that may be accomplished with them (intentions), and the rhetorical acts through which we express them (discourse goals) are represented as objects with localized knowledge and methods. A three stage process of content selection, content structuring and content filtering is presented. Content selection and structuring allow us to produce text that is within the context of the task at hand for the user. Content filtering allows us to revise and restructure the utterance to achieve clarity and conciseness.	(ps)
Decomposability and Its Role in Parallel Logic-Program Evaluation	Ouri Wolfson, Avi Silberschatz	1990-01-01	This paper is concerned with the issue of parallel evaluation of logic programs. We define the concept of ``program decomposability'', which means that the load of evaluation can be partitioned among a number of processors, without a need for communication among them. This in turn results in a very significant speed-up of the evaluation process. Some programs are decomposable, whereas others are not. We completely syntactically characterize three classes of single rule programs with respect to decomposability: nonrecursive, simple linear, and simple chain programs. We also establish two sufficient conditions for decomposability.	(pdf)
Network Management with Consistently Managed Objects	Shyhtsun F. Wu, Gail E. Kaiser	1990-01-01	A {\it consistency constraint} exists between two objects in a network {\it management information base (MIB)} if a change in value of one object will cause a change in another. Based on the definition of {\it common knowledge} by Halpern, Fagin, and Moses, such constraints can be classified by the time needed to maintain them and the probability of correctly maintaining them. In this paper, we study the practical issues of how to build an MIB with a set of objects and consistency constraints. We will go through an experimental MIB example showing that distributed network entities can be consistently managed in an elegant way, even when the network environment is very dynamic.	(ps)
Redundancy Management in a Symmetric Main Memory Database	Calton Pu, Avraham Leff, Frederick Korz, Shu-Wie Chen	1990-01-01	We describe the architecture of a symmetric distributed main memory database. The high performance networks in many large distributed systems enable a machine to reach the main memory of other nodes faster than local disks. Thus we introduce {\em remote memory} as an additional layer in the memory hierarchy between local memory anddisks. Exploiting the remote memory (every node's cache) improves performance and increases availability. We have created a simulation program that shows this significant advantage over a wide range of cache sizes and compares the effects of several object replacement policies in the symmetric distributed main memory database.	(ps)
WHY A SINGLE PARALLELIZATION STRATEGY IS NOT ENOUGH IN KNOWLEDGE BASES	Simona Rabinovici Cohen, Ouri Wolfson	1990-01-01	We address the problem of parallelizing the evaluation of logic programs in data intensive applications. We argue that the appropriate parallelization strategy for logic-program evaluation depends on the program being evaluated. Therefore, this paper is concerned with the issues of program-classification, and parallelization-strategies. We propose several parallelization strategies based on the concept of data-reduction -- the original logic-program is evaluated by several processors working in parallel, each using only a subset of the database. The strategies differ on the evaluation cost, the overhead of communication and synchronization among processors, and the programs to which they are applicable. In particular, we start our study with pure-parallelization, i.e., parallelization without overhead. An interesting class-structure of logic programs is demonstrated, when considering amenability to pure-parallelization. The relationship to the NC complexity class is demonstrated. Then we propose strategies that do incur an overhead,but are optimal in a sense that will be precisely defined. This paper makes the initial steps towards a theory of parallel logic-programming.	(pdf)
A Comparison of Cache Performance in Server-Based and Symmetric Database Architectures	Avraham Leff, Calton Pu, Frederick Korz	1990-01-01	We study the cache performance in a symmetric distributed main-memory database. The high performance networks in many large distributed systems enable a machine to reach the main memory of other nodes more quickly than the time to access local disks. We therefore introduce {\em remote memory} as an additional layer in the memory hierarchy between local memory and disks. In order to appreciate the tradeoffs of memory and cpu in the symmetric architecture, we compare system performance in alternative architectures. Simulations show that, by exploiting remote memory (in each node's cache), performance improves over a wide range of cache sizes as compared to a distributed client/server architecture. We also compare the symmetric model to a centralized-server model and parameterize the performance tradeoffs.	(ps)
An Object-Oriented Framework for Modeling Cooperation in Multi-Agent Rule-Based Development Environments	N. S. Barghouti, G. E. Kaiser	1990-01-01	Rule-Based Development Environments (RBDEs) exploit expert systems via rule-based modeling of the software development process. RBDEs store the software artifacts in a project database, and define each software development activity that manipulates these artifacts as a rule. Opportunistic forward and backward chaining on the rules automates some of chores that developers would have otherwise done manually and ensures consistency in the database. Existing RBDEs do not scale up to real software development projects involving large teams of developers, in part because the current expert system paradigm does not handle multiple agents sharing the working memory of the rule system. In this article, we investigate the scaling up of RBDEs through the use of an object-oriented framework that integrates the structural and behavioral aspects of data modeling and rule-based process modeling. We use this framework to support both consistency maintenance and cooperation in multi-agent RBDEs. }}	(ps)
A Formalization and Implementation of Topological Visual Navigation in Two Dimensions	John R. Kender, Il-Pyung Park, David Yang	1990-01-01	In this paper we formalize and implement a model of topological visual navigation in two-dimensional spaces. Unlike much of traditional quantitative visual navigation, the emphasis throughout is on the methods and the efficiency of qualitative visual descriptions of objects and environments, and on the methods and the efficiency of direction-giving by means of visual landmarks. We formalize three domains--the world itself, the map-maker's view of it, and the navigator's experience of it--and the concepts of custom maps and landmarks. We specify, for a simplified navigator (the ``level helicopter'') the several ways in which visual landmarks can be chosen, depending on which of several costs (sensor, distance, or communication) should be minimized. We show that paths minimizing one measure can make others arbitrarily complex; the algorithm for selecting the path is based on a form of Dijkstra's algorithm, and therefore automatically generates intelligent navigator overshooting and backtracking. We implement, using an arm-held camera, such a navigator, and detail its basic seek-and-adjust behaviors as it follows visual highways (or departs from them) to reach a goal. Seeking is based on topology, and adjusting is based on symmetry; there are essentially no quantitative measures. We describe under what circumstances its environment is visually difficult and perceptively shadowed, and describe how errors in path-following impact landmark selection. Since visual landmark selection and direction-giving are in general NP-complete, and rely on the nearly intractable concept of characteristic views, we suggest some heuristics; one is that the landmark object ``itself'', rather than its views, may be its most compact encoding. We conclude with speculations about the feasibility of intelligent navigation in very large self-occluding visual worlds.	(ps)
Parallel Least-Squares Solution of General and Toeplitz-like Linear Systems	Victor Pan	1990-01-01	We use $O(\log^2n)$ parallel arithmetic steps and $n^2$ processors to compute the least-squares solution ${\bf x} = A^+{\bf b}$ to a linear system of equations, $A {\bf x} = {\bf b}$, given a $g\times h$ matrix $A$ and a vector ${\bf b}$, both filled with integers or with rational numbers, provided that $g+h\le 2n$ and that $A$ is given with its displacement generator of length $r = O(1)$ and thus has displacement rank $O(1)$. For a vector ${\bf b}$ and for a general $p\times q$ matrix $A$ (with $p + q \le n$), we compute $A^+$ and $A^+ {\bf b}$ by using $O(\log^2 n)$ parallel arithmetic steps and $n^{2.851}$ processors, and we may also compute $A^+{\bf b}$ by using O$(n^{2.376})$ arithmetic operations.	(ps)
On the evaluation of the eigenvalues of a banded Toeplitz block matrix	Dario Bini, Victor Pan	1990-01-01	Let $A=(a_{ij})$ be an $n\times n$ Toeplitz matrix with bandwidth $k+1,\ k=r+s$, that is, $a_{ij}=a_{j-i},\ i,j=1,\ldots ,n,\ a_i=0\ \hbox{ if} \ i>s\ \hbox{ and if}\ i<-r$. We compute $p(\lambda)=\det (A-\lambda I)$, as well as $p(\lambda)/p'(\lambda)$, where $p'(\lambda)$ is the first derivative of $p(\lambda)$, by using $O(k\log k\log n)$ arithmetic operations. Moreover, if $a_i$ are $m\times m$ matrices, so that $A$ is a banded Toeplitz block matrix, then we compute $p(\lambda)$, as well as $p(\lambda)/p'(\lambda)$, by using $O(m^3k(\log^2k+\log n)+ m^2k\log k\log n)$ arithmetic operations. The algorithms can be extended to the computation of $\det(A-\lambda B)$ and of its first derivative, where both $A$ and $B$ are banded Toeplitz matrices. The algorithms may be used as a basis for iterative solution of the eigenvalue problem for the matrix $A$ and of the generalized eigenvalue problem for $A$ and $B$.	(ps)
Computing matrix eigenvalues and polynomial zeros where the output is real	Dario Bini, Victor Pan	1990-01-01	Surprisingly simple corollaries from the Courant-Fischer minimax characterization theorem enable us to devise a very effective algorithm for the evaluation of a set $ S$ interleaving the set $E$ of the eigenvalues of a real symmetric tridiagonal matrix $T_n$ (as well as a point that splits $E$ into two subsets of comparable cardinality). This is similar to the more complicated algorithms of [BOT] and [BFKT] that compute such interleaving sets and splitting points for the set of the zeros of a polynomial $p(\lambda)$ having only real zeros. As a result, the record upper estimates for the parallel and sequential complexity of approximating to all the zeros of such a polynomial are obtained in an alternate way and, moreover, are extended to the eigenvalue computation for $T_n$. Furthermore, we dramatically decreased the previous record upper estimates for the parallel complexity of the latter problem, to the level within polylogarithmic factors from the straightforward lower bounds.	(ps)
On a Recursive Triangular Factorization of Matrices	Victor Pan	1990-01-01	We estimate the parallel and sequential complexity of computing a recursive triangular factorization of an $n\times n$ matrix, which may, in particular, replace Choleski factorization as a means of computing the QR-factorization of a matrix and may also have some other applications. The estimates for the storage space, sequential time and the number of processors required for the computation and storage of such a recursive factorization is by almost the factor of $n$ less than the similar estimates for the Choleski factorization in the case of Toeplitz-like and Hankel-like matrices.	(ps)
A Framework for Immibrating Existing Software into New Software Development Environements	Michael H. Sokolsky, Gail E. Kaiser	1990-01-01	We have investigated the problem of immigrating software artifacts from one software development environment (SDE) to another for the purpose of upgrading to new SDEs as technology improves, while continuing development or maintenance of existing software systems. We first taxonomize the larger problem of data migration, to establish the scope of immigration. We then classify SDEs in terms of the ease of immigrating software artifacts out of the data repository of the source SDE without knowledge of its internal representation. A framework is presented for constructing automatic immigration tools as utilities provided by destination SDEs. We describe a specific immigration tool, called Marvelizer, that we have implemented as part of the Marvel SDE and discuss our experience using the tool.	(ps)
Complexity of Computations with Matrices and Polynomials	Victor Pan	1990-01-01	We review the complexity of polynomial and matrix computations, as well as their various correlations to each other and some major techniques for the design of algebraic and numerical algorithms.	(ps)
Parametrization of Newton's Iteration for Computations with Structured Matrices and Applications	Victor Pan	1990-01-01	We apply a new parametrized version of Newton's iteration in order to compute (over any field $F$ of constants) the solution or a least-squares solution to a linear system $Bx = v$ with an $n x n$ Toeplitz or Topelitz-like matrix $B$, as well as the determinant of $B$ and the coefficients of its characteristic polynomial, $det(\lambda I-B)$, dramatically improving the processor efficiency of the know fast parallel algorithms. Our algorithms, together with some previously know and some resect results, as well as with our new techniques for computing polynomial GCDs and LCMs, imply respective improvement of known estimates for parallel arithmetic complexity of several fundamental computations with polynomials and with both structured and general matrices.	(ps)
Understanding Data Refinement Using Procedural Refinement	Steven M. Kearns	1990-01-01	Data refinement is converting a program that uses one set of variables to an equally correct program that uses another set of variables, usually of different types. There have been a number of seemingly different mathematical definitions of data refinement. We present a unifying view of data refinement as a special case of procedural refinement, which is simpler to understand. All the data refinement theories in the literature are shown to be instances of two formulas, but we show that there are actually an infinite number of theories. In addition, we introduce the concepts of nonlocal data refinement, data refinement using semi-invariants, and procedural optimization using data refinement.	(ps)
Real-time visual servoing	Peter Allen, Billibon Yoshimi, Aleksandar Timcenko	1990-01-01	This paper describes a new real-time tracking algorithm in conjunction with a predictive filter to allow real-time visual servoing of a robotic arm. The system consists of two calibrated cameras that provide images to a real-time, pipelined-parallel optic-flow algorithm that can robustly compute optic-flow and calculate the 3-D position of a moving object at approximately 5 Hz rates. These 3-D positions serve as input to a predictive kinematic control algorithm that uses an $alpha~ - ~beta~ - ~gamma$ filter to update the position of a robotic arm tracking the moving object. Experimental results are presented for the tracking of a moving model train in a variety of different trajectories.	(ps)
IMAGE WARPING WITH SPATIAL LOOKUP TABLES	George Wolberg	1989-01-01		(pdf)
A REGULARIZATION METHOD FOR THE SOLUTION OF THE SHAPE FROM SHADOWS PROBLEM	Michael G. Hatzitheodorou, Tomasz Jackowski, Anargyros Papageorgiou	1989-01-01		(pdf)
A DYNAMIC DATA STRUCTURE FOR GENERAL SERIES PARALLEL DIGRAPHS	G. F. Italiano, A. Marchetti Spaccamela, U. Nanni	1989-01-01		(pdf)
DICTIONARIES FOR LANGUAGE GENERATION ACCOUNT FOR CO-OCCURRENCE KNOWLEDGE	Franck Smadja	1989-01-01		(pdf)
A PROCEDURE FOR THE SELECTION OF CONNECTIVES IN TEXT GENERATION: HOW DEEP IS THE SURFACE?	M. Elhadad, K. R. McKeown	1989-01-01		(pdf)
EXTENDED FUNCTIONAL UNIFICATION PROGRAMMARS	M. Elhadad	1989-01-01		(pdf)
A VISUAL LANGUAGE FOR BROWSING, UNDOING AND REDOING GRAPHICAL INTERFACE COMMANDS	David Kurlander, Steven Feiner	1989-01-01		(ps)
ENERGY-BASED SEGMENTATION OF VERY SPARSE RANGE SURFACES	Terrance E. Boult, Mark Lerner	1989-01-01	This paper describes a new segmentation technique for very sparse surfaces which is based on minimizing the energy of the surfaces in the scene. While it could be used in almost any system as part of surface reconstruction/model recovery, the algorithm is designed to be usable when the depth information is scattered and very sparse, as is generally the case with depth generated by stereo algorithms. We show results from a sequential algorithm and discuss a working prototype that executes on the massively parallel Connection Machine. The idea of segmentation by energy minimization is not new, however past techniques have relied on discrete regularization or Markov random fields to model the surfaces to build smooth surfaces and detect depth edges. In addition, both of the aforementioned techniques are ineffective at energy minimization for very sparse data. The technique presented herein models thesurfaces with reproducing kernel-based splines which can be shown to solve a regularized surface reconstruction problem. From the functional form of these splines we derive computable upper and lower bounds on the energy of a surface over a given finite region. The computation of the spline, and the corresponding surface representation are quite efficient for very sparse data. An interesting property of the algorithm is that it makes no attempt to determine segmentation boundaries; the algorithm can be viewed as a classification scheme which segments the data into collections of points which are ``from'' the same surface.	(pdf)
COST FUNCTION ERROR IN ASYNCHRONOUS PARALLEL SIMULATED ANNEALING ALGORITHMS	M. D. Durand	1989-01-01	Reducing synchronization constraints in parallel simulated annealing algorithms can improve performance. However, this introduces error in the global cost function. Previous work in parallel simulated annealing suggests that if the amount of error in the cost function is controlled, good quality results can still be obtained. In this paper, we present a model of error in asynchronous parallel simulated annealing algorithms to partition graphs and use it to predict the behavior of three different synchronization strategies. These three approaches were implemented on a 20-processor Encore, a shared memory, MIMD multiprocessor, and tested on a variety of graphs. As predicted, the strategy which allows controlled error yields solutions comparable to those of the serial algorithm with greatly improved running time. Speedups from 5 to 11 (depending on the density of the graphs) using 16 processors were obtained. In contrast, the more synchronized version exhibited unacceptably high running times, whereas the version characterized by uncontrolled error yielded significantly poorer results. This confirms behavior seen in parallel simulated annealing algorithms to perform placement in VLSI circuit layout systems.	(pdf)
DATA MIGRATION IN AN OBJECT-ORIENTED SOFTWARE DEVELOPMENT ENVIRONMENT	Michael Sokolsky	1989-01-01		(pdf)
MODULAR SYNCHRONIZATION IN MULTIVERSION DATABASES: VERSION CONTROL AND CONCURRENCY CONTROL	Soumitra Sengupta, Divyakant Agrawal	1989-01-01		(pdf)
TESTING RELIABLE DISTRIBUTED APPLICATIONS THROUGH SIMULATED EVENTS	Travis L. Winfrey, Gail E. Kaiser	1989-01-01		(pdf)
{MARVEL} IMPLEMENTATION GUIDE	Michael Sokolsky	1989-01-01		(pdf)
THE {LLC} PARALLEL LANGUAGE AND ITS IMPLEMENTATION ON {DADO2}	Russell C. Mills	1989-01-01		(pdf)
A CRITIQUE OF THE {LLC} PARALLEL LANGUAGE AND SOME SOLUTIONS	Russell C. Mills	1989-01-01		(pdf)
{DADO2} {LLC} USERS' MANUAL	Russell C. Mills	1989-01-01		(pdf)
{LLC} REFERENCE MANUAL	Russell C. Mills	1989-01-01		(pdf)
A TUTORIAL INTRODUCTION TO {LLC}	Russell C. Mills	1989-01-01		(pdf)
ON-LINE MAINTENANCE OF MINIMAL DIRECTED HYPERGRAPHS	G. F. Italiano, U. Hanni	1989-01-01		(pdf)
EXPLORING ``MULTIPLE WORLDS'' IN PARALLEL	Jonathan M. Smith, Gerald Q. Maguire Jr.	1989-01-01		(pdf)
SYSTEM SUPPORT FOR ``MULTIPLE WORLDS''	Jonathan M. Smith, Gerald Q. Maguire Jr.	1989-01-01		(pdf)
TOWARDS A FRAMEWORK FOR COMPARING OBJECT-ORIENTED SYSTEMS	Shyhtsun F. Wu	1989-01-01		(pdf)
OBJECT-ORIENTED PROGRAMMING LANGUAGE FACILITIES FOR CONCURRENCY CONTROL	Gail E. Kaiser	1989-01-01		(pdf)
MELDING MULTIPLE GRANULARITIES OF PARALLELISM	Gail E. Kaiser, Wenwey Hseush, Steven S. Popovich, Shyhtsun Felix Wu	1989-01-01		(pdf)
CONCURRENT EXECUTION OF MUTUALLY EXCLUSIVE ALTERNATIVES ({Ph.D}. DISSERTATION	Jonathan M. Smith	1989-01-01		(pdf)
ACQUISITION AND INTERPRETATIONS OF 3-D SENSOR DATA FROM TOUCH	Peter K. Allen, Paul Michelman	1989-01-01		(pdf)
INCREMENTAL ATTRIBUTE EVALUATION WITH APPLICATIONS TO MULTI-USER LANGUAGE-BASED ENVIRONMENTS	Josephine Micallef	1989-01-01		(pdf)
A FLEXIBLE TRANSACTION MODEL FOR SOFTWARE ENGINEERING	Gail E. Kaiser	1989-01-01		(pdf)
EXPERIENCE WITH PROCESS MODELING IN THE {MARVEL} SOFTWARE DEVELOPMENT ENVIRONMENT KERNEL	Gail E. Kaiser	1989-01-01		(pdf)
MICROCODING THE LEXICON WITH CO-OCCURRENCE KNOWLEDGE LANGUAGE GENERATION, Thesis Proposal, {June} 89	Frank Smadja	1989-01-01		(pdf)
INTERACTIVE MULTIMEDIA EXPLANATION FOR EQUIPMENT MAINTENANCE AND REPAIR	Steven Feiner, Kathleen R. McKeown	1989-01-01		(pdf)
INDUCTIVE LEARNING WITH {BCT}	Philip K. Chan	1989-01-01	BCT (Binary Classification Tree) is a system that learns from examples and represents learned concepts as a binary polythetic decision tree. Polythetic trees differ from monothetic decision trees in that a logical combination of multiple (versus a single) attribute values may label each tree arc. Statistical evaluations are used to recursively partition the concept space in two and expand the tree. As with standard decision trees, leaves denote classifications. Classes are predicted for unseen instances by traversing appropriate branches in the tree to the leaves. Empirical results demonstrated that BCT is generally more accurate or comparable to two earlier systems.	(pdf) (ps)
HAPTIC PERCEPTION WITH A ROBOT HAND: REQUIREMENTS AND REALIZATION	Paul Michelman, Peter K. Allen	1989-01-01		(pdf)
VALUED REDUNDANCY	C. Pu, A. Leff, S.W. Chen, F. Korz, J. Wha	1989-01-01		(pdf)
ON THE COMPLEXITY OF CONSTRAINED DISTANCE TRANSFORMS AND DIGITAL DISTANCE MAP UPDATES IN TWO DIMENSIONS	Terrence E. Boult	1989-01-01		(pdf)
AN OPTIMIZATION TO THE TWO-PHASE COMMITMENT PROTOCOL	Daniel J. Duchamp	1989-01-01		(pdf)
ANALYSIS OF TRANSACTION MANAGEMENT PERFORMANCE	Daniel J. Duchamp	1989-01-01		(pdf)
A NON-BLOCKING COMMITMENT PROTOCOL	Daniel J. Duchamp	1989-01-01		(pdf)
EXPERIENCE WITH A MULTI-THREADED EVENT-DRIVEN PROGRAM	Daniel J. Duchamp	1989-01-01		(pdf)
AN ABORT PROTOCOL FOR NESTED, DISTRIBUTED TRANSACTIONS	Daniel J. Duchamp	1989-01-01		(pdf)
MODELING CONCURRENCY IN PARALLEL DEBUGGING	Wenwey Hseush, Gail E. Kaiser	1989-01-01		(pdf)
THE {MELD} PROGRAMMING LANGUAGE USER MANUAL	Bill N. Schilit, Wen-Wey Hseush, Shyhtsun Felix Wu, Steven S. Popovich	1989-01-01		(ps)
THE MULTICAST POLICY AND ITS RELATIONSHIP TO REPLICATED DATA PLACEMENT	Amir Milo, Ouri Wolfson	1989-01-01	In this paper we consider the communication complexity of maintaining the replicas of a logical data-item, in a database distributed over a computer network. We propose a new method, called the minimum spanning tree write, by which a processor in the network should multicast a write of a logical data-item, to all the processors that store replicas of the item. Then we show that the minimum spanning tree write is optimal from the communication cost point of view. We also demonstrate that the method by which a write is multicast to all the replicas of a data-item, affects the optimal replication scheme of the item, i.e., at which processors in the network the replicas should be located. Therefore, next we consider the problem of determining an optimal replication scheme for a data item, assuming that each processor employs the minimum spanning tree write at run-time. The problem for general networks is shown NP-Complete, but we provide efficient algorithms to obtain an optimal allocation scheme for three common types of network topologies. They are completely-connected, tree, and ring networks. For these topologies, efficient algorithms are also provided for the case in which reliability considerations dictate a minimum number of replicas.	(pdf)
A NEW PARADIGM FOR PARALLEL AND DISTRIBUTED RULE-PROCESSING	Ouri Wolfson, Aya Ozeri	1989-01-01	This paper is concerned with the parallel evaluation of datalog rule programs, mainly by processors that are interconnected by a communication network. We introduce a paradigm, called data-reduction, for the parallel evaluation of a general datalog program. Several parallelization strategies discussed previously in [CW, GST, W, WS] are special cases of this paradigm. The paradigm parallelizes the evaluation by partitioning among the processors the instantiations of the rules. After presenting the paradigm, we discuss the following issues, that we see fundamental for parallelization strategies derived from the paradigm: properties of the strategies that enable a reduction in the communication overhead, decomposability, load balancing, and application to programs with negation. We prove that decomposability, a concept introduced previously in [WS, CW], is undecidable.	(pdf)
AVERAGE CASE COMPLEXITY OF MULTIVARIATE INTEGRATION	H. Wozniakowski	1989-01-01		(pdf)
ESTIMATING THE LARGEST EIGENVALUE BY THE POWER AND LANCZOS ALGORITHMS WITH A RANDOM START	J. Kuczynski, H. Wozniakowski	1989-01-01		(pdf)
FINDING NEW RULES FOR INCOMPLETE THEORIES: INDUCTION WITH EXPLICIT BIASES IN VARYING CONTEXTS	Andrea Danyluk	1989-01-01		(pdf)
A SURVEY OF MACHINE LEARNING SYSTEMS INTEGRATING EXPLANATION-BASED AND SIMILARITY-BASED METHODS	Andrea Danyluk	1989-01-01		(pdf)
AN IMPROVED ALGORITHM FOR APPROXIMATE STRING MATCHING	Zvi Galil, Kunsoo Park	1989-01-01		(pdf)
A LINEAR-TIME ALGORITHM FOR CONCAVE ONE-DIMENSIONAL DYNAMIC PROGRAMMING	Zvi Galil, Kunsoo Park	1989-01-01		(pdf)
AN OVERVIEW OF THE {Synthesis} OPERATING SYSTEM	Calton Pu, Henry Massalin	1989-01-01		(pdf)
SPARSE DYNAMIC PROGRAMMING {I}: LINEAR COST FUNCTIONS	D. Eppstein, Z. Galil, R. Giancarlo, G. F. Italiano	1989-01-01		(pdf)
SPARSE DYNAMIC PROGRAMMING {II}: CONVEX AND CONCAVE COST FUNCTIONS	D. Eppstein, Z. Galil, R. Giancarlo, G. F. Italiano	1989-01-01		(pdf)
DATA STRUCTURES AND ALGORITHMS FOR DISJOINT SET UNION PROBLEMS	Z. Galil, G. F. Italiano	1989-01-01		(pdf)
TELEOPERATOR CONTROL OF THE {Utah}/{MIT} DEXTROUS HAND-- A FUNCTIONAL APPROACH	Amy Morishima, Thomas H. Speeter	1989-01-01		(ps)
MAPPING HAPTIC EXPLORATORY PROCEDURES TO MULTIPLE SHAPE REPRESENTATIONS	Peter K. Allen	1989-01-01		(pdf)
THE APPLICATION OF MICROECONOMICS TO THE DESIGN OF RESOURCE ALLOCATION AND CONTROL ALGORITHMS - {Ph.D}. Dissertation	Donald Francis Ferguson	1989-01-01		(pdf)
{PSi}: A SILICON COMPILER FOR VERY FAST PROTOCOL PROCESSING	H. Abu-Amara, T. Balraj, T. Barzilai, Y. Yemini	1989-01-01		(pdf)
{NEST}: A NETWORK SIMULATION \& PROTOTYPING TESTBED	A. Dupuy, J. Schwartz, Y. Yemini, D. Bacon	1989-01-01		(pdf)
LOGIC LEVEL AND FAULT SIMULATION ON THE {RP3} PARALLEL PROCESSOR	Steven Unger	1989-01-01		(pdf)
ROBOT ACTIVE TOUCH EXPLORATION: CONSTRAINTS AND STRATEGIES	Kenneth S. Roberts	1989-01-01		(pdf)
COORDINATING A ROBOT ARM AND MULTI-FINGER HAND	Kenneth S. Roberts	1989-01-01		(pdf)
ON THE AVERAGE GENUS OF A GRAPH	Jonathan L. Gross, E. Ward Klein, Robert G. Rieper	1989-01-01		(pdf)
STRATIFIED GRAPHS	Jonathan L. Gross, Thomas W. Tucker	1989-01-01		(pdf)
GRAPH IMBEDDINGS AND OVERLAP MATRICES -- Preliminary Report	Jainer Chen, Jonathan L. Gross	1989-01-01		(pdf)
FULL TEXT INDEXING BASED ON LEXICAL RELATIONS -- AN APPLICATION: SOFTWARE LIBRARIES	Yoelle S. Maarek, Frank A. Smadja	1989-01-01		(pdf)
LEXICAL CO-OCCURRENCE: THE MISSING LINK	Frank A. Smadja	1989-01-01		(pdf)
A CONTRASTIVE EVALUATION OF FUNCTIONAL UNIFICATION GRAMMAR FOR SURFACE LANGUAGE GENERATION: A CASE STUDY IN CHOICE OF CONNECTIVES	Kathleen R. McKeown, Michael Elhadad	1989-01-01		(pdf)
RADIO PROJECT USER MODEL (Version 1.0	Matthew Kamerman	1989-01-01		(pdf)
{FUF}: THE UNIVERSAL UNIFIER -- USER MANUAL, VERSION 2.0	Michael Elhadad	1989-01-01		(pdf)
LEARNING THE PROBABILITY DISTRIBUTION FOR PROBABILISTIC EXPERT SYSTEMS	Michelle Baker, Kenneth Roberts	1989-01-01		(pdf)
AN ALGORITHM TO RECOVER GENERALIZED CYLINDERS FROM A SINGLE INTENSITY VIEW	Ari D. Gross, Terrance E. Boult	1989-01-01		(pdf)
AN OPTIMAL $O(\log \log n$ TIME PARALLEL STRING MATCHING ALGORITHM	Dany Breslauer	1989-01-01		(pdf)
IMAGE UNDERSTANDING AND ROBOTICS RESEARCH AT {Columbia University}	John R. Kender, Peter K. Allen, Terrance E. Boult	1989-01-01		(pdf)
SHAPE FROM TEXTURES: A PARADIGM FOR FUSING MIDDLE LEVEL VISION CUES -- {Ph.D}. Dissertation	Mark Laurence Moerdler	1989-01-01		(pdf)
USER-DEFINED PREDICATES IN {OPS5}: A NEEDED LANGUAGE EXTENSION FOR FINANCIAL EXPERT SYSTEMS	A. J. Pasik, D. P. Miranker, S. J. Stolfo, T. Kresnicka	1989-01-01		(pdf)
TECHNIQUES FOR BUILDING HIGHLY AVAILABLE DISTRIBUTED FILE SYSTEMS	Carl D. Tait	1989-01-01	This paper analyzes recent research in the field of distributed file systems, with a particular emphasis on the problem of high availability. Several of the techniques involved in building such a system are discussed individually: naming, replication, multiple versions, caching, stashing, and logging. These techniques range from extensions of ideas used in centralized file systems, through new notions already in use, to radical ideas that have not yet been implemented. A number of working and proposed systems are described in conjunction with the analysis of each technique. The paper concludes that a low degree of replication, a liberal use of client and server caching, and optimistic behavior in the face of network partition are all necessary to ensure high availability.	(pdf) (ps)
{MARVEL} USERS MANUAL, version 2.5	Michael Sokolsky, Mara Cohen	1989-01-01		(pdf)
THE COMMUNICATION COMPLEXITY OF ATOMIC COMMITMENT AND OF GOSSIPING	Ouri Wolfson	1989-01-01	We consider the problem of atomic commitment of a transaction in a distributed database. This is a variant of the famous gossiping problem (see [HHL] for a survey). Given a set of communication costs between pairs of participant sites, we establish that the necessary communication cost for any atomic commitment algorithm is twice the cost of a certain minimum spanning tree. We also establish the necessary communication time for any atomic commitment algorithm, given a set of communication delays between pairs of participant sites, and the time at which each participant completes its subtransaction. Then we determine that both lower bounds are also upper bounds in the following sense. There is an efficient (i.e. polynomial-time) algorithm that, in the absence of failures, has a minimum communication cost. There is another efficient algorithm that, in the absence of failures, has a minimum communication time. However, unless P=NP, there is no efficient algorithm which has a minimum communication complexity, namely, for which the product of communication cost and communication time is minimum. Then we present a simple, linear time, distributed algorithm, called TREE-COMMIT, whose communication complexity is not worse than $p$ times the minimum complexity, where $p$ is the number of participants. Finally, we demonstrate that TREE-COMMIT is superior to the existing variations of the two-phase commit protocol.	(pdf)
Surface Orientation from Texture Autocorrelation	Lisa Gottesfeld Brown	1989-01-01	We report on a refinement of our technique for determining the orientation of a textured surface from the two-point autocorrelation function of its image. We replace our previous assumptions of isotropic texture by knowledge of the autocorrelation moment matrix of the texture when viewed head on. The orientation of a textured surface is then deduced from the effects of foreshortening on these autocorrelation moments. This technique is applied to natural images of planar textured surfaces and gives significantly improved results when applied to anisotropic textures which under the assumption of isotropy mimic the effects of projective foreshortening. The potential practicality of this method for higher level image understanding systems is discussed.	(pdf)
CAMERA PLACEMENT PLANNING AVOIDING OCCLUSION: TEST RESULTS USING A ROBOTIC HAND/EYE SYSTEM	Kostantinos Tarabanis, Roger Y. Tsai	1989-01-01	Camera placement experiments are presented that demonstrate the effectiveness of a viewpoint planning algorithm that avoids occlusion of a visual target. A CCD camera mounted on a robot in a hand-eye configuration is placed at planned unobstructed viewpoints to observe a target on a real object. The validity of the method is tested by placing the camera inside the viewing region, that is constructed using the proposed new sensor placement planning algorithm, and observing whether the target is truly visible. The accuracy of the boundary of the constructed viewing region is tested by placing the camera at the critical locations of the viewing region boundary and confirming that the target is barely visible. The corresponding scenes from the candidate viewpoints are shown demonstrating that occlusions are properly avoided.	(pdf)
MODEL-BASED PLANNING OF SENSOR PLACEMENT AND OPTICAL SETTINGS	Kostantinos Tarabanis, Roger Y. Tsai	1989-01-01	We present a model-based vision system that automatically plans the placement and optical settings of vision sensors in order to meet certain generic task requirements common to most industrial machine vision applications. From the planned viewpoints, features of interest on an object will satisfy particular constraints in the image. In this work, the vision sensor is a CCD camera equipped with a programmable lens (i.e. zoom lens) and the image constraints considered are: visibility, resolution and field of view. The proposed approach uses a geometric model of the object as well as a model of the sensor, in order to reason about the task and the environment. The sensor planning system then computes the regions in space as well as the optical settings that satisfy each of the constraints separately. These results are finally combined to generate acceptable viewing locations and optical settings satisfying all constraints simultaneously. Camera planning experiments are described in which a robot-arm positions the camera at a computed location and the planned optical settings are set automatically. The corresponding scenes from the candidate viewpoints are shown demonstrating that the constraints are indeed satisfied. Other constraints, such as depth of focus, as well as other vision sensors can also be considered resulting in a fully integrated sensor planning system.	(pdf)
EXTENDING THE {\em Mercury} SYSTEM TO SUPPORT TEAMS OF {Ada} PROGRAMMERS	Josephine Micallef, Gail E. Kaiser	1989-01-01		(pdf)
SYNCHRONIZATION OF MULTIPLE AGENTS IN RULE-BASED DEVELOPMENT ENVIRONMENTS -- Thesis Proposal	Naser S. Barghouti	1989-01-01	The Rule-Based Development Environment (RBDE) is a recently-developed approach for providing intelligent assistance to developers working on a large-scale software project. RBDEs model the development process in terms of rules, and then enact this model by automatically firing rules at the appropriate time. The RBDE approach has been used to develop single-user environments, but support for multiple developers cooperating on the same project is still not available because of the lack of mechanisms that can synchronize the efforts of multiple developers, who concurrently select commands, causing the firing of multiple rules (either directly or via chaining) that concurrently access shared data. Conflicts between different rules and concurrent access to shared data may cause the violation of consistency in the project database, and thus necessitate the synchronization of concurrent activities. The conjecture of this proposal is that an RBDE can provide the required synchronization if it is provided with knowledge about what it means for the data of a specific project to be in a consistent state, and about the semantics of operations that developers perform on the data. The research that this paper proposes will formulate a framework for specifying consistency of data in an RBDE, and formulate a mechanism for synchronizing the actions of concurrent rules fired on behalf of multiple developers cooperating on a common or different tasks.	(pdf)
PARALLEL BOTTOM-UP EVALUATION OF {Datalog} PROGRAMS BY LOAD SHARING	Ouri Wolfson	1989-01-01	We propose a method of syntactically parallelizing bottom-up-evaluation of logic programs. The method is distinguished by the fact that it is .i pure i.e., does not require interprocessor communication, or synchronization overhead. The method cannot be used to parallelize every logic program, but we syntactically characterize several classes of logic programs that are $sharable$, i.e. programs to which the method can be applied (e.g., the class of linear single rule programs). We also provide a characterization of a class of $nonsharable$ programs, and demonstrate that sharability is a fundamental semantic property of the program, that is independent of the parallelization method proposed in this paper.	(pdf)
IMAGE-FLOW COMPUTATION: ESTIMATION-THEORETIC FRAMEWORK, UNIFICATION AND INTEGRATION, THESIS PROPOSAL	Ajit Singh	1989-01-01	Visual-motion is a major source of three-dimensional information. It is commonly recovered from time-varying imagery in the form of a two-dimensional image-flow field. This thesis is about computation of image-flow. A new framework is proposed that classifies the image-flow information available in time-varying imagery into two categories - conservation information and neighborhood information. Each type of information is recovered in the form of an estimate accompanied by a covariance-matrix. Image-flow is then computed by fusing the two estimates using estimation-theoretic techniques. This framework is shown to allow estimation of certain types of discontinuous flow-fields without any a-priori knowledge about the location of discontinuities. An algorithm based on this framework is described. Results of applying this algorithm to a variety of image-sequences are also discussed. The new framework is shown to be applicable identically to each one of the three major approaches for recovering conservation information, i.e., gradient-based approach, correlation-based approach and spatiotemporal energy-based approach. The formulation of neighborhood information used in this framework is also shown to reduce to some of the existing smoothing-based formulations under various simplifying assumptions. In essence, the framework described in this thesis unifies various existing approaches for image-flow computation. Such unification is shown to be useful in analyzing various existing frameworks as well as in generating new frameworks. The new framework is also shown to serve as a platform to integrate the three approaches mentioned above. It is observed that the measurements obtained by the three approaches have different error-characteristics. This situation is regarded analogous to the multi-sensor fusion problem, where the algorithms based on the three approaches behave as multiple sensors measuring image-flow. An integrated framework is described that applies the principles of statistical estimation theory to fuse the measurements obtained from different approaches. The resulting estimate of image-flow has the minimum expected-error. Some algorithms based on this framework are described. Several proposals for extension of this thesis are also included.	(pdf)
AN OBJECT MODEL FOR SHARED DATA	Gail E. Kaiser, Brent Hailpern	1989-01-01		(pdf)
{\em AI} TECHNIQUES IN SOFTWARE ENGINEERING	Gail E. Kaiser	1989-01-01		(pdf)
AN INTEGRATED APPROACH TO STEREO MATCHING, SURFACE RECONSTRUCTION AND DEPTH SEGMENTATION USING CONSISTENT SMOOTHNESS ASSUMPTIONS	L. H. Chen, Terrence E. Boult	1988-01-01		(pdf)
REGULARIZATION: PROBLEMS AND PROMISES	Terrence E. Boult	1988-01-01		(pdf)
PERFORMANCE ANALYSIS AND IMPROVEMENT IN UNIX FILE SYSTEM TREE TRAVERSAL	Jonathan M. Smith	1988-01-01		(pdf)
A SURVEY OF PROCESS MIGRATION MECHANISMS	Jonathan M. Smith	1988-01-01		(pdf)
A SURVEY OF SOFTWARE FAULT TOLLERENCE TECHNIQUES	Jonathan M. Smith	1988-01-01		(pdf) (ps)
PARALLEL ALGORITHMIC TECHNIQUES FOR COMBINATORIAL COMPUTATION	David Eppstein, Zvi Galil	1988-01-01		(pdf)
SPEEDING UP DYNAMIC PROGRAMMING WITH APPLICATION TO THE COMPUTATION OF RNA STRUCTURE	David Eppstein, Zvi Galil, Raffaele Giancarlo	1988-01-01		(pdf)
MINIMUM-KNOWLEDGE INTERACTIVE PROOFS FOR DECISION PROBLEMS	Zvi Galil, Stuart Haber, Moti Yung	1988-01-01		(pdf)
IMAGE WARPING AMONG ARBITRARY PLANAR SHAPES	George Wolberg	1988-01-01		(pdf)
IMPROC: AN INTERACTIVE IMAGE PROCESSING SOFTWARE PACKAGE	George Wolberg	1988-01-01		(pdf)
ON THE RECOVERY OF SUPERELLIPSOIDS	Terrence E. Boult, Ari Gross	1988-01-01		(pdf)
IMAGE UNDERSTANDING AND ROBOTICS RESEARCH AT COLUMBIA UNIVERSITY	John R. Kender, Peter K. Allen, Terrence E. Boult, Hussein A. Ibrahim	1988-01-01		(pdf)
RAY TRACING USING POLARIZATION PARAMETERS	Lawrence B. Wolff, David J. Kurlander	1988-01-01		(pdf)
BUT WHAT DO YOU NEED TO PRODUCE A BUT?	Michael Elhadad, Kathleen R. McKeown	1988-01-01		(pdf)
THE APPLICATION OF APPROXIMATION AND COMPLEXITY THEORY METHODS TO THE SOLUTION OF COMPUTER VISION PROBLEMS	Michael Hatzitheodorou	1988-01-01		(pdf)
GLOBAL READING OF ENTIRE DATABASES	Calton Pu, Christine H. Hong, Jae M. Wha	1988-01-01		(pdf)
METHODS AND APPROACHES IN REAL TIME HIERARCHICAL MOTION DETECTION	Ajit Singh, Peter K. Allen	1988-01-01		(pdf)
AN ALGORITHMIC TAXONOMY OF PRODUCTION SYSTEM MACHINES	Russell C. Mills	1988-01-01		(pdf)
USER'S MANUAL FOR PYRAMID EMULATION ON THE CONNECTION MACHINE	Lisa Gottesfeld Brown, Qifan Ju, Cynthia Norman	1988-01-01		(pdf)
SYNCHRONIZATION, COMMUNICATION AND I/O FACTORS IN DATABASE MACHINE PERFORMANCE	Andy Lowry	1988-01-01		(pdf)
STRUCTURE OF COMPLEXITY CLASSES: SEPARATIONS, COLLAPSES, AND COMPLETENESS	Lane A. Hemachandra	1988-01-01		(pdf)
ON THE STRUCTURE OF SOLUTIONS OF COMPUTABLE REAL FUNCTIONS	Juris Hartmanis, Lane Hemachandra	1988-01-01		(pdf)
HIGHLIGHTING USER RELATED ADVICE	Kathleen R. McKeown, Robert A. Weida	1988-01-01		(pdf)
INFERRING USER-ORIENTED ADVICE IN 'ADVISOR'	Robert A. Weida, Kathleen R. McKeown	1988-01-01		(pdf)
APPROXIMATE THEORY FORMATION: AN EXPLANATION-BASED APPROACH	Thomas Ellman	1988-01-01		(pdf)
THE DERIVATION OF TWO-DIMENSIONAL SHAPE FROM SHADOWS	Michael Hatzitheodorou	1988-01-01		(pdf)
INFUSE TEST MANAGEMENT	Gail E. Kaiser, Dewayne E. Perry	1988-01-01		(pdf)
THE WORLD ACCORDING TO GARP	Gail E. Kaiser, Roy Campbell, Steven Goering, Susan Hinrichs, Brenda Jackels, Joe Loyall	1988-01-01		(pdf)
Incremental Dynamic Semantics for Language-Based Programming Environments	Gail E. Kaiser	1988-01-01		(pdf) (ps)
SPLIT-TRANSACTIONS FOR OPEN-ENDED ACTIVITIES	Calton Pu, Gail E. Kaiser, Norman Hutchinson	1988-01-01		(pdf)
SUPPORT FOR RELIABLE DISTRIBUTED COMPUTING	Gail E. Kaiser, Wenwey Hseush	1988-01-01		(pdf)
A RETROSPECTIVE ON DOSE: AN INTERPRETIVE APPROACH TO STRUCTURE EDITOR GENERATION	Gail E. Kaiser, Peter H. Feiler, Fahimeh Jalili, Johann H. Schlichter	1988-01-01		(pdf)
A PRACTICAL COURSE IN SOFTWARE DESIGN	Jonathan M. Smith	1988-01-01		(pdf)
SOLVING THE DEPTH INTERPOLATION PROBLEM ON A PARALLEL ARCHITECTURE WITH EFFICIENT NUMERICAL METHODS -- Ph.D. Dissertation	Dong Jae Choi	1988-01-01		(pdf)
AN EXPERIMENT WITH THE STUDENT ADVISOR SYSTEM	Jong G. Lim",	1988-01-01		(pdf)
ALGORITHM-BASED FAULT TOLERANCE IN MASSIVELY PARALLEL SYSTEMS	Mark D. Lerner	1988-01-01		(pdf)
APPROXIMATE STRING MATCHING ON THE DADO2 PARALLEL COMPUTER	Toshikatsu Mori, Salvatore J. Stolfo	1988-01-01		(pdf)
A METHODOLOGY FOR PROGRAMMING PRODUCTION SYSTEMS AND ITS IMPLICATIONS ON PARALLELISM	Alexander J. Pasik	1988-01-01		(pdf)
HAPTIC OBJECT RECOGNITION USING A MULTI-FINGERED DEXTROUS HAND	Peter K. Allen, Kenneth S. Roberts	1988-01-01		(pdf)
THE RB LANGUAGE	Jonathan M. Smith, Gerald Q. Maguire Jr.	1988-01-01		(ps)
IMPLEMENTING REMOTE FORK ( WITH CHECKPOINT/RESTART	Jonathan M. Smith, John Ioannidis	1988-01-01		(ps)
RAPID LOCATION OF MOUNT POINTS	Jonathan M. Smith	1988-01-01		(pdf) (ps)
TOOL EXTENSION IN AN ALOE EDITOR	Takahisa Ishizuka	1988-01-01		(pdf)
IMAGE FORMATS: FIVE YEARS AFTER THE AAPM STANDARD FOR DIGITAL IMAGE INTERCHANGE	Gerald Q. Maguire Jr., Marilyn E. Noz	1988-01-01		(pdf)
ON THE POWER OF PROBABILISTIC POLYNOMIAL TIME: P(NP[LOG] IS CONTAINED IN PP	Lane A. Hemachandra, Gerd Wechsung	1988-01-01		(pdf)
NEST USER'S GUIDE	Alexander Dupuy, Jed Schwartz	1988-01-01		(pdf)
NEST LIBRARY REFERENCE MANUAL	Alexander Dupuy, Jed Schwartz	1988-01-01		(pdf)
NEST SYSTEM OVERVIEW	Alexander Dupuy, Jed Schwartz	1988-01-01		(pdf)
NEST USER INTERFACE MANUAL	Alexander Dupuy, Jed Schwartz	1988-01-01		(pdf)
INCREMENTAL EVALUATION OF ORDERED ATTRIBUTE GRAMMARS FOR ASYNCHRONOUS SUBTREE REPLACEMENTS	Josephine Micallef	1988-01-01		(pdf)
SEQUENCE COMPARISON WITH MIXED CONVEX AND CONCAVE COSTS	David Eppstein	1988-01-01		(pdf)
A DISTRIBUTED SIGNAL PROCESSING FACILITY FOR SPEECH RESEARCH	Nathaniel Polish	1988-01-01		(pdf)
TRANSPARENT CONCURRENT EXECUTION OF MUTUALLY EXCLUSIVE ALTERNATIVES	Jonathan M. Smith, Gerald Q. Maguire, Jr.	1988-01-01		(pdf)
FAST FOURIER TRANSFORMS: A REVIEW	George Wolberg	1988-01-01		(pdf)
CUBIC SPLINE INTERPOLATION: A REVIEW	George Wolberg	1988-01-01		(pdf)
GEOMETRIC TRANSFORMATION TECHNIQUES FOR DIGITAL IMAGES: A SURVEY	George Wolberg	1988-01-01		(pdf)
AN AUTOMATED CONSULTANT FOR INTERACTIVE ENVIRONMENTS	U. Wolz, G. E. Kaiser	1988-01-01		(pdf)
AUTOMATED TUTORING IN INTERACTIVE ENVIRONMENTS: A TASK CENTERED APPROACH	U. Wolz, K. R. McKeown, G. E. Kaiser	1988-01-01		(pdf)
AUTOMATED CONSULTING FOR EXTENDING USER EXPERTISE IN INTERACTIVE ENVIRONMENTS: A TASK CENTERED APPROACH	Ursula Wolz	1988-01-01		(pdf)
SEGMENTING SPECULAR HIGHLIGHTS FROM OBJECT SURFACES	Lawrence B. Wolff	1988-01-01		(pdf)
SHAPE UNDERSTANDING FROM LABERTIAN PHOTOMETRIC FLOW FIELDS	Lawrence B. Wolff	1988-01-01		(pdf)
COMPUTATIONAL ASPECTS OF LANGUAGE ACQUISITION FROM WORLD TO WORD	Frank Smadja	1988-01-01		(pdf)
MECHANICAL GENERATION OF HEURISTICS FOR INTRACTABLE THEORIES	Thomas Ellman	1988-01-01		(pdf)
Intelligent Assistance for Software Development and Maintenance	Gail E. Kaiser, Peter H. Feiler, Steven S. Popovich	1988-01-01		(ps)
EXTENDED TRANSACTION MODELS FOR SOFTWARE DEVELOPMENT ENVIRONMENTS	Gail E. Kaiser	1988-01-01		(pdf)
AN INTEGRATED SYSTEM FOR DEXTROUS MANIPULATION	Peter K. Allen, Paul Michelman, Kenneth S. Roberts	1988-01-01		(pdf)
A MODEL OF COMPUTATION FOR TRANSACTION PROCESSING	Avraham Leff, Calton Pu	1988-01-01		(pdf)
COMPARISON OF SURFACE LANGUAGE GENERATORS: A CASE STUDY IN CHOICE OF CONNECTIVES	K. R. McKeown, M. Elhadad	1988-01-01		(pdf)
THE FUF FUNCTIONAL UNIFIER: USER'S MANUAL	M. Elhadad	1988-01-01		(pdf)
FINDING A BETTER WAY: CHOOSING AND EXPLAINING ALTERNATIVE PLANS	U. Wolz	1988-01-01		(pdf)
TUTORING THAT RESPONDS TO USER QUESTIONS AND PROVIDES ENRICHMENT	U. Wolz	1988-01-01		(pdf)
LANDMARKS, MAPS, AND SELF-LOCATION IN THE PRESENCE OF ERROR	Avraham Leff	1988-01-01		(pdf)
CONCURRENT ALGEBRAS FOR VLSI DESIGN	T. S. Balraj	1988-01-01		(pdf)
AN OBJECT-ORIENTED MODEL FOR NETWORK MANAGEMENT	Soumitra Sengupta, Alexander Dupuy, Jed Schwartz, Yechiam Yemini	1988-01-01		(pdf)
THE SYNTHESIS KERNEL	Calton Pu, Henry Massalin, John Ioannidis	1987-01-01		(pdf)
MONITORING CONCEPTUAL DECISIONS TO GENERATE COHERENT TEXT	Michal Blumenstyck, Kathleen R. McKeown	1987-01-01		(pdf)
A SURVEY OF AUTOMATED CONSULTING IN INTERACTIVE PROGRAMMING ENVIRONMENTS	Ursula Wolz	1987-01-01		(pdf)
FINDING A MAXIMUM-GENUS GRAPH IMBEDDING	Merrick L. Furst, Jonathan L. Gross, Lyle A. McGeoch	1987-01-01		(pdf)
THE THEORY OF MINIMUM-KNOWLEDGE PROTOCOLS	Stuart Haber	1987-01-01		(pdf)
EXPLANATION-BASED METHODS FOR SIMPLIFYING INTRACTABLE THEORIES: Thesis Proposal	Thomas Ellman	1987-01-01		(pdf)
EXPLANATION-BASED LEARNING: A SURVEY OF PROGRAMS AND PERSPECTIVES	Thomas Ellman	1987-01-01		(pdf)
PROCESS MIGRATION: EFFECTS ON SCIENTIFIC COMPUTATION	Gerald Q. Maguire, Jr., Jonathan A. Smith	1987-01-01		(pdf)
RB: PROGRAMMER SPECIFICATION OF REDUNDANCY	Jonathan M. Smith, Gerald Q. Maguire, Jr.	1987-01-01		(pdf)
CHOICES IN THE DESIGN OF A LANGUAGE PRE-PROCESSOR FOR SPECIFYING REDUNDANCY	Jonathan M. Smith, Gerald Q. Maguire, Jr.	1987-01-01		(pdf)
ON THE POWER OF PARITY	Jin-yi Cai, Lane A. Hemachandra	1987-01-01		(pdf)
DIAL: DIAGRAMMATIC ANIMATION LANGUAGE TUTORIAL AND REFERENCE MANUAL	Steven Feiner	1987-01-01		(pdf)
RELIABLE NETWORK COMMUNICATIONS	Gail E. Kaiser, Yael J. Cycowicz, Wenwey Hseush, Josephine Micallef	1987-01-01		(pdf)
SOFTWARE DEVELOPMENT ENVIRONMENTS FOR VERY LARGE SOFTWARE SYSTEMS	Gail E. Kaiser, Yoelle S. Maarek, Dewayne E. Perry, Robert W. Schwanke	1987-01-01		(pdf)
MERCURY: DISTRIBUTED INCREMENTAL ATTRIBUTE GRAMMAR EVALUATION	Gail E. Kaiser, Josephine Micallef, Simon M. Kaplan	1987-01-01		(pdf)
MELD: A MULTI-PARADIGM LANGUAGE WITH OBJECTS, DATAFLOW AND MODULES	Gail E. Kaiser, David Garlan	1987-01-01		(pdf)
PARALLEL COMPUTERS, NUMBER THEORY PROBLEMS, AND EXPERIMENTAL RESULTS	Mark D. Lerner	1987-01-01		(pdf)
AN INCONSISTENCY MANAGEMENT SYSTEM: MASTER'S THESIS	Harris M. Morgenstern	1987-01-01		(pdf)
ENCAPSULATION, REUSABILITY AND EXTENSIBILITY IN OBJECT-ORIENTED PROGRAMMING LANGUAGES	Josephine Micallef	1987-01-01		(pdf)
DERIVING SHAPE FROM SHADOWS: A HILBERT SPACE SETTING	Michael Hatzitheodorou	1987-01-01		(pdf)
LINKS BETWEEN SITUATION AND LANGUAGE USE; TOWARDS A MORE COHERENT INTERACTION	Michael Elhadad	1987-01-01		(pdf)
UPDATING DISTANCE MAPS WHEN OBJECTS MOVE	Terrence E. Boult	1987-01-01		(pdf)
RECOVERING SUPERQUADRICS FROM 3-D INFORMATION	Terrence E. Boult, Ari Gross	1987-01-01		(pdf)
AN EXPERIMENTAL SYSTEM FOR THE INTEGRATION OF INFORMATION FROM STEREO AND MULTIPLE SHAPE-FROM-TEXTURE ALGORITHMS	Terrence E. Boult, Mark L. Moerdler	1987-01-01		(pdf)
GENUS DISTRIBUTIONS FOR BOUQUETS OF CIRCLES	Jonathan L. Gross, David P. Robbins, Thomas W. Tucker	1987-01-01		(pdf)
GENUS DISTRIBUTIONS FOR TWO CLASSES OF GRAPHS	Merrick L. Furst, Jonathan L. Gross, Richard Statman	1987-01-01		(pdf)
RESET SEQUENCES FOR FINITE AUTOMATA WITH APPLICATION TO DESIGN OF PARTS ORIENTERS	David Eppstein	1987-01-01		(pdf)
A SYNTACTIC OMNI-FONT CHARACTER RECOGNITION SYSTEM	George Wolberg	1987-01-01		(pdf)
DATA STRUCTURES AND ALGORITHMS FOR APPROXIMATE STRING MATCHING	Zvi Galil, Raffaele Giancarlo	1987-01-01		(pdf)
SPEEDING UP DYNAMIC PROGRAMMING WITH APPLICATIONS TO MOLECULAR BIOLOGY	Zvi Galil, Raffaele Giancarlo	1987-01-01		(pdf)
ON NONTRIVIAL SEPARATORS FOR k-PAGE GRAPHS AND SIMULATIONS BY NONDETERMINISTIC ONE-TAPE TURING MACHINES	Zvi Galil, Ravi Kannan, Endre Szemeredi	1987-01-01		(pdf)
AN O(n2(m+nlog nlog n MIN-COST FLOW ALGORITHM	Zvi Galil, Eva Tardos	1987-01-01		(pdf)
ON 3-PUSHDOWN GRAPHS WITH LARGE SEPARATORS	Zvi Galil, Ravi Kannan, Endre Szemeredi	1987-01-01		(pdf)
TWO LOWER BOUNDS IN ASYNCHRONOUS DISTRIBUTED COMPUTATION	Pavol Duris, Zvi Galil	1987-01-01		(pdf)
A SURVEY OF INTERFACES TO DATA BASE AND EXPERT SYSTEMS	Galina Datskovsky Moerdler	1987-01-01		(pdf)
BUILDING A NATURAL LANGUAGE INTERFACE TO EXPERT SYSTEMS	Galina Datskovsky Moerdler	1987-01-01		(pdf)
A COLLECTION OF 4 TRANSCRIPTS OF LONG CONVERSATIONS THROUGH COMPUTERS - INCLUDING THE EXPERIMENTS PROTOCOLS	Michael Elhadad	1987-01-01		(pdf)
THE USE OF EXPLICIT USER MODELS IN TEXT GENERATION: TAILORING TO A USER'S LEVEL OF EXPERTISE	Cecile Laurence Paris	1987-01-01		(pdf)
MIXED DISTANCE MEASURES FOR OPTIMIZING CONCATENATIVE VOCABULARIES FOR SPEACH SYNTHESIS: A THESIS PROPOSAL	Nathaniel Polish	1987-01-01		(pdf)
A SURVEY OF PARALLEL PROGRAMMING CONSTRUCTS	Michael van Biema	1987-01-01		(pdf)
IMPROVING PRODUCTION SYSTEM PERFORMANCE ON PARALLEL ARCHTECTURES BY CREATING CONSTRAINED COPIES OF RULES	Alexander J. Pasik, Salvatore J. Stolfo	1987-01-01		(pdf)
CIRCUIT MINIMIZATION TECHNIQUES APPLIED TO KNOWLEDGE ENGINEERING	Alexander J. Pasik	1987-01-01		(pdf)
THE EXPECTED-OUTCOME MODEL OF TWO-PLAYER GAMES (Ph.D. Dissertation	Bruce Abramson	1987-01-01		(pdf)
SIMD TREE ALGORITHMS FOR IMAGE CORRELATION	Hussein A. H. Ibrahim, John R. Kender, David Elliot Shaw	1986-01-01		(pdf)
DISK RESPONSE TIME MEASUREMENTS	Thomas D. Johnson, Jonathan M. Smith, Eric S. Wilson	1986-01-01		(pdf)
CHANGE MANAGEMENT SUPPORT FOR LARGE SOFTWARE SYSTEMS	Gail Kaiser, Dewayne E. Perry, Robert W. Schwanke	1986-01-01		(pdf)
GENERATION OF DISTRIBUTED PROGRAMMING ENVIRONMENTS	Gail E. Kaiser, Simon M. Kaplan, Josephine Micallef	1986-01-01		(pdf)
MELD/FEATURES: AN OBJECT-ORIENTED APPROACH TO REUSABLE SOFTWARE	Gail E. Kaiser	1986-01-01		(pdf)
SMILE/MARVEL: TWO APPROACHES TO KNOWLEDGE-BASED PROGRAMMING ENVIRONMENTS	Gail E. Kaiser, Peter H. Feiler	1986-01-01		(pdf)
THE DO-LOOP CONSIDERED HARMFUL IN PRODUCTION SYSTEM PROGRAMMING	Michael van Biema, Daniel P. Miranker, Salvatore J. Stolfo	1986-01-01		(pdf)
PHYSICAL OBJECT REPRESENTATION AND GENERALIZATION: A SURVEY OF PROGRAMS FOR SEMANTIC-BASED NATURAL LANGUAGE PROCESSING	Kenneth Wasserman	1986-01-01		(pdf)
THESIS PROPOSAL: THE EXPECTED-OUTCOME MODEL OF TWO-PLAYER GAMES	Bruce Abramson	1986-01-01		(pdf)
DIRECTOR -- AN INTERPRETER FOR RULE-BASED PROGRAMS	Galina Datskovsky Moerdler, J. Robert Ensor	1986-01-01		(pdf)
THE OPS FAMILY OF PRODUCTION SYSTEM LANGUAGES	Alexander Pasik	1986-01-01		(pdf)
AVOIDING LATCH FORMATION IN REGULAR EXPRESSION RECOGNIZERS	M. J. Foster	1986-01-01		(pdf)
CONTROL STRATEGIES FOR TWO PLAYER GAMES	Bruce Abramson	1986-01-01		(pdf)
LANGUAGE GENERATION: APPLICATIONS, ISSUES, AND APPROACHES	Kathleen R. McKeown	1986-01-01		(pdf)
REPLICATION AND NESTED TRANSACTIONS IN THE EDEN DISTRIBUTED SYSTEM	Calton Pu	1986-01-01		(pdf)
PROBABILISTIC SETTING OF INFORMATION-BASED COMPLEXITY	H. Wozniakowski	1986-01-01		(pdf)
COMPLEXITY OF INTEGRATION IN DIFFERENT SETTINGS	H. Wozniakowski	1986-01-01		(pdf)
SENSING AND DESCRIBING 3-D STRUCTURE	Peter K. Allen	1986-01-01		(pdf)
INTEGRATING VISION AND TOUCH FOR OBJECT RECOGNITION TASKS	Peter K. Allen	1986-01-01		(pdf)
APPROACHES TO DISTRIBUTED UNIX SYSTEMS	Jonathan Smith	1986-01-01		(pdf)
SUPERDATABASES FOR COMPOSITION OF HETEROGENEOUS DATABASES	Calton Pu	1986-01-01		(pdf)
SMOOTHNESS ASSUMPTIONS IN HUMAN AND MACHINE VISION, AND THEIR IMPLICATIONS FOR OPTIMAL SURFACE INTERPOLATION	Terrance Boult	1986-01-01		(pdf)
METHODS FOR PERFORMANCE EVALUATION OF PARALLEL COMPUTER SYSTEMS	Yoram Eisenstadter	1986-01-01		(pdf)
TRANSLATING BETWEEN PROGRAMMING LANGUAGES USING A CANONICAL REPRESENTATION AND ATTRIBUTE GRAMMAR INVERSION- EXTENDED ABSTRACT	Rodney Farrow, Daniel Yellin	1986-01-01		(pdf)
A METHODOLOGY FOR SPECIFICATION-BASED PERFORMANCE ANALYSIS OF PROTOCOLS - Ph.D. Dissertation	Nihal Nounou	1986-01-01		(pdf)
CANONICAL APPROXIMATION IN THE PERFORMANCE ANALYSIS OF DISTRIBUTED SYSTEMS -- PhD Dissertation	Eugene Pinsky	1986-01-01		(pdf)
AN INFORMATION-BASED APPROACH TO ILL-POSED PROBLEMS	Arthur G. Werschulz	1986-01-01		(pdf)
COMPLEITY OF DIFFERENTIAL AND INTEGRAL EQUATIONS	Arthur G. Werschulz	1985-01-01		(pdf)
ANALYZING USER PLANS TO PRODUCE INFORMATIVE RESPONSES BY A PROGRAMMER'S CONSULTANT	Ursula Wolz	1985-01-01		(pdf)
GENERATING ADMISSIBLE HEURISTICS BY CRITICIZING SOLUTIONS TO RELAXED MODELS	Othar Hansson, Andrew E. Mayer, Mordechai M. Yung	1985-01-01		(pdf)
OBJECT RECOGNITION USING VISION AND TOUCH	Peter Kirby Allen	1985-01-01		(pdf)
ON THE APPLICATION OF MASSIVELY PARALLEL SIMD TREE MACHINES TO CERTAIN INTERMEDIATE-LEVEL VISION TASKS	Hussein A. H. Ibrahim, John R. Kender, David Elliot Shaw	1985-01-01		(pdf)
A LISP COMPILER FOR THE DADO PARALLEL COMPUTER	Mark D. Lerner, Michael van Biema, Gerald Q. Maguire Jr.	1985-01-01		(pdf)
TAKING THE INITIATIVE FOR SYSTEM GOALS IN COOPERATIVE DIALOGUE	Kevin Matthews	1985-01-01		(pdf)
INITIATORY AND REACTIVE SYSTEM ROLES IN HUMAN COMPUTER DISCOURSE	Kevin Matthews	1985-01-01		(pdf)
A CURE FOR PATHOLOGICAL BEHAVIOR IN GAMES THAT USE MINIMAX	Bruce Abramson	1985-01-01		(pdf)
TOWARDS THE PARALLEL EXECUTION OF RULES IN PRODUCTION SYSTEM PROGRAMS	Toru Ishida, Salvatore J. Stolfo	1985-01-01		(pdf)
MACRO-OPERATORS: A WEAK METHOD FOR LEARNING	Richard E. Korf	1985-01-01		(pdf)
SURFACE ORIENTATION AND SEGMENTATION FROM PERSPECTIVE VIEWS OF PARALLEL-LINE TEXTURES	Mark L. Moerdler, John R. Kender	1985-01-01		(pdf)
DEVELOPMENT TOOLS FOR COMMUNICATION PROTOCOLS	Nihal Nounou, Yechiam Yemini	1985-01-01		(pdf)
COMPLEXITY OF APPROXIMATELY SOLVED PROBLEMS	J. F. Traub	1985-01-01		(pdf)
ON THE COMPLEXITY OF COMPOSITION AND GENERALIZED COMPOSITION OF POWER SERIES	R. P. Brent, J. F. Traub	1985-01-01		(pdf)
A GENERIC FRAMEWORK FOR EXPERT DATA ANALYSIS SYSTEMS	Luanne Burns, Alexander Pasik	1985-01-01		(pdf)
EXPLANATION AND ACQUISITION IN EXPERT SYSTEMS USING SUPPORT KNOWLEDGE	Alexander Pasik, Jens Christensen, Douglas Gordin, Agata Stancato-Pasik, Salvatore Stolfo	1985-01-01		(pdf)
A COMPARISON OF STORAGE OPTIMIZATIONS IN AUTOMATICALLY-GENERATED ATTRIBUTE EVALUATORS	Rodney Farrow, Daniel Yellin	1985-01-01		(pdf)
DESCRIPTION STRATEGIES FOR NAIVE AND EXPERT USERS	Cecile L. Paris	1985-01-01		(pdf)
NATURAL LANGUAGE INTERFACES TO EXPERT SYSTEMS	Galina Datskovky	1985-01-01		(pdf)
THE INFORMATION-CENTERED APPROACH TO OPTIMAL ALGORITHMS APPLIED TO THE 2-1/2 D SKETCH	John R. Kender, David Lee, Terrance Boult	1985-01-01		(pdf)
RESEARCHER: AN EXPERIMENTAL INTELLIGENT INFORMATION SYSTEM	Michael Lebowitz	1985-01-01		(pdf)
TAILORING EXPLANATIONS FOR THE USER	Kathleen R. McKeown, Myron Wish, Kevin Matthews	1985-01-01		(pdf)
THE NEED FOR TEXT GENERATION	Kathleen R. McKeown	1985-01-01		(pdf)
A SIMPLE PREPROCESSING SCHEME TO EXTRACT AND BALANCE IMPLICIT PARALLELISM IN THE CONCURRENT MATCH OF PRODUCTION RULES	Salvatore J. Stolfo, Daniel Miranker, Russell C. Mills	1985-01-01		(pdf)
MORE RULES MAY MEAN FASTER PARALLEL EXECUTION	Salvatore J. Stolfo, Daniel Miranker, Russell C. Mills	1985-01-01		(pdf)
OPTIMAL ALGORITHM FOR LINEAR PROBLEMS WITH GAUSSIAN MEASURES	G. W. Wasilkowski	1985-01-01		(pdf)
UNIFYING REPRESENTATION AND GENERALIZATION: UNDERSTANDING HIERARCHICALLY STRUCTURED OBJECTS -- Ph.D. Dissertation	Kenneth Hal Wasserman	1985-01-01		(pdf)
A PRIVATE INTERACTIVE TEST OF A BOOLEAN PREDICATE AND MINIMUM-KN OWLEDGE PUBLIC-KEY CRYPTOSYSTEMS-EXTENDED ABSTRACT	Zvi Galil, Stuart Haber, Moti Yung	1985-01-01		(pdf)
DISTRIBUTED ALGORITHMS IN SYNCHRONOUS BROADCASTING NETWORKS	Zvi Galil, Gad M. Landau, Moti Yung	1985-01-01		(pdf)
A METHODOLOGY FOR SPECIFICATION-BASED PERFORMANCE ANALYSIS OF COMMUNICATION PROTOCOLS	Nihal Nounou, Yechiam Yemini	1985-01-01		(pdf)
COMPLEXITY OF COMPUTING TOPOLOGICAL DEGREE OF LIPSCHITZ FUNCTIONS IN N DIMENSIONS	T. Boult, K. Sikorski	1985-01-01		(pdf)
VISUAL SURFACE INTERPOLATION: A COMPARISON OF TWO METHODS	Terrance E. Boult	1985-01-01		(pdf)
GENERALIZING LOGIC CIRCUIT DESIGNS BY ANALYZING PROOFS OF CORRECTNESS	Thomas Ellman	1985-01-01		(pdf)
SYMMETRIC PUBLIC-KEY ENCRYPTION	Zvi Galil, Stuart Haber, Moti Yung	1985-01-01		(pdf)
FINDING A MAXIMUM-GENUS GRAPH IMBEDDING	Merrick L. Furst, Jonathan L. Gross, Lyle A. McGeoch	1985-01-01		(pdf)
GENUS DISTRIBUTION FOR TWO CLASSES OF GRAPHS	Merrick L. Furst, Jonathan L. Gross, Richard Statman	1985-01-01		(pdf)
HIERARCHY FOR IMBEDDING-DISTRIBUTION INVARIANTS OF A GRAPH	Jonathan L. Gross, Merrick Furst	1985-01-01		(pdf)
A KNOWLEDGE-BASED EXPERT SYSTEMS PRIMER AND CATALOG	Bruce K. Hillyer	1985-01-01		(pdf)
SPECIFICATION IF INTERPRETERS AND DEBUGGERS USING AN EXTENSION OF ATTRIBUTE GRAMMARS	Gail E. Kaiser	1985-01-01		(pdf)
DEPTH-FIRST ITERATIVE-DEEPENING: AN OPTIMAL ADMISSIBLE TREE SEARCH	Richard E. Korf	1985-01-01		(pdf)
IMPLEMENTATION OF THE GMR ALGORITHM FOR LARGE SYMMETRIC EIGENPROBLEMS	Jacek Kuczynski	1985-01-01		(pdf)
ON THE OPTIMAL SOLUTION OF LARGE EIGENPAIR PROBLEMS	Jacek Kuczynski	1985-01-01		(pdf)
THE USE OF MEMORY IN TEXT PROCESSING	Michael Lebowitz	1985-01-01		(pdf)
INTEGRATED LEARNING: CONTROLLING EXPLANATION	Michael Lebowitz	1985-01-01		(pdf)
STORY TELLING AS PLANNING AND LEARNING	Michael Lebowitz	1985-01-01		(pdf)
DISCOURSE STRATEGIES FOR GENERATING NATURAL-LANGUAGE TEXT	Kathleen R. McKeown	1985-01-01		(pdf)
AN AUTOMATED PERFORMANCE ANALYSIS OF A TWO PHASE LOCKING PROTOCOL	Nihal Nounou, Yechiam Yemini	1985-01-01		(pdf)
A METHODOLOGY FOR SPECIFICATION-BASED PERFORMANCE ANALYSIS OF COMMUNICATION PROTOCOLS	Nihal Nounou, Yechiam Yemini	1985-01-01		(pdf)
GUIDE TO THE UNIFICATION PROCESS AND ITS IMPLEMENTATION. PROGRESS REPORT ON EXTENDING THE GRAMMAR.	Cecile L. Paris, TjoeLiong Kwee	1985-01-01		(pdf)
TOWARDS MORE GRACEFUL INTERACTION: A SURVEY OF QUESTION-ANSWERING PROGRAMS	Cecile L. Paris	1985-01-01		(pdf)
EQUIVALENT DESCRIPTIONS OF GENERALIZED CYLINDERS	K. S. Roberts	1985-01-01		(pdf)
DADO: A PARALLEL COMPUTER FOR ARTIFICIAL INTELLIGENCE	Salvatore J. Stolfo	1985-01-01		(pdf)
A SIMPLE SCHEME FOR A FAULT TOLERANT DADO MACHINE	Salvatore J. Stolfo	1985-01-01		(pdf)
CLOCKING SCHEMES FOR HIGH SPEED DIGITAL SYSTEMS	Stephen H. Unger	1985-01-01		(pdf)
NATURAL LANGUAGE FOR EXPERT SYSTEMS: COMPARISONS WITH DATABASE SYSTEMS	Kathleen R. McKeown	1984-01-01		(pdf)
ASYMPTOTIC OPTIMALITY OF THE BISECTION METHOD	K. Sikorski, G. M. Trojan	1984-01-01		(pdf)
MINIMAL NUMBER OF FUNCTION EVALUATIONS FOR COMPUTING TOPOLOGICAL DEGREE IN TWO DIMENSIONS	K. Sikorski	1984-01-01		(pdf)
LOGIC PROGRAMMING USING PARALLEL ASSOCIATIVE OPERATIONS	S. Taylor, A. Lowry, G. Q. Maguire Jr., S. J. Stolfo	1984-01-01		(pdf)
UNIFICATION IN A PARALLEL ENVIRONMENT	Stephen Taylor, Daphne Tzoar, Salvatore J. Stolfo	1984-01-01		(pdf)
PUTTING PIECES TOGETHER: UNDERSTANDING PATENT ABSTRACTS	Michael Lebowitz	1984-01-01		(pdf)
CREATING CHARACTERS IN A STORY-TELLING UNIVERSE	Michael Lebowitz	1984-01-01		(pdf)
AN O(EV LOG V) ALGORITHM FOR FINDING A MAXIMAL WEIGHTED MATCHING IN GENERAL GRAPHS	Zvi Galil, Silvio Micali, Harold Gabow	1984-01-01		(pdf)
ILL-FORMED TEXT AND CONCEPTUAL PROCESSING	Michael Lebowitz	1984-01-01		(pdf)
EXPERIENCE WITH A PRODUCTION COMPILER AUTOMATICALLY GENERATED FROM AN ATTRIBUTE GRAMMAR	Rodney Farrow	1984-01-01		(pdf)
SIMD AND MSIMD VARIANTS OF THE NON-VON SUPERCOMPUTER	David Elliot Shaw	1984-01-01		(pdf)
PPL/M: THE SYSTEM LEVEL LANGUAGE FOR PROGRAMMING THE @I(DADO) MACHINE	Salvatore J. Stolfo, Gerald Q. Maguire Jr., Mark D. Lerner	1984-01-01		(pdf)
@G(b)-TREES, @G(g)-SYSTEMS, AND A THEOREM ON F-HEAPS	Zvi Galil, Thomas Spencer	1984-01-01		(pdf)
LINEAR PROBLEMS (WITH EXTENDED RANGE) HAVE LINEAR OPTIMAL ALGORITHMS	Edward W. Packel	1984-01-01		(pdf)
\|\|PSL: A PARALLEL LISP FOR THE @I(DADO) MACHINE	Michael K. Van Biema, Mark D. Lerner, Gerald Maguire, Salvatore J. Stolfo	1984-01-01		(pdf)
AN ANALYSIS OF ABSTRACTION IN PROBLEM SOLVING	Richard E. Korf	1984-01-01		(pdf)
SIMULTANEOUS FIRING OF PRODUCTION RULES ON TREE STRUCTURED MACHINES	Toru Ishida, Salvatore J. Stolfo	1984-01-01		(pdf)
NTEREST AND PREDICTABILITY: DECIDING WHAT TO LEARN, WHEN TO LEARN	Michael Lebowitz	1984-01-01		(pdf)
CONCEPT LEARNING IN A RICH INPUT DOMAIN: GENERALIZATION-BASED MEMORY	Michael Lebowitz	1984-01-01		(pdf)
LPS ALGORITHMS: A DETAILED EXAMINATION	Andy Lowry, Stephen Taylor, Salvatore J. Stolfo	1984-01-01		(pdf)
LPS ALGORITHMS: A CRITICAL ANALYSIS	Andy Lowry, Stephen Taylor, Salvatore J. Stolfo	1984-01-01		(pdf)
TAKING THE INITIATIVE IN PROBLEM-SOLVING DISCOURSE	Kevin Matthews, Kathleen McKeown	1984-01-01		(pdf)
DETERMINING THE LEVEL OF EXPERTISE OF A USER OF A QUESTION ANSWERING SYSTEM	Cecile Paris	1984-01-01		(pdf)
FIVE PARALLEL ALGORITHMS FOR PRODUCTION SYSTEM EXECUTION ON THE DADO MACHINE	Salvatore J. Stolfo	1984-01-01		(pdf)
PERFORMANCE ESTIMATES FOR THE DADO MACHINE: A COMPARISON OF @I(TREAT) AND @I(RETE)	Daniel P. Miranker	1984-01-01		(pdf)
TREE MACHINES: ARCHITECTURE AND ALGORITHMS -- A SURVEY PAPER	Hussein A. H. Ibrahim	1984-01-01		(pdf)
TIME-CONSTRAINED COMMUNICATION IN MULTIPLE ACCESS NETWORKS	James Francis Kurose	1984-01-01		(pdf)
USING MEMORY IN TEXT UNDERSTANDING	Michael Lebowitz	1984-01-01		(pdf)
DADO: A PARALLEL PROCESSOR FOR EXPERT SYSTEMS	Salvatore J. Stolfo, Daniel P. Miranker	1984-01-01		(pdf)
UNDERSTANDING HIERARCHICALLY STRUCTURED OBJECTS	Kenneth Wasserman	1984-01-01		(pdf)
THE CONNECTED COMPONENT ALGORITHM ON THE NON-VON SUPERCOMPUTER	Hussein A.H. Ibrahim	1984-01-01		(pdf)
IS CAD/CAM READY FOR AI?	Salvatore J. Stolfo	1984-01-01		(pdf)
A NOTE ON IMPLEMENTING OPS5 PRODUCTION SYSTEMS ON DADO	Salvatore J. Stolfo	1984-01-01		(pdf)
WHAT IS THE COMPLEXITY OF THE FREDHOLM PROBLEM OF THE SECOND KIND?	Arthur G. Werschulz	1984-01-01		(pdf)
WHAT IS THE COMPLEXITY OF ELLIPTIC SYSTEMS?	Arthur G. Werschulz	1984-01-01		(pdf)
AN EIGHT-PROCESSOR CHIP FOR A MASSIVELY PARALLEL MACHINE	David Elliot Shaw, Theodore M. Sabety	1984-01-01		(pdf)
A SURVEY OF TREE-WALK EVALUATION STRATEGIES FOR ATTRIBUTE GRAMMARS	Daniel Yellin	1984-01-01		(pdf)
THE AUTOMATIC INVERSION OF ATTRIBUTE GRAMMARS	Daniel Yellin, Eva-Maria M. Mueckstein	1984-01-01		(pdf)
A SECURE AND USEFUL 'KEYLESS CRYPTOSYSTEM	Mordechai M. Yung	1984-01-01		(pdf)
CRYPTOPROTOCOLS: SUBSCRIPTION TO A PUBLIC KEY, THE SECRET BLOCKING AND THE MULTI-PLAYER MENTAL POKER GAME (EXTENDED ABSTRACT)	Mordechai Yung	1984-01-01		(pdf)
DEVELOPMENT TOOLS FOR COMMUNICATION PROTOCOLS: AN OVERVIEW	Nihal Nounou, Yechiam Yemini	1984-01-01		(pdf)
ALGEBRAIC SPECIFICATION-BASED PERFORMANCE ANALYSIS OF COMMUNICATION PROTOCOLS	Nihal Nounou, Yechiam Yemini	1984-01-01		(pdf)
CAN WE APPROXIMATE ZEROS OF FUNCTIONS WITH NON-ZERO TOPOLOGICAL DEGREE?	T. Boult, K. Sikorski	1984-01-01		(pdf)
EXECUTION OF OPS5 PRODUCTION SYSTEMS ON A MASSIVELY PARALLEL MACHINE	Bruce K. Hillyer, David Elliot Shaw	1984-01-01		(pdf)
SOME NONLINEAR PROBLEMS ARE AS EASY AS THE APPROXIMATION PROBLEM	G. W. Wasilkowski	1984-01-01		(pdf)
WHEN IS NONADAPTIVE INFORMATION AS POWERFUL AS ADAPTIVE INFORMATION?	J. F. Traub, G. W. Wasilkowski, H. Wozniakowski	1984-01-01		(pdf)
APPROXIMATION OF LINEAR OPERATORS ON A WIENER SPACE	D. Lee	1984-01-01		(pdf)
A NOTE ON BIVARIATE BOX SPLINES ON A K-DIRECTION MESH	D. Lee	1984-01-01		(pdf)
AN OVERVIEW OF THE DADO PARALLEL COMPUTER	Mark D. Lerner, Gerald Q. Maguire Jr., etc.	1984-01-01		(pdf)
MENU INTERFACES TO EXPERT SYSTEMS: OVERVIEW AND EVALUATION	Galina Datskovky	1984-01-01		(pdf)
FINITE ELEMENT METHODS ARE NOT ALWAYS OPTIMAL	Arthur G. Wershulz	1984-01-01		(pdf)
LPS ALGORITHMS	Andy Lowry, Stephen Taylor, Salvatore J. Stolfo	1984-01-01		(pdf)
THE DADO PRODUCTION SYSTEM MACHINE	Salvatore J. Stolfo, Daniel P. Miranker	1984-01-01		(pdf)
RECURSION IN TEXT AND ITS USE IN LANGUAGE GENERATION	Kathleen R. McKeown	1983-01-01		(pdf)
FOCUS CONSTRAINTS ON LANGUAGE GENERATION	Kathleen R. McKeown	1983-01-01		(pdf)
FOR WHICH ERROR CRITERIA CAN WE SOLVE NONLINEAR EQUATIONS?	K. Sikorski, H. Wozniakowski	1983-01-01		(pdf)
OPTIMAL SOLUTION OF NONLINEAR EQUATIONS SATISFYING A LIPSCHTIZ CONDITION	K. Sikorski	1983-01-01		(pdf)
ARCHITECTURE AND APPLICATIONS OF DADO: A LARGE-SCALE PARALLEL COMPUTER FOR ARTIFICIAL INTELLIGENCE	Salvatore J. Stolfo, Daniel P. Miranker, David Elliot Shaw	1983-01-01		(pdf)
PERFORMANCE EVALUATION OF A PACKETIZED VOICE SYSTEM -SIMULATION STUDY	Tatsuya Suda, Hideo Miyahara, Toshiharu Hasegawa	1983-01-01		(pdf)
PERFORMANCE EVALUATION OF AN INTEGRATED ACCESS SCHEME IN A SATELLITE COMMUNICATION CHANNEL	Tatsuya Suda, Hideo Miyahara, Toshiharu Hasegawa	1983-01-01		(pdf)
PROLOG ON THE DADO MACHINE: A PARALLEL SYSTEM FOR HIGH-SPEED LOGIC PROGRAMMING	Stephen Taylor, Christopher Maio, Salvatore J. Stolfo, David E. Shaw	1983-01-01		(pdf)
ENVIRONMENTAL RELATIONS IN IMAGE UNDERSTANDING: THE FORCE OF GRAVITY	John R. Kender	1983-01-01		(pdf)
EFFICIENT ALGORITHMS FOR FINDING MAXIMAL MATCHING IN GRAPHS	Zvi Galil	1983-01-01		(pdf)
EXERCISING THE NON-VON PRIMARY PROCESSING SUBSYSTEM	Bruce K. Hillyer, David Elliot Shaw	1983-01-01		(pdf)
COPING WITH COMPLEXITY	J. F. Traub	1983-01-01		(pdf)
IMPLEMENTING DESCRIPTIONS USING NON-VON NEUMANN PARALLELISM	Michael Lebowitz	1983-01-01		(pdf)
CLASSIFYING NUMERIC INFORMATION FOR GENERALIZATION	Michael Lebowitz	1983-01-01		(pdf)
RESEARCHER: AN OVERVIEW	Michael Lebowitz	1983-01-01		(pdf)
CREATING A STORY-TELLING UNIVERSE	Michael Lebowitz	1983-01-01		(pdf)
OPTIMAL INTEGRATION FOR FUNCTIONS OF BOUNDED VARIATION	J. F. Traub, D. Lee	1983-01-01		(pdf)
SOME NONLINEAR OPERATORS ARE AS EASY TO APPROXIMATE AS THE IDENTITY OPERATOR	G. W. Wasilkowski	1983-01-01		(pdf)
CUPID: A PROTOCOL DEVELOPMENT ENVIRONMENT	Yechiam Yemini, Nihal Nounou	1983-01-01		(pdf)
TWO NONLINEAR LOWER BOUNDS FOR ON-LINE COMPUTATIONS	Pavol Duris, Zvi Galil, Wolfgang Paul, Ruediger Reischuk	1983-01-01		(pdf)
THE NON-VON SUPERCOMPUTER PROJECT: CURRENT IDEOLOGY AND THREE-YEAR PLAN	David Elliot Shaw	1983-01-01		(pdf)
PHYSICAL OBJECT REPRESENTATION AND GENERALIZATION: A SURVEY OF NATURAL LANGUAGE PROCESSING PROGRAMS	Kenneth Wasserman	1983-01-01		(pdf)
THE DADO PARALLEL COMPUTER	Salvatore J. Stolfo	1983-01-01		(pdf)
INFORMATION AND COMPUTATION	J. F. Traub, H. Wozniakowski	1983-01-01		(pdf)
AVERAGE CASE @G(E)-COMPLEXITY IN COMPUTER SCIENCE: A BAYESIAN VIEW	J. B. Kadane, G. W. Wasilkowski	1983-01-01		(pdf)
WHAT IS THE COMPLEXITY OF RELATED ELLIPTIC, PARABOLIC AND HYPERBOLIC PROBLEMS	Arthur G. Werschulz	1983-01-01		(pdf)
COUNTEREXAMPLES IN OPTIMAL QUADRATURE	Arthur G. Werschulz	1983-01-01		(pdf)
COMPUTING D-OPTIMUM WEIGHING DESIGNS: WHERE STATISTICS, COMBINATORICS, AND COMPUTATION MEET	Zvi Galil	1983-01-01		(pdf)
TABLE-DRIVEN RULES IN EXPERT SYSTEMS	Alexander Pasik, Marshall Schor	1983-01-01		(pdf)
LOCAL AVERAGE ERROR	G. W. Wasilkowski	1983-01-01		(pdf)
COVERS OF ATTRIBUTE GRAMMARS AND SUB-PROTOCOL ATTRIBUTE EVALUATORS	Rodney Farrow	1983-01-01		(pdf)
PROTOCOL ARCHITECTURE OF A TREE NETWORK WITH COLLISION AVOIDANCE SWITCHES	Tatsuya Suda	1983-01-01		(pdf)
LOWER BOUNDS ON COMMUNICATION COMPLEXITY	Pavol Duris, Zvi Galil, Georg Schnitger	1983-01-01		(pdf)
OPTIMAL PARALLEL ALGORITHMS FOR STRING MATCHING	Zvi Galil	1983-01-01		(pdf)
CONTROLLING WINDOW PROTOCOLS FOR TIME-CONSTRAINED COMMUNICATION IN A MULTIPLE ACCESS ENVIRONMENT	James F. Kurose, Mischa Schwartz, Yechiam Yemini	1983-01-01		(pdf)
NATURAL LANGUAGE SYSTEMS: HOW ARE THEY MEETING HUMAN NEEDS?	Kathleen R. McKeown	1983-01-01		(pdf)
ON THE DESIGN OF PARALLEL PRODUCTION SYSTEM MACHINES: WHAT'S IN A LIP?	Salvatore J. Stolfo	1983-01-01		(pdf)
NON-VON'S PERFORMANCE ON CERTAIN DATABASE BENCHMARKS	Bruce K. Hillyer, David Elliot Shaw	1983-01-01		(pdf)
EVOLUTION OF THE NON-VON SUPERCOMPUTER	David Elliot Shaw	1983-01-01		(pdf)
CAN ADAPTION HELP ON THE AVERAGE?	G. W. Wasilkowski, H. Wozniakowski	1983-01-01		(pdf)
IS GAUSS QUADRATURE OPTIMAL FOR ANALYTIC FUNCTIONS?	M. A. Kowalski, A. G. Werschulz, H. Wozniakowski	1983-01-01		(pdf)
THE ACCURATE SOLUTION OF CERTAIN CONTINUOUS PROBLEMS USING ONLY SINGLE PRECISION	M. Jankowski, H. Wozniakowski	1983-01-01		(pdf)
PERFORMANCE ANALYSIS OF TWO COMPETING DADO PE DESIGNS	Daniel P. Miranker	1983-01-01		(pdf)
COMPLEXITY OF INDEFINITE ELLIPTIC PROBLEMS	Arthur G. Werschulz	1983-01-01		(pdf)
USER-ORIENTED EXPLANATION FOR EXPERT SYSTEMS	Kathleen McKeown	1983-01-01		(pdf)
USING FOCUS TO GENERATE COMPLEX AND SIMPLE SENTENCES	Marcia A. Derr, Kathleen R. McKeown	1983-01-01		(pdf)
RESEARCH DIRECTIONS IN DISTRIBUTED COMPUTING AND COMMUNICATIONS	Yechiam Yemini	1983-01-01		(pdf)
A REAL-TIME TRANSPORT PROTOCOL	Yechiam Yemini	1983-01-01		(pdf)
DISTRIBUTED MULTIPLE ACCESS PROTOCOLS AND REAL-TIME COMMUNICATION	James F. Kurose, Yechiam Yemini, Mischa Schwartz	1983-01-01		(pdf)
A FAMILY OF WINDOW PROTOCOLS FOR TIME-CONSTRAINED APPLICATIONS IN CSMA NETWORKS	James F. Kurose, Mischa Schwartz	1983-01-01		(pdf)
ALLOCATION AND MANIPULATION OF RECORDS IN THE NON-VON SUPERCOMPUTER	David Elliot Shaw, Bruce K. Hillyer	1983-01-01		(pdf)
THE TEXT SYSTEM FOR NATURAL LANGUAGE GENERATION: AN OVERVIEW	Kathleen R. McKeown	1982-01-01		(pdf)
DADO: A TREE-STRUCTURED MACHINE ARCHITECTURE FOR PRODUCTION SYSTEMS	Salvatore J. Stolfo, David Elliot Shaw	1982-01-01		(pdf)
TOWARDS DISTRIBUTED SENSORS NETWORKS: AN EXTENDED ABSTRACT	Y. Yemini, A. Lazar	1982-01-01		(pdf)
TOWARDS THE UNIFICATION OF THE FUNCTIONAL AND PERFORMANCE ANALYSIS OF PROTOCOLS OR, IS THE ALTERNATING-BIT PROTOCOL REALLY CORRECT?	Y. Yemini, J. F. Kurose	1982-01-01		(pdf)
COMPLEXITY OF LINEAR PROGRAMMING	J. F. Traub, H. Wozniakowski	1982-01-01		(pdf)
A STATISTICAL MECHANICS OF DISTRIBUTED RESOURCE SHARING MECHANISMS	Yechiam Yemini	1982-01-01		(pdf)
THE NON-VON SUPER COMPUTER	David Elliot Shaw	1982-01-01		(pdf)
ON THE OPTIMAL SOLUTION OF LARGE LINEAR SYSTEMS	J. F. Traub, H. Wozniakowski	1982-01-01		(pdf)
AVERAGE CASE OPTIMALITY FOR LINEAR PROBLEMS	J. F. Traub, G. Wasilkowski, H. Wozniakowski	1982-01-01		(pdf)
AVERAGE CASE OPTIMAL ALGORITHMS IN HILBERT SPACES	G. W. Wasilkowski, H. Wozniakowski	1982-01-01		(pdf)
AN EXPERT SYSTEM SUPPORTING ANALYSIS AND MANAGEMENT DECISION MAKING	Salvatore J. Stolfo, Gregg T. Vesonder	1982-01-01		(pdf)
PROGRAMMING IN THE DADO MACHINE: AN INTRODUCTION TO PPL/M	Salvatore J. Stolfo, Daniel Miranker, David Elliot Shaw	1982-01-01		(pdf)
MEASURING UNCERTAINTY WITHOUT A NORM	Arthur G. Werschulz	1982-01-01		(pdf)
STATISTICAL SECURITY OF A STATISTICAL DATA BASE	J. F. Traub, H. Wozniakowski, Y. Yemini	1982-01-01		(pdf)
REPRESENTING COMPLEX PHYSICAL OBJECTS IN MEMORY	Kenneth Wasserman, Michael Lebowitz	1982-01-01		(pdf)
DOES INCREASED REGULARITY LOWER COMPLEXITY?	Arthur G. Werschulz	1982-01-01		(pdf)
AN INFORMATION-THEORETIC SCALE FOR CULTURAL RULE SYSTEMS	Jonathan L. Gross	1981-01-01		(pdf)
UNDERSTANDING: THE RELATION BETWEEN LANGUAGE AND MEMORY	Michael Lebowitz	1981-01-01		(pdf)
MEMORY-BASED PARSING	Michael Lebowitz	1981-01-01		(pdf)
REPRESENTING COMPLEX EVENTS SIMPLY	Michael Lebowitz	1981-01-01		(pdf)
DOUBLE-EDGE-TRIGGERED FLIP-FLOPS	Stephen H. Unger	1981-01-01		(pdf)
SPECIALIZED HARDWARE FOR PRODUCTION SYSTEMS	Salvatore J. Stolfo, David Elliot Shaw	1981-01-01		(pdf)
A HIGHLY PARALLEL VLSI-BASED SUBSYSTEM OF THE NON-VONDTABASE MACHINE"	David Elliot Shaw, Hussein Ibrahim, Gio Widerhold, Jim Andrews	1981-01-01		(pdf)
NON-VON: A PARALLEL MACHINE ARCHITECTURE FOR KNOWLEDGE-BASED INFORMATION PROCESSING	David Elliot Shaw	1981-01-01		(pdf)
ON THE LONG-TERM IMPLICATIONS OF DATABASE MACHINE RESEARCH	David Elliot Shaw	1981-01-01		(pdf)
SELFISH OPTIMIZATION IN COMPUTER NETWORKS	Yechiam Yemini	1981-01-01		(pdf)
PARALLEL KNOWLEDGE-BASED INFORMATION RETRIEVAL ON THE NON-VON MACHINE	David Elliot Shaw	1981-01-01		(pdf)
THE NON-VON DATABASE MACHINE: AN OVERVIEW	David Elliot Shaw, Salvatore J. Stolfo, Hussein	1981-01-01		(pdf)
SOME PROBLEMS IN TOPOLOGICAL GRAPH THEORY	Jonathan L. Gross	1980-01-01		(pdf)
COMPUTATIONAL COMPLEXITY	J. F. Traub	1980-01-01		(pdf)
KNOWLEDGE-BASED RETRIEVAL ON A RATIONAL DATABASE MACHINE	David Elliot Shaw	1980-01-01		(pdf)
LEARNING META-RULE CONTROL OF PRODUCTION SYSTEMS FROM EXECUTION TRACES	Malcolm C. Harrison, Salvatore J. Stolfo	1980-01-01		(pdf)
CAN ANY STATIONARY ITERATION USING LINEAR INFORMATION BE GLOBALLY CONVERGENT?	G. W. Wasilkowski	1980-01-01		(pdf)
THE STRENGTH OF NONSTATIONARY ITERATION	G. W. Wasilkowski	1979-01-01		(pdf)
A RELATIONAL DATABASE MACHINE ARCHITECTURE	David Elliot Shaw	1979-01-01		(pdf)
A HIERARCHIAL ASSOCIATIVE ARCHITECTURE FOR THE PARALLEL EVALUATION OF RELATIONAL ALGEBRAIC DATABASE PRIMITIVES	David Elliot Shaw	1979-01-01		(pdf)
LEARNING CONTROL OF PRODUCTION SYSTEMS	Salvatore J. Stolfo	1979-01-01		(pdf)
ANY ITERATION FOR POLYNOMIAL EQUATIONS USING LINEAR INFORMATION HAS INFINITE COMPLEXITY	G. W. Wasilkowski	1979-01-01		(pdf)
STRUCTURE AND ABSTRACTION IN A SYSTEM FOR CONCEPTUAL METHODS	David Elliot Shaw	1977-01-01		(pdf)
INFERRING LISP PROGRAMS FROM EXAMPLES	David Elliot Shaw, William R. Swartout, C. Cordell Green	1975-01-01		(pdf)