Title | Authors | Published | Abstract | Publication Details |
---|---|---|---|---|
From Brain-Computer Interfaces to AI-Enhanced Diagnostics: Developing Cutting-Edge Tools for Medical and Interactive Technologies | Haowen Wei | 2024-06-24 | This thesis presents a series of studies that explore advanced computational techniques and interfaces in the domain of human-computer interaction (HCI), specifically focusing on brain-computer interfaces (BCIs), vision transformers for medical diagnosis, and eye-tracking input systems. The first study introduces PhysioLabXR, a Python platform designed for real-time, multi-modal BCI and extended reality experiments. This platform enhances the interaction in neuroscience and HCI by integrating physiological signals with computational models, supporting sophisticated data analysis and visualization tools that cater to a wide range of experimental needs. The second study delves into the application of vision transformers to the medical field, particularly for glaucoma diagnosis. We developed an expert knowledge-distilled vision transformer that leverages deep learning to analyze ocular images, providing a highly accurate and non-invasive tool for detecting glaucoma, thereby aiding in early diagnosis and treatment strategies. The third study explores SwEYEpe, an innovative eye-tracking input system designed for text entry in virtual reality (VR) environments. By leveraging eye movement data to predict text input, SwEYEpe offers a novel method of interaction that enhances user experience by minimizing physical strain and optimizing input efficiency in immersive environments. Together, these studies contribute to the fields of HCI and medical informatics by providing robust tools and methodologies that push the boundaries of how we interact with and through computational systems. This thesis not only demonstrates the feasibility of these advanced technologies but also paves the way for future research that could further integrate such systems into everyday applications for enhanced interaction and diagnostic processes. | (pdf) (ps) |
Computer Vision-Powered Applications for Interpreting and Interacting with Movement | Basel Nitham Hindi | 2023-12-24 | Movement and our ability to perceive it are core elements of the human experience. To bridge the gap between artificial intelligence research and the daily lives of people, this thesis explores leveraging advancements in the field of computer vision to enhance human experiences related to movement. Through two projects, I leverage computer vision to aid Blind and Low Vision (BLV) people in perceiving sports gameplay, and provide navigation assistance for pedestrians in outdoor urban environments. I present Front Row, a system that enables BLV viewers to interpret tennis matches through immersive audio cues, along with StreetNav, a system that repurposes street cameras for real-time, precise outdoor navigation assistance and environmental awareness. User studies and technical evaluations demonstrate the potential of these systems in augmenting people’s experiences perceiving and interacting with movement. This exploration also uncovers challenges in deploying such solutions along with opportunities in the design of future technologies. | (pdf) (ps) |
Advancing Few-Shot Multi-Label Medication Prediction in Intensive Care Units: The FIDDLE-Rx Approach | Xinghe Chen | 2023-12-04 | Contemporary intensive care units (ICUs) are navigating the challenge of enhancing medical service quality amidst financial and resource constraints. Machine learning models have surfaced as valuable tools in this context, showcasing notable effectiveness in supporting healthcare delivery. Despite advancements, a gap remains in real-time medical interventions. To bridge this gap, we introduce FIDDLE-Rx, a novel, data-driven machine learning approach designed specifically for real-time medication recommendations in ICUs. This method leverages the eICU Collaborative Research Database (eICU-CRD) for its analysis, which encompasses diverse electronic health records from ICUs (ICU-EHRs) sourced from multiple critical care centers across the US. FIDDLE-Rx employs the Flexible Data-Driven Pipeline (FIDDLE) for transforming tabular data into binary matrix representations and standardizes medication labels using the RxNorm (Rx) API. With the processed dataset, FIDDLE-Rx applies various machine learning models to forecast the requirements for 238 medications. Compared with previous studies, FIDDLE-Rx stands out by extending the scope of the research of ICU-EHRs beyond mortality prediction, offering a more comprehensive approach to enhancing critical care. The experimental results of our models demonstrate high efficacy, evidenced by their impressive performance across two key metrics: the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Remarkably, these results were achieved even when the model was trained with just 20% of the database, underlining its strong generalizability. By broadening the scope of ICU-EHRs research to encompass real-time medication recommendations, FIDDLE-Rx presents a scalable and effective solution for improving patient care in intensive care environments. | (pdf) (ps) |
Koopman Constrained Policy Optimization: A Koopman operator theoretic method for differentiable optimal control in robotics | Matthew Retchin | 2023-05-10 | Deep reinforcement learning has recently achieved state-of-the-art results for robotic control. Robots are now beginning to operate in unknown and highly nonlinear environments, expanding their usefulness for everyday tasks. In contrast, classical control theory is not suitable for these unknown, nonlinear environments. However, it retains an immense advantage over traditional deep reinforcement learning: guaranteed satisfaction of hard constraints, which is critically important for the performance and safety of robots. This thesis introduces Koopman Constrained Policy Optimization (KCPO), combining implicitly differentiable model predictive control with a deep Koopman autoencoder. KCPO brings new optimality guarantees to robot learning in unknown and nonlinear dynamical systems. The use of KCPO is demonstrated in Simple Pendulum and Cartpole with continuous state and action spaces and unknown environments. KCPO is shown to be able to train policies end-to-end with hard box constraints on controls. Compared to several baseline methods, KCPO exhibits superior generalization to constraints that were not part of its training. | (pdf) (ps) |
Formal Verification of a Multiprocessor Hypervisor on Arm Relaxed Memory Hardware | Runzhou Tao, Jianan Yao, Xupeng Li, Shih-Wei Li, Jason Nieh, Ronghui Gu | 2021-06-01 | As Arm servers are increasingly used by cloud providers, the complexity of their system software, such as operating systems and hypervisors, poses a growing security risk, as large codebases contain many vulnerabilities. While formal verification offers a potential solution for secure concurrent systems software, existing approaches have not been able to prove the correctness of systems software on Arm relaxed memory hardware. We introduce VRM, a new framework that can be used to verify kernel-level system software that satisfies a set of synchronization and memory access properties, such that these programs can be mostly verified on a sequentially consistent hardware model and the proofs will automatically hold on Arm relaxed memory hardware. VRM can be used to verify concurrent kernel code that is not data race free, which is typical for kernel code responsible for managing shared page tables. Using VRM, we prove for the first time the security guarantees of a retrofitted implementation of the Linux KVM multiprocessor hypervisor on Arm. For multiple versions of KVM, we prove KVM’s security properties on a sequentially consistent model, then prove that KVM satisfies VRM’s required program properties such that its security proofs hold for Arm relaxed memory hardware. Our experimental results across multiple verified KVM versions show that the retrofit does not adversely affect the scalability of verified KVM, as it performs comparably to unmodified KVM when running many virtual machines concurrently with real application workloads on Arm server hardware. | (pdf) (ps) |
Topics in Landmarking and Elementwise Mapping | Mehmet Kerem Turkcan | 2021-04-12 | In this thesis, we consider a number of different landmarking and elementwise mapping problems and propose solutions that are thematically interconnected with each other. We consider diverse problems ranging from landmarking to deep dictionary learning, pan-sharpening, compressive sensing magnetic resonance imaging and microgrid control, introducing novelties that go beyond the state of the art for the problems we discuss. We start by introducing a manifold landmarking approach trainable via stochastic gradient descent that allows for the consideration of structural regularization terms in the objective. We extend the approach for semi-supervised learning problems, showing that it is able to achieve comparable or better results than equivalent $k$-means based approaches. Inspired by these results, we consider an extension of this approach for general supervised and semi-supervised classification for structurally similar deep neural networks with self-modulating radial basis kernels. Secondly, we consider convolutional networks that perform image-to-image mappings for the problems of pan-sharpening and compressive sensing magnetic resonance imaging. Using extensions of deep state-of-the-art image-to-image mapping architectures specifically tailored for these problems, we show that they could be addressed naturally and effectively. After this, we move on to describe a method for multilayer dictionary learning and feedforward sparse coding by formulating the dictionary learning problem using a general deep learning layer architecture inspired by analysis dictionary learning. We find this method to be significantly faster to train than classical online dictionary learning approaches and capable of addressing supervised and semi-supervised classification problems more naturally. Lastly, we look at the problem of per-user power supply delivery on a microgrid powered by solar energy. Using real-world data obtained via The Earth Institute, we consider the problem of deciding the amount of power to supply to each user for a certain period of time given their current power demand as well as past demand/supply data. We approach the problem as one of demand-to-supply mapping, providing results for a policy network trained via regular propagation for worst-case control and classical deep reinforcement learning. | (pdf) (ps) |
SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation | Yiqing Liang, Boyuan Chen, Shuran Song | 2021-04-06 | This thesis focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment. To complete this task, the algorithm should simultaneously locate and navigate to an instance of the category. In comparison to the traditional point goal navigation, this task requires the agent to have a stronger contextual prior to indoor environments. This thesis introduces SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent’s navigation planning. Given a partial observation of the environment, SSCNav first infers a complete scene representation with semantic labels for the unobserved scene together with a confidence map associated with its own prediction. Then, a policy network infers the action from the scene completion result and confidence map. The experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies. Code and data: https://sscnav.cs.columbia.edu/ | (pdf) (ps) |
Semantic Controllable Image Generation in Few-shot Settings | Jianjin Xu | 2021-04-06 | Generative Adversarial Networks (GANs) are able to generate high-quality images, but it remains difficult to explicitly specify the semantics of synthesized images. In this work, we aim to better understand the semantic representation of GANs, and thereby enable semantic control in GAN’s generation process. Interestingly, we find that a well-trained GAN encodes image semantics in its internal feature maps in a surprisingly simple way: a linear transformation of feature maps suffices to extract the generated image semantics. To verify this simplicity, we conduct extensive experiments on various GANs and datasets; and thanks to this simplicity, we are able to learn a semantic segmentation model for a trained GAN from a small number (e.g., 8) of labeled images. Last but not least, leveraging our findings, we propose two few-shot image editing approaches, namely Semantic-Conditional Sampling and Semantic Image Editing. Given an unsupervised GAN and as few as eight semantic annotations, the user is able to generate diverse images subject to a user-provided semantic layout, and control the synthesized image semantics. | (pdf) (ps) |
SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation | Yiqing Liang, Boyuan Chen, Shuran Song | 2021-03-30 | This paper focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment. To complete this task, the algorithm should simultaneously locate and navigate to an instance of the category. In comparison to the traditional point goal navigation, this task requires the agent to have a stronger contextual prior to indoor environments. We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning. Given a partial observation of the environment, SSCNav first infers a complete scene representation with semantic labels for the unobserved scene together with a confidence map associated with its own prediction. Then, a policy network infers the action from the scene completion result and confidence map. Our experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies. Code and data: https://sscnav.cs.columbia.edu/ | (pdf) (ps) |
Using Continuous Logic Networks for Hardware Allocation | Anthony Saieva, Dennis Roellke, Suman Jana, Gail Kaiser | 2020-08-04 | Increased production efficiency combined with a slowdown in Moore's law and the end of Dennard scaling have made hardware accelerators increasingly important. Accelerators have become available on many different systems from the cloud to embedded systems. This modern computing paradigm makes specialized hardware available at scale in a way it never has been before. While accelerators have shown great efficiency in terms of power consumption and performance, matching software functions with the best available hardware remains problematic without manual selection. Since there is some software representation of each accelerator's function, selection can be automated via code analysis. Static similarity analysis has traditionally been based on satisfiability modulo theories (SMT) solving, but continuous logic networks (CLNs) have provided a faster and more efficient alternative to traditional SMT solving by replacing boolean functions with smooth estimations. These smooth estimates create the opportunity to leverage gradient descent to learn the solution. We present AccFinder, the first CLN-based code similarity solution, and evaluate its effectiveness on a realistically complex accelerator benchmark. | (pdf) (ps) |
SABER: Identifying SimilAr BEhavioR for Program Comprehension | Aditya Sridhar, Guanming Qiao, Gail Kaiser | 2020-07-06 | Modern software engineering practices rely on program comprehension as the most basic underlying component for improving developer productivity and software reliability. Software developers are often tasked to work with unfamiliar code in order to remove security vulnerabilities, port and refactor legacy code, and enhance software with new features desired by users. Automatic identification of behavioral clones, or behaviorally-similar code, is one program comprehension technique that can provide developers with assistance. The idea is to identify other code that "does the same thing" and that may be more intuitive, better documented, or familiar to the developer, to help them understand the code at hand. Unlike the detection of syntactic or structural code clones, behavioral clone detection requires executing workloads or test cases to find code that executes similarly on the same inputs. However, a key problem in behavioral clone detection that has not received adequate attention is the "preponderance of the evidence" problem, which advocates for more convincing evidence from nontrivial test case executions to gain confidence in the behavioral similarities. In other words, similar outputs for some inputs matter more than for others. We present a novel system, SABER, to address the "preponderance of the evidence" problem, for which we adapt the legal metaphor of "more likely to be true than not true" burden of proof. We develop a novel test case generation methodology with three primary dynamic analysis techniques for identifying important behavioral clones. Further, we investigate filtering and weighting schemes to guide developers toward the most convincing behavioral similarities germane to specific software engineering tasks, such as code review, debugging, and introducing new features. | (pdf) (ps) |
A Secure and Formally Verified Linux KVM Hypervisor | Shih-Wei Li, Xupeng Li, Ronghui Gu, Jason Nieh, John Zhuang Hui | 2020-06-27 | Commodity hypervisors are widely deployed to support virtual machines (VMs) on multiprocessor hardware. Their growing complexity poses a security risk. To enable formal verification over such a large code base, we present MicroV, a microverification approach for verifying commodity multiprocessor hypervisors. MicroV retrofits an existing, full-featured hypervisor into a large collection of untrusted hypervisor services and a small, verifiable hypervisor core. MicroV introduces security-preserving layers to gradually prove that the implementation of the core refines its high-level layered specification, and ensure that security guarantees proven at the top layer are propagated down through all the layers, such that they hold for the entire implementation. MicroV supports proving noninterference in the sanctioned presence of encrypted data sharing, using data oracles to distinguish between intentional and unintentional information flow. Using MicroV, we retrofitted the Linux KVM hypervisor with only modest modifications to its code base and verify in Coq that the retrofitted KVM protects the confidentiality and integrity of VM data. Our work is the first machine-checked security proof for a commodity multiprocessor hypervisor. | (pdf) (ps) |
Ad hoc Test Generation Through Binary Rewriting | Anthony Saieva, Gail Kaiser | 2020-04-09 | When a security vulnerability or other critical bug is not detected by the developers’ test suite, and is discovered post-deployment, developers must quickly devise a new test that reproduces the buggy behavior. Then the developers need to test whether their candidate patch indeed fixes the bug, without breaking other functionality, while racing to deploy before cyberattackers pounce on exposed user installations. This can be challenging when the bug discovery was due to factors that arose, perhaps transiently, in a specific user environment. If execution traces were recorded when the bad behavior occurred, record-replay technology faithfully replays the execution in the developer environment, as if the program were executing in that user environment under the same conditions in which the bug manifested. This includes intermediate program states dependent on system calls, memory layout, etc., as well as any externally-visible behavior. The bug is thus reproduced, and many modern record-replay tools also integrate bug reproduction with interactive debuggers to help locate the root cause; but how do developers check whether their patch indeed eliminates the bug under those same conditions? State-of-the-art record-replay does not support replaying candidate patches that modify the program in ways that diverge program state from the original recording, but successful repairs necessarily diverge so the bug no longer manifests. This work builds on record-replay and binary rewriting to automatically generate and run tests for candidate patches. These tests reflect the arbitrary (ad hoc) user and system circumstances that uncovered the vulnerability, to check whether a patch indeed closes the vulnerability but does not modify the corresponding segment of the program’s core semantics. Unlike conventional ad hoc testing, each test is reproducible and can be applied to as many prospective patches as needed until developers are satisfied. The proposed approach also enables users to make new recordings of their own workloads with the original version of the program, and automatically generate and run the corresponding ad hoc tests on the patched version, to validate that the patch does not introduce new problems before adopting it. | (pdf) (ps) |
Privacy Threats from Seemingly Innocuous Sensors | Shirish Singh, Anthony Saieva, Gail Kaiser | 2020-04-09 | Smartphones incorporate a plethora of diverse and powerful sensors that enhance user experience. Two such sensors are the accelerometer and gyroscope, which measure acceleration in all three spatial dimensions and rotation along the three axes of the smartphone, respectively. These sensors are often used by gaming and fitness apps. Unlike other sensors deemed to carry sensitive user data, such as GPS, camera, and microphone, the accelerometer and gyroscope do not require user permission on Android to transmit data to apps. This paper presents our IRB-approved studies showing that the accelerometer and gyroscope gather sufficient data to quickly infer the user's gender. We started with 33 in-person participants, with 88% accuracy, and followed up with 259 on-line participants to show the effectiveness of our technique. Our unobtrusive ShakyHands technique draws on these sensors to deduce additional demographic attributes that might be considered sensitive information, notably pregnancy. We have implemented ShakyHands for Android as an app, available from Google Play store, and as a Javascript browser web-app for Android and iOS smartphones. We show that even a low-skilled attacker, without expertise in signal processing or deep learning, can succeed at inferring demographic information such as gender and pregnancy. Our approach does not require tampering with the victim's device or specialized hardware; all our study participants used their own phones. | (pdf) (ps) |
The FHW Project: High-Level Hardware Synthesis from Haskell Programs | Stephen A. Edwards | 2019-08-04 | The goal of the FHW project was to produce a compiler able to translate programs written in a functional language (we chose Haskell) into highly parallel synthesizable RTL (we chose SystemVerilog) suitable for execution on an FPGA or ASIC. We ultimately produced such a compiler, relying on the Glasgow Haskell Compiler (GHC) as a front-end and writing our own back-end that performed a series of lowering transformations to restructure such constructs as recursion, polymorphism, and first-order functions into a form suitable for hardware, then transformed the now-restricted functional IR into a dataflow representation that is finally translated into synthesizable SystemVerilog. | (pdf) (ps) |
Compiling Irregular Software to Specialized Hardware | Richard Townsend | 2019-06-05 | High-level synthesis (HLS) has simplified the design process for energy-efficient hardware accelerators: a designer specifies an accelerator’s behavior in a “high-level” language, and a toolchain synthesizes register-transfer level (RTL) code from this specification. Many HLS systems produce efficient hardware designs for regular algorithms (i.e., those with limited conditionals or regular memory access patterns), but most struggle with irregular algorithms that rely on dynamic, data-dependent memory access patterns (e.g., traversing pointer-based structures like lists, trees, or graphs). HLS tools typically provide imperative, side-effectful languages to the designer, which makes it difficult to correctly specify and optimize complex, memory-bound applications. In this dissertation, I present an alternative HLS methodology that leverages properties of functional languages to synthesize hardware for irregular algorithms. The main contribution is an optimizing compiler that translates pure functional programs into modular, parallel dataflow networks in hardware. I give an overview of this compiler, explain how its source and target together enable parallelism in the face of irregularity, and present two specific optimizations that further exploit this parallelism. Taken together, this dissertation verifies my thesis that pure functional programs exhibiting irregular memory access patterns can be compiled into specialized hardware and optimized for parallelism. This work extends the scope of modern HLS toolchains. By relying on properties of pure functional languages, our compiler can synthesize hardware from programs containing constructs that commercial HLS tools prohibit, e.g., recursive functions and dynamic memory allocation. Hardware designers may thus use our compiler in conjunction with existing HLS systems to accelerate a wider class of algorithms than before. | (pdf) (ps) |
Extractive Text Summarization Methods Inspired By Reinforcement Learning for Better Generalization | Yan Virin | 2019-05-23 | This master thesis opens with a description of several text summarization methods based on machine learning approaches inspired by reinforcement learning. While in many cases Maximum Likelihood Estimation (MLE) approaches work well for text summarization, they tend to suffer from poor generalization. We show that techniques which expose the model to more opportunities to learn from data tend to generalize better and generate summaries with less lead bias. In our experiments we show that out of the box these new models do not perform significantly better than MLE when evaluated using Rouge, however they do possess interesting properties which may be used to assemble more sophisticated and better performing summarization systems. The main theme of the thesis is getting machine learning models to generalize better using ideas from reinforcement learning. We develop a new labeling scheme inspired by Reward Augmented Maximum Likelihood (RAML) methods developed originally for the machine translation task, and discuss how difficult it is to develop models which sample from their own distribution while estimating the gradient, e.g., in Minimum Risk Training (MRT) and Reinforcement Learning Policy Gradient methods. We show that RAML can be seen as a compromise between direct optimization of the model towards optimal expected reward using Monte Carlo methods, which may fail to converge, and standard MLE methods, which fail to explore the entire space of summaries, overfit during training by capturing prominent position features, and thus perform poorly on unseen data. To that end we describe and show results of domain transfer experiments, where we train the model on one dataset and evaluate on another, and position distribution experiments, in which we show how the distribution of positions of our models differs from the distribution in MLE. We also show that our models work better on documents which are less lead biased, while standard MLE models get significantly worse performance on those documents in particular. Another topic covered in the thesis is Query Focused text summarization, where a search query is used to produce a summary with the query in mind. The summary needs to be relevant to the query, rather than solely contain important information from the document. We use the recently published SQuAD dataset and adapt it for the Query Focused summarization task. We also train deep learning Query Focused models for summarization and discuss problems associated with that approach. Finally we describe a method to reuse an already trained QA model for Query Focused text summarization by introducing a reduction of the QA task into the Query Focused text summarization task. The source code in Python for all the techniques and approaches described in this thesis is available at https://github.com/yanvirin/material. | (pdf) (ps) |
Easy Email Encryption with Easy Key Management | John S. Koh, Steven M. Bellovin, Jason Nieh | 2018-10-02 | Email privacy is of crucial importance. Existing email encryption approaches are comprehensive but seldom used due to their complexity and inconvenience. We take a new approach to simplify email encryption and improve its usability by implementing receiver-controlled encryption: newly received messages are transparently downloaded and encrypted to a locally-generated key; the original message is then replaced. To avoid the problem of users having to move a single private key between devices, we implement per-device key pairs: only public keys need be synchronized to a single device. Compromising an email account or email server only provides access to encrypted emails. We have implemented this scheme for both Android and as a standalone daemon; we show that it works with both PGP and S/MIME, is compatible with widely used email services including Gmail and Yahoo! Mail, has acceptable overhead, and that users consider it intuitive and easy to use. | (pdf) (ps) |
Analysis of the CLEAR Protocol per the National Academies' Framework | Steven M. Bellovin, Matt Blaze, Dan Boneh, Susan Landau, Ronald L. Rivest | 2018-05-10 | The debate over "exceptional access"--the government’s ability to read encrypted data--has been going on for many years and shows no signs of resolution any time soon. On the one hand, some people claim it can be accomplished safely; others dispute that. In an attempt to make progress, a National Academies study committee propounded a framework to use when analyzing proposed solutions. We apply that framework to the CLEAR protocol and show the limitations of the design. | (pdf) (ps) |
Robot Learning in Simulation for Grasping and Manipulation | Beatrice Liang | 2018-05-10 | Teaching a robot to acquire complex motor skills in complicated environments is one of the most ambitious problems facing roboticists today. Grasp planning is a subset of this problem which can be solved through complex geometric and physical analysis or computationally expensive data-driven analysis. As grasping problems become more difficult, building analytical models becomes challenging. Consequently, we aim to learn a grasping policy through a simulation-based, data-driven approach. In this paper, we create and execute tests to evaluate a simulator’s suitability for manipulating objects in highly constrained settings. We investigate methods for creating forward models of a robot’s dynamics, and apply a Model Free Reinforcement Learning approach with the goal of developing a grasping policy based solely on proprioception. | (pdf) (ps) |
Partial Order Aware Concurrency Sampling | Xinhao Yuan, Junfeng Yang, Ronghui Gu | 2018-04-15 | We present POS, a concurrency testing approach that directly samples the partial orders of a concurrent program. POS uses a novel priority-based scheduling algorithm that naturally considers partial order information dynamically, and guarantees that each partial order will be explored with significant probability. This probabilistic guarantee of error detection is exponentially better than state-of-the-art sampling approaches. Besides theoretical guarantees, POS is extremely simple and lightweight to implement. Evaluations show that POS is effective in covering the partial-order space of micro-benchmarks and finding concurrency bugs in real-world programs such as Firefox’s JavaScript engine SpiderMonkey. | (pdf) (ps) |
Stretchcam: zooming using thin, elastic optics | Daniel Sims, Oliver Cossairt, Yonghao Yue, Shree Nayar | 2017-12-31 | Stretchcam is a thin camera with a lens capable of zooming with small actuations. In our design, an elastic lens array is placed on top of a sparse, rigid array of pixels. This lens array is then stretched using a small mechanical motion in order to change the field of view of the system. We present in this paper the characterization of such a system and simulations which demonstrate the capabilities of Stretchcam. We follow this with the presentation of images captured from a prototype device of the proposed design. Our prototype system is able to achieve 1.5 times zoom when the scene is only 300 mm away with just a 3% change of the lens array’s original length. | (pdf) (ps) |
Design and Implementation of IoT Android Commissioner | Andy Lianghua Xu, Jan Janak, Henning Schulzrinne | 2017-09-21 | As Internet of Things (IoT) devices gain more popularity, device management gradually becomes a major issue for IoT device users. To manage an IoT device, the user first needs to join it to an existing network. Then, the IoT device has to be authenticated by the user. The authentication process often requires a two-way communication between the new device and a trusted entity, which is typically a handheld device owned by the user. To ease and standardize this process, we present the Device Enrollment Protocol (DEP) as a solution to the enrollment problem described above. Starting from DEP, we then showcase the design of an IoT device commissioner and its prototype implementation on Android, named Android Commissioner. The application allows the user to authenticate IoT devices and join them to an existing protected network. | (pdf) (ps) |
Searching for Meaning in RNNs using Deep Neural Inspection | Kevin Lin, Eugene Wu | 2017-06-01 | Recent variants of Recurrent Neural Networks (RNNs)---in particular, Long Short-Term Memory (LSTM) networks---have established RNNs as a deep learning staple in modeling sequential data in a variety of machine learning tasks. However, RNNs are still often used as a black box with limited understanding of the hidden representation that they learn. Existing approaches such as visualization are limited by the manual effort to examine the visualizations and require considerable expertise, while neural attention models change, rather than interpret, the model. We propose a technique to search for neurons based on existing interpretable models, features, or programs. | (pdf) |
Reliable Synchronization in Multithreaded Servers | Rui Gu | 2017-05-15 | State machine replication (SMR) leverages distributed consensus protocols such as PAXOS to keep multiple replicas of a program consistent in the face of replica failures or network partitions. This fault tolerance is enticing for implementing a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus non-deterministic. Moreover, existing SMR systems provide narrow state machine interfaces to suit specific programs, and it can be quite strenuous and error-prone to orchestrate a general program into these interfaces. This paper presents CRANE, an SMR system that transparently replicates general server programs. CRANE achieves distributed consensus on the socket API, a common interface to almost all server programs. It leverages deterministic multithreading (specifically, our prior system PARROT) to make multithreaded replicas deterministic. It uses a new technique we call time bubbling to efficiently tackle a difficult challenge of non-deterministic network input timing. Evaluation on five widely used server programs (e.g., Apache, ClamAV, and MySQL) shows that CRANE is easy to use, has moderate overhead, and is robust. | (pdf) |
Deobfuscating Android Applications through Deep Learning | Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Baishakhi Ray | 2017-05-12 | Android applications are nearly always obfuscated before release, making it difficult to analyze them for malware presence or intellectual property violations. Obfuscators might hide the true intent of code by renaming variables, modifying the control flow of methods, or inserting additional code. Prior approaches toward automated deobfuscation of Android applications have relied on certain structural parts of apps remaining as landmarks, untouched by obfuscation. For instance, some prior approaches have assumed that the structural relationships between identifiers (e.g. that A represents a class, and B represents a field declared directly in A) are not broken by obfuscators; others have assumed that control flow graphs maintain their structure (e.g. that no new basic blocks are added). Both approaches can be easily defeated by a motivated obfuscator. We present a new approach to deobfuscating Android apps that leverages deep learning and topic modeling on machine code, MACNETO. MACNETO makes few assumptions about the kinds of modifications that an obfuscator might perform, and we show that it has high precision when applied to two different state-of-the-art obfuscators: ProGuard and Allatori. | (pdf) (ps) |
Analysis of Super Fine-Grained Program Phases | Van Bui, Martha A. Kim | 2017-04-18 | Dynamic reconfiguration systems guided by coarse-grained program phases have found success in improving overall program performance and energy efficiency. These performance/energy savings are limited by the granularity at which program phases are detected, since phases that occur at a finer granularity go undetected and reconfiguration opportunities are missed. In this study, we detect program phases using interval sizes on the order of tens, hundreds, and thousands of program cycles. This is in stark contrast with prior phase detection studies where the interval size is on the order of several thousands to millions of cycles. The primary goal of this study is to begin to fill a gap in the literature on phase detection by characterizing super fine-grained program phases and demonstrating an application where detection of these relatively short-lived phases can be instrumental. Traditional models for phase detection, including basic block vectors and working set signatures, are used to detect super fine-grained phases, as well as a less traditional model based on microprocessor activity. Finally, we show an analytical case study where super fine-grained phases are applied to voltage and frequency scaling optimizations. | (pdf) |
Understanding and Detecting Concurrency Attacks | Rui Gu, Bo Gan, Jason Zhao, Yi Ning, Heming Cui, Junfeng Yang | 2016-12-30 | Just as bugs in single-threaded programs can lead to vulnerabilities, bugs in multithreaded programs can also lead to concurrency attacks. Unfortunately, there is little quantitative data on how well existing tools can detect these attacks. This paper presents the first quantitative study on concurrency attacks and their implications on tools. Our study on 10 widely used programs reveals 26 concurrency attacks with broad threats (e.g., OS privilege escalation), and we built scripts to successfully exploit 10 attacks. Our study further reveals that only extremely small portions of inputs and thread interleavings (or schedules) can trigger these attacks, and existing concurrency bug detectors work poorly because they lack help to identify the vulnerable inputs and schedules. Our key insight is that the reports in existing detectors have implied moderate hints on what inputs and schedules will likely lead to attacks and what will not (e.g., benign bug reports). With this insight, this paper presents a new directed concurrency attack detection approach and its implementation, OWL. It extracts hints from the reports with static analysis, augments existing detectors by pruning out the benign inputs and schedules, and then directs detectors and its own runtime vulnerability verifiers to work on the remaining, likely vulnerable inputs and schedules. Evaluation shows that OWL reduced 94.3% of reports caused by benign inputs or schedules and detected 7 known concurrency attacks. OWL also detected 3 previously unknown concurrency attacks, including a use-after-free attack in SSDB confirmed as CVE-2016-1000324, an integer overflow and HTML integrity violation in Apache, and three new MySQL data races confirmed with bug IDs 84064, 84122, and 84241. All OWL source code, exploit scripts, and results are available at https://github.com/ruigulala/ConAnalysis. | (pdf) (ps) |
Mysterious Checks from Mauborgne to Fabyan | Steven M. Bellovin | 2016-11-28 | It has long been known that George Fabyan's Riverbank Laboratories provided the U.S. military with cryptanalytic and training services during World War I. The relationship has always been seen as voluntary. Newly discovered evidence raises the question of whether Fabyan was in fact paid, at least in part, for his services, but available records do not provide a definitive answer. | (pdf) |
Further Information on Miller's 1882 One-Time Pad | Steven M. Bellovin | 2016-11-25 | New information has been discovered about Frank Miller's 1882 one-time pad. These documents explain Miller's threat model and show that he had a reasonably deep understanding of the problem; they also suggest that his scheme was used more than had been supposed. | (pdf) |
Kensa: Sandboxed, Online Debugging of Production Bugs with No Overhead | Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser | 2016-10-27 | Short time-to-bug localization and resolution is extremely important for any 24x7 service-oriented application. In this work, we present a novel mechanism that allows debugging of production systems on-the-fly. We leverage user-space virtualization technology (OpenVZ/LXC) to launch replicas from running instances of a production application, thereby having two containers: production (which provides the real output) and debug (for debugging). The debug container provides a sandbox environment for debugging without any perturbation to the production environment. Customized network-proxy agents asynchronously replicate and replay network inputs from clients to both the production and debug containers, as well as safely discard all network output from the debug container. We evaluated this low-overhead record and replay technique on five real-world applications, finding that it was effective at reproducing real bugs. In comparison to existing monitoring solutions, which can slow down production applications, Kensa allows application monitoring at "zero overhead". | (pdf) (ps) |
Discovering Functionally Similar Code with Taint Analysis | Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Simha Sethumadhavan | 2016-09-30 | Identifying similar code in software systems can assist many software engineering tasks such as program understanding and software refactoring. While most approaches focus on identifying code that looks alike, some techniques aim at detecting code that functions alike. Detecting these functional clones (code that functions alike) in object-oriented languages remains an open question because of the difficulty of exposing and comparing programs’ functionality effectively, which is undecidable in the general case. We propose a novel technique, In-Vivo Clone Detection, which detects functional clones in arbitrary programs by identifying and mining their inputs and outputs. The key insight is to use existing workloads to execute programs and then measure functional similarities between programs based on their inputs and outputs. Further, to identify inputs and outputs of programs appropriately, we use the techniques of static and dynamic data flow analysis. These enhancements mitigate the problems in object-oriented languages with respect to identifying program I/Os as reported by prior work. We implement such techniques in our system, HitoshiIO, which is open source and freely available. Our experimental results show that HitoshiIO detects ~900 and ~2,000 functional clones by static and dynamic data flow analysis, respectively, across a corpus of 118 projects. In a random sample of the clones detected by the static data flow analysis, HitoshiIO achieves a 68+% true positive rate with only a 15% false positive rate. | (pdf) (ps) |
Heterogeneous Multi-Mobile Computing | Naser AlDuaij, Alexander Van't Hof, Jason Nieh | 2016-08-02 | As smartphones and tablets proliferate, there is a growing need to provide ways to combine multiple mobile systems into more capable ones, including using multiple hardware devices such as cameras, displays, speakers, microphones, sensors, and input. We call this multi-mobile computing. However, the tremendous device, hardware, and software heterogeneity of mobile systems makes this difficult in practice. We present M2, a system for multi-mobile computing across heterogeneous mobile systems that enables new ways of sharing and combining multiple devices. M2 leverages higher-level device abstractions and encoding and decoding hardware in mobile systems to define a client-server device stack that shares devices seamlessly across heterogeneous systems. M2 introduces device transformation, a new technique to mix and match heterogeneous input and output device data including rich media content. Example device transformations for transparently making existing unmodified apps multi-mobile include fused devices, which combine multiple devices into a more capable one, and translated devices, which can substitute use of one type of device for another. We have implemented an M2 prototype on Android that operates across heterogeneous hardware and software, including multiple versions of Android and iOS devices, the latter allowing iOS users to also run Android apps. Our results using unmodified apps from Google Play show that M2 can enable apps to be combined in new ways, and can run device-intensive apps across multiple mobile systems with modest overhead and qualitative performance indistinguishable from using local device hardware. | (pdf) (ps) |
Why Are We Permanently Stuck in an Elevator? A Software Engineering Perspective on Game Bugs | Iris Zhang | 2016-06-01 | In the past decade, the complexity of video games has increased dramatically, and so has the complexity of the software systems behind them. The difficulty in designing and testing games invariably leads to bugs that manifest themselves across funny video reels of graphical glitches and millions of submitted support tickets. This paper presents an analysis of game developers and their teams who have knowingly released bugs to see what factors may motivate them in doing so. It examines different development environments as well as inquiring into varied types of game platforms and play-styles. Above all, it seeks out how established research on software development best practices and challenges should inform understanding of these bugs. These findings may lead to targeted efforts to mitigate some of the factors leading to glitches, tailored to the specific needs of the game development team. | (pdf) |
Software Engineering Methodologies and Life | Scott Lennon | 2016-06-01 | The paradigms of design patterns and software engineering methodologies are methods that apply to areas outside the software space. As a business owner and student, I implement many software principles daily in both my work and personal life. After experiencing the power of Agile methodologies outside the scope of software engineering, I always think about how I can integrate the computer science skills that I am learning at Columbia in my life. For my study, I seek to learn about other software engineering development processes that can be useful in life. I theorize that if a model such as Agile can provide me with useful tools, then a model that the government and most of the world trusts should have paradigms I can learn with as well. The software model I will study is open source software (OSS). My research examines the lateral software standards of OSS and closed source software (CSS). For the scope of this paper, I will focus primarily on Linux as the OSS model and Agile as the CSS model. OSS has had an extraordinary impact on the software revolution [1], and CSS models have gained such popularity that its paradigms extend far beyond the software engineering space. Before delving into research, I thought the methodologies of OSS and CSS would be radically different. My study shall describe the similarities that exist between these two methodologies. In the process of my research, I was able to implement the values and paradigms that define the OSS development model to work more productively in my business. Software engineering core values and models can be used as a tool to improve our lives. | (pdf) |
User Study: Programming Understanding from Similar Code | Anush Ramsurat | 2016-06-01 | The aim of the user study conducted is primarily threefold: • To accurately judge, based on a number of parameters, whether showing similar code helps in code comprehension. • To investigate separately, a number of cases involving dynamic code, static code, the effect of options on accuracy of responses, and so on. • To distribute the user survey, collect data, responses and feedback from the user study and draw conclusions. | (pdf) |
YOLO: A New Security Architecture for Cyber-Physical Systems | Miguel Arroyo, Jonathan Weisz, Simha Sethumadhavan, Hidenori Kobayashi, Junfeng Yang | 2016-05-24 | Cyber-physical systems (CPS) are defined by their unique characteristics involving both the cyber and physical domains. Their hybrid nature introduces new attack vectors but also provides an opportunity to design new security architectures. In this work, we present YOLO (You Only Live Once), a security architecture that leverages two unique physical properties of a CPS to survive attacks: inertia, the tendency of objects to stay at rest or in motion, and its built-in resilience to intermittent faults. At a high level, YOLO aims to use a new diversified variant for every new sensor input to the CPS. The delays involved in YOLO, viz., the delays for rebooting and diversification, are easily absorbed by the CPS because of its inherent inertia and its ability to withstand minor perturbations. We implement YOLO on an open source Car Engine Control Unit, and with measurements from a real race car engine show that YOLO is eminently practical. | (pdf) (ps) |
Identifying Functionally Similar Code in Complex Codebases | Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Simha Sethumadhavan | 2016-02-18 | Identifying similar code in software systems can assist many software engineering tasks, including program understanding. While most approaches focus on identifying code that looks alike, some researchers propose to detect instead code that functions alike, which is known as functional clones. However, previous work has raised the technical challenges of detecting these functional clones in object-oriented languages such as Java. We propose a novel technique, In-Vivo Clone Detection, a language-agnostic technique that detects functional clones in arbitrary programs by observing and mining inputs and outputs. We implemented this technique targeting programs that run on the JVM, creating HitoshiIO (available freely on GitHub), a tool to detect functional code clones. Our experimental results show that it is powerful in detecting these functional clones, finding 185 methods that are functionally similar across a corpus of 118 projects, even when there are only very few inputs available. In a random sample of the detected clones, HitoshiIO achieves a 68+% true positive rate, while the false positive rate is only 15%. | (pdf) |
Cambits: A Reconfigurable Camera System | Makoto Odamaki, Shree K. Nayar | 2016-02-11 | Cambits is a set of physical blocks that can be used to build a wide variety of cameras with different functionalities. A unique feature of Cambits is that it is easy and quick to reconfigure. The blocks are assembled using magnets, without any screws or cables. When two blocks are attached, they are electrically connected by spring-loaded pins that carry power, data and control signals. With this novel architecture we can use Cambits to configure various imaging systems. The host computer always knows the current configuration and presents the user with a menu of functionalities that the configuration can perform. We demonstrate a wide range of computational photography methods including HDR, wide angle, panoramic, collage, kaleidoscopic, post-focus, light field and stereo imaging. Cambits can even be used to configure a microscope. Cambits is a scalable system, allowing new blocks and accompanying software to be added to the existing set. | (pdf) (ps) |
Grandet: A Unified, Economical Object Store for Web Applications | Yang Tang, Gang Hu, Xinhao Yuan, Lingmei Weng, Junfeng Yang | 2016-02-02 | Web applications are becoming increasingly ubiquitous because they offer many useful services to consumers and businesses. Many of these web applications are quite storage-intensive. Cloud computing offers attractive and economical choices for meeting their storage needs. Unfortunately, it remains challenging for developers to best leverage them to minimize cost. This paper presents Grandet, a storage system that greatly reduces storage cost for web applications deployed in the cloud. Grandet provides both a key-value interface and a file system interface, supporting a broad spectrum of web applications. Under the hood, it supports multiple heterogeneous stores, and unifies them by placing each data object at the store deemed most economical. We implemented Grandet on Amazon Web Services and evaluated Grandet on a diverse set of four popular open-source web applications. Our results show that Grandet reduces their cost by an average of 42.4%, and it is fast, scalable, and easy to use. The source code of Grandet is at http://columbia.github.io/grandet. | (pdf) |
A Measurement Study of ARM Virtualization Performance | Christoffer Dall, Shih-Wei Li, Jintack Lim, Jason Nieh | 2015-11-30 | ARM servers are becoming increasingly common, making server technologies such as virtualization for ARM of growing importance. We present the first in-depth study of ARM virtualization performance on ARM server hardware, including measurements of two popular ARM and x86 hypervisors, KVM and Xen. We show how the ARM hardware support for virtualization can support much faster transitions between the VM and the hypervisor, a key hypervisor operation. However, current hypervisor designs, including both KVM (Type 2) and Xen (Type 1), are not able to leverage this performance benefit in practice for real application workloads. We discuss the reasons why and show that other factors related to hypervisor software design and implementation have a larger role in overall performance than the speed of microarchitectural operations. Based on our measurements, we discuss changes to ARM’s hardware virtualization support that can potentially bridge the gap to bring its faster virtual machine exit mechanism to modern Type 2 hypervisors running real applications. These changes have been incorporated into the latest ARM architecture. | (pdf) |
Use of Fast Multipole to Accelerate Discrete Circulation-Preserving Vortex Sheets for Soap Films and Foams | Fang Da, Christopher Batty, Chris Wojtan, Eitan Grinspun | 2015-11-07 | We report the integration of an FMM (Fast Multipole Method) template library, "FMMTL", into the discrete circulation-preserving vortex sheets method to accelerate the Biot-Savart integral. We measure the speed-up on a bubble oscillation test with varying mesh resolution. We also report a few examples with higher complexity than previously achieved. | (pdf) |
Hardware in Haskell: Implementing Memories in a Stream-Based World | Richard Townsend, Martha Kim, Stephen Edwards | 2015-09-21 | Recursive functions and data types pose significant challenges for a Haskell-to-hardware compiler. Directly translating these structures yields infinitely large circuits; a subtler approach is required. We propose a sequence of abstraction-lowering transformations that exposes time and memory in a Haskell program, producing a simpler form for hardware translation. This paper outlines these transformations on a specific example; future research will focus on generalizing and automating them in our group's compiler. | (pdf) |
Improving System Reliability for Cyber-Physical Systems | Leon Wu | 2015-09-14 | Cyber-physical systems (CPS) are systems featuring a tight combination of, and coordination between, the system’s computational and physical elements. Cyber-physical systems include systems ranging from critical infrastructure such as a power grid and transportation system to health and biomedical devices. System reliability, i.e., the ability of a system to perform its intended function under a given set of environmental and operational conditions for a given period of time, is a fundamental requirement of cyber-physical systems. An unreliable system often leads to disruption of service, financial cost and even loss of human life. An important and prevalent type of cyber-physical system meets the following criteria: processing large amounts of data; employing software as a system component; running online continuously; having operator-in-the-loop because of human judgment and an accountability requirement for safety critical systems. This thesis aims to improve system reliability for this type of cyber-physical system. To improve system reliability for this type of cyber-physical system, I present a system evaluation approach entitled automated online evaluation (AOE), which is a data-centric runtime monitoring and reliability evaluation approach that works in parallel with the cyber-physical system to conduct automated evaluation along the workflow of the system continuously using computational intelligence and self-tuning techniques and provide operator-in-the-loop feedback on reliability improvement. For example, abnormal input and output data at or between the multiple stages of the system can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop. The operator can then take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and increased system reliability. One technique used by the approach is data quality analysis using computational intelligence, which applies computational intelligence in evaluating data quality in an automated and efficient way in order to make sure the running system performs reliably as expected. Another technique used by the approach is self-tuning, which automatically self-manages and self-configures the evaluation system to ensure that it adapts itself based on the changes in the system and feedback from the operator. To implement the proposed approach, I further present a system architecture called autonomic reliability improvement system (ARIS). This thesis investigates three hypotheses. First, I claim that the automated online evaluation empowered by data quality analysis using computational intelligence can effectively improve system reliability for cyber-physical systems in the domain of interest as indicated above. In order to prove this hypothesis, a prototype system needs to be developed and deployed in various cyber-physical systems while certain reliability metrics are required to measure the system reliability improvement quantitatively. Second, I claim that the self-tuning can effectively self-manage and self-configure the evaluation system based on the changes in the system and feedback from the operator-in-the-loop to improve system reliability. Third, I claim that the approach is efficient. It should not have a large impact on the overall system performance and should introduce only minimal extra overhead to the cyber-physical system. Some performance metrics should be used to measure the efficiency and added overhead quantitatively. Additionally, in order to conduct efficient and cost-effective automated online evaluation for data-intensive CPS, which requires large volumes of data and devotes much of its processing time to I/O and data manipulation, this thesis presents COBRA, a cloud-based reliability assurance framework. COBRA provides automated multi-stage runtime reliability evaluation along the CPS workflow using data relocation services, a cloud data store, data quality analysis and process scheduling with self-tuning to achieve scalability, elasticity and efficiency. Finally, in order to provide a generic way to compare and benchmark system reliability for CPS and to extend the approach described above, this thesis presents FARE, a reliability benchmark framework that employs a CPS reliability model, a set of methods and metrics on evaluation environment selection, failure analysis, and reliability estimation. The main contributions of this thesis include validation of the above hypotheses and empirical studies of ARIS automated online evaluation system, COBRA cloud-based reliability assurance framework for data-intensive CPS, and FARE framework for benchmarking reliability of cyber-physical systems. This work has advanced the state of the art in CPS reliability research, expanded the body of knowledge in this field, and provided some useful studies for further research. | (pdf) |
Exploiting Visual Perception for Sampling-Based Approximation on Aggregate Queries | Daniel Alabi | 2015-09-07 | Efficient sampling algorithms have been developed for approximating answers to aggregate queries on large data sets. In some formulations of the problem, concentration inequalities (such as Hoeffding’s inequality) are used to estimate the confidence interval for an approximated aggregated value. Samples are usually chosen until the confidence interval is sufficiently small, regardless of how the approximated query answers will be used (for example, in interactive visualizations). In this report, we show how to exploit visualization-specific properties to reduce the sampling complexity of a sampling-based approximate query processing algorithm while preserving certain visualization guarantees (the visual property of relative ordering) with a very high probability. (A worked example of a Hoeffding-style sample-size bound appears after this listing.) | (pdf) |
Code Relatives: Detecting Similar Software Behavior | Fang-Hsiang Su, Kenneth Harvey, Simha Sethumadhavan, Gail Kaiser, Tony Jebara | 2015-08-28 | Detecting “similar code” is fundamental to many software engineering tasks. Current tools can help detect code with statically similar syntactic features (code clones). Unfortunately, some code fragments that behave alike without similar syntax may be missed. In this paper, we propose the term “code relatives” to refer to code with dynamically similar execution features. Code relatives can be used for such tasks as implementation-agnostic code search and classification of code with similar behavior for human understanding, which code clone detection cannot achieve. To detect code relatives, we present DyCLINK, which constructs an approximate runtime representation of code using a dynamic instruction graph. With our link analysis based subgraph matching algorithm, DyCLINK detects fine-grained code relatives efficiently. In our experiments, DyCLINK analyzed 290+ million prospective subgraph matches. The results show that DyCLINK detects not only code relatives, but also code clones that the state-of-the-art system is unable to identify. In a code classification problem, DyCLINK achieved 96% precision on average compared with the competitor’s 61%. | (pdf) |
Dynamic Taint Tracking for Java with Phosphor (Formal Tool Demonstration) | Jonathan Bell, Gail Kaiser | 2015-04-07 | Dynamic taint tracking is an information flow analysis that can be applied to many areas of testing. Phosphor is the first portable, accurate and performant dynamic taint tracking system for Java. While previous systems for performing general-purpose taint tracking in the JVM required specialized research JVMs, Phosphor works with standard off-the-shelf JVMs (such as Oracle's HotSpot and OpenJDK's IcedTea). Phosphor also differs from previous portable JVM taint tracking systems that were not general purpose (e.g. tracked only tags on Strings and no other type), in that it tracks tags on all variables. We have also made several enhancements to Phosphor, allowing it to track taint tags through control flow (in addition to data flow), as well as allowing it to track an arbitrary number of relationships between taint tags (rather than be limited to only 32 tags). In this demonstration, we show how developers writing testing tools can benefit from Phosphor, and explain briefly how to interact with it. | (pdf) |
Hardware Synthesis from a Recursive Functional Language | Kuangya Zhai, Richard Townsend, Lianne Lairmore, Martha A. Kim, Stephen A. Edwards | 2015-04-01 | Abstraction in hardware description languages stalled at the register-transfer level decades ago, yet few alternatives have had much success, in part because they provide only modest gains in expressivity. We propose to make a much larger jump: a compiler that synthesizes hardware from behavioral functional specifications. Our compiler translates general Haskell programs into a restricted intermediate representation before applying a series of semantics-preserving transformations, concluding with a simple syntax-directed translation to SystemVerilog. Here, we present the overall framework for this compiler, focusing on the IRs involved and our method for translating general recursive functions into equivalent hardware. We conclude with experimental results that depict the performance and resource usage of the circuitry generated with our compiler. | (pdf) |
M2: Multi-Mobile Computing | Naser AlDuaij, Alexander Van't Hof, Jason Nieh | 2015-03-31 | With the widespread use of mobile systems, there is a growing demand for apps that can enable users to collaboratively use multiple mobile systems, including hardware device features such as cameras, displays, speakers, microphones, sensors, and input. We present M2, a system for multi-mobile computing by enabling remote sharing and combining of devices across multiple mobile systems. M2 leverages higher-level device abstractions and encoding and decoding hardware in mobile systems to define a cross-platform interface for remote device sharing to operate seamlessly across heterogeneous mobile hardware and software. M2 can be used to build new multi-mobile apps as well as make existing unmodified apps multi-mobile aware through the use of fused devices, which transparently combine multiple devices into a more capable one. We have implemented an M2 prototype on Android that operates across heterogeneous hardware and software, including using Android and iOS remote devices, the latter allowing iOS users to also run Android apps. Our results using unmodified apps from Google Play show that M2 can enable even display-intensive 2D and 3D games to use remote devices across multiple mobile systems with modest overhead and qualitative performance indistinguishable from using local device hardware. | (pdf) |
Dynamic Inference of Likely Metamorphic Properties to Support Differential Testing | Fang-Hsiang Su, Jonathan Bell, Christian Murphy, Gail Kaiser | 2015-02-27 | Metamorphic testing is an advanced technique for testing programs and applications that lack a test oracle, such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, KABU, which can dynamically infer properties that describe the characteristics of a program before and after transforming its input at the method level. Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them. This paper also proposes a new testing concept, Metamorphic Differential Testing (MDT). By comparing the MPs between different versions of the same application, KABU can detect potential bugs in the program. We have performed a preliminary evaluation of KABU by comparing the MPs detected by humans with the MPs detected by KABU. Our preliminary results are very promising: KABU can find more MPs than human developers, and its differential testing mechanism is effective at detecting function changes in programs. | (pdf) |
DisCo: Displays that Communicate | Kensei Jo, Mohit Gupta, Shree Nayar | 2014-12-16 | We present DisCo, a novel display-camera communication system. DisCo enables displays and cameras to communicate with each other, while also displaying and capturing images for human consumption. Messages are transmitted by temporally modulating the display brightness at high frequencies so that they are imperceptible to humans. Messages are received by a rolling shutter camera which converts the temporally modulated incident light into a spatial flicker pattern. In the captured image, the flicker pattern is superimposed on the pattern shown on the display. The flicker and the display pattern are separated by capturing two images with different exposures. The proposed system performs robustly in challenging real-world situations such as occlusion, variable display size, defocus blur, perspective distortion and camera rotation. Unlike several existing visible light communication methods, DisCo works with off-the-shelf image sensors. It is compatible with a variety of sources (including displays, single LEDs), as well as reflective surfaces illuminated with light sources. We have built hardware prototypes that demonstrate DisCo’s performance in several scenarios. Because of its robustness, speed, ease of use and generality, DisCo can be widely deployed in several CHI applications, such as advertising, pairing of displays with cell-phones, tagging objects in stores and museums, and indoor navigation. | (pdf) |
The Internet is a Series of Tubes | Henning Schulzrinne | 2014-11-28 | This is a contribution for the November 2014 Dagstuhl workshop on affordable Internet access. The contribution describes the issues of availability, affordability and relevance, with a particular focus on the experience with providing universal broadband Internet access in the United States. | (pdf) |
Making Lock-free Data Structures Verifiable with Artificial Transactions | Xinhao Yuan, David Williams-King, Junfeng Yang, Simha Sethumadhavan | 2014-11-11 | Among all classes of parallel programming abstractions, lock-free data structures are considered one of the most scalable and efficient because of their fine-grained style of synchronization. However, they are also challenging for developers and tools to verify because of the huge number of possible interleavings that result from fine-grained synchronizations. This paper addresses this fundamental tension between the performance and verifiability of lock-free data structures. We present TXIT, a system that greatly reduces the set of possible interleavings by inserting transactions into the implementation of a lock-free data structure. We leverage hardware transactional memory support from Intel Haswell processors to enforce these artificial transactions. Evaluation on six popular lock-free data structures shows that TXIT makes it easy to verify lock-free data structures while incurring acceptable runtime overhead. Further analysis shows that two inefficiencies in Haswell are the largest contributors to this overhead. | (pdf) |
Metamorphic Runtime Checking of Applications Without Test Oracles | Jonathan Bell, Chris Murphy, Gail Kaiser | 2014-10-20 | For some applications, it is impossible or impractical to know what the correct output should be for an arbitrary input, making testing difficult. Many machine-learning applications for “big data”, bioinformatics and cyberphysical systems fall in this scope: they do not have a test oracle. Metamorphic Testing, a simple testing technique that does not require a test oracle, has been shown to be effective for testing such applications. We present Metamorphic Runtime Checking, a novel approach that conducts metamorphic testing of both the entire application and individual functions during a program’s execution. We have applied Metamorphic Runtime Checking to 9 machine-learning applications, finding it to be on average 170% more effective than traditional metamorphic testing at only the full application level. | (pdf) |
Repeatable Reverse Engineering for the Greater Good with PANDA | Brendan Dolan-Gavitt, Josh Hodosh, Patrick Hulin, Tim Leek, Ryan Whelan | 2014-10-01 | We present PANDA, an open-source tool that has been purpose-built to support whole system reverse engineering. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data. PANDA adds the ability to record and replay executions, enabling iterative, deep, whole system analyses. Further, the replay log files are compact and shareable, allowing for repeatable experiments. A nine billion instruction boot of FreeBSD, e.g., is represented by only a few hundred MB. Further, PANDA leverages QEMU's support of thirteen different CPU architectures to make analyses of those diverse instruction sets possible within the LLVM IR. In this way, PANDA can have a single dynamic taint analysis, for example, that precisely supports many CPUs. PANDA analyses are written in a simple plugin architecture which includes a mechanism to share functionality between plugins, increasing analysis code re-use and simplifying complex analysis development. We demonstrate PANDA's effectiveness via a number of use cases, including enabling an old but legitimate version of Starcraft to run despite a lost CD key, in-depth diagnosis of an Internet Explorer crash, and uncovering the censorship activities and mechanisms of a Chinese IM client. | (pdf) |
Detecting, Isolating and Enforcing Dependencies Between and Within Test Cases | Jonathan Bell | 2014-07-06 | Testing stateful applications is challenging, as it can be difficult to identify hidden dependencies on program state. These dependencies may manifest between several test cases, or simply within a single test case. When it's left to developers to document, understand, and respond to these dependencies, a mistake can result in unexpected and invalid test results. Although testing infrastructure does not currently leverage state dependency information, we argue that it could, and that by doing so testing can be improved. Our results thus far show that by recovering dependencies between test cases and modifying the popular testing framework, JUnit, to utilize this information, we can optimize the testing process, reducing the time needed to run tests by 62% on average. Our ongoing work is to apply similar analyses to improve existing state of the art test suite prioritization techniques and state of the art test case generation techniques. This work is advised by Professor Gail Kaiser. | (pdf) |
Phasor Imaging: A Generalization of Correlation-Based Time-of-Flight Imaging | Mohit Gupta, Shree Nayar, Matthias Hullin, Jaime Martin | 2014-06-26 | In correlation-based time-of-flight (C-ToF) imaging systems, light sources with temporally varying intensities illuminate the scene. Due to global illumination, the temporally varying radiance received at the sensor is a combination of light received along multiple paths. Recovering scene properties (e.g., scene depths) from the received radiance requires separating these contributions, which is challenging due to the complexity of global illumination and the additional temporal dimension of the radiance. We propose phasor imaging, a framework for performing fast inverse light transport analysis using C-ToF sensors. Phasor imaging is based on the idea that by representing light transport quantities as phasors and light transport events as phasor transformations, light transport analysis can be simplified in the temporal frequency domain. We study the effect of temporal illumination frequencies on light transport, and show that for a broad range of scenes, global radiance (multi-path interference) vanishes for frequencies higher than a scene-dependent threshold. We use this observation for developing two novel scene recovery techniques. First, we present Micro ToF imaging, a ToF based shape recovery technique that is robust to errors due to global illumination. Second, we present a technique for separating the direct and global components of radiance. Both techniques require capturing as few as 3-4 images and minimal computations. We demonstrate the validity of the presented techniques via simulations and experiments performed with our hardware prototype. | (pdf) |
Schur complement trick for positive semi-definite energies | Alec Jacobson | 2014-06-12 | The “Schur complement trick” appears sporadically in numerical optimization methods [Schur 1917; Cottle 1974]. The trick is especially useful for solving Lagrangian saddle point problems when minimizing quadratic energies subject to linear equality constraints [Gill et al. 1987]. Typically, to apply the trick, the energy’s Hessian is assumed positive definite. I generalize this technique for positive semi-definite Hessians. (A sketch of the standard positive-definite case appears after this listing.) | (pdf) |
Model Aggregation for Distributed Content Anomaly Detection | Sean Whalen, Nathaniel Boggs, Salvatore J. Stolfo | 2014-06-02 | Cloud computing offers a scalable, low-cost, and resilient platform for critical applications. Securing these applications against attacks targeting unknown vulnerabilities is an unsolved challenge. Network anomaly detection addresses such zero-day attacks by modeling attributes of attack-free application traffic and raising alerts when new traffic deviates from this model. Content anomaly detection (CAD) is a variant of this approach that models the payloads of such traffic instead of higher level attributes. Zero-day attacks then appear as outliers to properly trained CAD sensors. In the past, CAD was unsuited to cloud environments due to the relative overhead of content inspection and the dynamic routing of content paths to geographically diverse sites. We challenge this notion and introduce new methods for efficiently aggregating content models to enable scalable CAD in dynamically-pathed environments such as the cloud. These methods eliminate the need to exchange raw content, drastically reduce network and CPU overhead, and offer varying levels of content privacy. We perform a comparative analysis of our methods using Random Forest, Logistic Regression, and Bloom Filter-based classifiers for operation in the cloud or other distributed settings such as wireless sensor networks. We find that content model aggregation offers statistically significant improvements over non-aggregate models with minimal overhead, and that distributed and non-distributed CAD have statistically indistinguishable performance. Thus, these methods enable the practical deployment of accurate CAD sensors in a distributed attack detection infrastructure. | (pdf) |
Vernam, Mauborgne, and Friedman: The One-Time Pad and the Index of Coincidence | Steven M. Bellovin | 2014-05-13 | The conventional narrative for the invention of the AT&T one-time pad was related by David Kahn. Based on the evidence available in the AT&T patent files and from interviews and correspondence, he concluded that Gilbert Vernam came up with the need for randomness, while Joseph Mauborgne realized the need for a non-repeating key. Examination of other documents suggests a different narrative. It is most likely that Vernam came up with the need for non-repetition; Mauborgne, though, apparently contributed materially to the invention of the two-tape variant. Furthermore, there is reason to suspect that he suggested the need for randomness to Vernam. However, neither Mauborgne, Herbert Yardley, nor anyone at AT&T really understood the security advantages of the true one-time tape. Col. Parker Hitt may have; William Friedman definitely did. Finally, we show that Friedman's attacks on the two-tape variant likely led to his invention of the index of coincidence, arguably the single most important publication in the history of cryptanalysis. | (pdf) |
Exploring Societal Computing based on the Example of Privacy | Swapneel Sheth | 2014-04-25 | Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis will consist of the following four projects that aim to address the issues of privacy and software engineering. First, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives. Second, a challenge related to the above is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd sourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17 month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach we conducted an online survey with closed and open questions and collected 59 valid responses after which we conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowd sourced data useful for understanding privacy settings, and 80% preferred a crowd sourced tool for configuring their privacy settings over current privacy controls. Third, as software evolves over time, this might introduce bugs that breach users’ privacy. Further, there might be system-wide policy changes that could change users’ settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective. Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable — systems where privacy could be achieved “for free,” i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free — an accidental and beneficial side effect of doing some existing computation — in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole. | (pdf) |
A Convergence Study of Multimaterial Mesh-based Surface Tracking | Fang Da, Christopher Batty, Eitan Grinspun | 2014-04-14 | We report the results from experiments on the convergence of the multimaterial mesh-based surface tracking method introduced by the same authors. Under mesh refinement, approximately first order convergence or higher in L1 and L2 is shown for vertex positions, face normals and non-manifold junction curves in a number of scenarios involving the new operations proposed in the method. | (pdf) |
The Economics of Cyberwar | Steven M. Bellovin | 2014-04-11 | Cyberwar is very much in the news these days. It is tempting to try to understand the economics of such an activity, if only qualitatively. What effort is required? What can such attacks accomplish? What does this say, if anything, about the likelihood of cyberwar? | (pdf) |
Energy Exchanges: Internal Power Oversight for Applications | Melanie Kambadur, Martha A. Kim | 2014-04-08 | This paper introduces energy exchanges, a set of abstractions that allow applications to help hardware and operating systems manage power and energy consumption. Using annotations, energy exchanges dictate when, where, and how to trade performance or accuracy for power in ways that only an application’s developer can decide. In particular, the abstractions offer audits and budgets which watch and cap the power or energy of some piece of the application. The interface also exposes energy and power usage reports which an application may use to change its behavior. Such information complements existing system-wide energy management by operating systems or hardware, which provide global fairness and protections, but are unaware of the internal dynamics of an application. Energy exchanges are implemented as a user-level C++ library. The library employs an accounting technique to attribute shares of system-wide energy consumption (provided by system-wide hardware energy meters available on newer hardware platforms) to individual application threads. With these per-thread meters and careful tracking of an application’s activity, the library exposes energy and power usage for program regions of interest via the energy exchange abstractions with negligible runtime or power overhead. We use the library to demonstrate three applications of energy exchanges: (1) the prioritization of a mobile game’s energy use over third-party advertisements, (2) dynamic adaptations of the framerate of a video tracking benchmark that maximize performance and accuracy within the confines of a given energy allotment, and (3) the triggering of computational sprints and corresponding cooldowns, based on time, system TDP, and power consumption. | (pdf) |
Phosphor: Illuminating Dynamic Data Flow in the JVM | Jonathan Bell, Gail Kaiser | 2014-03-25 | Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to inputs, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, accuracy, and portability. Performance can be critical, as these systems are typically intended to be deployed with software, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be accurate and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code. We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, accuracy, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two versions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its performance to be impressive: as low as 3% (53% on average), using the DaCapo macro benchmark suite. This paper describes the approach that Phosphor uses to achieve portable taint tracking in the JVM. | (pdf) |
Enhancing Security by Diversifying Instruction Sets | Kanad Sinha, Vasileios Kemerlis, Vasilis Pappas, Simha Sethumadhavan, Angelos Keromytis | 2014-03-20 | Despite the variety of choices regarding hardware and software, to date a large number of computer systems remain identical. Characteristic examples of this trend are Windows on x86 and Android on ARM. This homogeneity, sometimes referred to as “computing oligoculture”, provides a fertile ground for malware in the highly networked world of today. One way to counter this problem is to diversify systems so that attackers cannot quickly and easily compromise a large number of machines. For instance, if each system has a different ISA, the attacker has to invest more time in developing exploits that run on every system manifestation. It is not that each individual attack gets harder, but the spread of malware slows down. Further, if the diversified ISA is kept secret from the attacker, the bar for exploitation is raised even higher. In this paper, we show that system diversification can be realized by enabling diversity at the lowest hardware/software interface, the ISA, with almost zero performance overhead. We also describe how practical development and deployment problems of diversified systems can be handled easily in the context of popular software distribution models, such as the mobile app store model. We demonstrate our proposal with an OpenSPARC FPGA prototype. | (pdf) |
Teaching Microarchitecture through Metaphors | Julianna Eum, Simha Sethumadhavan | 2014-03-19 | Students traditionally learn microarchitecture by studying textual descriptions with diagrams but few analogies. Several popular textbooks on this topic introduce concepts such as pipelining and caching in the context of simple paper-only architectures. While this instructional style allows important concepts to be covered within a given class period, students have difficulty bridging the gap between what is covered in classes and real-world implementations. Discussing concrete implementations and complications would, however, take too much time. In this paper, we propose a technique of representing microarchitecture building blocks with animated metaphors to accelerate the process of learning about complex microarchitectures. We represent hardware implementations as road networks that include specific patterns of traffic flow found in microarchitectural behavior. Our experiences indicate an 83% improvement in understanding memory system microarchitecture. We believe the mental models developed by these students will serve them in remembering microarchitectural behavior and extend to learning new microarchitectures more easily. | (pdf) |
A Red Team/Blue Team Assessment of Functional Analysis Methods for Malicious Circuit Identification | Adam Waksman, Jeyavijayan Rajendran, Matthew Suozzo, Simha Sethumadhavan | 2014-03-05 | Recent advances in hardware security have led to the development of FANCI (Functional Analysis for Nearly-Unused Circuit Identification), an analysis tool that identifies stealthy, malicious circuits within hardware designs that can perform malicious backdoor behavior. Evaluations of such tools against benchmarks and academic attacks are not always equivalent to the dynamic attack scenarios that can arise in the real world. For this reason, we apply a red team/blue team approach to stress-test FANCI's abilities to efficiently detect malicious backdoor circuits within hardware designs. In the Embedded Systems Challenge (ESC) 2013, teams from research groups from multiple continents created designs with malicious backdoors hidden in them as part of a red team effort to circumvent FANCI. Notably, these backdoors were not placed into a priori known designs. The red team was allowed to create arbitrary, unspecified designs. Two interesting results came out of this effort. The first was that FANCI was surprisingly resilient to this wide variety of attacks and was not circumvented by any of the stealthy backdoors created by the red teams. The second result is that frequent-action backdoors, which are backdoors that are not made stealthy, were often successful. These results emphasize the importance of combining FANCI with a reasonable degree of validation testing. The blue team efforts also exposed some aspects of the FANCI prototype that make analysis time-consuming in some cases, which motivates further development of the prototype in the future. | (pdf) (ps) |
Unsupervised Anomaly-based Malware Detection using Hardware Features | Adrian Tang, Simha Sethumadhavan, Salvatore Stolfo | 2014-03-01 | Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to a class of detectors known as signature-based detectors as they catch malware by comparing a program's execution pattern (signature) to execution patterns of known malware programs. In this work, we propose a new class of detectors - anomaly-based hardware malware detectors - that do not require signatures for malware detection, and thus can catch a wider range of malware including potentially novel ones. We use unsupervised machine learning to build profiles of normal program execution based on data from performance counters, and use these profiles to detect significant deviations in program behavior that occur as a result of malware exploitation. We show that real-world exploitation of popular programs such as IE and Adobe PDF Reader on a Windows/x86 platform can be detected with nearly perfect certainty. We also examine the limits and challenges in implementing this approach in face of a sophisticated adversary attempting to evade anomaly-based detection. The proposed detector is complementary to previously proposed signature-based detectors and can be used together to improve security. | (pdf) |
Enabling the Virtual Phones to remotely sense the Real Phones in real-time ~ A Sensor Emulation initiative for virtualized Android-x86 ~ | Raghavan Santhanam | 2014-02-13 | Smartphones nowadays have ground-breaking features that were once only a figment of one’s imagination. For ever-demanding cellphone users, the list of features that a smartphone supports just keeps growing with time. These features aid one’s personal and professional uses as well. Extrapolating the features of a present-day smartphone into the future, the lives of us humans using smartphones are going to be unimaginably agile. With this emphasis on the current and future potential of a smartphone, the ability to virtualize smartphones with all their real-world features into a virtual platform is a boon for those who want to rigorously experiment with and customize the virtualized smartphone hardware without spending an extra penny. Once virtualizable independently on a larger scale, the idea of virtualized smartphones with all the virtualized pieces of hardware takes an interesting turn with the sensors being virtualized in a way that’s closer to the real-world behavior. When accessible remotely with real-time responsiveness, this real-world behavior will be a real dealmaker in many real-world systems, namely life-saving systems like the ones that instantaneously get alerts about harmful magnetic radiation in deep mining areas. These life-saving systems would be installed on a large scale on desktops or large servers as virtualized smartphones with the added support of virtualized sensors that remotely fetch the real hardware sensor readings from a real smartphone in real time. Based on these readings, the people working in the affected areas can be alerted and thus saved by those operating the desktops or large servers hosting the virtualized smartphones. In addition, one of the biggest and most direct advantages of such a real-hardware-sensor-driven Sensor Emulation in an emulated Android(-x86) environment is that Android applications that use sensors can now run on the emulator and act under the influence of real hardware sensors due to the emulated sensors. The current work of Sensor Emulation is unique when compared to existing and past sensor-related works. The uniqueness comes from the full-fledged sensor emulation in a virtualized smartphone environment, as opposed to building sophisticated physical systems that usually aggregate sensor readings from real hardware sensors, possibly remotely and in real time. For example, wireless-sensor-network-based remote-sensing systems install real hardware sensors in remote places and collect the readings from all those sensors at a centralized server or something similar for the necessary real-time or offline analysis. In these systems, apart from collecting mere real hardware sensor readings into a centralized entity, nothing more is achieved, unlike in the current work of Sensor Emulation, wherein the emulated sensors behave exactly like the remote real hardware sensors. The emulated sensors can be calibrated, sped up or slowed down (in terms of their sampling frequency), and influence the sensor-based application running inside the virtualized smartphone environment exactly as the real hardware sensors of a real phone would influence the sensor-based application running in that real phone. In essence, the current work is more about generalizing the sensors with all their real-world characteristics as far as possible in a virtualized platform than just a framework to send and receive sensor readings over the network between the real and virtual phones. Realizing the useful advantages of Sensor Emulation, which is about adding virtualized sensor support to emulated environments, the current work emulates a total of ten sensors present in the real smartphone, the Samsung Nexus S, an Android device. Virtual phones run Android-x86 while real phones run Android. The real reason behind choosing Android-x86 for the virtual phone is that x86-based Android devices are feature-rich compared to ARM-based ones; for example, a full-fledged x86 desktop or tablet has more features than a relatively small smartphone. Out of the ten, five are real sensors and the rest are virtual or synthetic ones. The real ones are Accelerometer, Magnetometer, Light, Proximity, and Gyroscope whereas the virtual ones are Corrected Gyroscope, Linear Acceleration, Gravity, Orientation, and Rotation Vector. The emulated Android-x86 is of Android release version Jelly Bean 4.3.1, which differs only very slightly in terms of bug fixes from Android Jelly Bean 4.3 running on the real smartphone. One of the noteworthy aspects of the Sensor Emulation accomplished is that it is demand-less: exactly the same sensor-based Android applications will be able to use the sensors on the real and virtual phones, with absolutely no difference in terms of their sensor-based behavior. The emulation’s core idea is driver-agnostic socket communication between two modules of the Hardware Abstraction Layer (HAL), carried out remotely over a wireless network in real time. Apart from a paired real-device scenario from which the real hardware sensor readings are fetched, the Sensor Emulation is also compatible with a remote server scenario wherein artificially generated sensor readings are fetched from a remote server. Because the Sensor Emulation is built on mere end-to-end socket communication, the real and virtual phones can run different Android(-x86) releases with no real impact on the Sensor Emulation being accomplished. Sensor Emulation, once completed, was evaluated for each of the emulated sensors using applications from the Android Market as well as the Amazon Appstore. The applications include both basic sensor-test applications that show raw sensor readings and advanced 3D sensor-driven games that are emulator-compatible, especially in terms of graphics. The evaluations proved the current work of Sensor Emulation to be generic, efficient, robust, fast, accurate, and real. As of this writing, i.e., January 2014, the current work of Sensor Emulation is the sole system-level sensor virtualization work that embraces remoteness in real time for emulated Android-x86 systems. It is important to note that though the current work targets Android-x86, the code written for it makes no assumption that the underlying platform is an x86 one. Hence, the work is also logically compatible with an ARM-based emulated Android environment, though this was not actually tested. | (pdf) |
Towards A Dynamic QoS-aware Over-The-Top Video Streaming in LTE | Hyunwoo Nam, Kyung Hwa Kim, Bong Ho Kim, Doru Calin, Henning Schulzrinne | 2014-01-16 | We present a study of traffic behavior of two popular over-the-top (OTT) video streaming services (YouTube and Netflix). Our analysis is conducted on different mobile devices (iOS and Android) over various wireless networks (Wi-Fi, 3G and LTE) under dynamic network conditions. Our measurements show that the video players frequently discard a large amount of video content although it is successfully delivered to a client. We first investigate the root cause of this unwanted behavior. Then, we propose a Quality-of-Service (QoS)-aware video streaming architecture in Long Term Evolution (LTE) networks to reduce the waste of network resource and improve user experience. The architecture includes a selective packet discarding mechanism, which can be placed in packet data network gateways (P-GW). In addition, our QoS-aware rules assist video players in selecting an appropriate resolution under a fluctuating channel condition. We monitor network condition and configure QoS parameters to control availability of the maximum bandwidth in real time. In our experimental setup, the proposed platform shows up to 20.58% improvement in saving downlink bandwidth and improves user experience by reducing buffer underflow period to an average of 32 seconds. | (pdf) |
Towards Dynamic Network Condition-Aware Video Server Selection Algorithms over Wireless Networks | Hyunwoo Nam, Kyung-Hwa Kim, Doru Calin, Henning Schulzrinne | 2014-01-16 | We investigate video server selection algorithms in a distributed video-on-demand system. We conduct a detailed study of the YouTube Content Delivery Network (CDN) on PCs and mobile devices over Wi-Fi and 3G networks under varying network conditions. We proved that a location-aware video server selection algorithm assigns a video content server based on the network attachment point of a client. We found out that such distance-based algorithms carry the risk of directing a client to a less optimal content server, although there may exist other better performing video delivery servers. In order to solve this problem, we propose to use dynamic network information such as packet loss rates and Round Trip Time (RTT) between an edge node of a wireless network (e.g., an Internet Service Provider (ISP) router in a Wi-Fi network and a Radio Network Controller (RNC) node in a 3G network) and video content servers, to find the optimal video content server when a video is requested. Our empirical study shows that the proposed architecture can provide higher TCP performance, leading to better viewing quality compared to location-based video server selection algorithms. | (pdf) |
Approximating the Bethe partition function | Adrian Weller, Tony Jebara | 2013-12-30 | When belief propagation (BP) converges, it does so to a stationary point of the Bethe free energy $\mathcal{F}$, and is often strikingly accurate. However, it may converge only to a local optimum or may not converge at all. An algorithm was recently introduced for attractive binary pairwise MRFs which is guaranteed to return an $\epsilon$-approximation to the global minimum of $\mathcal{F}$ in polynomial time provided the maximum degree $\Delta=O(\log n)$, where $n$ is the number of variables. Here we significantly improve this algorithm and derive several results including a new approach based on analyzing first derivatives of $\mathcal{F}$, which leads to performance that is typically far superior and yields a fully polynomial-time approximation scheme (FPTAS) for attractive models without any degree restriction. Further, the method applies to general (non-attractive) models, though with no polynomial time guarantee in this case, leading to the important result that approximating $\log$ of the Bethe partition function, $\log Z_B = -\min \mathcal{F}$, for a general model to additive $\epsilon$-accuracy may be reduced to a discrete MAP inference problem. We explore an application to predicting equipment failure on an urban power network and demonstrate that the Bethe approximation can perform well even when BP fails to converge. | (pdf) |
A Gameful Approach to Teaching Software Design and Software Testing | Swapneel Sheth, Jonathan Bell, Gail Kaiser | 2013-12-13 | Introductory computer science courses traditionally focus on exposing students to basic programming and computer science theory, leaving little or no time to teach students about software testing. Many students’ mental model when they start learning programming is that “if it compiles and runs without crashing, it must work fine.” Thus exposure to testing, even at a very basic level, can be very beneficial to the students. In the short term, they will do better on their assignments, as testing before submission might help them discover bugs in their implementation that they hadn’t realized were there. In the long term, they will appreciate the importance of testing as part of the software development life cycle. | (pdf) |
A Gameful Approach to Teaching Software Design and Software Testing --- Assignments and Quests | Swapneel Sheth, Jonathan Bell, Gail Kaiser | 2013-12-11 | We describe how we used HALO in a CS2 classroom and include the assignments and quests created. | (pdf) |
Heterogeneous Access: Survey and Design Considerations | Amandeep Singh, Gaston Ormazabal, Sateesh Addepalli, Henning Schulzrinne | 2013-10-25 | As voice, multimedia, and data services are converging to IP, there is a need for a new networking architecture to support future innovations and applications. Users are consuming Internet services from multiple devices that have multiple network interfaces such as Wi-Fi, LTE, Bluetooth, and possibly wired LAN. Such diverse network connectivity can be used to increase both reliability and performance by running applications over multiple links, sequentially for seamless user experience, or in parallel for bandwidth and performance enhancements. The existing networking stack, however, offers almost no support for intelligently exploiting such network, device, and location diversity. In this work, we survey recently proposed protocols and architectures that enable heterogeneous networking support. Upon evaluation, we abstract common design patterns and propose a unified networking architecture that makes better use of a heterogeneous dynamic environment, both in terms of networks and devices. The architecture enables mobile nodes to make intelligent decisions about how and when to use each or a combination of networks, based on access policies. With this new architecture, we envision a shift from current applications, which support a single network, location, and device at a time to applications that can support multiple networks, multiple locations, and multiple devices. | (pdf) |
Functioning Hardware from Functional Programs | Stephen A. Edwards | 2013-10-08 | To provide high performance at practical power levels, tomorrow's chips will have to consist primarily of application-specific logic that is only powered on when needed. This paper discusses synthesizing such logic from the functional language Haskell. The proposed approach, which consists of rewriting steps that ultimately dismantle the source program into a simple dialect that enables a syntax-directed translation to hardware, enables aggressive parallelization and the synthesis of application-specific distributed memory systems. Transformations include scheduling arithmetic operations onto specific data paths, replacing recursion with iteration, and improving data locality by inlining recursive types. A compiler based on these principles is under development. | (pdf) |
N Heads are Better than One | Morris Hopkins, Mauricio Castaneda, Swapneel Sheth, Gail Kaiser | 2013-10-04 | Social network platforms have transformed how people communicate and share information. However, as these platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd sourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17 month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach we conducted an online survey with closed and open questions and collected 50 valid responses after which we conducted follow-up interviews with 10 respondents. Our results showed that 80% of users found visualizations using crowd sourced data useful for understanding privacy settings, and 70% preferred a crowd sourced tool for configuring their privacy settings over current privacy controls. | (pdf) |
Us and Them --- A Study of Privacy Requirements Across North America, Asia, and Europe | Swapneel Sheth, Gail Kaiser, Walid Maalej | 2013-09-15 | Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. However, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and personal data such as location than their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives. | (pdf) |
Unit Test Virtualization with VMVM | Jonathan Bell, Gail Kaiser | 2013-09-13 | Testing large software packages can become very time intensive. To address this problem, researchers have investigated techniques such as Test Suite Minimization. Test Suite Minimization reduces the number of tests in a suite by removing tests that appear redundant, at the risk of a reduction in fault-finding ability since it can be difficult to identify which tests are truly redundant. We take a completely different approach to solving the same problem of long running test suites by instead reducing the time needed to execute each test, an approach that we call Unit Test Virtualization. With Unit Test Virtualization, we reduce the overhead of isolating each unit test with a lightweight virtualization container. We describe the empirical analysis that grounds our approach and provide an implementation of Unit Test Virtualization targeting Java applications. We evaluated our implementation, VMVM, using 20 real-world Java applications and found that it reduces test suite execution time by up to 97% (on average, 62%) when compared to traditional unit test execution. We also compared VMVM to a well known Test Suite Minimization technique, finding the reduction provided by VMVM to be four times greater, while still executing every test with no loss of fault-finding ability. | (pdf) |
Metamorphic Runtime Checking of Applications without Test Oracles | Christian Murphy, Gail Kaiser, Jonathan Bell, Fang-Hsiang Su | 2013-09-13 | Challenges arise in testing applications that do not have test oracles, i.e., for which it is impossible or impractical to know what the correct output should be for general input. Metamorphic testing, introduced by Chen et al., has been shown to be a simple yet effective technique in testing these types of applications: test inputs are transformed in such a way that it is possible to predict the expected change to the output, and if the output resulting from this transformation is not as expected, then a fault must exist. Here, we improve upon previous work by presenting a new technique called Metamorphic Runtime Checking, which automatically conducts metamorphic testing of both the entire application and individual functions during a program's execution. This new approach improves the scope, scale, and sensitivity of metamorphic testing by allowing for the identification of more properties and execution of more tests, and increasing the likelihood of detecting faults not found by application-level properties. We also present the results of new mutation analysis studies that demonstrate that Metamorphic Runtime Checking can kill an average of 170% more mutants than traditional, application-level metamorphic testing alone, and advances the state of the art in testing applications without oracles. | (pdf) |
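The core idea behind function-level metamorphic checking can be illustrated with a short, self-contained sketch. The Python fragment below is not the authors' Metamorphic Runtime Checking implementation (which instruments a running program); it only shows, with assumed illustrative names such as `check_permutation_property`, how a property like "permuting the input leaves the output unchanged" can serve as a test oracle for a function (here, the arithmetic mean) whose correct output is otherwise hard to verify for arbitrary input.

```python
import math
import random

def mean(xs):
    # Function under test: for arbitrary input there is no obvious oracle for its output.
    return sum(xs) / len(xs)

def check_permutation_property(f, xs, trials=10):
    # Metamorphic property: permuting the input must leave the output unchanged.
    expected = f(xs)
    for _ in range(trials):
        shuffled = xs[:]
        random.shuffle(shuffled)
        if not math.isclose(f(shuffled), expected, rel_tol=1e-9):
            return False  # property violated, so a fault must exist
    return True

def check_additive_property(f, xs, c=10.0):
    # Metamorphic property: adding a constant c to every element shifts the mean by c.
    return math.isclose(f([x + c for x in xs]), f(xs) + c, rel_tol=1e-9)

if __name__ == "__main__":
    data = [random.uniform(0, 100) for _ in range(1000)]
    print("permutation property holds:", check_permutation_property(mean, data))
    print("additive property holds:", check_additive_property(mean, data))
```

A runtime checker in the spirit of the paper would apply such checks to individual functions as the program executes, rather than only at the application level.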
Effectiveness of Teaching Metamorphic Testing, Part II | Kunal Swaroop Mishra, Gail E. Kaiser, Swapneel K. Sheth | 2013-07-31 | We study the ability of students in a senior/graduate software engineering course to understand and apply metamorphic testing, a relatively recently invented advance in software testing research that complements conventional approaches such as equivalence partitioning and boundary analysis. We previously reported our investigation of the fall 2011 offering of the Columbia University course COMS W4156 Advanced Software Engineering, and here report on the fall 2012 offering and contrast it to the previous year. Our main findings are: 1) Although the students in the second offering did not do very well on the newly added individual assignment specifically focused on metamorphic testing, thereafter they were better able to find metamorphic properties for their team projects than the students from the previous year who did not have that preliminary homework and, perhaps most significantly, did not have the solution set for that homework. 2) Students in the second offering did reasonably well using the relatively novel metamorphic testing technique vs. traditional black box testing techniques in their projects (such comparison data is not available for the first offering). 3) Finally, in both semesters, the majority of the student teams were able to apply metamorphic testing to their team projects after only minimal instruction, which would imply that metamorphic testing is a viable strategy for student testers. | (pdf) |
On the Effectiveness of Traffic Analysis Against Anonymity Networks Using Flow Records | Sambuddho Chakravarty, Marco V. Barbera, Georgios Portokalidis, Michalis Polychronakis, Angelos D. Keromytis | 2013-07-18 | Low-latency anonymous communication networks, such as Tor, are geared towards web browsing, instant messaging, and other semi-interactive applications. To achieve acceptable quality of service, these systems attempt to preserve packet inter-arrival characteristics, such as inter-packet delay. Consequently, a powerful adversary can mount traffic analysis attacks by observing similar traffic patterns at various points of the network, linking together otherwise unrelated network connections. Previous research has shown that having access to a few Internet exchange points is enough for monitoring a significant percentage of the network paths from Tor nodes to destination servers. Although the capacity of current networks makes packet-level monitoring at such a scale quite challenging, adversaries could potentially use less accurate but readily available traffic monitoring functionality, such as Cisco's NetFlow, to mount large-scale traffic analysis attacks. In this paper, we assess the feasibility and effectiveness of practical traffic analysis attacks against the Tor network using NetFlow data. We present an active traffic analysis method based on deliberately perturbing the characteristics of user traffic at the server side, and observing a similar perturbation at the client side through statistical correlation. We evaluate the accuracy of our method using both in-lab testing, as well as data gathered from a public Tor relay serving hundreds of users. Our method revealed the actual sources of anonymous traffic with 100% accuracy for the in-lab tests, and achieved an overall accuracy of about 81.4% for the real-world experiments, with an average false positive rate of 6.4%. | (pdf) |
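As a rough illustration of the correlation step (not the authors' exact estimator), the sketch below compares per-interval byte counts of the deliberately perturbed server-side flow against candidate client-side flows using Pearson correlation; the function names and the 0.8 threshold are assumptions for illustration only. Requires NumPy.

```python
import numpy as np

def pearson_correlation(server_rates, client_rates):
    # Per-interval byte counts (e.g. derived from NetFlow records), sampled on the same time grid.
    s = np.asarray(server_rates, dtype=float)
    c = np.asarray(client_rates, dtype=float)
    return float(np.corrcoef(s, c)[0, 1])

def rank_candidates(server_rates, candidate_flows, threshold=0.8):
    # Return candidate client flows whose traffic pattern echoes the injected
    # server-side perturbation, strongest correlation first.
    scores = {flow_id: pearson_correlation(server_rates, rates)
              for flow_id, rates in candidate_flows.items()}
    return sorted(((fid, r) for fid, r in scores.items() if r >= threshold),
                  key=lambda kv: kv[1], reverse=True)
```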
A Mobile Video Traffic Analysis: Badly Designed Video Clients Can Waste Network Bandwidth | Hyunwoo Nam, Bong Ho Kim, Doru Calin, Henning Schulzrinne | 2013-07-08 | Video streaming on mobile devices is on the rise. According to recent reports, mobile video streaming traffic accounted for 52.8% of total mobile data traffic in 2011, and it is forecast to reach 66.4% in 2015. We analyzed the network traffic behaviors of the two most popular HTTP-based video streaming services: YouTube and Netflix. Our research indicates that the network traffic behavior depends on factors such as the type of device, the multimedia applications in use and the network conditions. Furthermore, we found that a large part of the downloaded video content can be discarded by the video player even though it is successfully delivered to the client. This wasteful behavior often occurs when the video player changes resolution under fluctuating network conditions and when the playout buffer is full while a video is downloading. Some of the measurements show that the discarded data may exceed 35% of the total video content. | (pdf) |
Energy Secure Architecture: A wish list | Simha Sethumadhavan | 2013-06-23 | Energy optimizations are being aggressively pursued today. Can these optimizations open up security vulnerabilities? In this invited talk at the Energy Secure System Architectures Workshop (run by Pradip Bose from IBM Watson research center) I discussed security implications of energy optimizations, capabilities of attackers, ease of exploitation, and potential payoff to the attacker. I presented a mini tutorial on security for computer architects, and a personal research wish list for this emerging topic. | (pdf) |
Principles and Techniques of Schlieren Imaging Systems | Amrita Mazumdar | 2013-06-19 | This paper presents a review of modern-day schlieren optics systems and their applications. Schlieren imaging systems provide a powerful technique for visualizing changes or nonuniformities in the refractive index of air or other transparent media. With the popularization of computational imaging techniques and the widespread availability of digital imaging systems, schlieren systems provide novel methods of viewing transparent fluid dynamics. This paper presents a historical background of the technique, describes the methodology behind the system, presents a mathematical proof of schlieren fundamentals, and lists various recent applications and advancements in schlieren studies. | (pdf) |
WiSlow: A WiFi Network Performance Troubleshooting Tool for End Users | Kyung Hwa Kim, Hyunwoo Nam, Henning Schulzrinne | 2013-05-29 | The increasing number of 802.11 APs and wireless devices results in more contention, which causes unsatisfactory WiFi network performance. In addition, non-WiFi devices sharing the same spectrum with 802.11 networks such as microwave ovens, cordless phones, and baby monitors severely interfere with WiFi networks. Although the problem sources can be easily removed in many cases, it is difficult for end users to identify the root cause. We introduce WiSlow, a software tool that diagnoses the root causes of poor WiFi performance with user-level network probes and leverages peer collaboration to identify the location of the causes. We elaborate on two main methods: packet loss analysis and 802.11 ACK pattern analysis. | (pdf) |
Connecting the Physical World with Arduino in SECE | Hyunwoo Nam, Jan Janak, Henning Schulzrinne | 2013-05-23 | The Internet of Things (IoT) enables the physical world to be connected and controlled over the Internet. This paper presents a smart gateway platform that connects everyday objects such as lights, thermometers, and TVs over the Internet. The proposed hardware architecture is implemented on an Arduino platform with a variety of off-the-shelf home automation technologies such as Zigbee and X10. Using the microcontroller-based platform, the SECE (Sense Everything, Control Everything) system allows users to create various IoT services such as monitoring sensors, controlling actuators, triggering action events, and periodic sensor reporting. We give an overview of the Arduino-based smart gateway architecture and its integration into SECE. | (pdf) |
Chameleon: Multi-Persona Binary Compatibility for Mobile Devices | Jeremy Andrus, Alexander Van't Hof, Naser AlDuaij, Christoffer Dall, Nicolas Viennot, Jason Nieh | 2013-04-08 | Mobile devices are vertically integrated systems that are powerful, useful platforms, but unfortunately limit user choice and lock users and developers into a particular mobile ecosystem, such as iOS or Android. We present Chameleon, a multi-persona binary compatibility architecture that allows mobile device users to run applications built for different mobile ecosystems together on the same smartphone or tablet. Chameleon enhances the domestic operating system of a device with personas to mimic the application binary interface of a foreign operating system to run unmodified foreign binary applications. To accomplish this without reimplementing the entire foreign operating system from scratch, Chameleon provides four key mechanisms. First, a multi-persona binary interface is used that can load and execute both domestic and foreign applications that use different sets of system calls. Second, compile-time code adaptation makes it simple to reuse existing unmodified foreign kernel code in the domestic kernel. Third, API interposition and passport system calls make it possible to reuse foreign user code together with domestic kernel facilities to support foreign kernel functionality in user space. Fourth, schizophrenic processes allow foreign applications to use domestic libraries to access proprietary software and hardware interfaces on the device. We have built a Chameleon prototype and demonstrate that it imposes only modest performance overhead and can run iOS applications from the Apple App Store together with Android applications from Google Play on a Nexus 7 tablet running the latest version of Android. | (pdf) |
KVM/ARM: Experiences Building the Linux ARM Hypervisor | Christoffer Dall, Jason Nieh | 2013-04-05 | As ARM CPUs become increasingly common in mobile devices and servers, there is a growing demand for providing the benefits of virtualization for ARM-based devices. We present our experiences building the Linux ARM hypervisor, KVM/ARM, the first full-system ARM virtualization solution that can run unmodified guest operating systems on ARM multicore hardware. KVM/ARM introduces split-mode virtualization, allowing a hypervisor to split its execution across CPU modes to take advantage of CPU mode-specific features. This allows KVM/ARM to leverage Linux kernel services and functionality to simplify hypervisor development and maintainability while utilizing recent ARM hardware virtualization extensions to run application workloads in guest operating systems with comparable performance to native execution. KVM/ARM has been successfully merged into the mainline Linux 3.9 kernel, ensuring that it will gain wide adoption as the virtualization platform of choice for ARM. We provide the first measurements on real hardware of a complete hypervisor using ARM hardware virtualization support. Our results demonstrate that KVM/ARM has modest virtualization performance and power costs, and can achieve lower performance and power costs compared to x86-based Linux virtualization on multicore hardware. | (pdf) |
FARE: A Framework for Benchmarking Reliability of Cyber-Physical Systems | Leon Wu, Gail Kaiser | 2013-04-01 | A cyber-physical system (CPS) is a system featuring a tight combination of, and coordination between, the system’s computational and physical elements. System reliability is a critical requirement of cyber-physical systems. An unreliable CPS often leads to system malfunctions, service disruptions, financial losses and even loss of human life. Improving CPS reliability requires an objective measurement, estimation and comparison of CPS reliability. This paper describes FARE (Failure Analysis and Reliability Estimation), a framework for benchmarking the reliability of cyber-physical systems. Prior research has proposed reliability benchmarks for specific CPS, such as wind power plants and wireless sensor networks, as well as for individual CPS components such as software and certain hardware. To the best of our knowledge, however, there is no reliability benchmark framework for CPS in general. The FARE framework provides a CPS reliability model and a set of methods and metrics for evaluation environment selection, failure analysis and reliability estimation for benchmarking CPS reliability. It not only provides a retrospective evaluation and estimation of CPS reliability using past data, but also a mechanism for continuous monitoring and evaluation of CPS reliability for runtime enhancement. The framework is extensible for accommodating new reliability measurement techniques and metrics, and it is generic and applicable to a wide range of CPS applications. For an empirical study, we applied the FARE framework to a smart building management system for a large commercial building in New York City. Our experiments showed that FARE is easy to implement, accurate for comparison and can be used for building useful industry benchmarks and standards after accumulating enough data. | (pdf) |
Additional remarks on designing category-level attributes for discriminative visual recognition | Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang | 2013-03-10 | This is the supplementary material for the paper Designing Category-Level Attributes for Discriminative Visual Recognition. | (pdf) |
Make Parallel Programs Reliable with Stable Multithreading | Junfeng Yang, Heming Cui, Jingyue Wu, John Gallagher, Chia-Che Tsai, Huayang Guo | 2013-02-20 | Our accelerating computational demand and the rise of multicore hardware have made parallel programs increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. In this article, we provide our view on why parallel programs, specifically multithreaded programs, are difficult to get right. We present a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four years’ research on building and applying stable multithreading systems. | (pdf) |
A Finer Functional Fibonacci on a Fast FPGA | Stephen A. Edwards | 2013-02-13 | Through a series of mechanical, semantics-preserving transformations, I show how a three-line recursive Haskell program (Fibonacci) can be transformed to a hardware description language -- Verilog -- that can be synthesized on an FPGA. This report lays groundwork for a compiler that will perform this transformation automatically. | (pdf) |
Cost and Scalability of Hardware Encryption Techniques | Adam Waksman, Simha Sethumadhavan | 2013-02-06 | We discuss practical details and basic scalability for two recent ideas for hardware encryption for trojan prevention. The broad idea is to encrypt the data used as inputs to hardware circuits to make it more difficult for malicious attackers to exploit hardware trojans. The two methods we discuss are data obfuscation and fully homomorphic encryption (FHE). Data obfuscation is a technique wherein specific data inputs are encrypted so that they can be operated on within a hardware module without exposing the data itself to the hardware. FHE is a technique recently discovered to be theoretically possible. With FHE, not only the data but also the operations and the entire circuit are encrypted. FHE primarily exists as a theoretical construct currently. It has been shown that it can theoretically be applied to any program or circuit. It has also been applied in a limited respect to some software. Some initial algorithms for hardware applications have been proposed. We find that data obfuscation is efficient enough to be immediately practical, while FHE is not yet in the practical realm. There are also scalability concerns regarding current algorithms for FHE. | (pdf) |
Societal Computing - Thesis Proposal | Swapneel Sheth | 2013-01-30 | As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regard to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole. This thesis will consist of the following four projects that aim to address the issues of Societal Computing. First, privacy in the context of ubiquitous social computing systems has become a major concern for society at large. As the number of online social computing systems that collect user data grows, concerns with privacy are further exacerbated. Examples of such online systems include social networks, recommender systems, and so on. Approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable: systems where privacy could be achieved "for free," i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free, as an accidental and beneficial side effect of doing some existing computation, in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Second, we aim to understand what the expectations and needs of end-users and software developers are with respect to privacy in social systems. Some questions that we want to answer are: Do end-users care about privacy? What aspects of privacy are the most important to end-users? Do we need different privacy mechanisms for technical vs. non-technical users? Should we customize privacy settings and systems based on the geographic location of the users? We have created a large-scale user study using an online questionnaire to gather privacy requirements from a variety of stakeholders. We also plan to conduct follow-up semi-structured interviews. This user study will help us answer these questions. Third, a related challenge is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. Our approach is to use crowdsourcing to find out how other users deal with privacy and what settings are commonly used, in order to give users feedback on aspects such as how public or private their settings are, what settings are typically used by others, and where a certain user's settings differ from those of a trusted group of friends. We have a large dataset of privacy settings for over 500 users on Facebook and we plan to create a user study that will use the data to make privacy settings more understandable. Finally, end-users of such systems find it increasingly hard to understand complex privacy settings. As software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective. | (pdf) |
Finding 9-1-1 Callers in Tall Buildings | Wonsang Song, Jae Woo Lee, Byung Suk Lee, Henning Schulzrinne | 2013-01-23 | Accurately determining a user's floor location is essential for minimizing delays in emergency response. This paper presents a floor localization system intended for emergency calls. We aim to provide floor-level accuracy with minimum infrastructure support. Our approach is to use multiple sensors, all available in today's smartphones, to trace a user's vertical movements inside buildings. We make three contributions. First, we present a hybrid architecture for floor localization with emergency calls in mind. The architecture combines beacon-based infrastructure and sensor-based dead reckoning, striking the right balance between accurately determining a user's location and minimizing the required infrastructure. Second, we present the elevator module for tracking a user's movement in an elevator. The elevator module addresses three core challenges that make it difficult to accurately derive displacement from acceleration. Third, we present the stairway module which determines the number of floors a user has traveled on foot. Unlike previous systems that track users' footsteps, our stairway module uses a novel landing counting technique. | (pdf) |
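The hardest part of the elevator module is turning noisy accelerometer readings into vertical displacement. The toy sketch below shows the naive double integration that any such module starts from; it omits the sensor bias and drift corrections that a real system must perform, and the 3.5 m floor height is an assumed constant, not a value from the paper.

```python
def vertical_displacement(accel_z, dt, gravity=9.81):
    # Naive double integration of vertical acceleration samples (m/s^2) taken every
    # dt seconds during an elevator ride. Real deployments must additionally correct
    # for accelerometer bias and drift, which is where the difficulty lies.
    velocity, displacement = 0.0, 0.0
    for a in accel_z:
        velocity += (a - gravity) * dt      # remove gravity, integrate to velocity
        displacement += velocity * dt       # integrate velocity to displacement
    return displacement

def floors_traveled(displacement_m, floor_height_m=3.5):
    # Convert vertical displacement into a signed floor count (assumed floor height).
    return round(displacement_m / floor_height_m)
```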
Effective Dynamic Detection of Alias Analysis Errors | Jingyue Wu, Gang Hu, Yang Tang, Junfeng Yang | 2013-01-23 | Alias analysis is perhaps one of the most crucial and widely used analyses, and has attracted tremendous research efforts over the years. Yet, advanced alias analyses are extremely difficult to get right, and the bugs in these analyses are most likely the reason that they have not been adopted to production compilers. This paper presents NEONGOBY, a system for effectively detecting errors in alias analysis implementations, improving their correctness and hopefully widening their adoption. NEONGOBY works by dynamically observing pointer addresses during the execution of a test program and then checking these addresses against an alias analysis for errors. It is explicitly designed to (1) be agnostic to the alias analysis it checks for maximum applicability and ease of use and (2) detect alias analysis errors that manifest on real-world programs and workloads. It reduces false positives and performance overhead using a practical selection of techniques. Evaluation on three popular alias analyses and real-world programs Apache and MySQL shows that NEONGOBY effectively finds 29 alias analysis bugs with only 2 false positives and reasonable overhead. To enable alias analysis builders to start using NEONGOBY today, we have released it open-source at https://github.com/wujingyue/neongoby, along with our error detection results and proposed patches. | (pdf) |
Bethe Bounds and Approximating the Global Optimum | Adrian Weller, Tony Jebara | 2012-12-31 | Inference in general Markov random fields (MRFs) is NP-hard, though identifying the maximum a posteriori (MAP) configuration of pairwise MRFs with submodular cost functions is efficiently solvable using graph cuts. Marginal inference, however, even for this restricted class, is in #P. We prove new formulations of derivatives of the Bethe free energy, provide bounds on the derivatives and bracket the locations of stationary points, introducing a new technique called Bethe bound propagation. Several results apply to pairwise models whether associative or not. Applying these to discretized pseudo-marginals in the associative case, we present a polynomial-time approximation scheme for global optimization provided the maximum degree is O(log n), and discuss several extensions. | (pdf) |
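For reference, the quantity whose derivatives the paper bounds is the standard pairwise Bethe free energy over pseudo-marginals q; the textbook form is reproduced below (the paper's new derivative formulations and bounds go beyond this).

```latex
% Pairwise Bethe free energy over singleton and pairwise pseudo-marginals q,
% with unary/pairwise energies \theta and d_i the degree of node i.
\mathcal{F}_{\mathrm{Bethe}}(q)
  = \sum_{(i,j)\in E}\sum_{x_i,x_j} q_{ij}(x_i,x_j)\,\theta_{ij}(x_i,x_j)
  + \sum_{i\in V}\sum_{x_i} q_i(x_i)\,\theta_i(x_i)
  - S_{\mathrm{Bethe}}(q),
\qquad
S_{\mathrm{Bethe}}(q)
  = -\sum_{(i,j)\in E}\sum_{x_i,x_j} q_{ij}(x_i,x_j)\log q_{ij}(x_i,x_j)
  + \sum_{i\in V}(d_i-1)\sum_{x_i} q_i(x_i)\log q_i(x_i).
```

Stationary points of this function over the local polytope correspond to fixed points of loopy belief propagation, which is why bracketing their locations matters for marginal inference.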
Reconstructing Pong on an FPGA | Stephen A. Edwards | 2012-12-27 | I describe in detail the circuitry of the original 1972 Pong video arcade game and how I reconstructed it on an FPGA -- a modern-day programmable logic device. In the original circuit, I discover some sloppy timing and a previously unidentified bug that subtly affected gameplay. I emulate the quasi-synchronous behavior of the original circuit by running a synchronous "simulation" circuit with a 2X clock and replacing each flip-flop with a circuit that effectively simulates one. The result is an accurate reproduction that exhibits many idiosyncrasies of the original. | (pdf) |
Focal Sweep Camera for Space-Time Refocusing | Changyin Zhou, Daniel Miau, Shree K. Nayar | 2012-11-29 | A conventional camera has a limited depth of field (DOF), which often results in defocus blur and loss of image detail. The technique of image refocusing allows a user to interactively change the plane of focus and DOF of an image after it is captured. One way to achieve refocusing is to capture the entire light field. But this requires a significant compromise of spatial resolution. This is because of the dimensionality gap - the captured information (a light field) is 4-D, while the information required for refocusing (a focal stack) is only 3-D. In this paper, we present an imaging system that directly captures a focal stack by physically sweeping the focal plane. We first describe how to sweep the focal plane so that the aggregate DOF of the focal stack covers the entire desired depth range without gaps or overlaps. Since the focal stack is captured in a duration of time when scene objects can move, we refer to the captured focal stack as a duration focal stack. We then propose an algorithm for computing a space-time in-focus index map from the focal stack, which represents the time at which each pixel is best focused. The algorithm is designed to enable a seamless refocusing experience, even for textureless regions and at depth discontinuities. We have implemented two prototype focal-sweep cameras and captured several duration focal stacks. Results obtained using our method can be viewed at www.focalsweep.com. | (pdf) |
Effectiveness of Teaching Metamorphic Testing | Kunal Swaroop Mishra, Gail Kaiser | 2012-11-15 | This paper is an attempt to understand the effectiveness of teaching metamorphic properties in a senior/graduate software engineering course classroom environment through gauging the success achieved by students in identifying these properties on the basis of the lectures and materials provided in class. The main findings were: (1) most of the students either misunderstood what metamorphic properties are or fell short of identifying all the metamorphic properties in their respective projects, (2) most of the students that were successful in finding all the metamorphic properties in their respective projects had incorporated certain arithmetic rules into their project logic, and (3) most of the properties identified were numerical metamorphic properties. A possible reason for this could be that the two relevant lectures given in class cited examples of metamorphic properties that were based on numerical properties. Based on the findings of the case study, pertinent suggestions were made in order to improve the impact of lectures provided for Metamorphic Testing. | (pdf) |
A Competitive-Collaborative Approach for Introducing Software Engineering in a CS2 Class | Swapneel Sheth, Jonathan Bell, Gail Kaiser | 2012-11-05 | Introductory Computer Science (CS) classes are typically competitive in nature. The cutthroat nature of these classes comes from students attempting to get as high a grade as possible, which may or may not correlate with actual learning. Further, there is very little collaboration allowed in most introductory CS classes. Most assignments are completed individually, since many educators feel that students learn the most, especially in introductory classes, by working alone. In addition to completing "normal" individual assignments, which have many benefits, we wanted to expose students to collaboration early (via, for example, team projects). In this paper, we describe how we leveraged competition and collaboration in a CS2 class to help students learn aspects of computer science better (in this case, good software design and software testing) and summarize student feedback. | (pdf) |
Increasing Student Engagement in Software Engineering with Gamification | Swapneel Sheth, Jonathan Bell, Gail Kaiser | 2012-11-05 | Gamification, or the use of game elements in non-game contexts, has become an increasingly popular approach to increasing end-user engagement in many contexts, including employee productivity, sales, recycling, and education. Our preliminary work has shown that gamification can be used to boost student engagement and learning in basic software testing. We seek to expand our gamified software engineering approach to motivate other software engineering best practices. We propose to build a game layer on top of traditional continuous integration technologies to increase student engagement in development, documentation, bug reporting, and test coverage. This poster describes our approach and presents some early results showing feasibility. | (pdf) |
Improving Vertical Accuracy of Indoor Positioning for Emergency Communication | Wonsang Song, Jae Woo Lee, Byung Suk Lee, Henning Schulzrinne | 2012-10-30 | Emergency communication systems are undergoing a transition from the PSTN-based legacy system to an IP-based next-generation system. In the next-generation system, GPS accurately provides a user's location when the user makes an emergency call outdoors using a mobile phone. Indoor positioning, however, presents a challenge because GPS does not generally work indoors. Moreover, unlike outdoors, vertical accuracy is critical indoors because an error of a few meters will send emergency responders to a different floor in a building. This paper presents an indoor positioning system which focuses on improving the accuracy of vertical location. We aim to provide floor-level accuracy with minimal infrastructure support. Our approach is to use multiple sensors available in today's smartphones to trace users' vertical movements inside buildings. We make three contributions. First, we present the elevator module for tracking a user's movement in elevators. The elevator module addresses three core challenges that make it difficult to accurately derive displacement from acceleration. Second, we present the stairway module which determines the number of floors a user has traveled on foot. Unlike previous systems that track users' footsteps, our stairway module uses a novel landing counting technique. Third, we present a hybrid architecture that combines the sensor-based components with minimal and practical infrastructure. The infrastructure provides an initial anchor and periodic corrections of a user's vertical location indoors. The architecture strikes the right balance between the accuracy of location and the feasibility of deployment for the purpose of emergency communication. | (pdf) |
An Autonomic Reliability Improvement System for Cyber-Physical Systems | Leon Wu, Gail Kaiser | 2012-09-17 | System reliability is a fundamental requirement of cyber-physical systems. Unreliable systems can lead to disruption of service, financial cost and even loss of human life. Typical cyber-physical systems are designed to process large amounts of data, employ software as a system component, run online continuously and retain an operator-in-the-loop because of human judgment and accountability requirements for safety-critical systems. This paper describes a data-centric runtime monitoring system named ARIS (Autonomic Reliability Improvement System) for improving the reliability of these types of cyber-physical systems. ARIS employs automated online evaluation, working in parallel with the cyber-physical system to continuously conduct automated evaluation at multiple stages in the system workflow and provide real-time feedback for reliability improvement. This approach enables effective evaluation of data from cyber-physical systems. For example, abnormal input and output data can be detected and flagged through data quality analysis. As a result, alerts can be sent to the operator-in-the-loop, who can then take actions and make changes to the system based on these alerts in order to achieve minimal system downtime and higher system reliability. We have implemented ARIS in a large commercial building cyber-physical system in New York City, and our experiment has shown that it is effective and efficient in improving building system reliability. | (pdf) |
Hardware-Accelerated Range Partitioning | Lisa Wu, Raymond J. Barker, Martha A. Kim, Kenneth A. Ross | 2012-09-05 | With the global pool of data growing by over 2.5 quintillion bytes per day and over 90% of all data in existence created in the last two years alone [23], there can be little doubt that we have entered the big data era. This trend has brought database performance to the forefront of high-throughput, low-energy system design. This paper explores targeted deployment of hardware accelerators to improve the throughput and efficiency of database processing. Partitioning, a critical operation when manipulating large data sets, is often the limiting factor in database performance and represents a significant amount of the overall runtime of database processing workloads. This paper describes a hardware-software streaming framework and a hardware accelerator for range partitioning, or HARP. The streaming framework offers a seamless execution environment for database processing elements such as HARP. HARP offers high performance as well as orders-of-magnitude gains in power and area efficiency. A detailed analysis of a 32nm physical design shows 9.3 times the throughput of a highly optimized and optimistic software implementation, while consuming just 3.6% of the area and 2.6% of the power of a single Xeon core in the same technology generation. | (pdf) |
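For readers unfamiliar with the operation being accelerated, the following Python sketch shows plain software range partitioning; the function name and the splitter example are illustrative, and HARP performs this routing in dedicated hardware fed by the streaming framework rather than in a scalar loop.

```python
import bisect

def range_partition(records, splitters, key=lambda r: r):
    # Route each record to the partition whose key range contains it.
    # `splitters` must be sorted; k splitters define k + 1 output partitions.
    partitions = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        idx = bisect.bisect_right(splitters, key(rec))   # binary search over splitters
        partitions[idx].append(rec)
    return partitions

# Example: range_partition([5, 42, 17, 99, 3], splitters=[10, 50])
# yields [[5, 3], [42, 17], [99]].
```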
End-User Regression Testing for Privacy | Swapneel Sheth, Gail Kaiser | 2012-08-25 | Privacy in social computing systems has become a major concern. End-users of such systems find it increasingly hard to understand complex privacy settings. As software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective. | (pdf) |
Chronicler: Lightweight Recording to Reproduce Field Failures | Jonathan Bell, Nikhil Sarda, Gail Kaiser | 2012-08-23 | When programs fail in the field, developers are often left with limited information to diagnose the failure. Automated error reporting tools can assist in bug report generation but without precise steps from the end user it is often difficult for developers to recreate the failure. Advanced remote debugging tools aim to capture sufficient information from field executions to recreate failures in the lab but often have too much overhead to practically deploy. We present CHRONICLER, an approach to remote debugging that captures non-deterministic inputs to applications in a lightweight manner, assuring faithful reproduction of client executions. We evaluated CHRONICLER by creating a Java implementation, CHRONICLERJ, and then by using a set of benchmarks mimicking real world applications and workloads, showing its runtime overhead to be under 10% in most cases (worst case 86%), while an existing tool showed overhead over 100% in the same cases (worst case 2,322%). | (pdf) |
Statically Unrolling Recursion to Improve Opportunities for Parallelism | Neil Deshpande, Stephen A. Edwards | 2012-07-13 | We present an algorithm for unrolling recursion in the Haskell functional language. Adapted from a similar algorithm proposed by Rugina and Rinard for imperative languages, it essentially inlines a function in itself as many times as requested. This algorithm aims to increase the available parallelism in recursive functions, with an eye toward its eventual application in a Haskell-to-hardware compiler. We first illustrate the technique on a series of examples, then describe the algorithm, and finally show its Haskell source, which operates as a plug-in for the Glasgow Haskell Compiler. | (pdf) |
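The transformation itself is language-agnostic, even though the report's plug-in operates on Haskell inside the Glasgow Haskell Compiler. The Python sketch below (an illustration, not the paper's code) shows one step of unrolling applied to naive Fibonacci: the function body is inlined into itself once, exposing independent sub-calls that a parallelizing backend could schedule side by side.

```python
def fib(n):
    # Original recursive definition.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def fib_unrolled(n):
    # The same function after one unrolling step: each recursive call has been
    # replaced by an inlined copy of the body, so the two halves are independent
    # and could be evaluated in parallel.
    if n < 2:
        return n
    a = (n - 1) if (n - 1) < 2 else fib_unrolled(n - 2) + fib_unrolled(n - 3)
    b = (n - 2) if (n - 2) < 2 else fib_unrolled(n - 3) + fib_unrolled(n - 4)
    return a + b
```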
Functional Fibonacci to a Fast FPGA | Stephen A. Edwards | 2012-06-16 | Through a series of mechanical transformations, I show how a three-line recursive Haskell function (Fibonacci) can be translated into a hardware description language -- VHDL -- for efficient execution on an FPGA. The goal of this report is to lay the groundwork for a compiler that will perform these transformations automatically, hence the development is deliberately pedantic. | (pdf) |
High Throughput Heavy Hitter Aggregation | Orestis Polychroniou, Kenneth A. Ross | 2012-05-15 | Heavy hitters are data items that occur at high frequency in a data set. Heavy hitters are among the most important items for an organization to summarize and understand during analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache memory. We design cache-resident, shared-nothing structures that hold only the most frequent elements from the table. Our approach works in three phases. It first samples and picks heavy hitter candidates. It then builds a hash table and computes the exact aggregates of these candidates. Finally, if necessary, a validation step identifies the true heavy hitters from among the candidates based on the query specification. We identify trade-offs between the hash table capacity and performance. Capacity determines how many candidates can be aggregated. We optimize performance by the use of perfect hashing and SIMD instructions. SIMD instructions are utilized in novel ways to minimize cache accesses, beyond simple vectorized operations. We use bucketized and cuckoo hash tables to increase capacity, to adapt to different datasets and query constraints. The performance of our method is an order of magnitude faster than in-memory aggregation over a complete set of items if those items cannot be cache resident. Even for item sets that are cache resident, our SIMD techniques enable significant performance improvements over previous work. | (pdf) |
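A scalar Python stand-in for the three-phase scheme is sketched below. The paper's version is cache-resident and relies on perfect hashing and SIMD; the function name, sampling rate, and candidate count here are illustrative assumptions only.

```python
import random
from collections import Counter, defaultdict

def heavy_hitter_sums(rows, key_col, val_col, sample_rate=0.01, num_candidates=64):
    # Phase 1: sample the input to pick heavy-hitter candidates.
    sample = [r[key_col] for r in rows if random.random() < sample_rate]
    candidates = {k for k, _ in Counter(sample).most_common(num_candidates)}

    # Phase 2: exact aggregation for candidates in a small (cache-sized) table;
    # rows with non-candidate keys are set aside for a slower fallback pass.
    sums = defaultdict(float)
    overflow = []
    for r in rows:
        k = r[key_col]
        if k in candidates:
            sums[k] += r[val_col]
        else:
            overflow.append(r)

    # Phase 3 (validation) would check the candidates against the query's
    # frequency threshold; here we simply return both pieces.
    return dict(sums), overflow
```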
Improving Efficiency and Reliability of Building Systems Using Machine Learning and Automated Online Evaluation | Leon Wu, Gail Kaiser, David Solomon, Rebecca Winter, Albert Boulanger, Roger Anderson | 2012-05-04 | A high percentage of newly-constructed commercial office buildings experience energy consumption that exceeds specifications and system failures after being put into use. This problem is even worse for older buildings. We present a new approach, ‘predictive building energy optimization’, which uses machine learning (ML) and automated online evaluation of historical and real-time building data to improve efficiency and reliability of building operations without requiring large amounts of additional capital investment. Our ML approach uses a predictive model to generate accurate energy demand forecasts and automated analyses that can guide optimization of building operations. In parallel, an automated online evaluation system monitors efficiency at multiple stages in the system workflow and provides building operators with continuous feedback. We implemented a prototype of this application in a large commercial building in Manhattan. Our predictive machine learning model applies Support Vector Regression (SVR) to the building’s historical energy use and temperature and wet-bulb humidity data from the building’s interior and exterior in order to model performance for each day. This predictive model closely approximates actual energy usage values, with some seasonal and occupant-specific variability, and the dependence of the data on day-of-the-week makes the model easily applicable to different types of buildings with minimal adjustment. In parallel, an automated online evaluator monitors the building’s internal and external conditions, control actions and the results of those actions. Intelligent real-time data quality analysis components quickly detect anomalies and automatically transmit feedback to building management, who can then take necessary preventive or corrective actions. Our experiments show that this evaluator is responsive and effective in further ensuring reliable and energy-efficient operation of building systems. | (pdf) |
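A minimal sketch of the forecasting component is shown below, assuming scikit-learn's SVR and an illustrative feature layout (hour of day, day of week, outdoor temperature, wet-bulb humidity, indoor temperature); the function names and hyperparameters are placeholders, not the configuration used in the deployed system.

```python
# Requires scikit-learn and NumPy.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_energy_model(features, energy_kwh):
    # Fit an SVR model mapping [hour_of_day, day_of_week, outdoor_temp,
    # wet_bulb_humidity, indoor_temp] rows to observed building energy use.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(np.asarray(features), np.asarray(energy_kwh))
    return model

def forecast(model, next_day_features):
    # Predict the next day's demand profile from forecast weather and calendar features.
    return model.predict(np.asarray(next_day_features))
```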
Aperture Evaluation for Defocus Deblurring and Extended Depth of Field | Changyin Zhou, Shree Nayar | 2012-04-17 | For a given camera setting, scene points that lie outside of depth of field (DOF) will appear defocused (or blurred). Defocus causes the loss of image details. To recover scene details from a defocused region, deblurring techniques must be employed. It is well known that the deblurring quality is closely related to the defocus kernel or point-spread-function (PSF), whose shape is largely determined by the aperture pattern of the camera. In this paper, we propose a comprehensive framework of aperture evaluation for the purpose of defocus deblurring, which takes the effects of image noise, deblurring algorithm, and the structure of natural images into account. By using the derived evaluation criterion, we are able to solve for the optimal coded aperture patterns. Extensive simulations and experiments are then conducted to compare the optimized coded apertures with previously proposed ones. The proposed framework of aperture evaluation is further extended to evaluate and optimize extended depth of field (EDOF) cameras. EDOF cameras (e.g., focal sweep and wavefront coding camera) are designed to produce PSFs which are less sensitive to depth variation, so that people can deconvolve the whole image using a single PSF without knowing scene depth. Different choices of camera parameters or the PSF to deconvolve with lead to different deblurring qualities. With the derived evaluation criterion, we are able to derive the optimal PSF to deconvolve with in a closed-form and optimize camera parameters for the best deblurring results. | (pdf) |
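Evaluation criteria of this kind are typically built around a Wiener-style deconvolution estimate, reproduced below for orientation; the paper's exact criterion, and its extension to EDOF cameras, goes beyond this standard form.

```latex
% Wiener estimate of the sharp image F from the defocused, noisy image G:
% K is the OTF of the defocus PSF (determined by the aperture pattern),
% \sigma^2 the noise variance, and |S(\xi)|^2 the expected power spectrum of
% natural images, acting as a prior.
\hat{F}(\xi) \;=\;
  \frac{\overline{K(\xi)}\, G(\xi)}
       {\,|K(\xi)|^{2} + \sigma^{2}/|S(\xi)|^{2}\,}
```

One common way to score a candidate aperture, consistent with the abstract's description, is to average the expected reconstruction error of such an estimator over the image prior and the noise level, and then search for the pattern that minimizes that score.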
Partitioned Blockmap Indexes for Multidimensional Data Access | Kenneth Ross, Evangelia Sitaridi | 2012-04-16 | Given recent increases in the size of main memory in modern machines, it is now common to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term "blockmap." The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to evenly divide the data if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU processor. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead. | (pdf) |
Experiments of Image Retrieval Using Weak Attributes | Felix X. Yu, Rongrong Ji, Ming-Hen Tsai, Guangnan Ye, Shih-Fu Chang | 2012-04-06 | Searching images based on descriptions of image attributes is an intuitive process that can be easily understood by humans and recently made feasible by a few promising works in both the computer vision and multimedia communities. In this report, we describe some experiments of image retrieval methods that utilize weak attributes. | (pdf) |
When Does Computational Imaging Improve Performance? | Oliver Cossairt, Mohit Gupta, Shree Nayar | 2012-03-24 | A number of computational imaging techniques have been introduced to improve image quality by increasing light throughput. These techniques use optical coding to measure a stronger signal level. However, the performance of these techniques is limited by the decoding step, which amplifies noise. While it is well understood that optical coding can increase performance at low light levels, little is known about the quantitative performance advantage of computational imaging in general settings. In this paper, we derive the performance bounds for various computational imaging techniques. We then discuss the implications of these bounds for several real-world scenarios (illumination conditions, scene properties and sensor noise characteristics). Our results show that computational imaging techniques provide a significant performance advantage in a surprisingly small set of real world settings. These results can be readily used by practitioners to design the most suitable imaging systems given the application at hand. | (pdf) |
High Availability for Carrier-Grade SIP Infrastructure on Cloud Platforms | Jong Yul Kim, Henning Schulzrinne | 2012-03-19 | SIP infrastructure on cloud platforms has the potential to be both scalable and highly available. In our previous project, we focused on the scalability aspect of SIP services on cloud platforms; the focus of this project is on the high availability aspect. We investigated the effects of component fault on service availability with the goal of understanding how high availability can be guaranteed even in the face of component faults. The experiments were conducted empirically on a real system that runs on Amazon EC2. Our analysis shows that most component faults are masked with a simple automatic failover technique. However, we have also identified fundamental problems that cannot be addressed by simple failover techniques; a problem involving DNS cache in resolvers and a problem involving static failover configurations. Recommendations on how to solve these problems are included in the report. | (pdf) |
Automatic Detection of Metamorphic Properties of Software | Sahar Hasan | 2012-03-14 | The goal of this project is to demonstrate the feasibility of automatic detection of metamorphic properties of individual functions. Properties of interest here, as described in Murphy et al.’s SEKE 2008 paper “Properties of Machine Learning Applications for Use in Metamorphic Testing”, include: 1. Permutation of the order of the input data 2. Addition of numerical values by a constant 3. Multiplication of numerical values by a constant 4. Reversal of the order of the input data 5. Removal of part of the data 6. Addition of data to the dataset While focusing on permutative, additive, and multiplicative properties in functions and applications, I have sought to identify common programming constructs and code fragments that strongly indicate that these properties will hold, or fail to hold, along an execution path in which the code is evaluated. I have constructed a syntax for expressions representing these common constructs and have also mapped a collection of these expressions to the metamorphic properties they uphold or invalidate. I have then developed a general framework to evaluate these properties for programs as a whole. | (pdf) |
CloudFence: Enabling Users to Audit the Use of their Cloud-Resident Data | Vasilis Pappas, Vasileios P. Kemerlis, Angeliki Zavou, Michalis Polychronakis, Angelos D. Keromytis | 2012-01-24 | One of the primary concerns of users of cloud-based services and applications is the risk of unauthorized access to their private information. For the common setting in which the infrastructure provider and the online service provider are different, end users have to trust their data to both parties, although they interact solely with the service provider. This paper presents CloudFence, a framework that allows users to independently audit the treatment of their private data by third-party online services, through the intervention of the cloud provider that hosts these services. CloudFence is based on a fine-grained data flow tracking platform exposed by the cloud provider to both developers of cloud-based applications, as well as their users. Besides data auditing for end users, CloudFence allows service providers to confine the use of sensitive data in well-defined domains using data tracking at arbitrary granularity, offering additional protection against inadvertent leaks and unauthorized access. The results of our experimental evaluation with real-world applications, including an e-store platform and a cloud-based backup service, demonstrate that CloudFence requires just a few changes to existing application code, while it can detect and prevent a wide range of security breaches, ranging from data leakage attacks using SQL injection, to personal data disclosure due to missing or erroneously implemented access control checks. | (pdf) |
Failure Analysis of the New York City Power Grid | Leon Wu, Roger Anderson, Albert Boulanger, Cynthia Rudin, Gail Kaiser | 2012-01-12 | As the U.S. power grid transforms itself into a smart grid, it has become less reliable in the past years. Power grid failures lead to huge financial costs and affect people's lives. Using statistical analysis and a holistic approach, this paper analyzes New York City power grid failures: their patterns and climatic effects. Our findings include: higher peak electrical load increases the likelihood of power grid failure; subsequent failures are more likely among electrical feeders sharing the same substation; underground feeders fail less often than overhead feeders; cables and joints installed during certain years are more likely to fail; and higher temperatures lead to more power grid failures. We further suggest preventive maintenance, intertemporal consumption, and electrical load optimization for failure prevention. We also estimate that the predictability of power grid component failures correlates with the cycles of the North Atlantic Oscillation (NAO) Index. | (pdf) |
NetServ: Reviving Active Networks | Jae Woo Lee, Roberto Francescangeli, Wonsang Song, Emanuele Maccherani, Jan Janak, Suman Srinivasan | 2012-01-05 | In 1996, Tennenhouse and Wetherall proposed active networks, where users can inject code modules into network nodes. The proposal sparked intense debate and follow-on research, but ultimately failed to win over the networking community. Fifteen years later, the problems that motivated the active networks proposal persist. We call for a revival of active networks. We present NetServ, a fully integrated active network system that provides all the necessary functionality to be deployable, addressing the core problems that prevented the practical success of earlier approaches. We make the following contributions. We present a hybrid approach to active networking, which combines the best qualities from the two extreme approaches: integrated and discrete. We built a working system that strikes the right balance between security and performance by leveraging current technologies. We suggest an economic model based on NetServ between content providers and ISPs. We built four applications to illustrate the model. | (pdf) |
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraft | Jonathan Bell, Swapneel Sheth, Gail Kaiser | 2011-12-29 | We present a survey of usage of the popular Massively Multiplayer Online Role Playing Game, World of Warcraft. Players within this game often self-organize into communities with similar interests and/or styles of play. By mining publicly available data, we collected a dataset consisting of the complete player history for approximately six million characters, with partial data for another six million characters. The paper provides a thorough description of the distributed approach used to collect this massive community data set, and then focuses on an analysis of player achievement data in particular, exposing trends in play from this highly successful game. From this data, we present several findings regarding player profiles. We correlate achievements with motivations based upon a previously-defined motivation model, and then classify players based on the categories of achievements that they pursued. Experiments show players who fall within each of these buckets can play differently, and that as players progress through game content, their play style evolves as well. | (pdf) |
GRAND: Git Revisions As Named Data | Jan Janak, Jae Woo Lee, Henning Schulzrinne | 2011-12-12 | GRAND is an experimental extension of Git, a distributed revision control system, which enables the synchronization of Git repositories over Content-Centric Networks (CCN). GRAND brings some of the benefits of CCN to Git, such as transparent caching, load balancing, and the ability to fetch objects by name rather than location. Our implementation is based on CCNx, a reference implementation of a content router. The current prototype consists of two components: git-daemon-ccnx allows a node to publish its local Git repositories to the CCNx Content Store; git-remote-ccnx implements CCNx transport on the client side. This adds CCN to the set of transport protocols supported by Git, alongside HTTP and SSH. | (pdf) |
ActiveCDN: Cloud Computing Meets Content Delivery Networks | Suman Srinivasan, Jae Woo Lee, Dhruva Batni, Henning Schulzrinne | 2011-11-15 | Content delivery networks play a crucial role in today’s Internet. They serve a large portion of the multimedia on the Internet and solve problems of scalability and, indirectly, network congestion (at a price). However, most content delivery networks rely on a statically deployed configuration of nodes and network topology that makes it hard to grow and scale dynamically. We present ActiveCDN, a novel CDN architecture that allows a content publisher to dynamically scale its content delivery services using network virtualization and cloud computing techniques. | (pdf) |
libdft: Practical Dynamic Data Flow Tracking for Commodity Systems | Vasileios P. Kemerlis, Georgios Portokalidis, Kangkook Jee, Angelos D. Keromytis | 2011-10-27 | Dynamic data flow tracking (DFT) deals with the tagging and tracking of "interesting" data as they propagate during program execution. DFT has been repeatedly implemented by a variety of tools for numerous purposes, including protection from zero-day and cross-site scripting attacks, detection and prevention of information leaks, as well as for the analysis of legitimate and malicious software. We present libdft, a dynamic DFT framework that unlike previous work is at once fast, reusable, and works with commodity software and hardware. libdft provides an API, which can be used to painlessly deliver DFT-enabled tools that can be applied on unmodified binaries, running on common operating systems and hardware, thus facilitating research and rapid prototyping. We explore different approaches for implementing the low-level aspects of instruction-level data tracking, introduce a more efficient and 64-bit capable shadow memory, and identify (and avoid) the common pitfalls responsible for the excessive performance overhead of previous studies. We evaluate libdft using real applications with large codebases like the Apache and MySQL servers, and the Firefox web browser. We also use a series of benchmarks and utilities to compare libdft with similar systems. Our results indicate that it performs at least as fast, if not faster, than previous solutions, and to the best of our knowledge, we are the first to evaluate the performance overhead of a fast dynamic DFT implementation in such depth. Finally, our implementation is freely available as open source software. | (pdf) |
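The core mechanism behind dynamic data flow tracking can be illustrated with a toy shadow-memory sketch. The snippet below is only a conceptual illustration in Python, not libdft's actual Pin-based implementation or its API: a shadow map associates tag sets with locations, and copy and arithmetic operations propagate the union of the source tags to the destination.

```python
# Toy illustration of dynamic data flow tracking (not libdft itself):
# a shadow memory maps each "address" to a set of taint labels, and every
# data movement or arithmetic operation propagates the union of source tags.

class ShadowMemory:
    def __init__(self):
        self.tags = {}                      # address -> set of taint labels

    def taint(self, addr, label):
        self.tags.setdefault(addr, set()).add(label)

    def get(self, addr):
        return self.tags.get(addr, set())

    def mov(self, dst, src):                # dst = src
        self.tags[dst] = set(self.get(src))

    def alu(self, dst, src1, src2):         # dst = src1 op src2
        self.tags[dst] = self.get(src1) | self.get(src2)


shadow = ShadowMemory()
shadow.taint("buf[0]", "network_input")     # data read from an untrusted source
shadow.mov("eax", "buf[0]")                 # propagate through a copy
shadow.alu("ebx", "eax", "ecx")             # propagate through arithmetic
if shadow.get("ebx"):
    print("sink reached with tainted data:", shadow.get("ebx"))
```

A real instruction-level tracker applies the same propagation rules to every executed instruction via binary instrumentation, which is where the performance engineering discussed in the abstract comes in.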
Money for Nothing and Privacy for Free? | Swapneel Sheth, Tal Malkin, Gail Kaiser | 2011-10-10 | Privacy in the context of ubiquitous social computing systems has become a major concern for society at large. As the number of online social computing systems that collect user data grows, this privacy threat is further exacerbated. There has been some work (both recent and older) on addressing these privacy concerns. These approaches typically require extra computational resources, which might be beneficial where privacy is concerned, but is not a great option when dealing with Green Computing and sustainability. Spending more computation time results in spending more energy and more resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable - systems where privacy could be achieved ``for free,'' i.e., without having to spend extra computational effort. In this paper, we describe how privacy can be achieved for free - an accidental and beneficial side effect of doing some existing computation - and what types of privacy threats it can mitigate. More precisely, we describe a ``Privacy for Free'' design pattern and show its feasibility, sustainability, and utility in building complex social computing systems. | (pdf) |
Forecasting Energy Demand in Large Commercial Buildings Using Support Vector Machine Regression | David Solomon, Rebecca Winter, Albert Boulanger, Roger Anderson, Leon Wu | 2011-09-24 | As our society gains a better understanding of how humans have negatively impacted the environment, research related to reducing carbon emissions and overall energy consumption has become increasingly important. One of the simplest ways to reduce energy usage is by making current buildings less wasteful. By improving energy efficiency, this method of lowering our carbon footprint is particularly worthwhile because it reduces energy costs of operating the building, unlike many environmental initiatives that require large monetary investments. In order to improve the efficiency of the heating, ventilation, and air conditioning (HVAC) system of a Manhattan skyscraper, 345 Park Avenue, a predictive computer model was designed to forecast the amount of energy the building will consume. This model uses Support Vector Machine Regression (SVMR), a method that builds a regression based purely on historical data of the building, requiring no knowledge of its size, heating and cooling methods, or any other physical properties. SVMR employs time-delay coordinates as a representation of the past to create the feature vectors for SVM training. This pure dependence on historical data makes the model very easily applicable to different types of buildings with few model adjustments. The SVM regression model was built to predict a week of future energy usage based on past energy, temperature, and dew point temperature data. | (pdf) |
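The modeling recipe described above, regression purely on historical readings with time-delay coordinates as features, can be sketched as follows. This is a hedged illustration rather than the authors' code; the synthetic signals, lag length, and hyperparameters are hypothetical stand-ins for the building's real energy, temperature, and dew point data.

```python
# Minimal sketch of SVM regression on time-delay coordinates: each feature
# vector holds the previous `lags` hourly readings of all signals, and the
# target is energy usage at the current hour. Data here are synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
hours = np.arange(24 * 90)                                   # ~3 months, hourly
temperature = 20 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
dew_point = temperature - 5 + rng.normal(0, 1, hours.size)
energy = 500 + 10 * temperature + rng.normal(0, 20, hours.size)
series = np.column_stack([energy, temperature, dew_point])

def delay_embed(values, lags=24):
    # Flatten the previous `lags` rows into one feature vector per sample.
    X = np.array([values[t - lags:t].ravel() for t in range(lags, len(values))])
    y = values[lags:, 0]                                     # energy at time t
    return X, y

X, y = delay_embed(series)
split = int(0.9 * len(X))
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
print("held-out R^2:", model.score(X[split:], y[split:]))
```

Because the features are just lagged historical readings, the same pipeline transfers to other buildings by swapping in their time series, which is the portability argument the abstract makes.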
Privacy Enhanced Access Control for Outsourced Data Sharing | Mariana Raykova, Hang Zhao, Steven Bellovin | 2011-09-20 | Traditional access control models often assume that the entity enforcing access control policies is also the owner of data and resources. This assumption no longer holds when data is outsourced to a third-party storage provider, such as the \emph{cloud}. Existing access control solutions mainly focus on preserving confidentiality of stored data from unauthorized access and the storage provider. However, in this setting, access control policies as well as users' access patterns also become privacy-sensitive information that should be protected from the cloud. We propose a two-level access control scheme that combines coarse-grained access control enforced at the cloud, which keeps communication overhead acceptable while limiting the information that the cloud learns from its partial view of the access rules and the access patterns, with fine-grained cryptographic access control enforced at the user's side, which provides the desired expressiveness of the access control policies. Our solution handles both \emph{read} and \emph{write} access control. | (pdf) |
Stable Flight and Object Tracking with a Quadricopter using an Android Device | Benjamin Bardin, William Brown, Paul S. Blaer | 2011-09-09 | We discuss a novel system architecture for quadricopter control, the Robocopter platform, in which the quadricopter can behave near-autonomously and processing is handled by an Android device on the quadricopter. The Android device communicates with a laptop, receiving commands from the host and sending imagery and sensor data back. We also discuss the results of a series of tests of our platform on our first hardware iteration, named Jabberwock. | (pdf) |
NetServ on OpenFlow 1.0 | Emanuele Maccherani, Jae Woo Lee, Mauro Femminella, Gianluca Reali, Henning Schulzrinne | 2011-09-03 | We describe the initial prototype implementation of OpenFlow-based NetServ. | (pdf) |
Improving System Reliability for Cyber-Physical Systems | Leon Wu | 2011-08-31 | System reliability is a fundamental requirement of a cyber-physical system, i.e., a system featuring a tight combination of, and coordination between, the system's computational and physical elements. Cyber-physical systems range from critical infrastructure, such as the power grid and transportation systems, to health and biomedical devices. An unreliable system often leads to disruption of service, financial cost and even loss of human life. This thesis aims to improve system reliability for cyber-physical systems that meet the following criteria: processing large amounts of data; employing software as a system component; running online continuously; and having an operator-in-the-loop because of the human judgment and accountability required for safety-critical systems. I limit the scope to this type of cyber-physical system because such systems are important and becoming more prevalent. To improve their reliability, I propose a system evaluation approach named automated online evaluation. It works in parallel with the cyber-physical system, continuously conducting automated evaluation at multiple stages along the system's workflow and providing operator-in-the-loop feedback on reliability improvement. It is an approach whereby data from the cyber-physical system are evaluated. For example, abnormal input and output data can be detected and flagged through data quality analysis; alerts can then be sent to the operator-in-the-loop, who can take actions and make changes to the system based on the alerts in order to achieve minimal system downtime and higher system reliability. To implement the proposed approach, I further propose a system architecture named ARIS (Autonomic Reliability Improvement System). One technique used by the approach is data quality analysis using computational intelligence, which applies computational intelligence to evaluate data quality in an automated and efficient way so that the running system performs reliably as expected. The computational intelligence is enabled by machine learning, data mining, statistical and probabilistic analysis, and other intelligent techniques. In a cyber-physical system, the data collected from the system, e.g., software bug reports, system status logs and error reports, are stored in databases. In my approach, these data are analyzed via data mining and other intelligent techniques so that useful information on system reliability, including erroneous data and abnormal system states, can be extracted. This reliability-related information is directed to operators so that proper actions can be taken, sometimes proactively based on predictive results, to ensure the proper and reliable execution of the system. Another technique used by the approach is self-tuning, which automatically self-manages and self-configures the evaluation system so that it adapts to changes in the system and feedback from the operator. The self-tuning adapts the evaluation system to ensure its proper functioning, which leads to a more robust evaluation system and improved system reliability. For a feasibility study of the proposed approach, I first present the NOVA (Neutral Online Visualization-aided Autonomic) system, a data quality analysis system for improving system reliability for the power grid cyber-physical system. I then present a feasibility study on the effectiveness of several self-tuning techniques, including data classification, redundancy checking and trend detection. The self-tuning leads to an adaptive evaluation system that works better under system changes and operator feedback, which in turn leads to improved system reliability. The contribution of this work is an automated online evaluation approach that is able to improve system reliability for cyber-physical systems in the domain of interest indicated above. It enables online reliability assurance for deployed systems on which robust tests cannot be performed prior to actual deployment. | (pdf) |
Describable Visual Attributes for Face Images | Neeraj Kumar | 2011-08-01 | We introduce the use of describable visual attributes for face images. Describable visual attributes are labels that can be given to an image to describe its appearance. This thesis focuses mostly on images of faces and the attributes used to describe them, although the concepts also apply to other domains. Examples of face attributes include gender, age, jaw shape, nose size, etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category. We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the current effectiveness and explore the future potential of using attributes for image search, automatic face replacement in images, and face verification, via both human and computational experiments. To aid other researchers in studying these problems, we introduce two new large face datasets, named FaceTracer and PubFig, with labeled attributes and identities, respectively. Finally, we also show the effectiveness of visual attributes in a completely different domain: plant species identification. To this end, we have developed and publicly released the Leafsnap system, which has been downloaded by over half a million users. The mobile phone application is a flexible electronic field guide with high-quality images of the tree species in the Northeast US. It also gives users instant access to our automatic recognition system, greatly simplifying the identification process. | (pdf) |
ICOW: Internet Access in Public Transit Systems | Se Gi Hong, SungHoon Seo, Henning Schulzrinne, Prabhakar Chitrapu | 2011-07-27 | When public transportation stations have access points to provide Internet access to passengers, public transportation becomes a more attractive travel and commute option. However, the Internet connectivity is intermittent because passengers can access the Internet only when a bus or a train is within the networking coverage of an access point at a stop. To efficiently handle this intermittent network for the public transit system, we propose Internet Cache on Wheels (ICOW), a system that provides a low-cost way for bus and train operators to offer access to Internet content. Each bus or train car is equipped with a smart cache that serves popular content to passengers. The cache updates its content based on passenger requests when it is within range of Internet access points placed at bus stops, train stations or depots. We have developed a system architecture and built a prototype of the ICOW system. Our evaluation and analysis show that ICOW is significantly more efficient than having passengers contact Internet access points individually and ensures continuous availability of content throughout the journey. | (pdf) |
Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid | Leon Wu, Gail Kaiser, Cynthia Rudin, Roger Anderson | 2011-07-01 | Ensuring reliability as the electrical grid morphs into the "smart grid" will require innovations in how we assess the state of the grid, for the purpose of proactive maintenance, rather than reactive maintenance; in the future, we will not only react to failures, but also try to anticipate and avoid them using predictive modeling (machine learning and data mining) techniques. To help in meeting this challenge, we present the Neutral Online Visualization-aided Autonomic evaluation framework (NOVA) for evaluating machine learning and data mining algorithms for preventive maintenance on the electrical grid. NOVA has three stages provided through a unified user interface: evaluation of input data quality, evaluation of machine learning and data mining results, and evaluation of the reliability improvement of the power grid. A prototype version of NOVA has been deployed for the power grid in New York City, and it is able to evaluate machine learning and data mining systems effectively and efficiently. | (pdf) |
Columbia University WiMAX Campus Deployment and Installation | SungHoon Seo, Jan Janak, Henning Schulzrinne | 2011-06-27 | This report describes WiMAX campus deployment and installation at Columbia University. | (pdf) |
The Benefits of Using Clock Gating in the Design of Networks-on-Chip | Michele Petracca, Luca P. Carloni | 2011-06-21 | Networks-on-chip (NoC) are critical to the design of complex multi-core system-on-chip (SoC) architectures. Since SoCs are characterized by a combination of high performance requirements and stringent energy constraints, NoCs must be realized with low-power design techniques. Since the use of a semicustom design flow based on standard-cell technology libraries is essential to cope with the SoC design complexity challenges under tight time-to-market constraints, NoCs must be implemented using logic synthesis. In this paper we analyze the major power reduction that clock gating can deliver when applied to the synthesis of a NoC in the context of a semi-custom automated design flow. | (pdf) |
Secret Ninja Testing with HALO Software Engineering | Jonathan Bell, Swapneel Sheth, Gail Kaiser | 2011-06-21 | Software testing traditionally receives little attention in early computer science courses. However, we believe that if exposed to testing early, students will develop positive habits for future work. As we have found that students typically are not keen on testing, we propose an engaging and socially-oriented approach to teaching software testing in introductory and intermediate computer science courses. Our proposal leverages the power of gaming utilizing our previously described system HALO. Unlike many previous approaches, we aim to present software testing in disguise - so that students do not recognize (at first) that they are being exposed to software testing. We describe how HALO could be integrated into course assignments as well as the benefits that HALO creates. | (pdf) |
Markov Models for Network-Behavior Modeling and Anonymization | Yingbo Song, Salvatore J. Stolfo, Tony Jebara | 2011-06-15 | Modern network security research has demonstrated a clear need for open sharing of traffic datasets between organizations, a need that has so far been superseded by the challenge of removing sensitive content beforehand. Network Data Anonymization (NDA) is emerging as a field dedicated to this problem, with its main direction focusing on removal of identifiable artifacts that might pierce privacy, such as usernames and IP addresses. However, recent research has demonstrated that more subtle statistical artifacts, also present, may yield fingerprints that are just as differentiable as the former. This result highlights certain shortcomings in current anonymization frameworks -- particularly, ignoring the behavioral idiosyncrasies of network protocols, applications, and users. Recent anonymization results have shown that the extent to which utility and privacy can be obtained is mainly a function of the information in the data that one is aware and not aware of. This paper leverages the predictability of network behavior in our favor to augment existing frameworks through a new machine-learning-driven anonymization technique. Our approach uses the substitution of individual identities with group identities where members are divided based on behavioral similarities, essentially providing anonymity-by-crowds in a statistical mix-net. We derive time-series models for network traffic behavior which quantifiably models the discriminative features of network "behavior" and introduce a kernel-based framework for anonymity which fits together naturally with network-data modeling. | (pdf) |
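As a rough illustration of the behavior-modeling idea, the sketch below fits a first-order Markov transition matrix per host over a discretized traffic-state sequence and compares hosts with a simple distance, so that similarly behaving hosts could be merged into one group identity. It is not the paper's time-series model or kernel framework; the state definitions, hosts, and distance measure are hypothetical.

```python
# Illustrative sketch: per-host first-order Markov chains over discretized
# traffic states, plus a naive distance for grouping hosts by behavior.
import numpy as np

def transition_matrix(states, n_states):
    counts = np.ones((n_states, n_states))          # Laplace smoothing
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def behavior_distance(P, Q):
    return np.abs(P - Q).sum()                      # simple L1 distance

# Hypothetical per-host state sequences (e.g., binned packet sizes/rates).
hosts = {
    "10.0.0.1": [0, 0, 1, 2, 1, 0, 0, 1],
    "10.0.0.2": [0, 1, 1, 2, 1, 0, 1, 1],
    "10.0.0.3": [2, 2, 2, 0, 2, 2, 1, 2],
}
models = {h: transition_matrix(s, 3) for h, s in hosts.items()}
print(behavior_distance(models["10.0.0.1"], models["10.0.0.2"]))  # similar pair
print(behavior_distance(models["10.0.0.1"], models["10.0.0.3"]))  # dissimilar pair
```

Hosts whose models are close would share a group identity in the anonymized trace, which is the "anonymity-by-crowds" substitution the abstract describes.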
Concurrency Attacks | Junfeng Yang, Ang Cui, John Gallagher, Sal Stolfo, Simha Sethumadhavan | 2011-06-02 | Just as errors in sequential programs can lead to security exploits, errors in concurrent programs can lead to concurrency attacks. In this paper, we present an in-depth study of concurrency attacks and how they may affect existing defenses. Our study yields several interesting findings. For instance, we find that concurrency attacks can corrupt non-pointer data, such as user identifiers, which existing memory-safety defenses cannot handle. Inspired by our findings, we propose new defense directions and fixes to existing defenses. | (pdf) |
Constructing Subtle Concurrency Bugs Using Synchronization-Centric Second-Order Mutation Operators | Leon Wu, Gail Kaiser | 2011-06-01 | Mutation testing applies mutation operators to modify program source code or byte code in small ways, and then runs these modified programs (i.e., mutants) against a test suite in order to evaluate the quality of the test suite. In this paper, we first describe a general fault model for concurrent programs and some limitations of previously developed sets of first-order concurrency mutation operators. We then present our new mutation testing approach, which employs synchronization-centric second-order mutation operators that are able to generate subtle concurrency bugs not represented by the first-order mutation. These operators are used in addition to the synchronization-centric first-order mutation operators to form a small set of effective concurrency mutation operators for mutant generation. Our empirical study shows that our set of operators is effective in mutant generation with limited cost and demonstrates that this new approach is easy to implement. | (pdf) |
BUGMINER: Software Reliability Analysis Via Data Mining of Bug Reports | Leon Wu, Boyi Xie, Gail Kaiser, Rebecca Passonneau | 2011-06-01 | Software bugs reported by human users and automatic error reporting software are often stored in bug tracking tools (e.g., Bugzilla and Debbugs). These accumulated bug reports may contain valuable information that could be used to improve the quality of bug reporting, reduce the quality assurance effort and cost, analyze software reliability, and predict future bug report trends. In this paper, we present BUGMINER, a tool that is able to derive useful information from a historical bug report database using data mining, use this information to perform completion and redundancy checks on a new or given bug report, and estimate the bug report trend using statistical analysis. Our empirical studies of the tool using several real-world bug report repositories show that it is effective, easy to implement, and has relatively high accuracy despite low-quality data. | (pdf) |
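One of the checks described above, the redundancy check on a new bug report, could look roughly like the following text-similarity sketch. This is an illustrative stand-in rather than BUGMINER's actual implementation; the sample reports and threshold are made up.

```python
# Hedged sketch of a redundancy check: flag a new bug report as a likely
# duplicate when its text is very similar to an existing report.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

historic = [
    "crash when opening large csv file in import dialog",
    "ui freezes while saving project over network share",
]
new_report = "application crashes opening a very large csv during import"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(historic + [new_report])
new_vec, old_vecs = matrix[len(historic)], matrix[:len(historic)]
scores = cosine_similarity(new_vec, old_vecs).ravel()
best = scores.argmax()
if scores[best] > 0.4:                      # illustrative threshold
    print(f"possible duplicate of report #{best}: score={scores[best]:.2f}")
```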
Evaluating Machine Learning for Improving Power Grid Reliability | Leon Wu, Gail Kaiser, Cynthia Rudin, David Waltz, Roger Anderson, Albert Boulanger | 2011-06-01 | Ensuring reliability as the electrical grid morphs into the “smart grid” will require innovations in how we assess the state of the grid, for the purpose of proactive maintenance, rather than reactive maintenance – in the future, we will not only react to failures, but also try to anticipate and avoid them using predictive modeling (machine learning) techniques. To help in meeting this challenge, we present the Neutral Online Visualization-aided Autonomic evaluation framework (NOVA) for evaluating machine learning algorithms for preventive maintenance on the electrical grid. NOVA has three stages provided through a unified user interface: evaluation of input data quality, evaluation of machine learning results, and evaluation of the reliability improvement of the power grid. A prototype version of NOVA has been deployed for the power grid in New York City, and it is able to evaluate machine learning systems effectively and efficiently. | (pdf) |
Entropy, Randomization, Derandomization, and Discrepancy | Michael Gnewuch | 2011-05-24 | The star discrepancy is a measure of how uniformly distributed a finite point set is in the d-dimensional unit cube. It is related to high-dimensional numerical integration of certain function classes as expressed by the Koksma-Hlawka inequality. A sharp version of this inequality states that the worst-case error of approximating the integral of functions from the unit ball of some Sobolev space by an equal-weight cubature is exactly the star discrepancy of the set of sample points. In many applications, as, e.g., in physics, quantum chemistry or finance, it is essential to approximate high-dimensional integrals. Thus with regard to the Koksma-Hlawka inequality the following three questions are very important: (i) What are good bounds with explicitly given dependence on the dimension d for the smallest possible discrepancy of any n-point set for moderate n? (ii) How can we construct point sets efficiently that satisfy such bounds? (iii) How can we calculate the discrepancy of given point sets efficiently? We want to discuss these questions and survey and explain some approaches to tackle them relying on metric entropy, randomization, and derandomization. | (pdf) (ps) |
A New Randomized Algorithm to Approximate the Star Discrepancy Based on Threshold Accepting | Michael Gnewuch, Magnus Wahlstrom, Carola Winzen | 2011-05-24 | We present a new algorithm for estimating the star discrepancy of arbitrary point sets. Similar to the algorithm for discrepancy approximation of Winker and Fang [SIAM J. Numer. Anal. 34 (1997), 2028–2042] it is based on the optimization algorithm threshold accepting. Our improvements include, amongst others, a non-uniform sampling strategy which is more suited for higher-dimensional inputs and additionally takes into account the topological characteristics of given point sets, and rounding steps which transform axis-parallel boxes, on which the discrepancy is to be tested, into critical test boxes. These critical test boxes provably yield higher discrepancy values, and contain the box that exhibits the maximum value of the local discrepancy. We provide comprehensive experiments to test the new algorithm. Our randomized algorithm computes the exact discrepancy frequently in all cases where this can be checked (i.e., where the exact discrepancy of the point set can be computed in feasible time). Most importantly, in higher dimensions the new method behaves clearly better than all previously known methods. | (pdf) (ps) |
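A bare-bones version of the threshold-accepting idea, without the paper's non-uniform sampling or rounding to critical test boxes, might look like the sketch below: it random-walks over anchored boxes, accepts moves that do not worsen the objective by more than a shrinking threshold, and keeps the largest local discrepancy seen, which gives a lower bound on the star discrepancy. The step size, iteration count, and threshold schedule are illustrative choices.

```python
# Simplified threshold-accepting search for a star-discrepancy lower bound.
import numpy as np

rng = np.random.default_rng(0)

def local_discrepancy(points, x):
    inside = np.all(points < x, axis=1).mean()      # empirical measure of [0, x)
    return abs(inside - np.prod(x))                 # vs. Lebesgue volume of the box

def threshold_accepting(points, iters=5000, step=0.1):
    d = points.shape[1]
    x = rng.uniform(size=d)
    best = current = local_discrepancy(points, x)
    for i in range(iters):
        threshold = 0.01 * (1 - i / iters)           # shrinking acceptance threshold
        cand = np.clip(x + rng.uniform(-step, step, d), 1e-9, 1.0)
        value = local_discrepancy(points, cand)
        if value >= current - threshold:             # accept small setbacks
            x, current = cand, value
            best = max(best, value)
    return best

points = rng.uniform(size=(128, 5))                  # arbitrary test point set
print("estimated star discrepancy (lower bound):", threshold_accepting(points))
```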
Cells: A Virtual Mobile Smartphone Architecture | Jeremy Andrus, Christoffer Dall, Alexander Van't Hoff, Oren Laadan, Jason Nieh | 2011-05-24 | Cellphones are increasingly ubiquitous, so much so that many users are inconveniently forced to carry multiple cellphones to accommodate work, personal, and geographic mobility needs. We present Cells, a virtualization architecture for enabling multiple virtual smartphones to run simultaneously on the same physical cellphone device in a securely isolated manner. Cells introduces a usage model of having one foreground virtual phone and multiple background virtual phones. This model enables a new device namespace mechanism and novel device proxies that integrate with lightweight operating system virtualization to efficiently and securely multiplex phone hardware devices across multiple virtual phones while providing native hardware device performance to all applications. Virtual phone features include fully-accelerated graphics for gaming, complete power management features, and full telephony functionality with separately assignable telephone numbers and caller ID support. We have implemented a Cells prototype that supports multiple Android virtual phones on the same phone hardware. Our performance results demonstrate that Cells imposes only modest runtime and memory overhead, works seamlessly across multiple hardware devices including Google Nexus 1 and Nexus S phones and an NVIDIA tablet, and transparently runs all existing Android applications without any modifications. | (pdf) |
Towards Diversity in Recommendations using Social Networks | Swapneel Sheth, Jonathan Bell, Nipun Arora, Gail Kaiser | 2011-05-17 | While there has been a lot of research towards improving the accuracy of recommender systems, the resulting systems have tended to become increasingly narrow in suggestion variety. An emerging trend in recommendation systems is to actively seek out diversity in recommendations, where the aim is to provide unexpected, varied, and serendipitous recommendations to the user. Our main contribution in this paper is a new approach to diversity in recommendations called ``Social Diversity,'' a technique that uses social network information to diversify recommendation results. Social Diversity utilizes social networks in recommender systems to leverage the diverse underlying preferences of different user communities to introduce diversity into recommendations. This form of diversification ensures that users in different social networks (who may not collaborate in real life, since they are in a different network) share information, helping to prevent siloization of knowledge and recommendations. We describe our approach and show its feasibility in providing diverse recommendations for the MovieLens dataset. | (pdf) |
Combining a Baiting and a User Search Profiling Techniques for Masquerade Detection | Malek Ben Salem, Salvatore J. Stolfo | 2011-05-06 | Masquerade attacks are characterized by an adversary stealing a legitimate user's credentials and using them to impersonate the victim and perform malicious activities, such as stealing information. Prior work on masquerade attack detection has focused on profiling legitimate user behavior and detecting abnormal behavior indicative of a masquerade attack. Like any anomaly-detection-based technique, detecting masquerade attacks by profiling user behavior suffers from a significant number of false positives. We extend prior work and provide a novel integrated detection approach in this paper. We combine a user behavior profiling technique with a baiting technique in order to more accurately detect masquerade activity. We show that using this integrated approach reduces the false positives by 36% when compared to user behavior profiling alone, while achieving almost perfect detection results. We also show how this combined detection approach serves as a mechanism for hardening the masquerade attack detector against mimicry attacks. | (pdf) |
DYSWIS: Collaborative Network Fault Diagnosis - Of End-users, By End-users, For End-users | Kyung Hwa Kim, Vishal Singh, Henning Schulzrinne | 2011-05-05 | With the increase in application complexity, the need for network fault diagnosis for end-users has increased. However, existing failure diagnosis techniques fail to assist end-users in accessing applications and services. We present DYSWIS, an automatic network fault detection and diagnosis system for end-users. The key idea is collaboration among end-users: a node requests multiple nodes to diagnose a network fault in real time in order to collect diverse information from different parts of the network and infer the cause of failure. DYSWIS leverages a DHT network to search for collaborating nodes with the network properties required to diagnose a failure. The framework allows dynamic updating of rules and probes in a running system. Another key aspect is the contribution of expert knowledge (rules and probes) by application developers, vendors and network administrators, thereby enabling crowdsourcing of diagnosis strategies for a growing set of applications. We have implemented the framework and the software and tested them using our test bed and PlanetLab to show that several complex, commonly occurring failures can be detected and diagnosed successfully using DYSWIS, while a single-user probe with traditional tools fails to pinpoint the cause of such failures. We validate that our base modules and rules are sufficient to detect the infrastructural failures causing the majority of application failures. | (pdf) |
NetServ Framework Design and Implementation 1.0 | Jae Woo Lee, Roberto Francescangeli, Wonsang Song, Jan Janak, Suman Srinivasan, Michael S. Kester | 2011-05-04 | Eyeball ISPs today are under-utilizing an important asset: edge routers. We present NetServ, a programmable node architecture aimed at turning edge routers into distributed service hosting platforms. This allows ISPs to allocate router resources to content publishers and application service providers motivated to deploy content and services at the network edge. This model provides important benefits over currently available solutions like CDN. Content and services can be brought closer to end users by dynamically installing and removing custom modules as needed throughout the network. Unlike previous programmable router proposals which focused on customizing features of a router, NetServ focuses on deploying content and services. All our design decisions reflect this change in focus. We set three main design goals: a wide-area deployment, a multi-user execution environment, and a clear economic benefit. We built a prototype using Linux, NSIS signaling, and the Java OSGi framework. We also implemented four prototype applications: ActiveCDN provides publisher-specific content distribution and processing; KeepAlive Responder and Media Relay reduce the infrastructure needs of telephony providers; and Overload Control makes it possible to deploy more flexible algorithms to handle excessive traffic. | (pdf) |
Estimation of System Reliability Using a Semiparametric Model | Leon Wu, Timothy Teravainen, Gail Kaiser, Roger Anderson, Albert Boulanger, Cynthia Rudin | 2011-04-20 | An important problem in reliability engineering is to predict the failure rate, that is, the frequency with which an engineered system or component fails. This paper presents a new method of estimating failure rate using a semiparametric model with Gaussian process smoothing. The method is able to provide accurate estimation based on historical data and it does not make strong a priori assumptions of failure rate pattern (e.g., constant or monotonic). Our experiments of applying this method in power system failure data compared with other models show its efficacy and accuracy. This method can be used in estimating reliability for many other systems, such as software systems or components. | (pdf) |
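The general idea of smoothing an empirically estimated failure rate without assuming a constant or monotonic pattern can be illustrated with the sketch below. It applies off-the-shelf Gaussian process regression to synthetic monthly failure counts and is only an illustration of the smoothing step, not the paper's semiparametric model; the kernel, data, and seasonality are hypothetical.

```python
# Illustration: smooth noisy monthly failure counts with GP regression to
# obtain a failure-rate curve free of constant/monotonic assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
months = np.arange(36).reshape(-1, 1)                 # 3 years of observations
true_rate = 5 + 3 * np.sin(months.ravel() / 6)        # hypothetical seasonal rate
counts = rng.poisson(true_rate)                       # observed failures per month

kernel = 1.0 * RBF(length_scale=6.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(months, counts)
rate_hat, rate_std = gp.predict(months, return_std=True)
print("estimated rate, months 0-5:", np.round(rate_hat[:6], 2))
```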
Beyond Trending Topics: Real-World Event Identification on Twitter | Hila Becker, Mor Naaman, Luis Gravano | 2011-03-25 | User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, earlier than other social media sites such as Flickr or YouTube, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters, including temporal, social, topical, and Twitter-centric features. Our large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter. | (pdf) |
Efficient, Deterministic and Deadlock-free Concurrency | Nalini Vasudevan | 2011-03-25 | Concurrent programming languages are growing in importance with the advent of multicore systems. Two major concerns in any concurrent program are data races and deadlocks. Each is a potentially subtle bug that can be caused by nondeterministic scheduling choices in most concurrent formalisms. Unfortunately, traditional race and deadlock detection techniques fail on both large programs and small programs with complex behaviors. We believe the solution is model-based design, where the programmer is presented with a constrained higher-level language that prevents certain unwanted behavior. We present the SHIM model that guarantees the absence of data races by eschewing shared memory. This dissertation provides SHIM-based techniques that aid determinism - models that guarantee determinism, compilers that generate deterministic code and libraries that provide deterministic constructs. Additionally, we avoid deadlocks, a consequence of improper synchronization. A SHIM program may deadlock if it violates a communication protocol. We provide efficient techniques for detecting and deterministically breaking deadlocks in programs that use the SHIM model. We evaluate the efficiency of our techniques with a set of benchmarks. We have also extended our ideas to other languages. The ultimate goal is to provide deterministic deadlock-free concurrency along with efficiency. Our hope is that these ideas will be used in the future while designing complex concurrent systems. | (pdf) |
Implementing Zeroconf in Linphone | Abhishek Srivastava, Jae Woo Lee, Henning Schulzrinne | 2011-03-05 | This report describes the motivation behind implementing Zeroconf in an open-source SIP phone (Linphone) and the architecture of the implemented solution. It also describes the roadblocks encountered and how they were tackled in the implementation. It concludes with a few notes on future enhancements that may be implemented at a later date. | (pdf) |
Frank Miller: Inventor of the One-Time Pad | Steven M. Bellovin | 2011-03-01 | The invention of the one-time pad is generally credited to Gilbert S. Vernam and Joseph O. Mauborgne. We show that it was invented about 35 years earlier by a Sacramento banker named Frank Miller. We provide a tentative identification of which Frank Miller it was, and speculate on whether or not Mauborgne might have known of Miller's work, especially via his colleague Parker Hitt. | (pdf) |
The Failure of Online Social Network Privacy Settings | Michelle Madejski, Maritza Johnson, Steven M. Bellovin | 2011-02-23 | Increasingly, people are sharing sensitive personal information via online social networks (OSN). While such networks do permit users to control what they share with whom, access control policies are notoriously difficult to configure correctly; this raises the question of whether OSN users' privacy settings match their sharing intentions. We present the results of an empirical evaluation that measures privacy attitudes and intentions and compares these against the privacy settings on Facebook. Our results indicate a serious mismatch: every one of the 65 participants in our study confirmed that at least one of the identified violations was in fact a sharing violation. In other words, OSN users' privacy settings are incorrect. Furthermore, a majority of users cannot or will not fix such errors. We conclude that the current approach to privacy settings is fundamentally flawed and cannot be fixed; a fundamentally different approach is needed. We present recommendations to ameliorate the current problems, as well as provide suggestions for future research. | (pdf) |
HALO (Highly Addictive, sociaLly Optimized) Software Engineering | Swapneel Sheth, Jonathan Bell, Gail Kaiser | 2011-02-08 | In recent years, computer games have become increasingly social and collaborative in nature. Massively multiplayer online games, in which a large number of players collaborate with each other to achieve common goals in the game, have become extremely pervasive. By working together towards a common goal, players become more engrossed in the game. In everyday work environments, this sort of engagement would be beneficial, and is often sought out. We propose an approach to software engineering called HALO that builds upon the properties found in popular games, by turning work into a game environment. Our proposed approach can be viewed as a model for a family of prospective games that would support the software development process. Utilizing operant conditioning and flow theory, we create an immersive software development environment conducive to increased productivity. We describe the mechanics of HALO and how it could fit into typical software engineering processes. | (pdf) |
On Effective Testing of Health Care Simulation Software | Christian Murphy, M.S. Raunak, Andrew King, Sanjien Chen, Christopher Imbriano, Gail Kaiser | 2011-02-04 | Health care professionals rely on software to simulate anatomical and physiological elements of the human body for purposes of training, prototyping, and decision making. Software can also be used to simulate medical processes and protocols to measure cost effectiveness and resource utilization. Whereas much of the software engineering research into simulation software focuses on validation (determining that the simulation accurately models real-world activity), to date there has been little investigation into the testing of simulation software itself, that is, the ability to effectively search for errors in the implementation. This is particularly challenging because often there is no test oracle to indicate whether the results of the simulation are correct. In this paper, we present an approach to systematically testing simulation software in the absence of test oracles, and evaluate the effectiveness of the technique. | (pdf) |
Protocols and System Design, Reliability, and Energy Efficiency in Peer-to-Peer Communication Systems | Salman Abdul Baset | 2011-02-04 | Modern Voice-over-IP (VoIP) communication systems provide a bundle of services to their users. These services range from the most basic voice-based services such as voice calls and voicemail to more advanced ones such as conferencing, voicemail-to-text, and online address books. Besides voice, modern VoIP systems provide video calls and video conferencing, presence, instant messaging (IM), and even desktop sharing services. These systems also let their users establish a voice, video, or text session with devices in cellular, public switched telephone network (PSTN), or other VoIP networks. The peer-to-peer (p2p) paradigm for building VoIP systems involves minimal or no use of managed servers and is therefore attractive from an administrative and economic perspective. However, the benefits of using the p2p paradigm in VoIP systems are not without their challenges. First, p2p communication (VoIP) systems can be deployed in environments with varying requirements of scalability, connectivity, security, interoperability, and performance. These requirements bring forth the question of designing open and standardized protocols for diverse deployments. Second, the presence of restrictive network address translators (NATs) and firewalls prevents machines from directly exchanging packets and is problematic from the perspective of establishing direct media sessions. The p2p communication systems address this problem by using an intermediate peer with unrestricted connectivity to relay the session or by preferring the use of TCP. This technique for addressing connectivity problems raises questions about the reliability and session quality of p2p communication systems compared with traditional client-server VoIP systems. Third, while administrative overheads are likely to be lower in running p2p communication systems as compared to client-server, can the same be said about the energy efficiency? Fourth, what types of techniques can be used to gain insights into the performance of a deployed p2p VoIP system like Skype? The thesis addresses the challenges in designing, building, and analyzing peer-to-peer communication systems. The thesis presents Peer-to-Peer Protocol (P2PP), an open protocol for building p2p communication systems with varying operational requirements. P2PP is now part of the IETF's P2PSIP protocol and is on track to become an RFC. The thesis describes the design and implementation of OpenVoIP, a proof-of-concept p2p communication system to demonstrate the feasibility of P2PP and to explore issues in building p2p communication systems. The thesis introduces a simple and novel analytical model for analyzing the reliability of peer-to-peer communication systems and analyzes the feasibility of TCP for sending real-time traffic. The thesis then analyzes the energy efficiency of peer-to-peer and client-server VoIP systems and shows that p2p VoIP systems are less energy efficient than client-server even if the peers consume a small amount of energy for running the p2p network. Finally, the thesis presents an analysis of the Skype protocol which indicates that Skype is free-riding on the network bandwidth of universities. | (pdf) |
Detecting Traffic Snooping in Anonymity Networks Using Decoys | Sambuddho Chakravarty, Georgios Portokalidis, Michalis Polychronakis, Angelos D. Keromytis | 2011-02-03 | Anonymous communication networks like Tor partially protect the confidentiality of their users' traffic by encrypting all intra-overlay communication. However, when the relayed traffic reaches the boundaries of the overlay network towards its actual destination, the original user traffic is inevitably exposed. At this point, unless end-to-end encryption is used, sensitive user data can be snooped by a malicious or compromised exit node, or by any other rogue network entity on the path towards the actual destination. We explore the use of decoy traffic for the detection of traffic interception on anonymous proxying systems. Our approach is based on the injection of traffic that exposes bait credentials for decoy services that require user authentication. Our aim is to entice prospective eavesdroppers to access decoy accounts on servers under our control using the intercepted credentials. We have deployed our prototype implementation in the Tor network using decoy IMAP and SMTP servers. During the course of six months, our system detected eight cases of traffic interception that involved eight different Tor exit nodes. We provide a detailed analysis of the detected incidents, discuss potential improvements to our system, and outline how our approach can be extended for the detection of HTTP session hijacking attacks. | (pdf) |
POWER: Parallel Optimizations With Executable Rewriting | Nipun Arora, Jonathan Bell, Martha Kim, Vishal Singh, Gail Kaiser | 2011-02-01 | The hardware industry’s rapid development of multicore and many-core hardware has outpaced the software industry’s transition from sequential to parallel programs. Most applications are still sequential, and many cores on parallel machines remain unused. We propose a tool that uses data-dependence profiling and binary rewriting to parallelize executables without access to source code. Our technique uses Bernstein’s conditions to identify independent sets of basic blocks that can be executed in parallel, introducing a level of granularity between fine-grained instruction-level and coarse-grained task-level parallelism. We analyze dynamically generated control and data dependence graphs to find independent sets of basic blocks which can be parallelized. We then propose to parallelize these candidates using binary rewriting techniques. Our technique aims to demonstrate the parallelism that remains in serial applications by exposing concrete opportunities for parallelism. | (pdf) |
Decoy Document Deployment for Effective Masquerade Attack Detection | Malek Ben Salem, Salvatore J. Stolfo | 2011-01-30 | Masquerade attacks pose a grave security problem that is a consequence of identity theft. Detecting masqueraders is very hard. Prior work has focused on profiling legitimate user behavior and detecting deviations from that normal behavior that could potentially signal an ongoing masquerade attack. Such approaches suffer from high false positive rates. Other work investigated the use of trap-based mechanisms as a means for detecting insider attacks in general. In this paper, we investigate the use of such trap-based mechanisms for the detection of masquerade attacks. We evaluate the desirable properties of decoys deployed within a user's file space for detection. We investigate the trade-offs between these properties through two user studies, and propose recommendations for effective masquerade detection using decoy documents based on findings from our user studies. | (pdf) |
Data Collection and Analysis for Masquerade Attack Detection: Challenges and Lessons Learned | Malek Ben Salem, Salvatore J. Stolfo | 2011-01-30 | Real-world large-scale data collection poses an important challenge in the security field. Insider and masquerader attack data collection poses even a greater challenge. Very few organizations acknowledge such breaches because of liability concerns and potential implications on their market value. This caused the scarcity of real-world data sets that could be used to study insider and masquerader attacks. In this paper, we present the design, technical, and procedural challenges encountered during our own masquerade data gathering project. We also share some lessons learned from this several-year project related to the Institutional Review Board process and to user study design. | (pdf) |
On Accelerators: Motivations, Costs, and Outlook | Simha Sethumadhavan | 2011-01-30 | Some notes on accelerators. | (pdf) |
Computational Cameras: Approaches, Benefits and Limits | Shree K. Nayar | 2011-01-15 | A computational camera uses a combination of optics and software to produce images that cannot be taken with traditional cameras. In the last decade, computational imaging has emerged as a vibrant field of research. A wide variety of computational cameras have been demonstrated - some designed to achieve new imaging functionalities and others to reduce the complexity of traditional imaging. In this article, we describe how computational cameras have evolved and present a taxonomy for the technical approaches they use. We explore the benefits and limits of computational imaging, and describe how it is related to the adjacent and overlapping fields of digital imaging, computational photography and computational image sensors. | (pdf) |
Weighted Geometric Discrepancies and Numerical Integration on Reproducing Kernel Hilbert Spaces | Michael Gnewuch | 2010-12-22 | We extend the notion of L2-B-discrepancy introduced in [E. Novak, H. Woźniakowski, L2 discrepancy and multivariate integration, in: Analytic number theory. Essays in honour of Klaus Roth. W. W. L. Chen, W. T. Gowers, H. Halberstam, W. M. Schmidt, and R. C. Vaughan (Eds.), Cambridge University Press, Cambridge, 2009, 359–388] to what we want to call weighted geometric L2-discrepancy. This extended notion allows us to consider weights to moderate the importance of different groups of variables, and additionally volume measures different from the Lebesgue measure as well as classes of test sets different from measurable subsets of Euclidean spaces. We relate the weighted geometric L2-discrepancy to numerical integration defined over weighted reproducing kernel Hilbert spaces and settle in this way an open problem posed by Novak and Woźniakowski. Furthermore, we prove an upper bound for the numerical integration error for cubature formulas that use admissible sample points. The set of admissible sample points may actually be a subset of the integration domain of measure zero. We illustrate that particularly in infinite-dimensional numerical integration it is crucial to distinguish between the whole integration domain and the set of those sample points that actually can be used by algorithms. | (pdf) (ps) |
A Comprehensive Survey of Voice over IP Security Research | Angelos D. Keromytis | 2010-12-22 | We present a comprehensive survey of Voice over IP security academic research, using a set of 245 publications forming a closed cross-citation set. We classify these papers according to an extended version of the VoIP Security Alliance (VoIPSA) Threat Taxonomy. Our goal is to provide a roadmap for researchers seeking to understand existing capabilities and to identify gaps in addressing the numerous threats and vulnerabilities present in VoIP systems. We discuss the implications of our findings with respect to vulnerabilities reported in a variety of VoIP products. We identify two specific problem areas (denial of service, and service abuse) as requiring significantly more attention from the research community. We also find that the overwhelming majority of the surveyed work takes a black box view of VoIP systems that avoids examining their internal structure and implementation. Such an approach may miss the mark in terms of addressing the main sources of vulnerabilities, i.e., implementation bugs and misconfigurations. Finally, we argue for further work on understanding cross-protocol and cross-mechanism vulnerabilities (emergent properties), which are the byproduct of a highly complex system-of-systems and an indication of the issues in future large-scale systems. | (pdf) |
Modeling User Search Behavior for Masquerade Detection | Malek Ben Salem, Salvatore J. Stolfo | 2010-12-13 | Masquerade attacks are a common security problem that is a consequence of identity theft. Masquerade detection may serve as a means of building more secure and dependable systems that authenticate legitimate users by their behavior. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user’s desktop, and would likely search more extensively and broadly in a manner that is different than the victim user being impersonated. We devise a taxonomy of Windows applications and user commands that are used to abstract sequences of user actions and identify actions linked to search activities. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 1.1%, far better than prior published results. The limited set of features used for search behavior modeling also results in large performance gains over the same modeling techniques that use larger sets of features. | (pdf) |
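A minimal sketch of the modeling step, training a one-class model on a legitimate user's search-behavior features and flagging deviating activity windows, is shown below. The three features and the synthetic data are hypothetical stand-ins for the paper's taxonomy-derived features, and the one-class SVM is only one plausible modeling choice, not necessarily the one used in the paper.

```python
# Hedged sketch: one-class model over per-window search-behavior features
# (search app launches, directory traversals, files touched), trained only
# on the legitimate owner's data; deviating windows are flagged.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
legit = rng.normal(loc=[3, 10, 20], scale=[1, 3, 5], size=(200, 3))    # owner
masq = rng.normal(loc=[15, 60, 120], scale=[3, 10, 20], size=(20, 3))  # intruder

model = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale"))
model.fit(legit)
print("flagged legit windows:", (model.predict(legit) == -1).sum(), "/ 200")
print("flagged masquerader windows:", (model.predict(masq) == -1).sum(), "/ 20")
```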
The Tradeoffs of Societal Computing | Swapneel Sheth, Gail Kaiser | 2010-12-10 | As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regards to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, \emph{Societal Computing}, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole. | (pdf) |
NetServ: Early Prototype Experiences | Michael S. Kester, Eric Liu, Jae Woo Lee, Henning Schulzrinne | 2010-12-03 | This paper describes a work-in-progress to demonstrate the feasibility of integrating services in the Internet core. The project aims to reduce or eliminate so called ossification of the Internet. Here we discuss the recent contributions of two of the team members at Columbia University. We will describe experiences setting up a Juniper router, running packet forwarding tests, preparing for the GENI demo, and starting prototype 2 of NetServ. | (pdf) |
Towards using Cached Data Mining for Large Scale Recommender Systems | Swapneel Sheth, Gail Kaiser | 2010-11-01 | Recommender systems are becoming increasingly popular. As these systems become commonplace and the number of users increases, it will become important for these systems to be able to cope with a large and diverse set of users whose recommendation needs may be very different from each other. In particular, large scale recommender systems will need to ensure that users' requests for recommendations can be answered with low response times and high throughput. In this paper, we explore how to use caches and cached data mining to improve the performance of recommender systems by improving throughput and reducing response time for providing recommendations. We describe the structure of our cache, which can be viewed as a prefetch cache that prefetches all types of supported recommendations, and how it is used in our recommender system. We also describe the results of our simulation experiments to measure the efficacy of our cache. | (pdf) |
Automatic Detection of Defects in Applications without Test Oracles | Christian Murphy, Gail Kaiser | 2010-10-29 | In application domains that do not have a test oracle, such as machine learning and scientific computing, quality assurance is a challenge because it is difficult or impossible to know in advance what the correct output should be for general input. Previously, metamorphic testing has been shown to be a simple yet effective technique in detecting defects, even without an oracle. In metamorphic testing, the application's "metamorphic properties" are used to modify existing test case input to produce new test cases in such a manner that, when given the new input, the new output can easily be computed based on the original output. If the new output is not as expected, then a defect must exist. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, and errors can occur in comparing the outputs when they are very complex. In this paper, we present a tool called Amsterdam that automates metamorphic testing by allowing the tester to easily set up and conduct metamorphic tests with little manual intervention, merely by specifying the properties to check, configuring the framework, and running the software. Additionally, we describe an approach called Heuristic Metamorphic Testing, which addresses issues related to false positives and non-determinism, and we present the results of new empirical studies that demonstrate the effectiveness of metamorphic testing techniques at detecting defects in real-world programs without test oracles. | (pdf) |
Bypassing Races in Live Applications with Execution Filters | Jingyue Wu, Heming Cui, Junfeng Yang | 2010-09-30 | Deployed multithreaded applications contain many races because these applications are difficult to write, test, and debug. Worse, the number of races in deployed applications may drastically increase due to the rise of multicore hardware and the immaturity of current race detectors. LOOM is a “live-workaround” system designed to quickly and safely bypass application races at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an evacuation algorithm to safely install the filters to live applications to avoid races. It reduces its performance overhead using hybrid instrumentation that combines static and dynamic instrumentation. We evaluated LOOM on nine real races from a diverse set of six applications, including MySQL and Apache. Our results show that (1) LOOM can safely fix all evaluated races in a timely manner, thereby increasing application availability; (2) LOOM incurs little performance overhead; (3) LOOM scales well with the number of application threads; and (4) LOOM is easy to use. | (pdf) |
Baseline: Metrics for setting a baseline for web vulnerability scanners | Huning Dai, Michael Glass, Gail Kaiser | 2010-09-22 | As web scanners are becoming more popular because they are faster and cheaper than security consultants, the trend of relying on these scanners also brings a great hazard: users can choose a weak or outdated scanner and trust incomplete results. Therefore, benchmarks are created to both evaluate and compare the scanners. Unfortunately, most existing benchmarks suffer from various drawbacks, often by testing against inappropriate criteria that do not reflect the user's needs. To deal with this problem, we present an approach called Baseline that coaches the user in picking the minimal set of weaknesses (i.e., a baseline) that a qualified scanner should be able to detect and also helps the user evaluate the effectiveness and efficiency of the scanner in detecting those chosen weaknesses. Baseline's goal is not to serve as a generic ranking system for web vulnerability scanners, but instead to help users choose the most appropriate scanner for their specific needs. | (pdf) |
Tractability of the Fredholm problem of the second kind | Arthur G. Werschulz, Henryk Wozniakowski | 2010-09-21 | We study the tractability of computing $\varepsilon$-approximations of the Fredholm problem of the second kind: given $f\in F_d$ and $q\in Q_{2d}$, find $u\in L_2(I^d)$ satisfying \[ u(x) - \int_{I^d} q(x,y)u(y)\,dy = f(x) \qquad\forall\,x\in I^d=[0,1]^d. \] Here, $F_d$ and $Q_{2d}$ are spaces of $d$-variate right-hand side functions and $2d$-variate kernels that are continuously embedded in~$L_2(I^d)$ and~$L_2(I^{2d})$, respectively. We consider the worst case setting, measuring the approximation error for the solution $u$ in the $L_2(I^d)$-sense. We say that a problem is tractable if the minimal number of information operations of $f$ and $q$ needed to obtain an $\varepsilon$-approximation is sub-exponential in $\varepsilon^{-1}$ and~$d$. One information operation corresponds to the evaluation of one linear functional or one function value. The lack of sub-exponential behavior may be defined in various ways, and so we have various kinds of tractability. In particular, the problem is strongly polynomially tractable if the minimal number of information operations is bounded by a polynomial in $\varepsilon^{-1}$ for all~$d$. We show that tractability (of any kind whatsoever) for the Fredholm problem is equivalent to tractability of the $L_2$-approximation problems over the spaces of right-hand sides and kernel functions. So (for example) if both these approximation problems are strongly polynomially tractable, so is the Fredholm problem. In general, the upper bound provided by this proof is essentially non-constructive, since it involves an interpolatory algorithm that exactly solves the Fredholm problem (albeit for finite-rank approximations of~$f$ and~$q$). However, if linear functionals are permissible and $F_d$ and~$Q_{2d}$ are tensor product spaces, we are able to surmount this obstacle; that is, we provide a fully-constructive algorithm that provides an approximation with nearly-optimal cost, i.e., one whose cost is within a factor $\ln\,\varepsilon^{-1}$ of being optimal. | (pdf) |
Trade-offs in Private Search | Vasilis Pappas, Mariana Raykova, Binh Vo, Steven M. Bellovin, Tal Malkin | 2010-09-17 | Encrypted search --- performing queries on protected data --- is a well-researched problem. However, existing solutions have inherent inefficiency that raises questions of practicality. Here, we step back from the goal of achieving maximal privacy guarantees in an encrypted search scenario to consider efficiency as a priority. We propose a privacy framework for search that allows tuning and optimization of the trade-offs between privacy and efficiency. As an instantiation of the privacy framework we introduce a tunable search system based on the SADS scheme and provide detailed measurements demonstrating the trade-offs of the constructed system. We also analyze other existing encrypted search schemes with respect to this framework. We further propose a protocol that addresses the challenge of document content retrieval in a search setting with relaxed privacy requirements. | (pdf) |
Simple-VPN: Simple IPsec Configuration | Shreyas Srivatsan, Maritza Johnson, Steven M. Bellovin | 2010-07-12 | The IPsec protocol promised easy, ubiquitous encryption. That has never happened. For the most part, IPsec usage is confined to VPNs for road warriors, largely due to needless configuration complexity and incompatible implementations. We have designed a simple VPN configuration language that hides the unwanted complexities. Virtually no options are necessary or possible. The administrator specifies the absolute minimum of information: the authorized hosts, their operating systems, and a little about the network topology; everything else, including certificate generation, is automatic. Our implementation includes a multitarget compiler, which generates implementation-specific configuration files for three different platforms; others are easy to add. | (pdf) |
Infinite-Dimensional Integration on Weighted Hilbert Spaces | Michael Gnewuch | 2010-05-21 | We study the numerical integration problem for functions with infinitely many variables. The functions we want to integrate are from a reproducing kernel Hilbert space which is endowed with a weighted norm. We study the worst case $\epsilon$-complexity which is defined as the minimal cost among all algorithms whose worst case error over the Hilbert space unit ball is at most $\epsilon$. Here we assume that the cost of evaluating a function depends polynomially on the number of active variables. The infinite-dimensional integration problem is (polynomially) tractable if the $\epsilon$-complexity is bounded by a constant times a power of $1/\epsilon$. The smallest such power is called the exponent of tractability. First we study finite-order weights. We provide improved lower bounds for the exponent of tractability for general finite-order weights and improved upper bounds for three newly defined classes of finite-order weights. The constructive upper bounds are obtained by multilevel algorithms that use for each level quasi-Monte Carlo integration points whose projections onto specific sets of coordinates exhibit a small discrepancy. The newly defined finite-intersection weights model the situation where each group of variables interacts with at most $\rho$ other groups of variables, where $\rho$ is some fixed number. For these weights we obtain a sharp upper bound. This is the first class of weights for which the exact exponent of tractability is known for any possible decay of the weights and for any polynomial degree of the cost function. For the other two classes of finite-order weights our upper bounds are sharp if, e.g., the decay of the weights is fast or slow enough. We extend our analysis to the case of arbitrary weights. In particular, from our results for finite-order weights, we conclude a lower bound on the exponent of tractability for arbitrary weights and a constructive upper bound for product weights. Although we confine ourselves for simplicity to explicit upper bounds for four classes of weights, we stress that our multilevel algorithm together with our default choice of quasi-Monte Carlo points is applicable to any class of weights. | (pdf) (ps) |
Huning Dai's Master's Thesis | Huning Dai | 2010-05-13 | Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds randomly generated inputs to the software and witnesses its failures. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework implementation called ConFu (CONfiguration FUzzing testing framework). We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance. | (pdf) |
Modeling User Search-Behavior for Masquerade Detection | Malek Ben Salem, Shlomo Hershkop, Salvatore J Stolfo | 2010-05-12 | Masquerade attacks are a common security problem that is a consequence of identity theft. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly in a manner that is different than the victim user being impersonated. We extend prior research by devising taxonomies of UNIX commands and Windows applications that are used to abstract sequences of user commands and actions. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 0.13%, far better than prior published results. The limited set of features used for search behavior modeling also results in large performance gains over the same modeling techniques that use larger sets of features. | (pdf) |
The weHelp Reference Architecture for Community-Driven Recommender Systems | Swapneel Sheth, Nipun Arora, Christian Murphy, Gail Kaiser | 2010-05-11 | Recommender systems have become increasingly popular. Most research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce weHelp - a reference architecture for social recommender systems. Our architecture is designed to be application and domain agnostic, but we briefly discuss here how it applies to recommender systems for software engineering. | (pdf) |
Comparing Speed of Provider Data Entry: Electronic Versus Paper Methods | Kevin M. Jackson, Gail Kaiser, Lyndon Wong, Daniel Rabinowitz, Michael F. Chiang | 2010-05-11 | Electronic health record (EHR) systems have significant potential advantages over traditional paper-based systems, but they require that providers assume responsibility for data entry. One significant barrier to adoption of EHRs is the perception of slowed data-entry by providers. This study compares the speed of data-entry using computer-based templates vs. paper for a large eye clinic, using 10 subjects and 10 simulated clinical scenarios. Data entry into the EHR was significantly slower (p<0.01) than traditional paper forms. | (pdf) |
Empirical Study of Concurrency Mutation Operators for Java | Leon Wu, Gail Kaiser | 2010-04-29 | Mutation testing is a white-box fault-based software testing technique that applies mutation operators to modify program source code or byte code in small ways and then runs these modified programs (i.e., mutants) against a test suite in order to measure its effectiveness and locate the weaknesses either in the test data or in the program that are seldom or never exposed during normal execution. In this paper, we describe our implementation of a generic mutation testing framework and the results of applying three sets of concurrency mutation operators on four example Java programs through empirical study and analysis. | (pdf) |
Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles | Christian Murphy | 2010-04-27 | Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be "non-testable programs" because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program's correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as "Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known." The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects or anomalies in software in these domains. Without a test oracle, it is impossible to know in general what the expected output should be for a given input, but it may be possible to predict how changes to the input should effect changes in the output, and thus identify expected relations among a set of inputs and among the set of their respective outputs. This approach, introduced by Chen et al., is known as "metamorphic testing". In metamorphic testing, if test case input x produces an output f(x), the function's so-called "metamorphic properties" can then be used to guide the creation of a transformation function t, which can then be applied to the input to produce t(x); this transformation then allows us to predict the expected output f(t(x)), based on the (already known) value of f(x). If the new output is as expected, it is not necessarily right, but any violation of the property indicates a defect. That is, though it may not be possible to know whether an output is correct, we can at least tell whether an output is incorrect. This thesis investigates three hypotheses. First, I claim that an automated approach to metamorphic testing will advance the state of the art in detecting defects in programs without test oracles, particularly in the domains of machine learning, simulation, and optimization. To demonstrate this, I describe a tool for test automation, and present the results of new empirical studies comparing the effectiveness of metamorphic testing to that of other techniques for testing applications that do not have an oracle. Second, I suggest that conducting function-level metamorphic testing in the context of a running application will reveal defects not found by metamorphic testing using system-level properties alone, and introduce and evaluate a new testing technique called Metamorphic Runtime Checking. Third, I hypothesize that it is feasible to continue this type of testing in the deployment environment (i.e., after the software is released), with minimal impact on the user, and describe a generalized approach called In Vivo Testing. Additionally, this thesis presents guidelines for identifying metamorphic properties, explains how metamorphic testing fits into the software development process, and discusses suggestions for both practitioners and researchers who need to test software without the help of a test oracle. | (pdf) |
Robust, Efficient, and Accurate Contact Algorithms | David Harmon | 2010-04-26 | Robust, efficient, and accurate contact response remains a challenging problem in the simulation of deformable materials. Contact models should robustly handle contact between geometry by preventing interpenetrations. This should be accomplished while respecting natural laws in order to maintain physical correctness. We simultaneously desire to achieve these criteria as efficiently as possible to minimize simulation runtimes. Many methods exist that partially achieve these properties, but none yet fully attain all three. This thesis investigates existing methodologies with respect to these attributes, and proposes a novel algorithm for the simulation of deformable materials that demonstrates them all. This new method is analyzed and optimized, paving the way for future work in this simplified but powerful manner of simulation. | (pdf) |
A Real-World Identity Management System with Master Secret Revocation | Elli Androulaki, Binh Vo, Steven Bellovin | 2010-04-21 | Cybersecurity mechanisms have become increasingly important as online and offline worlds converge. Strong authentication and accountability are key tools for dealing with online attacks, and we would like to realize them through a token-based, centralized identity management system. In this report, we present a privacy-preserving group of protocols comprising a unique per-user digital identity card, with which its owner is able to authenticate himself, prove possession of attributes, register himself to multiple online organizations (anonymously or not) and provide proof of membership. Unlike existing credential-based identity management systems, this card is revocable, i.e., its legal owner may invalidate it if physically lost, and still recover its content and registrations into a new credential. This card will protect an honest individual's anonymity when applicable as well as ensure his activity is known only to appropriate users. | (pdf) |
Quasi-Polynomial Tractability | Michael Gnewuch, Henryk Wozniakowski | 2010-04-09 | Tractability of multivariate problems has nowadays become a popular research subject. Polynomial tractability means that a d-variate problem can be solved to within $\varepsilon$ with polynomial cost in $\varepsilon^{-1}$ and d. Unfortunately, many multivariate problems are not polynomially tractable. This holds for all non-trivial unweighted linear tensor product problems. By an unweighted problem we mean the case when all variables and groups of variables play the same role. It seems natural to ask what is the "smallest" non-exponential function $T:[1,\infty)\times [1,\infty)\to[1,\infty)$ for which we have T-tractability of unweighted linear tensor product problems. That is, when the cost of a multivariate problem can be bounded by a multiple of a power of $T(\varepsilon^{-1},d)$. Under natural assumptions, it turns out that this function is $T^{qpol}(x,y):=\exp((1+\ln\,x)(1+\ln y))$ for all $x,y\in[1,\infty)$. The function $T^{qpol}$ goes to infinity faster than any polynomial although not "much" faster, and that is why we refer to $T^{qpol}$-tractability as quasi-polynomial tractability. The main purpose of this paper is to promote quasi-polynomial tractability especially for the study of unweighted multivariate problems. We do this for the worst case and randomized settings and for algorithms using arbitrary linear functionals or only function values. We prove relations between quasi-polynomial tractability in these two settings and for the two classes of algorithms. | (pdf) (ps) |
BotSwindler: Tamper Resistant Injection of Believable Decoys in VM-Based Hosts for Crimeware Detection | Brian M. Bowen, Pratap Prabhu, Vasileios P. Kemerlis, Stelios Sidiroglou-Douskos, Angelos D. Keromytis, Salvatore J. Stolfo | 2010-04-09 | We introduce BotSwindler, a bait injection system designed to delude and detect crimeware by forcing it to reveal itself during the exploitation of monitored information. Our implementation of BotSwindler relies upon an out-of-host software agent to drive user-like interactions in a virtual machine, seeking to convince malware residing within the guest OS that it has captured legitimate credentials. To aid in the accuracy and realism of the simulations, we introduce a low overhead approach, called virtual machine verification, for verifying whether the guest OS is in one of a predefined set of states. We provide empirical evidence to show that BotSwindler can be used to induce malware into performing observable actions and demonstrate how this approach is superior to that used in other tools. We present results from a user study to illustrate the believability of the simulations and show that financial bait information can be used to effectively detect compromises through experimentation with real credential-collecting malware. | (pdf) |
Privacy-Preserving, Taxable Bank Accounts | Elli Androulaki, Binh Vo, Steven Bellovin | 2010-04-07 | Current banking systems do not aim to protect user privacy. Purchases made from a single bank account can be linked to each other by many parties. This could be addressed in a straightforward way by generating unlinkable credentials from a single master credential using Camenisch and Lysyanskaya's algorithm; however, if bank accounts are taxable, some report must be made to the tax authority about each account. Using unlinkable credentials, digital cash, and zero-knowledge proofs of knowledge, we present a solution that prevents anyone, even the tax authority, from knowing which accounts belong to which users, or from being able to link any account to another or to purchases or deposits. | (pdf) |
CONFU: Configuration Fuzzing Testing Framework for Software Vulnerability Detection | Huning Dai, Christian Murphy, Gail Kaiser | 2010-02-19 | Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations and inputs together with a certain runtime environment. One approach to detecting these vulnerabilities is fuzz testing. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability. We discuss the approach and introduce a prototype framework called ConFu (CONfiguration FUzzing testing framework) for implementation. We also present the results of case studies that demonstrate the approach's feasibility and evaluate its performance. | (pdf) |
Empirical Evaluation of Approaches to Testing Applications without Test Oracles | Christian Murphy, Gail Kaiser | 2010-02-05 | Software testing of applications in fields like scientific computation, simulation, machine learning, etc. is particularly challenging because many applications in these domains have no reliable "test oracle" to indicate whether the program's output is correct when given arbitrary input. A common approach to testing such applications has been to use a "pseudo-oracle", in which multiple independently-developed implementations of an algorithm process an input and the results are compared. Other approaches include the use of program invariants, formal specification languages, trace and log file analysis, and metamorphic testing. In this paper, we present the results of two empirical studies in which we compare the effectiveness of some of these approaches, including metamorphic testing, pseudo-oracles, and runtime assertion checking. We also analyze the results in terms of the software development process, and discuss suggestions for practitioners and researchers who need to test software without a test oracle. | (pdf) |
Automatic Detection of Previously-Unseen Application States for Deployment Environment Testing and Analysis | Christian Murphy, Moses Vaughan, Waseem Ilahi, Gail Kaiser | 2010-01-19 | For large, complex software systems, it is typically impossible in terms of time and cost to reliably test the application in all possible execution states and configurations before releasing it into production. One proposed way of addressing this problem has been to continue testing and analysis of the application in the field, after it has been deployed. The theory behind this "perpetual testing" approach is that over time, defects will reveal themselves given that multiple instances of the same application may be run globally with different configurations, in different environments, under different patterns of usage, and in different system states. A practical limitation of many automated approaches to deployment environment testing and analysis is the potentially high performance overhead incurred by the necessary instrumentation. However, it may be possible to reduce this overhead by selecting test cases and performing analysis only in previously-unseen application states, thus reducing the number of redundant tests and analyses that are run. Solutions for fault detection, model checking, security testing, and fault localization in deployed software may all benefit from a technique that ignores application states that have already been tested or explored. In this paper, we apply such a technique to a testing methodology called "In Vivo Testing", which conducts tests in deployed applications, and present a solution that ensures that tests are only executed in states that the application has not previously encountered. In addition to discussing our implementation, we present the results of an empirical study that demonstrates its effectiveness, and explain how the new approach can be generalized to assist other automated testing and analysis techniques. | (pdf) |
Testing and Validating Machine Learning Classifiers by Metamorphic Testing | Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail Kaiser, Baowen Xu, Tsong Yueh Chen | 2010-01-11 | Machine Learning algorithms have provided important core functionality to support solutions in many scientific computing applications - such as computational biology, computational linguistics, and others. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. Also presented is a case study on a real-world machine learning application framework, and a discussion of how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method has very high effectiveness in killing mutants, and that observing expected cross-validation result alone is not sufficient to test for the correctness of a supervised classification program. Metamorphic testing is strongly recommended as a complementary approach. Finally we discuss how our findings can be used in other areas of computational science and engineering. | (pdf) |
ONEChat: Enabling Group Chat and Messaging in Opportunistic Networks | Heming Cui, Suman Srinivasan, Henning Schulzrinne | 2010-01-04 | Opportunistic networks, which are wireless network "islands" formed when transient and highly mobile nodes meet for a short period of time, are becoming commonplace as wireless devices become more and more popular. It is thus imperative to develop communication tools and applications that work well in opportunistic networks. In particular, group chat and instant messaging applications are particularly lacking for such opportunistic networks today. In this paper, we present ONEChat, a group chat and instant messaging program that works in such opportunistic networks. ONEChat uses message multicasting on top of service discovery protocols in order to support group chat and reduce bandwidth consumption in opportunistic networks. ONEChat does not require any pre-configuration, a fixed network infrastructure or a client-server architecture in order to operate. In addition, it supports features such as group chat, private rooms, line-by-line or character-by-character messaging, file transfer, etc. We also present our quantitative analysis of ONEChat, which we believe indicates that the ONEChat architecture is an efficient group collaboration platform for opportunistic networks. | (pdf) |
Exploiting Local Logic Structures to Optimize Multi-Core SoC Floorplanning | Cheng-Hong Li, Sampada Sonalkar, Luca P. Carloni | 2009-12-10 | We present a throughput-driven partitioning and a throughput-preserving merging algorithm for the high-level physical synthesis of latency-insensitive (LI) systems. These two algorithms are integrated along with a published floorplanner in a new iterative physical synthesis flow to optimize system throughput and reduce area occupation. The synthesis flow iterates a floorplanning-partitioning-floorplanning-merging sequence of operations to improve the system topology and the physical locations of cores. The partitioning algorithm performs bottom-up clustering of the internal logic of a given IP core to divide it into smaller ones, each of which has no combinational path from input to output and thus is legal for LI-interface encapsulation. Applying this algorithm to cores on critical feedback loops optimizes their length and in turn enables throughput optimization via the subsequent floorplanning. The merging algorithm reduces the number of cores on non-critical loops, lowering the overall area taken by LI interfaces without hurting the system throughput. Experimental results on a large system-on-chip design show a 16.7% speedup in system throughput and a 2.1% reduction in area occupation. | (pdf) |
ConFu: Configuration Fuzzing Framework for Software Vulnerability Detection | Huning Dai, Gail E. Kaiser | 2009-12-08 | Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations of the software and certain inputs together with its particular runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds a range of randomly modified inputs to a software application while monitoring it for failures. However, typical fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, in this proposal we present a new testing methodology called Configuration Fuzzing. Configuration Fuzzing is a technique whereby the configuration of the running application is mutated at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability; however, the fuzzing is performed in a duplicated copy of the original process, so that it does not affect the state of the running application. Configuration Fuzzing uses a covering array algorithm when fuzzing the configuration, which guarantees a certain degree of coverage of the configuration space in the lifetime of the program-under-test. In addition, Configuration Fuzzing tests that are run after the software is released ensure representative real-world user inputs to test with. In addition to discussing the approach and describing a prototype framework for implementation, we also present the results of case studies to prove the approach's feasibility and evaluate its performance. In this thesis, we will continue developing the framework called ConFu (CONfiguration FUzzing framework) that supports the generation of test functions, parallel sandboxed execution and vulnerability detection. Given the initial ConFu, we will optimize the way that configurations get mutated, define more security invariants and conduct additional empirical studies of ConFu's effectiveness in detecting vulnerabilities. At the conclusion of this work, we want to show that ConFu is efficient and effective in detecting common vulnerabilities and that tests executed by ConFu can ensure a reasonable degree of coverage of both the configuration and user input space in the lifetime of the software. | (pdf) |
Record and Transplay: Partial Checkpointing for Replay Debugging | Dinesh Subhraveti, Jason Nieh | 2009-11-21 | Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application’s replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead. | (pdf) |
On TCP-based SIP Server Overload Control | Charles Shen, Henning Schulzrinne | 2009-11-10 | SIP server overload management has attracted interest recently as SIP becomes the core signaling protocol for Next Generation Networks. Yet virtually all existing SIP overload control work is focused on SIP-over-UDP, despite the fact that TCP is increasingly seen as the more viable choice of SIP transport. This paper answers the following questions: is the existing TCP flow control capable of handling the SIP overload problem? If not, why and how can we make it work? We provide a comprehensive explanation of the default SIP-over-TCP overload behavior through server instrumentation. We also propose and implement novel but simple overload control algorithms without any kernel or protocol level modification. Experimental evaluation shows that with our mechanism the overload performance improves from its original zero throughput to nearly full capacity. Our work also leads to the important high level insight that the traditional notion of TCP flow control alone is incapable of managing overload for time-critical session based applications, which would be applicable not only to SIP, but also to a wide range of other common applications such as database servers. | (pdf) |
PBS: Signaling architecture for network traffic authorization | Se Gi Hong, Henning Schulzrinne, Swen Weiland | 2009-10-27 | We present a signaling architecture for network traffic authorization, Permission-Based Sending (PBS). This architecture aims to prevent Denial-of-Service (DoS) attacks and other forms of unauthorized traffic. Towards this goal, PBS takes a hybrid approach: a proactive approach of explicit permissions and a reactive approach of monitoring and countering attacks. On-path signaling is used to configure the permission state stored in routers for a data flow. The signaling approach enables easy installation and management of the permission state, and its use of soft-state improves robustness of the system. For secure permission state setup, PBS provides security for signaling in two ways: signaling messages are encrypted end-to-end using public key encryption and TLS provides hop-by-hop encryption of signaling paths. In addition, PBS uses IPsec for data packet authentication. Our analysis and performance evaluation show that PBS is an effective and scalable solution for preventing various kinds of attack scenarios, including Byzantine attacks. | (pdf) |
A Secure and Privacy-Preserving Targeted Ad System | Elli Androulaki, Steven Bellovin | 2009-10-22 | Thanks to its low product-promotion cost and its efficiency, targeted online advertising has become very popular. Unfortunately, being profile-based, online advertising methods violate consumers' privacy, which has engendered resistance to the ads. However, protecting privacy through anonymity seems to encourage click-fraud. In this paper, we define consumer's privacy and present a privacy-preserving, targeted ad system (PPOAd) which is resistant towards click fraud. Our scheme is structured to provide financial incentives to all entities involved. | (pdf) |
Rank-Aware Subspace Clustering for Structured Datasets | Julia Stoyanovich, Sihem Amer-Yahia | 2009-10-21 | In online applications such as Yahoo! Personals and Trulia.com, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in ranked lists are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths). However, not all groups will be of interest to the user given the ranking criteria. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop a novel rank-aware bottom-up subspace clustering algorithm. We evaluate the performance of our algorithm over large datasets from a leading online dating site, and present an experimental evaluation of its effectiveness. | (pdf) |
Metamorphic Runtime Checking of Non-Testable Programs | Christian Murphy, Gail Kaiser | 2009-10-20 | Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is impossible to know what the correct output should be for arbitrary input. Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these "non-testable programs". In metamorphic testing, if test input x produces output f(x), specified "metamorphic properties" are used to create a transformation function t, which can be applied to the input to produce t(x); this transformation then allows the output f(t(x)) to be predicted based on the already-known value of f(x). If the output is not as expected, then a defect must exist. Previously we investigated the effectiveness of testing based on metamorphic properties of the entire application. Here, we improve upon that work by presenting a new technique called Metamorphic Runtime Checking, a testing approach that automatically conducts metamorphic testing of individual functions during the program's execution. We also describe an implementation framework called Columbus, and discuss the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact. | (pdf) |
Configuration Fuzzing for Software Vulnerability Detection | Huning Dai, Christian Murphy, Gail Kaiser | 2009-10-07 | Many software security vulnerabilities only reveal themselves under certain conditions, i.e., particular configurations of the software together with its particular runtime environment. One approach to detecting these vulnerabilities is fuzz testing, which feeds a range of randomly modified inputs to a software application while monitoring it for failures. However, fuzz testing makes no guarantees regarding the syntactic and semantic validity of the input, or of how much of the input space will be explored. To address these problems, in this paper we present a new testing methodology called configuration fuzzing. Configuration fuzzing is a technique whereby the configuration of the running application is randomly modified at certain execution points, in order to check for vulnerabilities that only arise in certain conditions. As the application runs in the deployment environment, this testing technique continuously fuzzes the configuration and checks "security invariants" that, if violated, indicate a vulnerability; however, the fuzzing is performed in a duplicated copy of the original process, so that it does not affect the state of the running application. In addition to discussing the approach and describing a prototype framework for implementation, we also present the results of a case study to demonstrate the approach’s efficiency. | (pdf) |
Curtailed Online Boosting | Raphael Pelossof, Michael Jones | 2009-09-28 | The purpose of this work is to lower the average number of features that are evaluated by an online algorithm. This is achieved by merging Sequential Analysis and Online Learning. Many online algorithms use the example's margin to decide whether the model should be updated. Usually, the algorithm's model is updated when the margin is smaller than a certain threshold. The evaluation of the margin for each example requires the algorithm to evaluate all the model's features. Sequential Analysis allows us to stop the computation of the margin early when uninformative examples are encountered. It is desirable to save computation on uninformative examples since they will have very little impact on the final model. We show a successful speedup of Online Boosting while maintaining accuracy on a synthetic data set and the MNIST data set. | (pdf) |
Using a Model Checker to Determine Worst-case Execution Time | Sungjun Kim, Hiren D. Patel, Stephen A. Edwards | 2009-09-03 | Hard real-time systems use worst-case execution time (WCET) estimates to ensure that timing requirements are met. The typical approach for obtaining WCET estimates is to employ static program analysis methods. While these approaches provide WCET bounds, they struggle to analyze programs with loops whose iteration counts depend on input data. Such programs mandate user-guided annotations. We propose a hybrid approach by augmenting static program analysis with model checking to analyze such programs and derive the loop bounds automatically. In addition, we use model checking to guarantee repeatable timing behaviors from segments of program code. Our target platform is a precision timed architecture: a SPARC-based architecture promising predictable and repeatable timing behaviors. We use CBMC and illustrate our approach on the Euclidean greatest common divisor algorithm (for WCET analysis) and a VGA controller (for repeatable timing validation). | (pdf) |
Smashing the Stack with Hydra: The Many Heads of Advanced Polymorphic Shellcode | Pratap V. Prabhu, Yingbo Song, Salvatore J. Stolfo | 2009-08-31 | Recent work on the analysis of polymorphic shellcode engines suggests that modern obfuscation methods would soon eliminate the usefulness of signature-based network intrusion detection methods and supports growing views that the new generation of shellcode cannot be accurately and efficiently represented by the string signatures which current IDS and AV scanners rely upon. In this paper, we expand on this area of study by demonstrating never before seen concepts in advanced shellcode polymorphism with a proof-of-concept engine which we call Hydra. Hydra distinguishes itself by integrating an array of obfuscation techniques, such as recursive NOP sleds and multi-layer ciphering into one system while offering multiple improvements upon existing strategies. We also introduce never before seen attack methods such as byte-splicing statistical mimicry, safe-returns with forking shellcode and syscall-time-locking. In total, Hydra simultaneously attacks signature, statistical, disassembly, behavioral and emulation-based sensors, as well as frustrates offline forensics. This engine was developed to present an updated view of the frontier of modern polymorphic shellcode and provide an effective tool for evaluation of IDS systems, Cyber test ranges and other related security technologies. | (pdf) |
On the Learnability of Monotone Functions | Homin K. Lee | 2009-08-13 | A longstanding lacuna in the field of computational learning theory is the learnability of succinctly representable monotone Boolean functions, i.e., functions that preserve the given order of the input. This thesis makes significant progress towards understanding both the possibilities and the limitations of learning various classes of monotone functions by carefully considering the complexity measures used to evaluate them. We show that Boolean functions computed by polynomial-size monotone circuits are hard to learn assuming the existence of one-way functions. Having shown the hardness of learning general polynomial-size monotone circuits, we show that the class of Boolean functions computed by polynomial-size depth-3 monotone circuits are hard to learn using statistical queries. As a counterpoint, we give a statistical query learning algorithm that can learn random polynomial-size depth-2 monotone circuits (i.e., monotone DNF formulas). As a preliminary step towards a fully polynomial-time, proper learning algorithm for learning polynomial-size monotone decision trees, we also show the relationship between the average depth of a monotone decision tree, its average sensitivity, and its variance. Finally, we return to monotone DNF formulas, and we show that they are teachable (a different model of learning) in the average case. We also show that non-monotone DNF formulas, juntas, and sparse GF2 formulas are teachable in the average case. | (pdf) |
Mouth-To-Ear Latency in Popular VoIP Clients | Chitra Agastya, Dan Mechanic, Neha Kothari | 2009-08-06 | Most popular instant messaging clients are now offering Voice-over-IP (VoIP) technology. The many options running on similar platforms, implementing common audio codecs and encryption algorithms, offer the opportunity to identify what factors affect call quality. We measure call quality objectively based on mouth-to-ear latency. Based on our analysis we determine that the mouth-to-ear latency can be influenced by the operating system (process priority and interrupt handling), the VoIP client implementation and network quality. | (pdf) |
Apiary: Easy-to-use Desktop Application Fault Containment on Commodity Operating Systems | Shaya Potter, Jason Nieh | 2009-08-05 | Desktop computers are often compromised by the interaction of untrusted data and buggy software. To address this problem, we present Apiary, a system that provides transparent application fault containment while retaining the ease of use of a traditional integrated desktop environment. Apiary accomplishes this with three key mechanisms. It isolates applications in containers that integrate in a controlled manner at the display and file system. It introduces ephemeral containers that are quickly instantiated for single application execution and then removed, to prevent any exploit that occurs from persisting and to protect user privacy. It introduces the virtual layered file system to make instantiating containers fast and space efficient, and to make managing many containers no more complex than having a single traditional desktop. We have implemented Apiary on Linux without any application or operating system kernel changes. Our results from running real applications, known exploits, and a 24-person user study show that Apiary has modest performance overhead, is effective in limiting the damage from real vulnerabilities to enable quick recovery, and is as easy to use as a traditional desktop while improving desktop computer security and privacy. | (pdf) |
Source Prefix Filtering in ROFL | Hang Zhao, Maritza Johnson, Chi-Kin Chau, Steven M. Bellovin | 2009-07-26 | Traditional firewalls have the ability to allow or block traffic based on source address as well as destination address and port number. Our original ROFL scheme implements firewalling by layering it on top of routing; however, the original proposal focused just on destination address and port number. Doing route selection based in part on source addresses is a form of policy routing, which has started to receive increased amounts of attention. In this paper, we extend the original ROFL (ROuting as the Firewall Layer) scheme by including source prefix constraints in route announcement. We present algorithms for route propagation and packet forwarding, and demonstrate the correctness of these algorithms using rigorous proofs. The new scheme not only accomplishes the complete set of filtering functionality provided by traditional firewalls, but also introduces a new direction for policy routing. | (pdf) |
Semantic Ranking and Result Visualization for Life Sciences Publications | Julia Stoyanovich, William Mee, Kenneth A. Ross | 2009-06-23 | An ever-increasing amount of data and semantic knowledge in the domain of life sciences is bringing about new data management challenges. In this paper we focus on adding the semantic dimension to literature search, a central task in scientific research. We focus our attention on PubMed, the most significant bibliographic source in life sciences, and explore ways to use high-quality semantic annotations from the MeSH vocabulary to rank search results. We start by developing several families of ranking functions that relate a search query to a document's annotations. We then propose an efficient adaptive ranking mechanism for each of the families. We also describe a two-dimensional Skyline-based visualization that can be used in conjunction with the ranking to further improve the user's interaction with the system, and demonstrate how such Skylines can be computed adaptively and efficiently. Finally, we present a user study that demonstrates the effectiveness of our ranking. We use the full PubMed dataset and the complete MeSH ontology in our experimental evaluation. | (pdf) |
A Software Checking Framework Using Distributed Model Checking and Checkpoint/Resume of Virtualized PrOcess Domains | Nageswar Keetha, Leon Wu, Gail Kaiser, Junfeng Yang | 2009-06-18 | The complexity and heterogeneity of deployed software applications often result in a wide range of dynamic states at runtime. Corner cases of software failure during execution often slip through traditional software checking. If the software checking infrastructure supports transparent checkpoint and resume of live application states, the checking system can preserve and replay the live states in which software failures occur. We introduce a novel software checking framework that enables application states, including program behaviors and execution contexts, to be cloned and resumed on a computing cloud. It employs (1) EXPLODE’s model checking engine for lightweight and general-purpose software checking, (2) the ZAP system for a faster, low-overhead and transparent checkpoint and resume mechanism through virtualized PODs (PrOcess Domains), each a collection of host-independent processes, and (3) a scalable and distributed checking infrastructure based on Distributed EXPLODE. The efficient and portable checkpoint/resume and replay mechanism employed in this framework enables scalable software checking in order to improve the reliability of software products. The evaluation we conducted showed its feasibility, efficiency and applicability. | (pdf) |
Serving Niche Video-on-Demand Content in a Managed P2P Environment | Eli Brosh, Chitra Agastya, John Morales, Vishal Misra, Dan Rubenstein | 2009-06-17 | A limitation of existing P2P VoD services is their inability to support efficient streamed access to niche content that has relatively small demand. This limitation stems from the poor performance of P2P when the number of peers sharing the content is small. In this paper, we propose a new provider-managed P2P VoD framework for efficient delivery of niche content based on two principles: reserving small portions of peers' storage and upload resources, as well as using novel, weighted caching techniques. We demonstrate through analysis, simulations, and experiments on PlanetLab that our architecture can provide high streaming quality for niche content. In particular, we show that our architecture increases the catalog size by up to $40\%$ compared to standard P2P VoD systems, and that a weighted cache policy can reduce the startup delay for niche content by a factor of more than three. | (pdf) |
Flexible Filters: Load Balancing through Backpressure for Stream Programs | Rebecca Collins, Luca Carloni | 2009-06-16 | Stream processing is a promising paradigm for programming multi-core systems for high-performance embedded applications. We propose flexible filters as a technique that combines static mapping of the stream program tasks with dynamic load balancing of their execution. The goal is to improve the system-level processing throughput of the program when it is executed on a distributed-memory multi-core system as well as the local (core-level) memory utilization. Our technique is distributed and scalable because it is based on point-to-point handshake signals exchanged between neighboring cores. Load balancing with flexible filters can be applied to stream applications that present large dynamic variations in the computational load of their tasks and the dimension of the stream data tokens. In order to demonstrate the practicality of our technique, we present the performance improvements for the case study of a JPEG encoder running on the IBM Cell multi-core processor. | (pdf) |
Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating | Gabriela Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo | 2009-06-11 | The deployment and use of Anomaly Detection (AD) sensors often requires the intervention of a human expert to manually calibrate and optimize their performance. Depending on the site and the type of traffic it receives, the operators might have to provide recent and sanitized training data sets, the characteristics of expected traffic (i.e., outlier ratio), and exceptions or even expected future modifications of the system's behavior. In this paper, we study the potential performance issues that stem from fully automating the AD sensors' day-to-day maintenance and calibration. Our goal is to remove the dependence on a human operator using an unlabeled, and thus potentially dirty, sample of incoming traffic. To that end, we propose to enhance the training phase of AD sensors with a self-calibration phase, leading to the automatic determination of the optimal AD parameters. We show how this novel calibration phase can be employed in conjunction with previously proposed methods for training data sanitization, resulting in a fully automated AD maintenance cycle. Our approach is completely agnostic to the underlying AD sensor algorithm. Furthermore, the self-calibration can be applied in an online fashion to ensure that the resulting AD models reflect changes in the system's behavior which would otherwise render the sensor's internal state inconsistent. We verify the validity of our approach through a series of experiments where we compare the manually obtained optimal parameters with the ones computed from the self-calibration phase. Modeling traffic from two different sources, the fully automated calibration shows a 7.08% reduction in detection rate and a 0.06% increase in false positives, in the worst case, when compared to the optimal selection of parameters. Finally, our adaptive models outperform the statically generated ones, retaining the gains in performance from the sanitization process over time. | (pdf) |
Masquerade Attack Detection Using a Search-Behavior Modeling Approach | Malek Ben Salem, Salvatore J. Stolfo | 2009-06-10 | Masquerade attacks are unfortunately a familiar security problem that is a consequence of identity theft. Detecting masqueraders is very hard. Prior work has focused on user command modeling to identify abnormal behavior indicative of impersonation. This paper extends prior work by presenting one-class Hellinger distance-based and one-class SVM modeling techniques that use a set of novel features to reveal user intent. The specific objective is to model user search profiles and detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly in a manner that is different than the victim user being impersonated. We extend prior research that uses UNIX command sequences issued by users as the audit source by relying upon an abstraction of commands. We devise taxonomies of UNIX commands and Windows applications that are used to abstract sequences of user commands and actions. We also gathered our own normal and masquerader data sets captured in a Windows environment for evaluation. The datasets are publicly available for other researchers who wish to study masquerade attacks rather than author identification as in much of the prior reported work. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 0.1%, far better than prior published results. The limited set of features used for search behavior modeling also results in huge performance gains over the same modeling techniques that use larger sets of features. | (pdf) |
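The Hellinger distance named in the abstract above is a standard divergence measure between two discrete probability distributions. The sketch below is a generic illustration of how such a score could compare a user's long-term behavior profile against a recent window of activity; it is not the authors' implementation, and the feature categories, counts, and threshold are hypothetical.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    dicts mapping events (e.g., command categories) to probabilities."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s) / math.sqrt(2.0)

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Hypothetical profiles: long-term command-category frequencies for a user
# versus frequencies observed in a recent window of activity.
profile = normalize({"search": 120, "open": 300, "edit": 250, "list_dir": 80})
recent  = normalize({"search": 45,  "open": 10,  "edit": 5,   "list_dir": 60})

THRESHOLD = 0.3  # hypothetical; a real sensor would tune this on sanitized training data
if hellinger(profile, recent) > THRESHOLD:
    print("possible masquerade: search behavior deviates from the user profile")
```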
Self-monitoring Monitors | Salvatore Stolfo, Isaac Greenbaum, Simha Sethumadhavan | 2009-06-03 | Many different monitoring systems have been created to identify system state conditions in order to detect or prevent a myriad of deliberate attacks, or arbitrary faults inherent in any complex system. Monitoring systems are also vulnerable to attack. A stealthy attacker can simply turn off or disable these monitoring systems without being detected; he would thus be able to perpetrate the very attacks that these systems were designed to stop. For example, many virus attacks against antivirus scanners have appeared in the wild. In this paper, we present a novel technique to "monitor the monitors" in such a way that (a) unauthorized shutdowns of critical monitors are detected with high probability, (b) authorized shutdowns raise no alarm, and (c) the proper shutdown sequence for authorized shutdowns cannot be inferred from reading memory. The techniques proposed to prevent unauthorized shutdown (turning off) of monitoring systems were inspired by the duality of safety technology devised to prevent unauthorized discharge (turning on) of nuclear weapons. | (pdf) |
Thwarting Attacks in Malcode-Bearing Documents by Altering Data Sector Values | Wei-Jen Li, Salvatore J. Stolfo | 2009-06-01 | Embedding malcode within documents provides a convenient means of attacking systems. Such attacks can be very targeted and difficult to detect and stop due to the multitude of document-exchange vectors and the vulnerabilities in modern document processing applications. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. To detect stealthy embedded malcode in documents, we develop an arbitrary data transformation technique that changes the value of data segments in documents in such a way as to purposely damage any hidden malcode that may be embedded in those sections. Consequently, the embedded malcode will not only fail but also introduce a system exception that would be easily detected. The method is intended to be applied in a safe sandbox; the transformation is reversible after testing a document and does not require any learning phase. The method depends upon knowledge of the structure of the document binary format to parse a document and identify the specific sectors to which the method can be safely applied for malcode detection. The method can be implemented in MS Word as a security feature to enhance the safety of Word documents. | (pdf) |
weHelp: A Reference Architecture for Social Recommender Systems | Swapneel Sheth, Nipun Arora, Christian Murphy, Gail Kaiser | 2009-05-15 | Recommender systems have become increasingly popular. Most of the research on recommender systems has focused on recommendation algorithms. There has been relatively little research, however, in the area of generalized system architectures for recommendation systems. In this paper, we introduce weHelp: a reference architecture for social recommender systems - systems where recommendations are derived automatically from the aggregate of logged activities conducted by the system's users. Our architecture is designed to be application and domain agnostic. We feel that a good reference architecture will make designing a recommendation system easier; in particular, weHelp aims to provide a practical design template to help developers design their own well-modularized systems. | (pdf) |
The Zodiac Policy Subsystem: a Policy-Based Management System for a High-Security MANET | Yuu-Heng Cheng, Scott Alexander, Alex Poylisher, Mariana Raykova, Steven M. Bellovin | 2009-05-07 | Zodiac (Zero Outage Dynamic Intrinsically Assurable Communities) is an implementation of a high-security MANET, resistant to multiple types of attacks, including Byzantine faults. The Zodiac architecture poses a set of unique system security, performance, and usability requirements for its policy-based management system (PBMS). In this paper, we identify these requirements, and present the design and implementation of the Zodiac Policy Subsystem (ZPS), which allows administrators to securely specify, distribute and evaluate network control and system security policies to customize Zodiac behaviors. ZPS uses the KeyNote language for specifying all authorization policies, with a simple extension to support obligation policies. | (pdf) |
The Impact of TLS on SIP Server Performance | Charles Shen, Erich Nahum, Henning Schulzrinne, Charles Wright | 2009-05-05 | This report studies the performance impact of using TLS as a transport protocol for SIP servers. We evaluate the cost of TLS experimentally using a testbed with OpenSIPS, OpenSSL, and Linux running on an Intel-based server. We analyze TLS costs using application, library, and kernel profiling, and use the profiles to illustrate when and how different costs are incurred, such as bulk data encryption, public key encryption, private key decryption, and MAC-based verification. We show that using TLS can reduce performance by up to a factor of 20 compared to the typical case of SIP over UDP. The primary factor in determining performance is whether and how TLS connection establishment is performed, due to the heavy costs of RSA operations used for session negotiation. This depends both on how the SIP proxy is deployed (e.g., as an inbound or outbound proxy) and what TLS options are used (e.g., mutual authentication, session reuse). The cost of symmetric key operations such as AES or 3DES, in contrast, tends to be small. Network operators deploying SIP over TLS should attempt to maximize the persistence of secure connections, and will need to assess the server resources required. To aid them, we provide a measurement-driven cost model for use in provisioning SIP servers using TLS. Our cost model predicts performance within 15 percent on average. | (pdf) |
COMPASS: A Community-driven Parallelization Advisor for Sequential Software | Simha Sethumadhavan, Gail E. Kaiser | 2009-04-22 | The widespread adoption of multicores has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism thus delaying a quick and smooth transition to the concurrency era. In this paper, we describe the research being conducted at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers while they reengineer their code for parallelism. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some similar code. The utility of COMPASS rests, not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the task at hand, i.e., the code the user is considering parallelizing and the environment in which the optimized program is planned to execute. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization – widely, and on diverse hardware and software. By leveraging the “wisdom of crowds” model [26], which has been conjectured to scale exponentially and which has successfully worked for wikis, COMPASS aims to enable rapid propagation of knowledge about code parallelization in the context of the actual parallelization reengineering, and thus continue to extend the benefits of Moore’s law scaling to science and society. | (pdf) |
Have I Met You Before? Using Cross-Media Relations to Reduce SPIT | Kumiko Ono, Henning Schulzrinne | 2009-04-14 | Most legitimate calls are from persons or organizations with strong social ties, such as friends. Some legitimate calls, however, are from those with weak social ties, such as a restaurant where the callee booked a table online. Since a callee's contact list usually contains only the addresses of persons or organizations with strong social ties, filtering out unsolicited calls using the contact list is prone to false positives. To reduce these false positives, we first analyzed call logs and identified that legitimate calls from persons or organizations with weak social ties are initiated through transactions over the web or email exchanges. This paper proposes two approaches to labeling incoming calls by using cross-media relations to the previous contact mechanisms that initiated the calls. One approach is that potential callers offer the callee their contact addresses which might be used in future correspondence. Another is that a callee provides potential callers with weakly-secret information that the callers should use in future correspondence in order to identify them as someone the callee has contacted before through other means. Depending on the previous contact mechanism, the callers use either customized contact addresses or message identifiers. The latter approach enables a callee to label incoming calls even without caller identifiers. Reducing false positives during filtering using our proposed approaches will contribute to the reduction in SPIT (SPam over Internet Telephony). | (pdf) |
F3ildCrypt: End-to-End Protection of Sensitive Information in Web Services | Matthew Burnside, Angelos D. Keromytis | 2009-03-30 | The frequency and severity of recent intrusions involving data theft and leakages have shown that online users' trust, voluntary or not, in the ability of third parties to protect their sensitive data is often unfounded. Data may be exposed anywhere along a corporation's web pipeline, from the outward-facing web servers to the back-end databases. Additionally, in service-oriented architectures (SOAs), data may also be exposed as they transit between SOAs. For example, credit card numbers may be leaked during transmission to or handling by transaction-clearing intermediaries. We present F3ildCrypt, a system that provides end-to-end protection of data across a web pipeline and between SOAs. Sensitive data are protected from their origin (the user's browser) to their legitimate final destination. To that end, F3ildCrypt exploits browser scripting to enable application- and merchant-aware handling of sensitive data. Such techniques have traditionally been considered a security risk; to our knowledge, this is one of the first uses of web scripting that enhances overall security. F3ildCrypt uses proxy re-encryption to re-target messages as they enter and cross SOA boundaries, and uses XACML, the XML-based access control language, to define protection policies. Our approach scales well in the number of public key operations required for web clients and does not reveal proprietary details of the logical enterprise network (because of the application of proxy re-encryption). We evaluate F3ildCrypt and show an additional cost of 40 to 150 ms when making sensitive transactions from the web browser, and a processing rate of 100 to 140 XML fields/second on the server. We believe such costs to be a reasonable tradeoff for increased sensitive-data confidentiality. | (pdf) |
Baiting Inside Attackers using Decoy Documents | Brian M. Bowen, Shlomo Hershkop, Angelos D. Keromytis, Salvatore J. Stolfo | 2009-03-30 | The insider threat remains one of the most vexing problems in computer security. A number of approaches have been proposed to detect nefarious insider actions, including user modeling and profiling techniques, policy and access enforcement techniques, and misuse detection. In this work we propose trap-based defense mechanisms for the case where insiders attempt to exfiltrate and use sensitive information. Our goal is to confuse and confound the attacker, requiring far more effort to distinguish real information from bogus information, and to provide a means of detecting when an inside attacker attempts to exploit sensitive information. "Decoy documents" are automatically generated and stored on a file system with the aim of enticing a malicious insider to open and review the contents of the documents. The decoy documents contain several different types of bogus credentials that, when used, trigger an alert. We also embed "stealthy beacons" inside the documents that cause a signal to be emitted to a server indicating when and where the particular decoy was opened. We evaluate decoy documents on honeypots penetrated by attackers, demonstrating the feasibility of the method. | (pdf) |
Metamorphic Runtime Checking of Non-Testable Programs | Christian Murphy, Gail Kaiser | 2009-03-16 | Challenges arise in assuring the quality of applications that do not have test oracles, i.e., for which it is difficult or impossible to know what the correct output should be for arbitrary input. Recently, metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of these so-called "non-testable programs". In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the function should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. Previously we have presented an approach called "Automated Metamorphic System Testing", in which metamorphic testing is conducted automatically as the program executes. In that approach, metamorphic properties of the entire application are specified, and then checked after execution is complete. Here, we improve upon that work by presenting a technique in which the metamorphic properties of individual functions are used, allowing for the specification of more complex properties and enabling finer-grained runtime checking. Our goal is to demonstrate that such an approach will be more effective than one based on specifying metamorphic properties at the system level, and is also feasible for use in the deployment environment. This technique, called Metamorphic Runtime Checking, is a system testing approach in which the metamorphic properties of individual functions are automatically checked during the program's execution. The tester is able to easily specify the functions' properties so that metamorphic testing can be conducted in a running application, allowing the tests to execute using real input data and in the context of real system states, without affecting those states. We also describe an implementation framework called Columbus, and present the results of empirical studies that demonstrate that checking the metamorphic properties of individual functions increases the effectiveness of the approach in detecting defects, with minimal performance impact. | (pdf) |
An Anonymous Credit Card System | Elli Androulaki, Steven Bellovin | 2009-02-27 | Credit cards have many important benefits; however, these same benefits often carry with them many privacy concerns. In particular, the need for users to be able to monitor their own transactions, as well as the bank's need to justify its payment requests from cardholders, entitles the latter to maintain a detailed log of all transactions its credit card customers were involved in. A bank can thus build a profile of each cardholder even without the latter's consent. In this technical report, we present a practical and accountable anonymous credit system based on e-cash, with a privacy-preserving mechanism for error correction and expense-reporting. | (pdf) |
Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue | Agustin Gravano | 2009-02-23 | As interactive voice response systems spread at a rapid pace, providing increasingly complex functionality, it is becoming clear that the challenges of such systems are not solely associated with their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user, or the correct generation and understanding of words that may convey multiple meanings, appear to play an important role in system usability. This thesis explores those two issues in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues -- prosodic, acoustic and syntactic events strongly associated with conversational turn endings -- and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues conjointly displayed by the speaker. We present similar results related to six backchannel-inviting cues -- events that invite the interlocutor to produce a short utterance conveying continued attention. Additionally, we describe a series of studies of affirmative cue words -- a family of cue words such as 'okay' or 'alright' that speakers use frequently in conversation for several purposes: for acknowledging what the interlocutor has said, or for cueing the start of a new topic, among others. We find differences in the acoustic/prosodic realization of such functions, but observe that contextual information figures prominently in human disambiguation of these words. We also conduct machine learning experiments to explore the automatic classification of affirmative cue words. Finally, we examine a novel measure of speaker entrainment related to the usage of these words, showing its association with task success and dialogue coordination. | (pdf) |
Automatic System Testing of Programs without Test Oracles | Christian Murphy, Kuang Shen, Gail Kaiser | 2009-01-30 | Metamorphic testing has been shown to be a simple yet effective technique in addressing the quality assurance of applications that do not have test oracles, i.e., for which it is difficult or impossible to know what the correct output should be for arbitrary input. In metamorphic testing, existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. In practice, however, metamorphic testing can be a manually intensive technique for all but the simplest cases. The transformation of input data can be laborious for large data sets, or practically impossible for input that is not in human-readable format. Similarly, comparing the outputs can be error-prone for large result sets, especially when slight variations in the results are not actually indicative of errors (i.e., are false positives), for instance when there is non-determinism in the application and multiple outputs can be considered correct. In this paper, we present an approach called Automated Metamorphic System Testing. This involves the automation of metamorphic testing at the system level by checking that the metamorphic properties of the entire application hold after its execution. The tester is able to easily set up and conduct metamorphic tests with little manual intervention, and testing can continue in the field with minimal impact on the user. Additionally, we present an approach called Heuristic Metamorphic Testing, which seeks to reduce false positives and address some cases of non-determinism. We also describe an implementation framework called Amsterdam, and present the results of empirical studies in which we demonstrate the effectiveness of the technique on real-world programs without test oracles. | (pdf) |
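The metamorphic relation described above (derive x' from x so that f(x') is predictable from f(x)) can be illustrated with a small, self-contained sketch. The example below checks two simple relations of a mean function and is purely illustrative; it does not reflect the Amsterdam framework itself, and the function under test, relations, and tolerances are assumptions chosen for the demonstration.

```python
import random

def mean(xs):
    # function under test; no oracle tells us the "right" mean of arbitrary data
    return sum(xs) / len(xs)

def check_metamorphic_properties(f, xs, trials=100):
    """Check two metamorphic relations of f on input xs:
    (1) permuting the input must not change the output,
    (2) multiplying every element by k must multiply the output by k.
    A violation signals a defect in f, without needing a test oracle."""
    baseline = f(xs)
    for _ in range(trials):
        shuffled = random.sample(xs, len(xs))  # a random permutation of xs
        assert abs(f(shuffled) - baseline) < 1e-9, "permutation relation violated"
        k = random.uniform(0.5, 2.0)
        scaled = [k * x for x in xs]
        assert abs(f(scaled) - k * baseline) < 1e-6, "scaling relation violated"

check_metamorphic_properties(mean, [random.gauss(0, 1) for _ in range(1000)])
print("metamorphic relations held on all trials")
```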
Example application under PRET environment -- Programming a MultiMediaCard | Devesh Dedhia | 2009-01-22 | The PRET philosophy proposes that temporal characteristics be made predictable. However, for various applications the PRET processor will have to interact with a non-predictable environment. In this paper, an example of one such environment, a MultiMediaCard (MMC), is considered. This paper illustrates a method to make the response of the MMC predictable. | (pdf) |
Improving the Quality of Computational Science Software by Using Metamorphic Relations to Test Machine Learning Applications | Xiaoyuan Xie, Joshua Ho, Christian Murphy, Gail Kaiser, Baowen Xu, T.Y. Chen | 2009-01-19 | Many applications in the field of scientific computing - such as computational biology, computational linguistics, and others - depend on Machine Learning algorithms to provide important core functionality to support solutions in the particular problem domains. However, it is difficult to test such applications because often there is no "test oracle" to indicate what the correct output should be for arbitrary input. To help address the quality of scientific computing software, in this paper we present a technique for testing the implementations of machine learning classification algorithms on which such scientific computing software depends. Our technique is based on an approach called "metamorphic testing", which has been shown to be effective in such cases. In addition to presenting our technique, we describe a case study we performed on a real-world machine learning application framework, and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. We also discuss how our findings can be of use to other areas of computational science and engineering. | (pdf) |
Multi-perspective Evaluation of Self-Healing Systems Using Simple Probabilistic Models | Rean Griffith, Gail Kaiser, Javier Alonso Lopez | 2009-01-19 | Quantifying the efficacy of self-healing systems is a challenging but important task, which has implications for increasing designer, operator and end-user confidence in these systems. During design, system architects benefit from tools and techniques that enhance their understanding of the system, allowing them to reason about the tradeoffs of proposed or existing self-healing mechanisms and the overall effectiveness of the system as a result of different mechanism-compositions. At deployment time, system integrators and operators need to understand how the self-healing mechanisms work and how their operation impacts the system's reliability, availability and serviceability (RAS) in order to cope with any limitations of these mechanisms when the system is placed into production. In this paper we construct an evaluation framework for self-healing systems around simple, yet powerful, probabilistic models that capture the behavior of the system's self-healing mechanisms from multiple perspectives (designer, operator, and end-user). We combine these analytical models with runtime fault-injection to study the operation of VM-Rejuv – a virtual machine based rejuvenation scheme for web-application servers. We use the results from the fault-injection experiments and model-analysis to reason about the efficacy of VM-Rejuv, its limitations and strategies for managing/mitigating these limitations in system deployments. Whereas we use VM-Rejuv as the subject of our evaluation in this paper, our main contribution is a practical evaluation approach that can be generalized to other self-healing systems. | (pdf) |
A Case Study in Distributed Deployment of Embedded Software for Camera Networks | Francesco Leonardi, Alessandro Pinto, Luca P. Carloni | 2009-01-15 | We present an embedded software application for the real-time estimation of building occupancy using a network of video cameras. We analyze a series of alternative decompositions of the main application tasks and profile each of them by running the corresponding embedded software on three different processors. Based on the profiling measures, we build various alternative embedded platforms by combining different embedded processors, memory modules and network interfaces. In particular, we consider the choice of two possible network technologies: ARCnet and Ethernet. After deriving an analytical model of the network costs, we use it to complete an exploration of the design space as we scale the number of video cameras in a hypothetical building. We compare our results with those obtained for two real buildings with different characteristics. We conclude by discussing the results of our case study in the broader context of other camera-network applications. | (pdf) |
Improving Virtual Appliance Management through Virtual Layered File Systems | Shaya Potter, Jason Nieh | 2009-01-15 | Managing many computers is difficult. Recent virtualization trends exacerbate this problem by making it easy to create and deploy multiple virtual appliances per physical machine, each of which can be configured with different applications and utilities. This results in a huge scaling problem for large organizations as management overhead grows linearly with the number of appliances. To address this problem, we present Strata, a system that introduces the Virtual Layered File System (VLFS) and integrates it with virtual appliances to simplify system management. Unlike a traditional file system, which is a monolithic entity, a VLFS is a collection of individual software layers composed together to provide the traditional file system view. Individual layers are maintained in a central repository and shared across all VLFSs that use them. Layer changes and upgrades only need to be done once in the repository and are then automatically propagated to all VLFSs, resulting in management overhead independent of the number of virtual appliances. We have implemented a Strata Linux prototype without any application or operating system kernel changes. Using this prototype, we demonstrate how Strata enables fast system provisioning, simplifies system maintenance and upgrades, speeds system recovery from security exploits, and incurs only modest performance overhead. | (pdf) |
Retrocomputing on an FPGA: Reconstructing an 80's-Era Home Computer with Programmable Logic | Stephen A. Edwards | 2009-01-12 | The author reconstructs a computer of his childhood, an Apple II+. | (pdf) (ps) |
A MPEG Decoder in SHIM | Keerti Joshi, Delvin Kellebrew | 2008-12-23 | The emergence of world-wide standards for video compression has created a demand for design tools and simulation resources to support algorithm research and new product development. Because of the need for subjective study in the design of video compression algorithms, it is essential that flexible yet computationally efficient tools be developed. For this project, we plan to implement an MPEG decoder using the SHIM programming language. SHIM is a software/hardware integration language whose aim is to provide communication between hardware and software while providing deterministic concurrency. The focus of this project will be to emphasize the efficiency of the SHIM language in embedded applications as compared to other existing implementations. | (pdf) |
Using Metamorphic Testing at Runtime to Detect Defects in Applications without Test Oracles | Christian Murphy | 2008-12-22 | First, we will present an approach called Automated Metamorphic System Testing. This will involve automating system-level metamorphic testing by treating the application as a black box and checking that the metamorphic properties of the entire application hold after execution. This will allow for metamorphic testing to be conducted in the production environment without affecting the user, and will not require the tester to have access to the source code. The tests do not require an oracle upon their creation; rather, the metamorphic properties act as built-in test oracles. We will also introduce an implementation framework called Amsterdam. Second, we will present a new type of testing called Metamorphic Runtime Checking. This involves the execution of metamorphic tests from within the application, i.e., the application launches its own tests, within its current context. The tests execute within the application’s current state, and in particular check a function’s metamorphic properties. We will also present a system called Columbus that supports the execution of the Metamorphic Runtime Checking from within the context of the running application. Like Amsterdam, it will conduct the tests with acceptable performance overhead, and will ensure that the execution of the tests does not affect the state of the original application process from the users’ perspective; however, the implementation of Columbus will be more challenging in that it will require more sophisticated mechanisms for conducting the tests without pre-empting the rest of the application, and for comparing the results which may conceivably be in different processes or environments. Third, we will describe a set of metamorphic testing guidelines that can be followed to assist in the formulation and specification of metamorphic properties that can be used with the above approaches. These will categorize the different types of properties exhibited by many applications in the domain of machine learning and data mining in particular (as a result of the types of applications we will investigate), but we will demonstrate that they are also generalizable to other domains as well. This set of guidelines will also correlate to the different types of defects that we expect the approaches will be able to find. | (pdf) |
Static Deadlock Detection in SHIM with an Automata Type Checking System | Dave Aaron Smith, Nalini Vasudevan, Stephen Edwards | 2008-12-21 | With the advent of multicores, concurrent programming languages are becoming more prevalent. Data races and deadlocks are two major problems with concurrent programs. SHIM is a concurrent programming language that guarantees the absence of data races through its semantics. However, a program written in SHIM can deadlock if not carefully written. In this paper, we present a divide-and-merge technique to statically detect deadlocks in SHIM. SHIM is asynchronous, but we can greatly reduce its state space without losing precision because of its semantics. | (pdf) (ps) |
SHIM Optimization: Elimination Of Unstructured Loops | Ravindra Babu Ganapathi, Stephen A. Edwards | 2008-12-21 | The SHIM compiler for the IBM Cell processor generates distinct code for the two types of processing units, the PPE (Power Processor Element) and the SPE (Synergistic Processor Element). The SPE is specialized to give high throughput for computation-intensive applications operating on dense data. We propose a mechanism to tune the code generated by the SHIM compiler so that optimizing compilers can generate structured code. Although the discussion here relates to optimizing SHIM IR (Intermediate Representation) code, the techniques can be incorporated into compilers to convert unstructured loops consisting of goto statements into structured loops, such as while and do-while statements, to ease back-end compiler optimizations. Our research SHIM compiler takes code written in the SHIM language, performs various static analyses, and finally transforms it into C code. The generated code is compiled to a binary using standard compilers available for the IBM Cell processor, such as GCC and the IBM XL compiler. | (pdf) (ps) |
uClinux on the Altera DE2 | David Lariviere, Stephen A. Edwards | 2008-12-21 | This technical report provides an introduction on how to compile and run uClinux and third-party programs to be run on a Nios II CPU core instantiated within the FPGA on the Altera DE2. It is based on experiences working with the OS and development board while teaching the Embedded Systems course during the springs of 2007 and 2008. | (pdf) |
Memory Issues in PRET Machines | Nishant R. Shah | 2008-12-21 | In processor design, the premier issues with memory are (1) main memory allocation and (2) interprocess communication. These two mainly affect the performance of the memory system. The goal of this paper is to formulate a deterministic model for memory systems of PRET, taking into account all the intertwined parallelism of modern memory chips. Studying existing memory models is necessary to understand the implications of these factors and to realize a perfectly time-predictable memory system. | (pdf) |
Analysis of Clocks in X10 Programs (Extended) | Nalini Vasudevan, Olivier Tardieu, Julian Dolby, Stephen A. Edwards | 2008-12-19 | Clocks are a mechanism for providing synchronization barriers in concurrent programming languages. They are usually implemented using primitive communication mechanisms and thus spare the programmer from reasoning about low-level implementation details such as remote procedure calls and error conditions. Clocks provide flexibility, but programs often use them in specific ways that do not require their full implementation. In this paper, we describe a tool that mitigates the overhead of general-purpose clocks by statically analyzing how programs use them and choosing optimized implementations when available. We tackle the clock implementation in the standard library of the X10 programming language---a parallel, distributed object-oriented language. We report our findings for a small set of analyses and benchmarks. Our tool only adds a few seconds to analysis time, making it practical to use as part of a compilation chain. | (pdf) |
Classifying High-Dimensional Text and Web Data using Very Short Patterns | Hassan Malik, John Kender | 2008-12-17 | In this paper, we propose the "Democratic Classifier", a simple, democracy-inspired pattern-based classification algorithm that uses very short patterns for classification, and does not rely on the minimum support threshold. Borrowing ideas from democracy, our training phase allows each training instance to vote for an equal number of candidate size-2 patterns. Similar to the usual democratic election process, where voters select candidates by considering their qualifications, prior contributions at the constituency and territory levels, as well as their own perception about candidates, the training instances select patterns by effectively balancing between local, class, and global significance of patterns. In addition, we respect "each voter's opinion" by simultaneously adding shared patterns to all applicable classes, and then apply a novel power law based weighing scheme, instead of making binary decisions on these patterns. Results of experiments performed on 121 common text and web datasets show that our algorithm almost always outperforms state of the art classification algorithms, without requiring any dataset-specific parameter tuning. On 100 real-life, noisy, web datasets, the average absolute classification accuracy improvement was as great as 10% over SVM, Harmony, C4.5 and KNN. Also, our algorithm ran about 3.5 times faster than the fastest existing pattern-based classification algorithm. | (pdf) |
Distributed eXplode: A High-Performance Model Checking Engine to Scale Up State-Space Coverage | Nageswar Keetha, Leon Wu, Gail Kaiser, Junfeng Yang | 2008-12-10 | Model checking the state space (all possible behaviors) of software systems is a promising technique for verification and validation. Bugs such as security vulnerabilities, file storage issues, deadlocks and data races can occur anywhere in the state space and are often triggered by corner cases; therefore, it becomes important to explore and model check all runtime choices. However, large and complex software systems generate huge numbers of behaviors leading to ‘state explosion’. eXplode is a lightweight, deterministic and depth-bound model checker that explores all dynamic choices at runtime. Given an application-specific test-harness, eXplode performs state search in a serialized fashion - which limits its scalability and performance. This paper proposes a distributed eXplode engine that uses multiple host machines concurrently in order to achieve more state space coverage in less time, and is very helpful to scale up the software verification and validation effort. Test results show that Distributed eXplode runs several times faster and covers more state space than the standalone eXplode. | (pdf) |
Generalized Assorted Pixel Camera: Post-Capture Control of Resolution, Dynamic Range and Spectrum | Fumihito Yasuma, Tomoo Mitsunaga, Daisuke Iso, Shree K. Nayar | 2008-11-24 | We propose the concept of a generalized assorted pixel (GAP) camera, which enables the user to capture a single image of a scene and, after the fact, control the trade-off between spatial resolution, dynamic range and spectral detail. The GAP camera uses a complex array (or mosaic) of color filters. A major problem with using such an array is that the captured image is severely under-sampled for at least some of the filter types. This leads to reconstructed images with strong aliasing. We make three contributions in this paper: (a) We present a comprehensive optimization method to arrive at the spatial and spectral layout of the color filter array of a GAP camera. (b) We develop a novel anti-aliasing algorithm for reconstructing the under-sampled channels of the image with minimal aliasing. (c) We demonstrate how the user can capture a single image and then control the trade-off of spatial resolution to generate a variety of images, including monochrome, high dynamic range (HDR) monochrome, RGB, HDR RGB, and multispectral images. Finally, the performance of our GAP camera has been verified using extensive simulations that use multispectral images of real world scenes. A large database of these multispectral images is being made publicly available for use by the research community. | (pdf) |
Measurements of Multicast Service Discovery in a Campus Wireless Network | Se Gi Hong, Suman Srinivasan, Henning Schulzrinne | 2008-11-14 | Applications utilizing multicast service discovery protocols, such as iTunes, have become increasingly popular. However, multicast service discovery protocols are considered to generate network traffic overhead, especially in a wireless network. Therefore, it becomes important to evaluate the traffic and overhead caused by multicast service discovery packets in real-world networks. We measure and analyze the traffic of one of the most widely deployed multicast service discovery protocols, multicast DNS (mDNS) service discovery, in a campus wireless network that forms a single multicast domain with a large number of users. We also analyze different service discovery models in terms of packet overhead and service discovery delay under different network sizes and churn rates. Our measurement shows that mDNS traffic consumes about 13 percent of the total bandwidth. | (pdf) |
Improving the Dependability of Machine Learning Applications | Christian Murphy, Gail Kaiser | 2008-10-10 | As machine learning (ML) applications become prevalent in various aspects of everyday life, their dependability takes on increasing importance. It is challenging to test such applications, however, because they are intended to learn properties of data sets where the correct answers are not already known. Our work is not concerned with testing how well an ML algorithm learns, but rather seeks to ensure that an application using the algorithm implements the specification correctly and fulfills the users' expectations. These are critical to ensuring the application's dependability. This paper presents three approaches to testing these types of applications. In the first, we create a set of limited test cases for which it is, in fact, possible to predict what the correct output should be. In the second approach, we use random testing to generate large data sets according to parameterization based on the application’s equivalence classes. Our third approach is based on metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output can easily be predicted based on the original output. Here we discuss these approaches, and our findings from testing the dependability of three real-world ML applications. | (pdf) |
Opportunistic Use of Client Repeaters to Improve Performance of WLANs | Victor Bahl, Ranveer Chandra, Patrick Pak-Ching Lee, Vishal Misra, Jitendra Padhye, Dan Rubenstein | 2008-10-09 | Currently deployed IEEE 802.11 WLANs (Wi-Fi networks) share access point (AP) bandwidth on a per-packet basis. However, the various stations communicating with the AP often have different signal qualities, resulting in different transmission rates. This induces a phenomenon known as the rate anomaly problem, in which stations with lower signal quality transmit at lower rates and consume a significant majority of airtime, thereby dramatically reducing the throughput of stations transmitting at high rates. We propose a practical, deployable system, called SoftRepeater, in which stations cooperatively address the rate anomaly problem. Specifically, higher-rate Wi-Fi stations opportunistically transform themselves into repeaters for stations with low data-rates when transmitting to/from the AP. The key challenge is to determine when it is beneficial to enable the repeater functionality. In this paper, we propose an initiation protocol that ensures that repeater functionality is enabled only when appropriate. Also, our system can run directly on top of today's 802.11 infrastructure networks. We also describe a novel, zero-overhead network coding scheme that further alleviates undesirable symptoms of the rate anomaly problem. We evaluate our system using simulation and testbed implementation, and find that SoftRepeater can improve cumulative throughput by up to 200%. | (pdf) |
The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models | Rean Griffith | 2008-10-06 | Renewed interest in developing computing systems that meet additional non-functional requirements such as reliability, high availability and ease-of-management/self-management (serviceability) has fueled research into developing systems that exhibit enhanced reliability, availability and serviceability (RAS) capabilities. This research focus on enhancing the RAS capabilities of computing systems impacts not only the legacy/existing systems we have today, but also has implications for the design and development of next generation (self-managing/self-*) systems, which are expected to meet these non-functional requirements with minimal human intervention. To reason about the RAS capabilities of the systems of today or the self-* systems of tomorrow, there are three evaluation-related challenges to address. First, developing (or identifying) practical fault-injection tools that can be used to study the failure behavior of computing systems and exercise any (remediation) mechanisms the system has available for mitigating or resolving problems. Second, identifying techniques that can be used to quantify RAS deficiencies in computing systems and reason about the efficacy of individual or combined RAS-enhancing mechanisms (at design-time or after system deployment). Third, developing an evaluation methodology that can be used to objectively compare systems based on the (expected or actual) benefits of RAS-enhancing mechanisms. This thesis addresses these three challenges by introducing the 7U Evaluation Methodology, a complementary approach to traditional performance-centric evaluations that identifies criteria for comparing and analyzing existing (or yet-to-be-added) RAS-enhancing mechanisms, is able to evaluate and reason about combinations of mechanisms, exposes under-performing mechanisms and highlights the lack of mechanisms in a rigorous, objective and quantitative manner. The development of the 7U Evaluation Methodology is based on the following three hypotheses. First, that runtime adaptation provides a platform for implementing efficient and flexible fault-injection tools capable of in-situ and in-vivo interactions with computing systems. Second, that mathematical models such as Markov chains, Markov reward networks and Control theory models can successfully be used to create simple, reusable templates for describing specific failure scenarios and scoring the system's responses, i.e., studying the failure-behavior of systems, and the various facets of its remediation mechanisms and their impact on system operation. Third, that combining practical fault-injection tools with mathematical modeling techniques based on Markov Chains, Markov Reward Networks and Control Theory can be used to develop a benchmarking methodology for evaluating and comparing the reliability, availability and serviceability (RAS) characteristics of computing systems. This thesis demonstrates how the 7U Evaluation Method can be used to evaluate the RAS capabilities of real-world computing systems and in so doing makes three contributions. First, a suite of runtime fault-injection tools (Kheiron tools) able to work in a variety of execution environments is developed. Second, analytical tools that can be used to construct mathematical models (RAS models) to evaluate and quantify RAS capabilities using appropriate metrics are discussed. Finally, the results and insights gained from conducting fault-injection experiments on real-world systems and modeling the system responses (or lack thereof) using RAS models are presented. In conducting 7U Evaluations of real-world systems, this thesis highlights the similarities and differences between traditional performance-oriented evaluations and RAS-oriented evaluations and outlines a general framework for conducting RAS evaluations. | (pdf) |
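The abstract above appeals to Markov chains and Markov reward networks as scoring templates. As a generic illustration of that style of model (not the thesis's actual RAS models), the sketch below computes steady-state availability and an expected reward rate for a two-state up/down Markov model; the failure/repair times and reward values are hypothetical.

```python
# Minimal sketch of a Markov reward model of the kind RAS evaluations use:
# a two-state (UP/DOWN) continuous-time Markov chain characterized by MTTF
# and MTTR, plus a per-state "reward" (e.g., served requests per second).
# The numbers below are illustrative, not taken from any paper.

def steady_state_availability(mttf_hours, mttr_hours):
    """Steady-state availability of the two-state model:
    A = MTTF / (MTTF + MTTR), equivalently mu / (lambda + mu)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def expected_reward(mttf_hours, mttr_hours, reward_up, reward_down=0.0):
    """Expected reward rate, weighting each state's reward by its
    steady-state probability."""
    a = steady_state_availability(mttf_hours, mttr_hours)
    return a * reward_up + (1.0 - a) * reward_down

# Hypothetical comparison: a bare server versus one with a remediation
# mechanism that shortens repair time (e.g., automated restart).
print(expected_reward(mttf_hours=500, mttr_hours=4.0, reward_up=200.0))  # no mechanism
print(expected_reward(mttf_hours=500, mttr_hours=0.1, reward_up=200.0))  # fast remediation
```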
Quality Assurance of Software Applications using the In Vivo Testing Approach | Christian Murphy, Gail Kaiser, Ian Vo, Matt Chu | 2008-10-02 | Software products released into the field typically have some number of residual defects that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, incorrect assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system; these defects may also be due to application states that were not considered during lab testing, or corrupted states that could arise due to a security violation. One approach to this problem is to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present a testing methodology we call in vivo testing, in which tests are continuously executed in the deployment environment. We also describe a type of test we call in vivo tests that are specifically designed for use with such an approach: these tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state from the perspective of the end-user. We discuss the approach and the prototype testing framework for Java applications called Invite. We also provide the results of case studies that demonstrate Invite’s effectiveness and efficiency. | (pdf) |
Using JML Runtime Assertion Checking to Automate Metamorphic Testing in Applications without Test Oracles | Christian Murphy, Kuang Shen, Gail Kaiser | 2008-10-02 | It is challenging to test applications and functions for which the correct output for arbitrary input cannot be known in advance, e.g. some computational science or machine learning applications. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing: existing test case input is modified to produce new test cases in such a manner that, when given the new input, the application should produce an output that can easily be computed based on the original output. That is, if input x produces output f(x), then we create input x' such that we can predict f(x') based on f(x); if the application or function does not produce the expected output, then a defect must exist, and either f(x) or f(x') (or both) is wrong. By using metamorphic testing, we are able to provide built-in "pseudo-oracles" for these so-called "non-testable programs" that have no test oracles. In this paper, we describe an approach in which a function's metamorphic properties are specified using an extension to the Java Modeling Language (JML), a behavioral interface specification language that is used to support the "design by contract" paradigm in Java applications. Our implementation, called Corduroy, pre-processes these specifications and generates test code that can be executed using JML runtime assertion checking, for ensuring that the specifications hold during program execution. In addition to presenting our approach and implementation, we also describe our findings from case studies in which we apply our technique to applications without test oracles. | (pdf) |
VoIP-based Air Traffic Controller Training | Supreeth Subramanya, Xiaotao Wu, Henning Schulzrinne | 2008-09-26 | Extending VoIP beyond Internet telephony, we present a case study of applying the technology outside of its intended domain to solve a real-world problem. This work is an attempt to understand an analog, hardwired communication system of the U.S. Federal Aviation Administration (FAA), and to effectively translate it into a generic, standards-based VoIP system that runs on their existing data network. We develop insights into air traffic controller training and weigh the design choices for building a soft real-time data communication system. We also share our real-world deployment and maintenance experiences, as the FAA Academy has been successfully using this VoIP system in five training rooms since 2006 to train the future air traffic controllers of the U.S. and the world. | (pdf) |
A Better Approach than Carrier-Grade-NAT | Olaf Maennel, Randy Bush, Luca Cittadini, Steven M. Bellovin | 2008-09-24 | We are facing the exhaustion of newly assignable IPv4 addresses. Unfortunately, IPv6 is not yet deployed widely enough to fully replace IPv4, and it is unrealistic to expect that this is going to change before we run out of IPv4 addresses. Letting hosts seamlessly communicate in an IPv4-world without assigning a unique globally routable IPv4 address to each of them is a challenging problem, for which many solutions have been proposed. Some prominent ones target towards carrier-grade-NATs (CGN), which we feel is a bad idea. Instead, we propose using specialized NATs at the edge that treat some of the port number bits as part of the address. | (pdf) |
Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic | Yingbo Song, Angelos D. Keromytis, Salvatore J. Stolfo | 2008-09-15 | We present Spectrogram, a mixture-of-Markov-chains sensor for anomaly detection (AD) against web-layer (port 80) code-injection attacks such as PHP file inclusion, SQL injection and cross-site scripting, as well as memory-layer buffer overflows. Port 80 is the gateway to many application-level services, and a large array of attacks are channeled through this vector; servers cannot easily firewall this port. Signature-based sensors are effective in filtering known exploits but cannot detect 0-day vulnerabilities or deal with polymorphism, and statistical AD approaches have mostly been limited to network-layer, protocol-agnostic modeling, which weakens their effectiveness. N-gram based modeling approaches have recently demonstrated success, but the ill-posed nature of modeling large grams has thus far prevented exploration of higher-order statistical models. In this paper, we provide a solution to this problem based on a factorization into Markov chains, and aim to model higher-order structure as well as content for web requests. Spectrogram is implemented in a protocol-aware, passive, network-situated, but CGI-layered, AD architecture, and we show in our evaluation that this model demonstrates significant detection results on an array of real-world web-layer attacks, achieving at least 97% detection rates on all but one dataset and comparing favorably against other AD sensors. | (pdf) |
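As background for the modeling style named above, the sketch below shows a single first-order character-level Markov chain scoring request strings by average log-likelihood. It is only an illustration of the general idea: the mixture of chains, the higher-order structure, and the protocol-aware parsing described in the abstract are omitted, and the training strings are hypothetical.

```python
import math
from collections import defaultdict

class CharMarkovChain:
    """First-order character-level Markov chain with add-one smoothing.
    A single chain standing in for one mixture component; the full
    mixture and CGI-layer parsing used by a real sensor are omitted."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, strings):
        for s in strings:
            for a, b in zip(s, s[1:]):
                self.counts[a][b] += 1
                self.alphabet.update((a, b))

    def log_prob(self, a, b):
        total = sum(self.counts[a].values()) + len(self.alphabet)
        return math.log((self.counts[a][b] + 1) / total)

    def score(self, s):
        """Average per-transition log-likelihood of string s."""
        if len(s) < 2:
            return 0.0
        return sum(self.log_prob(a, b) for a, b in zip(s, s[1:])) / (len(s) - 1)

# Hypothetical training data: benign query strings seen at a web server.
chain = CharMarkovChain()
chain.train(["id=1024&page=home", "id=77&page=profile", "q=network+security"])

# Lower scores mean the content looks less like the training traffic; a
# deployed sensor would compare scores against a threshold calibrated on
# held-out benign requests.
for request in ["id=2048&page=home", "id=1;DROP TABLE users--"]:
    print(request, round(chain.score(request), 3))
```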
Retina: Helping Students and Instructors Based on Observed Programming Activities | Christian Murphy, Gail Kaiser, Kristin Loveland, Sahar Hasan | 2008-08-28 | It is difficult for instructors of CS1 and CS2 courses to get accurate answers to such critical questions as "how long are students spending on programming assignments?", or "what sorts of errors are they making?" At the same time, students often have no idea of where they stand with respect to the rest of the class in terms of time spent on an assignment or the number or types of errors that they encounter. In this paper, we present a tool called Retina, which collects information about students' programming activities, and then provides useful and informative reports to both students and instructors based on the aggregation of that data. Retina can also make real-time recommendations to students, in order to help them quickly address some of the errors they make. In addition to describing Retina and its features, we also present some of our initial findings during two trials of the tool in a real classroom setting. | (pdf) |
Approximating a Global Passive Adversary Against Tor | Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis | 2008-08-18 | We present a novel, practical, and effective mechanism for identifying the IP address of Tor clients. We approximate an almost-global passive adversary (GPA) capable of eavesdropping anywhere in the network by using LinkWidth, a novel bandwidth-estimation technique. LinkWidth allows network edge-attached entities to estimate the available bandwidth in an arbitrary Internet link without a cooperating peer host, router, or ISP. By modulating the bandwidth of an anonymous connection (e.g., when the destination server or its router is under our control), we can observe these fluctuations as they propagate through the Tor network and the Internet to the end-user’s IP address. Our technique exploits one of the design criteria for Tor (trading off GPA-resistance for improved latency/bandwidth over MIXes) by allowing well-provisioned (in terms of bandwidth) adversaries to effectively become GPAs. Although timing-based attacks have been demonstrated against non-timing-preserving anonymity networks, they have depended either on a global passive adversary or on the compromise of a substantial number of Tor nodes. Our technique does not require compromise of any Tor nodes or collaboration of the end-server (for some scenarios). We demonstrate the effectiveness of our approach in tracking the IP address of Tor users in a series of experiments. Even for an underprovisioned adversary with only two network vantage points, we can identify the end user (IP address) in many cases. | (pdf) |
Deux: Autonomic Testing System for Operating System Upgrades | Leon Wu, Gail Kaiser, Jason Nieh, Christian Murphy | 2008-08-15 | Operating system upgrades and patches sometimes break applications that worked fine on the older version. We present an autonomic approach to testing of OS updates while minimizing downtime, usable without local regression suites or IT expertise. Deux utilizes a dual-layer virtual machine architecture, with lightweight application process checkpoint and resume across OS versions, enabling simultaneous execution of the same applications on both OS versions in different VMs. Inputs provided by ordinary users to the production old version are also fed to the new version. The old OS acts as a pseudo-oracle for the update, and application state is automatically re-cloned to continue testing after any output discrepancies (intercepted at system call level) - all transparently to users. If all differences are deemed inconsequential, then the VM roles are switched with the application state already in place. Our empirical evaluation with both LAMP and standalone applications demonstrates Deux’s efficiency and effectiveness. | (pdf) |
Predictive Models of Gene Regulation | Anshul Kundaje | 2008-08-15 | The regulation of gene expression plays a central role in the development and function of a living cell. A complex network of interacting regulatory proteins binds specific sequence elements in the genome to control the amount and timing of gene expression. The abundance of genome-scale datasets from different organisms provides an opportunity to accelerate our understanding of the mechanisms of gene regulation. Developing computational tools to infer gene regulation programs from high-throughput genomic data is one of the central problems in computational biology. In this thesis, we present a new predictive modeling framework for studying gene regulation. We formulate the problem of learning regulatory programs as a binary classification task: to accurately predict the condition-specific activation (up-regulation) and repression (down-regulation) of gene expression. The gene expression response is measured by microarray expression data. Genes are represented by various genomic regulatory sequence features. Experimental conditions are represented by the gene expression levels of various regulatory proteins. We use this combination of features to learn a prediction function for the regulatory response of genes under different experimental conditions. The core computational approach is based on boosting. Boosting algorithms allow us to learn high-accuracy, large-margin classifiers and avoid overfitting. We describe three applications of our framework to study gene regulation: - In the GeneClass algorithm, we use a compendium of known transcription factor binding sites and gene expression data to learn a global context-specific regulation program that accurately predicts differential expression. GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. We introduce a novel robust variant of boosting that improves stability and biological interpretability in the presence of correlated features. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the framework. - In several organisms, the DNA binding sites of many transcription factors are unknown. Hence, automatic discovery of regulatory sequence motifs is required. In the MEDUSA algorithm, we integrate raw promoter sequence data and gene expression data to simultaneously discover cis regulatory motifs ab initio and learn predictive regulatory programs. MEDUSA automatically learns probabilistic representations of motifs and their corresponding target genes. We show that we are able to accurately learn the binding sites of most known transcription factors in yeast. - We also design new techniques for extracting biologically and statistically significant information from the learned regulatory models. We use a margin-based score to extract global condition-specific regulomes as well as cluster-specific and gene-specific regulation programs. We develop a post-processing framework for interpreting and visualizing biological information encapsulated in our models. We show the utility of our framework in analyzing several interesting biological contexts (environmental stress responses, DNA-damage response and hypoxia-response) in the budding yeast Saccharomyces cerevisiae. We also show that our methods can learn regulatory programs and cis regulatory motifs in higher eukaryotes such as worms and humans. Several hypotheses generated by our methods are validated by our collaborators using biochemical experiments. Experimental results demonstrate that our framework is quantitatively and qualitatively predictive. We are able to achieve high prediction accuracy on test data and also generate specific, testable hypotheses. | (pdf) |
Using Runtime Testing to Detect Defects in Applications without Test Oracles | Christian Murphy, Gail Kaiser | 2008-08-07 | It is typically infeasible to test a large, complex software system in all its possible configurations and system states prior to deployment. Moreover, some such applications have no test oracles to indicate their correctness. In my thesis, we will address these problems in two ways. First, we suggest that executing tests within the context of an application running in the field can reveal defects that would not otherwise be found. Second, we believe that this approach can be further extended to applications for which there is no test oracle by using a variant of metamorphic testing at runtime. | (pdf) |
Towards the Quality of Service for VoIP traffic in IEEE 802.11 Wireless Networks | Sangho Shin, Henning Schulzrinne | 2008-07-09 | The usage of voice over IP (VoIP) traffic in IEEE 802.11 wireless networks is expected to increase in the near future due to widely deployed 802.11 wireless networks and VoIP services on fixed lines. However, the quality of service (QoS) of VoIP traffic in wireless networks is still unsatisfactory. In this thesis, I identify several sources of QoS problems for VoIP traffic in IEEE 802.11 wireless networks and propose solutions to them. The QoS problems discussed can be divided into three categories, namely, user mobility, VoIP capacity, and call admission control. User mobility causes network disruptions during handoffs. In order to reduce the handoff time between Access Points (APs), I propose a new handoff algorithm, Selective Scanning and Caching, which finds available APs by scanning a minimum number of channels and, furthermore, allows clients to perform handoffs without scanning by caching AP information. I also describe a new client- and server-side architecture for seamless IP layer handoffs, which occur when mobile clients change subnets as a result of layer 2 handoffs. I also present two methods to improve VoIP capacity in 802.11 networks, Adaptive Priority Control (APC) and Dynamic Point Coordination Function (DPCF). APC is a new packet scheduling algorithm at the AP that improves capacity by balancing the uplink and downlink delay of VoIP traffic, and DPCF uses a polling-based protocol that minimizes the bandwidth wasted on unnecessary polling by means of a dynamic polling list. Additionally, I estimate the capacity for VoIP traffic in IEEE 802.11 wireless networks via theoretical analysis, simulations, and experiments in a wireless test-bed, and show how to avoid mistakes in the measurements and comparisons. Finally, to protect the QoS of existing VoIP calls while maximizing channel utilization, I propose a novel admission control algorithm called QP-CAT (Queue size Prediction using Computation of Additional Transmission), which accurately predicts the impact of new voice calls by virtually transmitting the traffic of new VoIP calls. | (pdf) |
genSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work | Christian Murphy, Swapneel Sheth, Gail Kaiser, Lauren Wilcox | 2008-06-13 | Many collaborative applications, especially in scientific research, focus only on the sharing of tools or the sharing of data. We seek to introduce an approach to scientific collaboration that is based on knowledge sharing. We do this by automatically building organizational memory and enabling knowledge sharing by observing what users do with a particular tool or set of tools in the domain, through the addition of activity and usage monitoring facilities to standalone applications. Once this knowledge has been gathered, we apply social networking models to provide collaborative features to users, such as suggestions on tools to use, and automatically-generated sequences of actions based on past usage amongst the members of a social network or the entire community. In this work, we investigate social networking models as an approach to scientific knowledge sharing, and present an implementation called genSpace, which is built as an extension to the geWorkbench platform for computational biologists. Last, we discuss the approach from the viewpoint of social software engineering. | (pdf) |
Application Layer Feedback-based SIP Server Overload Control | Charles Shen, Henning Schulzrinne, Erich Nahum | 2008-06-06 | A SIP server may be overloaded by emergency-induced call volume, "American Idol" style flash crowd effects or denial of service attacks. The SIP server overload problem is interesting especially because the costs of serving or rejecting a SIP session can be similar. For this reason, the built-in SIP overload control mechanism based on generating rejection messages cannot prevent the server from entering congestion collapse under heavy load. The SIP overload problem calls for a pushback control solution in which the potentially overloaded receiving server may notify its upstream sending servers to have them send only the amount of load within the receiving server's processing capacity. The pushback framework can be achieved by SIP application layer rate-based feedback or window-based feedback. The centerpiece of the feedback mechanism is the algorithm used to generate load regulation information. We propose three new window-based feedback algorithms and evaluate them together with two existing rate-based feedback algorithms. We compare the different algorithms in terms of the number of tuning parameters and performance under both steady and variable load. Furthermore, we identify two categories of fairness requirements for SIP overload control, namely, user-centric and provider-centric fairness. With the introduction of a new double-feed SIP overload control architecture, we show how the algorithms meet those fairness criteria. | (pdf) |
CPU Torrent -- CPU Cycle Offloading to Reduce User Wait Time and Provider Resource Requirements | Swapneel Sheth, Gail Kaiser | 2008-06-04 | Developers of novel scientific computing systems are often eager to make their algorithms and databases available for community use, but their own computational resources may be inadequate to fulfill external user demand -- yet the system's footprint is far too large for prospective user organizations to download and run locally. Some heavyweight systems have become part of designated ``centers'' providing remote access to supercomputers and/or clusters supported by substantial government funding; others use virtual supercomputers dispersed across grids formed by massive numbers of volunteer Internet-connected computers. But public funds are limited and not all systems are amenable to huge-scale divisibility into independent computation units. We have identified a class of scientific computing systems where ``utility'' sub-jobs can be offloaded to any of several alternative providers thereby freeing up local cycles for the main proprietary jobs, implemented a proof-of-concept framework enabling such deployments, and analyzed its expected throughput and response-time impact on a real-world bioinformatics system (Columbia's PredictProtein) whose present users endure long wait queues. | (pdf) |
FairTorrent: Bringing Fairness to Peer-to-Peer Systems | Alex Sherman, Jason Nieh, Clifford Stein | 2008-05-27 | The lack of fair bandwidth allocation in Peer-to-Peer systems causes many performance problems, including users being disincentivized from contributing upload bandwidth, free riders taking as much from the system as possible while contributing as little as possible, and a lack of quality-of-service guarantees to support streaming applications. We present FairTorrent, a simple distributed scheduling algorithm for Peer-to-Peer systems that fosters fair bandwidth allocation among peers. For each peer, FairTorrent maintains a deficit counter which represents the number of bytes uploaded to that peer minus the number of bytes downloaded from it. It then uploads to the peer with the lowest deficit counter. FairTorrent automatically adjusts to variations in bandwidth among peers and is resilient to exploitation by free-riding peers. We have implemented FairTorrent inside a BitTorrent client without modifications to the BitTorrent protocol, and compared its performance on PlanetLab against other widely-used BitTorrent clients. Our results show that FairTorrent can provide up to two orders of magnitude better fairness and up to five times better download performance for high-contributing peers. It thereby gives users an incentive to contribute more bandwidth and improves overall system performance. | (pdf) |
IEEE 802.11 in the Large: Observations at an IETF Meeting | Andrea G. Forte, Sangho Shin, Henning Schulzrinne | 2008-05-05 | We observed wireless network traffic at the 65th IETF Meeting in Dallas, Texas in March of 2006, attended by approximately 1200 engineers. The event was supported by a very large number of 802.11a and 802.11b access points, often seeing hundreds of simultaneous users. We were particularly interested in the stability of wireless connectivity, load balancing, and loss behavior, rather than just traffic. We observed distinct differences among client implementations and saw a number of factors that made the overall system less than optimal, pointing to the need for better design tools and automated adaptation mechanisms. | (pdf) |
ROFL: Routing as the Firewall Layer | Hang Zhao, Chi-Kin Chau, Steven M. Bellovin | 2008-05-03 | We propose a firewall architecture that treats port numbers as part of the IP address. Hosts permit connectivity to a service by advertising the IPaddr:port/48 address; they block connectivity by ensuring that there is no route to it. This design, which is especially well-suited to MANETs, provides greater protection against insider attacks than do conventional firewalls, but drops unwanted traffic far earlier than distributed firewalls do. | (pdf) |
Stored Media Streaming in BitTorrent-like P2P Networks | Kyung-Wook Hwang, Vishal Misra, Dan Rubenstein | 2008-05-01 | Peer-to-peer (P2P) networks exist on the Internet today as a popular means of data distribution. However, conventional uses of P2P networking involve distributing stored files for use after the entire file has been downloaded. In this work, we investigate whether P2P networking can be used to provide real-time playback capabilities for stored media. For real-time playback, users should be able to start playback immediately, or almost immediately, after requesting the media and have uninterrupted playback during the download. To achieve this goal, it is critical to efficiently schedule the order in which pieces of the desired media are downloaded. Simply downloading pieces in sequential (earliest-first) order is prone to bottlenecks. Consequently we propose a hybrid of earliest-first and rarest-first scheduling - ensuring high piece diversity while at the same time prioritizing pieces needed to maintain uninterrupted playback. We consider an approach to peer-assisted streaming that is based on BitTorrent. In particular, we show that dynamic adjustment of the probabilities of earliest-first and rarest-first strategies along with utilization of coding techniques promoting higher data diversity, can offer noticeable improvements for real-time playback. | (pdf) |
ReoptSMART: A Learning Query Plan Cache | Julia Stoyanovich, Kenneth A. Ross, Jun Rao, Wei Fan, Volker Markl, Guy Lohman | 2008-04-24 | The task of query optimization in modern relational database systems is important but can be computationally expensive. Parametric query optimization (PQO) has as its goal the prediction of optimal query execution plans based on historical results, without consulting the query optimizer. We develop machine learning techniques that can accurately model the output of a query optimizer. Our algorithms handle non-linear boundaries in plan space and achieve high prediction accuracy even when a limited amount of data is available for training. We use both predicted and actual query execution times for learning, and are the first to demonstrate a total net win of a PQO method over a state-of-the-art query optimizer for some workloads. ReoptSMART realizes savings not only in optimization time, but also in query execution time, for an overall improvement of more than an order of magnitude in some cases. | (pdf) |
Masquerade Detection Using a Taxonomy-Based Multinomial Modeling Approach in UNIX Systems | Malek Ben Salem, Salvatore J. Stolfo | 2008-04-14 | This paper presents one-class Hellinger distance-based and one-class SVM modeling techniques that use a set of features to reveal user intent. The specific objective is to model user command profiles and detect deviations indicating a masquerade attack. The approach aims to model user intent, rather than only modeling sequences of user issued commands. We hypothesize that each individual user will search in a targeted and limited fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly. Hence, modeling a user search behavior to detect deviations may more accurately detect masqueraders. To that end, we extend prior research that uses UNIX command sequences issued by users as the audit source by relying upon an abstraction of commands. We devised a taxonomy of UNIX commands that is used to abstract command sequences. The experimental results show that the approach does not lose information and performs comparably to or slightly better than the modeling approach based on simple UNIX command frequencies. | (pdf) |
Approximating the Permanent with Belief Propagation | Bert Huang, Tony Jebara | 2008-04-05 | This work describes a method of approximating matrix permanents efficiently using belief propagation. We formulate a probability distribution whose partition function is exactly the permanent, then use Bethe free energy to approximate this partition function. After deriving some speedups to standard belief propagation, the resulting algorithm requires $O(n^2)$ time per iteration. Finally, we demonstrate the advantages of using this approximation. | (pdf) |
Behavior-Based Network Access Control: A Proof-of-Concept | Vanessa Frias-Martinez | 2008-03-27 | Current NAC technologies implement a pre-connect phase where the status of a device is checked against a set of policies before being granted access to a network, and a post-connect phase that examines whether the device complies with the policies that correspond to its role in the network. In order to enhance current NAC technologies, we propose a new architecture based on behaviors rather than roles or identity, where the policies are automatically learned and updated over time by the members of the network in order to adapt to behavioral changes of the devices. Behavior profiles may be presented as identity cards that can change over time. By incorporating an Anomaly Detector (AD) to the NAC server or to each of the hosts, their behavior profile is modeled and used to determine the type of behaviors that should be accepted within the network. These models constitute behavior-based policies. In our enhanced NAC architecture, global decisions are made using a group voting process. Each host’s behavior profile is used to compute a partial decision for or against the acceptance of a new profile or traffic. The aggregation of these partial votes amounts to the model-group decision. This voting process makes the architecture more resilient to attacks. Even after accepting a certain percentage of malicious devices, the enhanced NAC is able to compute an adequate decision. We provide proof-of-concept experiments of our architecture using web traffic from our department network. Our results show that the model-group decision approach based on behavior profiles has a 99% detection rate of anomalous traffic with a false positive rate of only 0.005%. Furthermore, the architecture achieves short latencies for both the pre- and post-connect phases. | (pdf) |
Path-based Access Control for Enterprise Networks | Matthew Burnside, Angelos D. Keromytis | 2008-03-27 | Enterprise networks are ubiquitous and increasingly complex. The mechanisms for defining security policies in these networks have not kept up with advancements in networking technology. In most cases, system administrators must define policies on a per-application basis, and consequently, these policies do not interact. For example, there is no mechanism that allows a firewall to communicate decisions based on its ruleset to a web server behind it, even though decisions being made at the firewall may be relevant to decisions made at the web server. In this paper, we describe a path-based access control system which allows applications in a network to pass access-control-related information to neighboring applications, as the applications process requests from outsiders and from each other. This system defends networks against a class of attacks wherein individual applications may make correct access control decisions but the resulting network behavior is incorrect. We demonstrate the system on service-oriented architecture (SOA)-style networks, in two forms: using graph-based policies, and leveraging the KeyNote trust management system. | (pdf) |
Tractability of multivariate approximation over a weighted unanchored Sobolev space: Smoothness sometimes hurts | Arthur G. Werschulz, Henryk Wozniakowski | 2008-03-25 | We study $d$-variate approximation for a weighted unanchored Sobolev space having smoothness $m\ge1$. Folk wisdom would lead us to believe that this problem should become easier as its smoothness increases. This is true if we are only concerned with asymptotic analysis: the $n$th minimal error is of order~$n^{-(m-\delta)}$ for any $\delta>0$. However, it is unclear how long we need to wait before this asymptotic behavior kicks in. How does this waiting period depend on $d$ and~$m$? We prove that no matter how the weights are chosen, the waiting period is at least~$m^d$, even if the error demand~$\varepsilon$ is arbitrarily close to~$1$. Hence, for $m\ge2$, this waiting period is exponential in~$d$, so that the problem suffers from the curse of dimensionality and is intractable. In other words, the fact that the asymptotic behavior improves with~$m$ is irrelevant when $d$~is large. So, we will be unable to vanquish the curse of dimensionality unless $m=1$, i.e., unless the smoothness is minimal. We then show that our problem \emph{can} be tractable if $m=1$. That is, we can find an $\varepsilon$-approximation using polynomially-many (in $d$ and~$\varepsilon^{-1}$) information operations, even if only function values are permitted. When $m=1$, it is even possible for the problem to be \emph{strongly} tractable, i.e., we can find an $\varepsilon$-approximation using polynomially-many (in~$\varepsilon^{-1}$) information operations, independent of~$d$. These positive results hold when the weights of the Sobolev space decay sufficiently quickly or are bounded finite-order weights, i.e., the $d$-variate functions we wish to approximate can be decomposed as sums of functions depending on at most~$\omega$ variables, where $\omega$ is independent of~$d$. | (pdf) |
Spreadable Connected Autonomic Networks (SCAN) | Joshua Reich, Vishal Misra, Dan Rubenstein, Gil Zussman | 2008-03-24 | A Spreadable Connected Autonomic Network (SCAN) is a mobile network that automatically maintains its own connectivity as nodes move. We envision SCANs to enable a diverse set of applications such as self-spreading mesh networks and robotic search and rescue systems. This paper describes our experiences developing a prototype robotic SCAN built from commercial, off-the-shelf hardware to support such applications. A major contribution of our work is the development of a protocol, called SCAN1, which maintains network connectivity by enabling individual nodes to determine when they must constrain their mobility in order to avoid disconnecting the network. SCAN1 achieves its goal through an entirely distributed process in which individual nodes utilize only local (2-hop) knowledge of the network's topology to periodically make a simple decision: move, or freeze in place. Along with experimental results from our hardware testbed, we model SCAN1's performance, providing both supporting analysis and simulation for the efficacy of SCAN1 as a solution to enable SCANs. While our evaluation of SCAN1 in this paper is limited to systems whose capabilities match those of our testbed, SCAN1 can be utilized in conjunction with a wide range of potential applications and environments, as either a primary or backup connectivity maintenance mechanism. | (pdf) |
Leveraging Local Intra-Core Information to Increase Global Performance in Block-Based Design of Systems-on-Chip | Cheng-Hong Li, Luca P. Carloni | 2008-03-18 | Latency-insensitive design is a methodology for system-on-chip (SoC) design that simplifies the reuse of intellectual property cores and the implementation of the communication among them. This simplification is based on a system-level protocol that decouples the intra-core logic design from the design of the inter-core communication channels. Each core is encapsulated within a shell, a synthesized logic block that dynamically controls its operation to interface it with the rest of the SoC and to absorb any latency variations on its I/O signals. In particular, a shell stalls a core whenever new valid data are not available on the input channels or a down-link core has requested a delay in the data production on the output channels. We study how knowledge about the internal logic structure of a core can be applied to the design of its shell to improve the overall system-level performance by avoiding unnecessary local stalling. We introduce the notion of functional independence conditions (FIC) and present a novel circuit design of a generic shell template that can leverage FIC. We propose a procedure for the logic synthesis of a FIC-shell instance that is only based on the analysis of the intra-core logic and does not require any input from the designers. Finally, we present a comprehensive experimental analysis that shows the performance benefits and limited design overhead of the proposed technique. This includes the semi-custom design of an SoC, an ultra-wideband baseband transmitter, using a 90nm industrial standard cell library. | (pdf) |
The Delay-Friendliness of TCP | Eli Brosh, Salman Baset, Vishal Misra, Dan Rubenstein, Henning Schulzrinne | 2008-03-10 | TCP has been traditionally considered unfriendly for real-time applications. Nonetheless, popular applications such as Skype use TCP due to the deployment of NATs and firewalls that prevent UDP traffic. Motivated by this observation we study the delay performance of TCP for real-time media flows. We develop an analytical performance model for the delay of TCP. We use extensive experiments to validate the model and to evaluate the impact of various TCP mechanisms on its delay performance. Based on our results, we derive the working region for VoIP and live video streaming applications and provide guidelines for delay-friendly TCP settings. Our research indicates that simple application-level schemes, such as packet splitting and parallel connections, can reduce the delay of real-time TCP flows by as much as 30\% and 90\%, respectively. | (pdf) (ps) |
Properties of Machine Learning Applications for Use in Metamorphic Testing | Christian Murphy, Gail Kaiser, Lifeng Hu | 2008-02-28 | It is challenging to test machine learning (ML) applications, which are intended to learn properties of data sets where the correct answers are not already known. In the absence of a test oracle, one approach to testing these applications is to use metamorphic testing, in which properties of the application are exploited to define transformation functions on the input, such that the new output will be unchanged or can easily be predicted based on the original output; if the output is not as expected, then a defect must exist in the application. Here, we seek to enumerate and classify the metamorphic properties of some machine learning algorithms, and demonstrate how these can be applied to reveal defects in the applications of interest. In addition to the results of our testing, we present a set of properties that can be used to define these metamorphic relationships so that metamorphic testing can be used as a general approach to testing machine learning applications. | (pdf) |
The Impact of SCTP on Server Scalability and Performance | Kumiko Ono, Henning Schulzrinne | 2008-02-28 | The Stream Control Transmission Protocol (SCTP) is a newer transport protocol that offers additional features beyond TCP. Although SCTP is an alternative transport protocol for the Session Initiation Protocol (SIP), we do not know how SCTP features influence SIP server scalability and performance. To estimate this, we measured the scalability and performance of two servers, an echo server and a simplified SIP server, on Linux, comparing SCTP to TCP. Our measurements found that using SCTP does not significantly affect data latency: the handshake takes approximately 0.3 ms longer than with TCP. However, server scalability in terms of the number of sustainable associations drops to 17-21% of that with TCP, or to 43% if we adjust the acceptable gap size of unordered data delivery. | (pdf) |
Optimal Splitters for Database Partitioning with Size Bounds | Kenneth A. Ross, John Cieslewicz | 2008-02-27 | Partitioning is an important step in several database algorithms, including sorting, aggregation, and joins. Partitioning is also fundamental for dividing work into equal-sized (or balanced) parallel subtasks. In this paper, we aim to find, materialize and maintain a set of partitioning elements (splitters) for a data set. Unlike traditional partitioning elements, our splitters define both inequality and equality partitions, which allows us to bound the size of the inequality partitions. We provide an algorithm for determining an optimal set of splitters from a sorted data set and show that it has time complexity $O(k \log_2 N)$, where $k$ is the number of splitters requested and $N$ is the size of the data set. We show how the algorithm can be extended to pairs of tables, so that joins can be partitioned into work units that have balanced cost. We demonstrate experimentally (a) that finding the optimal set of splitters can be done efficiently, and (b) that using the precomputed splitters can improve the time to sort a data set by up to 76%, with particular benefits in the presence of a few heavy hitters. | (pdf) |
One Server Per City: Using TCP for Very Large SIP Servers | Kumiko Ono, Henning Schulzrinne | 2008-02-26 | The transport protocol for SIP can be chosen based on the requirements of services and network conditions. How does the choice of TCP affect scalability and performance compared to UDP? We experimentally analyze the impact of using TCP as a transport protocol for a SIP server. We first investigate the scalability of a TCP echo server, then compare the performance of a SIP server for three TCP connection lifetimes: transaction, dialog, and persistent. Our results show that a Linux machine can establish more than 450,000 TCP connections and that maintaining these connections does not affect the transaction response time. Additionally, the transaction response times using the three TCP connection lifetimes and UDP show no significant difference at 2,500 registration requests/second and at 500 call requests/second. However, the sustainable request rate is lower for TCP than for UDP, since using TCP requires more message processing. More message processing causes longer delays at the thread queue for a server implementing a thread-pool model. Finally, we suggest how to reduce the impact of TCP for a scalable SIP server, especially under overload control. This is applicable to other servers with very large connection counts. | (pdf) |
Newspeak: A Secure Approach for Designing Web Applications | Kyle Dent, Steven M. Bellovin | 2008-02-16 | Internet applications are being used for more and more important business and personal purposes. Despite efforts to lock down web servers and isolate databases, there is an inherent problem in the web application architecture that leaves databases necessarily exposed to possible attack from the Internet. We propose a new design that removes the web server as a trusted component of the architecture and provides an extra layer of protection against database attacks. We have created a prototype system that demonstrates the feasibility of the new design. | (pdf) |
Summary-Based Pointer Analysis Framework for Modular Bug Finding | Marcio O. Buss | 2008-02-07 | Modern society is irreversibly dependent on computers and, consequently, on software. However, as the complexity of programs increase, so does the number of defects within them. To alleviate the problem, automated techniques are constantly used to improve software quality. Static analysis is one such approach in which violations of correctness properties are searched and reported. Static analysis has many advantages, but it is necessarily conservative because it symbolically executes the program instead of using real inputs, and it considers all possible executions simultaneously. Being conservative often means issuing false alarms, or missing real program errors. Pointer variables are a challenging aspect of many languages that can force static analysis tools to be overly conservative. It is often unclear what variables are affected by pointer-manipulating expressions, and aliasing between variables is one of the banes of program analysis. To alleviate that, a common solution is to allow the programmer to provide annotations such as declaring a variable as unaliased in a given scope, or providing special constructs such as the ``never-null'' pointer of Cyclone. However, programmers rarely keep these annotations up-to-date. The solution is to provide some form of pointer analysis, which derives useful information about pointer variables in the program. An appropriate pointer analysis equips the static tool so that it is capable of reporting more errors without risking too many false alarms. This dissertation proposes a methodology for pointer analysis that is specially tailored for ``modular bug finding.'' It presents a new analysis space for pointer analysis, defined by finer-grain ``dimensions of precision,'' which allows us to explore and evaluate a variety of different algorithms to achieve better trade-offs between analysis precision and efficiency. This framework is developed around a new abstraction for computing points-to sets, the Assign-Fetch Graph, that has many interesting features. Empirical evaluation shows promising results, as some unknown errors in well-known applications were discovered. | (pdf) |
SPARSE: A Hybrid System to Detect Malcode-Bearing Documents | Wei-Jen Li, Salvatore J. Stolfo | 2008-01-31 | Embedding malcode within documents provides a convenient means of penetrating systems which may be unreachable by network-level service attacks. Such attacks can be very targeted and difficult to detect compared to the typical network worm threat due to the multitude of document-exchange vectors. Detecting malcode embedded in a document is difficult owing to the complexity of modern document formats that provide ample opportunity to embed code in a myriad of ways. We focus on Microsoft Word documents as malcode carriers as a case study in this paper. We introduce a hybrid system that integrates static and dynamic techniques to detect the presence and location of malware embedded in documents. The system is designed to automatically update its detection models to improve accuracy over time. The overall hybrid detection system with a learning feedback loop is demonstrated to achieve a 99.27% detection rate and 3.16% false positive rate on a corpus of 6228 Word documents. | (pdf) |
The In Vivo Approach to Testing Software Applications | Christian Murphy, Gail Kaiser, Matt Chu | 2008-01-31 | Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call in vivo testing, in which unit tests are continuously executed inside a running application in the deployment environment. These tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach can reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach and the testing framework called Invite that we have developed for Java applications. We also enumerate the classes of bugs our approach can discover, and provide the results of a case study on a publicly-available application, as well as the results of experiments to measure the added overhead. | (pdf) |
Mitigating the Effect of Free-Riders in BitTorrent using Trusted Agents | Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein | 2008-01-25 | Even though Peer-to-Peer (P2P) systems present a cost-effective and scalable solution to content distribution, most entertainment, media, and software content providers continue to rely on expensive, centralized solutions such as Content Delivery Networks. One of the main reasons is that current P2P systems cannot guarantee reasonable performance as they depend on the willingness of users to contribute bandwidth. Moreover, even systems like BitTorrent, which employ a tit-for-tat protocol to encourage fair bandwidth exchange between users, are prone to free-riding (i.e., peers that do not upload). Our experiments on PlanetLab extend previous research (e.g., LargeViewExploit, BitTyrant) demonstrating that such selfish behavior can seriously degrade the performance of regular users in many more scenarios beyond simple free-riding: we observed an overhead of up to 430\% for 80\% of free-riding identities easily generated by a small set of selfish users. To mitigate the effects of selfish users, we propose a new P2P architecture that classifies peers with the help of a small number of {\em trusted nodes} that we call Trusted Auditors (TAs). TAs participate in P2P downloads like regular clients and detect free-riding identities by observing their neighbors' behavior. Using TAs, we can separate compliant users into a separate service pool, resulting in better performance. Furthermore, we show that TAs are more effective at ensuring the performance of the system than a mere increase in bandwidth capacity: for 80\% of free-riding identities, a single-TA system has a 6\% download time overhead, while without the TA and with three times the bandwidth capacity we measure a 100\% overhead. | (pdf) |
A Distance Learning Approach to Teaching eXtreme Programming | Christian Murphy, Dan Phung, Gail Kaiser | 2008-01-23 | As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present the results of a three-year study of such an online software engineering course targeted to graduate students, and describe some of the specific challenges faced, such as students’ aversion to aspects of XP and difficulties in scheduling. We discuss our findings in terms of the course’s educational objectives, and present suggestions to other educators who may face similar situations. | (pdf) |
Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems | Rebecca Collins, Luca Carloni | 2008-01-15 | Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions. | (pdf) |
LinkWidth: A Method to Measure Link Capacity and Available Bandwidth using Single-End Probes | Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis | 2008-01-05 | We introduce LinkWidth, a method for estimating capacity and available bandwidth using single-end controlled TCP packet probes. To estimate capacity, we generate a train of TCP RST packets ``sandwiched'' between trains of TCP SYN packets. Capacity is computed from the end-to-end packet dispersion of the received TCP RST/ACK packets corresponding to the TCP SYN packets going to closed ports. Our technique is significantly different from the rest of the packet-pair based measurement techniques, such as {\em CapProbe}, {\em pathchar}, and {\em pathrate}, because the long packet trains minimize errors due to bursty cross-traffic. Additionally, TCP RST packets do not generate additional ICMP replies, thus avoiding cross-traffic due to such packets from interfering with our probes. In addition, we use TCP packets for all our probes to prevent QoS-related traffic shaping (based on packet types) from affecting our measurements (e.g., Cisco routers are known by default to have very high latency when generating ICMP TTL-expired replies). We extend the {\it Train of Packet Pairs} technique to approximate the available link capacity. We use a train of TCP packet pairs with variable intra-pair delays and sizes. This is the first attempt to implement this technique using single-end TCP probes, tested on a range of networks with different bottleneck capacities and cross-traffic rates. The method we use for measuring from a single point of control uses TCP RST packets between a train of TCP SYN packets, quite similar to the technique for measuring the bottleneck capacity. We compare our prototype with {\em pathchirp}, {\em pathload}, and {\em IPERF}, which require control of both ends, as well as with another single-end controlled technique, {\em abget}, and demonstrate that in most cases our method gives approximately the same results, if not better. | (pdf) |
Autotagging to Improve Text Search for 3D Models | Corey Goldfeder, Peter Allen | 2008-01-02 | Text search on 3D models has traditionally worked poorly, as text annotations on 3D models are often unreliable or incomplete. In this paper we attempt to improve the recall of text search by automatically assigning appropriate tags to models. Our algorithm finds relevant tags by appealing to a large corpus of partially labeled example models, which does not have to be preclassified or otherwise prepared. For this purpose we use a copy of Google 3DWarehouse, a database of user contributed models which is publicly available on the Internet. Given a model to tag, we find geometrically similar models in the corpus, based on distances in a reduced dimensional space derived from Zernike descriptors. The labels of these neighbors are used as tag candidates for the model with probabilities proportional to the degree of geometric similarity. We show experimentally that text based search for 3D models using our computed tags can work as well as geometry based search. Finally, we demonstrate our 3D model search engine that uses this algorithm and discuss some implementation issues. | (pdf) |
Schema Polynomials and Applications | Kenneth A. Ross, Julia Stoyanovich | 2007-12-17 | Conceptual complexity is emerging as a new bottleneck as database developers, application developers, and database administrators struggle to design and comprehend large, complex schemas. The simplicity and conciseness of a schema depends critically on the idioms available to express the schema. We propose a formal conceptual schema representation language that combines different design formalisms, and allows schema manipulation that exposes the strengths of each of these formalisms. We demonstrate how the schema factorization framework can be used to generate relational, object-oriented, and faceted physical schemas, allowing a wider exploration of physical schema alternatives than traditional methodologies. We illustrate the potential practical benefits of schema factorization by showing that simple heuristics can significantly reduce the size of a real-world schema description. We also propose the use of schema polynomials to model and derive alternative representations for complex relationships with constraints. | (pdf) |
A Recursive Data-Driven Approach to Programming Multicore Systems | Rebecca Collins, Luca Carloni | 2007-12-05 | In this paper, we propose a method to program divide-and-conquer problems on multicore systems that is based on a data-driven recursive programming model. Data intensive programs are difficult to program on multicore architectures because they require efficient utilization of inter-core communication. Models for programming multicore systems available today generally lack the ability to automatically extract concurrency from a sequential style program and map concurrent tasks to efficiently leverage data and temporal locality. For divide-and-conquer algorithms, a recursive programming model can address both of these problems. Furthermore, since a recursive function has the same behavior patterns at all granularities of a problem, the same recursive model can be used to implement a multicore program at all of its levels: 1. the operations of a single core, 2. how to distribute tasks among several cores, and 3. in what order to schedule tasks on a multicore system when it is not possible to schedule all of the tasks at the same time. We present a novel selective execution technique that can enable automatic parallelization and task mapping of a recursive program onto a multicore system. To verify the practicality of this approach, we perform a case-study of bitonic sort on the Cell BE processor. | (pdf) |
Speech Enabled Avatar from a Single Photograph | Dmitri Bitouk, Shree K. Nayar | 2007-11-25 | This paper presents a complete framework for creating speech-enabled 2D and 3D avatars from a single image of a person. Our approach uses a generic facial motion model which represents deformations of the prototype face during speech. We have developed an HMM-based facial animation algorithm which takes into account both lexical stress and coarticulation. This algorithm produces realistic animations of the prototype facial surface from either text or speech. The generic facial motion model is transformed to a novel face geometry using a set of corresponding points between the generic mesh and the novel face. In the case of a 2D avatar, a single photograph of the person is used as input. We manually select a small number of features on the photograph and these are used to deform the prototype surface. The deformed surface is then used to animate the photograph. In the case of a 3D avatar, we use a single stereo image of the person as input. The sparse geometry of the face is computed from this image and used to warp the prototype surface to obtain the complete 3D surface of the person's face. This surface is etched into a glass cube using sub-surface laser engraving (SSLE) technology. Synthesized facial animation videos are then projected onto the etched glass cube. Even though the etched surface is static, the projection of facial animation onto it results in a compelling experience for the viewer. We show several examples of 2D and 3D avatars that are driven by text and speech inputs. | (pdf) |
Partial Evaluation for Code Generation from Domain-Specific Languages | Jia Zeng | 2007-11-20 | Partial evaluation has been applied to compiler optimization and generation for decades. Most of the successful partial evaluators have been designed for general-purpose languages. Our observation is that domain-specific languages are also suitable targets for partial evaluation. The unusual computational models in many DSLs bring challenges as well as optimization opportunities to the compiler. To enable aggressive optimization, partial evaluation has to be specialized to fit the specific paradigm of a DSL. In this dissertation, we present three such specialized partial evaluation techniques designed for specific languages that address a variety of compilation concerns. The first algorithm provides a low-cost solution for simulating concurrency on a single-threaded processor. The second enables a compiler to compile modest-sized synchronous programs in pieces that involve communication cycles. The third statically elaborates recursive function calls that enable programmers to dynamically create a system's concurrent components in a convenient and algorithmic way. Our goal is to demonstrate the potential of partial evaluation to solve challenging issues in code generation for domain-specific languages. Naturally, we do not cover all DSL compilation issues. We hope our work will enlighten and encourage future research on the application of partial evaluation to this area. | (pdf) |
Distributed In Vivo Testing of Software Applications | Matt Chu, Christian Murphy, Gail Kaiser | 2007-11-16 | The in vivo software testing approach focuses on testing live applications by executing unit tests throughout the lifecycle, including after deployment. The motivation is that the “known state” approach of traditional unit testing is unrealistic; deployed applications rarely operate under such conditions, and it may be more informative to perform the testing in live environments. One of the limitations of this approach is the high performance cost it incurs, as the unit tests are executed in parallel with the application. Here we present distributed in vivo testing, which focuses on easing the burden by sharing the load across multiple instances of the application of interest. That is, we elevate the scope of in vivo testing from a single instance to a community of instances, all participating in the testing process. Our approach is different from prior work in that we are actively testing during execution, as opposed to passively monitoring the application or conducting tests in the user environment prior to execution. We discuss new extensions to the existing in vivo testing framework (called Invite) and present empirical results that show the performance overhead improves linearly with the number of clients. | (pdf) |
Tractability of the Helmholtz equation with non-homogeneous Neumann boundary conditions: Relation to $L_2$-approximation | Arthur G. Werschulz | 2007-11-08 | We want to compute a worst case $\varepsilon$-approximation to the solution of the Helmholtz equation $-\Delta u+qu=f$ over the unit $d$-cube~$I^d$, subject to Neumann boundary conditions $\partial_\nu u=g$ on~$\partial I^d$. Let $\mathop{\rm card}(\varepsilon,d)$ denote the minimal number of evaluations of $f$, $g$, and~$q$ needed to compute an absolute or normalized $\varepsilon$-approximation, assuming that $f$, $g$, and~$q$ vary over balls of weighted reproducing kernel Hilbert spaces. This problem is said to be weakly tractable if $\mathop{\rm card}(\varepsilon,d)$ grows subexponentially in~$\varepsilon^{-1}$ and $d$. It is said to be polynomially tractable if $\mathop{\rm card}(\varepsilon,d)$ is polynomial in~$\varepsilon^{-1}$ and~$d$, and strongly polynomially tractable if this polynomial is independent of~$d$. We have previously studied tractability for the homogeneous version $g=0$ of this problem. In this paper, we investigate the tractability of the non-homogeneous problem, with general~$g$. First, suppose that we use product weights, in which the role of any variable is moderated by its particular weight. We then find that if the weight sum is sublinearly bounded, then the problem is weakly tractable; moreover, this condition is more or less necessary. We then show that the problem is polynomially tractable if the weight sum is logarithmically or uniformly bounded, and we estimate the exponents of tractability for these two cases. Next, we turn to finite-order weights of fixed order~$\omega$, in which a $d$-variate function can be decomposed as sum, each term depending on at most $\omega$~variables. We show that the problem is always polynomially tractable for finite-order weights, and we give estimates for the exponents of tractability. Since our results so far have established nothing stronger than polynomial tractability, we look more closely at whether strong polynomial tractability is possible. We show that our problem is never strongly polynomially tractable for the absolute error criterion. Moreover, we believe that the same is true for the normalized error criterion, but we have been able to prove this lack of strong tractability only when certain conditions hold on the weights. Finally, we use the Korobov- and min-kernels, along with product weights, to illustrate our results. | (pdf) |
High Level Synthesis for Packet Processing Pipelines | Cristian Soviani | 2007-10-30 | Packet processing is an essential function of state-of-the-art network routers and switches. Implementing packet processors in pipelined architectures is a well-known, established technique, albeit different approaches have been proposed. The design of packet processing pipelines is a delicate trade-off between the desire for abstract specifications, short development time, and design maintainability on one hand and very aggressive performance requirements on the other. This thesis proposes a coherent design flow for packet processing pipelines. Like the design process itself, I start by introducing a novel domain-specific language that provides a high-level specification of the pipeline. Next, I address synthesizing this model and calculating its worst-case throughput. Finally, I address some specific circuit optimization issues. I claim, based on experimental results, that my proposed technique can dramatically improve the design process of these pipelines, while the resulting performance matches the expectations of hand-crafted design. The considered pipelines exhibit a pseudo-linear topology, which can be too restrictive in the general case. However, especially due to its high performance, such an architecture may be suitable for applications outside packet processing, in which case some of my proposed techniques could be easily adapted. Since I ran my experiments on FPGAs, this work has an inherent bias towards that technology; however, most results are technology-independent. | (pdf) (ps) |
Generalized Tractability for Multivariate Problems | Michael Gnewuch, Henryk Wozniakowski | 2007-10-30 | We continue the study of generalized tractability initiated in our previous paper ``Generalized tractability for multivariate problems, Part I: Linear tensor product problems and linear information'', J. Complexity, 23, 262-295 (2007). We study linear tensor product problems for which we can compute linear information which is given by arbitrary continuous linear functionals. We want to approximate an operator $S_d$ given as the $d$-fold tensor product of a compact linear operator $S_1$ for $d=1,2,\dots\,$, with $\|S_1\|=1$ and $S_1$ having at least two positive singular values. Let $n(\varepsilon,S_d)$ be the minimal number of information evaluations needed to approximate $S_d$ to within $\varepsilon\in[0,1]$. We study \emph{generalized tractability} by verifying when $n(\varepsilon,S_d)$ can be bounded by a multiple of a power of $T(\varepsilon^{-1},d)$ for all $(\varepsilon^{-1},d)\in\Omega \subseteq[1,\infty)\times \mathbb{N}$. Here, $T$ is a \emph{tractability} function which is non-decreasing in both variables and grows slower than exponentially to infinity. We study the \emph{exponent of tractability} which is the smallest power of $T(\varepsilon^{-1},d)$ whose multiple bounds $n(\varepsilon,S_d)$. We also study \emph{weak tractability}, i.e., when $\lim_{\varepsilon^{-1}+d\to\infty,(\varepsilon^{-1},d)\in\Omega} \ln\,n(\varepsilon,S_d)/(\varepsilon^{-1}+d)=0$. In our previous paper, we studied generalized tractability for proper subsets $\Omega$ of $[1,\infty)\times\mathbb{N}$, whereas in this paper we take the unrestricted domain $\Omega^{\rm unr}=[1,\infty)\times\mathbb{N}$. We consider the three cases for which we have only finitely many positive singular values of $S_1$, or they decay exponentially or polynomially fast. Weak tractability holds for these three cases, and for all linear tensor product problems for which the singular values of $S_1$ decay slightly faster than logarithmically. We provide necessary and sufficient conditions on the function~$T$ such that generalized tractability holds. These conditions are obtained in terms of the singular values of $S_1$ and mostly limiting properties of $T$. The tractability conditions tell us how fast $T$ must go to infinity. It is known that $T$ must go to infinity faster than polynomially. We show that generalized tractability is obtained for $T(x,y)=x^{1+\ln\,y}$. We also study tractability functions $T$ of product form, $T(x,y)=f_1(x)f_2(y)$. Assume that $a_i=\liminf_{x\to\infty}(\ln\,\ln f_i(x))/(\ln\,\ln\,x)$ is finite for $i=1,2$. Then generalized tractability takes place iff $$a_i>1 \ \ \mbox{and}\ \ (a_1-1)(a_2-1)\ge1,$$ and if $(a_1-1)(a_2-1)=1$ then we need to assume one more condition given in the paper. If $(a_1-1)(a_2-1)>1$ then the exponent of tractability is zero, and if $(a_1-1)(a_2-1)=1$ then the exponent of tractability is finite. It is interesting to add that for $T$ being of the product form, the tractability conditions as well as the exponent of tractability depend only on the second singular eigenvalue of $S_1$ and they do \emph{not} depend on the rate of their decay. 
Finally, we compare the results obtained in this paper for the unrestricted domain $\Omega^{\rm unr}$ with the results from our previous paper obtained for the restricted domain $\Omega^{\rm res}= [1,\infty)\times\{1,2,\dots,d^*\}\,\cup\,[1,\varepsilon_0^{-1})\times\mathbb{N}$ with $d^*\ge1$ and $\varepsilon_0\in(0,1)$. In general, the tractability results are quite different. We may have generalized tractability for the restricted domain and no generalized tractability for the unrestricted domain which is the case, for instance, for polynomial tractability $T(x,y)=xy$. We may also have generalized tractability for both domains with different or with the same exponents of tractability. | (pdf) |
Optimizing Frequency Queries for Data Mining Applications | Hassan Malik, John Kender | 2007-10-27 | Data mining algorithms use various Trie and bitmap-based representations to optimize the support (i.e., frequency) counting performance. In this paper, we compare the memory requirements and support counting performance of FP Tree and Compressed Patricia Trie against several novel variants of vertical bit vectors. First, borrowing ideas from the VLDB domain, we compress vertical bit vectors using WAH encoding. Second, we evaluate the Gray code rank-based transaction reordering scheme, and show that in practice, simple lexicographic ordering, obtained by applying LSB Radix sort, outperforms this scheme. Led by these results, we propose HDO, a novel Hamming-distance-based greedy transaction reordering scheme, and aHDO, a linear-time approximation to HDO. We present results of experiments performed on 15 common datasets with varying degrees of sparseness, and show that HDO-reordered, WAH-encoded bit vectors can take as little as 5% of the uncompressed space, while aHDO achieves similar compression on sparse datasets. Finally, with results from over a billion database- and data-mining-style frequency query executions, we show that bitmap-based approaches result in up to hundreds of times faster support counting, and HDO-WAH-encoded bitmaps offer the best space-time tradeoff. | (pdf) |
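The sketch below illustrates the reordering idea behind HDO in a minimal form: greedily place each next transaction so that it differs from the previous one in as few items as possible, which lengthens the runs in the vertical (per-item) bit vectors and helps run-length schemes such as WAH. It is only a toy under stated assumptions (transactions encoded as item bitmasks, O(n^2) nearest-neighbour chaining), not the paper's HDO or aHDO implementation.

```python
# Illustrative sketch, not the paper's HDO/aHDO: order transactions so that
# consecutive rows differ in few items, lengthening runs in each vertical
# (per-item) bit vector and helping run-length encodings such as WAH.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two transactions encoded as item bitmasks."""
    return bin(a ^ b).count("1")

def greedy_reorder(transactions: list[int]) -> list[int]:
    """Greedy nearest-neighbour chaining on Hamming distance (O(n^2))."""
    if not transactions:
        return []
    remaining = list(range(len(transactions)))
    order = [remaining.pop(0)]          # start from an arbitrary transaction
    while remaining:
        last = transactions[order[-1]]
        nxt = min(remaining, key=lambda i: hamming(last, transactions[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return [transactions[i] for i in order]

# Toy example: 4 items (bits), 5 transactions.
rows = [0b1011, 0b0011, 0b1111, 0b0001, 0b1010]
print(greedy_reorder(rows))
```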
Automated Social Hierarchy Detection through Email Network Analysis | Ryan Rowe, German Creamer, Shlomo Hershkop, Sal Stolfo | 2007-10-17 | We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of communications between entities to rank relationships. The advantage is that the analysis can be done in an automatic fashion and can adapt itself to organizational changes over time. We illustrate the algorithms over real-world data using the Enron corporation's email archive. The results show great promise when compared to the corporation's work chart and to judicial proceedings analyzing the major players. | (pdf) |
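As a rough illustration of ranking entities from communication patterns, the toy sketch below scores each address by how many distinct senders write to it; the scoring rule and the tiny data set are hypothetical stand-ins for the richer behavioral features described in the abstract, not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical scoring: rank entities by how many distinct people contact them.
emails = [  # (sender, recipient) pairs, e.g. parsed from an email archive
    ("alice", "carol"), ("bob", "carol"), ("carol", "alice"),
    ("dave", "carol"), ("dave", "bob"), ("alice", "bob"),
]

senders_to = defaultdict(set)
for sender, recipient in emails:
    senders_to[recipient].add(sender)

ranking = sorted(senders_to, key=lambda who: len(senders_to[who]), reverse=True)
print(ranking)   # entities contacted by the most distinct people come first
```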
A New Framework for Unsupervised Semantic Discovery | Barry Schiffman | 2007-10-16 | This paper presents a new framework for the unsupervised discovery of semantic information, using a divide-and-conquer approach to take advantage of contextual regularities and to avoid problems of polysemy and sublanguages. Multiple sets of documents are formed and analyzed to create multiple sets of frames. The overall procedure is wholly unsupervised and domain independent. The end result will be a collection of sets of semantic frames that will be useful in a wide range of applications, including question-answering, information extraction, summarization and text generation. | (pdf) |
Towards In Vivo Testing of Software Applications | Christian Murphy, Gail Kaiser, Matt Chu | 2007-10-15 | Software products released into the field typically have some number of residual bugs that either were not detected or could not have been detected during testing. This may be the result of flaws in the test cases themselves, assumptions made during the creation of test cases, or the infeasibility of testing the sheer number of possible configurations for a complex system. Testing approaches such as perpetual testing or continuous testing seek to continue to test these applications even after deployment, in hopes of finding any remaining flaws. In this paper, we present our initial work towards a testing methodology we call “in vivo testing”, in which unit tests are continuously executed inside a running application in the deployment environment. In this novel approach, unit tests execute within the current state of the program (rather than by creating a clean slate) without affecting or altering that state. Our approach has been shown to reveal defects both in the applications of interest and in the unit tests themselves. It can also be used for detecting concurrency or robustness issues that may not have appeared in a testing lab. Here we describe the approach, the testing framework we have developed for Java applications, classes of bugs our approach can discover, and the results of experiments to measure the added overhead. | (pdf) |
Experiences in Teaching eXtreme Programming in a Distance Learning Program | Christian Murphy, Dan Phung, Gail Kaiser | 2007-10-12 | As university-level distance learning programs become more and more popular, and software engineering courses incorporate eXtreme Programming (XP) into their curricula, certain challenges arise when teaching XP to students who are not physically co-located. In this paper, we present our experiences and observations from managing such an online software engineering course, and describe some of the specific challenges we faced, such as students’ aversion to using XP and difficulties in scheduling. We also present some suggestions to other educators who may face similar situations. | (pdf) |
BARTER: Profile Model Exchange for Behavior-Based Access Control and Communication Security in MANETs | Vanessa Frias-Martinez, Salvatore J. Stolfo, Angelos D. Keromytis | 2007-10-10 | There is a considerable body of literature and technology that provides access control and security of communication for Mobile Ad-hoc Networks (MANETs) based on cryptographic authentication technologies and protocols. We introduce a new method of granting access and securing communication in a MANET environment to augment, not replace, existing techniques. Previous approaches grant access to the MANET, or to its services, merely by means of an authenticated identity or a qualified role. We present BARTER, a framework that, in addition, requires nodes to exchange a model of their behavior to grant access to the MANET and to assess the legitimacy of their subsequent communication. This framework forces the nodes not only to say who or what they are, but also how they behave. BARTER will continuously run membership acceptance and update protocols to give access to and accept traffic only from nodes whose behavior model is considered ``normal'' according to the behavior model of the nodes in the MANET. We implement and experimentally evaluate the merger between BARTER and other cryptographic technologies and show that BARTER can implement a fully distributed automatic access control and update with small cryptographic costs. Although the methods proposed involve the use of content-based anomaly detection models, the generic infrastructure implementing the methodology may utilize any behavior model. Even though the experiments are implemented for MANETs, the idea of model exchange for access control can be applied to any type of network. | (pdf) |
Post-Patch Retraining for Host-Based Anomaly Detection | Michael E. Locasto, Gabriela F. Cretu, Shlomo Hershkop, Angelos Stavrou | 2007-10-05 | Applying patches, although a disruptive activity, remains a vital part of software maintenance and defense. When host-based anomaly detection (AD) sensors monitor an application, patching the application requires a corresponding update of the sensor's behavioral model. Otherwise, the sensor may incorrectly classify new behavior as malicious (a false positive) or assert that old, incorrect behavior is normal (a false negative). Although the problem of ``model drift'' is an almost universally acknowledged hazard for AD sensors, relatively little work has been done to understand the process of re-training a ``live'' AD model --- especially in response to legal behavioral updates like vendor patches or repairs produced by a self-healing system. We investigate the feasibility of automatically deriving and applying a ``model patch'' that describes the changes necessary to update a ``reasonable'' host-based AD behavioral model ({\it i.e.,} a model whose structure follows the core design principles of existing host--based anomaly models). We aim to avoid extensive retraining and regeneration of the entire AD model when only parts may have changed --- a task that seems especially undesirable after the exhaustive testing necessary to deploy a patch. | (pdf) (ps) |
Privacy-Enhanced Searches Using Encrypted Bloom Filters | Steven M. Bellovin, William R. Cheswick | 2007-09-25 | It is often necessary for two or more parties that do not fully trust each other to share data selectively. For example, one intelligence agency might be willing to turn over certain documents to another such agency, but only if the second agency requests the specific documents. The problem, of course, is finding out that such documents exist when access to the database is restricted. We propose a search scheme based on Bloom filters and group ciphers such as Pohlig-Hellman encryption. A semi-trusted third party can transform one party's search queries to a form suitable for querying the other party's database, in such a way that neither the third party nor the database owner can see the original query. Furthermore, the encryption keys used to construct the Bloom filters are not shared with this third party. Multiple providers and queriers are supported; provision can be made for third-party ``warrant servers'', as well as ``censorship sets'' that limit the data to be shared. | (pdf) |
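As a loose illustration of how a filter can be published without revealing the terms inside it, the sketch below derives Bloom-filter bit positions from an HMAC key; the class, key, and parameters are assumptions for illustration, and this is not the paper's Pohlig-Hellman/group-cipher construction with a semi-trusted third party.

```python
import hmac, hashlib

class KeyedBloomFilter:
    """Toy Bloom filter whose bit positions come from keyed hashes, so the
    filter itself can be shared without revealing the inserted terms.
    (Illustrative only; the paper's scheme uses group ciphers such as
    Pohlig-Hellman and query transformation by a semi-trusted third party.)"""

    def __init__(self, key: bytes, m: int = 1024, k: int = 4):
        self.key, self.m, self.k, self.bits = key, m, k, bytearray(m // 8)

    def _positions(self, term: str):
        for i in range(self.k):
            digest = hmac.new(self.key, f"{i}:{term}".encode(), hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, term: str):
        for p in self._positions(term):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, term: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(term))

bf = KeyedBloomFilter(key=b"shared-secret")
bf.add("document-42")
print("document-42" in bf, "document-99" in bf)  # True False (up to false positives)
```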
Service Composition in a Global Service Discovery System | Knarig Arabshian, Christian Dickmann, Henning Schulzrinne | 2007-09-17 | GloServ is a global service discovery system which aggregates information about different types of services in a globally distributed network. GloServ classifies services in an ontology and maps knowledge obtained by the ontology onto a scalable hybrid hierarchical peer-to-peer network. The network mirrors the semantic relationships of service classes and, as a result, reduces the number of message hops across the global network due to the domain-specific way services are distributed. Also, since services are described in greater detail, due to the ontology representation, greater reasoning is applied when querying and registering services. In this paper, we describe an enhancement to the GloServ querying mechanism which allows GloServ servers to process and issue subqueries between servers of different classes. Thus, information about different service classes may be queried for in a single query and issued directly from the front end, creating an extensible platform for service composition. The results are then aggregated and presented to the user such that services which share an attribute are categorized together. We have built and evaluated a location-based web service discovery prototype which demonstrates the flexibility of service composition in GloServ, and we discuss the design and evaluation of this system. Keywords: service discovery, ontologies, OWL, CAN, peer-to-peer, web service composition | (pdf) |
Using boosting for automated planning and trading systems | German Creamer | 2007-09-15 | The problem: Much of finance theory is based on the efficient market hypothesis. According to this hypothesis, the prices of financial assets, such as stocks, incorporate all information that may affect their future performance. However, the translation of publicly available information into predictions of future performance is far from trivial. Making such predictions is the livelihood of stock traders, market analysts, and the like. Clearly, the efficient market hypothesis is only an approximation which ignores the cost of producing accurate predictions. Markets are becoming more efficient and more accessible because of the use of ever faster methods for communicating and analyzing financial data. Algorithms developed in machine learning can be used to automate parts of this translation process. In other words, we can now use machine learning algorithms to analyze vast amounts of information and compile them to predict the performance of companies, stocks, or even market analysts. In financial terms, we would say that such algorithms discover inefficiencies in the current market. These discoveries can be used to make a profit and, in turn, reduce the market inefficiencies or support strategic planning processes. Relevance: Currently, the major stock exchanges such as NYSE and NASDAQ are transforming their markets into electronic financial markets. Players in these markets must process large amounts of information and make instantaneous investment decisions. Machine learning techniques help investors and corporations recognize new business opportunities or potential corporate problems in these markets. With time, these techniques help the financial market become better regulated and more stable. Also, corporations could save a significant amount of resources if they can automate certain corporate finance functions such as planning and trading. Results: This dissertation offers a novel approach to using boosting as a predictive and interpretative tool for problems in finance. Moreover, we demonstrate how boosting can support the automation of strategic planning and trading functions. Many of the recent bankruptcy scandals in publicly held US companies such as Enron and WorldCom are inextricably linked to the conflict of interest between shareholders (principals) and managers (agents). We evaluate this conflict in the case of Latin American and US companies. In the first part of this dissertation, we use Adaboost to analyze the impact of corporate governance variables on performance. In this respect, we present an algorithm that calculates alternating decision trees (ADTs), ranks variables according to their level of importance, and generates representative ADTs. We develop a board Balanced Scorecard (BSC) based on these representative ADTs which is part of the process to automate the planning functions. In the second part of this dissertation we present three main algorithms to improve forecasting and automated trading. First, we introduce a link mining algorithm using a mixture of economic and social network indicators to forecast earnings surprises and cumulative abnormal return. Second, we propose a trading algorithm for short-term technical trading. The algorithm was tested in the context of the Penn-Lehman Automated Trading Project (PLAT) competition using the Microsoft stock. The algorithm was profitable during the competition. 
Third, we present a multi-stock automated trading system that includes a machine learning algorithm that makes the prediction, a weighting algorithm that combines the experts, and a risk management layer that selects only the strongest prediction and avoids trading when there is a history of negative performance. This algorithm was tested with 100 randomly selected S&P 500 stocks. We find that even an efficient learning algorithm, such as boosting, still requires powerful control mechanisms in order to reduce unnecessary and unprofitable trades that increase transaction costs. | (pdf) |
Oblivious Image Matching | Shai Avidan, Ariel Elbaz, Tal Malkin, Ryan Moriarty | 2007-09-13 | We present the problem of Oblivious Image Matching, where two parties want to determine whether they have images of the same object or scene, without revealing any additional information. While image matching has attracted a great deal of attention in the computer vision community, it was never treated in a cryptographic sense. In this paper we study the private version of the problem, oblivious image matching, and provide an efficient protocol for it. In doing so, we design a novel image matching algorithm, and a few private protocols that may be of independent interest. Specifically, we first show how to reduce the image matching problem to a two-level version of the fuzzy set matching problem, and then present a novel protocol to privately compute this (and several other) matching problems. | (pdf) |
OpenTor: Anonymity as a Commodity Service | Elli Androulaki, Mariana Raykova, Angelos Stavrou, Steven Bellovin | 2007-09-13 | Despite the growth of the Internet and the increasing concern for privacy of online communications, current deployments of anonymization networks depend on a very small set of nodes that volunteer their bandwidth. We believe that the main reason is not disbelief in their ability to protect anonymity, but rather the practical limitations in bandwidth and latency that stem from limited participation. This limited participation, in turn, is due to a lack of incentives. We propose providing economic incentives, which historically have worked very well. In this technical report, we demonstrate a payment scheme that can be used to compensate nodes which provide anonymity in Tor, an existing onion-routing anonymizing network. We show that current anonymous payment schemes are not suitable and introduce a hybrid payment system based on a combination of the Peppercoin Micropayment system and a new type of ``one use'' electronic cash. Our system claims to maintain users' anonymity, although payment techniques mentioned previously --- when adopted individually --- provably fail. | (pdf) |
Reputation Systems for Anonymous Networks | Elli Androulaki, Seung Geol Choi, Steven M. Bellovin, Tal G. Malkin | 2007-09-12 | We present a reputation scheme for a pseudonymous peer-to-peer (P2P) system in an anonymous network. Misbehavior is one of the biggest problems in pseudonymous P2P systems, where there is little incentive for proper behavior. In our scheme, using ecash for reputation points, the reputation of each user is closely related to his real identity rather than to his current pseudonym. Thus, our scheme allows an honest user to switch to a new pseudonym keeping his good reputation, while hindering a malicious user from erasing his trail of evil deeds with a new pseudonym. | (pdf) |
A Study of Malcode-Bearing Documents | Wei-Jen Li, Salvatore Stolfo, Angelos Stavrou, Elli Androulaki, Angelos D. Keromytis | 2007-09-07 | By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party applications that may harbor exploitable vulnerabilities otherwise unreachable by network-level service attacks. Such attacks can be very selective and difficult to detect compared to the typical network worm threat, owing to the complexity of these applications and data formats, as well as the multitude of document-exchange vectors. As a case study, this paper focuses on Microsoft Word documents as malcode carriers. We investigate the possibility of detecting embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical document content, and run-time dynamic tests on diverse platforms. The experiments demonstrate these approaches can not only detect known malware, but also most zero-day attacks. We identify several problems with both approaches, representing both challenges in addressing the problem and opportunities for future research. | (pdf) |
Backstop: A Tool for Debugging Runtime Errors | Christian Murphy, Eunhee Kim, Gail Kaiser, Adam Cannon | 2007-09-06 | The errors that Java programmers are likely to encounter can roughly be categorized into three groups: compile-time (semantic and syntactic), logical, and runtime (exceptions). While much work has focused on the first two, there are very few tools that exist for interpreting the sometimes cryptic messages that result from runtime errors. Novice programmers in particular have difficulty dealing with uncaught exceptions in their code and the resulting stack traces, which are by no means easy to understand. We present Backstop, a tool for debugging runtime errors in Java applications. This tool provides more user-friendly error messages when an uncaught exception occurs, but also provides debugging support by allowing users to watch the execution of the program and the changes to the values of variables. We also present the results of two studies conducted on introductory-level programmers using the two different features of the tool. | (pdf) |
RAS-Models: A Building Block for Self-Healing Benchmarks | Rean Griffith, Ritika Virmani, Gail Kaiser | 2007-09-01 | To evaluate the efficacy of self-healing systems, a rigorous, objective, quantitative benchmarking methodology is needed. However, developing such a benchmark is a non-trivial task given the many evaluation issues to be resolved, including but not limited to: quantifying the impacts of faults, analyzing various styles of healing (reactive, preventative, proactive), accounting for partially automated healing and accounting for incomplete/imperfect healing. We posit, however, that it is possible to realize a self-healing benchmark using a collection of analytical techniques and practical tools as building blocks. This paper highlights the flexibility of one analytical tool, the Reliability, Availability and Serviceability (RAS) model, and illustrates its power and relevance to the problem of evaluating self-healing mechanisms/systems, when combined with practical tools for fault-injection. | (pdf) |
A Precomputed Polynomial Representation for Interactive BRDF Editing with Global Illumination | Aner Ben-Artzi, Kevin Egan, Fredo Durand, Ravi Ramamoorthi | 2007-07-31 | The ability to interactively edit BRDFs in their final placement within a computer graphics scene is vital to making informed choices for material properties. We significantly extend previous work on BRDF editing for static scenes (with fixed lighting and view) by developing a precomputed polynomial representation that enables interactive BRDF editing with global illumination. Unlike previous precomputation-based rendering techniques, the image is not linear in the BRDF when considering interreflections. We introduce a framework for precomputing a multi-bounce tensor of polynomial coefficients that encapsulates the nonlinear nature of the task. Significant reductions in complexity are achieved by leveraging the low-frequency nature of indirect light. We use a high-quality representation for the BRDFs at the first bounce from the eye, and lower-frequency (often diffuse) versions for further bounces. This approximation correctly captures the general global illumination in a scene, including color-bleeding, near-field object reflections, and even caustics. We adapt Monte Carlo path tracing for precomputing the tensor of coefficients for BRDF basis functions. At runtime, the high-dimensional tensors can be reduced to a simple dot product at each pixel for rendering. We present a number of examples of editing BRDFs in complex scenes, with interactive feedback rendered with global illumination. | (pdf) |
Parameterizing Random Test Data According to Equivalence Classes | Christian Murphy, Gail Kaiser, Marta Arias | 2007-07-12 | We are concerned with the problem of detecting bugs in machine learning applications. In the absence of sufficient real-world data, creating suitably large data sets for testing can be a difficult task. Random testing is one solution, but may have limited effectiveness in cases in which a reliable test oracle does not exist, as is the case of the machine learning applications of interest. To address this problem, we have developed an approach to creating data sets called “parameterized random data generation”. Our data generation framework allows us to isolate or combine different equivalence classes as desired, and then randomly generate large data sets using the properties of those equivalence classes as parameters. This allows us to take advantage of randomness but still have control over test case selection at the system testing level. We present our findings from using the approach to test two different machine learning ranking applications. | (pdf) |
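A minimal sketch of the parameterized-random-generation idea follows: each equivalence class is a small value generator, and a data set is drawn by isolating or combining classes as parameters. The class names and attribute shapes are hypothetical illustrations, not the authors' framework.

```python
import random

# Hypothetical equivalence classes for a numeric data set: each class is a
# generator for one cell value; combining classes yields a parameterized
# random data set (a sketch of the idea, not the paper's implementation).
EQUIVALENCE_CLASSES = {
    "negative": lambda rng: rng.uniform(-1000, -1),
    "zero":     lambda rng: 0.0,
    "positive": lambda rng: rng.uniform(1, 1000),
    "repeated": lambda rng: rng.choice([1.0, 2.0, 3.0]),
}

def generate_dataset(classes, n_rows, n_cols, seed=0):
    """Draw every cell from the chosen equivalence classes, picked at random
    per cell, so a test run can isolate or combine classes as parameters."""
    rng = random.Random(seed)
    gens = [EQUIVALENCE_CLASSES[c] for c in classes]
    return [[rng.choice(gens)(rng) for _ in range(n_cols)] for _ in range(n_rows)]

# Isolate a single class ...
only_repeats = generate_dataset(["repeated"], n_rows=5, n_cols=3)
# ... or combine several, keeping generation reproducible via the seed.
mixed = generate_dataset(["negative", "zero", "positive"], n_rows=5, n_cols=3)
```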
The Delay-Friendliness of TCP | Salman Abdul Baset, Eli Brosh, Vishal Misra, Dan Rubenstein, Henning Schulzrinne | 2007-06-30 | Traditionally, TCP has been considered unfriendly for real-time applications. Nonetheless, popular applications such as Skype use TCP due to the deployment of NATs and firewalls that prevent UDP traffic. This observation motivated us to study the delay performance of TCP for real-time media flows using an analytical model and experiments. The results obtained yield the working region for VoIP and live video streaming applications and guidelines for delay-friendly TCP settings. Further, our research indicates that simple application-level schemes, such as packet splitting and parallel connections, can significantly improve the delay performance of real-time TCP flows. | (pdf) |
STAND: Sanitization Tool for ANomaly Detection | Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis | 2007-05-30 | The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these high-quality training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these “micro-models” to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as “attack-free” and “regular” as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site. | (pdf) |
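The sketch below shows the micro-model-and-voting shape of the sanitization phase in miniature: slice the training data, build one model per slice, and drop any input that a majority of micro-models consider novel. The n-gram "model" and all thresholds are placeholder assumptions, not the content-based sensors or parameters used in the paper.

```python
from collections import Counter  # not strictly needed; kept for extensions

def ngrams(payload: bytes, n: int = 3):
    """All byte n-grams of a payload (a stand-in for the real AD model)."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}

def train_micro_models(training, slice_size):
    """One 'micro-model' per slice of the training data; here a model is just
    the set of n-grams observed in that slice."""
    return [set().union(*(ngrams(p) for p in training[i:i + slice_size]))
            for i in range(0, len(training), slice_size)]

def sanitize(training, slice_size=100, vote_threshold=0.5, novelty=0.2):
    """Keep only inputs that at most `vote_threshold` of the micro-models
    flag as abnormal (placeholder thresholds, chosen for illustration)."""
    models = train_micro_models(training, slice_size)
    clean = []
    for payload in training:
        grams = ngrams(payload)
        # Each micro-model votes "abnormal" if too many n-grams are unseen.
        votes = sum(1 for m in models
                    if grams and len(grams - m) / len(grams) > novelty)
        if votes / len(models) <= vote_threshold:
            clean.append(payload)        # likely attack-free; keep for training
    return clean
```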
The Role of Reliability, Availability and Serviceability (RAS) Models in the Design and Evaluation of Self-Healing Systems | Rean Griffith, Ritika Virmani, Gail Kaiser | 2007-04-10 | In an idealized scenario, self-healing systems predict, prevent or diagnose problems and take the appropriate actions to mitigate their impact with minimal human intervention. To determine how close we are to reaching this goal, we require analytical techniques and practical approaches that allow us to quantify the effectiveness of a system’s remediation mechanisms. In this paper we apply analytical techniques based on Reliability, Availability and Serviceability (RAS) models to evaluate individual remediation mechanisms of select system components and their combined effects on the system. We demonstrate the applicability of RAS-models to the evaluation of self-healing systems by using them to analyze various styles of remediation (reactive, preventative, etc.), quantify the impact of imperfect remediations, identify sub-optimal (less effective) remediations and quantify the combined effects of all the activated remediations on the system as a whole. | (pdf) |
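For readers unfamiliar with RAS models, the sketch below solves a three-state continuous-time Markov chain (up, degraded after an imperfect remediation, down) for its steady state and weighs it with a simple reward vector; all rates, the success probability, and the rewards are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Minimal RAS-style Markov reward model (illustrative parameters only): a
# service is UP, DEGRADED after a partially successful repair, or DOWN.
lam   = 1 / 500.0   # failure rate (per hour)        UP       -> DOWN
mu    = 1 / 2.0     # remediation rate               DOWN     -> UP or DEGRADED
delta = 1 / 8.0     # manual clean-up rate           DEGRADED -> UP
c     = 0.9         # probability a remediation is fully successful

# Generator matrix Q over states [UP, DEGRADED, DOWN]; each row sums to zero.
Q = np.array([
    [-lam,         0.0,           lam],
    [delta,       -delta,         0.0],
    [mu * c,       mu * (1 - c), -mu],
])

# Steady-state distribution pi solves pi Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

reward = np.array([1.0, 0.5, 0.0])          # service level earned in each state
print("steady-state distribution:", pi.round(6))
print("expected service level   :", float(pi @ reward))
```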
Aequitas: A Trusted P2P System for Paid Content Delivery | Alex Sherman, Japinder Chawla, Jason Nieh, Cliff Stein, Justin Sarma | 2007-03-30 | P2P file-sharing has been recognized as a powerful and efficient distribution model due to its ability to leverage users' upload bandwidth. However, companies that sell digital content on-line are hesitant to rely on P2P models for paid content distribution due to the free file-sharing inherent in P2P models. In this paper we present Aequitas, a P2P system in which users share paid content anonymously via a layer of intermediate nodes. We argue that with the extra anonymity in Aequitas, vendors could leverage P2P bandwidth while effectively maintaining the same level of trust towards their customers as in traditional models of paid content distribution. As a result, a content provider could reduce its infrastructure costs and subsequently lower the costs for the end-users. The intermediate nodes are incentivized to contribute their bandwidth via electronic micropayments. We also introduce techniques that prevent the intermediate nodes from learning the content of the files they help transmit. In this paper we present the design of our system, an analysis of its properties and an implementation and experimental evaluation. We quantify the value of the intermediate nodes, both in terms of efficiency and their effect on anonymity. We argue in support of the economic and technological merits of the system. | (pdf) |
Can P2P Replace Direct Download for Content Distribution? | Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein, Angelos Keromytis | 2007-03-30 | While peer-to-peer (P2P) file-sharing is a powerful and cost-effective content distribution model, most paid-for digital-content providers (CPs) rely on direct download to deliver their content. CPs such as Apple iTunes that command a large base of paying users are hesitant to use a P2P model that could easily degrade their user base into yet another free file-sharing community. We present TP2, a system that makes P2P file sharing a viable delivery mechanism for paid digital content by providing the same security properties as the currently used direct-download model. TP2 introduces the novel notion of trusted auditors (TAs) -- P2P peers that are controlled by the system operator. TAs monitor the behavior of other peers and help detect and prevent formation of illegal file-sharing clusters among the CP's user base. TAs both complement and exploit the strong authentication and authorization mechanisms that are used in TP2 to control access to content. It is important to note that TP2 does not attempt to solve the out-of-band file-sharing or DRM problems, which also exist in the direct-download systems currently in use. We analyze TP2 by modeling it as a novel game between misbehaving users who try to form unauthorized file-sharing clusters and TAs who curb the growth of such clusters. Our analysis shows that a small fraction of TAs is sufficient to protect the P2P system against unauthorized file sharing. In a system with as many as 60% of misbehaving users, even a small fraction of TAs can detect 99% of unauthorized cluster formation. We developed a simple economic model to show that even with such a large fraction of malicious nodes, TP2 can improve CP's profits (which could translate to user savings) by 62 to 122%, even while assuming conservative estimates of content and bandwidth costs. We implemented TP2 as a layer on top of BitTorrent and demonstrated experimentally using PlanetLab that our system provides trusted P2P file sharing with negligible performance overhead. | (pdf) |
Policy Algebras for Hybrid Firewalls | Hang Zhao, Steven M. Bellovin | 2007-03-21 | Firewalls are an effective means of protecting a local system or network of systems from network-based security threats. In this paper, we propose a policy algebra framework for security policy enforcement in hybrid firewalls, ones that exist both in the network and on end systems. To preserve the security semantics, the policy algebras provide a formalism to compute addition, conjunction, subtraction, and summation on rule sets; the framework also defines the cost and risk functions associated with policy enforcement. Policy outsourcing triggers global cost minimization. We show that our framework can easily be extended to support packet filter firewall policies. Finally, we discuss special challenges and requirements for applying the policy algebra framework to MANETs. | (pdf) |
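As a toy illustration of algebra-style composition of rule sets (not the formal policy algebra developed in the paper), the sketch below treats each rule as the set of (source, destination, port) tuples it permits over a tiny discrete domain, so that addition, conjunction, and subtraction become ordinary set operations; all hosts, ports, and rule contents are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Toy firewall rules over a tiny discrete domain, so a rule set can be treated
# as the set of (src, dst, port) tuples it permits.  Illustration only, not
# the paper's formal policy algebra or its cost/risk functions.

@dataclass(frozen=True)
class Rule:
    srcs: FrozenSet[str]
    dsts: FrozenSet[str]
    ports: FrozenSet[int]

    def permits(self) -> set:
        return {(s, d, p) for s in self.srcs for d in self.dsts for p in self.ports}

def permitted(rules) -> set:
    """All tuples permitted by any rule in the set."""
    out = set()
    for r in rules:
        out |= r.permits()
    return out

net_policy  = [Rule(frozenset({"10.0.0.1", "10.0.0.2"}), frozenset({"db"}), frozenset({5432}))]
host_policy = [Rule(frozenset({"10.0.0.2"}), frozenset({"db"}), frozenset({5432, 22}))]

addition    = permitted(net_policy) | permitted(host_policy)   # either policy allows
conjunction = permitted(net_policy) & permitted(host_policy)   # both must allow
subtraction = permitted(net_policy) - permitted(host_policy)   # allowed only by the network policy
```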
The PBS Policy: Some Properties and Their Proofs | Hanhua Feng, Vishal Misra, Dan Rubenstein | 2007-03-20 | In this report we analyze a configurable blind scheduler containing a continuous, tunable parameter. After the definition of this policy, we prove the property of no surprising interruption, the property of no permanent starvation, and two theorems about monotonicity of this policy. This technical report contains supplemental materials for the following publication: Hanhua Feng, Vishal Misra, and Dan Rubenstein, "PBS: A unified priority-based scheduler", Proceedings of ACM SIGMETRICS '07, 2007. | (pdf) (ps) |
An Approach to Software Testing of Machine Learning Applications | Christian Murphy, Gail Kaiser, Marta Arias | 2007-03-19 | Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test such ML software, because there is no reliable test oracle. We describe a software testing approach aimed at addressing this problem. We present our findings from testing implementations of two different ML ranking algorithms: Support Vector Machines and MartiRank. | (pdf) |
Design, Implementation, and Validation of a New Class of Interface Circuits for Latency-Insensitive Design | Cheng-Hong Li, Rebecca Collins, Sampada Sonalkar, Luca P. Carloni | 2007-03-05 | With the arrival of nanometer technologies wire delays are no longer negligible with respect to gate delays, and timing-closure becomes a major challenge to System-on-Chip designers. Latency-insensitive design (LID) has been proposed as a "correct-by-construction" design methodology to cope with this problem. In this paper we present the design and implementation of a new and more efficient class of interface circuits to support LID. Our design offers substantial improvements in terms of logic delay over the design originally proposed by Carloni et al. [1] as well as in terms of both logic delay and processing throughput over the synchronous elastic architecture (SELF) recently proposed by Cortadella et al. [2]. These claims are supported by the experimental results that we obtained completing semi-custom implementations of the three designs with a 90nm industrial standard-cell library. We also report on the formal verification of our design: using the NuSMV model checker we verified that the RTL synthesizable implementations of our LID interface circuits (relay stations and shells) are correct refinements of the corresponding abstract specifications according to the theory of LID [3]. | (pdf) |
Evaluating Software Systems via Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models | Rean Griffith | 2007-02-28 | The most common and well-understood way to evaluate and compare computing systems is via performance-oriented benchmarks. However, numerous other demands are placed on computing systems besides speed. Current generation and next generation computing systems are expected to be reliable, highly available, easy to manage and able to repair faults and recover from failures with minimal human intervention. The extra-functional requirements concerned with reliability, high availability, and serviceability (manageability, repair and recovery) represent an additional set of high-level goals the system is expected to meet or exceed. These goals govern the system’s operation and are codified using policies and service level agreements (SLAs). To satisfy these extra-functional requirements, system-designers explore or employ a number of mechanisms geared towards improving the system’s reliability, availability and serviceability (RAS) characteristics. However, to evaluate these mechanisms and their impact, we need something more than performance metrics. Performance-measures are suitable for studying the feasibility of the mechanisms i.e. they can be used to conclude that the level of performance delivered by the system with these mechanisms active does not preclude its usage. However, performance numbers convey little about the efficacy of the systems RAS-enhancing mechanisms. Further, they do not allow us to analyze the (expected or actual) impact of individual mechanisms or make comparisons/discuss tradeoffs between mechanisms. What is needed is an evaluation methodology that is able to analyze the details of the RAS-enhancing mechanisms – the micro-view as well as the high-level goals, expressed as policies, SLAs etc., governing the system’s operation – the macro-view. Further, we must establish a link between the details of the mechanisms and their impact on the high-level goals. This thesis is concerned with developing the tools and applying analytical techniques to enable this kind of evaluation. We make three contributions. First, we contribute to a suite of runtime fault-injection tools with Kheiron. Kheiron demonstrates a feasible, low-overhead, transparent approach to performing system-adaptations in a variety of execution environments at runtime. We use Kheiron’s runtime-adaptation capability to inject faults into running programs. We present three implementations of Kheiron, each targeting a different execution environment. Kheiron/C manipulates compiled C-programs running in an unmanaged execution environment – comprised of the operating system and the underlying processor. Kheiron/CLR manipulates programs running in Microsoft’s Common Language Runtime (CLR) and Kheiron/JVM manipulates programs running in Sun Microsystems’ Java Virtual Machine (JVM). Kheiron’s operation is transparent to both the application and the execution environment. Further, the overheads imposed by Kheiron on the application and the execution environment are negligible, <5%, when no faults are being injected. Second, we describe analytical techniques based on RAS-models, represented as Markov chains and Markov reward models, to demonstrate their power in evaluating RAS-mechanisms and their impact on the high-level goals governing system-operation. 
We demonstrate the flexibility of these models in evaluating reactive, proactive and preventative mechanisms as well as their ability to explore the feasibility of yet-to-be-implemented mechanisms. Our analytical techniques focus on remediations rather than observed mean time to failures (MTTF). Unlike hardware, where the laws of physics govern the failure rates of mechanical and electrical parts, there are no such guarantees for software failure rates. Software failure-rates can however be influenced using fault-injection, which we employ in our experiments. In our analysis we consider a number of facets of remediations, which include, but go beyond mean time to recovery (MTTR). For example we consider remediation success rates, the (expected) impact of preventative-maintenance and the degradation-impact of remediations in our efforts to establish a framework for reasoning about the tradeoffs (the costs versus the benefits) of various remediation mechanisms. Finally, we distill our experiences developing runtime fault-injection tools, performing fault-injection experiments and constructing and analyzing RAS-models into a 7-step process for evaluating computing systems – the 7U-evaluation methodology. Our evaluation method succeeds in establishing the link between the details of the low-level mechanisms and the high-level goals governing the system’s operation. It also highlights the role of environmental constraints and policies in establishing meaningful criteria for scoring and comparing these systems and their RAS-enhancing mechanisms. | (pdf) |
Privacy-Preserving Distributed Event Corroboration | Janak J. Parekh | 2007-02-26 | Event correlation is a widely-used data processing methodology for a broad variety of applications, and is especially useful in the context of distributed monitoring for software faults and vulnerabilities. However, most existing solutions have typically been focused on "intra-organizational" correlation; organizations typically employ privacy policies that prohibit the exchange of information outside of the organization. At the same time, the promise of "inter-organizational" correlation is significant given the broad availability of Internet-scale communications, and its potential role in both software fault maintenance and software vulnerability detection. In this thesis, I present a framework for reconciling these opposing forces via the use of privacy preservation integrated into the event processing framework. I introduce the notion of event corroboration, a reduced yet flexible form of correlation that enables collaborative verification, without revealing sensitive information. By accommodating privacy policies, we enable the corroboration of data across different organizations without actually releasing sensitive information. The framework supports both source anonymity and data privacy, yet allows for temporal corroboration of a broad variety of data. The framework is designed as a lightweight collection of components to enable integration with existing COTS platforms and distributed systems. I also present an implementation of this framework: Worminator, a collaborative Intrusion Detection System, based on an earlier platform, XUES (XML Universal Event Service), an event processor used as part of a software monitoring platform called KX (Kinesthetics eXtreme). KX comprised a series of components, connected together with a publish-subscribe content-based routing event subsystem, for the autonomic software monitoring, reconfiguration, and repair of complex distributed systems. Sensors were installed in legacy systems; XUES' two modules then performed event processing on sensor data: information was collected and processed by the Event Packager, and correlated using the Event Distiller. While XUES itself was not privacy-preserving, it laid the groundwork for this thesis by supporting event typing, the use of publish-subscribe and extensibility support via pluggable event transformation modules. I also describe techniques by which corroboration and privacy preservation could optionally be "retrofitted" onto XUES without breaking the correlation applications and scenarios described. Worminator is a ground-up rewrite of the XUES platform to fully support privacy-preserving event types and algorithms in the context of a Collaborative Intrusion Detection System (CIDS), whereby sensor alerts can be exchanged and corroborated without revealing sensitive information about a contributor's network, services, or even external sources, as required by privacy policies. Worminator also fully anonymizes source information, allowing contributors to decide their preferred level of information disclosure. Worminator is implemented as a monitoring framework on top of a collection of non-collaborative COTS and in-house IDS sensors, and demonstrably enables the detection of not only worms but also "broad and stealthy" scans; traditional single-network sensors either bury such scans in large volumes or miss them entirely. 
Worminator supports corroboration for packet and flow headers (metadata), packet content, and even aggregate models of network traffic using a variety of techniques. The contributions of this thesis include the development of a cross-application-domain event processing framework with native privacy-preserving types, the use and validation of privacy-preserving corroboration, and the establishment of a practical deployed collaborative security system. The thesis also quantifies Worminator's effectiveness at attack detection, the overhead of privacy preservation and the effectiveness of our approach against adversaries, be they "honest-but-curious" or actively malicious. | (pdf) |
Distributed Algorithms for Secure Multipath Routing in Attack-Resistant Networks | Patrick Pak-Ching Lee, Vishal Misra, Dan Rubenstein | 2007-02-16 | To proactively defend against intruders from readily jeopardizing single-path data sessions, we propose a {\em distributed secure multipath solution} to route data across multiple paths so that intruders require much more resources to mount successful attacks. Our work exhibits several important properties that include: (1) routing decisions are made locally by network nodes without the centralized information of the entire network topology, (2) routing decisions minimize throughput loss under a single-link attack with respect to different session models, and (3) routing decisions address multiple link attacks via lexicographic optimization. We devise two algorithms termed the {\em Bound-Control algorithm} and the {\em Lex-Control algorithm}, both of which provide provably optimal solutions. Experiments show that the Bound-Control algorithm is more effective to prevent the worst-case single-link attack when compared to the single-path approach, and that the Lex-Control algorithm further enhances the Bound-Control algorithm by countering severe single-link attacks and various types of multi-link attacks. Moreover, the Lex-Control algorithm offers prominent protection after only a few execution rounds, implying that we can sacrifice minimal routing protection for significantly improved algorithm performance. Finally, we examine the applicability of our proposed algorithms in a specialized defensive network architecture called the attack-resistant network and analyze how the algorithms address resiliency and security in different network settings. | (pdf) |
MutaGeneSys: Making Diagnostic Predictions Based on Genome-Wide Genotype Data in Association Studies | Julia Stoyanovich, Itsik Pe'er | 2007-02-16 | Summary: We present MutaGeneSys: a system that uses genomewide genotype data for disease prediction. Our system integrates three data sources: the International HapMap project, whole-genome marker correlation data and the Online Mendelian Inheritance in Man (OMIM) database. It accepts SNP data of individuals as query input and delivers disease susceptibility hypotheses even if the original set of typed SNPs is incomplete. Our system is scalable and flexible: it operates in real time and can be configured on the fly to produce population, technology, and confidence-specific predictions. Availability: Efforts are underway to deploy our system as part of the NCBI Reference Assembly. Meanwhile, the system may be obtained from the authors. Contact: jds1@cs.columbia.edu | (pdf) |
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems | Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis | 2007-02-15 | Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection performance of all learning-based ADs depends heavily on the quality of the training data. In this paper, we extend the training phase of an AD to include a sanitization phase. This phase significantly improves the quality of unlabeled training data by making them as “attack-free” as possible in the absence of absolute ground truth. Our approach is agnostic to the underlying AD, boosting its performance based solely on training-data sanitization. Our approach is to generate multiple AD models for content-based AD sensors trained on small slices of the training data. These AD “micro-models” are used to test the training data, producing alerts for each training input. We employ voting techniques to determine which of these training items are likely attacks. Our preliminary results show that sanitization increases 0-day attack detection while in most cases reducing the false positive rate. We analyze the performance gains when we deploy sanitized versus unsanitized AD systems in combination with expensive host-based attack-detection systems. Finally, we show that our system incurs only an initial modest cost, which can be amortized over time during online operation. | (pdf) |
Accelerating Service Discovery in Ad-Hoc Zero Configuration Networking | Se Gi Hong, Suman Srinivasan, Henning Schulzrinne | 2007-02-12 | Zero Configuration Networking (Zeroconf) assigns IP addresses and host names, and discovers services without a central server. Zeroconf can be used in wireless mobile ad-hoc networks which are based on IEEE 802.11 and IP. However, Zeroconf has problems in mobile ad-hoc networks as it cannot detect changes in the network topology. In highly mobile networks, Zeroconf causes network overhead while discovering new services. In this paper, we propose an algorithm to accelerate service discovery for mobile ad-hoc networks. Our algorithm involves the monitoring of network interface changes that occur when a device with IEEE 802.11 enabled joins a new network area. This algorithm allows users to discover network topology changes and new services in real-time while minimizing network overhead. | (pdf) |
From STEM to SEAD: Speculative Execution for Automated Defense | Michael Locasto, Angelos Stavrou, Gabriela F. Cretu, Angelos D. Keromytis | 2007-02-10 | Most computer defense systems crash the process that they protect as part of their response to an attack. In contrast, self-healing software recovers from an attack by automatically repairing the underlying vulnerability. Although recent research explores the feasibility of the basic concept, self-healing faces four major obstacles before it can protect legacy applications and COTS software. Besides the practical issues involved in applying the system to such software ({\it e.g.}, not modifying source code), self-healing has encountered a number of problems: knowing when to engage, knowing how to repair, and handling communication with external entities. Our previous work on a self-healing system, STEM, left these challenges as future work. STEM provides self-healing by speculatively executing ``slices'' of a process. This paper improves STEM's capabilities along three lines: (1) applicability of the system to COTS software (STEM does not require source code, and it imposes a roughly 73% performance penalty on Apache's normal operation), (2) semantic correctness of the repair (we introduce \emph{virtual proxies} and \emph{repair policy} to assist the healing process), and (3) creating a behavior profile based on aspects of data and control flow. | (pdf) |
Topology-Based Optimization of Maximal Sustainable Throughput in a Latency-Insensitive System | Rebecca Collins, Luca Carloni | 2007-02-06 | We consider the problem of optimizing the performance of a latency-insensitive system (LIS) where the addition of backpressure has caused throughput degradation. Previous works have addressed the problem of LIS performance in different ways. In particular, the insertion of relay stations and the sizing of the input queues in the shells are the two main optimization techniques that have been proposed. We provide a unifying framework for this problem by outlining which approaches work for different system topologies, and highlighting counterexamples where some solutions do not work. We also observe that in the most difficult class of topologies, instances with the greatest throughput degradation are typically very amenable to simplifications. The contributions of this paper include a characterization of topologies that maintain optimal throughput with fixed-size queues and a heuristic for sizing queues that produces solutions close to optimal in a fraction of the time. | (pdf) |
On the Infeasibility of Modeling Polymorphic Shellcode for Signature Detection | Yingbo Song, Michael E. Locasto, Angelos Stavrou, Angelos D. Keromytis, Salvatore J. Stolfo | 2007-02-04 | Polymorphic malcode remains one of the most troubling threats for information security and intrusion defense systems. The ability for malcode to be automatically transformed into a semantically equivalent variant frustrates attempts to construct a single, simple, easily verifiable representation. We present a quantitative analysis of the strengths and limitations of shellcode polymorphism and consider the impact of this analysis on the current practices in intrusion detection. Our examination focuses on the nature of shellcode "decoding routines", and the empirical evidence we gather illustrates our main result: that the challenge of modeling the class of self-modifying code is likely intractable - even when the size of the instruction sequence (i.e., the decoder) is relatively small. We develop metrics to gauge the power of polymorphic engines and use them to provide insight into the strengths and weaknesses of some popular engines. We believe this analysis supplies a novel and useful way to understand the limitations of the current generation of signature-based techniques. We analyze some contemporary polymorphic techniques, explore ways to improve them in order to forecast the nature of future threats, and present our suggestions for countermeasures. Our results indicate that the class of polymorphic behavior is too greatly spread and varied to model effectively. We conclude that modeling normal content is ultimately a more promising defense mechanism than modeling malicious or abnormal content. | (pdf) |
Combining Ontology Queries with Text Search in Service Discovery | Knarig Arabshian, Henning Schulzrinne | 2007-01-21 | We present a querying mechanism for service discovery which combines ontology queries with text search. The underlying service discovery architecture used is GloServ. GloServ uses the Web Ontology Language (OWL) to classify services in an ontology and map knowledge obtained by the ontology onto a hierarchical peer-to-peer network. Initially, an ontology-based first order predicate logic query is issued in order to route the query to the appropriate server and to obtain exact and related service data. Text search further enhances querying by allowing services to be described not only with ontology attributes, but with plain text so that users can query for them using key words. Currently, querying is limited to either simple attribute-value pair searches, ontology queries or text search. Combining ontology queries with text search enhances current service discovery mechanisms. | (pdf) |
A Model for Automatically Repairing Execution Integrity | Michael Locasto, Gabriela F. Cretu, Angelos Stavrou, Angelos D. Keromytis | 2007-01-20 | Many users value applications that continue execution in the face of attacks. Current software protection techniques typically abort a process after an intrusion attempt (e.g., a code injection attack). We explore ways in which the security property of integrity can support availability. We extend the Clark-Wilson Integrity Model to provide primitives and rules for specifying and enforcing repair mechanisms and validation of those repairs. Users or administrators can use this model to write or automatically synthesize repair policy. The policy can help customize an application's response to attack. We describe two prototype implementations for transparently applying these policies without modifying source code. | (pdf) |
Using Functional Independence Conditions to Optimize the Performance of Latency-Insensitive Systems | Cheng-Hong Li, Luca Carloni | 2007-01-11 | In latency-insensitive design, shell modules are used to encapsulate system components (pearls) in order to interface them with the given latency-insensitive protocol and dynamically control their operations. In particular, a shell stalls a pearl whenever new valid data are not available on its input channels. We study how functional independence conditions (FIC) can be applied to the performance optimization of a latency-insensitive system by avoiding unnecessary stalling of its pearls. We present a novel circuit design of a generic shell template that can exploit FICs. We also provide an automatic procedure for the logic synthesis of a shell instance that is based only on the particular local characteristics of its corresponding pearl and does not require any input from the designers. We conclude by reporting on a set of experimental results that illustrate the benefits and overhead of the proposed technique. | (pdf) |
Whitepaper: The Value of Improving the Separation of Concerns | Marc Eaddy, Alan Cyment, Pierre van de Laar, Fabian Schmied, Wolfgang Schult | 2007-01-09 | Microsoft's enterprise customers are demanding better ways to modularize their software systems. They look to the Java community, where these needs are being met with language enhancements, improved developer tools and middleware, and better runtime support. We present a business case for why Microsoft should give priority to supporting better modularization techniques, also known as advanced separation of concerns (ASOC), for the .NET platform, and we provide a roadmap for how to do so. | (pdf) |
An Implementation of a Renesas H8/300 Microprocessor with a Cycle-Level Timing Extension | Chen-Chun Huang, Javier Coca, Yashket Gupta, Stephen A. Edwards | 2006-12-30 | We describe an implementation of the Renesas H8/300 16-bit processor in VHDL suitable for synthesis on an FPGA. We extended the ISA slightly to accommodate cycle-accurate timers accessible from the instruction set, designed to provide more precise real-time control. We describe the architecture of our implementation in detail, describe our testing strategy, and finally show how to build a cross-compilation toolchain under Linux. | (pdf) |
Embedded uClinux, the Altera DE2, and the SHIM Compiler | Wei-Chung Hsu, David Lariviere, Stephen A. Edwards | 2006-12-28 | SHIM is a concurrent deterministic language focused on embedded systems. Although SHIM has undergone substantial evolution, it currently does not have a code generator for a true embedded environment. In this project, we built an embedded environment that we intend to use as a target for the SHIM compiler. We add the uClinux operating system between hardware devices and software programs. Our long-term goal is to have the SHIM compiler generate both user-space and kernel/module programs for this environment. This project is a first step: we manually explored what sort of code we ultimately want the SHIM compiler to produce. In this report, we provide instructions on how to build and install uClinux on an Altera DE2 board, along with example programs, including a user-space program, a kernel module, and a simple device driver for the buttons on the DE2 board that includes an interrupt handler. | (pdf) (ps) |
A JPEG Decoder in SHIM | Nalini Vasudevan, Stephen A. Edwards | 2006-12-25 | Image compression plays an important role in multimedia systems, digital systems, handheld systems and various other devices. Efficient image processing techniques are needed to make images suitable for use in embedded systems. This paper describes an implementation of a JPEG decoder in the SHIM programming language. SHIM is a software/hardware integration language whose aim is to provide communication between hardware and software while providing deterministic concurrency. The paper shows that a JPEG decoder is a good application and reasonable test case for the SHIM language and illustrates the ease with which conventional sequential decoders can be modified to achieve concurrency. | (pdf) (ps) |
Arrays in SHIM: A Proposal | Smridh Thapar, Olivier Tardieu, Stephen A. Edwards | 2006-12-23 | The use of multiprocessor configurations over uniprocessors is rapidly increasing to exploit parallelism instead of frequency scaling for better compute capacity. The multiprocessor architectures being developed will have a major impact on existing software. Current languages provide facilities for concurrent and distributed programming, but are prone to races and non-determinism. SHIM, a deterministic concurrent language, guarantees that the behavior of its programs is independent of the scheduling of concurrent operations. The language currently supports atomic arrays only, i.e., parts of arrays cannot be sent to concurrent processes for evaluation (and editing). In this report, we propose a way to add non-atomic arrays to SHIM and describe the semantics that should be considered while allowing concurrent processes to edit parts of the same array. | (pdf) (ps) |
High Quality, Efficient Hierarchical Document Clustering using Closed Interesting Itemsets | Hassan Malik, John Kender | 2006-12-18 | High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of “closed interesting” itemsets (i.e. closed itemsets with high interestingness). We provide heuristics such as “super item” to efficiently mine these itemsets and show that they provide significant dimensionality reduction over closed frequent itemsets. Using “closed interesting” itemsets, we propose a new hierarchical document clustering method that outperforms state of the art agglomerative, partitioning and frequent-itemset based methods both in terms of FScore and Entropy, without requiring dataset-specific parameter tuning. We evaluate twenty interestingness measures on nine standard datasets and show that when used to generate “closed interesting” itemsets, and to select parent nodes, Mutual Information, Added Value, Yule’s Q and Chi-Square offer the best clustering performance, regardless of the characteristics of the underlying dataset. We also show that our method is more scalable, and results in better run-time performance as compared to leading approaches. On a dual processor machine, our method scaled sub-linearly and was able to cluster 200K documents in about 40 seconds. | (pdf) |
LinkWidth: A Method to measure Link Capacity and Available Bandwidth using Single-End Probes | Sambuddho Chakravarty, Angelos Stavrou, Angelos D. Keromytis | 2006-12-15 | We introduce LinkWidth, a method for estimating capacity and available bandwidth using single-end controlled TCP packet probes. To estimate capacity, we generate a train of TCP RST packets “sandwiched” between two TCP SYN packets. Capacity is obtained by end-to-end packet dispersion of the received TCP RST/ACK packets corresponding to the TCP SYN packets. Our technique is significantly different from the rest of the packet-pair-based measurement techniques, such as CapProbe, pathchar and pathrate, because the long packet trains minimize errors due to bursty cross-traffic. TCP RST packets do not generate additional ICMP replies preventing cross-traffic interference with our probes. In addition, we use TCP packets for all our probes to prevent some types of QoS-related traffic shaping from affecting our measurements. We extend the Train of Packet Pairs technique to approximate the available link capacity. We use pairs of TCP packets with variable intra-pair delays and sizes. This is the first attempt to implement this technique using single-end TCP probes, tested on a wide range of real networks with variable cross-traffic. We compare our prototype with pathchirp and pathload, which require control of both ends, and demonstrate that in most cases our method gives approximately the same results. | (pdf) |
Deriving Utility from a Self-Healing Benchmark Report | Ritika Virmani, Rean Griffith, Gail Kaiser | 2006-12-15 | Autonomic systems, specifically self-healing systems, currently lack an objective and relevant methodology for their evaluation. Due to their focus on problem detection, diagnosis and remediation, any evaluation methodology should facilitate an objective evaluation and/or comparison of these activities. Measures of “raw” performance are easily quantified and hence facilitate measurement and comparison on the basis of numbers. However, classifying a system as better at problem detection, diagnosis and remediation purely on the basis of performance measures is not useful. The proposed evaluation methodology will differ from traditional benchmarks, which are primarily concerned with measures of performance. In order to develop this methodology we rely on a set of experiments which will enable us to compare the self-healing capabilities of one system versus another. As we do not currently have “real” self-healing systems available, we will simulate the behavior of some target self-healing systems, system faults and the operational and repair activities of target systems. Further, we will use the results derived from the simulation experiments to answer questions relevant to the utility of a benchmark report. | (pdf) |
Measurements of DNS Stability | Omer Boyaci, Henning Schulzrinne | 2006-12-14 | In this project, we measured the stability of DNS servers based on the most popular 500 domains. In the first part of the project, DNS server replica counts and maximum DNS server separation are found for each domain. In the second part, these domains are queried for a one-month period in order to find their uptime percentages. | (pdf) |
Cooperation Between Stations in Wireless Networks | Andrea G. Forte, Henning Schulzrinne | 2006-12-07 | In a wireless network, mobile nodes (MNs) repeatedly perform tasks such as layer 2 (L2) handoff, layer 3 (L3) handoff and authentication. These tasks are critical, particularly for real-time applications such as VoIP. We propose a novel approach, namely Cooperative Roaming (CR), in which MNs can collaborate with each other and share useful information about the network in which they move. We show how we can achieve seamless L2 and L3 handoffs regardless of the authentication mechanism used and without any changes to either the infrastructure or the protocol. In particular, we provide a working implementation of CR and show how, with CR, MNs can achieve a total L2+L3 handoff time of less than 16 ms in an open network and of about 21 ms in an IEEE 802.11i network. We consider behaviors typical of IEEE 802.11 networks, although many of the concepts and problems addressed here apply to any kind of mobile network. | (pdf) |
Throughput and Fairness in CSMA/CA Wireless Networks | Hoon Chang, Vishal Misra, Dan Rubenstein | 2006-12-07 | While physical layer capture has been observed in real implementations of wireless devices accessing the channel like 802.11, log-utility fair allocation algorithms based on accurate channel models describing the phenomenon have not been developed. In this paper, using a general physical channel model, we develop an allocation algorithm for log-utility fairness. To maximize the aggregate utility, our algorithm determines channel access attempt probabilities of nodes using partial derivatives of the utility. Our algorithm is verified through extended simulations. The results indicate that our algorithm could quickly achieve allocations close to the optimum with 8.6% accuracy error on average. | (pdf) |
A Case for P2P Delivery of Paid Content | Alex Sherman, Angelos Stavrou, Jason Nieh, Cliff Stein, Angelos D. Keromytis | 2006-11-28 | P2P file sharing provides a powerful content distribution model by leveraging users' computing and bandwidth resources. However, companies have been reluctant to rely on P2P systems for paid content distribution due to their inability to limit the exploitation of these systems for free file sharing. We present a system that combines the more cost-effective and scalable distribution capabilities of P2P systems with a level of trust and control over content distribution similar to direct download content delivery networks. Our system uses two key mechanisms that can be layered on top of existing P2P systems. First, it provides strong authentication to prevent free file sharing in the system. Second, it introduces a new notion of trusted auditors to detect and limit malicious attempts to gain information about participants in the system to facilitate additional out-of-band free file sharing. We analyze the system by modeling it as a novel game between malicious users who try to form free file sharing clusters and trusted auditors who curb the growth of such clusters. Our analysis shows that a small fraction of trusted auditors is sufficient to protect the P2P system against unauthorized file sharing. Using a simple economic model, we further show that our system provides a more cost-effective content distribution solution, resulting in higher profits for a content provider even in the presence of a large percentage of malicious users. Finally, we implemented our system on top of BitTorrent and use PlanetLab to show that it can provide trusted P2P file distribution. | (pdf) |
Presence Traffic Optimization Techniques | Vishal Singh, Henning Schulzrinne, Markus Isomaki, Piotr Boni | 2006-11-02 | With the growth of presence-based services, it is important to provision the network to support high traffic and load generated by presence services. Presence event distribution systems amplify a single incoming PUBLISH message into possibly numerous outgoing NOTIFY messages from the server. This can increase the network load on inter-domain links and can potentially disrupt other QoS-sensitive applications. In this document, we present existing as well as new techniques that can be used to reduce presence traffic both in inter-domain and intra-domain scenarios. Specifically, we propose two new techniques: sending common NOTIFY for multiple watchers and batched notifications. We also propose some generic heuristics that can be used to reduce network traffic due to presence. | (pdf) |
A Common Protocol for Implementing Various DHT Algorithms | Salman Abdul Baset, Henning Schulzrinne, Eunsoo Shim | 2006-10-22 | This document defines DHT-independent and DHT-dependent features of DHT algorithms and presents a comparison of Chord, Pastry and Kademlia. It then describes key DHT operations and their information requirements. | (pdf) |
A survey on service creation by end users | Xiaotao Wu, Henning Schulzrinne | 2006-10-15 | We conducted a survey on end users’ willingness and capability to create their desired communication services. The survey is based on the graphical service creation tool we implemented for the Language for End System Services (LESS). We call the tool CUTE, which stands for Columbia University Telecommunication service Editor. This report presents our survey results and shows that relatively inexperienced users are willing and able to create their desired communication services, and that CUTE fits their needs. | (pdf) |
A VoIP Privacy Mechanism and its Application in VoIP Peering for Voice Service Provider Topology and Identity Hiding | Charles Shen, Henning Schulzrinne | 2006-10-03 | Voice Service Providers (VSPs) participating in VoIP peering frequently want to withhold their identity and related privacy-sensitive information from other parties during VoIP communication. A number of documents on VoIP privacy exist, but most of them focus on end user privacy. By summarizing and extending existing work, we present a unified privacy mechanism for both VoIP users and service providers. We also show a case study on how VSPs can use this mechanism for identity and topology hiding in VoIP peering. | (pdf) |
Evaluation and Comparison of BIND, PDNS and Navitas as ENUM Server | Charles Shen, Henning Schulzrinne | 2006-09-27 | ENUM is a protocol standard developed by the Internet Engineering Task Force (IETF) for translating E.164 phone numbers into Internet Uniform Resource Identifiers (URIs). It plays an increasingly important role as the bridge between Internet and traditional telecommunications services. ENUM is based on the Domain Name System (DNS), but places unique performance requirements on the DNS server. In particular, an ENUM server needs to host a huge number of records, provide high query throughput for both existing and non-existing records, maintain high query performance under update load, and answer queries within a tight latency budget. In this report, we evaluate and compare the performance of serving ENUM queries with three servers, namely BIND, PDNS and Navitas. Our objective is to answer whether and how these servers can meet the unique performance requirements of ENUM. Test results show that the ENUM query response time on our platform has always been on the order of a few milliseconds or less, so this is likely not a concern. Throughput then becomes the key. The throughput of BIND degrades linearly as the record set size grows, so BIND is not suitable for ENUM. PDNS delivers higher performance than BIND in most cases, while the commercial Navitas server presents even better ENUM performance than PDNS. Under our 5M-record set test, the Navitas server with its default configuration consumes one tenth to one sixth the memory of PDNS, achieves six times higher throughput for existing records and two orders of magnitude higher throughput for non-existing records than the baseline PDNS server without caching. The throughput of Navitas is also the highest among the tested servers when the database is being updated in the background. We investigated ways to improve PDNS performance. For example, doubling CPU processing power by putting PDNS and its backend database on two separate machines can increase PDNS throughput for existing records by 45% and that for non-existing records by 40%. Since PDNS is open source, we also instrumented the source code to obtain a detailed profile of the contributions of various system components to the overall latency. We found that when the server is within its normal load range, the main component of server processing latency is backend database lookup operations. An excessive number of backend database lookups is what makes PDNS throughput for non-existing records its key weakness. We studied using PDNS caching to reduce the number of database lookups. With a full packet cache and a modified cache maintenance mechanism, the PDNS throughput for existing records can be improved by 100%. This brings the value to one third of its Navitas counterpart. After enabling the PDNS negative query cache, we improved PDNS throughput for non-existing records to a level comparable to its throughput for existing records, but this result is still an order of magnitude lower than the corresponding value for Navitas. Further improvement of PDNS throughput for non-existing records will require optimization of the related processing mechanisms in its implementation. | (pdf) |
Specifying Confluent Processes | Olivier Tardieu, Stephen A. Edwards | 2006-09-22 | We address the problem of specifying concurrent processes that can make local nondeterministic decisions without affecting global system behavior---the sequence of events communicated along each inter-process communication channel. Such nondeterminism can be used to cope with unpredictable execution rates and communication delays. Our model resembles Kahn's, but does not include unbounded buffered communication, so it is much simpler to reason about and implement. After formally characterizing these so-called confluent processes, we propose a collection of operators, including sequencing, parallel, and our own creation, confluent choice, that guarantee confluence by construction. The result is a set of primitive constructs that form the formal basis of a concurrent programming language for both hardware and software systems that gives deterministic behavior regardless of the relative execution rates of the processes. Such a language greatly simplifies the verification task because any correct implementation of such a system is guaranteed to have the same behavior, a property rarely found in concurrent programming environments. | (pdf) (ps) |
MacShim: Compiling MATLAB to a Scheduling-Independent Concurrent Language | Neesha Subramaniam, Ohan Oda, Stephen A. Edwards | 2006-09-22 | Nondeterminism is a central challenge in most concurrent models of computation. That programmers must worry about races and other timing-dependent behavior is a key reason that parallel programming has not been widely adopted. The SHIM concurrent language, intended for hardware/software codesign applications, avoids this problem by providing deterministic (race-free) concurrency, but does not support automatic parallelization of sequential algorithms. In this paper, we present a compiler able to parallelize a simple MATLAB-like language into concurrent SHIM processes. From a user-provided partitioning of arrays to processes, our compiler divides the program into coarse-grained processes and schedules and synthesizes inter-process communication. We demonstrate the effectiveness of our approach on some image-processing algorithms. | (pdf) (ps) |
SHIM: A Deterministic Approach to Programming with Threads | Olivier Tardieu, Stephen A. Edwards | 2006-09-21 | Concurrent programming languages should be a good fit for embedded systems because they match the intrinsic parallelism of their architectures and environments. Unfortunately, most concurrent programming formalisms are prone to races and nondeterminism, despite the presence of mechanisms such as monitors. In this paper, we propose SHIM, the core of a concurrent language with disciplined shared variables that remains deterministic, meaning the behavior of a program is independent of the scheduling of concurrent operations. SHIM does not sacrifice power or flexibility to achieve this determinism. It supports both synchronous and asynchronous paradigms---loosely and tightly synchronized threads---the dynamic creation of threads and shared variables, recursive procedures, and exceptions. We illustrate our programming model with examples including breadth-first-search algorithms and pipelines. By construction, they are race-free. We provide the formal semantics of SHIM and a preliminary implementation. | (pdf) (ps) |
Debugging Woven Code | Marc Eaddy, Alfred Aho, Weiping Hu, Paddy McDonald, Julian Burger | 2006-09-20 | The ability to debug woven programs is critical to the adoption of Aspect Oriented Programming (AOP). Nevertheless, many AOP systems lack adequate support for debugging, making it difficult to diagnose faults and understand the program's structure and control flow. We discuss why debugging aspect behavior is hard and how harvesting results from related research on debugging optimized code can make the problem more tractable. We also specify general debugging criteria that we feel all AOP systems should support. We present a novel solution to the problem of debugging aspect-enabled programs. Our Wicca system is the first dynamic AOP system to support full source-level debugging of woven code. It introduces a new weaving strategy that combines source weaving with online byte-code patching. Changes to the aspect rules, or base or aspect source code are rewoven and recompiled on-the-fly. We present the results of an experiment that show how these features provide the programmer with a powerful interactive debugging experience with relatively little overhead. | (pdf) |
A Framework for Quality Assurance of Machine Learning Applications | Christian Murphy, Gail Kaiser, Marta Arias | 2006-09-15 | Some machine learning applications are intended to learn properties of data sets where the correct answers are not already known to human users. It is challenging to test and debug such ML software, because there is no reliable test oracle. We describe a framework and collection of tools aimed to assist with this problem. We present our findings from using the testing framework with three implementations of an ML ranking algorithm (all of which had bugs). | (pdf) |
Throughput and Fairness in Random Access Networks | Hoon Chang, Vishal Misra, Dan Rubenstein | 2006-08-24 | This paper presents a throughput analysis of log-utility and max-min fairness. Assuming all nodes interfere with each other, completely or partially, log-utility fairness significantly enhances the total throughput compared to max-min fairness, since all nodes must have the same throughput under max-min fairness. The improvement is especially large when the effect of cumulated interference from multiple senders cannot be ignored. | (pdf) |
Linear Approximation of Optimal Attempt Rate in Random Access Networks | Hoon Chang, Vishal Misra, Dan Rubenstein | 2006-08-01 | While packet capture has been observed in real implementations of wireless devices randomly accessing shared channels, fair rate control algorithms based on accurate channel models that describe the phenomenon have not been developed. In this paper, using a general physical channel model, we develop the equation for the optimal attempt rate that maximizes the aggregate log utility. We use the least squares method to approximate the equation by a linear function of the attempt rate. Our analysis of the approximation error shows that the linear function obtained is close enough to the original, with the square of the residuals more than 0.9. | (pdf) |
Complexity and tractability of the heat equation | Arthur G. Werschulz | 2006-07-27 | In a previous paper, we developed a general framework for establishing tractability and strong tractability for quasilinear multivariate problems in the worst case setting. One important example of such a problem is the solution of the heat equation $u_t = \Delta u - qu$ in $I^d\times(0,T)$, where $I$ is the unit interval and $T$ is a maximum time value. This problem is to be solved subject to homogeneous Dirichlet boundary conditions, along with the initial condition $u(\cdot,0)=f$ over $I^d$. The solution $u$ depends linearly on $f$, but nonlinearly on $q$. Here, both $f$ and $q$ are $d$-variate functions from a reproducing kernel Hilbert space with finite-order weights of order $\omega$. This means that, although $d$ can be arbitrarily large, $f$ and $q$ can be decomposed as sums of functions of at most $\omega$ variables, with $\omega$ independent of $d$. In this paper, we apply our previous general results to the heat equation. We study both the absolute and normalized error criteria. For either error criterion, we show that the problem is tractable. That is, the number of evaluations of $f$ and $q$ needed to obtain an $\varepsilon$-approximation is polynomial in $\varepsilon$ and $d$, with the degree of the polynomial depending linearly on $\omega$. In addition, we want to know when the problem is strongly tractable, meaning that the dependence is polynomial only in $\varepsilon$, independently of $d$. We show that if the sum of the weights defining the weighted reproducing kernel Hilbert space is uniformly bounded in $d$ and the integral of the univariate kernel is positive, then the heat equation is strongly tractable. | (pdf) |
Projection Volumetric Display using Passive Optical Scatterers | Shree K. Nayar, Vijay N. Anand | 2006-07-25 | In this paper, we present a new class of volumetric displays that can be used to display 3D objects. The basic approach is to trade-off the spatial resolution of a digital projector (or any light engine) to gain resolution in the third dimension. Rather than projecting an image onto a 2D screen, a depth-coded image is projected onto a 3D cloud of passive optical scatterers. The 3D point cloud is realized using a technique called Laser Induced Damage (LID), where each scatterer is a physical crack embedded in a block of glass or plastic. We show that when the point cloud is randomized in a specific manner, a very large fraction of the points are visible to the viewer irrespective of his/her viewing direction. We have developed an orthographic projection system that serves as the light engine for our volumetric displays. We have implemented several types of point clouds, each one designed to display a specific class of objects. These include a cloud with uniquely indexable points for the display of true 3D objects, a cloud with an independently indexable top layer and a dense extrusion volume to display extruded objects with arbitrarily textured top planes and a dense cloud for the display of purely extruded objects. In addition, we show how our approach can be used to extend simple video games to 3D. Finally, we have developed a 3D avatar in which videos of a face with expression changes are projected onto a static surface point cloud of the face. | (pdf) |
Practical Preference Relations for Large Data Sets | Kenneth Ross, Peter Stuckey, Amelie Marian | 2006-06-16 | User-defined preferences allow personalized ranking of query results. A user provides a declarative specification of his/her preferences, and the system is expected to use that specification to give more prominence to preferred answers. We study constraint formalisms for expressing user preferences as base facts in a partial order. We consider a language that allows comparison and a limited form of arithmetic, and show that the transitive closure computation required to complete the partial order terminates. We consider various ways of composing partial orders from smaller pieces, and provide results on the size of the resulting transitive closures. We introduce the notion of "covering composition," which solves some semantic problems apparent in previous notions of composition. Finally, we show how preference queries within our language can be supported by suitable index structures for efficient evaluation over large data sets. Our results provide guidance about when complex preferences can be efficiently evaluated, and when they cannot. | (pdf) |
Feasibility of Voice over IP on the Internet | Alex Sherman, Jason Nieh, Yoav Freund | 2006-06-09 | VoIP (Voice over IP) services are using the Internet infrastructure to enable new forms of communication and collaboration. A growing number of VoIP service providers such as Skype, Vonage, Broadvoice, as well as many cable services, are using the Internet to offer telephone services at much lower costs. However, VoIP services rely on the user's Internet connection, and this can often translate into lower quality communication. Overlay networks offer a potential solution to this problem by improving on default Internet routing and overcoming failures. To assess the feasibility of using overlays to improve VoIP on the Internet, we have conducted a detailed experimental study to evaluate the benefits of using an overlay on PlanetLab nodes for improving voice communication connectivity and performance around the world. Our measurements demonstrate that an overlay architecture can significantly improve VoIP communication across most regions and provides the greatest benefit for locations with poorer default Internet connectivity. We explore overlay topologies and show that a small number of well-connected intermediate nodes is sufficient to improve VoIP performance. We show that there is significant variation over time in the best overlay routing paths and argue for the need for adaptive routing to account for this variation and deliver the best performance. | (pdf) |
Exploiting Temporal Coherence for Pre-computation Based Rendering | Ryan Overbeck | 2006-05-23 | Precomputed radiance transfer (PRT) generates impressive images with complex illumination, materials and shadows with real-time interactivity. These methods separate the scene’s static and dynamic components, allowing the static portion to be computed as a preprocess. In this work, we hold geometry static and allow either the lighting or BRDF to be dynamic. To achieve real-time performance, both static and dynamic components are compressed by exploiting spatial and angular coherence. Temporal coherence of the dynamic component from frame to frame is an important, but unexplored, additional form of coherence. In this thesis, we explore temporal coherence for two forms of all-frequency PRT: BRDF material editing and lighting design. We develop incremental methods for approximating the differences in the dynamic component between consecutive frames. For BRDF editing, we find that a pure incremental approach allows quick convergence to an exact solution with smooth real-time response. For relighting, we observe vastly differing degrees of temporal coherence across levels of the lighting’s wavelet hierarchy. To address this, we develop an algorithm that treats each level separately, adapting to available coherence. The proposed methods are orthogonal to other forms of coherence, and can be added to almost any PRT algorithm with minimal implementation, computation, or memory overhead. We demonstrate our technique within existing codes for nonlinear wavelet approximation, changing view with BRDF factorization, and clustered PCA. Exploiting temporal coherence of dynamic lighting yields a 3×–4× performance improvement, e.g., all-frequency effects are achieved with 30 wavelet coefficients, about the same as low-frequency spherical harmonic methods. Distinctly, our algorithm smoothly converges to the exact result within a few frames of the lighting becoming static. | (pdf) |
Speculative Execution as an Operating System Service | Michael Locasto, Angelos Keromytis | 2006-05-12 | Software faults and vulnerabilities continue to present significant obstacles to achieving reliable and secure software. In an effort to overcome these obstacles, systems often incorporate self-monitoring and self-healing functionality. Our hypothesis is that internal monitoring is not an effective long-term strategy. However, monitoring mechanisms that are completely external lose the advantage of application-specific knowledge available to an inline monitor. To balance these tradeoffs, we present the design of VxF, an environment where both supervision and automatic remediation can take place by speculatively executing "slices" of an application. VxF introduces the concept of an endolithic kernel by providing execution as an operating system service: execution of a process slice takes place inside a kernel thread rather than directly on the system microprocessor. | (pdf) |
Privacy-Preserving Payload-Based Correlation for Accurate Malicious Traffic Detection | Janak Parekh, Ke Wang, Salvatore Stolfo | 2006-05-09 | With the increased use of botnets and other techniques to obfuscate attackers' command-and-control centers, Distributed Intrusion Detection Systems (DIDS) that focus on attack source IP addresses or other header information can only portray a limited view of distributed scans and attacks. Packet payload sharing techniques hold far more promise, as they can convey exploit vectors and/or malcode used upon successful exploit of a target system, irrespective of obfuscated source addresses. However, payload sharing has had minimal success due to regulatory or business-based privacy concerns of transmitting raw or even sanitized payloads. The currently accepted form of content exchange has been limited to the exchange of known-suspicious content, e.g., packets captured by honeypots; however, signature generation assumes that each site receives enough traffic in order to correlate a meaningful set of payloads from which common content can be derived, and places fundamental and computationally stressful requirements on signature generators that may miss particularly stealthy or carefully-crafted polymorphic malcode. Instead, we propose a new approach to enable the sharing of suspicious payloads via privacy-preserving technologies. We detail the work we have done with two example payload anomaly detectors, PAYL and Anagram, to support generalized payload correlation and signature generation without releasing identifiable payload data and without relying on single-site signature generation. We present preliminary results of our approaches and suggest how such deployments may practically be used for not only cross-site, but also cross-domain alert sharing and its implications for profiling threats. | (pdf) |
PBS: A Unified Priority-Based CPU Scheduler | Hanhua Feng, Vishal Misra, Dan Rubenstein | 2006-05-01 | We design and implement a novel CPU scheduling policy. It is a configurable policy in the sense that a tunable parameter is provided to change its behavior. With different settings of the parameter, this policy can emulate the first-come first-serve, processor sharing, or feedback policies, as well as different levels of their mixtures. The policy is implemented in the Linux kernel as a replacement for the default scheduler. The drastic changes in behavior as the parameter varies are analyzed and simulated. Its performance is measured on real systems using workload generators and benchmarks. | (pdf) (ps) |
A First Order Analysis of Lighting, Shading, and Shadows | Ravi Ramamoorthi, Dhruv Mahajan, Peter Belhumeur | 2006-04-30 | The shading in a scene depends on a combination of many factors---how the lighting varies spatially across a surface, how it varies along different directions, the geometric curvature and reflectance properties of objects, and the locations of soft shadows. In this paper, we conduct a complete first order or gradient analysis of lighting, shading and shadows, showing how each factor separately contributes to scene appearance, and when it is important. Gradients are well suited for analyzing the intricate combination of appearance effects, since each gradient term corresponds directly to variation in a specific factor. First, we show how the spatial and directional gradients of the light field change as light interacts with curved objects. This extends the recent frequency analysis of Durand et al. to gradients, and has many advantages for operations, like bump-mapping, that are difficult to analyze in the Fourier domain. Second, we consider the individual terms responsible for shading gradients, such as lighting variation, convolution with the surface BRDF, and the object's curvature. This analysis indicates the relative importance of various terms, and shows precisely how they combine in shading. As one practical application, our theoretical framework can be used to adaptively sample images in high-gradient regions for efficient rendering. Third, we understand the effects of soft shadows, computing accurate visibility gradients. We generalize previous work to arbitrary curved occluders, and develop a local framework that is easy to integrate with conventional ray-tracing methods. Our visibility gradients can be directly used in practical gradient interpolation methods for efficient rendering. | (pdf) (ps) |
Quantifying Application Behavior Space for Detection and Self-Healing | Michael Locasto, Angelos Stavrou, Gabriela G. Cretu, Angelos D. Keromytis, Salvatore J. Stolfo | 2006-04-08 | The increasing sophistication of software attacks has created the need for increasingly finer-grained intrusion and anomaly detection systems, both at the network and the host level. We believe that the next generation of defense mechanisms will require a much more detailed dynamic analysis of application behavior than is currently done. We also note that the same type of behavior analysis is needed by the current embryonic attempts at self-healing systems. Because such mechanisms are currently perceived as too expensive in terms of their performance impact, questions relating to the feasibility and value of such analysis remain unexplored and unanswered. We present a new mechanism for profiling the behavior space of an application by analyzing all function calls made by the process, including regular functions and library calls, as well as system calls. We derive behavior from aspects of both control and data flow. We show how to build and check profiles that contain this information at the binary level -- that is, without making changes to the application's source, the operating system, or the compiler. This capability makes our system, Lugrind, applicable to a variety of software, including COTS applications. Profiles built for the applications we tested can predict behavior with 97% accuracy given a context window of 15 functions. Lugrind demonstrates the feasibility of combining binary-level behavior profiling with detection and automated repair. | (pdf) |
Seamless Layer-2 Handoff using Two Radios in IEEE 802.11 Wireless Networks | Sangho Shin, Andrea G. Forte, Henning Schulzrinne | 2006-04-08 | We propose a layer-2 handoff mechanism that uses two radios and achieves seamless handoff. We also reduce the false handoff probability significantly by introducing selective passive scanning. | (pdf) |
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack | Ke Wang, Janak Parekh, Salvatore Stolfo | 2006-04-07 | In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and "suspicious" network packet payloads. By using higher-order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly efficient Bloom filters, reducing space requirements and enabling privacy-preserving cross-site correlation. The sensor models the distinct content flow of a network or host using a semi-supervised training regimen. Previously known exploits, extracted from the signatures of an IDS, are likewise modeled in a Bloom filter and are used during training as well as detection time. We demonstrate that Anagram can identify anomalous traffic with high accuracy and low false positive rates. Anagram's high-order n-gram analysis technique is also resilient against simple mimicry attacks that blend exploits with "normal" appearing byte padding, such as the blended polymorphic attack recently demonstrated in [1]. We discuss randomized n-gram models, which further raise the bar and make it more difficult for attackers to build precise packet structures to evade Anagram even if they know the distribution of the local site content flow. Finally, Anagram's speed and high detection rate make it valuable not only as a standalone sensor, but also as a network anomaly flow classifier in an instrumented fault-tolerant host-based environment; this enables significant cost amortization and the possibility of a "symbiotic" feedback loop that can improve accuracy and reduce false positive rates over time. | (pdf) |
Bloodhound: Searching Out Malicious Input in Network Flows for Automatic Repair Validation | Michael Locasto, Matthew Burnside, Angelos D. Keromytis | 2006-04-05 | Many current systems security research efforts focus on mechanisms for Intrusion Prevention and Self-Healing Software. Unfortunately, such systems find it difficult to gain traction in many deployment scenarios. For self-healing techniques to be realistically employed, system owners and administrators must have enough confidence in the quality of a generated fix that they are willing to allow its automatic deployment. In order to increase the level of confidence in these systems, the efficacy of a 'fix' must be tested and validated after it has been automatically developed, but before it is actually deployed. Due to the nature of attacks, such verification must proceed automatically. We call this problem Automatic Repair Validation (ARV). As a way to illustrate the difficulties faced by ARV, we propose the design of a system, Bloodhound, that tracks and stores malicious network flows for later replay in the validation phase for self-healing software. | (pdf) |
Streak Seeding Automation Using Silicon Tools | Atanas Georgiev, Sergey Vorobiev, William Edstrom, Ting Song, Andrew Laine, John Hunt | 2006-03-31 | This report presents an approach to the automation of a protein crystallography task called streak seeding. The approach is based on novel and unique custom-designed silicon microtools, which we experimentally verified to produce results similar to those obtained with traditionally used boar bristles. The advantage of using silicon is that it allows the employment of state-of-the-art micro-electro-mechanical-systems (MEMS) technology to produce microtools of various shapes and sizes, and that it is rigid and can be easily adopted as an accurately calibrated end-effector on a microrobotic system. A working prototype of an automatic streak seeding system is presented, which has been successfully applied to protein crystallization. | (pdf) |
PalProtect: A Collaborative Security Approach to Comment Spam | Benny Wong, Michael Locasto, Angelos D. Keromytis | 2006-03-22 | Collaborative security is a promising solution to many types of security problems. Organizations and individuals often have a limited amount of resources to detect and respond to the threat of automated attacks. Enabling them to take advantage of the resources of their peers by sharing information related to such threats is a major step towards automating defense systems. In particular, comment spam posted on blogs as a way for attackers to do Search Engine Optimization (SEO) is a major annoyance. Many measures have been proposed to thwart such spam, but all such measures are currently enacted and operate within one administrative domain. We propose and implement a system for cross-domain information sharing to improve the quality and speed of defense against such spam. | (pdf) |
Using Angle of Arrival (Bearing) Information in Network Localization | Tolga Eren, Walter Whiteley, Peter N. Belhumeur | 2006-03-18 | In this paper, we consider using angle of arrival information (bearing) for network localization and control in two different fields of multi-agent systems: (i) wireless sensor networks; (ii) robot networks. The essential property we require in this paper is that a node can infer heading information from its neighbors. We address the uniqueness of network localization solutions by the theory of globally rigid graphs. We show that while the parallel rigidity problem for formations with bearings is isomorphic to the distance case, the global rigidity of the formation is simpler (in fact identical to the simpler rigidity case) for a network with bearings, compared to formations with distances. We provide the conditions of localization for networks in which the neighbor relationship is not necessarily symmetric. | (pdf) (ps) |
A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency | Dhruv Mahajan, Ravi Ramamoorthi, Brian Curless | 2006-03-17 | We develop new mathematical results based on the spherical harmonic convolution framework for reflection from a curved surface. We derive novel identities, which are the angular frequency domain analogs to common spatial domain invariants such as reflectance ratios. They apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing. | (pdf) |
Passive Duplicate Address Detection for Dynamic Host Configuration Protocol (DHCP) | Andrea G. Forte, Sangho Shin, Henning Schulzrinne | 2006-03-07 | During a layer-3 handoff, address acquisition via DHCP is often the dominant source of handoff delay, duplicate address detection (DAD) being responsible for most of the delay. We propose a new DAD algorithm, passive DAD (pDAD), which we show to be effective, yet introduce only a few milliseconds of delay. Unlike traditional DAD, pDAD also detects the unauthorized use of an IP address before it is assigned to a DHCP client. | (pdf) |
Evaluating an Evaluation Method: The Pyramid Method Applied to 2003 Document Understanding Conference (DUC) Data | Rebecca Passonneau | 2006-03-03 | A pyramid evaluation dataset was created for DUC 2003 in order to compare results with DUC 2005, and to provide an independent test of the evaluation metric. The main differences between DUC 2003 and 2005 datasets pertain to the document length, cluster sizes, and model summary length. For five of the DUC 2003 document sets, two pyramids each were constructed by annotators working independently. Scores of the same peer using different pyramids were highly correlated. Sixteen systems were evaluated on eight document sets. Analysis of variance using Tukey's Honest Significant Difference method showed significant differences among all eight document sets, and more significant differences among the sixteen systems than for DUC 2005. | (pdf) |
Rigid Formations with Leader-Follower Architecture | Tolga Eren, Walter Whiteley, Peter N. Belhumeur | 2006-02-25 | This paper is concerned with information structures used in rigid formations of autonomous agents that have a leader-follower architecture. The focus of the paper is on sensor/network topologies to secure control of rigidity. This paper extends previous rigidity-based approaches for formations with symmetric neighbor relations to include formations with leader-follower architecture. We provide necessary and sufficient conditions for rigidity of directed formations, with or without cycles. We present the directed Henneberg constructions as a sequential process for all guide rigid digraphs. We refine those results for acyclic formations, where guide rigid formations have a simple construction. The analysis in this paper confirms that acyclicity is not a necessary condition for stable rigidity. Cycles are not the real problem; rather, the lack of guide freedom is the reason cycles have been seen as a problematic topology. Topologies that have cycles within a larger architecture can be stably rigid, and we conjecture that all guide rigid formations are stably rigid for internal control. We analyze how the external control of guide agents can be integrated into the stable rigidity of a larger formation. The analysis in the paper also confirms the inconsistencies that result from noisy measurements in redundantly rigid formations. An algorithm given in the paper establishes a sequential way of determining the directions of links from a given undirected rigid formation so that the necessary and sufficient conditions are fulfilled. | (pdf) (ps) |
Using an External DHT as a SIP Location Service | Kundan Singh, Henning Schulzrinne | 2006-02-22 | Peer-to-peer Internet telephony using the Session Initiation Protocol (P2P-SIP) can exhibit two different architectures: an existing P2P network can be used as a replacement for lookup and updates, or a P2P algorithm can be implemented using SIP messages. In this paper, we explore the first architecture using the OpenDHT service as an externally managed P2P network. We provide design details such as encryption and signing using pseudo-code and examples to provide P2P-SIP for various deployment components such as P2P client, proxy and adaptor, based on our implementation. The design can be used with other distributed hash tables (DHTs) also. | (pdf) (ps) |
Synthesis of On-Chip Interconnection Structures: From Point-to-Point Links to Networks-on-Chip | Alessandro Pinto, Luca P. Carloni, Alberto L. Sangiovanni-Vincentelli | 2006-02-20 | Packet-switched networks-on-chip (NOC) have been advocated as the solution to the challenge of organizing efficient and reliable communication structures among the components of a system-on-chip (SOC). A critical issue in designing a NOC is to determine its topology given the set of point-to-point communication requirements among these components. We present a novel approach to on-chip communication synthesis that is based on the iterative combination of two efficient computational steps: (1) an application of the k-Median algorithm to coarsely determine the global communication structure (which may turn out not to be a network after all), and (2) a variation of the shortest-path algorithm to finely tune the data flows on the communication channels. The application of our method to case studies taken from the literature shows that we can automatically synthesize optimal NOC topologies for multi-core on-chip processors, and it offers new insights on why NOCs are not necessarily a value proposition for some classes of application-specific SOCs. | (pdf) |
Theoretical Bounds on Control-Plane Self-Monitoring in Routing Protocols | Raj Kumar Rajendran, Vishal Misra, Dan Rubenstein | 2006-02-15 | Routing protocols rely on the cooperation of nodes in the network to both forward packets and to select the forwarding routes. There have been several instances in which an entire network's routing collapsed simply because a seemingly insignificant set of nodes reported erroneous routing information to their neighbors. It may have been possible for other nodes to trigger an automated response and prevent the problem by analyzing received routing information for inconsistencies that revealed the errors. Our theoretical study seeks to understand when nodes can detect the existence of errors in the implementation of route selection elsewhere in the network through monitoring their own routing states for inconsistencies. We start by constructing a methodology, called Strong-Detection, that helps answer the question. We then apply Strong-Detection to three classes of routing protocols: distance-vector, path-vector, and link-state. For each class, we derive low-complexity, self-monitoring algorithms that use the routing state created by these routing protocols to identify any detectable anomalies. These algorithms are then used to compare and contrast the self-monitoring power these various classes of protocols possess. We also study the trade-off between their state-information complexity and ability to identify routing anomalies. | (pdf) (ps) |
A Survey of Security Issues and Solutions in Presence | Vishal Kumar Singh, Henning Schulzrinne | 2006-02-10 | With the growth of presence based services, it is important to securely manage and distribute sensitive presence information such as user location. We survey techniques that are used for security and privacy of presence information. In particular, we describe the SIMPLE based presence specific authentication, integrity and confidentiality. We also discuss the IETF’s common policy for geo-privacy, presence authorization for presence information privacy and distribution of different levels of presence information to different watchers. Additionally, we describe an open problem of getting the aggregated presence from the trusted server without the server knowing the presence information, and propose a solution. Finally, we discuss denial of service attacks on the presence system and ways to mitigate them. | (pdf) |
SIMPLEstone - Benchmarking Presence Server Performance | Vishal Kumar Singh, Henning Schulzrinne | 2006-02-10 | Presence is an important enabler for communication in Internet telephony systems. Presence-based services depend on accurate and timely delivery of presence information. Hence, presence systems need to be appropriately dimensioned to meet the growing number of users, the varying number of devices acting as presence sources, the rate at which they update presence information to the network and the rate at which the network distributes the user’s presence information to the watchers. SIMPLEstone is a set of metrics for benchmarking the performance of presence systems based on SIMPLE. SIMPLEstone benchmarks a presence server by generating requests based on a workload specification. It measures server capacity in terms of request handling capacity, both as an aggregate across all request types and for individual request types. The benchmark treats the different configuration modes in which a presence server interoperates with the Session Initiation Protocol (SIP) server as one block. | (pdf) |
Grouped Distributed Queues: Distributed Queue, Proportional Share Multiprocessor Scheduling | Bogdan Caprita, Jason Nieh, Clifford Stein | 2006-02-07 | We present Grouped Distributed Queues (GDQ), the first proportional share scheduler for multiprocessor systems that, by using a distributed queue architecture, scales well with a large number of processors and processes. GDQ achieves accurate proportional fairness scheduling with only O(1) scheduling overhead. GDQ takes a novel approach to distributed queuing: instead of creating per-processor queues that need to be constantly balanced to achieve any measure of proportional sharing fairness, GDQ uses a simple grouping strategy to organize processes into groups based on similar processor time allocation rights, and then assigns processors to groups based on aggregate group shares. Group membership of processes is static, and fairness is achieved by dynamically migrating processors among groups. The set of processors working on a group use simple, low-overhead round-robin queues, while processor reallocation among groups is achieved using a new multiprocessor adaptation of the well-known Weighted Fair Queuing algorithm. By commoditizing processors and decoupling their allocation from process scheduling, GDQ provides, with only constant scheduling cost, fairness within a constant of the ideal generalized processor sharing model for process weights with a fixed upper bound. We have implemented GDQ in Linux and measured its performance. Our experimental results show that GDQ has low overhead and scales well with the number of processors. | (pdf) |
W3Bcrypt: Encryption as a Stylesheet | Angelos Stavrou, Michael Locasto, Angelos D. Keromytis | 2006-02-06 | While web communications are increasingly protected by transport layer cryptographic operations (SSL/TLS), there are many situations where even the communications infrastructure provider cannot be trusted. The end-to-end (E2E) encryption of data becomes increasingly important in these trust models to protect the confidentiality and integrity of the data against snooping and modification by the communications provider. We introduce W3Bcrypt, an extension to the Mozilla Firefox web platform that enables application-level cryptographic protection for web content. In effect, we view cryptographic operations as a type of style to be applied to web content along with layout and coloring operations. Among the main benefits of using encryption as a stylesheet are (a) reduced workload on a web server, (b) targeted content publication, and (c) greatly increased privacy. This paper discusses our implementation for Firefox, but the core ideas are applicable to most current browsers. | (pdf) |
A Runtime Adaptation Framework for Native C and Bytecode Applications | Rean Griffith, Gail Kaiser | 2006-01-27 | The need for self-healing software to respond with a reactive, proactive or preventative action as a result of changes in its environment has added the non-functional requirement of adaptation to the list of facilities expected in self-managing systems. The adaptations we are concerned with assist with problem detection, diagnosis and remediation. Many existing computing systems do not include such adaptation mechanisms; as a result, these systems either need to be re-designed to include them or there needs to be a mechanism for retrofitting them. The purpose of the adaptation mechanisms is to ease the job of the system administrator with respect to managing software systems. This paper introduces Kheiron, a framework for facilitating adaptations in running programs in a variety of execution environments without requiring the redesign of the application. Kheiron manipulates compiled C programs running in an unmanaged execution environment as well as programs running in Microsoft’s Common Language Runtime and Sun Microsystems’ Java Virtual Machine. We present case studies and experiments that demonstrate the feasibility of using Kheiron to support self-healing systems. We also describe the concepts and techniques used to retrofit adaptations onto existing systems in the various execution environments. | (pdf) |
Binary-level Function Profiling for Intrusion Detection and Smart Error Virtualization | Michael Locasto, Angelos Keromytis | 2006-01-26 | Most current approaches to self-healing software (SHS) suffer from semantic incorrectness of the response mechanism. To support SHS, we propose Smart Error Virtualization (SEV), which treats functions as transactions but provides a way to guide the program state and remediation to a more correct value than previous work. We perform runtime binary-level profiling on unmodified applications to learn both good return values and error return values (produced when the program encounters ``bad'' input). The goal is to ``learn from mistakes'' by converting malicious input to the program's notion of ``bad'' input. We introduce two implementations of this system that support three major uses: function profiling for regression testing, function profiling for host-based anomaly detection (environment-specialized fault detection), and function profiling for automatic attack remediation via SEV. Our systems do not require access to the source code of the application to enact a fix. Finally, this paper is, in part, a critical examination of error virtualization in order to shed light on how to approach semantic correctness. | (pdf) (ps) |
Converting from Spherical to Parabolic Coordinates | Aner Ben-Artzi | 2006-01-20 | A reference for converting directly between Spherical Coordinates and Parabolic Coordinates without using the intermediate Cartesian Coordinates. | (pdf) |
Multi Facet Learning in Hilbert Spaces | Imre Risi Kondor, Gabor Csanyi, Sebastian E. Ahnert, Tony Jebara | 2005-12-31 | We extend the kernel based learning framework to learning from linear functionals, such as partial derivatives. The learning problem is formulated as a generalized regularized risk minimization problem, possibly involving several different functionals. We show how to reduce this to conventional kernel based learning methods and explore a specific application in Computational Condensed Matter Physics. | (pdf) (ps) |
A Lower Bound for the Sturm-Liouville Eigenvalue Problem on a Quantum Computer | Arvid J. Bessen | 2005-12-14 | We study the complexity of approximating the smallest eigenvalue of a univariate Sturm-Liouville problem on a quantum computer. This general problem includes the special case of solving a one-dimensional Schroedinger equation with a given potential for the ground state energy. The Sturm-Liouville problem depends on a function q, which, in the case of the Schroedinger equation, can be identified with the potential function V. Recently Papageorgiou and Wozniakowski proved that quantum computers achieve an exponential reduction in the number of queries over the number needed in the classical worst-case and randomized settings for smooth functions q. Their method uses the (discretized) unitary propagator and arbitrary powers of it as a query ("power queries"). They showed that the Sturm-Liouville equation can be solved with O(log(1/e)) power queries, while the number of queries in the worst-case and randomized settings on a classical computer is polynomial in 1/e. This proves that a quantum computer with power queries achieves an exponential reduction in the number of queries compared to a classical computer. In this paper we show that the number of queries in Papageorgiou's and Wozniakowski's algorithm is asymptotically optimal. In particular we prove a matching lower bound of log(1/e) power queries, therefore showing that log(1/e) power queries are sufficient and necessary. Our proof is based on a frequency analysis technique, which examines the probability distribution of the final state of a quantum algorithm and the dependence of its Fourier transform on the input. | (pdf) (ps) |
Dynamic Adaptation of Temporal Event Correlation Rules | Rean Griffith, Gail Kaiser, Joseph Hellerstein, Yixin Diao | 2005-12-10 | Temporal event correlation is essential to realizing self-managing distributed systems. Autonomic controllers often require that events be correlated across multiple components using rule patterns with timer-based transitions, e.g., to detect denial of service attacks and to warn of staging problems with business critical applications. This short paper discusses automatic adjustment of timer values for event correlation rules, in particular compensating for the variability of event propagation delays due to factors such as contention for network and server resources. We describe a corresponding Management Station architecture and present experimental studies on a testbed system that suggest that this approach can produce results at least as good as an optimal fixed setting of timer values. | (pdf) |
Qubit Complexity of Continuous Problems | Anargyros Papageorgiou, Joseph Traub | 2005-12-09 | The number of qubits used by a quantum algorithm will be a crucial computational resource for the foreseeable future. We show how to obtain the classical query complexity for continuous problems. We then establish a simple formula for a lower bound on the qubit complexity in terms of the classical query complexity. | (pdf) |
An Event System Architecture for Scaling Scale-Resistant Services | Philip Gross | 2005-12-09 | Large organizations are deploying ever-increasing numbers of networked compute devices, from utilities installing smart controllers on electricity distribution cables, to the military giving PDAs to soldiers, to corporations putting PCs on the desks of employees. These computers are often far more capable than is needed to accomplish their primary task, whether it be guarding a circuit breaker, displaying a map, or running a word processor. These devices would be far more useful if they had some awareness of the world around them: a controller that resists tripping a switch, knowing that it would set off a cascade failure; a PDA that warns its owner of imminent danger; a PC that exchanges reports of suspicious network activity with its peers to identify stealthy computer crackers. In order to provide these higher-level services, the devices need a model of their environment. The controller needs a model of the distribution grid, the PDA needs a model of the battlespace, and the PC needs a model of the network and of normal network and user behavior. Unfortunately, not only might models such as these require substantial computational resources, but generating and updating them is even more demanding. Model-building algorithms tend to be bad in three ways: requiring large amounts of CPU and memory to run, needing large amounts of data from the outside to stay up to date, and running so slowly that they can't keep up with any fast changes in the environment that might occur. We can solve these problems by reducing the scope of the model to the immediate locale of the device, since reducing the size of the model makes the problem of model generation much more tractable. But such models are also much less useful, having no knowledge of the wider system. This thesis proposes a better solution to this problem called Level of Detail, after the computer graphics technique of the same name. Instead of simplifying the representation of distant objects, however, we simplify less-important data. Compute devices in the system receive streams of data that are a mixture of detailed data from devices that directly affect them and data summaries (aggregated data) from less directly influential devices. The degree to which the data is aggregated (i.e., how much it is reduced) is determined by calculating an influence metric between the target device and the remote device. The smart controller thus receives a continuous stream of raw data from the adjacent transformer, but only an occasional small status report summarizing all the equipment in a neighborhood in another part of the city. This thesis describes the data distribution system, the aggregation functions, and the influence metrics that can be used to implement such a system. I also describe my current progress towards establishing a test environment and validating the concepts, and describe the next steps in the research plan. | (pdf) |
Tree Dependent Identically Distributed Learning | Tony Jebara, Philip M. Long | 2005-12-06 | We view a dataset of points or samples as having an underlying, yet unspecified, tree structure and exploit this assumption in learning problems. Such a tree structure assumption is equivalent to treating a dataset as being tree dependent identically distributed or tdid and preserves exchangeability. This extends traditional iid assumptions on data since each datum can be sampled sequentially after being conditioned on a parent. Instead of hypothesizing a single best tree structure, we infer a richer Bayesian posterior distribution over tree structures from a given dataset. We compute this posterior over (directed or undirected) trees via the Laplacian of conditional distributions between pairs of input data points. This posterior distribution is efficiently normalized by the Laplacian's determinant and also facilitates novel maximum likelihood estimators, efficient expectations and other useful inference computations. In a classification setting, tdid assumptions yield a criterion that maximizes the determinant of a matrix of conditional distributions between pairs of input and output points. This leads to a novel classification algorithm we call the Maximum Determinant Machine. Unsupervised and supervised experiments are shown. | (pdf) (ps) |
Micro-speculation, Micro-sandboxing, and Self-Correcting Assertions: Support for Self-Healing Software and Application Communities | Michael Locasto | 2005-12-05 | Software faults and vulnerabilities continue to present significant obstacles to achieving reliable and secure software. The critical problem is that systems currently lack the capability to respond intelligently and automatically to attacks -- especially attacks that exploit previously unknown vulnerabilities or are delivered by previously unseen inputs. Therefore, the goal of this thesis is to provide an environment where both supervision and automatic remediation can take place. Also provided is a mechanism to guide the supervision environment in detection and repair activities. This thesis supports the notion of Self-Healing Software by introducing three novel techniques: \emph{micro-sandboxing}, \emph{micro-speculation}, and \emph{self-correcting assertions}. These techniques are combined in a kernel-level emulation framework to speculatively execute code that may contain faults or vulnerabilities and automatically repair such faults or exploited vulnerabilities. The framework, VPUF, introduces the concept of computation as an operating system service by providing control for an array of virtual processors in the Linux kernel (creating the concept of an \emph{endolithic} kernel). This thesis introduces ROAR (Recognize, Orient, Adapt, Respond) as a conceptual workflow for Self-healing Software systems. This thesis proposal outlines a 17 month program for developing the major components of the proposed system, implementing them on a COTS operating system and programming language, subjecting them to a battery of evaluations for performance and efficacy, and publishing the results. In addition, this proposal looks forward to several areas of follow-on work, including implementing some of the proposed techniques in hardware and leveraging the general kernel-level framework to support Application Communities. | (pdf) (ps) |
A Control Theory Foundation for Self-Managing Computing Systems | Yixin Diao, Joseph Hellerstein, Sujay Parekh, Rean Griffith, Gail Kaiser, Dan Phung | 2005-12-05 | The high cost of operating large computing installations has motivated a broad interest in reducing the need for human intervention by making systems self-managing. This paper explores the extent to which control theory can provide an architectural and analytic foundation for building self-managing systems. Control theory provides a rich set of methodologies for building automated self-diagnosis and self-repairing systems with properties such as stability, short settling times, and accurate regulation. However, there are challenges in applying control theory to computing systems, such as developing effective resource models, handling sensor delays, and addressing lead times in effector actions. We propose a deployable testbed for autonomic computing (DTAC) that we believe will reduce the barriers to addressing research problems in applying control theory to computing systems. The initial DTAC architecture is described along with several problems that it can be used to investigate. | (pdf) |
A New Routing Metric for High Throughput in Dense Ad Hoc Networks | Hoon Chang, Vishal Misra, Dan Rubenstein | 2005-12-01 | Routing protocols in most ad hoc networks use the length of paths as the routing metric. Recent findings have revealed that the minimum-hop metric cannot achieve the maximum throughput because it tries to reduce the number of hops by including long-range links, over which packets must be transmitted at the lowest transmission rate. In this paper, we investigate the tradeoff between transmission rates and throughputs and show that in dense networks with uniformly distributed traffic, there exists an optimal rate that may not be the lowest rate. Based on our observation, we propose a new routing metric, which measures the expected capability of a path assuming per-node fairness. We develop a routing protocol based on DSDV and demonstrate that the routing metric enhances the system throughput by 20% compared to the original DSDV. | (pdf) |
Effecting Runtime Reconfiguration in Managed Execution Environments | Rean Griffith, Giuseppe Valetto, Gail Kaiser | 2005-11-21 | Managed execution environments such as Microsoft’s Common Language Runtime (CLR) and Sun Microsystems’ Java Virtual Machine (JVM) provide a number of services – including but not limited to application isolation, security sandboxing, garbage collection and structured exception handling – that are aimed primarily at enhancing the robustness of managed applications. However, none of these services directly enables performing reconfigurations, repairs or diagnostics on the managed applications and/or its constituent subsystems and components. In this paper we examine how the facilities of a managed execution environment can be leveraged to support runtime system adaptations, such as reconfigurations and repairs. We describe an adaptation framework we have developed, which uses these facilities to dynamically attach/detach an engine capable of performing reconfigurations and repairs on a target system while it executes. Our adaptation framework is lightweight, and transparent to the application and the managed execution environment: it does not require recompilation of the application nor specially compiled versions of the managed execution runtime. Our prototype was implemented for the CLR. To evaluate our framework beyond toy examples, we searched on SourceForge for potential target systems already implemented on the CLR that might benefit from runtime adaptation. We report on our experience using our prototype to effect runtime reconfigurations in a system that was developed and is in use by others: the Alchemi enterprise Grid Computing System developed at the University of Melbourne, Australia. | (pdf) (ps) |
Adaptive Synchronization of Semantically Compressed Instructional Videos for Collaborative Distance Learning | Dan Phung, Giuseppe Valetto, Gail Kaiser, Tiecheng Liu, John Kender | 2005-11-21 | The increasing popularity of online courses has highlighted the need for collaborative learning tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources available to students. We present an e-Learning architecture and adaptation model called AI2TV (Adaptive Interactive Internet Team Video), which allows groups of students to collaboratively view a video in synchrony. AI2TV upholds the invariant that each student will view semantically equivalent content at all times. A semantic compression model is developed to provide instructional videos at different levels of detail to accommodate dynamic network conditions and users’ system requirements. We take advantage of the semantic compression algorithm’s ability to provide different layers of semantically equivalent video by adapting the client to play at the appropriate layer that provides the client with the richest possible viewing experience. Video player actions, like play, pause and stop, can be initiated by any group member, and the results of those actions are synchronized with all the other students. These features allow students to review a lecture video in tandem, facilitating the learning process. Experimental trials show that AI2TV successfully synchronizes instructional videos for distributed students while concurrently optimizing the video quality, even under conditions of fluctuating bandwidth, by adaptively adjusting the quality level for each student while still maintaining the invariant. | (pdf) |
A Genre-based Clustering Approach to Content Extraction | Suhit Gupta, Hila Becker, Gail Kaiser, Salvatore Stolfo | 2005-11-11 | The content of a webpage is usually contained within a small body of text and images, or perhaps several articles on the same page; however, the content may be lost in the clutter (defined as cosmetic features such as animations, menus, sidebars, obtrusive banners). Automatic content extraction has many applications, including browsing on small cell phone and PDA screens, speech rendering for the visually impaired, and reducing noise for information retrieval systems. We have developed a framework, Crunch, which employs various heuristics for content extraction in the form of filters applied to the webpage's DOM tree; the filters aim to prune or transform the clutter, leaving only the content. Crunch allows users to tune what we call "settings", consisting of thresholds for applying a particular filter and/or for toggling a filter on/off, because the HTML components that characterize clutter can vary significantly from website to website. However, we have found that the same settings tend to work well across different websites of the same genre, e.g., news or shopping, since the designers often employ similar page layouts. In particular, Crunch could obtain the settings for a previously unknown website by automatically classifying it as sufficiently similar to a cluster of known websites with previously adjusted settings. We present our approach to clustering a large corpus of websites into genres, using their pre-extraction textual material augmented by the snippets generated by searching for the website's domain name in web search engines. Including these snippets increases the frequency of function words needed for clustering. We use an existing Manhattan distance measure and hierarchical clustering techniques, with some modifications, to pre-classify the corpus into genres offline. Our method does not require prior knowledge of the set of genres that websites fit into, but to be useful a priori settings must be available for some member of each cluster or a nearby cluster (otherwise defaults are used). Crunch classifies newly encountered websites online in linear time, and then applies the corresponding filter settings, with no noticeable delay added by our content-extracting web proxy. | (pdf) |
Privacy-Preserving Distributed Event Correlation | Janak Parekh | 2005-11-07 | Event correlation is a widely-used data processing methodology for a broad variety of applications, and is especially useful in the context of distributed monitoring for software faults and vulnerabilities. However, most existing solutions have typically been focused on "intra-organizational" correlation; organizations typically employ privacy policies that prohibit the exchange of information outside of the organization. At the same time, the promise of "inter-organizational" correlation is significant given the broad availability of Internet-scale communications, and its potential role in both software maintenance and software vulnerability exploits. In this proposal, I present a framework for reconciling these opposing forces in event correlation via the use of privacy preservation integrated into the event processing framework. By integrating flexible privacy policies, we enable the correlation of organizations' data without actually releasing sensitive information. The framework supports both source anonymity and data privacy, yet allows for the time-based correlation of a broad variety of data. The framework is designed as a lightweight collection of components to enable integration with existing COTS platforms and distributed systems. I also present two different implementations of this framework: XUES (XML Universal Event Service), an event processor used as part of a software monitoring platform called KX (Kinesthetics eXtreme), and Worminator, a collaborative Intrusion Detection System. KX comprised a series of components, connected together with a publish-subscribe content-based routing event subsystem, for the autonomic software monitoring of complex distributed systems. Sensors were installed in legacy systems. XUES' two modules then performed event processing on sensor data: information was collected and processed by the Event Packager, and correlated using the Event Distiller. While XUES itself was not privacy-preserving, it laid the groundwork for this thesis by supporting event typing, the use of publish-subscribe and extensibility support via pluggable event transformation modules. Worminator, the second implementation, extends the XUES platform to fully support privacy-preserving event types and algorithms in the context of a Collaborative Intrusion Detection System (CIDS), whereby sensor alerts can be exchanged and corroborated--a reduced form of correlation that enables collaborative verification--without revealing sensitive information about a contributor's network, services, or even external sources as required. Worminator also fully anonymizes source information, allowing contributors to decide their preferred level of information disclosure. Worminator is implemented as a monitoring framework on top of a COTS IDS sensor, and demonstrably enables the detection of not only worms but also "broad and stealthy" scans; traditional single-network sensors either bury such scans in large volumes or miss them entirely. Worminator has been successfully deployed at 5 collaborating sites and work is under way to scale it up further. The contributions of this thesis include the development of a cross-application-domain event correlation framework with native privacy-preserving types, the use and validation of privacy-preserving corroboration, and the establishment of a practical deployed collaborative security system. I also outline the next steps in the thesis research plan, including the development of evaluation metrics to quantify Worminator's effectiveness at long-term scan detection, the overhead of privacy preservation and the effectiveness of our approach against adversaries, be they "honest-but-curious" or actively malicious. This thesis has broad future work implications, including privacy-preserving signature detection and distribution, distributed stealthy attacker profiling, and "application community"-based software vulnerability detection. | (pdf) |
Tractability of quasilinear problems. II: Second-order elliptic problems | A. G. Werschulz, H. Wozniakowski | 2005-11-01 | In a previous paper, we developed a general framework for establishing tractability and strong tractability for quasilinear multivariate problems in the worst case setting. One important example of such a problem is the solution of the Poisson equation $-\Delta u + qu = f$ in the $d$-dimensional unit cube, in which $u$ depends linearly on~$f$, but nonlinearly on~$q$. Here, both $f$ and~$q$ are $d$-variate functions from a reproducing kernel Hilbert space with finite-order weights of order~$\omega$. This means that, although~$d$ can be arbitrary large, $f$ and~$q$ can be decomposed as sums of functions of at most $\omega$~variables, with $\omega$ independent of~$d$. In this paper, we apply our previous general results to the Poisson equation, subject to either Dirichlet or Neumann homogeneous boundary conditions. We study both the absolute and normalized error criteria. For all four possible combinations of boundary conditions and error criteria, we show that the problem is \emph{tractable}. That is, the number of evaluations of $f$ and~$q$ needed to obtain an $\e$-approximation is polynomial in~$\e^{-1}$ and~$d$, with the degree of the polynomial depending linearly on~$\omega$. In addition, we want to know when the problem is \emph{strongly tractable}, meaning that the dependence is polynomial only in~$\e^{-1}$, independently of~$d$. We show that if the sum of the weights defining the weighted reproducing kernel Hilbert space is uniformly bounded in~$d$ and the integral of the univariate kernel is positive, then the Poisson equation is strongly tractable for three of the four possible combinations of boundary conditions and error criterion, the only exception being the Dirichlet boundary condition under the normalized error criterion. | (pdf) |
TCP-Friendly Rate Control with Token Bucket for VoIP Congestion Control | Miguel Maldonado, Salman Abdul Baset, Henning Schulzrinne | 2005-10-17 | TCP Friendly Rate Control (TFRC) is a congestion control algorithm that provides a smooth transmission rate for real-time network applications. TFRC refrains from halving the sending rate on every packet drop; instead, the rate is adjusted as a function of the loss rate during a single round trip time. TFRC has been proven to be fair when competing with TCP flows over congested links, but it lacks quality-of-service parameters to improve the performance of real-time traffic. A problem with TFRC is that it uses additive increase to adjust the sending rate during periods with no congestion. This leads to short-term congestion that can degrade the quality of voice applications. We propose two changes to TFRC that improve the performance of VoIP applications. Our implementation, TFRC with Token Bucket (TFRC-TB), uses discrete calculated bit rates based on audio codec bandwidth usage to increase the sending rate. Also, it uses a token bucket to control the sending rate during congestion periods. We have used ns2, the network simulator, to compare our implementation to TFRC in a wide range of network conditions. Our results suggest that TFRC-TB can provide a quality of service (QoS) mechanism to voice applications while competing fairly with other traffic over congested links. | (pdf) (ps) |
Performance and Usability Analysis of Varying Web Service Architectures | Michael Lenner, Henning Schulzrinne | 2005-10-14 | We tested the performance of four web application architectures, namely CGI, PHP, Java servlets, and Apache Axis SOAP. All four architectures implemented a series of typical web application tasks. Our findings indicate that PHP produced the smallest delay, while the SOAP implementation produced the largest. | (pdf) |
Square Root Propagation | Andrew Howard, Tony Jebara | 2005-10-07 | We propose a message propagation scheme for numerically stable inference in Gaussian graphical models which can otherwise be susceptible to errors caused by finite numerical precision. We adapt square root algorithms, popular in Kalman filtering, to graphs with arbitrary topologies. The method consists of maintaining potentials and generating messages that involve the square root of precision matrices. Combining this with the machinery of the junction tree algorithm leads to an efficient and numerically stable algorithm. Experiments are presented to demonstrate the robustness of the method to numerical errors that can arise in complex learning and inference problems. | (ps) |
Approximating the Reflection Integral as a Summation: Where did the delta go? | Aner Ben-Artzi | 2005-10-07 | In this note, I explore why the common approximation of the reflection integral is not written with a delta omega-in ($\Delta\omega_{in}$) to replace the differential omega-in ($d\omega_{in}$). After that, I go on to discover what really happens when the sum over all directions is reduced to a sum over a small number of directions. In the final section, I make recommendations for correctly approximating the reflection sum, and briefly suggest a possible framework for multiple importance sampling on both lighting and BRDF. | (pdf) |
DotSlash: Providing Dynamic Scalability to Web Applications with On-demand Distributed Query Result Caching | Weibin Zhao, Henning Schulzrinne | 2005-09-29 | Scalability poses a significant challenge for today's web applications, mainly due to the large population of potential users. To effectively address the problem of short-term dramatic load spikes caused by web hotspots, we developed a self-configuring and scalable rescue system called DotSlash. The primary goal of our system is to provide dynamic scalability to web applications by enabling a web site to obtain resources dynamically, and use them autonomically without any administrative intervention. To address the database server bottleneck, DotSlash allows a web site to set up on-demand distributed query result caching, which greatly reduces the database workload for read mostly databases, and thus increases the request rate supported at a DotSlash-enabled web site. The novelty of our work is that our query result caching is on demand, and operated based on load conditions. The caching remains inactive as long as the load is normal, but is activated once the load is heavy. This approach offers good data consistency during normal load situations, and good scalability with relaxed data consistency for heavy load periods. We have built a prototype system for the widely used LAMP configuration, and evaluated our system using the RUBBoS bulletin board benchmark. Experiments show that a DotSlash-enhanced web site can improve the maximum request rate supported by a factor of 5 using 8 rescue servers for the RUBBoS submission mix, and by a factor of 10 using 15 rescue servers for the RUBBoS read-only mix. | (pdf) (ps) |
The Pseudorandomness of Elastic Block Ciphers | Debra Cook, Moti Yung, Angelos Keromytis | 2005-09-28 | We investigate elastic block ciphers, a method for constructing variable length block ciphers, from a theoretical perspective. We view the underlying structure of an elastic block cipher as a network, which we refer to as an elastic network, and analyze the network in a manner similar to the analysis performed by Luby and Rackoff on Feistel networks. We prove that a three round elastic network is a pseudorandom permutation and a four round network is a strong pseudorandom permutation when the round functions are pseudorandom permutations. | (pdf) (ps) |
A General Analysis of the Security of Elastic Block Ciphers | Debra Cook, Moti Yung, Angelos Keromytis | 2005-09-28 | We analyze the security of elastic block ciphers in general to show that an attack on the elastic version of a block cipher implies a polynomial time related attack on the fixed-length version of the block cipher. We relate the security of the elastic version of a block cipher to the fixed-length version by forming a reduction between the versions. Our method is independent of the specific block cipher used. The results imply that if the fixed-length version of a block cipher is secure against attacks which attempt key recovery then the elastic version is also secure against such attacks. | (pdf) (ps) |
On Elastic Block Ciphers and Their Differential and Linear Cryptanalyses | Debra Cook, Moti Yung, Angelos Keromytis | 2005-09-28 | Motivated by applications such as databases with nonuniform field lengths, we introduce the concept of an elastic block cipher, a new approach to variable length block ciphers which incorporates fixed sized cipher components into a new network structure. Our scheme allows us to dynamically "stretch" the supported block size of a block cipher up to a length double the original block size, while increasing the computational workload proportionally to the block size. We show that traditional attacks against an elastic block cipher are impractical if the original cipher is secure. In this paper we focus on differential and linear attacks. Specifically, we employ an elastic version of Rijndael supporting block sizes of 128 to 256 bits as an example, and show it is resistant to both differential and linear attacks. In particular, employing a different method than what is employed in Rijndael design, we show that the probability of any differential characteristic for the elastic version of Rijndael is <= 2^-(block size). We further prove that both linear and nonlinear attacks are computationally infeasible for any elastic block cipher if the original cipher is not subject to such an attack and involves a block size for which an exhaustive plaintext search is computationally infeasible (as is the case for Rijndael). | (pdf) (ps) |
PachyRand: SQL Randomization for the PostgreSQL JDBC Driver | Michael Locasto, Angelos D. Keromytis | 2005-08-26 | Many websites are driven by web applications that deliver dynamic content stored in SQL databases. Such systems take input directly from the client via HTML forms. Without proper input validation, these systems are vulnerable to SQL injection attacks. The predominant defense against such attacks is to implement better input validation. This strategy is unlikely to succeed on its own. A better approach is to protect systems against SQL injection automatically and not rely on manual supervision or testing strategies (which are incomplete by nature). SQL randomization is a technique that defeats SQL injection attacks by transforming the language of SQL statements in a web application such that an attacker needs to guess the transformation in order to successfully inject his code. We present PachyRand, an extension to the PostgreSQL JDBC driver that performs SQL randomization. Our system is easily portable to most other JDBC drivers, has a small performance impact, and makes SQL injection attacks infeasible. | (pdf) (ps) |
Parsing Preserving Techniques in Grammar Induction | Smaranda Muresan | 2005-08-20 | In this paper we present the theoretical foundation of the search space for learning a class of constraint-based grammars, which preserve the parsing of representative examples. We prove that under several assumptions the search space is a complete grammar lattice, and the lattice top element is a grammar that can always be learned from a set of representative examples and a sublanguage used to reduce the grammar semantics. This complete grammar lattice guarantees convergence of solutions of any learning algorithm that obeys the given assumptions. | (pdf) (ps) |
Generic Models for Mobility Management in Next Generation Networks | Maria Luisa Cristofano, Andrea G. Forte, Henning Schulzrinne | 2005-08-08 | In the network community different mobility management techniques have been proposed over the years. However, many of these techniques share a surprisingly high number of similarities. In this technical report we analyze and evaluate the most relevant mobility management techniques, pointing out differences and similarities. For macro-mobility we consider Mobile IP (MIP), the Session Initiation Protocol (SIP) and mobility management techniques typical of a GSM network; for micro-mobility we describe and analyze several protocols such as: Hierarchical MIP, TeleMIP, IDMP, Cellular IP and HAWAII. | (pdf) |
Pointer Analysis for C Programs Through AST Traversal | Marcio Buss, Stephen Edwards, Bin Yao, Daniel Waddington | 2005-08-04 | We present a pointer analysis algorithm designed for source-to-source transformations. Existing techniques for pointer analysis apply a collection of inference rules to a dismantled intermediate form of the source program, making them difficult to apply to source-to-source tools that generally work on abstract syntax trees to preserve details of the source program. Our pointer analysis algorithm operates directly on the abstract syntax tree of a C program and uses a form of standard dataflow analysis to compute the desired points-to information. We have implemented our algorithm in a source-to-source translation framework and experimental results show that it is practical on real-world examples. | (pdf) |
Adaptive Interactive Internet Team Video | Dan Phung, Giuseppe Valetto, Gail Kaiser | 2005-08-04 | The increasing popularity of distance learning and online courses has highlighted the lack of collaborative tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources used by students. We present an e-Learning architecture and adaptation model called AI2TV (Adaptive Internet Interactive Team Video), a system that allows borderless, virtual students, possibly some or all disadvantaged in network resources, to collaboratively view a video in synchrony. AI2TV upholds the invariant that each student will view semantically equivalent content at all times. Video player actions, like play, pause and stop, can be initiated by any of the students and the results of those actions are seen by all the other students. These features allow group members to review a lecture video in tandem to facilitate the learning process. We show in experimental trials that our system can successfully synchronize video for distributed students while, at the same time, optimizing the video quality given actual (fluctuating) bandwidth by adaptively adjusting the quality level for each student. | (pdf) |
Tractability of Quasilinear Problems I: General Results | Arthur Werschulz, Henryk Wozniakowski | 2005-08-04 | The tractability of multivariate problems has usually been studied only for the approximation of linear operators. In this paper we study the tractability of quasilinear multivariate problems. That is, we wish to approximate nonlinear operators~$S_d(\cdot,\cdot)$ that depend linearly on the first argument and satisfy a Lipschitz condition with respect to both arguments. Here, both arguments are functions of $d$~variables. Many computational problems of practical importance have this form. Examples include the solution of specific Dirichlet, Neumann, and Schr\"odinger problems. We show, under appropriate assumptions, that quasilinear problems, whose domain spaces are equipped with product or finite-order weights, are tractable or strongly tractable in the worst case setting. This paper is the first part in a series of papers. Here, we present tractability results for quasilinear problems under general assumptions on quasilinear operators and weights. In future papers, we shall verify these assumptions for quasilinear problems such as the solution of specific Dirichlet, Neumann, and Schr\"odinger problems. | (pdf) |
Agnostically Learning Halfspaces | Adam Kalai, Adam Klivans, Yishay Mansour, Rocco A. Servedio | 2005-08-02 | We consider the problem of learning a halfspace in the agnostic framework of Kearns et al., where a learner is given access to a distribution on labelled examples but the labelling may be arbitrary. The learner's goal is to output a hypothesis which performs almost as well as the optimal halfspace with respect to future draws from this distribution. Although the agnostic learning framework does not explicitly deal with noise, it is closely related to learning in worst-case noise models such as malicious noise. We give the first polynomial-time algorithm for agnostically learning halfspaces with respect to several distributions, such as the uniform distribution over the $n$-dimensional Boolean cube {0,1}^n or unit sphere in n-dimensional Euclidean space, as well as any log-concave distribution in n-dimensional Euclidean space. Given any constant additive factor eps>0, our algorithm runs in poly(n) time and constructs a hypothesis whose error rate is within an additive eps of the optimal halfspace. We also show this algorithm agnostically learns Boolean disjunctions in time roughly 2^{\sqrt{n}} with respect to any distribution; this is the first subexponential-time algorithm for this problem. Finally, we obtain a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere which can tolerate the highest level of malicious noise of any algorithm to date. Our main tool is a polynomial regression algorithm which finds a polynomial that best fits a set of points with respect to a particular metric. We show that, in fact, this algorithm is an arbitrary-distribution generalization of the well known ``low-degree'' Fourier algorithm of Linial, Mansour, & Nisan and has excellent noise tolerance properties when minimizing with respect to the L_1 norm. We apply this algorithm in conjunction with a non-standard Fourier transform (which does not use the traditional parity basis) for learning halfspaces over the uniform distribution on the unit sphere; we believe this technique is of independent interest. | (pdf) (ps) |
Learning mixtures of product distributions over discrete domains | Jon Feldman, Ryan O'Donnell, Rocco A. Servedio | 2005-07-28 | We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. We give a $\poly(n/\eps)$ time algorithm for learning a mixture of $k$ arbitrary product distributions over the $n$-dimensional Boolean cube $\{0,1\}^n$ to accuracy $\eps$, for any constant $k$. Previous polynomial time algorithms could only achieve this for $k = 2$ product distributions; our result answers an open question stated independently by Cryan and by Freund and Mansour. We further give evidence that no polynomial time algorithm can succeed when $k$ is superconstant, by reduction from a notorious open problem in PAC learning. Finally, we generalize our $\poly(n/\eps)$ time algorithm to learn any mixture of $k = O(1)$ product distributions over $\{0,1, \dots, b\}^n$, for any $b = O(1)$. | (pdf) (ps) |
Incremental Algorithms for Inter-procedural Analysis of Safety Properties | Christopher L. Conway, Kedar Namjoshi, Dennis Dams, Stephen A. Edwards | 2005-07-10 | Automaton-based static program analysis has proved to be an effective tool for bug finding. Current tools generally re-analyze a program from scratch in response to a change in the code, which can result in much duplicated effort. We present an inter-procedural algorithm that analyzes incrementally in response to program changes and present experiments for a null-pointer dereference analysis. It shows a substantial speed-up over re-analysis from scratch, with a manageable amount of disk space used to store information between analysis runs. | (pdf) (ps) |
Lexicalized Well-Founded Grammars: Learnability and Merging | Smaranda Muresan, Tudor Muresan, Judith Klavans | 2005-06-30 | This paper presents the theoretical foundation of a new type of constraint-based grammars, Lexicalized Well-Founded Grammars, which are adequate for modeling human language and are learnable. These features make the grammars suitable for developing robust and scalable natural language understanding systems. Our grammars capture both syntax and semantics and have two types of constraints at the rule level: one for semantic composition and one for ontology-based semantic interpretation. We prove that these grammars can always be learned from a small set of semantically annotated, ordered representative examples, using a relational learning algorithm. We introduce a new semantic representation for natural language, which is suitable for an ontology-based interpretation and allows us to learn the compositional constraints together with the grammar rules. Besides the learnability results, we give a principle for grammar merging. The experiments presented in this paper show promising results for the adequacy of these grammars in learning natural language. Relatively simple linguistic knowledge is needed to build the small set of semantically annotated examples required for the grammar induction. | (pdf) (ps) |
A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems | Giuseppe Valetto, Gail Kaiser, Dan Phung | 2005-06-05 | Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back end enactment of runtime adaptations -- but usually employing an effector technology peculiar to one type of target system. While completely generic "one size fits all" effector technologies seem implausible, we propose a general-purpose programming model and interaction layer that abstracts away from the peculiarities of target-specific effectors, enabling a uniform approach to controlling and coordinating the low-level execution of reconfigurations, repairs, micro-reboots, etc. | (pdf) |
The Appearance of Human Skin | Takanori Igarashi, Ko Nishino, Shree K. Nayar | 2005-05-31 | Skin is the outer most tissue of the human body. As a result, people are very aware of, and very sensitive to, the appearance of their skin. Consequently, skin appearance has been a subject of great interest in various fields of science and technology. Research on skin appearance has been intensely pursued in the fields of medicine, cosmetology, computer graphics and computer vision. Since the goals of these fields are very different, each field has tended to focus on specific aspects of the appearance of skin. The goal of this work is to present a comprehensive survey that includes the most prominent results related to skin in these different fields and show how these seemingly disconnected studies are related to one another. | (pdf) |
Time-Varying Textures | Sebastian Enrique, Melissa Koudelka, Peter Belhumeur, Julie Dorsey, Shree Nayar, Ravi Ramamoorthi | 2005-05-25 | Essentially all computer graphics rendering assumes that the reflectance and texture of surfaces is a static phenomenon. Yet, there is an abundance of materials in nature whose appearance varies dramatically with time, such as cracking paint, growing grass, or ripening banana skins. In this paper, we take a significant step towards addressing this problem, investigating a new class of time-varying textures. We make three contributions. First, we describe the carefully controlled acquisition of datasets of a variety of natural processes including the growth of grass, the accumulation of snow, and the oxidation of copper. Second, we show how to adapt quilting-based methods to time-varying texture synthesis, addressing the important challenges of maintaining temporal coherence, efficient synthesis on large time-varying datasets, and reducing visual artifacts specific to time-varying textures. Finally, we show how simple procedural techniques can be used to control the evolution of the results, such as allowing for a faster growth of grass in well lit (as opposed to shadowed) areas. | (pdf) (ps) |
Merging Globally Rigid Formations of Mobile Autonomous Agents | Tolga Eren, Brian Anderson, Walter Whiteley, Stephen Morse, Peter Belhumeur | 2005-05-19 | This paper is concerned with merging globally rigid formations of mobile autonomous agents. A key element in all future multi-agent systems will be the role of sensor and communication networks as an integral part of coordination. Network topologies are critically important for autonomous systems involving mobile underwater, ground and air vehicles and for sensor networks. This paper focuses on developing techniques and strategies for the analysis and design of sensor and network topologies required to merge globally rigid formations for cooperative tasks. Central to the development of these techniques and strategies will be the use of tools from rigidity theory, and graph theory. | (pdf) |
Optimal State-Free, Size-aware Dispatching for Heterogeneous $M/G/$-type systems | Hanhua Feng, Vishal Misra, Dan Rubenstein | 2005-05-04 | We consider a cluster of heterogeneous servers, modeled as $M/G/1$ queues with different processing speeds. The scheduling policies for these servers can be either processor-sharing or first-come first-serve. Furthermore, a dispatcher that assigns jobs to the servers takes as input only the size of the arriving job and the overall job-size distribution. This general model captures the behavior of a variety of real systems, such as web server clusters. Our goal is to identify assignment strategies that the dispatcher can use to minimize expected completion time and waiting time. We show that there exist optimal strategies that are deterministic, fixing the server to which jobs of particular sizes are always sent. We prove that the optimal strategy for systems with identical servers assigns a non-overlapping interval range of job sizes to each server. We then prove that when server processing speeds differ, it is necessary to assign each server a distinct set of intervals of job sizes in order to minimize expected waiting or response times. We explore some of the practical challenges of identifying the optimal strategy, and also study a related problem in our model: how to provision server processing speeds to minimize waiting and completion time, given a job size distribution and fixed aggregate processing power. | (pdf) (ps) |
A Hybrid Approach to Topological Mobile Robot Localization | Paul Blaer, Peter K. Allen | 2005-04-27 | We present a hybrid method for localizing a mobile robot in a complex environment. The method combines the use of multiresolution histograms with a signal strength analysis of existing wireless networks. We tested this localization procedure on the campus of Columbia University with our mobile robot, the Autonomous Vehicle for Exploration and Navigation of Urban Environments. Our results indicate that localization accuracy is significantly improved when five levels of resolution are used instead of one in color histogramming. We also find that incorporating wireless signal strengths into the method further improves reliability and helps to resolve ambiguities which arise when different regions have similar visual appearances. | (pdf) |
Classical and Quantum Complexity of the Sturm-Liouville Eigenvalue Problem | A. Papageorgiou, H. Wozniakowski | 2005-04-22 | We study the approximation of the smallest eigenvalue of a Sturm-Liouville problem in the classical and quantum settings. We consider a univariate Sturm-Liouville eigenvalue problem with a nonnegative function $q$ from the class $C^2([0,1])$ and study the minimal number $n(\e)$ of function evaluations or queries that are necessary to compute an $\e$-approximation of the smallest eigenvalue. We prove that $n(\e)=\Theta(\e^{-1/2})$ in the (deterministic) worst case setting, and $n(\e)=\Theta(\e^{-2/5})$ in the randomized setting. The quantum setting offers a polynomial speedup with {\it bit} queries and an exponential speedup with {\it power} queries. Bit queries are similar to the oracle calls used in Grover's algorithm appropriately extended to real valued functions. Power queries are used for a number of problems including phase estimation. They are obtained by considering the propagator of the discretized system at a number of different time moments. They allow us to use powers of the unitary matrix $\exp(\tfrac12 {\rm i}M)$, where $M$ is an $n\times n$ matrix obtained from the standard discretization of the Sturm-Liouville differential operator. The quantum implementation of power queries by a number of elementary quantum gates that is polylog in $n$ is an open issue. | (pdf) (ps) |
Improving Database Performance on Simultaneous Multithreading Processors | Jingren Zhou, John Cieslewicz, Kenneth A. Ross, Mihir Shah | 2005-04-18 | Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based techniques to exploit SMT architectures on memory-resident data. First, we consider running independent operations in separate threads, a technique applied to conventional multiprocessor systems. Second, we describe a novel implementation strategy in which individual operators are implemented in a multi-threaded fashion. Finally, we introduce a new data-structure called a work-ahead set that allows us to use one of the threads to aggressively preload data into the cache for use by the other thread. We evaluate each method with respect to its performance, implementation complexity, and other measures. We also provide guidance regarding when and how to best utilize the various threading techniques. Our experimental results show that by taking advantage of SMT technology we achieve a 30\% to 70\% improvement in throughput over single threaded implementations on in-memory database operations. | (pdf) |
Quantum algorithms and complexity for certain continuous and related discrete problems | Marek Kwas | 2005-04-14 | The thesis contains an analysis of two computational problems. The first problem is discrete quantum Boolean summation. This problem is a building block of quantum algorithms for many continuous problems, such as integration, approximation, differential equations and path integration. The second problem is continuous multivariate Feynman-Kac path integration, which is a special case of path integration. The quantum Boolean summation problem can be solved by the quantum summation (QS) algorithm of Brassard, Høyer, Mosca and Tapp, which approximates the arithmetic mean of a Boolean function. We improve the error bound of Brassard et al. for the worst-probabilistic setting. Our error bound is sharp. We also present new sharp error bounds in the average-probabilistic and worst-average settings. Our average-probabilistic error bounds prove the optimality of the QS algorithm for a certain choice of its parameters. The study of the worst-average error shows that the QS algorithm is not optimal in this setting; we need to use a certain number of repetitions to regain its optimality. The multivariate Feynman-Kac path integration problem for smooth multivariate functions suffers from the provable curse of dimensionality in the worst-case deterministic setting, i.e., the minimal number of function evaluations needed to compute an approximation depends exponentially on the number of variables. We show that in both the randomized and quantum settings the curse of dimensionality is vanquished, i.e., the minimal number of function evaluations and/or quantum queries required to compute an approximation depends only polynomially on the reciprocal of the desired accuracy and has a bound independent of the number of variables. The exponents of these polynomials are 2 in the randomized setting and 1 in the quantum setting. These exponents can be lowered at the expense of the dependence on the number of variables. Hence, the quantum setting yields exponential speedup over the worst-case deterministic setting, and quadratic speedup over the randomized setting. | (pdf) |
A Hybrid Hierarchical and Peer-to-Peer Ontology-based Global Service Discovery System | Knarig Arabshian, Henning Schulzrinne | 2005-04-06 | Current service discovery systems do not scale to global coverage, and they rely on simple attribute-value pairs or interface matching for service description and querying. We propose a global service discovery system, GloServ, that uses the description logic Web Ontology Language (OWL DL). The GloServ architecture spans both local and wide area networks. It maps knowledge obtained from the service classification ontology to a structured peer-to-peer network such as a Content Addressable Network (CAN). GloServ also performs automated and intelligent registration and querying by exploiting the logical relationships within the service ontologies. | (pdf) (ps) |
Multi-Language Edit-and-Continue for the Masses | Marc Eaddy, Steven Feiner | 2005-04-05 | We present an Edit-and-Continue implementation that allows regular source files to be treated like interactively updatable, compiled scripts, coupling the speed of compiled native machine code with the ability to make changes without restarting. Our implementation is based on the Microsoft .NET Framework and allows applications written in any .NET language to be dynamically updatable. Our solution works with the standard version of the Microsoft Common Language Runtime, and does not require a custom compiler or runtime. Because no application changes are needed, it is transparent to the application developer. The runtime overhead of our implementation is low enough to support updating real-time applications (e.g., interactive 3D graphics applications). | (pdf) (ps) |
Similarity-based Multilingual Multi-Document Summarization | David Kirk Evans, Kathleen McKeown, Judith L. Klavans | 2005-03-31 | We present a new approach for summarizing clusters of documents on the same event, some of which are machine translations of foreign-language documents and some of which are English. Our approach to multilingual multi-document summarization uses text similarity to choose sentences from English documents based on the content of the machine translated documents. A manual evaluation shows that 68\% of the sentence replacements improve the summary, and the overall summarization approach outperforms first-sentence extraction baselines in automatic ROUGE-based evaluations. | (pdf) (ps) |
802.11b Throughput with Link Interference | Hoon Chang, Vishal Misra | 2005-03-29 | IEEE 802.11 MAC is a CSMA/CA protocol and uses RTS/CTS exchanges to avoid the hidden terminal problem. Recent findings have revealed that the carrier-sensing range set in current major implementations does not detect and prevent all interference signals, even when the RTS/CTS access method is used. In this paper, we investigate the effect of interference and develop a mathematical model for it. We demonstrate that the 802.11 DCF does not act properly on an interference-prone channel because of the small size and the exponential growth of backoff windows. The accuracy of our model is verified via simulations. Based on an insight from our model, we present a simple protocol that operates on top of the 802.11 MAC layer and achieves higher throughput than rate-adjustment schemes. | (pdf) (ps) |
A Lower Bound for Quantum Phase Estimation | Arvid J. Bessen | 2005-03-22 | We obtain a query lower bound for quantum algorithms solving the phase estimation problem. Our analysis generalizes existing lower bound approaches to the case where the oracle Q is given by controlled powers Q^p of Q, as is the case, for example, in Shor's order-finding algorithm. In this setting we prove a log(1/epsilon) lower bound for the number of applications of Q^p1, Q^p2, ... This bound is tight due to a matching upper bound. We obtain the lower bound using a new technique based on frequency analysis. | (pdf) (ps) |
The Power of Various Real-Valued Quantum Queries | Arvid J. Bessen | 2005-03-22 | The computation of combinatorial and numerical problems on quantum computers is often much faster than on a classical computer, measured in the number of queries. A query is a procedure by which the quantum computer gains information about the specific problem. Different query definitions have been given, and our aim is to review them and to show that these definitions are not equivalent. To achieve this result we study the simulation and approximation of one query type by another. While approximation is easy in one direction, we show that it is hard in the other direction by proving a lower bound on the number of queries needed in the simulation. The main tool in this lower bound proof is a relationship between quantum algorithms and trigonometric polynomials that we establish. | (pdf) (ps) |
Rigid Formations with Leader-Follower Architecture | Tolga Eren, Walter Whiteley, Peter N. Belhumeur | 2005-03-14 | This paper is concerned with information structures used in rigid formations of autonomous agents that have a leader-follower architecture. The focus of this paper is on sensor/network topologies to secure control of rigidity. We extend our previous approach for formations with symmetric neighbor relations to include formations with a leader-follower architecture. Necessary and sufficient conditions for stably rigid directed formations are given, including both cyclic and acyclic directed formations. Some useful steps for creating topologies of directed rigid formations are developed. An algorithm to determine the directions of links to create stably rigid directed formations from rigid undirected formations is presented. It is shown that k-cycles (k > 2) do not cause inconsistencies when measurements are noisy, while 2-cycles do. Simulation results are presented for (i) a rigid acyclic formation, (ii) a flexible formation, and (iii) a rigid formation with cycles. | (pdf) (ps) |
P2P Video Synchronization in a Collaborative Virtual Environment | Suhit Gupta, Gail Kaiser | 2005-02-25 | We have previously developed a collaborative virtual environment (CVE) for small-group virtual classrooms, intended for distance learning by geographically dispersed students. The CVE employs a peer-to-peer approach to the frequent real-time updates to the 3D virtual worlds required by avatar movements (fellow students in the same room are depicted by avatars). This paper focuses on our extension to the P2P model to support group viewing of lecture videos, called VECTORS, for Video Enhanced Collaboration for Team Oriented Remote Synchronization. VECTORS supports synchronized viewing of lecture videos, so the students all see "the same thing at the same time", and can pause, rewind, etc. in synchrony while discussing the lecture material via "chat". We are particularly concerned with the needs of the technologically disenfranchised, e.g., those whose only Web/Internet access is via dialup or other relatively low-bandwidth networking. Thus VECTORS employs semantically compressed videos with meager bandwidth requirements. Further, the videos are displayed as a sequence of JPEGs on the walls of a 3D virtual room, requiring fewer local multimedia resources than full-motion MPEGs. | (pdf) |
A Study on NSIS Interaction with Internet Route Changes | Charles Shen, Henning Schulzrinne, Sung-Hyuck Lee, Jong Ho Bang | 2005-02-24 | Design of Next Step In Signaling (NSIS) protocol and IP routing interaction requires a good understanding of today's Internet routing behavior. In this report we present a routing measurement experiment to characterize current Internet dynamics, including routing pathology, routing prevalence and routing persistence. The focus of our study is route change. We look at the types, duration and likely causes of different route changes and discuss their impact on the design of NSIS. We also review common route change detection methods and investigate rules for determining whether a route change in a node's forward-looking or backward-looking direction is detectable. We introduce typical NSIS deployment models and discuss the specific categories of route changes that should be considered in each of these models. With the NSIS deployment models in mind, we further give an experimental evaluation of two route change detection methods: the packet TTL monitoring method and a new delay variation monitoring method. | (pdf) (ps) |
Adding Self-healing capabilities to the Common Language Runtime | Rean Griffith, Gail Kaiser | 2005-02-23 | Self-healing systems require that repair mechanisms are available to resolve problems that arise while the system executes. Managed execution environments such as the Common Language Runtime (CLR) and Java Virtual Machine (JVM) provide a number of application services (application isolation, security sandboxing, garbage collection and structured exception handling) which are geared primarily at making managed applications more robust. However, none of these services directly enables applications to perform repairs or consistency checks of their components. From a design and implementation standpoint, the preferred way to enable repair in a self-healing system is to use an externalized repair/adaptation architecture rather than hardwiring adaptation logic inside the system, where it is harder to analyze, reuse and extend. We present a framework that allows a repair engine to dynamically attach to and detach from a managed application while it executes, essentially adding repair mechanisms as another application service provided in the execution environment. | (pdf) (ps) |
Manipulating Managed Execution Runtimes to Support Self-Healing Systems | Rean Griffith, Gail Kaiser | 2005-02-23 | Self-healing systems require that repair mechanisms are available to resolve problems that arise while the system executes. Managed execution environments such as the Common Language Runtime (CLR) and Java Virtual Machine (JVM) provide a number of application services (application isolation, security sandboxing, garbage collection and structured exception handling) which are geared primarily at making managed applications more robust. However, none of these services directly enables applications to perform repairs or consistency checks of their components. From a design and implementation standpoint, the preferred way to enable repair in a self-healing system is to use an externalized repair/adaptation architecture rather than hardwiring adaptation logic inside the system, where it is harder to analyze, reuse and extend. We present a framework that allows a repair engine to dynamically attach to and detach from a managed application while it executes, essentially adding repair mechanisms as another application service provided in the execution environment. | (pdf) (ps) |
Genre Classification of Websites Using Search Engine Snippets | Suhit Gupta, Gail Kaiser, Salvatore Stolfo, Hila Becker | 2005-02-03 | Web pages often contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from the actual content. Automatic extraction of "useful and relevant" content from web pages has many applications, including browsing on small cell phone and PDA screens, speech rendering for the visually impaired, and reducing noise for information retrieval systems. Prior work has led to the development of Crunch, a framework which employs various heuristics in the form of filters and filter settings for content extraction. Crunch allows users to tune these settings, essentially the thresholds for applying each filter. However, in order to reduce human involvement in selecting these heuristic settings, we have extended this work to utilize a website's classification, defined by its genre and physical layout. In particular, Crunch obtains the settings for a previously unknown website by automatically classifying it as sufficiently similar to a cluster of known websites with previously adjusted settings, which in practice produces better content extraction results than a single one-size-fits-all set of setting defaults. In this paper, we present our approach to clustering a large corpus of websites by their genre, utilizing the snippets generated by sending the website's domain name to search engines as well as the website's own text. We find that exploiting these snippets not only increases the frequency of function words that directly assist in detecting the genre of a website, but also allows for easier clustering of websites. We use existing techniques, such as the Manhattan distance measure and hierarchical clustering, with some modifications, to pre-classify websites into genres. Our clustering method does not require prior knowledge of the set of genres that websites fit into, but instead discovers these relationships among websites. Subsequently, we are able to classify newly encountered websites in linear time, and then apply the corresponding filter settings, with no noticeable delay introduced by the content-extracting web proxy. | (pdf) (ps) |
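To make the clustering step described above concrete, here is a minimal sketch using the Manhattan (cityblock) metric and hierarchical clustering from SciPy. The snippet texts, vocabulary handling, and cluster count are invented for illustration; this is not the Crunch implementation or the authors' modified procedure.

```python
"""Illustrative sketch only: cluster sites by the text of their search-engine
snippets using Manhattan distance and hierarchical clustering."""
from collections import Counter
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

snippets = {                      # hypothetical snippet text per site
    "news-a.example": "breaking news politics world headlines",
    "news-b.example": "world news headlines politics sports",
    "shop-a.example": "buy cheap shoes free shipping sale",
}

# Bag-of-words vectors over a shared vocabulary (function words included,
# since the abstract notes they help signal genre).
vocab = sorted({w for text in snippets.values() for w in text.split()})
rows = [[Counter(text.split())[w] for w in vocab] for text in snippets.values()]

dists = pdist(rows, metric="cityblock")              # Manhattan distance
tree = linkage(dists, method="average")              # hierarchical clustering
labels = fcluster(tree, t=2, criterion="maxclust")   # cut into 2 genre clusters

for site, label in zip(snippets, labels):
    print(site, "-> cluster", label)
```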
A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems | Giuseppe Valetto, Gail Kaiser | 2005-01-30 | Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back end enactment of runtime adaptations, but usually employing an effector technology peculiar to one type of target system. While completely generic ``one size fits all'' effector technologies seem implausible, we propose a general-purpose programming model and interaction layer that abstracts away from the peculiarities of target-specific effectors, enabling a uniform approach to controlling and coordinating the low-level execution of reconfigurations, repairs, micro-reboots, etc. | (pdf) |
Dynamic Adaptation of Rules for Temporal Event Correlation in Distributed Systems | Rean Griffith, Joseph L. Hellerstein, Yixin Diao, Gail Kaiser | 2005-01-30 | Event correlation is essential to realizing self-managing distributed systems. For example, distributed systems often require that events be correlated from multiple systems using temporal patterns to detect denial of service attacks and to warn of problems with business critical applications that run on multiple servers. This paper addresses how to specify timer values for temporal patterns so as to manage the trade-off between false alarms and undetected alarms. A central concern is addressing the variability of event propagation delays due to factors such as contention for network and server resources. To this end, we develop an architecture and an adaptive control algorithm that dynamically compensate for variations in propagation delays. Our approach makes Management Stations more autonomic by avoiding the need for manual adjustments of timer values in temporal rules. Further, studies we conducted of a testbed system suggest that our approach produces results that are at least as good as an optimal fixed setting of timer values. | (pdf) |
The Virtual Device: Expanding Wireless Communication Services Through Service Discovery and Session Mobility | Ron Shacham, Henning Schulzrinne, Srisakul Thakolsri, Wolfgang Kellerer | 2005-01-12 | We present a location-based, ubiquitous service architecture, based on the Session Initiation Protocol (SIP) and a service discovery protocol that enables users to enhance the multimedia communications services available on their mobile devices by discovering other local devices, and including them in their active sessions, creating a "virtual device." We have implemented our concept based on Columbia University's multimedia environment and we show its feasibility by a performance analysis. | (pdf) (ps) |
Autonomic Control for Quality Collaborative Video Viewing | Dan Phung, Giuseppe Valetto, Gail Kaiser | 2004-12-31 | We present an autonomic controller for quality collaborative video viewing, which allows groups of geographically dispersed users with different network and computer resources to view a video in synchrony while optimizing the video quality experienced. The autonomic controller is used within a tool for enhancing distance learning with synchronous group review of online multimedia material. The autonomic controller monitors video state at the clients' end, and adapts the quality of the video according to the resources of each client in (soft) real time. Experimental results show that the autonomic controller successfully synchronizes video for small groups of distributed clients and, at the same time, enhances the video quality experienced by users, in conditions of fluctuating bandwidth and variable frame rate. | (pdf) |
Sequential Challenges in Synthesizing Esterel | Cristian Soviani, Jia Zeng, Stephen A. Edwards | 2004-12-20 | State assignment is a formidable task. Designs written in a hardware description language such as Esterel inherently carry more high-level information than a register-transfer-level model, and such information can be used to guide the encoding process. The question arises whether the high-level information alone is strong enough to suggest an efficient state assignment, allowing low-level details to be ignored. This report suggests that, given Esterel's flexibility, most of the optimization potential is not within the high-level structure. It appears that effective state assignment cannot rely solely on high-level information. | (pdf) (ps) |
Determining Interfaces using Type Inference | Stephen A. Edwards, Chun Li | 2004-12-20 | Porting software usually requires understanding what library functions the program being ported uses since this functionality must be either found or reproduced in the ported program's new environment. This is usually done manually through code inspections. We propose a type inference algorithm able to infer basic information about the library functions a particular C program uses in the absence of declaration information for the library (e.g., without header files). Based on a simple but efficient inference algorithm, we were able to infer declarations for much of the PalmOS API from the source of a twenty-seven-thousand-line C program. Such a tool will aid in the problem of program understanding when porting programs, especially from poorly-documented or lost legacy environments. | (pdf) (ps) |
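The following toy sketch illustrates the general idea of recovering a library function's arity and rough argument kinds from its call sites. It is not the paper's inference algorithm, and the PalmOS-style call sites shown are invented for illustration.

```python
"""Toy illustration (not the paper's algorithm): guess the arity and rough
argument kinds of undeclared library functions from their call sites."""

# Hypothetical call sites extracted from a C program: (function, argument kinds)
call_sites = [
    ("FrmGetActiveForm", []),
    ("FrmDrawForm", ["pointer"]),
    ("StrPrintF", ["pointer", "pointer", "int"]),
    ("StrPrintF", ["pointer", "pointer", "pointer"]),
]

def join(kind_a, kind_b):
    """Least upper bound of two rough argument kinds."""
    return kind_a if kind_a == kind_b else "unknown"

inferred = {}
for name, args in call_sites:
    if name not in inferred:
        inferred[name] = list(args)
    elif len(inferred[name]) != len(args):    # conflicting arity: flag it
        inferred[name] = ["varargs-or-conflict"]
    else:                                     # merge argument kinds per position
        inferred[name] = [join(a, b) for a, b in zip(inferred[name], args)]

for name, sig in inferred.items():
    print(f"{name}({', '.join(sig) if sig else 'void'})")
```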
Remotely Keyed CryptoGraphics - Secure Remote Display Access Using (Mostly) Untrusted Hardware - Extended Version | Debra L. Cook, Ricardo Baratto, Angelos D. Keromytis | 2004-12-11 | Software that covertly monitors user actions, also known as {\it spyware,} has become a first-level security threat due to its ubiquity and the difficulty of detecting and removing it. Such software may be inadvertently installed by a user that is casually browsing the web, or may be purposely installed by an attacker or even the owner of a system. This is particularly problematic in the case of utility computing, early manifestations of which are Internet cafes and thin-client computing. Traditional trusted computing approaches offer a partial solution to this by significantly increasing the size of the trusted computing base (TCB) to include the operating system and other software. We examine the problem of protecting a user accessing specific services in such an environment. We focus on secure video broadcasts and remote desktop access when using any convenient, and often untrusted, terminal as two example applications. We posit that, at least for such applications, the TCB can be confined to a suitably modified graphics processing unit (GPU). Specifically, to prevent spyware on untrusted clients from accessing the user's data, we restrict the boundary of trust to the client's GPU by moving image decryption into GPUs. We use the GPU in order to leverage existing capabilities as opposed to designing a new component from scratch. We discuss the applicability of GPU-based decryption in these two sample scenarios and identify the limitations of the current generation of GPUs. We propose straightforward modifications to future GPUs that will allow the realization of the full approach. | (pdf) (ps) |
Obstacle Avoidance and Path Planning Using a Sparse Array of Sonars | Matei Ciocarlie | 2004-12-08 | This paper proposes an exploration method for robots equipped with a set of sonar sensors that does not allow for complete coverage of the robot's close surroundings. In such cases, there is a high risk of collision with possible undetected obstacles. The proposed method, adapted for use in urban outdoors environments, minimizes such risks while guiding the robot towards a predefined target location. During the process, a compact and accurate representation of the environment can be obtained. | (pdf) |
End System Service Examples | Xiaotao Wu, Henning Schulzrinne | 2004-12-07 | This technical report investigates services suitable for end systems. We look into ITU Q.1211 services, AT&T 5ESS switch services, services defined in CSTA Phase III, and new services integrating other Internet services, such as presence information. We also explore how to use the Language for End System Services (LESS) to program the services. | (pdf) (ps) |
WebPod: Persistent Web Browsing Sessions with Pocketable Storage Devices | Shaya Potter, Jason Nieh | 2004-11-19 | We present WebPod, a portable device for managing web browsing sessions. WebPod leverages capacity improvements in portable solid state memory devices to provide a consistent environment to access the web. WebPod provides a thin virtualization layer that decouples a user's web session from any particular end-user device, allowing users freedom to move their work environments around. We have implemented a prototype in Linux that works with existing unmodified applications and operating system kernels. Our experimental results demonstrate that WebPod has very low virtualization overhead and can provide a full featured web browsing experience, including support for all helper applications and plug-ins one expects. WebPod is able to efficiently migrate a user's web session. This enables improved user mobility while maintaining a consistent work environment. | (pdf) |
Design and Verification Languages | Stephen A. Edwards | 2004-11-17 | After a few decades of research and experimentation, register-transfer dialects of two standard languages---Verilog and VHDL---have emerged as the industry standard starting point for automatic large-scale digital integrated circuit synthesis. Writing RTL descriptions of hardware remains a largely human process and hence the clarity, precision, and ease with which such descriptions can be coded correctly has a profound impact on the quality of the final product and the speed with which the design can be created. While the efficiency of a design (e.g., the speed at which it can run or the power it consumes) is obviously important, its correctness is usually the paramount issue, consuming the majority of the time (and hence money) spent during the design process. In response to this challenge, a number of so-called verification languages have arisen. These have been designed to assist in a simulation-based or formal verification process by providing mechanisms for checking temporal properties, generating pseudorandom test cases, and for checking how much of a design's behavior has been exercised by the test cases. Through examples and discussion, this report describes the two main design languages---VHDL and Verilog---as well as SystemC, a language currently used to build large simulation models; SystemVerilog, a substantial extension of Verilog; and OpenVera, e, and PSL, the three leading contenders for becoming the main verification language. | (pdf) (ps) |
Extracting Context To Improve Accuracy For HTML Content Extraction | Suhit Gupta, Gail Kaiser, Salvatore Stolfo | 2004-11-08 | Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, reducing noise for information retrieval systems and to generally improve the web browsing experience. In our previous work [16], we developed a framework that employed an easily extensible set of techniques that incorporated results from our earlier work on content extraction [16]. Our insight was to work with DOM trees, rather than raw HTML markup. We present here filters that reduce human involvement in applying heuristic settings for websites and instead automate the job by detecting and utilizing the physical layout and content genre of a given website. We also present work we have done towards improving the usability and performance of our content extraction proxy as well as the quality and accuracy of the heuristics that act as filters for inferring the context of a webpage. | (pdf) (ps) |
Peer-to-Peer Internet Telephony using SIP | Kundan Singh, Henning Schulzrinne | 2004-10-31 | P2P systems inherently have high scalability, robustness and fault tolerance because there is no centralized server and the network self-organizes itself. This is achieved at the cost of higher latency for locating the resources of interest in the P2P overlay network. Internet telephony can be viewed as an application of P2P architecture where the participants form a self-organizing P2P overlay network to locate and communicate with other participants. We propose a pure P2P architecture for the Session Initiation Protocol (SIP)-based IP telephony systems. Our P2P-SIP architecture supports basic user registration and call setup as well as advanced services such as offline message delivery, voice/video mails and multi-party conferencing. We also provide an overview of practical challenges for P2P-SIP such as firewall, Network Address Translator (NAT) traversal and security. | (pdf) (ps) |
Service Learning in Internet Telephony | Xiaotao Wu, Henning Schulzrinne | 2004-10-29 | Internet telephony can introduce many novel communication services; however, novelty puts a learning burden on users. It would be a great help to users if their desired services could be created automatically. We developed an intelligent communication service creation environment that handles automatic service creation by learning from users' daily communication behaviors. The service creation environment models communication services as decision trees and uses the Incremental Tree Induction (ITI) algorithm for decision tree learning. We use Language for End System Services (LESS) scripts to represent learned results and implemented a simulation environment to verify the learning algorithm. We also noticed that when users get their desired services, they may not be aware of unexpected behaviors that the services could introduce, for example, mistakenly rejecting expected calls. In this paper, we also present a comprehensive analysis of communication service fail-safe handling and propose several approaches to create fail-safe services. | (pdf) (ps) |
A Microrobotic System For Protein Streak Seeding | Atanas Georgiev, Peter K. Allen, Ting Song, Andrew Laine, William Edstrom, John Hunt | 2004-10-28 | We present a microrobotic system for protein crystal micromanipulation tasks. The focus in this report is on a task called streak seeding, which is used by crystallographers to entice certain protein crystals to grow. Our system features a set of custom designed micropositioner end-effectors we call microshovels to replace traditional tools used by crystallographers for this task. We have used micro-electrical mechanical system (MEMS) techniques to design and manufacture various shapes and quantities of microshovels. Visual feedback from a camera mounted on the microscope is used to control the micropositioner as it lowers a microshovel into the liquid containing the crystals for poking and streaking. We present experimental results that illustrate the applicability of our approach. | (pdf) |
Preventing Spam For SIP-based Instant Messages and Sessions | Kumar Srivastava, Henning Schulzrinne | 2004-10-28 | As IP telephony becomes more widely deployed and used, telemarketers and other spammers are bound to start using SIP-based calls and instant messages as a medium for sending spam. As is evident from the fate of email, protection against spam has to be built into SIP systems; otherwise, they are bound to fall prey to spam. Traditional approaches used to prevent spam in email, such as content-based filtering and access lists, are not applicable to SIP calls and instant messages in their present form. We propose Domain-based Authentication and Policy-Enforced for SIP (DAPES): a system that can be easily implemented and deployed in existing SIP networks. Our system is capable of determining in real time whether an incoming call or instant message is likely to be spam, while at the same time supporting communication between both known and unknown parties. DAPES includes the deployment of reputation systems in SIP networks to enable real-time transfer of reputation information between parties, allowing communication between entities unknown to each other. | (pdf) (ps) |
Programmable Conference Server | Henning Schulzrinne, Kundan Singh, Xiaotao Wu | 2004-10-15 | Conferencing services for Internet telephony and multimedia can be enhanced by the integration of other Internet services, such as instant messaging, presence notification, directory lookups, location sensing, email and web. These services require a service programming architecture that can easily incorporate new Internet services into the existing conferencing functionalities, such as voice-enabled conference control. W3C has defined the Call Control eXtensible Markup Language (CCXML), along with its VoiceXML, for telephony call control services in a point-to-point call. However, it cannot handle other Internet service events such as presence enabled conferences. In this paper, we propose an architecture combining VoiceXML with our Language for End System Services (LESS) and the Common Gateway Interface (CGI) for multi-party conference service programming that integrates existing Internet services. VoiceXML provides the voice interface to LESS and CGI scripts. Our architecture enables many novel services such as conference setup based on participant location and presence status. We give some examples of the new services and describe our on-going implementation. | (pdf) (ps) |
An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol | Salman A. Baset, Henning Schulzrinne | 2004-10-11 | Skype is a peer-to-peer VoIP client developed by KaZaa in 2003. Skype claims that it can work almost seamlessly across NATs and firewalls and has better voice quality than the MSN and Yahoo IM applications. It encrypts calls end-to-end, and stores user information in a decentralized fashion. Skype also supports instant messaging and conferencing. This report analyzes key Skype functions such as login, NAT and firewall traversal, call establishment, media transfer, codecs, and conferencing under three different network setups. Analysis is performed by careful study of Skype network traffic. | (pdf) (ps) |
Building a Reactive Immune System for Software Services | Stelios Sidiroglou, Michael E. Locasto, Stephen W. Boyd, Angelos D. Keromytis | 2004-10-10 | We propose a new approach for reacting to a wide variety of software failures, ranging from remotely exploitable vulnerabilities to more mundane bugs that cause abnormal program termination (e.g., illegal memory dereference). Our emphasis is in creating "self-healing" software that can protect itself against a recurring fault until a more comprehensive fix is applied. Our system consists of a set of sensors that monitor applications for various types of failure and an instruction-level emulator that is invoked for selected parts of a program's code. Use of such an emulator allows us to predict recurrences of faults, and recover program execution to a safe control flow. Using the emulator for small pieces of code, as directed by the sensors, allows us to minimize the performance impact on the immunized application. We discuss the overall system architecture and a prototype implementation for the x86 platform. We evaluate the efficacy of our approach against a range of attacks and other software failures and investigate its performance impact on several server-type applications. We conclude that our system is effective in preventing the recurrence of a wide variety of software failures at a small performance cost. | (pdf) (ps) |
Live CD Cluster Performance | Haronil Estevez, Stephen A. Edwards | 2004-10-04 | In this paper, we present a performance comparison of two Linux live CD distributions, Knoppix (v3.3) and Quantian (v0.4.96). The library used for performance evaluation is the Parallel Image Processing Toolkit (PIPT), a software library that contains several parallel image processing routines. A set of images was chosen and a batch job of PIPT routines was run and timed using both live CD distributions. The point of comparison between the two live CDs was the total time the batch job required for completion. | (pdf) (ps) |
Information Structures to Secure Control of Rigid Formations with Leader-Follower Structure | Tolga Eren, Walter Whiteley, Brian D.O. Anderson, A. Stephen Morse, Peter N. Belhumeur | 2004-09-29 | This paper is concerned with rigid formations of mobile autonomous agents using a leader-follower structure. A formation is a group of agents moving in real 2- or 3- dimensional space. A formation is called rigid if the distance between each pair of agents does not change over time under ideal conditions. Sensing/communication links between agents are used to maintain a rigid formation. Two agents connected by a sensing/communication link are called neighbors. There are two types of neighbor relations in rigid formations. In the first type, the neighbor relation is symmetric. In the second type, the neighbor relation is asymmetric. Rigid formations with a leader-follower structure have the asymmetric neighbor relation. A framework to analyze rigid formations with symmetric neighbor relations is given in our previous work. This paper suggests an approach to analyze rigid formations that have a leader-follower structure. | (pdf) (ps) |
An Investigation Into the Detection of New Information | Barry Schiffman, Kathleen R. McKeown | 2004-09-29 | This paper explores new-information detection, describing a strategy for filtering a stream of documents to present only information that is fresh. We focus on multi-document summarization and seek to efficiently use more linguistic information than is often seen in such systems. We experimented with our linguistic system and with a more traditional sentence-based, vector-space system and found that a combination of the two approaches boosted performance over each one alone. | (pdf) (ps) |
Machine Learning and Text Segmentation in Novelty Detection | Barry Schiffman, Kathleen R. McKeown | 2004-09-29 | This paper explores a combination of machine learning, approximate text segmentation and a vector-space model to distinguish novel information from repeated information. In experiments with the data from the Novelty Track at the Text Retrieval Conference, we show improvements over a variety of approaches, in particular in raising precision scores on this data, while maintaining a reasonable amount of recall. | (pdf) (ps) |
Voice over TCP and UDP | Xiaotang Zhang, Henning Schulzrinne | 2004-09-28 | We compare UDP and TCP when transmitting voice data using PlanetLab, where we can run experiments globally. For TCP, we also run experiments with TCP NODELAY, which sends out requests immediately. We compare the performance of the different protocols by their 90th percentile delay and jitter. The performance of UDP is better than that of TCP NODELAY, and the performance of TCP NODELAY is better than that of TCP. We also explore the relation between the TCP delay minus the transmission time and the packet loss rate, and find a linear relationship between them. | (pdf) (ps) |
Using Execution Transactions To Recover From Buffer Overflow Attacks | Stelios Sidiroglou, Angelos D. Keromytis | 2004-09-13 | We examine the problem of containing buffer overflow attacks in a safe and efficient manner. Briefly, we automatically augment source code to dynamically catch stack and heap-based buffer overflow and underflow attacks, and recover from them by allowing the program to continue execution. Our hypothesis is that we can treat each code function as a transaction that can be aborted when an attack is detected, without affecting the application's ability to correctly execute. Our approach allows us to selectively enable or disable components of this defensive mechanism in response to external events, allowing for a direct tradeoff between security and performance. We combine our defensive mechanism with a honeypot-like configuration to detect previously unknown attacks, automatically adapt an application's defensive posture at a negligible performance cost, and help determine a worm's signature. The main benefits of our scheme are its low impact on application performance, its ability to respond to attacks without human intervention, its capacity to handle previously unknown vulnerabilities, and the preservation of service availability. We implemented a stand-alone tool, DYBOC, which we use to instrument a number of vulnerable applications. Our performance benchmarks indicate a slow-down of 20% for Apache in full-protection mode, and 1.2% with partial protection. We validate our transactional hypothesis via two experiments: first, by applying our scheme to 17 vulnerable applications, successfully fixing 14 of them; second, by examining the behavior of Apache when each of 154 potentially vulnerable routines is made to fail, resulting in correct behavior in 139 of the cases. | (pdf) (ps) |
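As a conceptual illustration of the "function as abortable transaction" idea, here is a hedged Python sketch. DYBOC itself instruments C source code, so nothing below reflects its actual mechanisms; the decorator, the deep-copy rollback, and the overflow check are stand-ins chosen only to show the abort-or-commit pattern.

```python
"""Conceptual illustration only: treat a function call as a transaction that is
aborted, with a safe error return, when a fault is detected."""
import copy
import functools

def transactional(fallback=None):
    """Run f on a private copy of the state; on a fault, discard the partial
    effects and return the designated 'safe' value instead of crashing."""
    def wrap(f):
        @functools.wraps(f)
        def run(state, *args):
            scratch = copy.deepcopy(state)      # work on a private copy
            try:
                result = f(scratch, *args)
            except Exception:                   # fault / attack detected
                return state, fallback          # abort: original state kept
            return scratch, result              # commit the transaction
        return run
    return wrap

@transactional(fallback=-1)
def parse_request(buf, data):
    buf["last"] = data
    if len(data) > 8:                           # stand-in for an overflow check
        raise OverflowError("input too long")
    return len(data)

state = {"last": None}
state, out = parse_request(state, "0123456789")   # aborted; out == -1
print(state, out)
```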
A Theoretical Analysis of the Conditions for Unambiguous Node Localization in Sensor Networks | Tolga Eren, Walter Whiteley, Peter N. Belhumeur | 2004-09-13 | In this paper we provide a theoretical foundation for the problem of network localization in which some nodes know their locations and other nodes determine their locations by measuring distances or bearings to their neighbors. Distance information is the separation between two nodes connected by a sensing/communication link. Bearing is the angle between a sensing/communication link and the x-axis of a node's local coordinate system. We construct grounded graphs to model network localization and apply graph rigidity theory and parallel drawings to test the conditions for unique localizability and to construct uniquely localizable networks. We further investigate partially localizable networks. | (pdf) (ps) |
Modeling and Managing Content Changes in Text Databases | Panagiotis G. Ipeirotis, Alexandros Ntoulas, Junghoo Cho, Luis Gravano | 2004-08-12 | Large amounts of (often valuable) information are stored in web-accessible text databases. ``Metasearchers'' provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not need to change over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this paper, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use ``survival analysis'' techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real web databases. | (pdf) (ps) |
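The scheduling idea above can be illustrated with a deliberately simplified model: treat a summary's "lifetime" as exponentially distributed with a rate estimated from the database's change history, and schedule an update when the survival probability falls below a threshold. The sketch below uses this toy model rather than the Cox proportional-hazards regression the paper actually fits, and the change histories are invented.

```python
"""Illustrative sketch: derive content-summary update intervals from a simple
exponential survival model of summary freshness."""
import math

# Hypothetical history: for each database, weeks observed and the number of
# weeks in which its content summary changed substantially.
history = {"db-news": (52, 40), "db-archive": (52, 3), "db-forum": (52, 20)}

def weeks_until_update(weeks_observed, changes, survival_threshold=0.5):
    """Contact the database again when the probability that its summary is
    still fresh drops below the threshold, assuming S(t) = exp(-rate * t)."""
    rate = max(changes, 1) / weeks_observed          # crude hazard estimate
    return -math.log(survival_threshold) / rate

for db, (weeks, changes) in history.items():
    print(f"{db}: refresh summary every {weeks_until_update(weeks, changes):.1f} weeks")
```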
Cross-Dimensional Gestural Interaction Techniques for Hybrid Immersive Environments | Hrvoje Benko, Edward W. Ishak, Steven Feiner | 2004-08-09 | We present a set of interaction techniques for a hybrid user interface that integrates existing 2D and 3D visualization and interaction devices. Our approach is built around one- and two-handed gestures that support the seamless transition of data between co-located 2D and 3D contexts. Our testbed environment combines a 2D multi-user, multi-touch, projection surface with 3D head-tracked, see-through, head-worn displays and 3D tracked gloves to form a multi-display augmented reality. We also address some of the ways in which we can interact with private data in a collaborative, heterogeneous workspace. | (pdf) (ps) |
Group Ratio Round-Robin: O(1) Proportional Share Scheduling for Uniprocessor and Multiprocessor Systems | Bogdan Caprita, Wong Chun Chan, Jason Nieh, Clifford Stein, Haoqiang Zheng | 2004-07-30 | Proportional share resource management provides a flexible and useful abstraction for multiplexing time-shared resources. We present Group Ratio Round-Robin ($GR^3$), the first proportional share scheduler that combines accurate proportional fairness scheduling behavior with $O(1)$ scheduling overhead on both uniprocessor and multiprocessor systems. $GR^3$ uses a novel client grouping strategy to organize clients into groups of similar processor allocations which can be more easily scheduled. Using this grouping strategy, $GR^3$ combines the benefits of low overhead round-robin execution with a novel ratio-based scheduling algorithm. $GR^3$ can provide fairness within a constant factor of the ideal generalized processor sharing model for client weights with a fixed upper bound and preserves its fairness properties on multiprocessor systems. We have implemented $GR^3$ in Linux and measured its performance against other schedulers commonly used in research and practice, including the standard Linux scheduler, Weighted Fair Queueing, Virtual-Time Round-Robin, and Smoothed Round-Robin. Our experimental results demonstrate that $GR^3$ can provide much lower scheduling overhead and much better scheduling accuracy in practice than these other approaches. | (pdf) |
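A much-simplified sketch of the grouping idea follows: clients are binned by the power-of-two range of their weight, groups receive time in proportion to their total weight, and each group runs its members round-robin. This is not the published O(1) $GR^3$ algorithm (in particular, the intergroup selection here is a simple deficit scheme and the intragroup step ignores weight differences within a group), and the weights and tick count are invented.

```python
"""Much-simplified sketch of weight-based client grouping for proportional
share scheduling; not the published GR^3 algorithm."""
import math
from collections import defaultdict

clients = {"A": 1, "B": 2, "C": 3, "D": 8, "E": 9}     # client -> weight

# Group clients by floor(log2(weight)); similar weights share a round-robin.
groups = defaultdict(list)
for name, w in clients.items():
    groups[int(math.log2(w))].append(name)

group_weight = {g: sum(clients[n] for n in members) for g, members in groups.items()}
total = sum(group_weight.values())
credit = {g: 0.0 for g in groups}          # deficit-based inter-group selection
rr_pos = {g: 0 for g in groups}            # intra-group round-robin cursor
received = defaultdict(int)

for _tick in range(1000):
    for g in groups:                       # accrue each group's ideal share
        credit[g] += group_weight[g] / total
    g = max(credit, key=credit.get)        # run the most "owed" group
    credit[g] -= 1.0
    members = groups[g]
    client = members[rr_pos[g] % len(members)]
    rr_pos[g] += 1
    received[client] += 1

for name in clients:
    print(name, received[name], "ticks (weight", clients[name], ")")
```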
THINC: A Remote Display Architecture for Thin-Client Computing | Ricardo A. Baratto, Jason Nieh, Leo Kim | 2004-07-29 | Rapid improvements in network bandwidth, cost, and ubiquity combined with the security hazards and high total cost of ownership of personal computers have created a growing market for thin-client computing. We introduce THINC, a remote display system architecture for high-performance thin-client computing in both LAN and WAN environments. THINC transparently maps high-level application display calls to a few simple low-level commands which can be implemented easily and efficiently. THINC introduces a number of novel latency-sensitive optimization techniques, including offscreen drawing awareness, command buffering and scheduling, non-blocking display operation, native video support, and server-side screen scaling. We have implemented THINC in an XFree86/Linux environment and compared its performance with other popular approaches, including Citrix MetaFrame, Microsoft Terminal Services, SunRay, VNC, and X. Our experimental results on web and video applications demonstrate that THINC can be as much as five times faster than traditional thin-client systems in high latency network environments and is capable of playing full-screen video at full frame rate. | (pdf) |
The Simplicity and Safety of the Language for End System Services (LESS) | Xiaotao Wu, Henning Schulzrinne | 2004-07-20 | This paper analyzes the simplicity and safety of the Language for End System Services (LESS). | (pdf) (ps) |
Efficient Shadows from Sampled Environment Maps | Aner Ben-Artzi, Ravi Ramamoorthi, Maneesh Agrawala | 2004-06-10 | This paper addresses the problem of efficiently calculating shadows from environment maps. Since accurate rendering of shadows from environment maps requires hundreds of lights, the expensive computation is determining visibility from each pixel to each light direction, such as by ray-tracing. We show that coherence in both spatial and angular domains can be used to reduce the number of shadow rays that need to be traced. Specifically, we use a coarse-to-fine evaluation of the image, predicting visibility by reusing visibility calculations from four nearby pixels that have already been evaluated. This simple method allows us to explicitly mark regions of uncertainty in the prediction. By only tracing rays in these and neighboring directions, we are able to reduce the number of shadow rays traced by up to a factor of 20 while maintaining error rates below 0.01\%. For many scenes, our algorithm can add shadowing from hundreds of lights at twice the cost of rendering without shadows. | (pdf) (ps) |
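The visibility-reuse idea can be sketched schematically for a single pixel: reuse the per-light visibility on which four already-evaluated neighbors agree, and trace shadow rays only for lights where they disagree. The code below is illustrative only, not the paper's renderer; trace_shadow_ray is a hypothetical placeholder, and the expansion to neighboring directions described in the abstract is omitted.

```python
"""Schematic sketch of neighbor-based visibility prediction with explicit
uncertainty; shadow rays are traced only for the uncertain lights."""
import numpy as np

def trace_shadow_ray(pixel, light):             # placeholder for a real ray tracer
    return np.random.rand() > 0.3

def visibility_for_pixel(pixel, neighbor_masks):
    """neighbor_masks: four boolean arrays, one per already-evaluated neighbor."""
    stack = np.stack(neighbor_masks)             # shape (4, n_lights)
    agree_visible = stack.all(axis=0)            # every neighbor sees the light
    agree_blocked = (~stack).all(axis=0)         # every neighbor is shadowed
    uncertain = ~(agree_visible | agree_blocked)

    visibility = agree_visible.copy()            # reuse confident predictions as-is
    for light in np.nonzero(uncertain)[0]:       # trace only where neighbors disagree
        visibility[light] = trace_shadow_ray(pixel, light)
    return visibility, int(uncertain.sum())

neighbor_masks = list(np.random.rand(4, 256) > 0.5)   # 4 neighbors, 256 lights
vis, traced = visibility_for_pixel((10, 10), neighbor_masks)
print("shadow rays traced for this pixel:", traced, "of 256")
```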
Efficient Algorithms for the Design of Asynchronous Control Circuits | Michael Theobald | 2004-05-27 | Asynchronous (or ``clock-less'') digital circuit design has received much attention over the past few years, including its introduction into consumer products. One major bottleneck to the further advancement of clock-less design is the lack of optimizing CAD (computer-aided design) algorithms and tools. In synchronous design, CAD packages have been crucial to the advancement of the microelectronics industry. In fact, automated methods seem to be even more crucial for asynchronous design, which is widely considered as being much more error-prone. This thesis proposes several new efficient CAD techniques for the design of asynchronous control circuits. The contributions include: (i) two new and very efficient algorithms for hazard-free two-level logic minimization, including a heuristic algorithm, ESPRESSO-HF, and an exact algorithm based on implicit data structures, IMPYMIN; and (ii) a new synthesis and optimization method for large-scale asynchronous systems, which starts from a Control-Dataflow Graph (CDFG), and produces highly-optimized distributed control. As a case study, this latter method is applied to a differential equation solver; the resulting synthesized circuit is comparable in quality to a highly-optimized manual design. | (ps) |
On decision trees, influences, and learning monotone decision trees | Ryan O'Donnell, Rocco A. Servedio | 2004-05-26 | In this note we prove that a monotone boolean function computable by a decision tree of size $s$ has average sensitivity at most $\sqrt{\log_2 s}$. As a consequence we show that monotone functions are learnable to constant accuracy under the uniform distribution in time polynomial in their decision tree size. | (pdf) (ps) |
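Restated in display form for reference (the notation $\mathbb{I}(f)$ and $\operatorname{Inf}_i(f)$ for the average sensitivity and coordinate influences is shorthand introduced here; the bound itself is exactly the one claimed in the abstract):

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% If a monotone Boolean function f on n bits is computable by a decision tree
% of size s, its average sensitivity (the sum of its coordinate influences)
% satisfies
\[
  \mathbb{I}(f) \;=\; \sum_{i=1}^{n} \operatorname{Inf}_i(f) \;\le\; \sqrt{\log_2 s},
\]
% which, as the note observes, implies that monotone functions are learnable to
% constant accuracy under the uniform distribution in time polynomial in s.
\end{document}
```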
Orchestrating the Dynamic Adaptation of Distributed Software with Process Technology | Giuseppe Valetto | 2004-05-24 | Software systems are becoming increasingly complex to develop, understand, analyze, validate, deploy, configure, manage and maintain. Much of that complexity is related to ensuring adequate quality levels for the services provided by software systems after they are deployed in the field, in particular when those systems are built from and operated as a mix of proprietary and non-proprietary components. That translates to increasing costs and difficulties when trying to operate large-scale distributed software ensembles in a way that continuously guarantees satisfactory levels of service. A solution can be to exert some form of dynamic adaptation upon running software systems: dynamic adaptation can be defined as a set of automated and coordinated actions that aim at modifying the structure, behavior and performance of a target software system, at run time and without service interruption, typically in response to the occurrence of some condition(s). To achieve dynamic adaptation upon a given target software system, a set of capabilities, including monitoring, diagnostics, decision, actuation and coordination, must be put in place. This research addresses the automation of decision and coordination in the context of an end-to-end and externalized approach to dynamic adaptation, which makes it possible to target legacy and component-based systems, as well as new systems developed from scratch. In this approach, adaptation provisions are superimposed by a separate software platform, which operates from outside of, and orthogonally to, the target application as a whole; furthermore, a single adaptation may span concerted interventions on a multiplicity of target components. To properly orchestrate those interventions, decentralized process technology is employed for describing, activating and coordinating the work of a cohort of software actuators towards the intended end-to-end dynamic adaptation. The approach outlined above has been implemented in a prototype, code-named Workflakes, within the Kinesthetics eXtreme project investigating externalized dynamic adaptation, carried out by the Programming Systems Laboratory of Columbia University, and has been employed in a set of diverse case studies. This dissertation discusses and evaluates the concept of process-based orchestration of dynamic adaptation and the Workflakes prototype on the basis of the results of those case studies. | (pdf) |
Elastic Block Ciphers: The Feistel Cipher Case | Debra L. Cook, Moti Yung, Angelos Keromytis | 2004-05-19 | We discuss the elastic versions of block ciphers whose round function processes subsets of bits from the data block differently, such as occurs in a Feistel network and in MISTY1. We focus on how specific bits are selected to be swapped after each round when forming the elastic version, using an elastic version of MISTY1 and differential cryptanalysis to illustrate why this swap step must be carefully designed. We also discuss the benefit of adding initial and final key dependent permutations in all elastic block ciphers. The implementation of the elastic version of MISTY1 is analyzed from a performance perspective. | (pdf) (ps) |
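To illustrate what a swap step in an "elastic" Feistel-style round might look like, here is a toy sketch. It is emphatically not MISTY1 or the paper's construction; the round function, key values, block widths, and swap position are placeholders chosen only for readability.

```python
"""Toy illustration of an elastic swap step after a Feistel-style round;
not MISTY1 and not the paper's actual construction."""
W = 16                       # half-block width of the underlying round, in bits
MASK = (1 << W) - 1

def round_f(x, k):           # placeholder round function, not any real cipher's
    return ((x * 2654435761) ^ k) & MASK

def elastic_feistel_round(block, y_bits, round_key, swap_from=0):
    """block is a (2W + y_bits)-bit integer: one Feistel round on the leading
    2W bits, then swap y_bits of the round output with the leftover bits."""
    y_mask = (1 << y_bits) - 1
    leftover = block & y_mask                    # the trailing y bits
    core = block >> y_bits                       # the leading 2W bits
    left, right = core >> W, core & MASK
    left, right = right, left ^ round_f(right, round_key)   # Feistel step
    core = (left << W) | right
    # Elastic swap step: which core bits are exchanged with the leftover bits
    # is a design choice the paper shows must be made carefully.
    taken = (core >> swap_from) & y_mask
    core = (core & ~(y_mask << swap_from)) | (leftover << swap_from)
    return (core << y_bits) | taken

state = 0x1A2B3C4D5                              # a toy (32 + 4)-bit block
for rk in (0x0123, 0x4567, 0x89AB):
    state = elastic_feistel_round(state, y_bits=4, round_key=rk)
print(hex(state))
```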
Exploiting the Structure in DHT Overlays for DoS Protection | Angelos Stavrou, Angelos Keromytis, Dan | 2004-04-30 | Peer-to-peer (P2P) systems that utilize Distributed Hash Tables (DHTs) provide a scalable means to distribute the handling of lookups. However, this scalability comes at the expense of increased vulnerability to specific types of attacks. In this paper, we focus on insider denial of service (DoS) attacks on such systems. In these attacks, nodes that are part of the DHT system are compromised and used to flood other nodes in the DHT with excessive request traffic. We devise a distributed lightweight protocol that detects such attacks, implemented solely within nodes that participate in the DHT. Our approach exploits inherent structural invariants of DHTs to ferret out attacking nodes whose request patterns deviate from ``normal'' behavior. We evaluate our protocol's ability to detect attackers via simulation within a Chord network. The results show that our system can detect a simple attacker whose attack traffic deviates by as little as 5\% from normal request traffic. We also demonstrate the resiliency of our protocol to coordinated attacks by as many as 25\% of the nodes. Our work shows that DHTs can protect themselves from insider flooding attacks, eliminating an important roadblock to their deployment and use in untrusted environments. | (pdf) (ps) |
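The detection idea can be illustrated with a toy example: in a structured overlay, a well-behaved peer's lookups should spread roughly evenly over the ID space, so a node can flag peers whose observed request volume toward it exceeds that expectation. The counts, the 1/64 ownership share, the gossip-style bookkeeping, and the 5% threshold below are all invented; this is not the paper's protocol.

```python
"""Toy illustration of flagging peers whose request pattern toward one node
deviates from the uniform pattern a structured overlay would predict."""
from __future__ import annotations

EXPECTED_SHARE = 1 / 64        # assume this node "owns" 1/64 of lookups on average
THRESHOLD = 1.05               # flag senders exceeding expectation by >5%

# Hypothetical counts: total lookups issued by each peer, and how many of them
# this node saw (e.g., learned via some out-of-band accounting, assumed here).
observed = {"peer-1": (6400, 98), "peer-2": (6400, 105), "peer-3": (6400, 900)}

def is_suspicious(total_lookups: int, seen_here: int) -> bool:
    expected = total_lookups * EXPECTED_SHARE
    return seen_here > expected * THRESHOLD

for peer, (total, seen) in observed.items():
    print(peer, "suspicious" if is_suspicious(total, seen) else "ok")
```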
Host-based Anomaly Detection Using Wrapping File Systems | Shlomo Hershkop, Linh H. Bui, Ryan Ferst, Salvatore J. Stolfo | 2004-04-24 | We describe an anomaly detector, called FWRAP, for a Host-based Intrusion Detection System that monitors file system calls to detect anomalous accesses. The system is intended to be used not as a standalone detector but as one of a correlated set of host-based sensors. The detector has two parts: a sensor that audits file system accesses, and an unsupervised machine learning system that computes normal models of those accesses. We report on the architecture of the file system sensor implemented on Linux using the FiST file wrapper technology and on the results of the anomaly detector applied to experimental data acquired from this sensor. FWRAP employs the Probabilistic Anomaly Detection (PAD) algorithm previously reported in our work on Windows Registry Anomaly Detection. The detector is first trained by operating the host computer for some amount of time, and a model specific to the target machine is automatically computed by PAD, intended to be deployed to a real-time detector. In this paper we describe the feature set used to model file system accesses, and the performance results of a set of experiments using the sensor while attacking a Linux host with a variety of malware exploits. The PAD detector achieved impressive detection rates, in some cases over 95\%, and about a 2\% false positive rate when alarming on anomalous processes. | (pdf) (ps) |
Self-Managing Systems: A Control Theory Foundation | Yixin Diao, Joseph L. Hellerstein, Sujay Parekh, Rean Griffith, Gail Kaiser, Dan Phung | 2004-04-01 | The high cost of ownership of computing systems has resulted in a number of industry initiatives to reduce the burden of operations and management. Examples include IBM's Autonomic Computing, HP's Adaptive Infrastructure, and Microsoft's Dynamic Systems Initiative. All of these efforts seek to reduce operations costs by increased automation, ideally to have systems be self-managing without any human intervention (since operator error has been identified as a major source of system failures). While the concept of automated operations has existed for two decades, as a way to adapt to changing workloads, failures and (more recently) attacks, the scope of automation remains limited. We believe this is in part due to the absence of a fundamental understanding of how automated actions affect system behavior, especially system stability. Other disciplines such as mechanical, electrical, and aeronautical engineering make use of control theory to design feedback systems. This paper uses control theory as a way to identify a number of requirements for and challenges in building self-managing systems, either from new components or layering on top of existing components. | (pdf) |
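To make the feedback-loop framing concrete, here is a minimal sketch of an integral controller that nudges an admission limit so that a measured response time tracks a target. The controlled knob, gain, and units are illustrative assumptions, not taken from the paper:

```python
class IntegralController:
    """Minimal integral controller: adjust an admission limit so that the
    measured response time tracks a target. Illustrative only; the gain,
    units, and the controlled knob are assumptions, not from the paper."""

    def __init__(self, target, gain=0.5, limit=10.0):
        self.target = target   # desired response time
        self.gain = gain       # integral gain
        self.limit = limit     # current admission/concurrency limit

    def update(self, measured_response_time):
        error = self.target - measured_response_time  # positive means headroom
        self.limit = max(1.0, self.limit + self.gain * error)
        return self.limit

# controller = IntegralController(target=0.2)
# new_limit = controller.update(measured_response_time=0.35)
```

Control theory is what tells us how to choose the gain so this loop converges instead of oscillating, which is exactly the kind of stability question the paper raises for self-managing systems.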
Blurring of Light due to Multiple Scattering by the Medium, a Path Integral Approach | Michael Ashikhmin, Simon Premoze, Ravi R, Shree Nayar | 2004-03-31 | Volumetric light transport effects are significant for many materials such as skin, smoke, clouds, or water. In particular, one must consider the multiple scattering of light within the volume. Recently, we presented a path integral-based approach to this problem which identifies the most probable path light takes in the medium and approximates energy transport over all paths by only those surrounding this most probable one. In this report we use the same approach to derive useful expressions for the amount of spatial and angular blurring light experiences as it travels through a medium. | (pdf) (ps) |
Jitter-Camera: High Resolution Video from a Low Resolution Detector | Moshe Ben-Ezra, Assaf Zomet, Shree K. Nayar | 2004-03-25 | Video cameras must produce images at a reasonable frame-rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with a very low resolution. The resolution of videos can be computationally enhanced by moving the camera and applying super-resolution reconstruction algorithms. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super-resolution. The conclusion is that, to achieve the highest resolution, motion blur should be avoided. Motion blur can be minimized by sampling the space-time volume of the video in a specific manner. We have developed a novel camera, called the jitter camera, that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has a significantly higher resolution than the captured one. | (pdf) (ps) |
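For intuition about why precise, blur-free sub-pixel sampling helps, the sketch below shows the classic shift-and-add super-resolution baseline. It is a generic stand-in, not the paper's adaptive algorithm, and the offsets and scale factor are assumptions:

```python
import numpy as np

def shift_and_add(frames, offsets, scale=2):
    """Classic shift-and-add super-resolution baseline (illustrative only).

    frames  -- list of 2D arrays (low-resolution frames), all the same shape
    offsets -- list of (dy, dx) sub-pixel offsets in low-res pixel units,
               assumed known, e.g. (0, 0.5) for a half-pixel horizontal jitter
    scale   -- super-resolution factor

    Each low-res sample is placed on an upscaled grid at its offset position
    and overlapping samples are averaged; this only works well when the
    sub-pixel offsets are accurate and the frames are free of motion blur.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    hits = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, offsets):
        ys = (np.arange(h)[:, None] * scale + round(dy * scale)) % (h * scale)
        xs = (np.arange(w)[None, :] * scale + round(dx * scale)) % (w * scale)
        acc[ys, xs] += frame
        hits[ys, xs] += 1
    hits[hits == 0] = 1          # leave unsampled high-res pixels at zero
    return acc / hits
```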
Improved Controller Synthesis from Esterel | Cristian Soviani, Jia Zeng, Stephen A. Edwards | 2004-03-22 | We present a new procedure for automatically synthesizing controllers from high-level Esterel specifications. Unlike existing RTL synthesis approaches, this approach frees the designer from tedious bit-level state encoding and certain types of inter-machine communication. Experimental results suggest that even with a fairly primitive state assignment heuristic, our compiler consistently produces smaller, slightly faster circuits than the existing Esterel compiler. We mainly attribute this to a different style of distributing state bits throughout the circuit. Initial results are encouraging, but some hand-optimized encodings suggest room for a better state assignment algorithm. We are confident that such improvements will make our technique even more practical. | (pdf) (ps) |
MobiDesk: Mobile Virtual Desktop Computing | Ricardo Baratto, Shaya Potter, Gong Su, Jason Nieh | 2004-03-19 | We present MobiDesk, a mobile virtual desktop computing hosting infrastructure that leverages continued improvements in network speed, cost, and ubiquity to address the complexity, cost, and mobility limitations of today's personal computing infrastructure. MobiDesk transparently virtualizes a user's computing session by abstracting underlying system resources in three key areas: display, operating system and network. MobiDesk provides a thin virtualization layer that decouples a user's computing session from any particular end user device and moves all application logic from end user devices to hosting providers. MobiDesk virtualization decouples a user's computing session from the underlying operating system and server instance, enabling high availability service by transparently migrating sessions from one server to another during server maintenance or upgrades. We have implemented a MobiDesk prototype in Linux that works with existing unmodified applications and operating system kernels. Our experimental results demonstrate that MobiDesk has very low virtualization overhead, can provide a full-featured desktop experience including full-motion video support, and is able to migrate users' sessions efficiently and reliably for high availability, while maintaining existing network connections. | (pdf) (ps) |
When one Sample is not Enough: Improving Text Database Selection Using Shrinkage | Panagiotis G. Ipeirotis, Luis Gravano | 2004-03-17 | Database selection is an important step when searching over large numbers of distributed text databases. The database selection task relies on statistical summaries of the database contents, which are not typically exported by databases. Previous research has developed algorithms for constructing an approximate content summary of a text database from a small document sample extracted via querying. Unfortunately, Zipf's law practically guarantees that content summaries built this way for any relatively large database will fail to cover many low-frequency words. Incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To improve the coverage of approximate content summaries, we build on the observation that topically similar databases tend to have related vocabularies. Therefore, the approximate content summaries of topically related databases can complement each other and increase their coverage. Specifically, we exploit a (given or derived) hierarchical categorization of the databases and adapt the notion of "shrinkage" (a form of smoothing that has been used successfully for document classification) to the content summary construction task. A thorough evaluation over 315 real web databases as well as over TREC data suggests that the shrinkage-based content summaries are substantially more complete than their "unshrunk" counterparts. We also describe how to modify existing database selection algorithms to adaptively decide, at run time, whether to apply shrinkage for a query. Our experiments, which rely on TREC data sets, queries, and the associated "relevance judgments," show that our shrinkage-based approach is significantly more accurate than state-of-the-art database selection algorithms, including a recently proposed hierarchical strategy that also exploits database classification. | (pdf) (ps) |
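Shrinkage, as commonly formulated for hierarchies, mixes a database's sampled word statistics with those of its ancestor categories, so a word missing from the small sample still receives a sensible non-zero estimate. A minimal sketch follows; the mixing weights here are made up (in practice they would be learned, for example via EM):

```python
def shrunk_probability(word, db_prob, ancestor_probs, weights):
    """Shrinkage-smoothed estimate of P(word | database).

    db_prob        -- dict word -> probability from the sampled content summary
    ancestor_probs -- list of dicts, one per ancestor category (root last)
    weights        -- mixing weights: one for the database plus one per
                      ancestor, summing to 1 (illustrative values; a real
                      system would fit them, e.g. with EM)
    """
    estimates = [db_prob.get(word, 0.0)] + [a.get(word, 0.0) for a in ancestor_probs]
    return sum(w * p for w, p in zip(weights, estimates))

# A word absent from the sampled summary still gets non-zero probability
# because its category-level summary covers it.
p = shrunk_probability("melanoma",
                       db_prob={"cancer": 0.01},
                       ancestor_probs=[{"cancer": 0.02, "melanoma": 0.004}],
                       weights=[0.7, 0.3])
```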
Collaborative Distributed Intrusion Detection | Michael E. Locasto, Janak J. Parekh, Sal, Vishal Misra | 2004-03-08 | The rapidly increasing array of Internet-scale threats is a pressing problem for every organization that utilizes the network. Organizations often have limited resources to detect and respond to these threats. The sharing of information related to probes and attacks is a facet of an emerging trend toward "collaborative security." Collaborative security mechanisms provide network administrators with a valuable tool in this increasingly hostile environment. The perceived benefit of a collaborative approach to intrusion detection is threefold: greater clarity about attacker intent, precise models of adversarial behavior, and a better view of global network attack activity. While many organizations see value in adopting such a collaborative approach, several critical problems must be addressed before intrusion detection can be performed on an inter-organizational scale. These obstacles to collaborative intrusion detection often go beyond the merely technical; the relationships between cooperating organizations impose additional constraints on the amount and type of information to be shared. We propose a completely decentralized system that can efficiently distribute alerts to each collaborating peer. The system is composed of two major components that embody the main contribution of our research. The first component, named Worminator, is a tool for extracting relevant information from alert streams and encoding it in Bloom filters. The second component, Whirlpool, is a software system for scheduling correlation relationships between peer nodes. The combination of these systems accomplishes alert distribution in a scalable manner and without violating the privacy of each administrative domain. | (pdf) (ps) |
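A minimal Bloom filter sketch shows the kind of structure into which alert indicators can be encoded so that peers can test for common sightings without exchanging raw values. The sizes, hash count, and the scanner-IP use case below are illustrative assumptions, not Worminator's actual encoding:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter for exchanging alert indicators (e.g. scanner IPs)
    without revealing the raw values. Sizes and hash count are illustrative."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# A peer can test whether it has seen the same suspicious source; false
# positives are possible, but there are no false negatives.
f = BloomFilter(); f.add("198.51.100.23"); print("198.51.100.23" in f)
```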
Failover and Load Sharing in SIP Telephony | Kundan Singh, Henning Schulzrinne | 2004-03-01 | We apply some of the existing web server redundancy techniques for high service availability and scalability to the relatively new IP telephony context. The paper compares various failover and load sharing methods for registration and call routing servers based on the Session Initiation Protocol (SIP). In particular, we consider SIP server failover techniques based on the clients, DNS (Domain Name Service), database replication, and IP address takeover, and load sharing techniques using DNS, SIP identifiers, network address translators, and servers with the same IP address. Additionally, we present an overview of the failover mechanism we implemented in our test-bed using our SIP proxy and registration server and the open source MySQL database. | (pdf) (ps) |
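One of the load sharing options mentioned, distributing registrations by SIP identifier, can be pictured as a hash-based server pick with failover to the next live candidate. This is a sketch under assumed inputs, not the paper's implementation:

```python
import hashlib

def pick_server(sip_uri, servers, failed=frozenset()):
    """Choose a server for a SIP request by hashing the user identifier.

    sip_uri  -- e.g. "sip:alice@example.com"
    servers  -- ordered list of server addresses sharing the registration load
    failed   -- servers currently considered down; failover falls through
                to the next candidate in the list

    Sketch of identifier-based load sharing only; the paper also covers DNS-,
    NAT-, and shared-IP-based schemes.
    """
    user = sip_uri.split(":", 1)[-1].split("@", 1)[0].lower()
    start = int(hashlib.md5(user.encode()).hexdigest(), 16) % len(servers)
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if candidate not in failed:
            return candidate
    raise RuntimeError("no SIP server available")
```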
Virtual Environment for Collaborative Distance Learning With Video Synchronization | Suhit Gupta, Gail Kaiser | 2004-02-25 | We present a 3D collaborative virtual environment, CHIME, in which geographically dispersed students can meet together in study groups or to work on team projects. Conventional educational materials from heterogeneous backend data sources are reflected in the virtual world through an automated metadata extraction and projection process that structurally organizes container materials into rooms and interconnecting doors, with atomic objects within containers depicted as furnishings and decorations. A novel in-world authoring tool makes it easy for instructors to design environments, with additional in-world modification afforded to the students themselves, in both cases without programming. Specialized educational services can also be added to virtual environments via programmed plugins. We present an example plugin that supports synchronized viewing of lecture videos by groups of students with widely varying bandwidths. | (pdf) |
Optimizing Quality for Collaborative Video Viewing | Dan Phung, Giuseppe Valetto, Gail Kaiser, Suhit Gupta | 2004-02-25 | The increasing popularity of distance learning and online courses has highlighted the lack of collaborative tools for student groups. In addition, the introduction of lecture videos into the online curriculum has drawn attention to the disparity in the network resources used by the students. We present an architecture and adaptation model called AI2TV (Adaptive Internet Interactive Team Video), a system that allows geographically dispersed participants, possibly some or all disadvantaged in network resources, to collaboratively view a video in synchrony. AI2TV upholds the invariant that each participant will view semantically equivalent content at all times. Video player actions, like play, pause and stop, can be initiated by any of the participants and the results of those actions are seen by all the members. These features allow group members to review a lecture video in tandem to facilitate the learning process. We employ an autonomic (feedback loop) controller that monitors clients' video status and adjusts the quality of the video according to the resources of each client. We show in experimental trials that our system can successfully synchronize video for distributed clients while, at the same time, optimizing the video quality given actual (fluctuating) bandwidth by adaptively adjusting the quality level for each participant. | (pdf) (ps) |
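The per-client adaptation step can be pictured as choosing the highest quality level a client's measured bandwidth can sustain. The levels and safety margin below are illustrative assumptions, and the real AI2TV controller additionally keeps all clients on semantically equivalent frames:

```python
def choose_quality_level(levels, measured_bandwidth, safety_margin=0.8):
    """Pick the highest video quality a client can sustain.

    levels             -- list of (name, required_bandwidth), sorted low to high
    measured_bandwidth -- client's recent throughput estimate (same units)
    safety_margin      -- fraction of measured bandwidth the stream may use
                          (illustrative value)
    """
    usable = measured_bandwidth * safety_margin
    best = levels[0][0]                    # always fall back to the lowest level
    for name, required in levels:
        if required <= usable:
            best = name
    return best

# e.g. choose_quality_level([("low", 56), ("med", 256), ("high", 768)], 400)
# -> "med"
```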
Elastic Block Ciphers | Debra L. Cook, Moti Yung, Angelos Keromytis | 2004-02-25 | We introduce a new concept of elastic block ciphers, symmetric-key encryption algorithms that for a variable size input do not expand the plaintext (i.e., do not require plaintext padding), while maintaining the diffusion property of traditional block ciphers and adjusting their computational load proportionally to the size increase. Elastic block ciphers are ideal for applications where length-preserving encryption is most beneficial, such as protecting variable-length data |