Research from the Security & Privacy Group Accepted to USENIX 2021

Four papers from CS researchers were accepted to the 30th USENIX Security Symposium. The yearly conference highlights research on the security and privacy of computer systems and networks.

Below are the abstracts and links to the accepted papers

Formally Verified Memory Protection for a Commodity Multiprocessor Hypervisor
Shih-Wei Li Columbia University, Xupeng Li Columbia University, Ronghui Gu Columbia University, Jason Nieh Columbia University, and John Zhuang Hui Columbia University

Hypervisors are widely deployed by cloud computing providers to support virtual machines, but their growing complexity poses a security risk, as large codebases contain many vulnerabilities. We present SeKVM, a layered Linux KVM hypervisor architecture that has been formally verified on multiprocessor hardware. Using layers, we isolate KVM’s trusted computing base into a small core such that only the core needs to be verified to ensure KVM’s security guarantees. Using layers, we model hardware features at different levels of abstraction tailored to each layer of software. Lower hypervisor layers that configure and control hardware are verified using a novel machine model that includes multiprocessor memory management hardware such as multi-level shared page tables, tagged TLBs, and a coherent cache hierarchy with cache bypass support. Higher hypervisor layers that build on the lower layers are then verified using a more abstract and simplified model, taking advantage of layer encapsulation to reduce proof burden. Furthermore, layers provide modularity to reduce verification effort across multiple implementation versions. We have retrofitted and verified multiple versions of KVM on Arm multiprocessor hardware, proving the correctness of the implementations and that they contain no vulnerabilities that can affect KVM’s security guarantees. Our work is the first machine-checked proof for a commodity hypervisor using multiprocessor memory management hardware. SeKVM requires only modest KVM modifications and incurs only modest performance overhead versus unmodified KVM on real application workloads.

 

Fine Grained Dataflow Tracking with Proximal Gradients
Gabriel Ryan Columbia University, Abhishek Shah Columbia University, Dongdong She Columbia University, Koustubha Bhat, Vrije Universiteit Amsterdam; Suman Jana Columbia University

Dataflow tracking with Dynamic Taint Analysis (DTA) is an important method in systems security with many applications, including exploit analysis, guided fuzzing, and side-channel information leak detection. However, DTA is fundamentally limited by the Boolean nature of taint labels, which provide no information about the significance of detected dataflows and lead to false positives/negatives on complex real world programs.

We introduce proximal gradient analysis (PGA), a novel, theoretically grounded approach that can track more accurate and fine-grained dataflow information. PGA uses proximal gradients, a generalization of gradients for non-differentiable functions, to precisely compose gradients over non-differentiable operations in programs. Composing gradients over programs eliminates many of the dataflow propagation errors that occur in DTA and provides richer information about how each measured dataflow effects a program.

We compare our prototype PGA implementation to three state of the art DTA implementations on 7 real-world programs. Our results show that PGA can improve the F1 accuracy of data flow tracking by up to 33% over taint tracking (20% on average) without introducing any significant overhead (< 5% on average). We further demonstrate the effectiveness of PGA by discovering 22 bugs (20 confirmed by developers) and 2 side-channel leaks, and identifying exploitable dataflows in 19 existing CVEs in the tested programs.

 

AdCube: WebVR Ad Fraud and Practical Confinement of Third-Party Ads
Hyunjoo Lee Korea Advanced Institute of Science and Technology, Jiyeon Lee Korea Advanced Institute of Science and Technology, Daejun Kim Korea Advanced Institute of Science and Technology, Suman Jana Columbia University, Insik Shin Korea Advanced Institute of Science and Technology, and Sooel Son Korea Advanced Institute of Science and Technology

Web technology has evolved to offer 360-degree immersive browsing experiences. This new technology, called WebVR, enables virtual reality by rendering a three-dimensional world on an HTML canvas. Unfortunately, there exists no browser-supported way of sharing this canvas between different parties. Assuming an abusive ad service provider who exploits this absence, we present four new ad fraud attack methods. Our user study demonstrates that the success rates of our attacks range from 88.23% to 100%, confirming their effectiveness. To mitigate the presented threats, we propose AdCube, which allows publishers to specify the behaviors of third-party ad code and enforce this specification. We show that AdCube is able to block the presented threats with a small page loading latency of 236 msec and a negligible frame-per-second (FPS) drop for nine WebVR official demo sites.

 

Cost-Aware Robust Tree Ensembles for Security Applications
Yizheng Chen Columbia University, Shiqi Wang Columbia University, Weifan Jiang Columbia University, Asaf Cidon Columbia University, and Suman Jana Columbia University

There are various costs for attackers to manipulate the features of security classifiers. The costs are asymmetric across features and to the directions of changes, which cannot be precisely captured by existing cost models based on Lp-norm robustness. In this paper, we utilize such domain knowledge to increase the attack cost of evading classifiers, specifically, tree ensemble models that are widely used by security tasks. We propose a new cost modeling method to capture the feature manipulation cost as constraint, and then we integrate the cost-driven constraint into the node construction process to train robust tree ensembles. During the training process, we use the constraint to find data points that are likely to be perturbed given the feature manipulation cost, and we use a new robust training algorithm to optimize the quality of the trees. Our cost-aware training method can be applied to different types of tree ensembles, including gradient boosted decision trees and random forest models. Using Twitter spam detection as the case study, our evaluation results show that we can increase the attack cost by 10.6X compared to the baseline. Moreover, our robust training method using cost-driven constraint can achieve higher accuracy, lower false positive rate, and stronger cost-aware robustness than the state-of-the-art training method using L∞-norm cost model. Our code is available at https://github.com/surrealyz/growtrees.