11:40 AM to 12:40 PM
Amir Gholami, UC Berkeley
One of the main challenges in designing, training, and implementing neural networks is their high demand for computational and memory resources. Designing a model for a new task requires searching through an exponentially large space to find the right architecture, which requires multiple training runs on a large dataset. This has a prohibitive computational cost, as training each candidate architecture often requires millions of iterations. Even after the right architecture with good accuracy is found, implementing it on a target hardware platform to meet latency and power constraints is not straightforward. I will present a framework that efficiently utilizes reduced-precision computing to address the above challenges by considering the full stack of designing, training, and implementing the model on a target hardware platform. This is achieved through careful analysis of the numerical instabilities associated with reduced-precision matrix operations, incorporation of a novel second-order, mixed-precision quantization approach, and a framework for hardware-aware neural network design.
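As a rough illustration of what mixed-precision quantization means, the Python sketch below rounds weights to a symmetric uniform grid and gives more bits to layers with higher sensitivity scores. It is a minimal sketch and not the speaker's method: the function names, the bit widths, and the sensitivity scores are hypothetical placeholders, and the scores merely stand in for whatever second-order measure the framework computes.

```python
import random


def quantize_uniform(values, num_bits):
    """Round values to a symmetric uniform grid with 2^(num_bits-1) - 1 steps per sign."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def mean_squared_error(a, b):
    """Average squared difference between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def assign_bits(sensitivities, high_bits=8, low_bits=4, num_high=1):
    """Give the num_high most sensitive layers high_bits and the rest low_bits.

    The sensitivity scores are an assumption of this sketch; any per-layer
    ranking could be plugged in here.
    """
    ranked = sorted(range(len(sensitivities)),
                    key=lambda i: sensitivities[i], reverse=True)
    bits = [low_bits] * len(sensitivities)
    for i in ranked[:num_high]:
        bits[i] = high_bits
    return bits


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
error_8bit = mean_squared_error(weights, quantize_uniform(weights, 8))
error_4bit = mean_squared_error(weights, quantize_uniform(weights, 4))
layer_bits = assign_bits([0.1, 5.0, 0.3])  # made-up scores for three layers
```

Fewer bits mean a coarser grid and a larger rounding error, which is why a mixed-precision scheme spends its higher bit widths on the layers that tolerate error least.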
11:40 AM to 12:40 PM
Zhihao Jia, Stanford University
As an increasingly important workload, machine learning (ML) applications require different performance optimization techniques from traditional runtimes and compilers. In particular, to accelerate ML applications, it is generally necessary to perform ML computations on heterogeneous hardware and parallelize computations using multiple data dimensions, neither of which is even expressible in traditional compilers and runtimes. In this talk, I will describe my work on automated discovery of performance optimizations to accelerate ML computations.
TASO, the Tensor Algebra SuperOptimizer, optimizes the computation graphs of deep neural networks (DNNs) by automatically generating potential graph optimizations and formally verifying their correctness. TASO outperforms rule-based graph optimizers in existing ML systems (e.g., TensorFlow, TensorRT, and TVM) by up to 3x by automatically discovering novel graph optimizations, while also requiring significantly less human effort.
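To make the idea of a graph optimization concrete, the Python sketch below compares a small computation graph, A·B + A·C, with a candidate rewrite, A·(B + C), on random integer matrices. This is only an illustrative stand-in: agreement on random inputs is a cheap filter, not the formal verification the abstract describes, and every function name here is hypothetical rather than part of TASO.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Add two matrices of the same shape elementwise."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def random_matrix(rng, rows, cols):
    """Small integer entries keep the comparison exact (no float rounding)."""
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


def original_graph(a, b, c):
    return add(matmul(a, b), matmul(a, c))   # two matrix multiplications


def rewritten_graph(a, b, c):
    return matmul(a, add(b, c))              # one matrix multiplication


def agree_on_random_inputs(f, g, trials=20, seed=0):
    """Return True if f and g produce identical outputs on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = random_matrix(rng, 3, 4)
        b = random_matrix(rng, 4, 2)
        c = random_matrix(rng, 4, 2)
        if f(a, b, c) != g(a, b, c):
            return False
    return True


candidate_ok = agree_on_random_inputs(original_graph, rewritten_graph)
```

The rewrite replaces two matrix multiplications with one, which is the kind of saving a graph optimizer looks for; a rewrite that passes such a test would still need a proof before it could be trusted for all inputs.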
FlexFlow is a system for accelerating distributed DNN training. FlexFlow identifies parallelization dimensions not considered in existing ML systems (e.g., TensorFlow and PyTorch) and automatically discovers fast parallelization strategies for a specific parallel machine. Companies and national labs are using FlexFlow to train production ML models that do not scale well in current ML systems, achieving over 10x performance improvement.
I will also outline future research directions for further automating ML systems, such as codesigning ML models, software systems, and hardware backends for end-to-end ML deployment.
11:40 AM to 12:40 PM
Rohan Padhye, UC Berkeley
Software bugs affect the security, performance, and reliability of critical systems that much of our society depends on. In practice, the predominant method of ensuring software quality is via extensive testing. Although software developers have considerable domain expertise, handcrafted tests often fail to catch corner cases. Automated testing techniques such as random fuzzing are a promising approach for discovering unexpected inputs that may cause programs to crash. However, by relying solely on hardcoded heuristics, their effectiveness as push-button tools is limited when the test program, the input format, or the testing objective becomes complex. Can we empower software developers to specialize automated testing tools using their domain expertise?
In this talk, I will describe new abstractions and algorithms that enable users to dramatically improve the effectiveness of random fuzzing by subtly transforming the search space. The corresponding research tools, such as JQF+Zest, PerfFuzz, and FuzzFactory, have unlocked the capability to easily discover new classes of software bugs, from compiler optimization failures to algorithmic performance bottlenecks and memory consumption issues. My research tools have helped identify security vulnerabilities affecting billions of devices, have been adopted by firms such as Netflix and Samsung, and have been commercialized as services by multiple startups.
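For readers unfamiliar with feedback-guided fuzzing, the Python sketch below shows the generic loop: mutate a saved input, run the program, and keep the mutant if it reaches a branch not seen before. It is a minimal sketch under stated assumptions, not the algorithm of JQF+Zest, PerfFuzz, or FuzzFactory; the toy target, the `Crash` exception, and all function names are hypothetical.

```python
import random
import string


class Crash(Exception):
    """Raised by the toy target when its hidden bug is triggered."""


def toy_target(text):
    """Hypothetical program under test; returns the set of branches it reached."""
    reached = set()
    if text[0] == "b":
        reached.add("first")
        if text[1] == "u":
            reached.add("second")
            if text[2] == "g":
                raise Crash(text)
    return reached


def mutate(text, rng):
    """Replace one randomly chosen character with a random lowercase letter."""
    position = rng.randrange(len(text))
    letter = rng.choice(string.ascii_lowercase)
    return text[:position] + letter + text[position + 1:]


def fuzz(target, seed_input, iterations, seed=0):
    """Return the first crashing input found, or None if the budget runs out."""
    rng = random.Random(seed)
    corpus = [seed_input]   # inputs worth mutating further
    seen = set()            # every branch reached so far
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            reached = target(candidate)
        except Crash:
            return candidate
        if not reached <= seen:   # new branch reached: save this input
            seen |= reached
            corpus.append(candidate)
    return None


crashing_input = fuzz(toy_target, "aaa", iterations=20000)
```

Saving inputs that reach new branches lets the search advance one character at a time instead of guessing all three at once. The feedback signal (branch coverage here) is the part a developer could swap for a domain-specific one.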