Last August, Wei Hao stepped onto the Google Campus in Sunnyvale, California, as part of the inaugural MLCommons Rising Stars cohort.
Thirty-five recipients, out of over 100 applicants, were invited to this two-day in-person workshop. The cohort had the chance to listen to talks by researchers from Google, Intel, and Meta, and professors from Havard, UC Berkeley, and Cornell about trendy research topics, such as ML for ML systems, software-hardware codesign, and responsible machine learning. They also had the chance to do a poster presentation of their work, where they got useful feedback. The aim of the workshop was to develop community, foster research and career growth, enable collaborations, and discuss career opportunities among the rising generation of researchers at the intersection of machine learning and systems.
“It was a great experience,” said Wei, a third-year PhD student who works with Junfeng Yang and Asaf Cidon. “I always feel the fastest way of developing research ideas is to talk to people and brainstorm, and the workshop was one of the perfect occasions for that.”
His main objective was to make connections, and by the end of the workshop, he came out of it with a potential research collaboration. Along with Amber Liu, a University of Michigan PhD student, they came up with the idea of using a combination of machine learning (ML) models of various sizes to accelerate the inference process of causal language modeling.
We caught up with Wei to talk about his experience at the machine learning workshop and how his PhD life has been.
Q: How did you become part of the workshop?
I applied to the workshop months ago with my resume and a research plan. During the application process, I was not asked to talk about a specific project but an overview of the research I was doing. Looking back, I think this contributed to the diversity of the selected cohort, as people’s work covered the whole stack of ML systems from chip design to application-level ML.
The project I presented at the workshop was titled Nazar: Monitoring and Adapting ML Models on Mobile Devices. The setup is that machine learning models are more and more commonly being pushed to mobile devices due to the convenience of low latency. However, they are often undermined by unpredictable distribution shifts after deployment, such as moderate to severe weather conditions and demographic changes.
We are the first to provide a systematic solution to mitigate the performance degradation of post-deployment models by building a three-stage system that continuously monitors, analyzes, and adapts to distribution shifts without needing user feedback.
Q: Can you talk about your background and why you decided to pursue a PhD?
I engaged in doing research when I was an undergraduate student at the University of Wisconsin-Madison. At the very beginning, getting paid and sharpening my resume were two of my main objectives. However, during the process, I developed an interest in solving open problems that are intellectually challenging.
Moreover, I enjoy defining new problems, which requires a lot of logical thinking but is very rewarding. These two characteristics made me think I am a good candidate for the PhD position. I also really enjoyed the professors I worked with and was encouraged to pursue a PhD. After talking to my current advisors, Junfeng Yang and Asaf Cidon, I was impressed by their enthusiasm and finally made up my mind.
Q: What are your research interests?
My research interest is building efficient and secure systems for machine learning workloads. The reason for pursuing this type of research is my belief in realizing artificial general intelligence (AGI), which requires reliable system support. I decided to focus on it since I found satisfaction in interacting with ML workload while building practical system components while in undergrad.
Q: What sort of research questions or issues do you hope to answer?
Besides the technical questions on how to make ML deployment ubiquitous, I also hope to answer some philosophical questions: What do people expect from using artificial intelligence (AI)? Are there capacity and efficiency boundaries of AI? Which boundaries should I focus on pushing forward in the future?
Q: What are you working on now?
I am building an ML model versioning and management system called MGit.
Models derived from other models are extremely common in machine learning today. For example, transfer learning is used to create task-specific models from “pre-trained” models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values.
However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?).
In the current project I am working on, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating, and collaboration functionality. MGit is able to reduce the lineage graph’s storage footprint by up to 7× and automatically update downstream models in response to updates to upstream models.”
Q: How do you decide what to work on, and what is it like doing research?
I have written four research papers during my PhD so far: Clockworks, DIVA, Nazar, and MGit. All of them are in the field of ML systems and relate to improving the efficiency and robustness of ML applications.
To decide the topics, I always start by brainstorming with my mentors and advisors to derive possible choices. Then, I read related works and define the concrete problem to tackle. The problem definition that I derive at the beginning is usually not exactly the final version before a lot of trial and error.
For example, when we started work on DIVA, we were originally attempting to tame non-determinisms during the model training process. However, I detoured when I read about quantization and found it super interesting. The research morphed into an adversarial attack that tries to enlarge the deviations between ML models and their adapted version on edge devices
Overall, I found the most time-consuming and difficult part of doing research is to define the concrete problem that is logically valid and attractive to me. It can take me up to half a year, while the solutions and corresponding implementations are relatively easy to come up with.
Q: How did your previous experiences prepare you for a PhD?
I started to do research when I was a freshman in college, and I felt well-prepared before my PhD. Since the structure of research projects is more or less the same – brainstorming, defining problems, finding and evaluating solutions, and polishing papers – I get more and more familiar after each project, which makes me confident and not stressed about temporary slow-downs.
Q: Why did you apply to Columbia, and how was that process?
Aside from the prestigious reputation of Columbia and the research interests match, I really appreciate the proactiveness of my advisors during the recruitment process. I still remember that Asaf reached out to me before the application deadline, which made me feel very welcome. Because of him and my previous advisor at Madison, my stress was hugely alleviated during the application process. Thus, I encourage reaching out to whom you are really interested in working with early on, to both students and faculty.
Q: What has been the highlight of your time at Columbia?
The highlight of my time at Columbia so far is when I get the chance to share my research with a wide audience, such as at the CAIT symposium, DSI poster session, or during this interview. I also expect my research to have some real impact, and I believe that day is coming soon.
Q: Was there anything difficult that you had to face while taking your PhD?
So far, there have been three. I think one of the hardest things is to fight the feeling of low self-worth when a paper is rejected by a conference. Then, when a field I am working on attracts too many people, it becomes competitive, and I sometimes feel stressed about this kind of speed race of everyone trying to be the first to come up with something. And some loneliness when seeing friends my age bid farewell to their student life and start a career.
But since I have chosen this road of taking my PhD, I have to bear with these and find other ways to release stress. For example, I recently started indoor cycling at the gym as it is an effective way to burn both calories and overthinking.
Q: Looking back, what would you have done differently?
I would have thought less and got my hands dirty early. Sometimes, I spend too much time reading papers before doing experiments. No one was born prepared, and the earlier one fails, the sooner one can find a way out.
Q: Do you think your skills have been enhanced by your time at Columbia? In which ways?
I think I am more and more confident in delivering my thoughts in a structural way due to the training process of defining concrete problems and writing papers. I also feel that I have gained expertise in my field through the different projects I have taken on.
Q: What is your advice to students on how to navigate their time at Columbia? If they want to do research, what should they know or do to prepare?
My advice to students is to engage in what they feel passionate about as early as possible and not be afraid of failure. For those who are interested in doing research, talk to professors and PhD students proactively about your interests and how you think we can help. Do not be afraid of being an amateur and assume we know everything as the world is moving so fast, especially with the new wave of AI. I think most of us, or at least myself, value vision and passion more than the ability to solve problems, which can definitely be fostered during the PhD journey.
Q: Is there anything else that you think people should know?
My personal goal is to create start-ups that are impactful to society. If you have similar goals or related sources at Columbia that you would like to share, please reach out. Thanks!