Research Fair Spring 2024

The Spring 2024 Research Fair will be held on Thursday, January 18th, from 12:00 to 14:00 in the CS Lounge (CSB 452). This is an opportunity to meet faculty and Ph.D. students working in areas of interest to you and potentially join their projects.

Please read each group's requirements carefully! There will be a couple of Zoom sessions and recordings available – see below for all details.

In Person

Faculty/Lab: Prof Ken Ross, Junyoung Kim

Brief Research Project Description: Our group will be offering two research projects. The first project involves extending a database management system with features to allow matrix and matrix-like operators to be used alongside traditional relational algebra query operators. These matrix operators could potentially help support machine learning within the DBMS.
The second project involves using novel processor-in-memory technologies to implement database query algorithms.
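
To give a concrete, purely illustrative flavor of the first project's theme, here is a minimal sketch of ours (not the group's actual system): it emulates "a matrix operator inside the DBMS" by pulling a relation into NumPy, applying a matrix product, and treating the result as a new relation. The table name, data, and weight matrix are all invented for the example; an actual integrated operator would run inside the query executor rather than round-tripping through Python.

    # Illustrative sketch only: a relational selection feeding a matrix operator.
    import sqlite3
    import numpy as np

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE features (row_id INTEGER, col_id INTEGER, val REAL);
        INSERT INTO features VALUES (0,0,1.0),(0,1,2.0),(1,0,3.0),(1,1,4.0);
    """)

    # Relational part: an ordinary selection over a sparse matrix encoding.
    rows = conn.execute(
        "SELECT row_id, col_id, val FROM features WHERE val > 1.5"
    ).fetchall()

    # Matrix part: densify and multiply by a (made-up) weight matrix, the kind
    # of step an ML-friendly DBMS operator could fuse into the query plan.
    A = np.zeros((2, 2))
    for r, c, v in rows:
        A[r, c] = v
    W = np.array([[0.5, 0.0], [0.0, 0.5]])  # hypothetical model weights
    print(A @ W)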

Required/preferred prerequisites and qualifications: Introduction to databases (4111 or equivalent) required.


Faculty/Lab: Prof Chengzhi Mao, Prof Junfeng Yang

Brief Research Project Description: We will work on the safety and understanding of large language models. We are interested in LLM-based agents, including applications to embodied AI and task solving.

Required/preferred prerequisites and qualifications: Willingness to spend two full days per week on research. Completion of deep learning, computer vision, natural language processing, or another related class.
Knowledge of Python programming and PyTorch; past lab/research experience is preferred.
Students should be motivated to push their advisor.


Faculty/Lab: Prof Brian Plancher (https://a2r-lab.org/)

Brief Research Project Description: The A2R lab’s work focuses on the design of robotic algorithms and implementations at the intersection of computer systems/architecture and numerical optimization (with a little ML thrown in as well). This semester, the majority of open projects focus on different aspects of GPU acceleration of numerical optimal control algorithms, in pursuit of whole-body nonlinear model predictive control for locomotion. You can find more detailed descriptions of our projects at: https://bit.ly/A2R-Spring24-Projects. If you are interested, please fill out our form by January 19, 2024 to ensure that your application is taken under full consideration: https://bit.ly/A2R-Spring24-Join. Late form submissions will be considered provided there is still space in the lab for the given term.

Required/preferred prerequisites and qualifications: Preference for students with prior experience working with open source software on GitHub, coding in C(++), parallel programming, numerical optimization, or robotics software. Ideal candidates will have taken (or be currently enrolled in) COMS BC 3159 Parallel Optimization for Robotics or a similar course. This is not a requirement – just a preference.

Note: If you are an undergraduate, consider taking COMS BC 3159 Parallel Optimization for Robotics this semester (https://brianplancher.com/courses/coms-bc3159-sp24/), which would prepare you for research in this or future semesters and includes a large project component you could use for research ideas! If you are a master's or PhD student, consider TAing the course!


Faculty/Lab: Prof Lydia Chilton

Brief Research Project Description: We design, build, and deploy database tools to help farmers worldwide make decisions to adapt to climate change. These tools span the data lifecycle pipeline, including data collection, cleaning, visualization, and decision making. We have a suite of tools in deployment and are looking for students to help maintain them and extend them to new regions.

Required/preferred prerequisites and qualifications: Databases (required), UI Design (nice to have).


Faculty/Lab: Albert Boulanger

Brief Research Project Description: Data Engineering for Multiple Studies of Alzheimer’s Disease: Joint Cohort Explorer. This project, the Joint Cohort Explorer, is centered on data engineering for the Thompson Project, a large-scale initiative to use multiple datasets to move the needle on Alzheimer’s Disease, now also being used for Multiple Sclerosis. Possible student projects (not exhaustive):
– UI-Experience Design
– JCE integration with REDCap (NLP/Text processing involved)
– Development Environment/Project Management
– Back-end systems/data engineering/User Interface (Django CMS)
– Alternatives exploration:
— Migrate JCE components to R Studio Connect
— Server-Side rendering (using NextJS)
— Explore other pivot table options, WebAssembly-based: https://perspective.finos.org/

In addition, there will be an information session on Zoom, see below.

Required/preferred prerequisites and qualifications: The development currently involves the use of a graph database (Neo4J) along with Django CMS, Django, and Plotly Dash, WebAssembly, Bootstrap, and React as underlying web technologies. NLP/text processing is used in integrating JCE with REDCap.


Faculty/Lab: Prof Itsik Pe’er & Philippe Chlenski

Brief Research Project Description: Multiple projects. Titles: Cellular Plasticity in Cancer; Metagenomics transformer using species embeddings; UI Design for CellStitch; Classification in mixed-curvature spaces; Hyperbolic embeddings for binary trees.

Required/preferred prerequisites and qualifications: Different for different projects; no biological background is required.


Faculty/Lab: Prof Corey Toler-Franklin

Brief Research Project Description: Graphics Imaging & Light Measurement Lab (GILMLab)/Corey Toler-Franklin.

Current Projects:
AI for Quantum Physics & Appearance Modeling
Quantum Level Optical Interactions in Complex Materials
The wavelength dependence of fluorescence is used in the physical sciences for material analysis and identification. However, fluorescent measurement techniques like mass spectrometry are expensive and often destructive. Empirical measurement systems effectively simulate material appearance but are time-consuming, requiring densely sampled measurements. Leveraging GPU processing and shared supercomputing resources, we are developing deep learning models that incorporate principles from quantum mechanics theory to solve large-scale many-body problems in physics for non-invasive identification of complex proteinaceous materials.

AI for Multimodal Data & Document Analysis
Deciphering Findings from the Tulsa Race Massacre Death Investigation
The Tulsa Race Massacre (1921) destroyed a flourishing Black community and left up to 300 people dead. More than 1000 homes were burned and destroyed. Efforts are underway to locate the bodies of victims and reconstruct lost historical information for their families. Collaborating with the Tulsa forensics team, we are developing spectral imaging methods (on-site) for deciphering information on eroded materials (stone engravings, rusted metal, and deteriorated wood markings), and a novel multimodal transformer network to associate recovered information on gravestones with death certificates and geographical information from public records.

AI for Cancer Detection
Identifying Cancer Cells and Their Biomarker Expressions
Cell quantitation techniques are used in biomedical research to diagnose and treat cancer. Current quantitation methods are subjective and based mostly on visual impressions of stained tissue samples. This time-consuming process causes delays in therapy that reduce the effectiveness of treatments and add to patient distress. Our lab is developing computational algorithms that use deep learning to model changes in protein structure from multispectral observations of tissue. Once computed, the model can be applied to any tissue observation to detect a variety of protein markers without further spectral analysis. The deep learning model will be quantitatively evaluated on a learning dataset of cancer tumors.

AI for Neuroscience
Deep Learning for Diagnosing and Treating Neurological Disorders
Advances in biomedical research are based upon two foundations: preclinical studies using animal models and clinical trials with human subjects. However, translation from basic animal research to treatment of human conditions is not straightforward. Preclinical studies in animals may not replicate across labs, and a multitude of preclinical leads have failed in human clinical trials. Inspired by recent generative models for semi-supervised action recognition and probabilistic 3D human motion prediction, we are developing a system that learns animal behavior from unstructured video frames without labels or annotations. Our approach extends a generative model to incorporate adversarial inference and transformer-based self-attention modules.

Required/preferred prerequisites and qualifications: Python and/or C/C++, Machine Learning experience (a plus but not required).

********

Faculty/Lab: IRT Lab – Caspar Lant (PhD Student)

Brief Research Project Description: GPS-Based Pedestrian Trajectory Prediction at Intersections.
Can we figure out which way a person will cross the street based on the previous behavior of other pedestrians? We then use this information to implement more sophisticated queuing algorithms at intersections, leading to greater efficiency and a safer environment for our most vulnerable road users. This project is part of an accelerator program, so if you’re ML-savvy and interested in founder roles in the future, this could be a good fit for you.
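
Since the posting asks for transformer experience, here is a purely illustrative sketch of how such a prediction task might be framed as sequence classification over GPS tracks. This is our example, not the project's model; the tensor shapes, class count, and architecture are invented.

    # Illustrative sketch only: classifying a pedestrian's crossing direction
    # from a short GPS trajectory with a small transformer encoder.
    import torch
    import torch.nn as nn

    class CrossingPredictor(nn.Module):
        def __init__(self, n_directions=4, d_model=32):
            super().__init__()
            self.embed = nn.Linear(2, d_model)          # (lat, lon) -> d_model
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_directions)

        def forward(self, traj):                        # traj: (batch, T, 2)
            h = self.encoder(self.embed(traj))
            return self.head(h.mean(dim=1))             # logits over directions

    model = CrossingPredictor()
    fake_tracks = torch.randn(8, 20, 2)                 # 8 tracks, 20 GPS fixes
    print(model(fake_tracks).shape)                     # torch.Size([8, 4])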

Required/preferred prerequisites and qualifications: Experience with transformer models and wrangling big datasets; business/pitch experience a plus.

*******


Faculty/Lab: Prof. Venkat Venkatasubramanian

Brief Research Project Descriptions:

1. Self-Organization and the Emergence of Collective Intelligence – Prof. Venkat Venkatasubramanian

Using agent-based models to study the emergence of pattern formation and collective intelligence in biological and ecological species such as birds, ants, mussels, etc. 

Prerequisites: Python programming and tons of curiosity to ask unusual questions
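
For a flavor of the agent-based modeling in project 1, here is a minimal sketch of one classic local rule (velocity alignment) from which collective motion emerges. This is our illustration, not the group's code, and all constants are invented.

    # Minimal flocking sketch (illustration only): each agent nudges its heading
    # toward the mean heading of neighbors within a fixed radius.
    import numpy as np

    rng = np.random.default_rng(0)
    pos = rng.uniform(0, 10, size=(50, 2))       # 50 agents in a 10x10 box
    vel = rng.normal(size=(50, 2))

    for _ in range(100):
        for i in range(len(pos)):
            near = np.linalg.norm(pos - pos[i], axis=1) < 1.0
            vel[i] = 0.9 * vel[i] + 0.1 * vel[near].mean(axis=0)  # align
        vel /= np.linalg.norm(vel, axis=1, keepdims=True)         # unit speed
        pos = (pos + 0.1 * vel) % 10.0                            # wrap around

    print(pos[:3])  # final positions of a few agents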

2.  Architecture of Deep Neural Nets and LLMs – Prof. Venkat Venkatasubramanian

Analyzing the structure of deep neural nets in order to improve their performance using tools from statistical mechanics, game theory, and topology. 

Prerequisites: machine learning, Python programming, and tons of curiosity to ask unusual questions

3. Explainable AI – Prof. Venkat Venkatasubramanian and Arijit Chakraborty

Automatically discovering analytical models from data and developing mechanism-based explanations using NLP and ChatGPT-like systems. These will be studied in the context of various chemical/physical systems such as Zeolites. 

Prerequisites: machine learning, NLP, Python programming, and tons of curiosity to ask unusual questions

4. Hypergraphs for Automated Mechanism Reduction – Prof. Venkat Venkatasubramanian and Arijit Chakraborty

Automatically reducing the complexity of chemical reaction mechanisms using hypergraphs.

Prerequisites: machine learning, Python programming, and tons of curiosity to ask unusual questions 

5. Semantic search for drug discovery – Prof. Venkat Venkatasubramanian and Dr. Vipul Mann

Using ontologies to perform semantic search and Q&A for scientific documents for accelerated drug discovery.

Prerequisites: machine learning, NLP, Python programming, and tons of curiosity to ask unusual questions

Faculty/Lab: Prof Steven Feiner & CGUI Lab Research Fair

Brief Research Project Description: The Computer Graphics and User Interfaces Lab (Prof. Feiner, PI) does research in the design of 3D and 2D user interfaces, including augmented reality and virtual reality, and mobile and wearable systems, for people interacting individually and together, indoors and outdoors. We use a range of displays and devices: head-worn, hand-held, and table-top, including Varjo XR-3, HP Reverb G2 Omnicept, Meta Quest Pro/3, Valve Index, HoloLens 2, Magic Leap 2, Xreal Light, Snap Next-Generation Spectacles, 3D Systems Touch, and phones. Multidisciplinary projects potentially involve working with faculty and students in other schools and departments, from medicine and dentistry to earth and environmental sciences and social work.

Required/preferred prerequisites and qualifications: We’re looking for students who have done excellent work in one or more of the following courses or their equivalents elsewhere: COMS W4160 (Computer graphics), COMS W4170 (User interface design), COMS W4172 (3D user interfaces and augmented reality), and COMS E6998 (Topics in VR & AR), and who have software design and development expertise. For those projects involving 3D user interfaces, we’re especially interested in students with Unity experience.

Faculty/Lab: DVMM Lab; Hammad Ayyubi (PhD Student); Prof. Shih-Fu Chang (Advisor)

Brief Research Project Description: Multimodal Reasoning. Specifically, we will work on the task of utilizing visual data to improve the reasoning ability of models on temporal event ordering in text-only data.

Required/preferred prerequisites and qualifications: Prior experience in Python/PyTorch and training deep learning models is a must. Hands-on experience and/or knowledge of Transformer architectures is preferred.

Contact: hayyubi@cs.columbia.edu

Zoom

Faculty/Lab: IRT Lab / Professor Henning Schulzrinne, PhD Student Luoyao Hao (l.hao@columbia.edu)

Brief Research Project Description: We design solutions and implement systems to make IoT systems reliable under high heterogeneity, and manageable and programmable at large scale. We will have three project opportunities this semester: (1) Identity-independent IoT policy server. We are building a policy server that focuses on parsing relationships instead of identities and evaluating incoming requests against policies (e.g., security- or energy-related policies). We will study policy specifications, implement some new features based on an existing prototype, and evaluate the system using IoT datasets. (2) Distributed IoT device discovery and authorization. We will explore solutions based on mDNS and capability tokens. (3) Firewall solutions tailored to IoT. We will go through some quick tutorials to grasp a programming language for programmable networks and build a firewall prototype based on it. The firewall will introduce manufacturer-specified profiles and dynamically capture network traffic. Feel free to contact Luoyao (l.hao@columbia.edu) if you are not able to attend the session.
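
As a purely illustrative sketch of the policy-evaluation idea in project (1), relationship-based authorization can be thought of as matching a request's relationships rather than its device identity. This is our example, not the lab's prototype; the relationship names and policy format are invented.

    # Illustrative sketch only: default-deny evaluation of an incoming request
    # against relationship-based policies rather than device identities.
    RELATIONSHIPS = {                       # device -> relationships it holds
        "thermostat-1": {"owned_by:alice", "located_in:living_room"},
        "camera-2": {"owned_by:guest", "located_in:porch"},
    }

    POLICIES = [
        # (required relationship, action, decision)
        ("owned_by:alice", "actuate", "allow"),
        ("located_in:porch", "stream", "deny"),
    ]

    def evaluate(device: str, action: str) -> str:
        rels = RELATIONSHIPS.get(device, set())
        for required, pol_action, decision in POLICIES:
            if pol_action == action and required in rels:
                return decision
        return "deny"                       # default-deny

    print(evaluate("thermostat-1", "actuate"))  # allow
    print(evaluate("camera-2", "stream"))       # deny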

Required/preferred prerequisites and qualifications: One or more of the following: proficiency in Python (Flask framework is a plus), a good understanding of computer networks (CSEE 4119 or similar), experience with dataset processing, or experience with RESTful APIs or web development.

Date: January 17th @ 10:00 EST

Link: https://columbiauniversity.zoom.us/j/95292049161?pwd=ZU9ZYzl2bmUrK1JtQ2J6bmhLMGZIdz09

Passcode: 137233


Faculty/Lab: Albert Boulanger

Data Engineering for Multiple Studies of Alzheimer’s Disease: Joint Cohort Explorer
Supervisor: Albert Boulanger

This project, the Joint Cohort Explorer, is centered on data engineering for the Thompson Project, a large-scale initiative to use multiple datasets to move the needle on Alzheimer’s Disease, now also being used for Multiple Sclerosis.

Required/preferred prerequisites and qualifications: The development currently involves the use of a graph database (Neo4J) along with Django CMS, Django, and Plotly Dash, WebAssembly, Bootstrap, and React as underlying web technologies. NLP/text processing is used in integrating JCE with REDCap.

Data Engineering for Multiple Studies of Alzheimer’s Disease: Joint Cohort Explorer Information Session: Friday, January 19, 1PM-2PM. Zoom link:
https://columbiacuimc.zoom.us/j/98530188967?pwd=RFJHUlhlSCs4NlVJcGZpdkRqd1d3Zz09

Integrated Telehealth After Stroke Care:

Supervisors: Dr. Syeda Imama Ali Naqvi & Albert Boulanger

This project, Integrated Telehealth After Stroke Care, is a follow-up to the study at https://www.thieme-connect.com/products/ejournals/pdf/10.1055/s-0043-1772679.pdf and is geared to apply informatics-based approaches to deliver equitable care and improve wellbeing among minoritized stroke populations with hypertension. The platform currently consists of a web-based blood pressure telemonitoring database using Django and wireless devices that push data after every measurement. Data is processed through R Shiny to create Clinical Decision Support tools for providers and participants through the use of visually tailored infographics created through iterative community-based participation with a human-centered design process.

Required/preferred prerequisites and qualifications: The development involves the use of Python, Django, R Shiny, and an eye for good infographic design.

Integrated Telehealth After Stroke Care Information Session: Friday, January 19, 12PM-1PM. Zoom link:
https://columbiacuimc.zoom.us/j/98530188967?pwd=RFJHUlhlSCs4NlVJcGZpdkRqd1d3Zz09

Project in Computer Vision and MR Images:

Supervisors: Dr. Korhan Buyukturkoglu & Albert Boulanger

This initiative focuses on applying computer vision techniques, specifically CNNs and radiomics, to analyze medical imaging data, including high and low magnetic field MR images. The overall aim is to identify imaging features related to cognitive impairment in Multiple Sclerosis (MS) and develop predictive biomarkers of cognitive impairment. The project involves using existing MRI segmentation pipelines to create masks for critical brain regions (e.g., thalamus and hippocampus) and utilizing radiomics and deep learning methods to extract clinically relevant features. Following these procedures, machine learning methods will be implemented to build models predicting cognitive performance of people with MS in cross-sectional and longitudinal data. Two highly motivated graduate students will be selected to work with experts and other students on this project.
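
To illustrate only the final modeling step, here is a sketch of ours with synthetic data (not the project's pipeline): once radiomics features have been extracted per region mask, predicting a cognitive score reduces to standard supervised learning. The feature count, subject count, and scores are all invented.

    # Illustrative sketch only: cross-validated regression from radiomics
    # features to a (synthetic) cognitive score.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(42)
    X = rng.normal(size=(120, 30))   # 120 subjects x 30 radiomics features
    y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=120)  # synthetic score

    model = Ridge(alpha=1.0)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print("cross-validated R^2:", scores.mean())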

Required/preferred prerequisites and qualifications: Prior experience in deep learning for computer vision and image processing, especially of the brain, including segmentation, clustering, classification, and regression, is highly desired.

Project in Computer Vision and MR Images Information Session: Friday, January 19, 10-11AM. Zoom link:
https://columbiacuimc.zoom.us/j/98530188967?pwd=RFJHUlhlSCs4NlVJcGZpdkRqd1d3Zz09


Faculty/Lab: Prof Steven Feiner & CGUI Lab Research Fair Zoom Session

Brief Research Project Description: The Computer Graphics and User Interfaces Lab (Prof. Feiner, PI) does research in the design of 3D and 2D user interfaces, including augmented reality and virtual reality, and mobile and wearable systems, for people interacting individually and together, indoors and outdoors. We use a range of displays and devices: head-worn, hand-held, and table-top, including Varjo XR-3, HP Reverb G2 Omnicept, Meta Quest Pro/3, Valve Index, HoloLens 2, Magic Leap 2, Xreal Light, Snap Next-Generation Spectacles, 3D Systems Touch, and phones. Multidisciplinary projects potentially involve working with faculty and students in other schools and departments, from medicine and dentistry to earth and environmental sciences and social work.

Required/preferred prerequisites and qualifications: We’re looking for students who have done excellent work in one or more of the following courses or their equivalents elsewhere: COMS W4160 (Computer graphics), COMS W4170 (User interface design), COMS W4172 (3D user interfaces and augmented reality), and COMS E6998 (Topics in VR & AR), and who have software design and development expertise. For those projects involving 3D user interfaces, we’re especially interested in students with Unity experience.

Friday, January 19, 1pm-2:30pm

https://columbiauniversity.zoom.us/j/95284715422?pwd=SndIQ3czMnI4NEZuTVQzOXAyaVlJQT09

******

Faculty/Lab: Gagan Khandate (PhD Student), Prof. Matei Ciocarlie

Brief Research Project Description: The research focus is broadly on learning robotic manipulation from human videos, potentially with a particular focus on dexterous manipulation with multi-fingered robotic hands. This research aims to address a key issue in robotics, i.e., the lack of real robot data, by leveraging large amounts of human video data in conjunction with some real robot data. To this end, we are building an imitation learning framework embodying these principles. Highly motivated students interested in reinforcement learning, imitation learning, and generative models for robotics are encouraged to apply.

Required/preferred prerequisites and qualifications:

  • Strong development experience with Python and PyTorch
  • Experience with reinforcement learning/imitation learning for robot learning
  • Strong communication skills (written and spoken)
  • A resourceful nature and a very strong commitment to project goals

Preferred:

  • Experience with simulators like MuJoCo/IsaacGym.
  • Experience with large vision-language models
  • Experience building/working with large vision datasets
  • Prior research experience

Join Zoom Meeting: Friday, January 19th @ 10:00
https://columbiauniversity.zoom.us/j/98481779603?pwd=RFlhQU1rNmJqWkNncTd0ajBqZGdkZz09

Faculty/Lab: Professor Emily Black (eblack@barnard.edu)
Projects: We have several potential projects you can join! Generally, my domain of expertise is in algorithmic fairness. In other words, I work to create methods to determine whether AI systems will cause various types of harm to the public, to study the equity impacts of AI systems in high-stakes settings such as government, and to connect my own and related research to the legal and policy worlds to help better regulate AI systems. For examples of what that work looks like, see the papers on my website: https://emblack.github.io/. The ongoing projects I’d like help with are the following:

(1) Building AI Pipeline Interventions for Machine Learning Fairness

This project aims to understand relationships between choices made throughout the ML pipeline and harmful model behavior, and to leverage these insights to create harm-reduction strategies. The experimental approach has two prongs: building a pipeline-testing platform to test the effect of a variety of ML pipeline design choices on various definitions of undesirable model prediction behavior, and performing a series of IRB-approved studies to understand how pipeline decisions involving humans-in-the-loop impact downstream bias. Subsequently, we will analyze the results from our two-pronged experiments to find reliable patterns of increased or decreased bias resulting from modeling choices in a variety of data contexts. So, to be involved in this project, you should either like coding and working with ML models, or running surveys on how humans work in AI-in-the-loop decision-making systems.
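
As a purely illustrative sketch of the pipeline-testing idea (our example with synthetic data, not the lab's platform), one can vary a single pipeline choice and measure its effect on a simple fairness metric:

    # Illustrative sketch only: how one pipeline design choice (imputation
    # strategy) shifts a demographic parity gap on synthetic data.
    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    group = (rng.random(1000) < 0.5).astype(int)        # protected attribute
    y = (X[:, 0] + 0.5 * group + rng.normal(size=1000) > 0).astype(int)
    X[rng.random(X.shape) < 0.1] = np.nan               # inject missingness

    for strategy in ["mean", "median"]:                 # the pipeline choice
        clf = make_pipeline(SimpleImputer(strategy=strategy), LogisticRegression())
        pred = clf.fit(X, y).predict(X)
        gap = abs(pred[group == 1].mean() - pred[group == 0].mean())
        print(f"{strategy}: demographic parity gap = {gap:.3f}")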

(2) Exploring On-the-Ground Fairness Debiasing Tools in Regulated Domains

The focus of this project is to bridge the gap between legal anti-discrimination requirements and the algorithmic fairness literature by first exploring how regulatory frameworks have influenced the development of technical algorithmic harm prevention on the ground to date. This project will involve conducting interviews with people in financial firms (and potentially others, such as firms that help businesses select employees and businesses related to housing) about how they think about fairness in their ML/AI systems, and what technical infrastructure they have to build fair systems in their businesses.
(3) Equity in Robotics (joint with Professor Brian Plancher)

We will explore the differences in robot user needs through large-scale surveys and interviews, with a focus on applications of walking robots for agriculture and household tasks. As mentioned above, we will leverage our existing connections to global networks and also leverage outreach to the wider global robotics community. Lines of inquiry will include:

* Within the same sorts of task (e.g., various types of farming), are the same sub-tasks necessary across robot deployments in various countries? E.g., what if the seeds grown in the US need to be planted deeper than those in Brazil?
* Are there opportunities for robot deployments in countries underrepresented in robot manufacturing for which no robot has been developed? E.g., are there unique ways that robots can help protect endangered species or ecosystems in current robotics deserts?

Next we will work closely with our global collaborators to investigate deployment failures in robots created to perform the same task in different environments. This is motivated by prior work documenting how household objects look extremely different in various countries (a stove in a lower-income household in Vietnam does not look the same as a stove in the United States), which can drastically degrade the performance of existing computer vision systems.

To test these failure modes, we will develop our algorithmic approaches and train our robots in our US-based lab, but test their performance on (simulated) sensor information gathered from performing that task across the globe. We will begin by co-designing simulated environments with partners in South and Central America and Malaysia and then move to leveraging data collected by our partners in their local environments. As PI Plancher will already be making trips to visit these partners in Spring and Summer 2024, it will be easy to personally provide partners with additional data collection equipment where needed. Finally, we will close the loop by testing our resulting approaches through future planned visits to the regions by PI Plancher.
(4) Testing Expert Preferences for Debiasing Pretrial Risk Assessments

This project again involves both a coding component and an interview component. Pretrial risk assessments are algorithms that decide whether someone should be jailed between their arrest date and their court date. They have been shown to be biased against Black individuals in the United States. We will be interviewing experts who deal with pretrial risk assessments in their daily life (lawyers, judges, the academics who made them, etc.) to see what they think the best way would be to debias these systems and what changes they would want to see. Then we will apply their recommendations to a mock pretrial risk assessment system built with real court/arrest data and see how well their suggestions work.

Required/preferred prerequisites and qualifications: Proficiency in Python; familiarity with machine learning tools such as scikit-learn and ideally (but not necessarily) any of Keras/TensorFlow/PyTorch; experience with data cleaning and processing. OR: experience designing and conducting surveys and interviews for HCI-style papers.

Zoom: Thursday, Jan. 18th 13:00 to 14:00

https://stanford.zoom.us/j/7942372381?pwd=b09QTEN5VklvOWlWTzVmajBzb2VOQT09