Data, Algorithms and Problems on Graphs

picture from http://gsas.columbia.edu/sites/default/files/slides/GSAS-Slide01.jpg

DAPG@Columbia

We welcome you to participate in the DARPA GRAPHS/SIMPLEX Workshop:
Data, Algorithms and Problems on Graphs.
The workshop will be held on September 28th, 2015, at Columbia University, New York, NY, in the CEPSR Davis Auditorium, 4th Floor.

Time	Speaker	Title	Links
08:30-09:00	Registration, coffee and bagels
09:00-09:05	Tony Jebara	Welcome and opening remarks
09:05-12:15	Invited and contributed talks
09:05-09:50	Duncan Watts	An experimental study of collective self-organization in crisis mapping	[abstract] [slides]	[video]
A central problem in organization science is how groups of people coordinate to solve common tasks or sets of tasks. Of particular interest are "complex" tasks, meaning roughly tasks that (a) are too large for any one person, (b) are composed of distinct subtasks with varied requirements, and (c) exhibit some degree of interdependency among the subtasks. Traditionally, organizational problem solving is studied using observational approaches (for example, comparative studies of successful vs. unsuccessful firms, or single-firm case studies); however, understanding the causal relationship between collective organization and collective problem solving requires controlled experimentation. In this talk I describe a series of web-based "virtual lab" experiments in which groups of workers of various sizes (ranging from 1 to 32) self-organize to solve a realistic crisis-mapping problem. Specifically, the groups are given one hour to create an annotated map of crisis-related events based on 1600 social media reports that were generated during Typhoon Pablo, which hit the Philippines in Dec 2012. As I will argue, crisis mapping is a relatively simple yet still realistic task, hence it serves as a useful "model task" for studying the relationship between organization and collective problem solving. Crisis mapping is also very much a real-world problem, and hence our research can also be viewed as making a practical contribution to the field of crisis mapping itself.
09:50-10:15	Uygar Sumbul, Suraj Keshri, Min-hwan Oh, Dawen Cai, John Cunningham, Liam Paninski	Semi-supervised segmentation of neurons from brainbow images	[abstract] [paper][slides]	[video]
Understanding the organization of brain circuits is a basic goal in neuroscience. However, reconstructing neuroanatomy at the level of individual cells and on a large scale remains a challenge. Imaging neural tissues with electron microscopy and the ensuing reconstruction of the circuits have been used successfully on small tissues. Yet, this resource-intensive method faces scaling challenges in the imaging, storage and reconstruction steps.
10:15-10:40	Angela Wilkins, Scott Spangler, Andreas Lisewski, Shivas Amin, Olivier Lichtarge	Discovery of protein functions and interactions from structures sequences and text in enzymes, malaria and cancer	[abstract] [paper]	[video]
We present four complementary studies that show the central role of network representations in molecular biology. Whether the information conveyed by edges that connect protein nodes represents (1) structural and evolutionary mimicry, (2) high throughput experimental data on interaction and homology, (3) contextual word similarities in papers, or (4) other biological relationships mined from text, network analysis using a diversity of algorithms can pass information from areas richly annotated to areas that are less so. These applications span enzymology, malaria and cancer, and all are supported by experimental discoveries guided by network predictions. Together these studies suggest that network views of biological processes are fundamental tools with the power to integrate high throughput experimental biology (BIG DATA) with the entire corpus of the biomedical literature (BIG LITERATURE) in order to guide discoveries through automated hypothesis generation.
10:40-11:05	Agostino Capponi	Robust performance analysis of complex network infrastructures	[abstract] [paper][slides]	[video]
We develop a multifaceted framework for performance analysis of complex network infrastructures. In contrast to using traditional one-dimensional aggregate performance measurements on the state of the network, the proposed framework allows comparing networks simultaneously through the rich class of Schur-convex functions (e.g. Arnold et al. (2011)). Such a class includes those which are increasing, symmetric, and jointly convex, such as the maximum, worst-case, average and aggregate performance.
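As a rough illustration of how a class of Schur-convex functions can compare two networks at once: a vector y majorizes a vector x exactly when every Schur-convex function ranks y at least as high as x. The sketch below checks that ordering for two hypothetical per-node load vectors. The function name `majorizes` and the example data are assumptions made here for illustration and are not taken from the talk.

```python
from typing import Sequence


def majorizes(y: Sequence[float], x: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if y majorizes x.

    y majorizes x when both vectors have the same total and, for every k,
    the sum of the k largest entries of y is at least that of x.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    partial_x = partial_y = 0.0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_y < partial_x - tol:
            return False
    # The totals must agree for majorization to hold.
    return abs(partial_x - partial_y) <= tol


# Hypothetical per-node loads for two networks carrying the same total load.
balanced = [2.0, 2.0, 2.0]
skewed = [4.0, 1.0, 1.0]

# The skewed vector majorizes the balanced one, so any Schur-convex
# measure (for example the maximum or the sum of squares) is at least
# as large on the skewed vector.
skewed_dominates = majorizes(skewed, balanced)
```

Here `skewed_dominates` is `True`, and correspondingly `max(skewed) >= max(balanced)`; the reverse check `majorizes(balanced, skewed)` is `False`.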
11:05-11:30	Yuxiao Dong, Jing Zhang, Jie Tang, Nitesh Chawla, Bai Wang	CoupledLP: Link prediction in coupled networks	[abstract] [paper][slides]	[video]
In this paper, we study the link prediction problem in an interesting new setting: coupled networks, which consist of a source network G_S and a target network G_T. We have structure information for the source network G_S and the interactions G_C between the two networks, but no structure information for the target network. The objective of link prediction here is to predict the existence of links in the target network G_T.
11:30-11:55	Ekaterina Taralova, Tony Jebara, Rafael Yuste	Functional models of mouse visual cortex	[abstract] [paper][slides]	[video]
We develop machine learning methods to decipher the neural code that links perception with the firing of neurons in the cerebral cortex. We deploy Bayesian graphical models for holistic neuron data analysis to provide a framework for understanding the neural circuitry, which is unattainable with current single neuron methods. The proposed models are data-driven and capture the probabilistic conditional dependencies between the neural activity and the visual stimuli.
11:55-12:15	David Hallac, Jure Leskovec, Stephen Boyd	Network lasso	[abstract] [paper] [slides]	[video]
Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. We introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs. We develop an algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in a distributed and scalable manner, which allows for guaranteed global convergence even on large graphs. We then demonstrate that many types of problems can be expressed in our framework. We focus on three in particular (binary classification, predicting housing prices, and event detection in time series data), comparing the network lasso to baseline approaches and showing that it is both a fast and accurate method of solving large optimization problems.
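For orientation, the network lasso problem on a graph with node set V and edge set E is usually written in the form below. The notation is assumed here rather than taken from the abstract: x_i is the variable at node i, f_i is a convex node cost, w_{jk} is a nonnegative edge weight, and \lambda \ge 0 sets the strength of the edge penalty.

```latex
\min_{\{x_i\}_{i \in \mathcal{V}}} \;
  \sum_{i \in \mathcal{V}} f_i(x_i)
  \;+\; \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk} \, \lVert x_j - x_k \rVert_2
```

Because the edge penalty uses the unsquared Euclidean norm, it tends to drive x_j and x_k to be exactly equal across many edges, which is what gives rise to the clustering behaviour mentioned in the abstract.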
12:15-13:15	Lunch break (lunch provided)
13:15-16:00	Invited and contributed talks
13:15-14:00	Alireza Tahbaz-Salehi	Supply chain disruptions: Evidence from the Great East Japan Earthquake	[abstract] [slides]	[video]
This talk examines whether propagation of idiosyncratic, firm-level shocks through input-output linkages can lead to sizable fluctuations at the aggregate level. Using a large-scale dataset on supply chain linkages among Japanese firms together with information on firm-level exposures to a large, but localized, natural disaster (the Great East Japan Earthquake in 2011), we quantify the earthquake's impact on firms that were (directly or indirectly) linked to affected firms. We find that having a supplier in the earthquake-hit region led to a 3% loss in terms of sales growth compared to firms with no such suppliers. We also find evidence for smaller but nevertheless significant upstream propagation from affected firms to their suppliers. Furthermore, we show that these losses do not remain confined to the disrupted firms' immediate customers and suppliers. Rather, firms that were only indirectly related to the firms in the affected areas (such as their customers) were also negatively impacted. Even though our results suggest that such cascade effects decay with supply chain distance, the number of firms affected is large enough for this localized disruption to have a meaningful macroeconomic impact: the propagation of the earthquake shock over input-output linkages led to a 1% drop in Japan's aggregate output in the year following the earthquake.
14:00-14:45	Anima Anandkumar	Tensor methods: A new paradigm for training probabilistic models and neural networks	[abstract] [slides]	[video]
Tensors are rich structures for modeling complex higher-order relationships in data-rich domains such as social networks, computer vision, internet of things, and so on. Tensor decomposition methods are embarrassingly parallel and scalable to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent estimates for many probabilistic models such as topic models, community models, hidden Markov models, and so on. I will also demonstrate how tensor methods can yield rich discriminative features for classification tasks and provide a guaranteed method for training neural networks.
14:45-15:10	Jelena Stojanovic, Djordje Gligorijevic, Milos Jovanovic, Zoran Obradovic	Structured regression on partially observed evolving graphs with uncertainty propagation	[abstract] [paper]	[video]
Conditional probabilistic graphical models provide a powerful framework for structured regression in spatio-temporal datasets with complex correlation patterns. However, in real-life applications a large fraction of observations is often missing, which can severely limit the representational power of these models. We have proposed a Marginalized Gaussian Conditional Random Fields (m-GCRF) structured regression model for dealing with missing labels in partially observed temporal attributed graphs. This method is aimed at learning with both labeled and unlabeled parts and effectively predicting future values in a graph. The method is even capable of learning from nodes for which the response variable is never observed in history, which poses problems for many state-of-the-art models that can handle missing data.
15:10-16:00	POSTER PRESENTERS	SPOTLIGHT TALKS	[abstract] [slides]	[video]
Selected posters will each be introduced in a brief oral presentation with a single slide, highlighting the key takeaways ahead of the subsequent Poster Session.
16:00-18:00	Poster session and coffee break
	Greg Henselman, Robert Ghrist	A novel algorithm for topological persistence, with application to neuroscience	[abstract] [paper]
A recent advance in computational homology gives an order-of-magnitude improvement in applications to neural coding analysis.
	Andrew Quitadamo, Benika Hall, Xinghua Shi	An integrative approach to constructing microRNA-gene networks in ovarian cancer	[abstract] [paper]
Network integration is critical in understanding the underlying mechanisms of human health and diseases. Changes in miRNA and mRNA expression are known to be involved in both ovarian cancer development and progression. Pinpointing the exact changes and the relationships that occur between them could lead to advances in how ovarian cancer is treated and diagnosed. Creating an integrated network involving eQTLs, miRNA targets, protein-protein interactions and correlation graphs is one way to explore these relationships. Integrating multiple data sources can thus allow us to create a wider and more holistic view of the interactions in ovarian cancer. Therefore, we developed a new method of constructing an integrated network by combining the strengths of association studies and network analysis. Applied to ovarian cancer, our integrated analysis replicated known cancer-related miRNAs and genes, in addition to providing new candidate markers.
	J.C. Smart, Joshua Ripple	Application of a system engineering structure for private information sharing and graph theoretic analysis of private health information across public health jurisdictions	[abstract] [paper]
Movement of people across jurisdictions complicates maintenance of accurate records on individuals for informing public health policy decisions and improving care. Contextual issues hinder the tracking of individuals across local, state, and federal jurisdictions by limiting sharing of private information across jurisdictions. To address this problem, we developed and implemented a 'black box' to analyze health records across a small subset of public health jurisdictions in the United States.
	Erdem Koyuncu, Hamid Jafarkhani	Local construction of bounded-degree network topologies using only incidence information	[abstract] [paper]
We consider ad-hoc networks consisting of n wireless nodes that are located in R^2. Any two given nodes are called neighbors if they are located within a certain distance from one another. A given node can be directly connected to any one of its neighbors and picks its connections according to a unique topology control algorithm that is available at every node. Given that each node knows only the indices of its one- and two-hop neighbors, we identify an algorithm that preserves connectivity and can operate without the need of any synchronization among nodes. Moreover, the algorithm results in a sparse graph with at most 5n edges and a maximum node degree of 10. Existing algorithms with the same promises further require neighbor distance and/or direction information at each node.
	Diana Palsetia, William Hendrix, Ankit Agrawal, Wei-keng Liao, Alok Choudhary	Parallel distributed-memory based community detection for large graphs	[abstract] [paper]
Community detection is a well-studied problem in graph data analytics. As graph sizes have increased, more attention is turning to parallel techniques. In general, graph algorithms may be parallelized by dividing the data or by dividing the workload. One of the main challenges in designing a parallel algorithm is partitioning the data. Since the data access pattern in graph algorithms is often irregular and highly dependent on the network structure, a poor partitioning scheme can cause high communication cost and severely affect the quality of clustering. Here we describe our ongoing work on developing a distributed-memory based parallel algorithm for community detection. The algorithm adopts a data-based decomposition strategy with duplication, which is expected to achieve good scalability without sacrificing cluster quality.
	Jian Xu, Thanuka Wickramarathne, Nitesh Chawla	Representing higher order dependencies in networks	[abstract] [paper]
A network, with nodes representing entities and edges the connections between entities, is a representation of data or events. The data does not necessarily come in a pre-defined network structure; rather, the network is a representation generated from the data as sequences of recorded events (e.g., trajectories of vehicles, retweets of a message, streams of web clicks, etc.). Assuming that network analysis is the right framework for analyzing such data, this raises the question: how should network representations be constructed from data, such that the underlying phenomena in the data are correctly captured?
	Michael Robinson	Simplicial complex sampling in inference using exact sequences	[abstract] [paper]
A space of signals on a graph can be represented using an abstract mathematical construction called a sheaf - a general tool that allows the creation of topological filters. If only partial information about a graph signal is present - yielding a sampling problem analogous to the Shannon-Nyquist theorem - then an exact sequence of sheaves provides detailed information about how this information can be extrapolated. This general framework applies to many kinds of information integration problems and leads directly to computable invariants about graph signal processing filters.
	Kui Tang, Henrique Gubert, Rashmi Tonge, Anyi Wang, Liang Wu, Dwayne Campbell, Chris Kedzie, Liao Wang, Andelyn Russell, Anthony Kimball, Anju Kambadur, Gideon Mann, Stefano Pacifico, James Hodson, David Yao, Kathleen McKeown, Tony Jebara	Learning a graphical model of Bloomberg financial and news data	[abstract] [paper]
We build a Bayesian network that models interactions between heterogeneous data sources including news feeds, social media and financial indices. We also propose a method for temporal adjustment using conjugate priors. We use this network to do inference about the different variables of the model under stress conditions.
