



All meetings are at 2:45 in CEPSR 6LE5, unless otherwise noted.
Old meetings are listed below.

May 4, 2006 
Pannagadatta K Shivaswamy and Tony Jebara:
Permutation Invariant SVMs
(Pannagadatta presenting) 
March 2, 2006 
Yirong Shen, Andrew Ng, and Matthias Seeger:
Fast Gaussian Process Regression using KD-Trees
(Vlad presenting) 
January 26, 2006 
Koby Crammer and Yoram Singer:
Ultraconservative Online Algorithms for Multiclass Problems
(Risi presenting) 
January 19, 2006 
Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans:
Maximum Margin Clustering
(Pannagadatta presenting) 
January 5, 2006 
(Bert Presenting) 
December 29, 2005 
No group meeting due to holidays. 
December 22, 2005 
Vlad Shchogolev: Using b-matching to improve clustering (Vlad Presenting) 
December 15, 2005 
NIPS Highlights (Andy Presenting) 
December 8, 2005 
Jean-Philippe Vert, Robert Thurman, and William Stafford Noble: Kernels for Regulatory Regions
(Rui Presenting) 
December 1, 2005 
Brad Schumitsch, Sebastian Thrun, Gary Bradski, and Kunle Olukotun: The Information-Form Data Association Filter
(Tony Presenting) 
November 29, 2005 
Neil Lawrence (University of Sheffield): High Dimensional Probabilistic Modelling through Manifolds, 2:30-3:15 in Interschool Lab
Density modelling in high dimensions is traditionally a very
difficult problem. Approaches such as mixtures of Gaussians typically
fail to capture the structure of data sets in high dimensional spaces.
In this talk we will argue that for many data sets of interest, the data
can be represented as a lower dimensional manifold immersed in the
higher dimensional space. We will then present the Gaussian Process Latent
Variable Model (GPLVM), a nonlinear probabilistic variant of principal
component analysis (PCA), which implicitly assumes that the data lies in
a lower dimensional space.
We will demonstrate the application of the model to a range of data
sets, but with a particular focus on human motion data. We will show
some preliminary work on facial animation and make use of a skeletal
motion capture data set to illustrate differences between our model and
traditional manifold techniques.
BIO: Neil Lawrence received his PhD from Cambridge University in 2000, after
which he spent a year at Microsoft Research, Cambridge. Currently he is
a senior lecturer in the Department of Computer Science, University of
Sheffield, U.K. His research interests are probabilistic models with a
particular focus on Gaussian processes. He has a particular interest in
applications of these models and recent work has involved him in
computational biology, speech, vision and graphics. See http://www.dcs.shef.ac.uk/~neil

November 17, 2005 
Risi Kondor: Fourier Transformation on non-Abelian Groups 
November 10, 2005 
Risi Kondor: Representations of the Symmetric Group 
November 3, 2005 
Raphael Pelossof:
Particle Filtering Using Dynamic Clustering (Raphael presenting) 
October 27, 2005 
Chiranjib Bhattacharyya:
Second Order Cone Programming Formulations for Feature Selection (Pannagadatta presenting) 
November 12, 2004 
Sanjeev Arora and
Ravi Kannan: Learning mixtures of arbitrary Gaussians (Darrin presenting) 
November 5, 2004 
Matching via loopy propagation (Bert Presenting) 
October 29, 2004 
Trevor Hastie, Saharon Rosset, Rob Tibshirani and Ji Zhu: The Entire Regularization Path for the Support Vector Machine (Andy presenting) 
September 27, 2004 
Yee Whye Teh (U.C. Berkeley):
Hierarchical Dirichlet processes: A Bayesian approach to sharing
clusters among related groups
4pm
We consider problems involving groups of data, where each observation
within a group is drawn from a mixture model, and where it is
desirable to share mixture components both within and between groups.
We assume that the number of mixture components is unknown a priori
and is to be inferred from data. Such problems occur often in
practice; e.g., in topic discovery in document corpora, each document
is treated as a group of data items (a bag of words), and each topic
corresponds to a mixture component. In this setting,
sharing components between groups simply means that topics can occur
across a number of documents, allowing dependencies across documents
(groups) to be modeled effectively as well as conferring generalization
to new documents (groups).
The hierarchical Dirichlet process (HDP) is a Bayesian solution to
this problem, utilizing both hierarchical and nonparametric modeling
ideas. Each group is modeled using a Dirichlet process (DP) mixture,
which provides a nonparametric prior for the number of components
within each group. To facilitate sharing components between groups,
we consider a hierarchical extension where the common base distribution
for the DPs is itself distributed according to a DP. Such a base
distribution being discrete, the group-specific DPs necessarily
share atoms, hence the mixtures for different groups necessarily
share components.
We discuss a variety of representations for the HDP, as well as
Markov chain Monte Carlo algorithms for posterior inference. We
report experimental results on three text corpora showing the
effective and superior performance of the HDP over previous models.
Technical Report: Hierarchical Dirichlet processes. Teh, Jordan, Beal
and Blei (2004). UC Berkeley Department of Statistics, TR 653. Can be
obtained at:
http://www.cs.berkeley.edu/~ywteh/research/npbayes
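The nonparametric prior underlying each group's DP mixture can be illustrated with a minimal Chinese restaurant process sampler (a standard construction, not code from the talk or the report; `alpha` is the DP concentration parameter):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from the Chinese restaurant process.

    Item i joins an existing table with probability proportional to the
    table's size, or starts a new table with probability proportional to
    alpha, so the number of components need not be fixed in advance.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of items seated at table k
    assignments = []  # assignments[i] = table index of item i
    for _ in range(n):
        # Unnormalized probabilities: existing tables, then a new table.
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table (new mixture component)
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_partition(100, alpha=1.0)
print(len(tables), sum(tables))  # number of components found; total items (100)
```

In the HDP, the groups' DPs share a base measure that is itself DP-distributed, so tables in different restaurants can serve the same "dish" (component).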

September 2, 2004 
Matthias Seeger (U.C. Berkeley):
Sparse Multi-Gaussian Process Methods:
Multiway Classification and Beyond
3pm-4pm
While supervised kernel techniques involving a single latent function (or
discriminant) are well understood and powerful methods have emerged,
much less is known about models which combine a number of latent
functions. We describe a generic way of generalizing the sparse Bayesian
Gaussian process Informative Vector Machine (IVM) to such multi-process
models, emphasizing the key techniques which are required for an
efficient solution (exploiting matrix structure, numerical quadrature).
We apply our method to the multiway classification problem, obtaining
a scheme which scales essentially linearly in the number of datapoints
and classes. We show how kernel parameters can be learned by empirical
Bayesian techniques. We argue that a good solution for the multiclass
problem leads to schemes for larger structured graphical models such as
conditional random fields.
Joint work with Michael Jordan.

July 20, 2004 
Pat Langley (Center for the Study of Language and Information, Stanford University):
Computational Discovery of Explanatory Process Models
11:00am CS Conference Room
The growing amount of scientific data has led to the increased use
of computational discovery methods to understand and interpret them.
However, most work has relied on knowledge-lean techniques like
clustering and classification learning, which produce descriptive
rather than explanatory models, and it has utilized formalisms
developed in AI or statistics, so that results seldom make contact
with current theories or scientific notations. In this talk, I present
an approach to computational discovery that encodes explanatory
scientific models as sets of quantitative processes, simulates these
models' behavior over time, incorporates background knowledge to
constrain model construction, and induces the models from time-series
data. I illustrate this framework on data and models from Earth
science and microbiology, two domains in which explanatory process
accounts occur frequently. In closing, I describe our progress toward
an interactive software environment for the construction, evaluation,
and revision of such explanatory scientific models.
This talk describes joint work with Kevin Arrigo, Nima Asgharbeygi,
Stephen Bay, Andrew Pohorille, and Jeff Shrager.
http://cll.stanford.edu/~langley/

July 2, 2004 
Risi: Highlights of the Barcelona conference on the Mathematical
Foundations of Learning Theory 
June 21, 2004 
Gokhan BakIr (Tuebingen):
Breaking SVM Complexity with Cross-Training
11:00am Machine Learning Lab
We propose an algorithm for selectively removing examples from the training set using probabilistic estimates related to editing algorithms (Devijver and Kittler, 1982). The procedure creates a separable distribution of training examples with minimal impact on the decision boundary position. It breaks the linear dependency between the number of support vectors (SVs) and the number of training examples, and sharply reduces the complexity of SVMs during both the training and prediction stages.

June 18, 2004 
Thomas Blumensath (University of London):
Bayesian Music Transcription
2:30pm Machine Learning Lab
German Creamer (Columbia University):
Automated trading strategies using the boosting approach,
technical indicators and constant rebalanced portfolio
2:45pm Machine Learning Lab

April 30, 2004 
Tony:
Orbit Learning via Convex Optimization
ML Lab, 2:30pm
Snowbird workshop

April 16, 2004 
At 2:30pm, we will be continuing with the Chinese Restaurant Process and Dirichlet Process papers with Risi leading the discussion from April 2nd.

April 13, 2004 
Risi:
Multifacet Learning in Hilbert Spaces
ML Lab, 5pm
Slides from Snowbird workshop:
[ps]

April 16, 2004 
Deb Roy (MIT Media Laboratory):
Meaning Machines
11:00am Interschool Lab
Computers don't grasp meaning in any deep, human sense. From a
computer's point of view, words are meaningless bits of information to
be processed with speed and precision, but without any genuine
interpretation of content. We envision a new class of machines that
understand the meaning of information in more human-like ways by
grounding knowledge in the physical world and in the machines' own
goals. Towards this long-term goal, our group develops conversational
robots and other situated systems designed to communicate with human
partners using natural language. In this talk I will highlight recent
experimental results and applications in multimodal human-machine
interaction. I will also discuss an emerging theoretical framework for
connecting language, action, and perception which has relevance both
for designing situated systems and for cognitive modeling.
Professor Roy's and the Cognitive Machines Group Web Page are at:
http://www.media.mit.edu/cogmac

April 9, 2004 
Sayan Mukherjee (MIT):
Gene Set Enrichment Analysis
11:00am Interschool Lab
The selection and analysis of differentially expressed gene profiles
(markers) helps associate a biological phenotype with its underlying
molecular mechanisms and provides valuable insights into the structure
of pathways and cellular regulation. However, analyzing and interpreting
a given list of gene markers to glean useful biological insights can be
extremely challenging. This is in part due to the difficulty of
objectively evaluating how well members of a given pathway or
functional class of interest (Gene Set) are represented in the markers
list. To address this problem we introduce a statistical methodology
called Gene Set Enrichment Analysis (GSEA) for determining whether a
given Gene Set is overrepresented or enriched in a Gene List of markers
ordered by their correlation with a phenotype or class distinction of
interest. The method is based upon a score computed as the maximum
deviation of a random walk (in the same spirit as the Kolmogorov-Smirnov
statistic) and uses permutation testing to assess significance. When
multiple Gene Sets are tested simultaneously we propose two approaches
to address the multiplicity: Validation GSEA which controls the
Family-wise error rate (FWER) and Discovery GSEA which controls the False
Discovery rate (FDR). The utility of this procedure will be illustrated
on two biological problems: validating a mouse model of lung cancer and
finding chromosomal dislocations for myeloid leukemia.
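The running-sum score described above can be sketched concretely. The following is a simplified, equal-weight variant (the published GSEA weights steps by each gene's correlation with the phenotype; names such as `ranked_genes` and `gene_set` are illustrative):

```python
def enrichment_score(ranked_genes, gene_set):
    """Simplified GSEA enrichment score.

    Walk down the ranked gene list, stepping up at gene-set members and
    down otherwise, and return the maximum deviation of the running sum
    from zero (a Kolmogorov-Smirnov-like statistic). Significance would
    be assessed by recomputing the score under permuted phenotype labels.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    up, down = 1.0 / n_hit, 1.0 / n_miss  # steps sum to zero over the list
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += up if is_hit else -down
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["g%d" % i for i in range(10)]
print(round(enrichment_score(ranked, {"g0", "g1", "g2"}), 3))  # → 1.0
```

The score peaks near +1 when the set's members cluster at the top of the ranking, and is strongly negative when they cluster at the bottom.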

April 2, 2004 
Blei, Griffiths, Jordan and Tenenbaum: Hierarchical Topic Models and the Nested Chinese Restaurant Process.
(Risi presenting) Slides:
[ps]
[pdf]

March 25, 2004 
Naftali Tishby (The Hebrew University):
Efficient data representations that preserve relevant information:
A new look at Shannon's Information Theory
11:00am Interschool Lab
In this talk I will take a new look at Shannon's Information Theory from the
Machine Learning perspective. I will argue that Shannon's theory provides a
compelling mechanism for quantifying the fundamental tradeoff between
complexity and accuracy, by unifying the source and channel coding theorems
into one principle which I call the "Information Bottleneck" (IB).
This unified view of the coding theorems can shed new
light on the question of "relevant data representation" and suggests new
algorithms for extracting such representations from co-occurrence
statistics. It also provides new ways of thinking about neural coding and
neural data analysis. When applied to the analysis of human language it
reveals a new, possibly universal, scaling law that may reflect the way
words are acquired in natural languages. The IB principle has new and
interesting extensions that deal with multivariate data using graphical
models and multi-information; allow adding irrelevant side information; and
extract nonlinear continuous dimension reductions that preserve information
(SDR). I will describe some of those extensions as time allows.
More information and many related papers can be found at:
http://www.cs.huji.ac.il/labs/learning/Papers/IBM_list.html

March 12, 2004 
Brendan J. Frey: A Principled and Computationally Efficient Approach to Visual and Auditory Scene Analysis
11:00am Interschool Lab
Scene analysis is currently the most exciting problem to work on in sensory processing. Scene analysis entails decomposing sensory inputs into a combinatorial explanation that can be used to predict new inputs. The problem is old and can be traced back to Helmholtz and more fuzzily to the ancient Greeks (who thought that fire, etc., was the combinatorial explanation). However, only recently do we have 1) Efficient algorithms that turn exponential-time combinatorial inference and learning algorithms into "linear time" approximate algorithms; and 2) Fast computers that can process sensory data repeatedly and quickly enough that grad students stick around to look at the results. In this talk, I'll describe research on visual and auditory scene analysis, being done in my group at the University of Toronto (PSI Lab), using graphical models and efficient, approximate inference algorithms. The focus will be on the generative modeling approach and why this approach holds the most promise for "solving" the problem of sensory scene analysis.
For serious reading, check out the following tutorial paper: http://www.psi.toronto.edu/~frey/stuff/tutorial.ps.gz
For fun videos, audio clips, gene expression array results, etc, see the web pages of Nebojsa Jojic, my postdocs (Romer Rosales, Quaid Morris) and my current students (Kannan Achan, Anitha Kannan, Chris Pal).

March 5, 2004 
John Langford:
The Method of Reduction in Machine Learning
Reductions transform a solver for one domain into a solver
for another domain. I will discuss a general framework for analyzing
reductions between learning domains which captures Boosting, ECOC, and
other well known algorithms. I will also present new algorithmic
reductions and empirical tests of their performance.

February 27, 2004 
Yoav Freund, Robert E. Schapire, Yoram Singer, Manfred K. Warmuth: Using and combining predictors that specialize. (Ray presenting) 
February 6, 2004 
Denis V. Chigirev, William S. Bialek: Optimal Manifold Representation of Data: An Information Theoretic Approach. (Andy presenting) 
January 30, 2004 
Vishy (SVN Vishwanathan):
2:30pm in CEPSR 6LE5
We present Hilbert space embeddings of dynamical systems and embeddings
generated via dynamical systems. This is achieved by following the
behavioural framework invented by Willems, namely by comparing
trajectories of states. As important special cases we recover the
diffusion kernels of Kondor and Lafferty, generalised versions of the
directed graph kernels of Gartner, novel kernels on matrices, and new
similarity measures on Markov Models. We show applications of our
method to dynamical texture recognition problems from computer vision.

January 27, 2004 
Eleazar Eskin: The Homology Kernel: A Biologically Motivated Sequence Embedding
4:30pm in CS Lounge
Many recent techniques in learning over biological sequences
implicitly embed sequences into a Euclidean space in order to take
advantage of strong margin-based learning algorithms. However, these
embeddings often do not take advantage of the rich biological
intuitions that have motivated the development of Hidden Markov Model
style biological sequence models and have led to great successes in
computational biology.
In this talk, we present a new biologically motivated sequence
embedding. We discuss several of the formal properties of the
embedding, including its connection to local sequence alignment. One of
the key features of the embedding is that a sequence is embedded along
with its homologues or neighboring sequences. The distance between
two sequences is defined by the distance between close neighbors of
the sequences. We demonstrate application of the embedding to several
applications. We apply the embedding to learning protein secondary
structure and protein family classification. We also show how the
embedding can be used for aligning two sequences based on their
homologues.
Finally, we discuss how, due to the properties of the embedding, we can
efficiently compute nearest neighbors.
Joint work with Sagi Snir.

December 19, 2003 
Casual meeting to discuss plans for ICPR, JMLR, ICML, UAI, COLT, and Snowbird. 
December 15, 2003 
Andrew Gelman: Bayesian data analysis: What it is and what it isn't
11am in Interschool Lab
Bayesian inference is often associated with the concept of subjective
probability and prior distributions. We have a different attitude, in
which inference proceeds from a posited (not "elicited") probability
model, and then the implications of these inferences are compared to
existing data, new data, and other subject-matter knowledge. The
resulting approach to statistics places the flexibility of complex
probability modeling in a scientific context of falsifiable
hypotheses. In practice, it means that we can solve lots of problems
without agonizing over philosophical questions of subjectivity. We
illustrate with several examples in social science and public health.

December 5, 2003 
Zhou, Bousquet, Lal, Weston and Scholkopf: Learning with Local and Global Consistency. (Darrin presenting) 
November 21, 2003 
M. Hein and O. Bousquet: Maximal Margin Classification in Metric Spaces. (Risi presenting) 
November 7, 2003 
J. Lafferty, A. McCallum, F. Pereira: Conditional Random Fields. (Deep presenting) 
October 31, 2003 
B. Taskar, C. Guestrin and D. Koller: Max Margin Markov Networks. 
October 10, 2003 
T. Jebara:
Latent Entropy Discrimination. 
September 26, 2003 
O. Bousquet:
New Approaches to Statistical Learning Theory. (Risi presenting) 
September 19, 2003 
Brief hello meeting to hear about everyone's work and progress.

September 5, 2003 
Yoav Freund: How to be a Bayesian without believing
At the Probability Seminar, 11am in Mathematics Building (Room 520)
Most of the study of Bayesian prediction procedures is premised on some
strong assumptions regarding the prior distribution. In particular, an
assumption that always needs to be made is that the prior distribution
assigns a nonzero probability to the correct model. In practice,
however, we often have to restrict the set of models (in the support of
the prior) in order to make it feasible to compute the posterior
average. As a result, we often can't assume that the correct model is
in our set and the standard Bayesian theory cannot be applied.
In this work we show a classification procedure which uses model
averaging and can be interpreted as a Bayesian procedure. We show that
this procedure has some desirable properties when it is applied to
*any* source of IID examples, regardless of whether or not the source
distribution is related to the models in the support of the prior. Our
main result is that the predictions made by this procedure are very
stable with regard to the choice of the random training set.
This stability property has some far-reaching implications for a
variety of issues, including the Bias-Variance decomposition of
classification error, the "curse of dimensionality", and Bagging.
This is joint work with Yishay Mansour and Rob Schapire.

August 29, 2003 
Alex Smola: Exponential Families in Feature Space
12:45pm in CS Conference Room

August 15, 2003 
Y. Altun, I. Tsochantaridis, and T. Hofmann:
Hidden Markov Support Vector Machines. (ICML 2003) 
August 8, 2003 
Paolo Frasconi: Learning Structured Data: Theory and Applications
An informal talk but should be very interesting!

July 18, 2003 
Nebojsa Jojic: Epitomic analysis of appearance and shape
11am in Interschool Lab
We present novel simple appearance and shape models that we call
epitomes. The epitome of an image is its miniature, condensed version
containing the essence of the textural and shape properties of the
image. As opposed to previously used simple image models, such as
templates or basis functions, the size of the epitome is considerably
smaller than the size of the image or object it represents, but the
epitome still contains most constitutive elements needed to reconstruct
the image. A collection of images often shares an epitome, e.g., when
images are a few consecutive frames from a video sequence, or when they
are photographs of similar objects. A particular image in a collection
is defined by its epitome and a smooth mapping from the epitome to the
image pixels. When the epitomic representation is used within a
hierarchical generative model, appropriate inference algorithms can be
derived to extract epitome from a single image or a collection of images
and at the same time perform various inference tasks, such as image
segmentation, motion estimation, object removal and super-resolution. We
have also had some preliminary success in other domains (audio waveforms
and MIDI files, for example).
Joint work with Brendan Frey and Anitha Kannan.

July 11, 2003 
John Langford: Bounds for Sale
Constructing tight bounds on the future error rate of a
classifier has historically been a rather difficult task. This task
has been accomplished for support vector machines. I'll discuss the
intuitions behind a couple of bounds, and present some empirical
results. With bounds this tight, direct bound optimization algorithms
are motivated, for which I'll also present results.
Related Papers:
PAC-Bayes and Margins
Relating Data Compression and Learnability

July 1, 2003 
Chris Pal: Probabilistic Montage Models of Images
In this talk I'll present a new class of image models we call
Probabilistic Montages. Probabilistic Montages represent an image using a
grid of smaller subimages. Each subimage is then modeled as arising
from a transformed and cropped version of one of a collection of possible
slightly larger images. We describe the probability distribution relating
the way in which subimages are cropped and transformed with respect to
the other locations on the grid using various graphical models. In this
talk I'll illustrate a number of variations of the probabilistic montage.
I'll present an EM algorithm for estimating parameters in Bayesian tree
structured montages and show how probabilistic montages are applicable
to various classification and segmentation problems. In particular, I will
illustrate the use of a tree structured model for recognition tasks
applied to full body images of people in motion.
Joint work with Brendan Frey and Nebojsa Jojic
ECCV paper: Learning Montages of Transformed Latent Images...

June 20, 2003 
G. Lebanon:
Learning Riemannian Metrics

June 13, 2003 
K. Barnard, P. Duygulu, N. de Freitas, D. A. Forsyth, D. M. Blei, and M. I. Jordan:
Matching words and pictures (presented by Andy)

May 9, 2003 
O. Bousquet and A. Elisseeff:
Algorithmic Stability and Generalization Performance
(presented by Risi) 
April 25, 2003 
T. Jebara and A. Howard:
Dynamical Systems Trees for Communities of Interacting Processes
(presented by Andy) 
April 11, 2003 
M. Belkin and P. Niyogi:
Using Manifold Structure for Partially Labelled Classification
(presented by Darrin) 
April 4, 2003 
F. Cucker and S. Smale: On the Mathematical Foundations of Learning
(presented by Risi) 
March 14, 2003 
T. Jebara:
Alternating Projection for Independent Component Analysis

March 7, 2003 
K. Bennett and A. Demiriz:
Semi-Supervised Support Vector Machines;
T. Joachims:
Transductive Inference for Text Classification using Support Vector Machines
(presented by Darrin)

February 24, 2003 
Balazs Szendroi:
Power laws and self-similarity in large networks: a case study
Using a social network consisting of 10^4 human individuals as a
motivating example, I will recall some generic features occurring in
various classes of large networks, such as "small world" properties and
power laws. Going beyond aggregated measures, I will present a somewhat
mysterious representation of the self-similar architecture of the network
based on its adjacency spectrum. A model reproducing both aggregated and
self-similar features of the network will also be discussed. I will
conclude by showing that such self-similarity is present in several,
though not all, real-world networks.
Joint work with Gabor Csanyi
Some background articles:
(Small World)

February 14, 2003 
Gunter Ratsch, Alexander Smola, and Sebastian Mika:
Adapting Codes and Embeddings for Polychotomies, NIPS 2002
(Katherine)

February 7, 2003 
K. Nigam, A. McCallum, S. Thrun, and T. Mitchell:
Text Classification from Labeled and Unlabeled Documents using EM
(Darrin)

January 31, 2003 
E. Segal, Y. Barash, I. Simon, N. Friedman and D. Koller:
From Promoter Sequence to Expression: A Probabilistic Framework
(Omar) 
January 22, 2003 
John Langford and John ShaweTaylor:
PAC-Bayes and Margins (NIPS 2002) (Darrin);
Michael Tipping, Anita Faul:
Fast Marginal Likelihood Maximisation for Sparse Bayesian Models.
(AISTAT 2003) 
December 4, 2002 
John Lafferty and Guy Lebanon:
Information Diffusion Kernels (Risi);
Gal Chechik and Naftali Tishby:
Extracting relevant structures with side information (Anshul)

November 27, 2002 
Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell:
Distance Metric Learning, with application to Clustering with
side-information (Risi) 
November 20, 2002 
We will be preparing for the ACM research fair on Friday, sorting out
presentations, etc.:
Details and schedule 
November 13, 2002 
V. Pavlovic, J. Rehg and J. MacCormick:
Learning Switching Linear Models of Human Motion. (NIPS 2000) (Andy) 
October 30, 2002 
We will wrap up the Sufficient Dimensionality paper and look at the following related paper:
D. D. Lee and H. S. Seung: Algorithms for Non-negative Matrix Factorization.
(NIPS 2000) (Risi) 
October 23, 2002 
A. Globerson and N. Tishby:
Sufficient Dimensionality Reduction. 
October 16, 2002 
N. Tishby, F. Pereira and W. Bialek:
The Information Bottleneck Method. (Anshul) 
October 9, 2002 
We will continue with:
F. R. Bach and M. I. Jordan: Kernel independent component analysis. 
October 2, 2002 
F. R. Bach and M. I. Jordan:
Kernel independent component analysis. 
September 25, 2002 
Paul Viola and Michael Jones:
Rapid Object Detection using a Boosted Cascade of Simple Features
Paul Viola and Michael Jones:
Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade 


