Machine Learning Seminars

2010

February 2010

Monday, 22 Feb

14:00-15:00 Charles Fox, Adaptive Behaviour Research Group, University of Sheffield
Venue: 2.19, Kilburn Building

Monday, 08 Feb

14:00-15:00 Statistical Alignment and Footprinting, Jotun Hein, Department of Statistics, University of Oxford
Venue: 2.19, Kilburn Building

Abstract

Although bioinformatics is perceived as a new discipline, certain parts have a long history and could be viewed as classical bioinformatics. For example, application of string comparison algorithms to sequence alignment has a history spanning the last three decades, beginning with the pioneering paper by Needleman and Wunsch (1970). They used dynamic programming to maximize a similarity score based on a cost of insertion-deletions and a score function on matched amino acids. The principle of choosing solutions by minimizing the amount of evolution is also called parsimony and has been widespread in phylogenetic analysis even if there is no alignment problem. This situation is likely to change significantly in the coming years. After a pioneering paper by Bishop and Thompson (1986) that introduced and approximated likelihood calculation, Thorne, Kishino and Felsenstein (1991) proposed a well-defined, time-reversible Markov model for insertions and deletions (the TKF91 model) that allowed a proper statistical analysis for two sequences. Such an analysis can be used to provide maximum likelihood (pairwise) sequence alignments, or to estimate the evolutionary distance between two sequences. Steel et al. (2001) generalized this to any number of sequences related by a star tree. This was subsequently generalized further to any phylogeny, and more practical methods based on MCMC have been developed. We have developed this into a generally available program package.

Traditional alignment-based phylogenetic footprinting approaches make predictions on the basis of a single assumed alignment. The predictions are therefore highly sensitive to alignment errors or regions of alignment uncertainty. Alternatively, statistical alignment methods provide a framework for performing phylogenetic analyses by examining a distribution of alignments. We developed a novel algorithm for predicting functional elements by combining statistical alignment and phylogenetic footprinting (SAPF). SAPF simultaneously performs both alignment and annotation by combining phylogenetic footprinting techniques with a hidden Markov model (HMM) transducer-based multiple alignment model, and can analyze sequence data from multiple sequences. We assessed SAPF's predictive performance on two simulated datasets and three well-annotated cis-regulatory modules from newly sequenced Drosophila genomes. The results demonstrate that removing the traditional dependence on a single alignment can significantly augment the predictive performance, especially when there is uncertainty in the alignment of functional regions. The transducer-based version of SAPF is currently able to analyze data from up to five sequences. We are currently developing an MCMC approach that we hope will be capable of analyzing data from 12-16 species, enabling the user to input sequence data from all 12 recently sequenced Drosophila genomes. We will present initial results from the MCMC version of SAPF and discuss some of the challenges and difficulties affecting the speed of convergence.
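
As a rough illustration of the dynamic-programming alignment that the abstract traces back to Needleman and Wunsch, the sketch below computes a global alignment score and one optimal alignment. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example and do not come from the talk; the statistical alignment methods described above replace this single best alignment with a distribution over alignments.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not taken from the talk.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Score for placing characters x and y in the same column.
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("AC", "AGC"))
```

Because the result depends entirely on the chosen scores, a different gap penalty can produce a different "best" alignment, which is one motivation for the likelihood-based treatment in the abstract.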

January 2010

Tuesday, 26 Jan

14:00-15:00 Some Issues in Robust Bayesian Inference for Functional Genomics, Chris Holmes, Department of Statistics, University of Oxford
Venue: 2.15, Kilburn Building

Abstract

Experiments in functional genomics typically produce highly structured data sets, with thousands of measurements on tens to hundreds of subjects. The nature of the assays and the sheer number of measurements taken leave analysis of such data prone to influence by outliers that arise, say, from bad samples or bad measurement probes. This influence is especially problematic within high-throughput, discovery-driven studies where a priori we lack well-defined hypotheses. In these scenarios, semi-automated robust Bayesian methods provide an attractive inferential framework. We will discuss our experience in developing robust Bayesian ANOVA methodology for the analysis of complex genomic data sets and show that these lead to substantial gains in inference and resulting findings.

2009

November 2009

Monday, 23 Nov

14:00-15:00 Filtering and Smoothing in Gaussian Process Dynamic Systems, Marc Deisenroth, Universität Karlsruhe (TH), Germany and University of Cambridge, U.K. [slides]
Venue: G33, Kilburn Building

Abstract

We propose an algorithm for Bayesian filtering and smoothing in nonlinear dynamic systems for the case where both the transition function and the measurement function are described by Gaussian process (GP) models. GPs provide a consistent framework for analytically tractable Bayesian inference over functions and, thus, non-parametric regression even if the structure of the underlying dynamic system is unknown. Embedded into the forward-backward algorithm, we present filtering and smoothing with GPs based on exact moment matching. Our evaluations hint at the robustness of our approach in situations where common Gaussian filters and smoothers fail if they are based on local linearization and/or the unscented transform. We point out that the key to the success of our algorithm is not the use of GPs, but rather the exact moment matching when propagating state distributions forward.

Joint work with:

Marco Huber (Fraunhofer Institute for Information and Data Processing)
Ryan Turner (University of Cambridge)
Uwe Hanebeck (Karlsruhe Institute of Technology)
Carl Edward Rasmussen (University of Cambridge)

Monday, 09 Nov

14:00-15:00 Infer.NET and Csoft: a framework and language for machine learning, John Winn, Microsoft Research, Cambridge, U.K. [slides]
Venue: IT407

Abstract

Infer.NET is a framework for running Bayesian inference in graphical models. Internally, graphical models are expressed as a program in Csoft - a language which allows variables to be random. The flexibility of this language means that you can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification or clustering through to customised solutions to domain-specific problems. Infer.NET has already been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.

In this talk, I will describe how Infer.NET and Csoft work. I will include an extended tutorial/demo showing how you can get hold of Infer.NET and try it out for yourself.

October 2009

Monday, 12 Oct

14:00-15:00 The Inevitable Languages of Probability and Quantum Mechanics, John Skilling, Maximum Entropy Data Consultants and ex-DAMTP, University of Cambridge [slides]
Venue: Atlas 1

Abstract

Bayesian calculus is based on the sum and product rules of probability. Why? Quantum mechanics is also based on sum and product rules, but using complex arithmetic instead. Why?

It turns out that in each case the rules derive from symmetries so compelling that there is no realistic alternative for the calculus. Our reasoning is necessarily in terms of standard probability. Our models of a finite physical world have the same symmetries and follow the same mathematical structure. But, in physics, we only learn about an object through its interactions with a device, and we only know about that device through its interactions with other devices, and so on in arbitrary regress. This means that we never know everything about an object, and can at best extract one number from a device:object pair. The calculus of this is necessarily standard quantum mechanics.

Accordingly, we have a consistent structure for what we see and for how we learn.

Monday, 05 Oct

14:00-15:00 Model-based basecalling and quality assessment of second-generation sequencing data, Héctor Corrada Bravo, Biostatistics Department, Bloomberg School of Public Health, Johns Hopkins University [slides]
Venue: IT407

Abstract

Second-generation sequencing technology, capable of sequencing millions of short fragments of DNA in parallel, is increasingly used in a wide array of applications, from resequencing to gene expression profiling. However, these data present unprecedented challenges in statistical and computational analysis. Analysis operates on millions of short nucleotide sequences, or reads, which are the result of complex processing of noisy continuous fluorescence intensity data. Large variation in the processing quality of the reads results in infrequent but systematic errors that we have found to mislead downstream analysis in some applications. For instance, a central goal of the 1000 Genomes Project is to quantify variation at the single nucleotide level. At this resolution, small error rates in sequencing are significant. Modeling and quantifying the uncertainty inherent in the generation of sequence reads is essential. In this talk, I'll review aspects of sec-gen technology, emphasizing characteristics of the upstream intensity data from which basecalls are produced. I will present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina platform. Model parameters have a straightforward interpretation allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. In contrast to other proposed methods for improved base-calling in the Illumina platform, our model provides informative estimates readily usable in quality assessment tools while improving base-calling performance. I'll conclude with a discussion of a few remaining problems in processing and analysing sec-gen data and how Statistical and Machine Learning methods may be used to address them.

June 2009

Tuesday, 30 Jun

12:00-13:00 Visualizing data with t-SNE, Geoffrey Hinton, Canadian Institute for Advanced Research and University of Toronto, Canada
Venue: Lecture Theatre 2.19, Kilburn Building

Abstract

Over the last decade, many new methods have been developed for visualizing high-dimensional data by giving each data-point a location in a two-dimensional map. The goal is to represent the separations of pairs of data-points by the separations of their corresponding map-points, with an emphasis on representing the small separations accurately. I will describe a new method, called t-SNE, that is based on two ideas. The first idea is to convert the set of pairwise distances between data-points into a set of probabilities of selecting pairs of data-points. The selection probability of a pair of points is proportional to a Gaussian function of their separation. If the distances between map-points are converted into pairwise probabilities in the same way, any given arrangement of map-points can be evaluated by measuring the divergence between the probability distributions obtained from the data-points and the map-points. A good arrangement of map-points is then found by performing gradient descent in this divergence.

Unfortunately, if the probabilities of pairs of map-points are computed using a Gaussian function of their separation, the difference between the distributions of pairwise distances in high-dimensional and low-dimensional spaces causes the map-points to be crowded together in the center of the map. This problem can be largely overcome by using a heavy-tailed t-distribution when computing the selection probabilities of pairs of map-points. This leads to maps that look much better than those produced by other recent methods. In particular, t-SNE is very good at preserving clusters in the data at many different scales simultaneously.

The talk describes joint work with Laurens van der Maaten that recently appeared in the Journal of Machine Learning Research.
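
The two ideas in the abstract can be sketched in a few lines: pairwise distances are turned into probabilities over pairs, with a Gaussian function for the data-points and a heavy-tailed t-distribution for the map-points, and an arrangement is scored by the divergence between the two distributions. This is a simplified sketch under stated assumptions: it uses a single fixed Gaussian bandwidth `sigma`, whereas the published method sets a bandwidth per data-point, and it only evaluates the cost rather than running gradient descent. All function names are hypothetical.

```python
import math
from itertools import combinations


def pair_probs(points, kernel):
    """Convert pairwise squared distances into probabilities over pairs (i, j), i < j."""
    weights = {}
    for i, j in combinations(range(len(points)), 2):
        d2 = sum((u - v) ** 2 for u, v in zip(points[i], points[j]))
        weights[(i, j)] = kernel(d2)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def gaussian_kernel(d2, sigma=1.0):
    # Gaussian function of the separation (fixed bandwidth is an assumption).
    return math.exp(-d2 / (2.0 * sigma ** 2))


def student_t_kernel(d2):
    # Heavy-tailed Student t kernel with one degree of freedom.
    return 1.0 / (1.0 + d2)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) over a shared set of pairs."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0.0)


if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]  # toy 3-D data
    layout = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]               # candidate 2-D map
    p = pair_probs(data, gaussian_kernel)
    q = pair_probs(layout, student_t_kernel)
    print(kl_divergence(p, q))
```

Compared with the Gaussian kernel, the t kernel assigns relatively more probability to widely separated map-points, which is the property the abstract relies on to relieve crowding in the center of the map.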

Monday, 29 Jun

14:00-15:00 A quick way to learn a mixture of exponentially many linear models, Geoffrey Hinton, Canadian Institute for Advanced Research and University of Toronto, Canada
Venue: Lecture Theatre 1.4, Kilburn Building

Abstract

Mixtures of linear models can be used to model data that lies on or near a smooth non-linear manifold. A proper Bayesian treatment can be applied to toy data to determine the number of models in the mixture and the dimensionality of each linear model but this neurally uninspired approach completely misses the main problem: Real data with many degrees of freedom in the manifold requires a mixture with an exponential number of components. It is quite easy to fit mixtures of 2^1000 linear models by using a few tricks: First, each linear model selects from a pool of shared factors using the selection rule that factors with negative values are ignored. Second, undirected linear models are used to simplify inference and the models are trained by matching pairwise statistics. Third, Poisson noise is used to implement L1 regularization of the activities of the factors. The factors are then threshold linear neurons with Poisson noise and their positive integer activities are very sparse. Preliminary results suggest that these exponentially large mixtures work very well as modules for greedy, layer-by-layer learning of deep networks. Even with one eye closed, they outperform Support Vector Machines for recognizing 3-D images of objects from the NORB database.

February 2009

Monday, 23 Feb

14:00-15:00 Analysis of Diversity Mechanisms for Global Exploration, Peter Oliveto, University of Birmingham, U.K.
Venue: Atlas 1

Abstract

Maintaining diversity is important for the performance of evolutionary algorithms. Diversity mechanisms can enhance global exploration of the search space and enable crossover to find dissimilar individuals for recombination. We focus on the global exploration capabilities of mutation-based algorithms. Using a simple bimodal test function and rigorous runtime analyses, we compare well-known diversity mechanisms like deterministic crowding, fitness sharing, and others with a plain algorithm without diversification. We show that diversification is necessary for global exploration, but not all mechanisms succeed in finding both optima efficiently. Our theoretical results are accompanied by additional experiments for different population sizes.

2008

December 2008

Wednesday, 03 Dec

14:00-15:00 An Inverse Problems Perspective on Supervised Learning, Lorenzo Rosasco, Massachusetts Institute of Technology, Boston, U.S.A.
Venue: IT407 (Information Technology Building)

Abstract

In supervised learning, the classical way to design learning algorithms is to restrict the class of possible solutions. This in turn can be interpreted as imposing regularity assumptions on the problem at hand. In this talk, we discuss how interpreting learning from the perspective of inverse problems provides a different way to think of learning algorithms. The concept of stability -- with respect to noise and random sampling -- plays a crucial role in this picture. Different learning approaches such as dimensionality reduction, early stopping, and others, can be cast in a new, unifying framework. From a statistical standpoint, the class of methods under consideration can be proved to adaptively achieve optimal minimax learning rates.

November 2008

Monday, 17 Nov

14:00-15:00 Developments in nonparametric Bayesian modeling with Gaussian processes, Ryan Adams, University of Cambridge, Cambridge, U.K. [slides]
Venue: Atlas 1, Kilburn Building

Abstract

I will discuss recent work that enables modeling of probability density functions with Gaussian process priors. In the past this has been difficult due to the impossibility of integrating over an infinite-dimensional random function. We have a novel model that allows generating exact and exchangeable data from the prior. With a generative model, we are able to perform MCMC-based inference without making intermediate approximations. We call this model and generative procedure the Gaussian Process Density Sampler. I will also give an overview of how we can apply similar GP-based models to other tasks, such as inference in time series and semi-supervised learning.

This is joint work with Iain Murray (University of Toronto) and David J.C. MacKay (University of Cambridge).

Monday, 10 Nov

16:00-18:00 Five Years of Graph Kernels, Karsten Borgwardt, Max Planck Institute, Tuebingen, Germany [slides]
Venue: Atlas 1, Kilburn Building

Abstract

Kernel machines have had a major impact on the field of machine learning and even data analysis as a whole, and when the first graph kernels were proposed in 2003, this machinery became applicable to graph- and network-structured data as well. However, being applicable in theory and being useful in practice turned out to be two very different things for graph kernels, and hence a major focus of research over the last five years has been to design more efficient, more expressive, in short, more powerful graph kernels. In this talk, I will give an introduction to the field of graph kernels and present its development over the last five years.

Monday, 03 Nov

14:00-15:00 Theoretical results on artificial immune systems, Andrew Hone, University of Kent, Canterbury, U.K. [slides]
Venue: 2.15, Kilburn Building

Abstract

Artificial immune systems are a relatively new area of bio-inspired computation. The inspiration for artificial immune algorithms has come from biological models of the natural immune system, especially the theories of clonal selection, immune networks and negative selection. Moreover, these algorithms have been successfully employed in a variety of different application areas. However, until quite recently there has been a dearth of theoretical studies to justify their use. In this talk, the existing theoretical work on artificial immune systems is reviewed, and some of the future challenges in this area are highlighted.

October 2008

Monday, 20 Oct

14:00-15:00 Approximate inference in continuous time discrete and hybrid stochastic systems, Guido Sanguinetti, University of Sheffield, Sheffield, U.K. [slides]
Venue: Atlas 1, Kilburn Building

Abstract

In this talk I will present some results on inference for continuous time stochastic processes. The type of process we will consider is the Markov Jump process (MJP), where the state of the system (a multi-dimensional vector with integer entries) evolves in continuous time with Markovian dynamics. This type of system is frequently used in modelling chemical reactions at low count numbers, or in ecological systems. In the first part of the talk I'll introduce a variational framework for inference from noisy observations of an MJP. In the second part, I will present some more recent work on hybrid systems where a latent MJP drives the dynamics of continuous observed variables.

September 2008

Monday, 08 Sep

14:00-15:00 Inference of regulatory networks combining expression, ChIP-on-chip and genetic variation, Oliver Stegle, University of Cambridge, Cambridge, U.K. [slides]
Venue: Atlas 1, Kilburn Building

Abstract

The problem of deducing regulatory networks in model organisms, such as mouse, fruit fly or yeast, is of considerable interest, and a multitude of approaches have been proposed. The most common data source for this inference problem is gene expression, and hence algorithms estimate the dependency structure of regulatory networks from gene expression profiles in multiple conditions or from multiple individuals. A key insight common to such models is that expression alone is not enough to obtain meaningful predictions, and hence prior knowledge such as binding affinities from ChIP-on-chip experiments is taken into account when performing inference. In this talk we will expand the set of data sources for this problem and include genetic variation from multiple yeast individuals. This leads to a large joint model of gene expression, binding affinities and SNPs. We discuss inference challenges in such models as well as the biological meaning of the additional association model between SNPs and inferred latent transcription factor activations.

June 2008

Tuesday, 03 Jun

14:00-15:00 Bayesian Inference for Systems Biology Models via a Diffusion Approximation, Andrew Golightly, Newcastle University, Newcastle, U.K. [slides]
Venue: 2.15, Kilburn Building

Abstract

As post-genomic biology becomes more predictive, the ability to infer rate parameters (known as reverse-engineering) of biochemical networks will become increasingly important. One approach is to replace the underlying model by a diffusion approximation so that a noise term represents intrinsic stochastic behaviour and the model is identified using discrete-time (and often incomplete) data that is subject to error. Unfortunately, likelihood based inference can be problematic as closed form transition densities of nonlinear diffusions are rarely available. A widely used solution follows the work of Pedersen (1995) and involves the introduction of latent data points between every pair of observations to allow an Euler-Maruyama approximation of the true transition densities to become accurate. Markov chain Monte Carlo (MCMC) methods can then be used to sample the posterior distribution of latent data and model parameters; however, naive schemes suffer from a mixing problem that worsens with the degree of augmentation. We therefore explore some recently developed MCMC schemes whose performance is not adversely affected by the amount of augmentation. The methodology is applied to the estimation of parameters governing some simple systems biology models.
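
The Euler-Maruyama approximation mentioned in the abstract can be sketched as follows. The example simulates a scalar diffusion dX = a(X) dt + b(X) dW on a grid of sub-steps, which plays the role of the latent points introduced between two observations; finer grids make the Gaussian one-step transition more accurate. The function name, the drift and diffusion functions, and the step counts are assumptions made for illustration, not the models or MCMC schemes from the talk.

```python
import math
import random


def euler_maruyama(x0, drift, diffusion, t_end, n_steps, rng=None):
    """Simulate dX = drift(X) dt + diffusion(X) dW on [0, t_end].

    Returns the path as a list of n_steps + 1 values, starting at x0.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    dt = t_end / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical mean-reverting example: dX = -X dt + 0.3 dW.
    path = euler_maruyama(1.0, lambda x: -x, lambda x: 0.3, t_end=1.0, n_steps=100)
    print(path[-1])
```

With the noise switched off, the scheme reduces to the ordinary Euler method, which gives a simple way to see the discretisation error shrink as more intermediate points are added.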

May 2008

Thursday, 15 May

14:00-15:00 Probabilistic non-parametric models for shape recovery and pose estimation, Raquel Urtasun, MIT, Cambridge, U.S.A. and UC Berkeley EECS and ICSI, California, U.S.A. [slides]
Venue: 2.15, Kilburn Building

Abstract

In this talk I will present the use of probabilistic non-parametric models in two different contexts: learning local prior models for recovering shape deformations from single images, and fast discriminative approaches to pose estimation from single images with very large databases.

Without a deformation model, monocular 3D shape recovery of deformable surfaces is severely under-constrained. Even when the image information is rich enough, prior knowledge of the feasible deformations is required to overcome the ambiguities. This is further accentuated when such information is poor, which is a key issue that has not yet been addressed.

We propose an approach to learning shape priors to solve this problem. By contrast with typical statistical learning methods that build models for specific object shapes, we learn local deformation models, and combine them to reconstruct surfaces of arbitrary global shapes. Not only does this improve the generality of our deformation models, but it also facilitates learning since the space of local deformations is much smaller than that of global ones. While using a texture-based approach, we show that our models are effective for reconstructing, from single videos, poorly textured surfaces of arbitrary shape, made of materials as different as cardboard, which deforms smoothly, and much lighter tissue paper, whose deformations may be far more complex.

Discriminative approaches to human pose inference involve mapping visual observations to articulated body configurations. Current probabilistic approaches to learn this mapping have been limited in their ability to handle domains with a large number of activities that require very large training sets. We propose an online probabilistic regression scheme for efficient inference of complex, high dimensional, and multimodal mappings. Our technique is based on a local mixture of Gaussian Processes, where locality is defined based on both appearance and pose, and where the mapping hyperparameters can vary across local neighborhoods to better adapt to specific regions in the pose space. The mixture components are defined online in very small neighborhoods, so learning and inference are extremely efficient. When the mapping is one-to-one, we derive a bound on the approximation error of local regression (vs. global regression) for monotonically decreasing covariance functions. Our method can determine when training examples are redundant given the rest of the database, and use this criterion for pruning. We report results on synthetic (Poser) and real (HumanEva) pose databases, obtaining fast and accurate pose estimates using training set sizes up to 10^5.

Tuesday, 13 May

15:00-15:15 On a Form of Advertiser Cheating, Onno Zoeter, Microsoft Research, Cambridge, U.K. [slides]
Venue: 2.15, Kilburn Building

Abstract

This short talk was an addendum Onno gave on the problem of auctions for internet advertisers.

Tuesday, 13 May

14:00-15:00 Generalized expectation propagation for on-line monitoring tasks, Onno Zoeter, Microsoft Research, Cambridge, U.K. [slides]
Venue: 2.15, Kilburn Building

Abstract

Many real world dynamical systems such as industrial processes, financial markets, bio-medical systems, etc. can be interpreted as following different regimes. For instance an industrial process can follow a set of nearly deterministic laws in normal operation (a normal regime), but show markedly different behavior after the failure of certain parts (a failure regime). A switching linear dynamical system (SLDS) is a probabilistic model that explicitly models such regime changes. The conditional independence structure of the SLDS is extremely simple, namely a chain. This allows for a straightforward recursive inference algorithm. However, the model consists of both multinomial and Gaussian variables, and local integrals are over conditional Gaussian distributions. These integrals have analytic solutions, but since the conditional Gaussian family is not closed under marginalization, the size of the messages -- the information that is passed along the graph in a recursive algorithm -- grows exponentially with the problem size. Motivated by the expectation propagation framework from Tom Minka, I describe how a Kikuchi free energy with weak marginalization constraints, i.e. constraints that only enforce equality on expected sufficient statistics, can be used to formulate an approximate inference algorithm. If we restrict the switching linear dynamical system in an appropriate way, we obtain a model that can detect change points in a dynamic system. I will show that for this change point model the Kikuchi free energy approach results in an interesting class of approximate inference algorithms where a trade-off can be made between computational complexity and accuracy.

This is joint work with Tom Heskes from the Radboud University, Nijmegen, The Netherlands.

April 2008

Tuesday, 22 Apr

14:00-15:00 Sensible Priors for Sparse Bayesian Learning, Joaquin Quiñonero Candela, Microsoft Research, Cambridge, U.K. [slides]
Venue: IT407

Abstract

The Sparse Bayesian Learning framework applied to finite linear models yields compact, sparse models with very few basis functions relative to the training set size. The Relevance Vector Machine is an example. Yet these sparse models imply restrictive priors over functions, and consequently impractical, overconfident predictions. We propose an alternative treatment that breaks the rigidity of the implied prior through decorrelation, and consequently gives reasonable and intuitive error bars. The attractive computational efficiency is retained; learning leads to sparse solutions. An exciting by-product is the ability to model non-stationarity and input-dependent noise.

March 2008

Tuesday, 11 Mar

14:00-15:00 Making sense of a complex world: the role of data and models, Dan Cornford, Aston University, Birmingham, U.K. [slides]
Venue: Atlas 1

Abstract

In this presentation I review the role of data and models in trying to make sense of the world around us. Motivated by problems in environmental science, and in particular weather and climate, I review and contrast how machine learners and natural scientists approach modelling. Using examples from projects I run, I will try to show how we can combine the strengths of both approaches. In machine learning this leads to what might be called inference with complex priors, while in natural sciences this leads to data assimilation and parameter estimation. I will try to avoid pointing out that this is what statisticians have always done. I conclude with a sketch of how I see us being able to make progress in such complicated problems, with an emphasis on inference in stochastic dynamic systems.

2007

November 2007

Tuesday, 20 Nov

13:00-14:00 A Tutorial on Expectation Propagation, Antti Honkela, Helsinki University of Technology [slides]
Venue: Atlas 1

Tuesday, 06 Nov

13:00-14:00 On Stratified Path Sampling of the Thermodynamic Integral: Computing Bayes Factors for Nonlinear ODE Models of Biochemical Pathways, Mark Girolami, Department of Computing Science, University of Glasgow, U.K. [slides]
Venue: Atlas 1

Abstract

Bayes factors provide a means of objectively ranking a number of plausible statistical models based on their evidential support. Computing Bayes factors is far from straightforward and methodology based on thermodynamic integration can provide stable estimates of the marginal likelihood. This talk will present a stratified path sampling strategy in estimating the thermodynamic integral and will consider issues such as optimal paths and the variance of the overall estimator in comparison to non-equilibrium based methods. The main application considered will be the computation of Bayes factors for deterministic biochemical pathway models based on systems of nonlinear ordinary differential equations (ODE). A large scale study of the ExtraCellular Regulated Kinase (ERK) pathway will be briefly discussed where recent Small Interfering RNA (siRNA) experimental validation of the predictions made using the computed Bayes factors is presented. Preliminary work on a computationally economic posterior sampling scheme for ODE models employing auxiliary Gaussian state and derivative processes will also be presented.

October 2007

Tuesday, 23 Oct

14:00-15:00 Aspects of inference from gene expression data, Niranjan, Department of Computer Science, University of Sheffield [slides]
Venue: Atlas 1

Abstract

I will describe some recent work in my group on the analysis of gene expression data: (a) how does precision of gene expression measurements influence predictions we can make using supervised and unsupervised learning methods; (b) can we find a data-driven mapping between gene expression measurements and corresponding protein concentrations? and (c) how do we infer transcription factor activity from gene expression measurements?

Tuesday, 09 Oct

14:00-15:00 Bayesian Inference, Sparse Linear Models, and the Relevance Vector Machine, Michael Tipping, Vector Anomaly Ltd, Cambridge, U.K. [slides]
Venue: Atlas 1

Abstract

Beginning with a brief introduction to the principles of Bayesian inference, this talk will present a practical Bayesian approach to "sparse" predictive modelling (for regression and classification) as exemplified by the "Relevance Vector Machine". The presentation will feature some simple examples and demonstrations, along with some gratuitous video game footage.

September 2007

Tuesday, 25 Sep

14:00-15:00 Bayesian Agglomerative Clustering with Coalescents, Yee Whye Teh, Gatsby Computational Neuroscience Unit, University College, London, U.K. [slides]
Venue: Atlas 1

Abstract

Hierarchical clustering of data is one of the most widely used machine learning techniques. Traditional hierarchical clustering techniques construct a single tree in a greedy fashion, either in a top-down or a bottom-up agglomerative fashion. Sometimes we are interested in how reliable the constructed tree is, i.e. how much we believe that the structure of the tree reflects true underlying structure in the data rather than spurious effects due to noise. Such a question can be answered using a Bayesian approach where we define a prior over trees and compute a posterior distribution over trees. However, past Bayesian models for hierarchical clustering either do not give a posterior over trees, or are simply too complex to have widespread appeal. In this talk we present a model that 1) gives a posterior distribution over trees, 2) is easy to implement, and 3) has the additional nice property that it is exchangeable. We show that our model performs well compared to previous approaches on a number of small datasets, and apply it to document clustering and phylolinguistics.

Tuesday, 11 Sep

14:00-15:00 Variational Inference for Diffusion Processes, Cedric Archambeau, Department of Computer Science, University College, London, U.K. [slides]
Venue: 2.15

Abstract

Diffusion processes are continuous-time continuous-state stochastic processes that are in general only partially observed. The joint estimation of the forcing parameters and the system noise (volatility) in these dynamical systems is a crucial, but non-trivial task, especially when the system is nonlinear and multi-modal. In this talk, we will discuss an approximate inference procedure for diffusion processes, which allows us to estimate these parameters by simple gradient techniques and which is computationally less demanding than most MCMC approaches. We provide a rigorous proof of our approximation in continuous time and apply the resulting smoother to a bi-stable system for illustration.

Page generated on Tuesday 09 Feb 2010 at 18:34, maintained by Neil D. Lawrence