Projects for Students

I am looking for highly motivated students who are interested in working on any of the following research projects. If that sounds like you, drop me an email.

Project 1: Sparse-Constrained Optimization — Theory

Many optimization problems in statistics and machine learning can be written as

$$ \min_{\theta\in C}\;f(\theta) \quad\text{s.t.}\quad \|\theta\|_{0}\le k, $$

where $f$ is an objective such as a loss function or a negative log-likelihood, $C$ is a constraint set, and $\|\theta\|_{0}$ counts the nonzero entries of $\theta$. The sparsity constraint makes the problem combinatorial and often NP-hard. Our approach is to decouple the sparsity and non-sparsity constraints via a Boolean relaxation, yielding a continuous problem that modern solvers can handle efficiently.
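To make the setup concrete, here is a minimal sketch of one standard baseline for problems of this form: iterative hard thresholding (IHT) for sparse least squares, which alternates a gradient step with projection onto the sparsity constraint. This is an illustration only, not the Boolean-relaxation approach developed in the project, and the example data are synthetic.

```python
import numpy as np

def hard_threshold(theta, k):
    """Project onto the constraint ||theta||_0 <= k: keep the k largest-magnitude entries."""
    out = np.zeros_like(theta)
    idx = np.argsort(np.abs(theta))[-k:]
    out[idx] = theta[idx]
    return out

def iht_least_squares(A, b, k, step=None, n_iter=500):
    """IHT for min ||A theta - b||^2 subject to ||theta||_0 <= k."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, with L the gradient's Lipschitz constant
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - b)
        theta = hard_threshold(theta - step * grad, k)
    return theta

# Synthetic example: recover a 3-sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
true_theta = np.zeros(20)
true_theta[[2, 7, 11]] = [1.5, -2.0, 1.0]
b = A @ true_theta + 0.01 * rng.normal(size=100)
est = iht_least_squares(A, b, k=3)
```

The projection step is exactly where the combinatorial difficulty lives; the Boolean relaxation studied in this project replaces it with a continuous formulation.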

In this project we study the theoretical foundations: local and global optimality conditions under sparsity constraints, convergence guarantees, and the geometry of the relaxed landscape.

Key skills required: Linear Algebra, Calculus, Convex Analysis, and Coding in Python/Julia/C++.

Project 2: Sparse-Constrained Optimization — Applications

Building on the theoretical framework above, this project focuses on applying sparse-constrained optimization to real-world problems. Key application areas include:

  • Statistical model selection: best subset selection in generalised linear models (GLMs) for biomedical and high-dimensional data.
  • Neural network pruning: removing redundant weights from trained networks to reduce model size and inference cost while preserving accuracy.
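As a point of reference for the pruning application, the simplest baseline is magnitude pruning: zero out the smallest-magnitude weights of a trained layer. The sketch below (with a synthetic weight matrix) shows the idea; sparse-constrained optimization aims to do better than this heuristic.

```python
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights in W."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    # Threshold at the k-th smallest absolute value; keep strictly larger entries.
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    mask = np.abs(W) > thresh
    return W * mask

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))          # stand-in for a trained weight matrix
W_pruned = prune_by_magnitude(W, sparsity=0.9)
```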

Key skills required: Linear Algebra, Statistical Modelling, Deep Learning, and Coding in Python/R/C++.

Project 3: Importance Sampling Methods for Random Geometric Graphs

A random geometric graph is constructed by placing points at random in a region and connecting pairs that lie within a prescribed distance. Many structural properties of these graphs — such as bounded edge count, bounded degree, planarity, and being a forest — become rare events when the point intensity is high.

We develop importance sampling methods that estimate rare-event probabilities of the form $P(G(X) \text{ satisfies } \mathcal{P})$, where $\mathcal{P}$ is a structural property such as "the maximum degree is at most $\ell$" or "the graph is planar". These estimators can be orders of magnitude faster than naive Monte Carlo. Our methods currently cover seven properties on two-dimensional windows, and are implemented in the Python package PyREGG.
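For context, here is the naive Monte Carlo baseline that importance sampling improves on, sketched for the property "maximum degree at most $\ell$" on the unit square. This is an illustration with made-up parameters, not code from PyREGG.

```python
import numpy as np

def max_degree(points, radius):
    """Maximum degree of the geometric graph connecting points within `radius`."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adj = (d <= radius) & ~np.eye(len(points), dtype=bool)
    return adj.sum(axis=1).max()

def naive_mc_estimate(n_points, radius, ell, n_samples=2000, seed=0):
    """Estimate P(max degree <= ell) for n_points placed uniformly in the unit square."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        pts = rng.uniform(size=(n_points, 2))
        if max_degree(pts, radius) <= ell:
            hits += 1
    return hits / n_samples

p_hat = naive_mc_estimate(n_points=30, radius=0.1, ell=2)
```

When the event probability is tiny (high point intensity), almost every naive sample misses the event, which is exactly the regime where importance sampling pays off.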

Ongoing and future directions include extending to additional hereditary properties (e.g., bipartiteness, outerplanarity), higher-dimensional windows, and using the unbiased IS estimators within pseudo-marginal MCMC for doubly intractable distributions.

Key skills required: Probability Theory and Coding in Python/R/Julia/C++.

Project 4: Algorithms for Hard Graph Problems

Given a graph $G = (V, E)$, a fundamental question is: what is the largest subset of vertices with no two adjacent? This is the maximum independent set problem, and it is NP-hard in general. A closely related challenge is uniform $k$-colouring: sampling a proper vertex colouring with $k$ colours uniformly at random, which is #P-hard for general graphs even when valid colourings exist.
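To see why exact methods scale poorly, here is the brute-force approach to maximum independent set: enumerate vertex subsets from largest to smallest until an independent one is found. It is correct but exponential in $|V|$, which is precisely what motivates the approximate methods below. The 5-cycle example is illustrative.

```python
from itertools import combinations

def max_independent_set(n, edges):
    """Exact maximum independent set by enumeration (feasible only for small n)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Try subsets from largest to smallest; the first independent one is maximum.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            s = set(subset)
            if all(adj[u].isdisjoint(s) for u in s):
                return s
    return set()

# The 5-cycle has independence number 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
mis = max_independent_set(5, edges)
```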

Both problems sit at the intersection of combinatorial optimisation and probabilistic computation. Exact algorithms scale poorly, while approximate methods — such as MCMC samplers, message-passing heuristics, and continuous relaxations — offer practical alternatives but come with their own convergence and accuracy trade-offs.

In this project we explore novel exact and approximate approaches for these problems, drawing on techniques from optimisation, statistical physics, and Monte Carlo methods. Potential directions include new relaxation-based solvers for independent sets, efficient Markov chain mixing for graph colourings, and leveraging structural properties (e.g., bounded treewidth, planarity) to design faster algorithms.

Key skills required: Graph Theory, Combinatorial Optimisation, Probability Theory, and Coding in Python/C++.

Project 5: Machine Learning Prediction of Bluebottle Presence Along the Australian Coast

Many Australians have experienced a painful bluebottle sting while swimming at the beach, yet little is known about bluebottles: when they will arrive, and whether they will appear in large swarms or only as a few individuals.

Dr Amandine Schaeffer (UNSW) and I are looking for a Masters/Honours student to work on a data-driven project investigating machine learning techniques to address these challenges. Drop us an email if you are interested in working on this project. For more details, click here.

Key skills required: Applications of Deep Learning Models and Coding in Python.

Project 6: Deep Learning Models for Spatio-temporal Data

Recent advances in remote sensing have made large volumes of data readily available. As a result, assimilating such large data sets into numerical hydrological models is often a computationally demanding task.

In this project, Dr Sreekanth Janardhanan (CSIRO) and I aim to apply deep learning models to investigate the relationship between spatio-temporal data sets and hydrological processes.

Key skills required: Mathematical Understanding of Deep Learning and Coding in Python.


Industry Collaborations

Estimating the Number of Tyres in Stockpiles

Unregulated tyre stockpiles pose significant health and environmental risks, particularly through catastrophic fires that release toxic emissions. Enforcing regulatory limits requires accurate estimation of tyre counts, but manual counting is impractical for large stockpiles and existing methods lack the precision needed for legal proceedings.

In this joint project with the Environment Protection Authority (EPA) Victoria, Prof Samuel Muller (Macquarie University) and I developed a scientifically defensible method to estimate tyre counts from drone-measured stockpile volumes. Through laboratory experiments with up to 10,000 miniature tyres and field validation with real stockpile data, we established a statistically robust linear relationship between stockpile volume and tyre quantity, providing a conservative lower-bound estimate with at least 99% confidence — suitable for regulatory enforcement and court admissibility.
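The kind of conservative bound used here can be illustrated with a one-sided lower prediction bound from simple linear regression. The sketch below uses entirely synthetic volume/count data (not the EPA data) and standard regression formulas; it is an illustration of the statistical idea, not the project's actual analysis.

```python
import numpy as np
from scipy import stats

def lower_prediction_bound(x, y, x_new, confidence=0.99):
    """One-sided lower prediction bound for y at x_new from a simple linear fit."""
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    s = np.sqrt(resid @ resid / (n - 2))  # residual standard error
    x_bar = x.mean()
    # Standard error of a new observation's prediction at x_new.
    se_pred = s * np.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ((x - x_bar) @ (x - x_bar)))
    t_crit = stats.t.ppf(confidence, df=n - 2)
    return slope * x_new + intercept - t_crit * se_pred

# Synthetic data: tyre count roughly proportional to stockpile volume, plus noise.
rng = np.random.default_rng(2)
volumes = rng.uniform(10, 100, size=40)
counts = 120 * volumes + rng.normal(0, 150, size=40)
bound = lower_prediction_bound(volumes, counts, x_new=50.0)
```

With at least 99% confidence, the true count at the new volume exceeds the bound, which is what makes a lower bound of this form defensible for enforcement.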

Key skills: Lab Experiments, Frequent Visits to Bunnings for Setting Up the Lab, Probability Theory, and Basic Statistics.

Deep Learning Emulators for Physics-Based Models

Physics-based models such as rainfall-runoff, infectious disease (SEIR), and advection-diffusion models are essential tools in environmental and public health sciences, but running them repeatedly for calibration or uncertainty quantification can be computationally prohibitive.

In this joint project with CSIRO, Dr Sahani Pathiraja (UNSW) and I co-supervised a Masters student (Joseph Gurr) alongside Dr Joel Dabrowski and Dr Dan Pagendam from CSIRO. The project focused on developing a two-model deep learning architecture for emulating physics-based models with built-in uncertainty quantification. A primary model captures the structure of the physical process while a secondary neural network generates parameter distributions, naturally capturing predictive uncertainty without requiring explicit Bayesian priors or expensive posterior computation. The framework was validated on a range of applications including rainfall-runoff modelling, epidemic dynamics, and advection-diffusion processes.
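The two-model idea can be sketched as follows. This is a heavily simplified stand-in with fixed (untrained) weights and an invented toy "physics" (a damped exponential), meant only to show how a secondary model's parameter distribution induces predictive uncertainty in the primary model; the actual architecture uses trained neural networks.

```python
import numpy as np

rng = np.random.default_rng(3)

def primary_model(x, theta):
    """Stand-in for the physics-structured primary model: a damped exponential."""
    decay, scale = theta
    return scale * np.exp(-decay * x)

def secondary_model(x):
    """Stand-in for the secondary network: returns a Gaussian over theta.
    Weights are fixed here; in practice both models are trained jointly."""
    mean = np.array([0.5, 2.0])
    log_std = np.array([-2.0, -2.0])
    return mean, np.exp(log_std)

def predict_with_uncertainty(x, n_samples=200):
    """Sample theta from the secondary model, push samples through the primary model."""
    mean, std = secondary_model(x)
    thetas = rng.normal(mean, std, size=(n_samples, 2))
    outputs = np.array([primary_model(x, th) for th in thetas])
    return outputs.mean(axis=0), outputs.std(axis=0)

x = np.linspace(0, 5, 50)
mu, sigma = predict_with_uncertainty(x)
```

The spread of the sampled outputs is the predictive uncertainty, obtained without explicit Bayesian priors or posterior computation.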

Key skills: Deep Learning, Python Programming, and Bayesian Methods.