



My astronomy research interests are broad and generally focus on using large datasets to build 3-D maps of the sky that constrain the structure, formation, and/or evolution of large populations of astronomical objects. I am particularly interested in developing techniques to jointly model observations from separate (but often complementary) datasets.

Galactic Structure and Dynamics

Our Milky Way Galaxy contains roughly 200 billion stars. These stars trace Galactic structure and serve as records of the Galaxy's formation history. Studying them, however, is challenging because images of the sky alone cannot tell us, e.g., how old they are or how far away they are (see SED Fitting). This is made even more difficult because the Galaxy is also filled with cosmic dust, which dims and "reddens" the light from many of these stars. We also know that many stars are born together in clusters from clouds of dense molecular gas.
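To see why distance and dust are hard to disentangle from images alone, consider the standard distance-modulus relation with extinction, m = M + 5 log10(d / 10 pc) + A_V, where A_V = R_V * E(B-V). The sketch below is a toy under stated assumptions: a single V band, a fixed R_V = 3.1 (a commonly used Milky Way average), and a hypothetical Sun-like star with absolute magnitude 4.8 and intrinsic color B-V = 0.65. It shows that a nearby star behind dust and a farther, unobscured star can look equally bright, while their colors differ.

```python
import math

R_V = 3.1  # assumed ratio of total to selective extinction (a common Milky Way average)


def apparent_magnitude(abs_mag: float, distance_pc: float, ebv: float) -> float:
    """V-band apparent magnitude: m = M + 5 log10(d / 10 pc) + R_V * E(B-V)."""
    distance_modulus = 5.0 * math.log10(distance_pc / 10.0)
    extinction = R_V * ebv  # A_V, the dimming caused by dust
    return abs_mag + distance_modulus + extinction


def observed_color(intrinsic_bv: float, ebv: float) -> float:
    """Observed B-V color: dust adds E(B-V), so the star looks redder."""
    return intrinsic_bv + ebv


# Two hypothetical Sun-like stars (M_V = 4.8, intrinsic B-V = 0.65):
# one at 100 pc behind E(B-V) = 0.5 of dust, and one with no dust placed
# exactly far enough away to be dimmed by the same amount.
ebv = 0.5
near_dusty = apparent_magnitude(4.8, 100.0, ebv)
far_distance = 100.0 * 10 ** (R_V * ebv / 5.0)
far_clear = apparent_magnitude(4.8, far_distance, 0.0)

print(f"near, dusty: m_V = {near_dusty:.2f}, B-V = {observed_color(0.65, ebv):.2f}")
print(f"far ({far_distance:.0f} pc), clear: m_V = {far_clear:.2f}, "
      f"B-V = {observed_color(0.65, 0.0):.2f}")
```

Both stars come out at the same apparent magnitude; only the color separates them, which is why multi-band photometry and stellar models are needed to pull distance and dust apart.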

Studying Galactic structure with modern astronomical datasets requires integrating observations of all of these features (stars, dust, gas) from many different datasets (imaging, spectroscopy, time series, etc.) and at many different scales (from isolated stars to dwarf galaxies). I am interested in combining these data with theoretical models to create sophisticated 3-D models of the Milky Way such as 3-D dust maps.

A 2-D snapshot of the cumulative dust (reddening) out to 500 parsecs from the Bayestar17 3-D dust map. Credit: Green et al. (2018).
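The "cumulative" reddening in such a map is the total dust column between us and a given distance in each direction. The toy below is a simplified illustration, not the actual Bayestar17 model (real 3-D dust maps are inferred probabilistically from large numbers of stars): it stores a few hypothetical dust clouds per sightline and evaluates the cumulative E(B-V) out to 500 pc for each one. The coordinates, cloud distances, and reddening values are all made up.

```python
from bisect import bisect_right
from itertools import accumulate


class Sightline:
    """Toy line-of-sight dust model: reddening increases in steps at discrete clouds."""

    def __init__(self, cloud_distances_pc, cloud_ebv):
        pairs = sorted(zip(cloud_distances_pc, cloud_ebv))
        self.distances = [d for d, _ in pairs]
        # Running total of E(B-V) accumulated behind each successive cloud
        self.cumulative = list(accumulate(e for _, e in pairs))

    def reddening_to(self, distance_pc: float) -> float:
        """Cumulative E(B-V) from the observer out to distance_pc."""
        n = bisect_right(self.distances, distance_pc)  # clouds at or in front of distance_pc
        return self.cumulative[n - 1] if n > 0 else 0.0


# Hypothetical sightlines keyed by Galactic (l, b) in degrees
dust_map = {
    (160.0, -18.0): Sightline([150.0, 300.0, 800.0], [0.10, 0.25, 0.40]),
    (30.0, 60.0): Sightline([900.0], [0.02]),
}

# A 2-D "snapshot": cumulative reddening out to 500 pc along every sightline
snapshot = {lb: s.reddening_to(500.0) for lb, s in dust_map.items()}
print(snapshot)
```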

Galaxy Evolution

The story of how galaxies evolve is complex and involves many moving pieces. Observations suggest that galaxies form hierarchically through the mergers of many smaller galaxies over the course of their lifetimes. Their evolution reflects an interplay between secular processes, such as ongoing star formation and gas physics, and catastrophic processes, such as mergers with neighboring galaxies and feedback from central supermassive black holes that can rapidly "quench" star formation. This leads to an extraordinary diversity of galaxies with varied assembly histories and physical properties.

To understand the details of how galaxies evolve, we need to observe large samples of galaxies across the electromagnetic spectrum and use them to constrain their formation and evolutionary histories. Astronomers have had difficulty keeping pace with the increasing quantity and quality of data from large surveys, which now include many bands of broadband photometry, 1-D spectra, and spatially resolved 2-D spectra. These data often contain complementary information but are challenging to model. I am interested in developing techniques and tools designed to model these data and the corresponding galaxy populations.

Evolution of star-forming galaxies across cosmic time based on a compilation of 25 studies. Credit: Speagle et al. (2014).

Cosmology and Large-Scale Structure

The growth of large-scale structure and the evolution of the Universe at large are mysterious and complex. We now know that baryonic matter (i.e., "normal stuff") makes up only about 5% of the energy budget of the Universe; the remaining 95% consists of roughly 25% "dark matter" and 70% "dark energy". While we can infer the presence of dark matter through its gravitational effects, dark energy can only be observed by studying the evolution of the Universe on the largest scales over long periods of time.

One powerful method for probing cosmology is gravitational lensing, where the light from distant background objects is deflected by the gravity of intervening structures. This distortion is sensitive to the distribution of matter (including dark matter) along the line of sight and to the geometry of the Universe, and so allows us (in theory) to reconstruct both. This reconstruction, however, is often extremely challenging and depends heavily on knowing the distances to many of these "background" sources. I am interested in developing quick yet robust probabilistic approaches to derive distances to the hundreds of millions of galaxies observed in modern surveys, along with methods to incorporate them into subsequent cosmological analyses.
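One way to see why source distances matter is geometric: for a fixed foreground lens, the lensing signal scales with D_ls / D_s, the ratio of the angular-diameter distances from lens to source and from observer to source. The sketch below assumes a spatially flat universe with Omega_m = 0.3 (ordinary plus dark matter) and Omega_Lambda = 0.7, roughly matching the energy budget quoted above, and H0 = 70 km/s/Mpc; these values and the redshifts used are illustrative assumptions. It integrates the comoving distance with Simpson's rule and shows how misestimating a source's redshift biases the lensing ratio.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]
H0 = 70.0            # assumed Hubble constant [km/s/Mpc]
OMEGA_M = 0.3        # assumed matter density (baryons + dark matter)
OMEGA_L = 0.7        # assumed dark-energy density; flat: OMEGA_M + OMEGA_L = 1


def e_of_z(z: float) -> float:
    """Dimensionless expansion rate H(z) / H0 for flat LambdaCDM."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z: float, steps: int = 1000) -> float:
    """Line-of-sight comoving distance [Mpc] via Simpson's rule (steps must be even)."""
    h = z / steps
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight / e_of_z(k * h)
    return (C_KM_S / H0) * total * h / 3.0


def lensing_ratio(z_lens: float, z_source: float) -> float:
    """D_ls / D_s in a flat universe; zero if the source is not behind the lens."""
    if z_source <= z_lens:
        return 0.0
    dc_l = comoving_distance(z_lens)
    dc_s = comoving_distance(z_source)
    # The 1 / (1 + z_s) factors in D_ls and D_s cancel in a flat universe
    return (dc_s - dc_l) / dc_s


# Hypothetical lens at z = 0.5; compare the true source redshift with a misestimate
true_ratio = lensing_ratio(0.5, 1.0)
biased_ratio = lensing_ratio(0.5, 0.8)
print(f"D_ls/D_s: z_s = 1.0 -> {true_ratio:.3f}, z_s = 0.8 -> {biased_ratio:.3f}")
```

Repeating this over hundreds of millions of sources, each with its own uncertain distance, is what makes fast yet well-calibrated distance estimates so important for cosmic-shear analyses.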

Cosmological constraints from cosmic shear measurements from the HSC-SSP Survey. Credit: Hikage et al. (2019).


My statistics research interests are broadly oriented around performing robust statistical inference on large, diverse datasets. This requires developing frameworks to jointly model separate (but complementary) observations, scalable methods to implement them, and sampling methods to explore the inferred distribution of model parameters.

Data Integration

Modern science has entered the era of "big data", with large datasets commonly available across a host of scientific domains from genomics to economics to astronomy. These datasets frequently overlap while probing complementary information, so modeling them jointly can yield stronger statistical inferences than analyzing each one separately.

While straightforward in theory, this type of "data integration" is difficult in practice because many of these datasets (e.g., photometry versus spectroscopy) have widely different characteristics, favor different sub-populations, and only partially overlap. I am interested in developing frameworks and methods that address these issues, with applications in astronomy (e.g., modeling the Milky Way).
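In the simplest possible case, joint modeling amounts to multiplying likelihoods. The sketch below assumes two hypothetical catalogs that measure the same quantity (say, a redshift) with independent Gaussian errors and a flat prior, so the joint estimate is an inverse-variance-weighted mean. Photometry covers every object while spectroscopy covers only a subset, so each object uses whichever data exist for it. It deliberately ignores the harder issues raised above, such as differing selection functions and sub-populations.

```python
import math


def combine_gaussian(measurements):
    """Joint estimate from independent Gaussian measurements [(mean, sigma), ...].

    With a flat prior, the product of the Gaussian likelihoods is itself Gaussian,
    with an inverse-variance-weighted mean and variance 1 / sum(1 / sigma_i**2).
    """
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical catalogs: photometry for every object, spectroscopy for a subset
photometry = {"obj_a": (1.00, 0.10), "obj_b": (0.50, 0.10)}
spectroscopy = {"obj_a": (0.90, 0.05)}

joint = {}
for obj, phot in photometry.items():
    measurements = [phot]
    if obj in spectroscopy:  # use the spectrum only where the datasets overlap
        measurements.append(spectroscopy[obj])
    joint[obj] = combine_gaussian(measurements)

for obj, (mean, sigma) in joint.items():
    print(f"{obj}: {mean:.3f} +/- {sigma:.3f}")
```

Here obj_a ends up with a smaller uncertainty than either measurement alone, while obj_b falls back on photometry only. Handling selection effects would require explicitly modeling how each survey chooses its objects, which this toy omits.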

Distance and velocity estimates for various regions of the Perseus cloud, derived by combining observations of stars, dust, and gas. Credit: Zucker et al. (2018).

Scalable Inference

Analyzing large datasets has increasingly become the domain of machine learning methods. However, these methods often struggle to quantify the uncertainty and reliability of their predictions and can be difficult to interpret. Statistical inference with interpretable models can help address these concerns, but many such methods are prohibitively slow and therefore limited to small datasets.

In order to address these concerns, I am interested in combining elements of machine learning and statistical modeling to develop quick yet robust approaches for "scalable" inference that can be applied to large datasets (such as those used in cosmology).

A schematic of a hybrid approach that combines statistical modeling (SED fitting) with machine learning (k-nearest neighbors, kNN) to derive "photometric redshifts" for galaxies. Credit: Speagle et al. (2019).
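The hybrid idea can be illustrated loosely: a cheap nearest-neighbor search narrows a large model grid to a handful of candidates, and a proper likelihood is computed only for those. The sketch below is a hypothetical toy, not the method of Speagle et al. (2019): the three-band "model" fluxes, the grid size, and k are all made up, errors are assumed Gaussian, and the neighbor search is brute force for self-containment (a spatial index such as a KD-tree would normally make that step fast).

```python
import heapq
import math
import random


def make_model_grid(n=2000, seed=0):
    """Hypothetical model grid: redshift -> noiseless fluxes in three bands."""
    rng = random.Random(seed)
    grid = []
    for _ in range(n):
        z = rng.uniform(0.0, 2.0)
        fluxes = (1.0 + 0.5 * z, 1.0 + z * z, 2.0 - 0.6 * z)  # toy color-redshift relation
        grid.append((z, fluxes))
    return grid


def knn_then_likelihood(obs, err, grid, k=50):
    """Estimate redshift: kNN pre-selection, then chi-square weights on neighbors only."""

    def dist2(fluxes):
        return sum((o - m) ** 2 for o, m in zip(obs, fluxes))

    # Step 1 (machine learning): keep the k grid entries nearest in flux space
    neighbors = heapq.nsmallest(k, grid, key=lambda zf: dist2(zf[1]))
    # Step 2 (statistical modeling): Gaussian likelihood for each neighbor
    chi2 = [sum(((o - m) / e) ** 2 for o, m, e in zip(obs, f, err)) for _, f in neighbors]
    chi2_min = min(chi2)
    weights = [math.exp(-0.5 * (c - chi2_min)) for c in chi2]
    total = sum(weights)
    return sum(w * z for w, (z, _) in zip(weights, neighbors)) / total


grid = make_model_grid()
obs = (1.5, 2.0, 1.4)      # noiseless toy observation of an object at z = 1.0
err = (0.05, 0.05, 0.05)   # assumed flux uncertainties
print(f"estimated redshift: {knn_then_likelihood(obs, err, grid):.3f}")
```

The expensive step (evaluating a full likelihood) runs on k candidates rather than the whole grid, which is where the savings come from when each model evaluation is costly.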

Sampling Methods

Much of science involves using data to test, constrain, and/or rule out various models that represent our current understanding of how we think things work. These models are often complex, involving many parameters and requiring substantial computational effort to evaluate. The constraints on these parameters given our data (and possibly prior beliefs) usually cannot be computed analytically and can sometimes be multi-modal, requiring numerical techniques to estimate them.
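In Bayesian terms, the "constraints" described above are the posterior distribution of the parameters θ given the data D, which combines the likelihood with the prior through Bayes' theorem. The normalizing constant Z (the evidence) is the quantity used for model comparison. The notation below is a standard textbook form and is not taken from the original text:

$$
p(\theta \mid D) = \frac{\mathcal{L}(D \mid \theta)\,\pi(\theta)}{\mathcal{Z}},
\qquad
\mathcal{Z} = \int \mathcal{L}(D \mid \theta)\,\pi(\theta)\,d\theta
$$

For most realistic models neither the posterior nor Z has a closed form, which is why numerical estimates are needed.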

One class of techniques for estimating these constraints relies on generating random samples from the distribution via computationally tractable numerical simulation. These are known as "Monte Carlo" approaches and include methods such as Markov chain Monte Carlo (MCMC), which are widely used throughout the sciences. I am interested in developing efficient sampling strategies that can be applied to multi-modal distributions (such as those seen when modeling galaxies) to estimate parameters and perform model comparison.
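A minimal Metropolis-Hastings sampler, the simplest form of MCMC, makes the difficulty with multi-modal distributions concrete. The target below is a hypothetical two-peaked density with modes at -3 and +3, not a real galaxy posterior, and the step sizes and chain length are arbitrary choices. With a small random-walk step the chain stays trapped in the mode where it starts; a much wider proposal can jump between modes, at the cost of rejecting most proposals.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalized log density: equal mixture of Gaussians at -3 and +3 (sigma = 0.5)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def metropolis(n_steps: int, step_size: float, x0: float = -3.0, seed: int = 1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old); 1 - random() lies in (0, 1]
        if math.log(1.0 - rng.random()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


def fraction_in_right_mode(samples):
    return sum(x > 0.0 for x in samples) / len(samples)


narrow = metropolis(20_000, step_size=0.2)
wide = metropolis(20_000, step_size=5.0)
print(f"narrow steps: {fraction_in_right_mode(narrow):.2f} of samples in the x > 0 mode")
print(f"wide steps:   {fraction_in_right_mode(wide):.2f} of samples in the x > 0 mode")
```

The true answer is 0.5 for both chains; only the wide proposal gets close, and in higher dimensions such wide proposals quickly become hopelessly inefficient, which is what motivates more specialized samplers.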

An animated demonstration of the Nested Sampling code "dynesty" (Speagle 2019). Credit: dynesty documentation.
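Nested sampling, the algorithm behind dynesty, estimates the evidence Z directly by repeatedly replacing the lowest-likelihood "live" point with a new prior draw at higher likelihood, while the enclosed prior volume shrinks by roughly a factor of exp(-1/nlive) per iteration. The toy below is a bare-bones illustration of that idea, not dynesty's implementation or API: it uses naive rejection sampling from the full prior (dynesty instead relies on bounding distributions and more efficient sampling methods) and a hypothetical 1-D problem with a uniform prior on [-5, 5] and a unit-normal likelihood, for which Z is about 0.1 analytically.

```python
import math
import random


def log_likelihood(x: float) -> float:
    """Hypothetical normalized unit-normal log-likelihood."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def sample_prior(rng: random.Random) -> float:
    """Hypothetical uniform prior on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def nested_sampling(nlive: int = 100, dlogz: float = 0.01, seed: int = 42) -> float:
    """Toy nested sampler returning an estimate of log Z."""
    rng = random.Random(seed)
    live_logl = [log_likelihood(sample_prior(rng)) for _ in range(nlive)]
    logz = -math.inf
    logx_prev = 0.0  # log of the enclosed prior volume, starting from X = 1
    i = 0
    while True:
        i += 1
        worst = min(range(nlive), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        logx = -i / nlive  # expected shrinkage: X_i ~ exp(-i / nlive)
        # Weight of the discarded point: L_min * (X_{i-1} - X_i)
        logz = logaddexp(logz, logl_min + math.log(math.exp(logx_prev) - math.exp(logx)))
        logx_prev = logx
        # Replace it with a prior draw constrained to L > L_min (naive rejection)
        while True:
            logl_new = log_likelihood(sample_prior(rng))
            if logl_new > logl_min:
                break
        live_logl[worst] = logl_new
        # Stop once L_max * X_i / Z < dlogz, i.e. the remaining volume barely changes log Z
        if max(live_logl) + logx - logz < math.log(dlogz):
            break
    # Add the final live points, each holding an equal share of the remaining volume
    logl_max = max(live_logl)
    log_mean_l = logl_max + math.log(sum(math.exp(l - logl_max) for l in live_logl) / nlive)
    return logaddexp(logz, logx + log_mean_l)


logz = nested_sampling()
print(f"log Z estimate: {logz:.3f} (analytic: {math.log(0.1):.3f})")
```

Because the evidence comes out as a by-product, nested sampling supports model comparison as well as parameter estimation; the statistical scatter of the estimate here shrinks roughly as 1 / sqrt(nlive).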