July 10, 2019

Classifying millions of galaxies – with a little help from citizen science

- Eliu Antonio Huerta Escudero, National Center for Supercomputing Applications

What is the first thing you should do when you need to label hundreds of millions of galaxies based on their shape or structure? Ask for some help, of course!

In 2007, the Sloan Digital Sky Survey (SDSS) launched Galaxy Zoo, a citizen science campaign in which the public helped to classify galaxy images captured by an optical telescope at the Apache Point Observatory in New Mexico, USA. Volunteers reviewed the images online to help determine whether each galaxy had a spiral or elliptical structure.

 

Samples of newly labelled images of DES elliptical galaxies. This is the first time these images have been correctly classified in DES data, using AI.

Now, a team of scientists is combining the data labelled by Galaxy Zoo volunteers with artificial intelligence (AI) and high-performance supercomputers to start labelling the galaxy images gathered in the Dark Energy Survey (DES).

In 2019, the DES completed a six-year mission that scanned in depth about a quarter of the southern skies, observing hundreds of supernovae and millions of galaxies. Mining these data will provide unique insights into the nature of dark matter and dark energy, and their evolution in cosmic time. This is no easy task, however, so scientists from the National Center for Supercomputing Applications (NCSA) and the Argonne Leadership Computing Facility (ALCF) have used an innovative approach to address this computational grand challenge.

The scientists realized that citizen science campaigns may not be a scalable approach to labelling the galaxies in the DES data, which number over 300 million. Further, the Galaxy Zoo project did not provide the millions of high-quality galaxy images needed to train state-of-the-art AI algorithms for image classification from scratch.

Instead, the team pioneered the use of deep transfer learning in cosmology. They selected a neural network model that is state of the art for computer vision and that had been pre-trained with the ImageNet dataset, which contains millions of high-quality, real-object images divided into several thousand classes. Next, they fine-tuned this neural network model using a dataset of about 40,000 high-quality galaxy images from the Galaxy Zoo project.
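
To make the idea concrete, here is a minimal sketch of the simplest form of transfer learning: keep the borrowed layers frozen and train only a new classification head on the new data. It is a toy illustration, not the team's model. The frozen weights are random stand-ins for weights that would really come from ImageNet pre-training, the "galaxies" are hypothetical four-number summaries rather than images, and every name in it (`FROZEN_W`, `make_toy_galaxy`, `train_head`) is invented for this example. Full fine-tuning, as described above, would also update some of the borrowed layers, typically at a small learning rate.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N_IN, N_FEAT = 4, 8

# Stand-in for pre-trained layers: fixed weights that are never updated.
# ASSUMPTION: a real pipeline would load these from a network trained on ImageNet.
FROZEN_W = [[rng.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_FEAT)]


def extract_features(x):
    """Frozen base: a fixed linear projection followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def make_toy_galaxy(label):
    """Hypothetical 4-number summary of an image (1 = spiral, 0 = elliptical)."""
    centre = 1.0 if label == 1 else -1.0
    return [centre + rng.gauss(0, 0.3) for _ in range(N_IN)]


def make_dataset(n):
    """Balanced toy dataset of (features, label) pairs."""
    return [(make_toy_galaxy(i % 2), i % 2) for i in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(head_w, head_b, x):
    """Probability that x is a spiral, using frozen features plus the new head."""
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)


def train_head(data, epochs=50, lr=0.5):
    """Train only the new classification head with plain SGD on log loss."""
    head_w, head_b = [0.0] * N_FEAT, 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            p = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            err = p - y  # gradient of the log loss with respect to the logit
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


def accuracy(head_w, head_b, data):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict(head_w, head_b, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


train_set = make_dataset(200)  # small labelled set, like the Galaxy Zoo sample
test_set = make_dataset(100)   # held-out examples the head never saw
head_w, head_b = train_head(train_set)
test_accuracy = accuracy(head_w, head_b, test_set)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The point of the design is that only the eight head weights and one bias are learned from the small labelled set; everything in `FROZEN_W` is reused as is, which is why a modest number of labelled examples can be enough.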

The final product is an AI algorithm that achieves state-of-the-art accuracy in classifying galaxy images in both SDSS and DES data. The scientists have used this neural network model to label over 10,000 DES galaxies that had not been observed in previous surveys. Furthermore, this study is the first of its kind in the literature to combine deep transfer learning with distributed training, using tens of graphics processing units (GPUs) to reduce the training stage from five hours to just eight minutes.
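
For a sense of scale, the short calculation below works out what going from five hours to eight minutes means as a speedup. It rests on assumptions the article does not state: that the five-hour figure refers to training on a single GPU, and that scaling can be at best linear. The GPU count used for the efficiency estimate (`ASSUMED_GPUS`) is purely hypothetical, chosen only to illustrate the formula.

```python
import math

BASELINE_MINUTES = 5 * 60    # five hours, from the article
DISTRIBUTED_MINUTES = 8      # eight minutes, from the article

# Speedup: how many times faster the distributed run is.
speedup_factor = BASELINE_MINUTES / DISTRIBUTED_MINUTES

# ASSUMPTION: if the baseline used one GPU and scaling is at best linear,
# the distributed run needs at least this many GPUs.
min_gpus_if_linear = math.ceil(speedup_factor)

# HYPOTHETICAL device count, for illustration only (not taken from the article).
ASSUMED_GPUS = 64

# Parallel efficiency: achieved speedup divided by the ideal linear speedup.
efficiency = speedup_factor / ASSUMED_GPUS

print(f"speedup: {speedup_factor:.1f}x")
print(f"minimum GPUs under linear scaling: {min_gpus_if_linear}")
print(f"efficiency on {ASSUMED_GPUS} GPUs (hypothetical): {efficiency:.0%}")
```

Under these assumptions the lower bound on the number of GPUs is consistent with the "tens of graphics processing units" mentioned above.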

“Big-data experiments such as the DES provide unique opportunities to be innovative,” says Eliu Huerta, Head of the NCSA Gravity Group. “We have demonstrated that reaching convergence between citizen science, AI, and large-scale computing provides a gateway to produce large-scale galaxy catalogues in DES. This work lays the foundation to address this problem at a larger scale, once the Large Synoptic Survey Telescope begins operations.”

The team recently published their findings in the journal Physics Letters B. Meanwhile, the latest iteration of the Galaxy Zoo project continues today on Zooniverse, providing further opportunities for citizen scientists with an interest in cosmology to play their part in classifying galaxies far, far away.

For further information about the analysis process, read the press release on the NCSA website. You can watch a video about the project here.
