How can you classify galaxies on your own?

What do images of galaxies look like as raw data?

How does an image recognition model pick out the features that make one galaxy look different from another?

This project shows how you can use artificial intelligence to classify galaxies by their shapes using real astronomical images. The GitHub repository linked at the bottom trains a computer vision model to recognize three main galaxy types: elliptical, lenticular, and spiral. Instead of a human looking at each image and deciding its type, the computer learns to do this automatically by studying many labeled examples. This is a common technique in modern astronomy, where surveys collect millions of galaxy images, far too many for people to classify by hand.

To understand how this works, it helps to know what an image looks like to a computer. While we see a galaxy image as a picture with stars and structure, the computer only sees numbers. A color image is stored as a grid of pixels, and each pixel has three values representing red, green, and blue brightness. So, an image is really just a large table of numbers. The job of the machine learning model is to find patterns in these numbers that usually correspond to certain shapes, such as smooth round light distributions for elliptical galaxies or curved arm patterns for spiral galaxies.
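To see what "an image is just a table of numbers" means in practice, here is a minimal sketch assuming NumPy is available. Instead of loading a real EFIGI file, it builds a small synthetic image: a smooth, round brightness profile (loosely like an elliptical galaxy) copied into the red, green, and blue channels. The function name `make_smooth_galaxy` and the 64 × 64 size are illustrative choices, not part of the project's code.

```python
import numpy as np

def make_smooth_galaxy(size: int = 64, sigma: float = 8.0) -> np.ndarray:
    """Return a synthetic RGB image (size x size x 3, uint8) with a smooth,
    round brightness profile, loosely mimicking an elliptical galaxy."""
    # Pixel coordinates, measured from the image center
    y, x = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    r2 = (x - c) ** 2 + (y - c) ** 2
    # Gaussian fall-off: brightest in the middle, fading toward the edges
    brightness = np.exp(-r2 / (2.0 * sigma ** 2))
    # Scale to the 0-255 range used by 8-bit images
    gray = (255 * brightness).astype(np.uint8)
    # Put the same brightness into the red, green and blue channels
    return np.stack([gray, gray, gray], axis=-1)

img = make_smooth_galaxy()
print(img.shape)    # (64, 64, 3): rows, columns, color channels
print(img.dtype)    # uint8: each value is an integer from 0 to 255
print(img[32, 32])  # one pixel near the bright center: [R G B]
print(img[0, 0])    # one pixel in the dark corner
```

A real galaxy photo loaded from disk (for example with Pillow and `np.asarray`) has exactly the same kind of structure, just with more varied values.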

An image recognition model such as a convolutional neural network processes an image in stages, one layer after another. The early layers detect very simple features like edges, lines, and bright and dark regions. The middle layers combine these into more complex shapes like curves, bars, and rings. The deeper layers can recognize larger structures, such as spiral arms, central bulges, or smooth elliptical profiles. None of these feature detectors are written by hand; they are learned during training. By stacking many layers together, the model gradually builds up a description of what makes one type of galaxy look different from another, even though it only ever works with numbers.
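To make the "early layers detect edges" idea concrete, the sketch below performs the basic operation of a convolutional layer: sliding a small 3 × 3 kernel over the image and computing a weighted sum at each position. The kernel here is a hand-written Sobel-style vertical-edge filter; in a trained network the kernel values are learned from data instead. The `conv2d` function is an illustrative NumPy implementation (no padding, stride 1), not code from the project.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no padding), stride-1 2D cross-correlation, which is the
    operation deep learning libraries call 'convolution'."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel-style kernel that responds to dark-to-bright changes from left to right
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=float)

# Toy 6x6 grayscale image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = conv2d(image, vertical_edge)
print(response)  # every row is [0, 4, 4, 0]: strong response only at the edge
```

A real CNN layer applies many such kernels at once, and later layers run the same operation on the outputs of earlier ones, which is how simple edges get combined into curves, arms, and bulges.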

In this project, you start by downloading galaxy images from the EFIGI dataset, a collection of real, labeled galaxy images used for research and education, and placing them in a folder on your computer. You then organize them into the structure the training code expects: separate training, validation, and test sets, each containing one folder per class (elliptical, lenticular, and spiral). The training images are used to teach the model, the validation images are used to check its progress while it is learning, and the test images are used at the end to measure how well it performs on images it has never seen before.
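A minimal sketch of the train/validation/test split described above, assuming you already have a mapping from each image file to its class (reading that mapping from the EFIGI catalog is not shown here). The 70/15/15 proportions, the function name `split_dataset`, and the fake filenames are illustrative assumptions, not taken from the project. Each class is split separately so the class balance stays similar across the three sets, and the fixed random seed makes the split reproducible.

```python
import random
from collections import defaultdict

CLASSES = ("elliptical", "lenticular", "spiral")

def split_dataset(labels: dict[str, str], train_frac: float = 0.7,
                  val_frac: float = 0.15, seed: int = 0) -> dict:
    """Split {filename: class} into train/val/test, one class at a time,
    so every split keeps roughly the same class balance."""
    by_class = defaultdict(list)
    for name, cls in labels.items():
        by_class[cls].append(name)

    rng = random.Random(seed)  # fixed seed -> same split every run
    splits = {s: {c: [] for c in CLASSES} for s in ("train", "val", "test")}
    for cls, names in by_class.items():
        names = sorted(names)  # deterministic order before shuffling
        rng.shuffle(names)
        n_train = round(len(names) * train_frac)
        n_val = round(len(names) * val_frac)
        splits["train"][cls] = names[:n_train]
        splits["val"][cls] = names[n_train:n_train + n_val]
        splits["test"][cls] = names[n_train + n_val:]  # whatever is left
    return splits

# Hypothetical example: 60 images, 20 of each class
labels = {f"gal_{i:03d}.png": CLASSES[i % 3] for i in range(60)}
splits = split_dataset(labels)
for s in ("train", "val", "test"):
    print(s, {c: len(v) for c, v in splits[s].items()})
# train: 14 per class, val: 3 per class, test: 3 per class
```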

A preparation script automates this process. It takes the original images, sorts them by galaxy type, and copies them into the correct folders for training, validation, and testing. This step matters because, in a folder-per-class layout like this one, the name of the folder an image sits in is its label, so the data must be well organized and correctly labeled. Once this is done, the dataset is ready for training.
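The folder-writing half of such a preparation script can be sketched with the standard library alone. This is an illustration, not the repository's script: the function `write_split_folders`, the `dataset/<split>/<class>/<file>` layout, and the placeholder files in `demo` are assumptions. The split-then-class layout shown here is the kind that folder-based image classification trainers (for example Ultralytics' YOLO classification mode) read directly.

```python
import shutil
import tempfile
from pathlib import Path

def write_split_folders(splits: dict, src_dir: Path, out_dir: Path) -> int:
    """Copy images into out_dir/<split>/<class>/<filename>.

    `splits` maps split name -> class name -> list of filenames that exist
    in src_dir, e.g. {"train": {"spiral": ["a.png"]}, ...}.
    Returns the number of files copied.
    """
    copied = 0
    for split, classes in splits.items():
        for cls, names in classes.items():
            target = out_dir / split / cls
            target.mkdir(parents=True, exist_ok=True)  # create even if empty
            for name in names:
                # Copy (not move) so the original download stays untouched
                shutil.copy2(src_dir / name, target / name)
                copied += 1
    return copied

def demo() -> list[str]:
    """Build a tiny fake dataset in a temporary folder and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "raw"
        src.mkdir()
        for name in ("g1.png", "g2.png", "g3.png", "g4.png"):
            (src / name).write_bytes(b"placeholder")  # stand-in for image data
        splits = {
            "train": {"elliptical": ["g1.png"], "spiral": ["g2.png"]},
            "val": {"lenticular": ["g3.png"]},
            "test": {"spiral": ["g4.png"]},
        }
        out = Path(tmp) / "dataset"
        write_split_folders(splits, src, out)
        return sorted(p.relative_to(out).as_posix() for p in out.rglob("*.png"))

print(demo())
```

In a real run, `splits` would come from a split step like the one described in the previous paragraph, and `src_dir` would be the folder holding the downloaded EFIGI images.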

The training step uses a neural network based on a YOLO-style architecture, a type of deep convolutional neural network commonly used in computer vision. Although YOLO is best known for detecting objects in images, the same kind of network can also be used for classification. During training, the model looks at each image, guesses its class, compares that guess to the correct label, and then slightly adjusts its internal parameters (its weights) to reduce the error. This process is repeated many times over the training set, and the model gradually gets better at telling elliptical, lenticular, and spiral galaxies apart.

By the end of training, the model has learned to recognize visual patterns that correspond to different galaxy shapes. When you give it a new image, it does not “see” the galaxy the way a human does, but it can still analyze the pixel values, detect the features it has learned, and predict the most likely class. This is, in broad terms, how astronomers use AI today to sort huge numbers of galaxies in large sky surveys. Through this project, you learn not only about galaxy classification, but also about how modern image recognition systems are used in real scientific research. For more details, see the article linked below.
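The "guess, compare, adjust" loop can be shown at toy scale. The sketch below is deliberately not a YOLO network: it is plain softmax regression (a single linear layer) trained by gradient descent on made-up two-number feature vectors, which stand in for the features a CNN would extract from an image. The learning rate, epoch count, and synthetic clusters are arbitrary choices for illustration. A YOLO-style classifier runs the same kind of loop, but updates millions of convolutional weights through backpropagation, typically with an optimizer such as SGD or Adam.

```python
import numpy as np

CLASSES = ["elliptical", "lenticular", "spiral"]

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X: np.ndarray, y: np.ndarray, n_classes: int,
          lr: float = 0.5, epochs: int = 200) -> tuple[np.ndarray, np.ndarray]:
    """Softmax regression trained with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))   # the adjustable 'internal parameters'
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]  # correct labels as one-hot rows
    for _ in range(epochs):
        probs = softmax(X @ W + b)   # 1. guess: a probability for each class
        grad = (probs - onehot) / n  # 2. compare: gradient of the cross-entropy loss
        W -= lr * X.T @ grad         # 3. adjust: small step that reduces the error
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the index of the most likely class for each row of X."""
    return softmax(X @ W + b).argmax(axis=1)

# Toy data: three clusters of 2-D feature vectors, one cluster per class
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.standard_normal((150, 2))

W, b = train(X, y, n_classes=3)
accuracy = (predict(X, W, b) == y).mean()
new_example = np.array([[1.9, 0.1]])  # an unseen point near the third cluster
print(f"training accuracy: {accuracy:.2f}")
print("new example ->", CLASSES[predict(new_example, W, b)[0]])
```

The last two lines mirror what happens after training: the learned weights are frozen, and a new input is assigned to whichever class gets the highest probability.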

You can find the code for my galaxy classifier model here:

https://github.com/bpyoda/glx-classifier/tree/main

For further reading about star-yolo, see: https://iopscience.iop.org/article/10.1088/1674-4527/ae0c79

More resources

https://arxiv.org/pdf/2507.11692

https://academic.oup.com/mnras/article/511/3/3330/6529246

https://academic.oup.com/mnras/article/399/3/1367/1074847

https://ieeexplore.ieee.org/document/6920476/

https://link.springer.com/article/10.1007/s00500-018-3521-2

https://www.ibm.com/think/topics/convolutional-neural-networks

https://ijits-bg.com/sites/default/files/archive/2023%28vol.15%29/No2/contents/2023-N2-08.pdf

http://www.scholarpedia.org/article/Encyclopedia:Astrophysics

https://www.d2l.ai/chapter_convolutional-neural-networks/conv-layer.html#learning-a-kernel

https://esahubble.org/images/heic9902o/