Computational Astrophysics Advancements of 2020

Updated: Jul 23, 2021

SkyShot, Volume 1, Issue 1

Author: Priti Rangnekar (Founder of SkyShot and Science Connect)

Traditionally, the words “astronomy” and “astrophysics” may conjure images of ancient star charts, telescopes staring into the night sky, or chalkboards filled with Einstein’s equations detailing special and general relativity. However, with the rise of ground- and space-based sky survey projects and Citizen Science endeavors involving contributions from amateur astronomers worldwide, the field of astronomy is becoming increasingly data-driven and computationally enhanced. Survey projects such as the Large Synoptic Survey Telescope raise data challenges including high volume (nearly 200 petabytes of data), wide variety in data types, and rapid data production and transmission, all of which require efficient analysis through statistical computing. [1] As we collect more information about the physical world and develop powerful software and hardware, we gain the ability to methodically find patterns and make large-scale predictions based on what we do know, allowing us to embrace the frontier of what has always been unknown.

In June 2019, researchers from institutions including Carnegie Mellon University and the Flatiron Institute announced the development of the first artificial intelligence simulation of the universe - the Deep Density Displacement Model (D3M). With the ability to complete a simulation in less than 30 milliseconds, the model proved to be both efficient and accurate, with relative errors of less than 10% when compared with both accurate but slow models and fast but less accurate models. Moreover, it could provide accurate estimates of certain physical quantities, such as the amount of dark matter, even when tested with parameters, such as gravitational conditions, that it was not originally trained on. This is just one example of how the power of computing techniques can allow us to better understand the universe and its past. [2]

Figure 1: Depiction of lower error for the D3M model compared to the earlier 2LPT model [2].

In 2020, research groups from around the world have further capitalized on artificial intelligence and supercomputing to analyze specific aspects of the universe, including exoplanets, galaxies, hypernovae, and neutron star mergers.

Gaussian Process Classifiers for Exoplanet Validation

University of Warwick scientists Armstrong, Gamper, and Damoulas recently capitalized on the power of machine learning to develop a novel algorithm for confirming the existence of exoplanets, which are planets that orbit stars outside the Solar System. [3]

Traditionally, exoplanet surveys use large amounts of telescope data and attempt to find evidence of an exoplanet transit, or any sign of the planet passing between the telescope and the star it is orbiting. This typically comes in the form of a dip in the observed brightness of the target star, which makes intuitive sense given that the planet would be obstructing some light. Nevertheless, this analysis can be prone to false-positive errors, given that an observed dip does not necessarily indicate the presence of an exoplanet; it could also be caused by camera errors, background object interference, or binary star systems. [3] In an eclipsing binary system, the observed brightness varies periodically as one star passes in front of the other, producing a dip that resembles a transit. Ruling this out requires extended analysis of the target star’s flux lightcurve, which shows changes in brightness over time. In the case of background object interference, a background eclipsing binary or planet may blend with the target star, requiring researchers to look for any offset between the position of the target star and that of the transit signal. [4]
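The idea of searching a lightcurve for a dip can be sketched in a few lines of Python. This is only a toy illustration, not the method used by any survey pipeline: the synthetic lightcurve, the 1% transit depth, the 5-sigma threshold, and the function name `find_dips` are all assumptions made for the example.

```python
import math
import statistics


def find_dips(flux, n_sigma=5.0):
    """Return indices whose flux lies more than n_sigma robust deviations below the median."""
    median = statistics.median(flux)
    # Median absolute deviation: a scatter estimate that a short dip barely affects.
    mad = statistics.median([abs(f - median) for f in flux])
    sigma = 1.4826 * mad  # rescales the MAD to a Gaussian standard deviation
    return [i for i, f in enumerate(flux) if f < median - n_sigma * sigma]


# Synthetic lightcurve (assumption): flat star with small deterministic wiggles...
flux = [1.0 + 0.0005 * math.sin(1.7 * i) for i in range(200)]
# ...plus a box-shaped transit that blocks 1% of the light for 20 samples.
for i in range(90, 110):
    flux[i] -= 0.01

dips = find_dips(flux)
depth = statistics.median(flux) - statistics.mean(flux[i] for i in dips)
print(f"dip spans samples {dips[0]}..{dips[-1]}, depth ~ {depth:.4f}")
```

A detection like this says only that the star dimmed; it cannot by itself distinguish a planet from an eclipsing binary or an instrumental artifact, which is why a separate validation step is needed.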

As a result, researchers use a planetary validation process to provide the statistical probability that a transit arose from a false positive, in which a planet was not present. [5] A common tool for validating some of the approximately 4,000 known exoplanets has been vespa, an algorithm and open source code library. The procedure, detailed in a paper by Morton in 2012, accounts for factors such as features of the signal, target star, follow-up observations, and assumptions regarding field stars. [6] However, as Armstrong, Gamper, and Damoulas explain in their abstract published in August 2020, a catalogue of known exoplanets should not be dependent on one method. [5] Previous machine learning strategies have often generated rankings for potential candidates based on their relative likelihoods of truly being planets; however, these approaches have not provided exact probabilities for any given candidate. For example, in 2017, Shallue and Vanderburg developed such a ranking model; 98.8% of the time, it ranked plausible planet signals in the test set higher than false positive signals. [7]
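A figure such as "ranked higher 98.8% of the time" is a pairwise ranking statistic: across every pairing of a true planet with a false positive, it is the fraction in which the planet received the higher score. The sketch below computes that fraction for made-up scores; the function name and the numbers are hypothetical and are not taken from Shallue and Vanderburg's work.

```python
def ranking_accuracy(planet_scores, false_positive_scores):
    """Fraction of (planet, false positive) pairs in which the planet scores higher.

    Ties count as half a win, the usual convention for this statistic.
    """
    pairs = [(p, f) for p in planet_scores for f in false_positive_scores]
    wins = sum(1.0 if p > f else 0.5 if p == f else 0.0 for p, f in pairs)
    return wins / len(pairs)


# Hypothetical classifier scores (higher means "more planet-like").
planet_scores = [0.95, 0.90, 0.80, 0.40]
false_positive_scores = [0.70, 0.30, 0.10]

accuracy = ranking_accuracy(planet_scores, false_positive_scores)
print(f"{accuracy:.3f}")  # 11 of the 12 pairs are ordered correctly
```

Note that this statistic depends only on the ordering of the scores. Squaring every score would leave it unchanged, which is why a good ranking alone does not supply the probability that a particular candidate is real.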

However, a probabilistic framework is a key component of the planetary validation process. Thus, by employing a Gaussian Process Classifier along with other models, the University of Warwick researchers could find the statistical probability that a specific exoplanet candidate is a false positive, not merely a relative ranking. In general, a Gaussian Process generates a probabilistic prediction, which allows researchers to incorporate prior knowledge, potentially find confidence intervals and uncertainty values, and make decisions about refitting. [8] If the probability of a candidate being a false positive is less than 1%, it would be considered a validated planet by their approach. Trained using two samples of confirmed planets and false positives from Kepler, the model was tested on unconfirmed Kepler candidates and confirmed 50 new planets with a wide range of sizes and orbital periods. [3]
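The 1% rule described above amounts to a simple decision step applied to each candidate's false positive probability. The following minimal sketch shows only that step; the candidate names and probabilities are invented for illustration, and the classifier that would produce them is not shown.

```python
VALIDATION_THRESHOLD = 0.01  # the 1% false positive probability quoted in the text


def validate(candidates, threshold=VALIDATION_THRESHOLD):
    """Split candidates into validated planets and those that remain unresolved."""
    validated = [name for name, fpp in candidates.items() if fpp < threshold]
    unresolved = [name for name, fpp in candidates.items() if fpp >= threshold]
    return validated, unresolved


# Hypothetical false positive probabilities from some probabilistic classifier.
candidates = {
    "candidate-A": 0.002,
    "candidate-B": 0.30,
    "candidate-C": 0.0099,
    "candidate-D": 0.01,  # exactly 1% is not "less than 1%", so it stays unresolved
}

validated, unresolved = validate(candidates)
print("validated:", validated)
print("unresolved:", unresolved)
```

A rule of this kind only makes sense when the classifier's outputs are meaningful probabilities, which is what separates validation from the ranking approaches mentioned earlier.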

Figure 2: A depiction of an exoplanet transit lightcurve; the Gaussian Process Classifier prioritizes the ingress and egress regions, indicated by the 2 dotted lines, when classifying exoplanets [5].

Although the computational complexity for training the model is higher than that of traditional methods, and certain discrepancies with vespa were found, this approach demonstrates a clear potential for efficient automated techniques to be applied to the classification of future exoplanet candidates, with accuracy improving as the model learns from each new dataset. In fact, the researchers aim to apply this technique to data from the PLATO and TESS missions; TESS has already identified over 2,000 potential exoplanet candidates. [9]

Machine Learning and Deep Learning for Galaxy Identification and Classification

Another area of artificial intelligence growing in popularity is image classification and object detection, with common applications in autonomous vehicles and medical imaging. A powerful technique in this field is the convolutional neural network, a form of deep learning loosely modeled on the structure and function of the human brain. Each layer of the network serves a distinct purpose: convolution layers generate feature maps that highlight patterns such as edges in the image, pooling layers condense those maps while keeping the strongest responses, dense layers combine features, and dropout layers help prevent overfitting to the training set. [10]
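To make the roles of the first two layer types concrete, here is a small sketch in plain Python of one convolution followed by one max-pooling step. It is an illustration only: the 5x5 image and the hand-written edge kernel are assumptions, whereas a real network learns its kernel weights during training and uses an optimized library.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Toy image: dark on the left, bright on the right, so there is one vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 map, non-zero only at the edge
pooled = max_pool(feature_map)           # 2x2 summary of the strongest responses
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the edge sits, and pooling keeps that response while shrinking the map to a quarter of its size.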

This method was applied to galaxy classification by researchers at the National Astronomical Observatory of Japan (NAOJ). The Subaru Telescope, an 8.2-meter optical-infrared telescope at Maunakea, Hawaii, serves as a robust source of data and images of galaxies due to its wide coverage, high resolution, and high sensitivity. [11] In fact, earlier this year, astronomers used Subaru Telescope data to train an algorithm to learn theoretical galaxy colors and search for specific spectroscopic signatures, or light frequency combinations. The algorithm was used to identify galaxies in the early stage of formation from data containing over 40 million objects. Through this study, a relatively young galaxy HSC J1631+4426, breaking the previous record for lowest oxygen abundance, was discovered. [12]

In addition, NAOJ researchers have been able to detect nearly 560,000 galaxies in the images and have had access to big data from the Subaru/Hyper Suprime-Cam (HSC) Survey, which contains deeper band images and has a higher spatial resolution than images from the Sloan Digital Sky Survey. Using a convolutional neural network (CNN) with 14 layers, they could classify galaxies as either non-spirals, Z-spirals, or S-spirals. [10]

This application presents several important takeaways for computational astrophysics. The first is the augmentation of data in the training set. Since the number of non-spiral galaxies was significantly greater than the number of spiral galaxies, the researchers needed more training set images for Z-spiral and S-spiral galaxies. In order to achieve this result without actively acquiring new images from scratch, they flipped, rotated, and rescaled the existing images with Z-spiral and S-spiral galaxies, generating a training set with roughly similar numbers for all types of galaxies.
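Rotation and flipping can be sketched with plain nested lists standing in for image arrays. One point worth noting is that mirroring a spiral reverses its winding direction, so this sketch relabels a flipped S-spiral as a Z-spiral and vice versa. That relabeling, the helper names, and the tiny 3x3 "image" are assumptions of the example; the source does not spell out how labels were handled, and rescaling is omitted for brevity.

```python
def rotate_90(image):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2-D list left to right."""
    return [list(reversed(row)) for row in image]


# Mirroring swaps the winding direction, so the label must swap too (assumption).
MIRROR_LABEL = {"S": "Z", "Z": "S"}


def augment(image, label):
    """Return (image, label) pairs: the four rotations plus their four mirror images."""
    samples = []
    rotated = image
    for _ in range(4):
        samples.append((rotated, label))
        samples.append((flip_horizontal(rotated), MIRROR_LABEL[label]))
        rotated = rotate_90(rotated)
    return samples


# A tiny asymmetric stand-in for a galaxy cutout.
image = [[1, 2, 0],
         [0, 3, 0],
         [0, 4, 5]]

samples = augment(image, "S")
print(len(samples), "training samples from one image")
```

One labeled image yields eight training samples here, four of each spiral type, which is how augmentation can even out class sizes without any new observations.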

Figure 3: An example of data augmentation for galaxy images using rotation and flipping [10].

Second, it is also important to note that the accuracy of AI models may decline when working with celestial bodies or phenomena that are rare, because fewer examples are available for the training set. The galaxy classification CNN originally achieved an accuracy of 97.5%, identifying spirals in over 76,000 galaxies in a testing dataset. However, this value decreased to only 90% when the model was trained on a set with fewer than 100 images per galaxy type, which raises concerns about applying the method to rarer galaxy types.

A final important takeaway concerns the impact of misclassification and differences between the training dataset and the testing dataset. When applied to the testing set of galaxy images, the model found roughly equal numbers of S-spirals and Z-spirals. This contrasted with the training set, in which S-spiral galaxies were more common. Although this may appear concerning, as one would expect the distribution of galaxy types to remain consistent, the training set may not have been representative, likely due to human selection and visual inspection bias. In addition, the authors point out that the criterion of what constitutes a clear spiral is ambiguous, and that the training set images were classified by human eye. As a result, while the training set only included images that had unambiguous spirals, the testing set may have included more ambiguous cases, causing the model to incorrectly classify them.

Several strategies can be used to combat such issues in scientific machine learning research. In terms of datasets, possible options include creating a new, larger training sample or employing numerical simulations to create mock images. On the other hand, a completely different machine learning approach - unsupervised learning - could be used. Unsupervised learning would not require humans to visually classify the training dataset, as the learning model would identify patterns and create classes on its own. [10]

In fact, researchers at the Computational Astrophysics Research Group at the University of California, Santa Cruz have taken a very similar approach to the task of galaxy classification, focusing on galaxy morphologies, such as amorphous elliptical or spheroidal. Their deep learning framework, named Morpheus, takes in image data from astronomers and performs pixel-level classification for various features of the image, allowing it to discern distinct objects within the same image rather than merely classifying the image as a whole (like the models used by the NAOJ researchers). A notable benefit of this approach is that Morpheus can discover galaxies by itself and would not require as much visual inspection or human involvement, which can be fairly high for traditional deep learning approaches - the NAOJ researchers worked with a dataset that required nearly 100,000 volunteers. [13] This is crucial, given that Morpheus could be used to analyze very large surveys, such as the Legacy Survey of Space and Time, which would capture over 800 panoramic images per night. [13]
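Why do per-pixel labels make it possible to separate objects within one image? The toy sketch below groups touching non-background pixels into objects with a flood fill and assigns each object the class most of its pixels received. It is not a description of how Morpheus works internally, which the source does not detail; the class names, the tiny class map, and the function `label_objects` are all hypothetical.

```python
from collections import deque


def label_objects(class_map, background="bg"):
    """Group touching non-background pixels into objects (4-connected flood fill)."""
    rows, cols = len(class_map), len(class_map[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or class_map[r][c] == background:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and class_map[ny][nx] != background:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # A majority vote over the object's pixels gives one class per object.
            labels = [class_map[y][x] for y, x in pixels]
            objects.append((max(set(labels), key=labels.count), len(pixels)))
    return objects


# Hypothetical per-pixel output: two separate sources on a background.
class_map = [
    ["spheroid", "spheroid", "bg", "bg", "bg"],
    ["spheroid", "disk",     "bg", "bg", "disk"],
    ["bg",       "bg",       "bg", "disk", "disk"],
]

objects = label_objects(class_map)
print(objects)
```

A whole-image classifier would return a single label for this frame, whereas the per-pixel map yields two objects with their own classes and sizes.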

Figures 4 and 5: Examples of a Hubble Space Telescope Image and its classification results using Morpheus [13].

Supercomputing for Analyzing Hypernovae and Neutron Star Mergers

Given the data-intensive nature of this endeavor as well as the need for intensive pixel-level classification, it is natural to wonder how scientists are able to run such algorithms and programs in the first place. The answer often lies in supercomputing, or high performance computing (HPC). Supercomputers often involve interconnected nodes that can communicate, use a technique called parallel processing to solve multiple computational problems via multiple CPUs or GPUs, and can rapidly input and output data. [14] This makes them prime candidates for mathematical modeling of complex systems, data mining and analysis, and performing operations on matrices and vectors, which are ubiquitous when using computing to solve problems in physics and astronomy. [15]
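The pattern behind parallel processing, splitting one large operation into independent pieces and handing them to separate workers, can be shown with a matrix-vector product. This is a sketch of the decomposition only: Python threads stand in for compute nodes and will not actually speed up this arithmetic, and production HPC codes typically rely on tools such as MPI or GPU kernels instead. The function names are assumptions of the example.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    """Dot product of one matrix row with the vector."""
    return sum(a * b for a, b in zip(row, vector))


def parallel_matvec(matrix, vector, workers=4):
    """Multiply a matrix by a vector, submitting each row as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so they line up with the rows.
        return list(pool.map(lambda row: dot(row, vector), matrix))


matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
          [10, 11, 12]]
vector = [1, 0, -1]

result = parallel_matvec(matrix, vector)
print(result)
```

Each row's dot product needs no information from any other row, which is what makes the work easy to distribute; simulations with tightly coupled quantities need far more communication between workers.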

The power of supercomputing was recently demonstrated when researchers from Academia Sinica’s Institute of Astronomy and Astrophysics used the supercomputer at the NAOJ to simulate a hypernova, which is potentially 100 times more energetic than a supernova and results from the collapse of a highly massive star. The program simulated timescales nearly an order of magnitude longer than earlier simulations, requiring significantly more computational power while allowing researchers to analyze the exploding star 300 days after the start of the explosion. [16] The added cost was worthwhile, as the longer timescale enabled assessment of the decay of nickel-56. This element is created in large amounts by pair-instability supernovae (in which no neutron star or black hole is left behind) and is responsible for the visible light that enables us to observe supernovae. Moreover, we should not underestimate the importance of simulations, as astronomers cannot rely on observations alone given the rarity of hypernovae in the real world. [17]
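A back-of-the-envelope calculation helps show why the length of the simulated timescale matters for radioactive decay. The sketch below applies the standard exponential decay law to nickel-56 and to its decay product cobalt-56, which in turn decays to stable iron-56. The half-lives are approximate laboratory values supplied for this example and do not come from the article or from the simulation itself.

```python
import math

NI56_HALF_LIFE_DAYS = 6.075  # approximate laboratory value, not from the article
CO56_HALF_LIFE_DAYS = 77.24  # approximate laboratory value, not from the article


def decay_constant(half_life_days):
    """Convert a half-life into the rate constant of N(t) = N0 * exp(-lambda * t)."""
    return math.log(2) / half_life_days


def ni56_fraction(t_days):
    """Fraction of the initial nickel-56 nuclei still present after t_days."""
    return math.exp(-decay_constant(NI56_HALF_LIFE_DAYS) * t_days)


def co56_fraction(t_days):
    """Cobalt-56 nuclei present after t_days, per initial nickel-56 nucleus.

    Two-step decay chain solution, assuming no cobalt-56 at the start.
    """
    l_ni = decay_constant(NI56_HALF_LIFE_DAYS)
    l_co = decay_constant(CO56_HALF_LIFE_DAYS)
    return l_ni / (l_co - l_ni) * (math.exp(-l_ni * t_days) - math.exp(-l_co * t_days))


for t in (10, 30, 100, 300):
    print(f"day {t:3d}: Ni-56 {ni56_fraction(t):.2e}, Co-56 {co56_fraction(t):.2e}")
```

Under these assumptions, the nickel-56 itself is essentially gone within a couple of months, while a few percent of the cobalt-56 it produced still remains at day 300, so a short simulation would capture only the beginning of the decay sequence.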

Figure 6 (left): A 3-D visualization of a pair-instability supernova, in which nickel-56 decays in the orange area [17].

Figure 7 (right): ATERUI II, the 1005-node Cray XC50 system for supercomputing at the Center for Computational Astrophysics at the NAOJ [16].

Supercomputers have also been used for simulating collisions between 2 neutron stars of significantly different masses, revealing that electromagnetic radiation can result in addition to gravitational waves. [18] Once again, we can see the usefulness of computational simulations when real observations do not suffice. In 2019, LIGO researchers detected a neutron star merger with 2 unequal masses but were unable to detect any signal of electromagnetic radiation. Now, with the simulated signature, astronomers may be capable of detecting paired signals that indicate unequal neutron star mergers. In order to conduct the simulations using the Bridges and Comet platforms, researchers used nearly 500 computing cores and 100 times as much memory as typical astrophysics simulations due to the number of physical quantities involved. [19] Despite the tremendous need for speed, flexibility, and memory, supercomputers prove an essential tool in modeling the intricacies of our multifaceted universe.


Undoubtedly, scientific discovery is at the essence of humankind, as our curiosity drives us to better understand and adapt to the natural and physical world we live in. In order to access scientific discovery, we must have the necessary tools, especially as the questions we ask are becoming more complex and data is becoming more ubiquitous. Outer space still holds many unanswered questions with profound implications for humankind. The overarching, large-scale nature of the physical processes that govern celestial bodies calls for further research and analysis to learn more about unknown parts of the universe. Yet, we are now better equipped than ever to tackle these questions. We can find trends in the seemingly unpredictable by using logic, algorithms, and data through computer programs, creating a toolbox of methods that can revolutionize astronomy and astrophysics research. Ultimately, as we strive to construct a world view of how the universe functions, we will be able to make the most of large amounts of data from a variety of research institutions while fostering collaboration and connected efforts by citizens, scientists, and governments worldwide.


  1. Zhang, Y., & Zhao, Y. (2015). Astronomy in the Big Data Era. Data Science Journal, 14(0), 11. doi:10.5334/dsj-2015-011

  2. Sumner, T. (2019, June 26). The first AI universe sim is fast and accurate-and its creators don't know how it works. Retrieved November 25, 2020, from

  3. Armstrong, D. J., Gamper, J., & Damoulas, T. (2020). Exoplanet Validation with Machine Learning: 50 new validated Kepler planets. Monthly Notices of the Royal Astronomical Society. doi:10.1093/mnras/staa2498

  4. S. T. Bryson, M. Abdul-Masih, N. Batalha, C. Burke, D. Caldwell, K. Colon, J. Coughlin, G. Esquerdo, M. Haas, C. Henze, D. Huber, D. Latham, T. Morton, G. Romine, J. Rowe, S. Thompson, A. Wolfgang, 2015, The Kepler Certified False Positive Table, KSCI-19093-003

  5. Staff, S. (2020, August 25). 50 new planets confirmed in machine learning first. Retrieved November 25, 2020, from


  7. Shallue, C. J., & Vanderburg, A. (2018). Identifying Exoplanets with Deep Learning: A Five-planet Resonant Chain around Kepler-80 and an Eighth Planet around Kepler-90. The Astronomical Journal, 155(2), 94.

  8. 1.7. Gaussian Processes — scikit-learn 0.23.2 documentation. (2020). Scikit-Learn.Org.

  9. Yeung, J., & Center/NASA, D. (2020, August 26). Artificial intelligence identifies 50 new planets from old NASA data. Retrieved November 25, 2020, from

  10. Tadaki, K.-, Iye, M., Fukumoto, H., Hayashi, M., Rusu, C. E., Shimakawa, R., & Tosaki, T. (2020). Spin parity of spiral galaxies II: a catalogue of 80 k spiral galaxies using big data from the Subaru Hyper Suprime-Cam survey and deep learning. Monthly Notices of the Royal Astronomical Society, 496(4), 4276–4286.

  11. Overview of Subaru Telescope: About the Subaru Telescope: Subaru Telescope. (n.d.). Retrieved November 25, 2020, from

  12. Kojima, T., Ouchi, M., Rauch, M., Ono, Y., Nakajima, K., Isobe, Y., Fujimoto, S., Harikane, Y., Hashimoto, T., Hayashi, M., Komiyama, Y., Kusakabe, H., Kim, J. H., Lee, C.-H., Mukae, S., Nagao, T., Onodera, M., Shibuya, T., Sugahara, Y., … Yabe, K. (2020). Extremely Metal-poor Representatives Explored by the Subaru Survey (EMPRESS). I. A Successful Machine-learning Selection of Metal-poor Galaxies and the Discovery of a Galaxy with M* < 10^6 M⊙ and 0.016 Z⊙. The Astrophysical Journal, 898(2), 142.

  13. Stephens, T. (2020). Powerful new AI technique detects and classifies galaxies in astronomy image data. Retrieved November 25, 2020, from

  14. Hosch, W. L. (2019, November 28). Supercomputer. Retrieved November 25, 2020, from

  15. HPC Basics Series: What is Supercomputing? (2019, March 11). Retrieved November 25, 2020, from

  16. Peckham, O. (2020, July 24). Supercomputer Simulations Delve Into Ultra-Powerful Hypernovae. Retrieved November 25, 2020, from

  17. Gough, E. (2020, July 21). Supercomputer Simulation Shows a Supernova 300 Days After it Explodes. Retrieved November 25, 2020, from

  18. C., H. (2020, September 25). Scientists May Have Developed New Way to Detect 'Invisible' Black Holes. Retrieved November 25, 2020, from

  19. Penn State. (2020, August 3). Unequal neutron-star mergers create unique 'bang' in simulations. ScienceDaily. Retrieved November 24, 2020 from
