Contest
1. Introduction
The breakthroughs in generative AI have dramatically transformed everyday life, exemplified by the emergence of ChatGPT and, more recently, DeepSeek. In the realm of images, the advent of diffusion models has made high-quality image generation a reality. This has significant real-world implications, as it allows for creating images in various styles tailored to human needs, with applications in production, education, work, and artistic creation. However, the significance of these advancements is more limited when it comes to scientific data generation. Unlike natural images, standalone scientific data is uniquely challenging to generate, as it typically serves specialized purposes and relies heavily on specific scientific contexts and applications. Moreover, the original diffusion models are too large to run on resource-limited devices, while scientific applications are commonly time-critical and require in-situ analysis and decision-making. To address these pressing issues, this competition takes a fundamental Full Waveform Inversion (FWI) task as a vehicle to demonstrate and explore the fundamental needs, major challenges, and potential solutions when AI for Science meets resource-constrained embedded systems. In particular, we focus on data-driven seismic FWI, a representative task in geophysical applications.
Traditional physics-based FWI: Traditional FWI methods aim to reconstruct subsurface velocity models by iteratively minimizing the difference between observed and simulated seismic data, typically using gradient-based optimization. The key challenge lies in solving the wave equation, which governs wave propagation through the Earth. While effective, these methods are computationally expensive and sensitive to factors such as the quality of the initial velocity model, noise in the data, and cycle-skipping issues, where the inversion algorithm converges to incorrect solutions due to poor starting models or insufficient low-frequency data. In recent years, machine learning approaches have been increasingly explored for FWI. Convolutional Neural Networks (CNNs) have shown promise in learning image-to-image mappings from seismic data to velocity models, bypassing the need for iterative solvers.
While data-driven models offer great potential for portable, real-time, and detailed subsurface imaging, they have significant limitations. Unlike computer vision, the subsurface geophysics field is challenged by data scarcity, mainly due to a prevalent culture of not sharing data. Worse still, in practical applications, the imbalance of data modalities presents a significant challenge. Velocity maps, which are more intuitive and understandable for humans, can be more easily simulated through various physical methods. In contrast, seismic data, which is critical for understanding subsurface structures, is often more difficult and expensive to acquire. This imbalance results in an abundance of velocity maps, while the corresponding seismic data needed to create paired datasets remains limited. Therefore, efficiently generating paired multi-modal data is crucial for achieving accurate and comprehensive subsurface modeling, as it addresses the real-world scarcity of balanced, high-quality datasets.
2. Objective
The goal of the On-Device Multi-modal Generative AI for Science Contest at DAC 2025 is to achieve the best multi-modal (seismic data and velocity map) generation quality and latency on the given hardware platform, using the FWI dataset.
The participants are required to design and implement a working, open-source generative AI algorithm that can automatically generate seismic data and velocity maps simultaneously and can be deployed and run on the given platform [1]. We will award prizes to the teams with the top overall performance in terms of multi-modal generation quality and inference latency.
3. Data
The dataset for this contest is CurveFault-A from the OpenFWI collection of seismic data and velocity maps. The dataset is generated from seismic FWI simulation, a widely used method for subsurface imaging in geophysics. It contains a total of 60,000 pairs of seismic data and velocity maps, with each pair representing a unique subsurface model.
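For reference, OpenFWI distributes its samples as NumPy arrays whose per-sample shapes match those used in the latency evaluation below (seismic data of shape (5, 1000, 70) and velocity maps of shape (1, 70, 70)). A minimal loading sketch follows; the file names and per-file batching here are illustrative assumptions, so consult the OpenFWI documentation for the actual layout.

```python
import numpy as np

# Illustrative loader for a CurveFault-A shard; the file names and
# per-file sample counts are assumptions, not the official layout.
seis = np.load("CurveFault_A/seis_1.npy")  # expected shape (N, 5, 1000, 70)
vel = np.load("CurveFault_A/vel_1.npy")    # expected shape (N, 1, 70, 70)

assert seis.shape[0] == vel.shape[0], "seismic/velocity pairs must align"
print(seis.shape, vel.shape)
```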
4. Platform
Raspberry Pi [1] boards are tiny, incredibly versatile computers that have been put to an increasing number of practical, fun, and diverse uses by hobbyists. This exceptional flexibility has only been increased over the years by manufacturers coming out with a plethora of add-ons like sensors, touchscreens, wireless connectivity modules, and purpose-built cases. This latest main-line generation from the Raspberry Pi Project includes the Raspberry Pi 4 with 8GB of onboard RAM.
The board has 2 \( \times \) micro-HDMI ports (up to 4Kp60 supported), a 2-lane MIPI DSI display port, a 2-lane MIPI CSI camera port, and a Micro-SD card slot for loading the operating system and data storage. Power is supplied at 5V DC via the USB-C connector or the GPIO header. For connectivity, it offers 2.4 GHz and 5.0 GHz IEEE 802.11ac wireless, Bluetooth 5.0, BLE, and Gigabit Ethernet.
5. Scoring
We will evaluate the submitted algorithms with a scoring metric that captures comprehensive performance from three perspectives: macro generation quality, pairwise quality, and latency. It is defined as follows:
For the macro perspective:

- FID Score: we will compute the FID (Fréchet Inception Distance) score [2] of the generated seismic data and velocity maps with respect to the ground-truth data. The FID score is a widely used metric for evaluating the quality of generated images; it measures the distance between the distributions of the generated and real data in feature space. It is expected to be as low as possible, indicating that the generated data is similar to the real data.
We will train an InceptionNet on the dataset and then compute the FID score of the generated seismic data and velocity maps with respect to the ground-truth data.
The normalized FID scores will be calculated as follows: $$ FID_v = 1 - \frac{F_{vel} - F_{vmin}}{F_{vmax} - F_{vmin}} $$ $$ FID_s = 1 - \frac{F_{seis} - F_{smin}}{F_{smax} - F_{smin}} $$ where \( F_{seis} \) and \( F_{vel} \) denote the FID scores of the generated seismic data and velocity maps, respectively, and \( FID_s \) and \( FID_v \) denote the corresponding normalized FID scores. \( F_{vmin} = 1 \), \( F_{vmax} = 20 \), \( F_{smin} = 1 \), and \( F_{smax} = 20 \).
If \( F_{seis} \) or \( F_{vel} \) is less than \(1\), the corresponding \(FID_s\) or \(FID_v\) will be set to \(1\); if \( F_{seis} \) or \( F_{vel} \) is greater than \(20\), the corresponding \(FID_s\) or \(FID_v\) will be set to \(0\).
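As a concrete illustration of the normalization and clamping above, the following minimal Python sketch maps a raw FID value to its normalized score; the raw FID computation itself (via the contest-trained InceptionNet) is not shown.

```python
import numpy as np

F_MIN, F_MAX = 1.0, 20.0  # normalization bounds from the rules above

def normalized_fid(fid_raw: float) -> float:
    """Map a raw FID to [0, 1]; lower raw FID yields a higher score.
    Values below F_MIN clamp to 1 and values above F_MAX clamp to 0."""
    score = 1.0 - (fid_raw - F_MIN) / (F_MAX - F_MIN)
    return float(np.clip(score, 0.0, 1.0))

print(normalized_fid(16.2055))  # ~0.1997, the demo's seismic FID_s
print(normalized_fid(15.8805))  # ~0.2168, the demo's velocity FID_v
```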
For the pairwise quality perspective:

- SSIM Score: we will compute the SSIM (Structural Similarity Index) score [3] by feeding the generated seismic data to a pre-trained InversionNet [4] to obtain the corresponding velocity map, and then computing the SSIM between that map and the generated velocity map. The SSIM score is a widely used metric for evaluating the quality of generated images; it measures the similarity between two images in terms of luminance, contrast, and structure. It is expected to be as high as possible, indicating that the pairwise quality of the generated seismic data and velocity map is good.
The SSIM score will be normalized by $$ SSIM = \frac{S - S_{min}}{S_{max} - S_{min}} $$ where \( S_{min} = 0.8 \) and \( S_{max} = 1 \).
If the SSIM score \( S \) is less than \(0.8\), \( SSIM \) will be set to \(0\).
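A minimal sketch of this normalization, using scikit-image's SSIM implementation, is shown below; `inversion_net` is a hypothetical handle to the pre-trained InversionNet [4], and the organizers' exact SSIM configuration (window size, data range) is not specified, so treat those choices as assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

S_MIN, S_MAX = 0.8, 1.0  # normalization bounds from the rules above

def normalized_ssim(vel_from_seis: np.ndarray, vel_generated: np.ndarray) -> float:
    """SSIM between the (70, 70) velocity map recovered by InversionNet
    from the generated seismic data and the generated velocity map,
    normalized to [0, 1]; scores below S_MIN clamp to 0."""
    s = structural_similarity(
        vel_from_seis, vel_generated,
        data_range=float(vel_generated.max() - vel_generated.min()),
    )
    return max(0.0, (s - S_MIN) / (S_MAX - S_MIN))

# Usage (hypothetical): vel_from_seis = inversion_net(generated_seismic)
```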
For the latency perspective:

- Inference Latency: we will run inference 10 times to get the average latency \( Lat \) (in seconds) on the Raspberry Pi 4 for one data sample generation (one seismic cube of shape (5, 1000, 70) and one velocity map of shape (1, 70, 70)).
The average inference latency \( Lat \) will be recorded, and the latency score will be normalized by $$ L = 1 - \frac{Lat - L_{min}}{L_{max} - L_{min}} $$ where \( L_{min} = 5\,s \) and \( L_{max} = 100\,s \). If the latency is less than \(5\,s\), \( L \) will be set to \(1\); if the latency is greater than \(100\,s\), \( L \) will be set to \(0\).
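The sketch below shows one way to reproduce this measurement and normalization locally; `generate_one_sample` is a hypothetical stand-in for your model's single-sample generation call.

```python
import time

L_MIN, L_MAX = 5.0, 100.0  # latency bounds (seconds) from the rules above

def average_latency(generate_one_sample, n_runs: int = 10) -> float:
    """Average wall-clock seconds over n_runs single-sample generations
    (one (5, 1000, 70) seismic cube plus one (1, 70, 70) velocity map)."""
    start = time.perf_counter()
    for _ in range(n_runs):
        generate_one_sample()
    return (time.perf_counter() - start) / n_runs

def latency_score(lat: float) -> float:
    """Normalize latency to [0, 1]; <= 5 s clamps to 1, >= 100 s to 0."""
    return min(1.0, max(0.0, 1.0 - (lat - L_MIN) / (L_MAX - L_MIN)))

print(latency_score(70.0))  # ~0.3158, matching the demo in Section 6
```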
The final score will be calculated as follows:
$$ Score = (\frac{1}{3} \times (\frac{1}{2} \times (FID_{s} + FID_{v})) + \frac{1}{3} \times SSIM + \frac{1}{3} \times L) \times 100 $$
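Putting the pieces together, here is a one-function sketch of the final score, checked against the demo numbers reported in Section 6:

```python
def final_score(fid_s: float, fid_v: float, ssim: float, lat_score: float) -> float:
    """Equal-weight combination of the three normalized perspectives,
    scaled to 0-100 as defined above."""
    return ((fid_s + fid_v) / 2 + ssim + lat_score) / 3 * 100

# Demo values from Section 6 reproduce the reported score of ~31.26:
print(final_score(0.1997, 0.2168, 0.4137, 0.3158))
```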
6. Demo
A demo implementation of the multi-modal generative AI algorithm can be found here. Please find the README file for more details, and read the "Important Note" section for implementation requirements.
Due to time constraints, we encourage all teams to focus on optimizing their machine learning algorithms. Therefore, only Python-based solutions will be accepted. The Raspberry Pi 4 running Ubuntu 23.04 (64-bit) with an aarch64 architecture will be used for latency evaluation. Please ensure that your solution is compatible with this environment.
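As a quick sanity check before submission, the standard-library snippet below prints the interpreter and architecture details to compare against the evaluation environment (64-bit aarch64 Ubuntu 23.04):

```python
import platform
import sys

# Compare against the contest target: aarch64 Linux, 64-bit Python.
print("machine:", platform.machine())    # expect 'aarch64' on the Pi 4
print("system:", platform.system())      # expect 'Linux'
print("python:", sys.version.split()[0])
```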
This demo is implemented based on "A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset" [5].
The score of this demo is calculated as follows:
- \( F_{seis} = 16.2055 \) and \( F_{vel} = 15.8805 \), and thus \( FID_{s} = 0.1997 \) and \( FID_{v} = 0.2168 \).
- \( S = 0.8827 \), and thus \( SSIM = 0.4137 \).
- \( Lat = 70\,s \), and thus \( L = 0.3158 \).
- The final score is calculated as $$ Score = (\frac{1}{3} \times (\frac{1}{2} \times (0.1997 + 0.2168)) + \frac{1}{3} \times 0.4137 + \frac{1}{3} \times 0.3158) \times 100 = 31.26 $$
7. References
[1] Platform. https://www.raspberrypi.com/for-home/.
[2] Heusel, Martin, et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium." Advances in Neural Information Processing Systems 30 (2017).
[3] Wang, Zhou, et al. "Image quality assessment: from error visibility to structural similarity." IEEE transactions on image processing 13.4 (2004): 600-612.
[4] Wu, Yue, and Youzuo Lin. "InversionNet: An efficient and accurate data-driven full waveform inversion." IEEE Transactions on Computational Imaging 6 (2019): 419-433.
[5] Yang, Junhuan, et al. "A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset." Proceedings of the AAAI Conference on Artificial Intelligence 2025.