
Figures of the Article
-
The flowcharts of the major steps in the reference parallel CPU codes (left) and those in our CPU-GPU hybrid implementation (right). The whole calculation can be separated into three sections: pre-processing, time-stepping, and post-processing. The pre-processing section reads and calculates all the data that the time-stepping section will use. The time-stepping section updates the DOFs of each tetrahedral element according to Eqs. (8)-(10) and has been ported to the GPU. The post-processing section is in charge of writing out the DOFs and/or the seismograms at the pre-specified locations
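The division of labor in this flowchart can be summarized in a short host-side sketch. This is a minimal, hypothetical outline (all names and sizes below are illustrative, not the article's code): pre-processing and post-processing stay on the CPU, and only the time-stepping loop is offloaded to the GPU.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical kernel standing in for one update of all DOFs per time step
// (the real update follows Eqs. (8)-(10) of the article).
__global__ void update_dofs(float* dof, int nTotal, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nTotal) dof[i] += dt * 0.0f;  // placeholder update
}

int main() {
    const int nElem = 800, nDegFr = 20, nVar = 9;  // illustrative sizes
    const int nTotal = nElem * nDegFr * nVar;

    // Pre-processing (CPU): read the mesh and material data and build
    // everything the time-stepping section will use.
    std::vector<float> h_dof(nTotal, 0.0f);

    // Upload once, so the time-stepping loop never touches host memory.
    float* d_dof = nullptr;
    cudaMalloc(&d_dof, nTotal * sizeof(float));
    cudaMemcpy(d_dof, h_dof.data(), nTotal * sizeof(float),
               cudaMemcpyHostToDevice);

    // Time-stepping (GPU): the only section ported to CUDA.
    const int threads = 256, blocks = (nTotal + threads - 1) / threads;
    const float dt = 1.0e-3f;
    for (int step = 0; step < 100; ++step)
        update_dofs<<<blocks, threads>>>(d_dof, nTotal, dt);

    // Post-processing (CPU): copy the DOFs back and write them out,
    // e.g. as seismograms at the pre-specified receiver locations.
    cudaMemcpy(h_dof.data(), d_dof, nTotal * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("first DOF after stepping: %f\n", h_dof[0]);
    cudaFree(d_dof);
    return 0;
}
```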
-
a The flowchart of the calculations in step (2), the volume contributions. "dudt" is the volume contribution, and "Kxi," "Keta," and "Kzeta" correspond to the stiffness matrices K_kl^ξ, K_kl^η, K_kl^ζ in the text. "JacobianDet" is the determinant of the Jacobian |J|. "nElem" is the number of tetrahedral elements in the subdomain, "nDegFr" is the number of DOFs per component per tetrahedral element, and "nVar" is the number of components in the governing equation. "AStar," "BStar," and "CStar" correspond to A*, B*, C* in the text. Code segments for the calculations in the dark-gray box are listed in b and c. b Baseline implementation of the CUDA kernel for the "left-multiplication" between the time-integrated DOF and the stiffness matrix K_kl^ξ. "Kxi_dense" corresponds to the dense matrix representation of K_kl^ξ, "dgwork" corresponds to the time-integrated DOF, and the result of the multiplication is stored in "temp_DOF." c A segment of the optimized CUDA kernel for the "left-multiplication" between the time-integrated DOF and the stiffness matrix K_kl^ξ. "Kxi_sparse" corresponds to the sparse matrix representation of K_kl^ξ. Meanings of other symbols are identical to those in b
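To make the difference between panels b and c concrete, here is a minimal sketch of both kernels. The data layouts are assumptions (row-major nDegFr × nVar DOFs per element, CSR storage for the sparse matrix); the article's actual kernels may index differently.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

#define NDEGFR 20  // assumed "nDegFr": DOFs per component per element
#define NVAR    9  // assumed "nVar": components of the governing equation

// Panel b (baseline, dense): per element, temp_DOF = Kxi_dense * dgwork,
// where Kxi_dense is nDegFr x nDegFr and dgwork is nDegFr x nVar.
__global__ void left_mult_dense(const float* Kxi_dense, const float* dgwork,
                                float* temp_DOF, int nElem) {
    int elem = blockIdx.x;   // one thread block per tetrahedral element
    int i = threadIdx.x;     // row of the stiffness matrix
    int v = threadIdx.y;     // component index
    if (elem >= nElem) return;
    const float* q = dgwork + (size_t)elem * NDEGFR * NVAR;
    float acc = 0.0f;
    for (int j = 0; j < NDEGFR; ++j)  // visits every entry, zeros included
        acc += Kxi_dense[i * NDEGFR + j] * q[j * NVAR + v];
    temp_DOF[(size_t)elem * NDEGFR * NVAR + i * NVAR + v] = acc;
}

// Panel c (optimized, sparse): the same product, but the inner loop visits
// only the stored nonzeros of the stiffness matrix (CSR layout assumed; the
// caption does not specify the actual "Kxi_sparse" format).
__global__ void left_mult_sparse(const int* rowPtr, const int* colIdx,
                                 const float* val, const float* dgwork,
                                 float* temp_DOF, int nElem) {
    int elem = blockIdx.x;
    int i = threadIdx.x;
    int v = threadIdx.y;
    if (elem >= nElem) return;
    const float* q = dgwork + (size_t)elem * NDEGFR * NVAR;
    float acc = 0.0f;
    for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p)  // nonzeros of row i only
        acc += val[p] * q[colIdx[p] * NVAR + v];
    temp_DOF[(size_t)elem * NDEGFR * NVAR + i * NVAR + v] = acc;
}

int main() {
    // Toy launch of the dense kernel; the sparse kernel uses the same
    // <<<nElem, dim3(NDEGFR, NVAR)>>> configuration with CSR arrays.
    const int nElem = 4;
    std::vector<float> K(NDEGFR * NDEGFR, 0.0f);
    for (int i = 0; i < NDEGFR; ++i) K[i * NDEGFR + i] = 2.0f;  // toy Kxi = 2I
    std::vector<float> q(nElem * NDEGFR * NVAR, 1.0f);
    float *dK, *dq, *dout;
    cudaMalloc(&dK, K.size() * sizeof(float));
    cudaMalloc(&dq, q.size() * sizeof(float));
    cudaMalloc(&dout, q.size() * sizeof(float));
    cudaMemcpy(dK, K.data(), K.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dq, q.data(), q.size() * sizeof(float), cudaMemcpyHostToDevice);
    left_mult_dense<<<nElem, dim3(NDEGFR, NVAR)>>>(dK, dq, dout, nElem);
    std::vector<float> out(q.size());
    cudaMemcpy(out.data(), dout, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("out[0] = %f (expect 2.0)\n", out[0]);  // 2I times all-ones DOFs
    cudaFree(dK); cudaFree(dq); cudaFree(dout);
    return 0;
}
```

The payoff of the sparse variant is that the stiffness matrices contain many zero entries (which is why a sparse representation exists at all), so skipping them trims both arithmetic and global-memory loads.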
-
Single-GPU speedup factors obtained using 7 different meshes and 4 different CPU core counts. The total numbers of tetrahedral elements in the 7 meshes are 3,799, 6,899, 12,547, 15,764, 21,121, 24,606, and 29,335, respectively. The speedup factors were obtained by running the same calculation using our CPU-GPU hybrid code with 1 GPU and using the serial/parallel "SeisSol" CPU code on 1/2/4/8 CPU cores of the same compute node. The black columns represent the speedup of the CPU-GPU hybrid code relative to 1 CPU core, the dark-gray columns represent the speedup relative to 2 CPU cores, the light-gray columns represent the speedup relative to 4 CPU cores, and the lightest-gray columns represent the speedup relative to 8 CPU cores
-
Strong scalability of our single-GPU code with different mesh sizes. The black squares show the average wall-time per 100 time steps, and the number of elements varies from 800 to 57,920
-
a A perspective view of the 3D geometry of the discretized salt body in the SEG/EAGE salt model. b A two-dimensional cross-section view of the SEG/EAGE salt model along the A-A′ profile (Aminzadeh et al. 1997). The material properties for the different geological structures are listed in Table 1
-
Speedup factors of our parallel GPU code obtained using two different mesh sizes. The numbers of tetrahedral elements in the two meshes are 327,866 and 935,870. The speedup factors were computed for our single-precision multi-GPU code relative to the CPU code running on 16/32/48/64 cores distributed over different nodes
-
Strong scalability of our multi-GPU code with 1.92 million elements. The black line shows the average wall-time per 100 time steps for this fixed-size problem run on 32-64 GPUs
-
Weak scalability of our multi-GPU code run on 2-80 GPUs. The black line shows the average wall-time per 100 time steps for these varied-size problems. The average number of elements per GPU is around 53,000, with about 6% fluctuation
-
a A perspective view of the 3D Marmousi2 model with dimensions of 3,500 m in depth, 17,000 m in length, and 7,000 m in width. There are 379,039 tetrahedral elements, and each element has its own material property. There are 337 receivers located 5 m beneath the surface along the A-A′ (yellow) line, with a horizontal interval of 50 m; the explosive source is located at 10.0 m (depth), 7,500.0 m (length), 3,500.0 m (width). b The shot gather of the Marmousi2 model, computed by our multi-GPU code. Our CUDA code running on 16 GPUs spends 4,812.64 s computing a 5 s seismogram, while SeisSol running on 16 CPU cores needs 135,278.50 s, a speedup of 28.11×. (Color figure online)
-
a A perspective view of the simplified SEG/EAGE salt model with dimensions of 4,200 m in depth, 13,500 m in length, and 13,500 m in width. There are 447,624 tetrahedral elements, and each element has its own material property. There are 192 receivers located 5 m beneath the surface along the A-A′ line, with a horizontal interval of 50 m; the explosive source is located at 10.0 m (depth), 7,060.0 m (length), 4,740.0 m (width). b The shot gather of the SEG/EAGE salt model, computed by our multi-GPU code. Our CUDA code running on 16 GPUs spends 7,938.56 s computing a 7 s seismogram, while SeisSol running on 16 CPU cores needs 224,589.80 s, a speedup of 28.29×
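For reference, the speedup factors quoted in the last two captions are just the ratios of the reported CPU and GPU wall times:

```latex
\frac{t_{\mathrm{CPU}}}{t_{\mathrm{GPU}}}
  = \frac{135278.50\ \mathrm{s}}{4812.64\ \mathrm{s}} \approx 28.11
\qquad\text{and}\qquad
\frac{224589.80\ \mathrm{s}}{7938.56\ \mathrm{s}} \approx 28.29
```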