
Citation: Weitao Wang, Baoshan Wang, Xiufen Zheng (2018). Public cloud computing for seismological research: Calculating large-scale noise cross-correlations using ALIYUN. Earthq Sci 31(5-6): 227-233. doi: 10.29382/eqs-2018-0227-2
Seismic networks with long operational times (Large-T) and/or dense stations (Large-N) provide massive data in both time and space for seismological studies. Seismology is thus entering the era of big data. For instance, the continuous waveform records in China, from both permanent and mobile stations (Figure 1), have exceeded 800 TB and are continuously increasing. These massive data provide an unprecedented opportunity to investigate the structure and evolution of the subsurface, while requiring higher computational power for data processing. Modern seismology is therefore deeply involved with advanced computing techniques, such as cloud computing and GPU computing (Zinno et al., 2015; Magana-Zook et al., 2016; Zeng and Thurber, 2016; Fichtner et al., 2017).
During the past decade, noise cross-correlation has emerged as a technique that turns random seismic noise into deterministic signals (Lobkis and Weaver, 2001; Shapiro et al., 2005). By cross-correlating the continuous records of two seismic stations, the impulse response between them can be reconstructed as though one of them were a source. One noise cross-correlation function (NCF) is obtained between any two stations, providing dense path coverage for seismic tomography. Moreover, the NCFs between fixed stations can be used to monitor time-lapse velocity variations (Brenguier et al., 2008; Liu et al., 2014). NCFs are therefore used extensively to study the structure and evolution of the Earth (e.g. Campillo and Roux, 2015, and references therein).
Ambient noise cross-correlation relies on continuous seismic records; it therefore benefits the most from large data volumes and suffers the most from computational challenges. The scale of NCF calculation is controlled by the number of inter-station paths and the duration of the records. The number of paths grows with the square of the number of stations (i.e. K×(K−1)/2 paths for K stations). NCFs are typically calculated daily and stacked over a time duration T to obtain stable results. The total computation cost and storage requirement are therefore proportional to K²T, further modulated by the input sampling rate, which is usually one sample per second for most studies.
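As a quick, hedged illustration of this scaling (a minimal Python sketch; the station counts and durations below are arbitrary examples, not datasets from this study):

```python
def ncf_scale(k_stations, t_days):
    """Number of inter-station paths and daily NCFs for K stations over T days."""
    paths = k_stations * (k_stations - 1) // 2   # every unordered station pair
    daily_ncfs = paths * t_days                  # one NCF per pair per day, ~K^2 * T
    return paths, daily_ncfs

# Doubling the number of stations roughly quadruples the computation:
print(ncf_scale(500, 365))    # (124750, 45533750)
print(ncf_scale(1000, 365))   # (499500, 182317500)
```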
With the development of NCF-based studies, several factors further raise the requirement for computation power: large numbers of stations, the calculation of nine-component tensor NCFs, growing interest in high-frequency NCFs, and the development of computation-intensive cross-correlation algorithms (Schimmel et al., 2011; Savage et al., 2013; Lin et al., 2014).
The number of stations and the duration of their records are the most important factors driving the load of NCF calculation. Both the time cost and the storage requirement are high when long records from a large number of stations are used. For example, calculating the Z-Z component NCFs using five years of records from 970 permanent stations (Figure 1) would produce 858 million daily NCFs for 469,965 paths (Table 1). This requires 25 TB to store the daily NCFs and would take ~2,290 hours (about three months) on a single server. For tensor NCFs (i.e. nine components with full E/N/Z combinations) from two years of records, the numbers increase to about three billion daily NCFs and 92 TB of storage, and the calculation would take nearly one year to finish (Table 1). Such time costs are unaffordable for building NCF databases for further studies, and the storage requirement is heavy even for a small cluster. New facilities and methods are necessary to accelerate the calculation.
Table 1. Configurations of the five NCF databases and their estimated and actual computation costs.

| Dataset | NSTA | NPATH | NDAYS | Type | Inputs (number / size) | Daily NCFs (number / size) | ESTTIME (h) | NVS | BCSTIME (h) |
|---|---|---|---|---|---|---|---|---|---|
| CN | 970 | 469,965 | 1826 | Z-Z | 1,771,220 / 1.22 TB | 858,156,090 / 25 TB | 2,290 | 300 | 10.5 |
| CN | 970 | 469,965 | 730 | Tensor | 2,124,300 / 1.46 TB | 3,087,670,050 / 92 TB | 8,233 | 960 | 11.2 |
| X1 | 350 | 61,075 | 945 | Tensor | 992,250 / 700 GB | 519,442,875 / 15 TB | 1,385 | 186 | 9.8 |
| X2 | 674 | 226,801 | 730 | Tensor | 1,476,060 / 1.0 TB | 1,490,082,570 / 44 TB | 3,973 | 480 | 10.2 |
| X1GSN | 712 | 120,012 | 945 | Z-Z | 589,680 / 415 GB | 113,411,340 / 3.4 TB | 302 | 31 | 10.3 |

Note: 1. The size of a single daily input is 740 KB and the size of a 2-hour-long daily NCF is 32 KB. 2. Calculating daily NCFs for 100,000 paths over 30 days requires approximately 8 hours on a Linux server (Intel Xeon E5-2620 V3 ×2, 64 GB RAM); this benchmark was used for the single-server time estimates. 3. Abbreviations: CN = ChinaNet, NSTA = number of stations, NPATH = number of paths, NDAYS = number of days, ESTTIME = estimated single-server time, NVS = number of evoked virtual servers, BCSTIME = time cost of the BCS system.
Cloud computing has become an infrastructure for both commercial and scientific applications (Hashem et al., 2015). It is well suited to processing massive data and already has several applications in seismology. The Hadoop framework has been used to compute quality metrics for large-scale seismic waveforms (Magana-Zook et al., 2016), and cloud computing has been used for data-intensive tasks on both seismic and InSAR data (Heilmann et al., 2013; Zinno et al., 2015). Besides cloud systems built on local clusters, public cloud services have also been explored (Chen et al., 2013, 2016). Compared with local implementations, public cloud services provide scalable computing and storage at low cost, so it is worthwhile to explore their applications in processing large-scale seismic data.
In this study, we propose a framework for large-scale NCF calculation using public cloud services from ALIYUN. Using this framework, five NCF databases were built, and the results show that, compared with a single-server approach, the computation can be sped up by over two orders of magnitude depending on the configuration. In the following sections, the design and performance of the cloud-based system are introduced, and preliminary results on retrieving global body wave phases from massive NCFs are demonstrated.
To meet the requirements of large-scale NCF calculation, the Object Storage Service (OSS) and Batch Computing Service (BCS) from ALIYUN are used for data storage and computation, respectively. The OSS provides distributed storage for input and output files. The BCS can provide hundreds to thousands of virtual servers (VSs) on demand, offering scalable computation power for calculating and stacking millions of daily NCFs.
The flowchart of the framework is shown in Figure 2. It consists of a local-server part and a cloud end. Configuration and data pre-processing are performed on the local server, and the calculation is conducted on the BCS. Several utilities are provided to factorize the entire computation for parallelization, as described in the following.
Data pre-processing was performed on the local server following Bensen et al. (2007). For each component, the daily 100-Hz sampled seismograms were first decimated to 1 Hz, and the mean, trend and instrument response were removed. The decimated seismograms were then band-pass filtered between 15 and 50 s and smoothed with a 120-s running-absolute-mean window (Bensen et al., 2007) to build weight functions that highlight surface waves from earthquakes. The decimated seismograms were divided by these weight functions as temporal normalization. The normalized seismograms were Fourier transformed and whitened between 0.005 and 0.45 Hz by dividing by the smoothed spectral amplitudes. For Z-Z component NCFs, only the vertical records were processed. For tensor NCFs, uniform functions were used for each day for both temporal normalization and spectral whitening: the uniform temporal weight function was built by taking the maximum of the three-component weight functions at each time sample, and the spectral weight function was constructed from the three-component maximum amplitude at each frequency. All three components were then normalized simultaneously using these uniform weight functions. This normalization scheme preserves the relative amplitude ratios in tensor NCFs (Lin et al., 2014; Ma et al., 2016).
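A minimal sketch of this pre-processing chain is given below, assuming 1-Hz daily traces after decimation; the function names, the two-stage decimation and the spectrum-smoothing width are illustrative choices, not the authors' actual code.

```python
import numpy as np
from scipy.signal import butter, decimate, detrend, filtfilt

def running_abs_mean(x, win_samples):
    # Running mean of |x|, used as a smoothing/normalization weight.
    kernel = np.ones(win_samples) / win_samples
    return np.convolve(np.abs(x), kernel, mode="same")

def preprocess_daily(trace_100hz, dt_out=1.0):
    # 1. Decimate 100 Hz -> 1 Hz in two stages (keeps the anti-alias filter stable).
    x = decimate(decimate(trace_100hz, 10), 10)
    # 2. Remove mean and linear trend (instrument response removal not shown).
    x = detrend(x, type="linear")
    x = x - np.mean(x)
    # 3. Band-passed copy (15-50 s) used only to build the temporal weight.
    b, a = butter(2, [1 / 50.0, 1 / 15.0], btype="band", fs=1.0 / dt_out)
    xf = filtfilt(b, a, x)
    # 4. Temporal normalization: divide by the 120-s running absolute mean,
    #    which down-weights earthquake surface waves.
    w = running_abs_mean(xf, int(120 / dt_out))
    w[w == 0] = np.finfo(float).tiny
    xn = x / w
    # 5. Spectral whitening between 0.005 and 0.45 Hz.
    spec = np.fft.rfft(xn)
    freq = np.fft.rfftfreq(xn.size, d=dt_out)
    smooth_amp = running_abs_mean(np.abs(spec), 20)   # smoothed amplitude spectrum
    band = (freq >= 0.005) & (freq <= 0.45)
    white = np.zeros_like(spec)
    white[band] = spec[band] / np.maximum(smooth_amp[band], np.finfo(float).tiny)
    return white                                      # whitened daily spectrum
```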
After processing on the local server, the daily spectra were uploaded to the OSS. The cross-correlations were then computed in the frequency domain by multiplying the spectrum of one station with the complex conjugate of the spectrum of the other. Using pre-calculated spectra instead of raw seismograms is preferable for a large-scale dataset, as it avoids repeatedly computing the spectrum of the same station. After uploading the spectra to the OSS, the calculation was performed on the BCS, with the OSS acting as the storage pool for both input and output.
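For reference, the frequency-domain correlation of two daily spectra can be sketched as follows (a hedged example, assuming whitened spectra produced by np.fft.rfft of 86,400-sample daily traces; circular edge effects are ignored here):

```python
import numpy as np

def daily_ncf(spec_a, spec_b, dt=1.0, max_lag=3600.0):
    """Daily NCF of stations A and B from their whitened daily spectra.

    The time-domain cross-correlation is the inverse FFT of conj(S_A) * S_B
    (computed here as a circular correlation).
    """
    cc = np.fft.irfft(np.conj(spec_a) * spec_b)
    cc = np.fft.fftshift(cc)               # put zero lag at the center
    mid = cc.size // 2
    lag = int(max_lag / dt)
    return cc[mid - lag: mid + lag + 1]    # keep lags within +-3,600 s
```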
The basic principle of accelerating the NCF calculation is to factorize the entire computation into small pieces and run each piece on one VS evoked by the BCS. The smallest element is the calculation of one daily NCF from two daily spectra; the daily NCFs are subsequently stacked to obtain reference NCFs. The number of paths and the number of processing days are thus the two key parameters for parallelization. To factorize the whole computation, the entire set of paths is split into small path groups and the entire time duration is divided into small time segments, as shown in Figure 3. Suppose the number of groups is M and the number of time segments is N; then M tasks are run on the BCS, with N virtual servers evoked for each task. Each task calculates the NCFs for its path group Gm, and each of the N evoked servers for that task handles one time segment Tn. The daily NCFs associated with group Gm and segment Tn are calculated and stacked over Tn on one VS. In this way, the entire calculation is easily parallelized across M×N VSs.
For a given station profile, the paths are generated and split on the local server. The path groups, the entire time duration, and the desired number of VSs are then sent to the BCS to evoke the appropriate VSs. These parameters are handled by the dispatcher on the BCS (Figure 2). Each evoked VS has a unique index, which is used to obtain its corresponding time segment Tn. During the computation, each VS acquires the data for its own Gm and Tn and transfers them to its local disk for best performance. The commands to calculate and stack the daily NCFs for Gm and Tn are generated and executed on each VS. The stacked NCFs are archived and transferred back to the OSS, ready for subsequent processing such as dispersion measurement and waveform analysis.
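The bookkeeping behind this factorization can be sketched as follows; the function and field names are hypothetical, and the actual BCS job description and OSS transfers are omitted.

```python
from itertools import combinations

def make_work_units(stations, days, n_groups, n_segments):
    """Split the K*(K-1)/2 paths into M groups and the days into N segments.

    Each (group, segment) cell is an independent work unit: one evoked VS
    calculates the daily NCFs for its paths over its days and stacks them.
    """
    paths = list(combinations(sorted(stations), 2))
    group_size = -(-len(paths) // n_groups)      # ceil division
    seg_size = -(-len(days) // n_segments)
    units = []
    for m in range(n_groups):                    # task index -> path group G_m
        g = paths[m * group_size:(m + 1) * group_size]
        for n in range(n_segments):              # VS index within task -> segment T_n
            t = days[n * seg_size:(n + 1) * seg_size]
            units.append({"task": m, "vs": n, "paths": g, "days": t})
    return units                                 # M x N independent pieces
```

Each VS then pulls the spectra for its own paths and days from the OSS, computes and stacks the daily NCFs, and writes the stacked result back.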
To evaluate the system performance, we analyzed the average time cost of each main step, i.e. data transfer, daily NCF calculation, stacking, and archiving. Moreover, to evaluate the scalability, we built five NCF databases with different configurations covering various numbers of stations, time durations and component combinations (i.e. Z-Z or tensor NCFs). Details of these configurations are listed in Table 1.
Because the scale of the computation is proportional to the number of daily NCFs, we used a subset of Z-Z NCFs to estimate the total time and output storage requirement. The evaluation was conducted on one Linux server (Intel Xeon E5-2620 V3 ×2, 64 GB RAM): the two-hour-long (−3,600 s to 3,600 s) Z-Z daily NCFs were calculated for 100,000 paths and 30 days, which took about 8 hours. This benchmark is used as the basis to estimate the storage requirement and time cost of a single server.
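The single-server estimates in Table 1 follow directly from this benchmark; for example, for the CN Z-Z case (a sketch of the arithmetic only):

```python
bench_ncfs, bench_hours = 100_000 * 30, 8.0        # benchmark: 3,000,000 daily NCFs in 8 h
paths, days = 469_965, 1826                        # CN Z-Z case in Table 1
daily_ncfs = paths * days                          # 858,156,090 daily NCFs
est_hours = daily_ncfs / bench_ncfs * bench_hours  # ~2,288 h, i.e. ~3 months (ESTTIME ~2,290 h)
storage_tb = daily_ncfs * 32 / 1024**3             # 32 KB per daily NCF -> ~25.6 TB
```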
The calculation of Z-Z NCFs using five years (2011–2015) of continuous records from 970 permanent stations was used to evaluate the time cost of each processing step on the BCS. We first split the 469,965 paths into five path groups and then evoked five tasks with 60 VSs each. In total, 300 VSs were used, and each server calculated and stacked the daily NCFs for about 90,000 paths over 30 days. The average time costs of the main steps are shown in Figure 4. Calculating the daily NCFs is clearly the most time-consuming step, while data archiving and transfer also contribute to the total computation time. Overall, each task finished within 10.5 hours after the virtual servers were evoked.
The BCS provides scalable computation power suitable for different calculation scales. Generally speaking, for a given set of paths and time duration, the time cost is inversely proportional to the number of evoked VSs, ignoring the time spent transferring data to and from the OSS. We carried out the calculation of the five NCF databases listed in Table 1 to demonstrate this scalability. The number of evoked VSs was chosen to scale with the number of daily NCFs: each VS calculated and stacked about 3,000,000 daily NCFs, and enough VSs were evoked to match the scale of the entire computation. The results show that, with this configuration, the computations finished in less than 12 hours after the VSs were evoked (Figure 5). The average time cost is stable across different scales, demonstrating the effectiveness of the factorization scheme and the scalable performance of the BCS.
The NCF is a data product derived from continuous seismic recordings and, at the same time, a valuable data resource for further research. Cloud computing offers a rapid way to calculate large-scale NCFs, providing the necessary data for studies that rely on massive numbers of NCFs.
Using massive NCFs from a dense array, array interferometry can be applied to retrieve body waves, which are usually weak in NCFs. In array interferometry, NCFs with similar inter-station distances are stacked into one trace to enhance the weak body waves. The technique has been used to extract Earth's core phases and global body waves (Lin et al., 2013; Nishida, 2013; Boué et al., 2013), and the retrieved body waves have been used to investigate the Earth's discontinuities (Poli et al., 2012; Feng et al., 2017).
ChinArray Phase I (blue triangles, Figure 1) was deployed in southwest China during 2011–2013. These 350 mobile stations, together with 88 permanent broadband stations in the region, form a dense large-aperture array with an average spacing of less than 50 km. We calculated the Z-Z NCFs between these 438 stations and 274 GSN broadband stations and applied array interferometry to retrieve global body waves. In total, about 113 million daily NCFs for 120,012 paths were calculated and stacked using 31 VSs (Table 1). The stacked NCFs with similar inter-station distances were further bin-stacked to enhance signals arriving later than the surface waves. The resulting waveforms, stacked in 50-km bins and band-pass filtered in the 20–100 s period band, are shown in Figure 6a. Clear signals corresponding to global body waves are observed and correlate well with the theoretical travel times (Figure 6b) from the PREM model (Dziewonski and Anderson, 1981). Figure 6c shows the global stacking image from earthquakes (IRIS DMC, 2014); the phases retrieved from the massive number of NCFs agree well with those from the earthquakes. Although the coda waves and reverberations of major earthquakes have been proposed as the sources of deep body waves in NCFs, the mechanism and precision of retrieving these phases remain under investigation (Lin et al., 2013; Boué et al., 2013; Snieder and Sens-Schönfelder, 2015; Sens-Schönfelder et al., 2015; Wu et al., 2018), which is beyond the scope of the present study. Nevertheless, our result shows that the cloud-based solution provides an efficient way to calculate massive NCFs, which can serve to evaluate different algorithms for retrieving deep body waves from NCFs. Tensor NCFs at the global scale could also be calculated with the current cloud approach and are left for future work.
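A minimal sketch of the distance-binned stacking used here (assuming the stacked NCFs and their inter-station distances are already in memory; array names are illustrative):

```python
import numpy as np

def bin_stack(ncfs, dists_km, bin_km=50.0):
    """Average NCFs that fall into the same inter-station distance bin."""
    ncfs = np.asarray(ncfs)                    # shape (n_paths, n_lags)
    dists_km = np.asarray(dists_km)
    edges = np.arange(0.0, dists_km.max() + bin_km, bin_km)
    idx = np.digitize(dists_km, edges)         # bin index (>= 1) for each path
    centers, stacks = [], []
    for b in np.unique(idx):
        sel = idx == b
        centers.append(edges[b - 1] + bin_km / 2.0)
        stacks.append(ncfs[sel].mean(axis=0))  # linear stack within the bin
    return np.array(centers), np.vstack(stacks)
```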
The rapid increase of seismological observations requires high-performance computing to process massive data in parallel. Traditionally, parallelization is achieved by developing multi-core algorithms supported by OpenMP or MPI libraries; special treatment of the parallel algorithms and fine-tuning of parameters are usually necessary to obtain the best performance on clusters. Certain applications can also be accelerated using modern GPU approaches (Zeng and Thurber, 2016), but the data transfer between the host server and the GPU needs ad-hoc adjustment to speed up the computation, especially for data-intensive tasks. A simpler solution, requiring none of these special treatments, exists when the computational scale is mainly controlled by the scale of the input and output data, as in the calculation of large-scale NCFs: the problem can be parallelized at a higher level. By factorizing the entire computation into small pieces and running each piece on an adequate number of computers provided by cloud services, the time cost is reduced in a fast and easy way. Such a divide-and-conquer scheme requires no extra effort to revise the original algorithm and can be migrated to public cloud computing easily. For this reason, we do not compare the time cost of our system with that of a local cluster, as they belong to different parallelization schemes yet achieve similar acceleration. Moreover, the VSs provided by cloud services can also offer GPU or multi-processing environments, so the traditional parallelization methods and the cloud approach can be implemented jointly.
One feature of the BCS is that each VS can execute binary codes or scripts stored in the OSS. Once these are uploaded to the OSS, every evoked VS has the same processing environment. This feature provides great flexibility for evaluating different algorithms by simply replacing the corresponding executables in the OSS. It also provides a way to migrate applications other than cross-correlation to cloud-based implementations, provided the problem can be factorized for parallel execution.
For an NCF calculation with a given configuration, the time cost is inversely proportional to the number of evoked VSs. In our study, each VS was assigned the calculation of 100,000 paths for 30 days, which finishes in less than 12 hours (Table 1). We increased the number of VSs to scale with the number of daily NCFs for heavier computations. As shown in Figure 5, the ratio between the number of VSs and the number of daily NCFs is almost constant. The time cost could be further reduced by evoking more VSs with the scalable cloud computing services.
A limitation of the current design is that all inputs and outputs are stored in the OSS, so extra time is needed to upload and download data between the local servers and the OSS, as indicated in Figure 4 and Table 1. The dominant storage requirement comes from the daily NCFs, whose number is proportional to the number of processing days and to the square of the number of stations. Usually only the stacked NCFs are needed for further processing, and their size is reduced by a factor equal to the stacking duration. For instance, the size of the five-year stacked Z-Z NCFs would be 13 GB rather than 25 TB after stacking over 1,826 days, and 416 GB for monthly stacked NCFs (Table 1); these are easy to download to a local server. Moreover, as discussed above, the further processing can also be migrated to the cloud end easily, so that both the NCF calculation and the post-processing can be conducted in the cloud without data exchange.
In summary, we present a framework to efficiently calculate large-scale NCFs using public cloud computing. Five NCF databases were built using years of continuous records from most seismic stations in China. The results show that the time cost of producing these databases is reduced from months on a single server to less than 12 hours, i.e. the calculation is sped up by at least two orders of magnitude, depending on the number of evoked VSs. Modern seismology is facing a rapid increase in observations, and applying advanced computing techniques to traditional research helps in utilizing these massive seismological datasets.
First and foremost, the authors wish to thank the numerous technicians and scientists, in China and around the world, who have contributed time and effort to the seismic data used in this research. The waveform data of the permanent stations were provided by the Data Management Center of the China National Seismic Network at the Institute of Geophysics (SEISDMC, doi: 10.11998/SeisDmc/SN; Zheng et al., 2010), and the ChinArray data were provided by the China Seismic Array Data Management Center at the Institute of Geophysics, China Earthquake Administration (doi: 10.12001/ChinArray.Data; ChinArray, 2006). The waveforms of the GSN broadband stations were provided by the IRIS DMC. This research was supported by the National Key R&D Program of China (No. 2018YFC1503200) and the National Natural Science Foundation of China (Nos. 41674061, 41790463 and 41674058). The authors thank the two anonymous reviewers for their suggestions, which improved the manuscript substantially. Many thanks also go to editor Lin Li for her kind help.
Bensen GD, Ritzwoller MH, Barmin MP, Levshin AL, Lin F, Moschetti MP, Shapiro NM and Yang Y (2007) Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements. Geophys J Int 169(3): 1239–1260. doi: 10.1111/gji.2007.169.issue-3
Boué P, Poli P, Campillo M, Pedersen H, Briand X and Roux P (2013) Teleseismic correlations of ambient seismic noise for deep global imaging of the Earth. Geophys J Int 194: 844–848. doi: 10.1093/gji/ggt160
Brenguier F, Campillo M, Hadziioannou C, Shapiro NM, Nadeau RM and Larose E (2008) Postseismic relaxation along the San Andreas Fault at Parkfield from continuous seismological observations. Science 321(5895): 1478–1481. doi: 10.1126/science.1160943
Campillo M and Roux P (2015) Crust and lithospheric structure: Seismic imaging and monitoring with ambient noise correlations. In: Schubert G (ed) Treatise on Geophysics, Vol. 1, Elsevier, Oxford, United Kingdom, pp 391–417
Chen P, Lee EJ and Wang L (2013) A cloud-based synthetic seismogram generator implemented using Windows Azure. Earthq Sci 26(5): 321–329. doi: 10.1007/s11589-013-0038-8
Chen P, Taylor NJ, Dueker KG, Keifer IS, Wilson AK, McGuffy CL, Novitsky CG, Spears AJ and Holbrook WS (2016) A scalable, parallel algorithm for seismic interferometry of large-N ambient-noise data. Computers and Geosciences 93: 88–95
ChinArray (2006) China Seismic Array waveform data. China Earthquake Administration. doi: 10.12001/ChinArray.Data
Dziewonski AM and Anderson DL (1981) Preliminary reference Earth model. Phys Earth Planet Inter 25: 297–356. doi: 10.1016/0031-9201(81)90046-7
Feng JK, Yao HJ, Poli P, Fang LH, Wu Y and Zhang P (2017) Depth variations of 410 km and 660 km discontinuities in eastern North China Craton revealed by ambient noise interferometry. Geophys Res Lett 44: 8328–8335. doi: 10.1002/2017GL074263
Fichtner A, Ermert L and Gokhberg A (2017) Seismic noise correlation on heterogeneous supercomputers. Seismol Res Lett 88(4): 1141–1145. doi: 10.1785/0220170043
Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A and Khan SU (2015) The rise of "big data" on cloud computing: Review and open research issues. Information Systems 47(C): 98–115
Heilmann Z, Deidda G, Satta G and Bonomi E (2013) Real-time imaging and data analysis for shallow seismic data using a cloud-computing portal. Near Surface Geophysics 11(4): 407–421
IRIS DMC (2014) Data Services Products: Global stacks of millions of seismograms. doi: 10.17611/DP/GS.1
Lin FC, Tsai VC, Schmandt B, Duputel Z and Zhan ZW (2013) Extracting seismic core phases with array interferometry. Geophys Res Lett 40: 1049–1053. doi: 10.1002/grl.50237
Lin FC, Tsai VC and Schmandt B (2014) 3-D crustal structure of the western United States: Application of Rayleigh-wave ellipticity extracted from noise cross-correlations. Geophys J Int 198(2): 656–670. doi: 10.1093/gji/ggu160
Liu ZK, Huang JL, Peng ZG and Su JR (2014) Seismic velocity changes in the epicentral region of the 2008 Wenchuan earthquake measured from three-component ambient noise correlation techniques. Geophys Res Lett 41(1): 37–42. doi: 10.1002/2013GL058682
Lobkis OI and Weaver RL (2001) On the emergence of the Green's function in the correlations of a diffuse field. J Acoust Soc Am 110(6): 3011–3017. doi: 10.1121/1.1417528
Ma YR, Clayton RW and Li D (2016) Higher-mode ambient-noise Rayleigh waves in sedimentary basins. Geophys J Int 206(3): 1634–1644. doi: 10.1093/gji/ggw235
Magana-Zook S, Gaylord JM, Knapp DR and Ruppert SD (2016) Large-scale seismic waveform quality metric calculation using Hadoop. Computers and Geosciences 94: 18–30
Nishida K (2013) Global propagation of body waves revealed by cross-correlation analysis of seismic hum. Geophys Res Lett 40: 1691–1696. doi: 10.1002/grl.50269
Poli P, Campillo M, Pedersen H and LAPNET Working Group (2012) Body-wave imaging of Earth's mantle discontinuities from ambient seismic noise. Science 338: 1063–1065. doi: 10.1126/science.1228194
Savage MK, Lin F and Townend J (2013) Ambient noise cross-correlation observations of fundamental and higher-mode Rayleigh wave propagation governed by basement resonance. Geophys Res Lett 40(14): 3556–3561. doi: 10.1002/grl.50678
Schimmel M, Stutzmann E and Gallart J (2011) Using instantaneous phase coherence for signal extraction from ambient noise data at a local to a global scale. Geophys J Int 184(1): 494–506. doi: 10.1111/gji.2010.184.issue-1
Sens-Schönfelder C, Snieder R and Stähler SC (2015) The lack of equipartitioning in global body wave coda. Geophys Res Lett 42(18): 7483–7489. doi: 10.1002/2015GL065108
Shapiro NM, Campillo M, Stehly L and Ritzwoller MH (2005) High-resolution surface-wave tomography from ambient seismic noise. Science 307: 1615–1618. doi: 10.1126/science.1108339
Snieder R and Sens-Schönfelder C (2015) Seismic interferometry and stationary phase at caustics. J Geophys Res Solid Earth 120(6): 4333–4343. doi: 10.1002/2014JB011792
Wu BJ, Xia H, Wang T and Shi XQ (2018) Simulation of core phases from coda interferometry. J Geophys Res Solid Earth 123: 4983–4999
Zeng XF and Thurber CH (2016) A graphics processing unit implementation for time–frequency phase-weighted stacking. Seismol Res Lett 87(2A): 358–362. doi: 10.1785/0220150192
Zheng XF, Yao ZX, Liang JH and Zheng J (2010) The role played and opportunities provided by IGP DMC of China National Seismic Network in Wenchuan earthquake disaster relief and researches. Bull Seismol Soc Am 100(5B): 2866–2872. doi: 10.1785/0120090257
Zinno I, Elefante S, Luca CD, Manunta M, Lanari R and Casu F (2015) New advances in intensive DInSAR processing through cloud computing environments. In: Geoscience and Remote Sensing Symposium (IGARSS), IEEE, pp 5264–5267