人教版小学语文一年级上册第三次月考试题下载_Word模板_3

is_036899

暂无简介

人教版小学语文一年级上册第三次月考试题 Monte Carlo Simulation With The GATE Software Using Grid Computing R. Reuillon D.R.C Hill C. Gouinaud UMR CNRS 6158 - LIMOS BP 10125 - 63177 Aubière - France Tel: +33 (0)473 40 53 68 romain.reuillon@isima.fr Z. El Bitar V. Breton LPC, 24 av. des ...

Monte Carlo Simulation With The GATE Software Using Grid Computing R. Reuillon D.R.C Hill C. Gouinaud UMR CNRS 6158 - LIMOS BP 10125 - 63177 Aubière - France Tel: +33 (0)473 40 53 68 romain.reuillon@isima.fr Z. El Bitar V. Breton LPC, 24 av. des Landais, 63000 Clermont-Ferrand FRANCE Tel:+33 (0) 473 40 72 19 breton@clermont.in2p3.fr I. Buvat UMR S 678 – 91 Bd de l’Hôpital F-75634 PARIS cedex 13 FRANCE Tel:+33 (0)1 53 82 84 00 buvat@imed.jussieu.fr ABSTRACT Monte Carlo simulations needing many replicates to obtain good statistical results can be easily executed in parallel using the “Multiple Replications In Parallel” approach. However, several precautions have to be taken in the generation of the parallel streams of pseudo-random numbers. In this paper, we present the distribution of Monte Carlo simulations performed with the GATE software using local clusters and grid computing. We obtained very convincing results with this large medical application, thanks to the EGEE Grid (Enabling Grid for E- sciencE), achieving in one week computations that could have taken more than 3 years of processing on a single computer. This work has been achieved thanks to a generic object-oriented toolbox called DistMe which we designed to automate this kind of parallelization for Monte Carlo simulations. This toolbox, written in Java is freely available on SourceForge and helped to ensure a rigorous distribution of pseudo-random number streams. It is based on the use of a documented XML format for random numbers generators statuses. Categories and Subject Descriptors G.3. [Probability and Statistics]: Statistical Computing. J.3 [Life and Medical Sciences] Health. I.6.3 [Simulation and Modeling] Application. General Terms Algorithms, Performance, Design, Reliability, Experimentation. Keywords Monte Carlo, Grid computing, GATE simulation. 1. INTRODUCTION Monte Carlo simulations (MCS) are widely used in emission tomography; for protocol optimization, design of processing or data analysis methods, tomographic reconstruction, or tomograph design optimization. GATE [1] is a Monte Carlo simulation tool based on the Geant4 package and dedicated to Single Photon Emission Computed Tomography and Positron Emission Tomography simulations. It was designed to be flexible and precise, thus GATE simulations are computer intensive and cannot be used in a clinical context. This work presents a distributing method and a tool for the parallelization of MCS. This method is then applied to a practical application in image reconstruction using GATE and execution times are given for clusters and the EGEE European grid environment. 2. MATERIAL AND METHODS MCS are commonly considered to be naturally parallel [2]. It is widely assumed that with N processors executing N replicates of a Monte Carlo calculation, the pooled result will achieve a variance N times smaller than a single instance of calculation in the same time [3]. In the next sections we discuss why we changed the default Pseudo-Random Number Generator (PRNG) of the GATE software and will also present how we separate experiments to avoid correlations that could slow down the convergence. 2.1 A. Using GATE with a better Pseudo- Random Number Generator (PRNG) GATE simulations were initially based on the “James Random” algorithm [4], [5] as implemented in the “Class Library for High Energy Physics” (CLHEP) [6]. This generator is 21 years old and has been shown to have poor statistical properties. We checked that it succeeded in only 36 tests out of 122 using the recent and already well-known statistical test battery “TestU01” of L’Ecuyer [7]. We therefore modified GATE to use the Mersenne Twister 19937 [8] as implemented in CLHEP. This generator is recent, has a huge period of 219937 and is equidistributed in 623 dimensions. It passes almost all the tests of the test battery TestU01 and it is fast. 2.2 Parallelization of PRNG For quantitative Monte Carlo simulations the MRIP or “Multiple Replication In Parallel” parallelization approach ([13], [14]), allows a maximum speed up if many replications of the same experiment have to be made in order to obtain a good approximation of the result. However, when parallelizing the underling pseudo-random number generator (RNG), correlations within and between the random numbers streams generated in each processor have to be avoided [2]. Different parallel generation techniques of pseudo random numbers can be found in the following documentation: [17]. In the “central server approach”, a central RNG generator provides numbers for all simulation jobs. This approach is the natural one but doesn’t fulfill the requirements for a good parallel RNG [18] and creates a bottleneck that slows down the distributed simulation. The “sequence splitting” or “blocking” consists in splitting the RNG cycle into non-overlapping contiguous sections [19]. This Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. NOTERE 2008, June 23-27, 2008, Lyon, France. Copyright 2008 ACM 978-1-59593-937-1/08/0003…$5.00. in 2p 3- 00 36 73 74 , v er sio n 1 - 1 2 M ar 2 00 9 Author manuscript, published in "8ème Conférence Internationale sur les NOuvelles TEchnologies de la REpartition, NOTERE 2008, Lyon : France (2008)" hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 technique must be used with caution because long range correlation in the parallelized generator might become short range inter-sequences correlations. Instead of unrolling the generator, one might consider to randomly generate states of the pseudo- random number generator (a status is archiving a precise state). The average minimal distance between n statuses should be in this case 1/n2 times the distributed generator period [20]. This is possible using a cryptographic generator or a hash function to generate the statuses. The distribution of the Mersenne Twister 19937 algorithm is achieved that way in the library SPRNG [3]. The “leap frog” technique distributes the sequences to the processor like a deck of cards to card players. Each process of the distributed simulation uses 1 number out of n in the original sequence. This last technique requires a generator that allows cycle division [1]. With this technique long range correlations in the original sequence might also become short range inter- sequences correlations if the interval of sampling in the original sequence is not chosen carefully [20]. The “independent sequences” technique produces different cycles of numbers depending on the initial seed. This technique is available for a few generators like some lagged Fibonacci pseudo-random number generators. [19]. This last technique is close to the “parameterization” technique that might be used with some RNG like the Mersenne Twister [16] or linear congruential generator with Mersenne or Sophie-Germain prime moduli [21]. It generates algorithm parameters leading to the generation of highly independent random number streams. Within the current state of the art, we are not able to provide a theoretical proof of independence between pseudo-random number streams. However, various approaches can be tested empirically, implying heavy computation that can be achieved once for many applications under a precise experimental framework. The Mersenne Twister 19937 has a very long period of 219937 drawings. It is an already parameterized version of the generic Mersenne Twister algorithm and it has no efficient cycle division technique available. The Mersenne Twister 19937 generator is well suited to a parallelization using the “sequence splitting” technique. To achieve the non-overlapping condition of the “sequence splitting” parallelization technique, we first estimated the number of random numbers drawn using a simulation job designed to have an average execution time of 12 hours (on an average working node of the European grid). This estimation led to 12 billion drawings per job. Then we generated over 6000 statuses for the Mersenne Twister spaced by 15 billion numbers each. We used a similar approach in [15]. The generated status were archived and converted into a documented XML format, in order to be reused with different implementations of the Mersenne Twister 19937 algorithm. 2.3 Creation of a generic parallelizing tool for MCS We designed and implemented an open source software tool in Java called “DistMe”, which is dedicated to parallelize stochastic simulations. This tool contains a status database and is able to create jobs for various distributed environments independently from the random number generation library. It is based on the intensive use of a documented XML generic format for the pseudo-random number generator statuses [22]. DistMe is fully usable and its sources can be found on the Internet (http://sourceforge.net/projects/distme). With this tool, we could generate GATE jobs for any distributed execution system: basic scripts (using ssh for instance), bags of work for the European Grid using JDL descriptors (the European grid Job Description Language) and more specially “OpenPBS” (using Portable Batch System scripts). A tutorial is available (www.isima.fr/~reuillon) as well as 9000 statuses for the Mersenne Twister 19937 algorithm, spaced of 15 billion drawings each. 2.4 Hardware We could access 650 worker nodes of the EGEE European computing grid (Enabling Grid for E-sciencE, www.eu-egee.org) mainly in France, United Kingdom, Netherlands and Poland. We also had at our disposal two clusters hosted by local research laboratories (the LIMOS/ISIMA cluster composed of 14 bi- processors and the LAMI/IFMA cluster composed of 28 bi- processors managed by an “OpenPBS” system. Each processor of the cluster is an Intel Xeon 3 GHz with hyper-threading. 2.5 Merging of simulation results Each simulation generates two binary output files requiring about 10 megabytes of storage space. Simulation output files produced on the grid were automatically registered and copied on a Storage Element (SE). When all the simulations were completed, a script using grid commands retrieved these files from the SE into a local machine. When the computing was performed on a local cluster, the retrieving of the simulation output files was achieved using a regular and local FTP commands (File Transfer Protocol). The merging of all files was performed using a simple C code and required less than 5 minutes for less than 30 Gigabytes (on a local desktop computer – Xeon 3 GHz with simple SATA disk). 3. RESULTS 3.1 PRNG Parallelization The computation of the 6000 PRNG status is not possible in parallel and took around 80 days on a single node of the ISIMA cluster running at 2.4 GHz. Once the list of status is generated, it is important to test the resulting random number series. Indeed, a good parallel PRNG must behave like several good sequential PRNG. Each sequence was then tested using the statistical tests battery for sequential PRNG TestU01. As shown in Table I, only 2% of sequences failed in more than 5 tests and no sequence failed in more than 10 tests out of the 122 tests of the battery. This calculation has been made on the ISIMA cluster and took 35 days at full cluster load. If the work has been made on a single machine it would have taken around 3 years. 3.2 Simulation execution The two local clusters achieved 600 jobs, which all succeeded and 2300 jobs were executed on the EGEE European grid, (with 1811 usable results). Fig. 1 shows how the jobs were executed on the different calculation units. The IFMA cluster hosted 400 jobs, 200 jobs ran on the ISIMA cluster, 200 on the Polish worker nodes, 499 on the Dutch ones, 922 in England and 190 in France. A variance study on the final results showed that after the execution of 2000 jobs a convergence was reached. The curve on Fig. 2 shows an asymptotic behavior around 2000 jobs. Executing more jobs was then un-necessary. 3.3 Distribution of the computing time The time required for the total execution of the simulation on a single sequential computation unit (Intel Xeon 3 GHz) is 906 days/CPU (Central Processing Unit). in 2p 3- 00 36 73 74 , v er sio n 1 - 1 2 M ar 2 00 9 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 The execution time of the distributed simulation on clusters is inversely proportional to the number of processors from which they are composed since the migration time of the jobs is neglected compared to the total execution time. Hence, the gain factor was 84 since the number of local bi-processors is 42 (14 bi- processors on ISIMA cluster; 28 bi-processors on IFMA cluster) resulting in 84 execution units running in parallel. Table 1. Failed tests of the battery testu01 for the 6000 random numbers sequences Figure 1. Repartition of the jobs Figure 2. Variance of the results as a function of the numbers of executed jobs For the grid the problem is a bit trickier. The execution power of the grid is virtually unlimited, supposing that the number of processors available on the grid is always greater than the number of submitted jobs. Furthermore, we may consider that there is no latency time in the grid architecture: no job migration time, no data migration time, and that execution units are homogenous and fast. Hence, the ideal proportional gain factor in time execution is proportional to the number of jobs in which the sequential simulation is distributed. In our case the number of jobs has been arbitrarily set to 1813 jobs. Each job should run during 12 hours on a fast local computer for completion. This mean the total execution time of the simulation is 22356 hours / CPU (Central Processing Unit). To compute the gains in the next paragraph we compute virtual execution times of jobs under certain conditions: taking into account the average execution times of all jobs or only the longest execution time of a job taking into consideration that we have access to a limited part of the grid or consider the grid as able to execute all our jobs in parallel In each case, we compute virtual execution times for our simulation jobs. After that we consider that the virtual jobs are executed perfectly in parallel. By consequence the resulting gain is the total execution time of the simulation divided by the duration of one virtual job. On a real grid, the gain factor in term of execution time is penalized by the latency time affected by the performances of the targeted worker nodes and the migration time in which a job passes through the following states: submitted, waiting, ready and scheduled. Taking into account the latency and the average power of the execution units, we have computed from the execution log files the average execution time among all our simulation jobs on the grid. The average execution time for our simulation is 24.675 hours. This lead to what we have called the average theoretical gain. In our case we obtain a gain of 906. Unfortunately, the number of worker nodes available in our real execution environment was 650. It is greatly inferior to the number of jobs we had to execute. This means, that we could only execute 650 jobs concurrently. This impact negatively on the virtual execution time of our global simulation jobs, increasing it to 70,722 hours by job in average, thus the average practical gain factor is 316. The end-user might be interested in the global execution time of the simulation corresponding to the time between the submission of the jobs and the return of the results from the last job. Supposing a concurrent submission the simulation ends when the last part of the results is returned. From the log files, the longest job with the longest execution time over all jobs is 36 hours. This leads to a minimal theoretical gain of 621. Taking into account the fact we are limited by the number of the worker nodes the virtual length of a jobs increases to 103,2 hours and the minimal practical gain is about 217. The different gain values are summarized hereafter: ideal proportional gain: 1813 average theoretical gain: 906 average practical gain: 316 minimal theoretical gain: 621 minimal practical gain: 217 4. CONCLUSION AND DISCUSSION By distributing the calculation on many execution units our nuclear medicine simulation was achieved in a few days. It would have taken more than three years on a single powerful computer without distributing the simulation using the MRIP approach. We have not repeated this simulation to study the grid and cluster overhead, since we may obtain different execution times with different grid/cluster loads. The simulation results were directly used by scientists working in nuclear medicine [10], [11]. Improvements can be made in the following directions: the different gain factors might be improved using more worker nodes 190; 8% 922; 38% 499; 21% 200; 8% 200; 8% 400; 17% France United Kingdoms Netherlands Poland ISIMA IFMA in 2p 3- 00 36 73 74 , v er sio n 1 - 1 2 M ar 2 00 9 hu 文本高亮工具 hu 文本高亮工具 hu 文本高亮工具 of the grid and optimization techniques for stochastic simulations distribution like the “N out of M” strategy presented in [23]. Furthermore, each random number sequence has been tested individually with the best test battery presently available, but tests have to be done to check that the correlation between the sequences is acceptable using the parallel PRNG tests described in [1] and implemented in SPRNG. It represents a huge amount of calculation and it will be achieved using the internet computing platform BOINC [12]. The use of the DistMe toolbox requires the downloading of the statuses from the internet and a manual operation to insert them in a database on the local computer running DistMe. To simplify this task for the end user, the statuses and the tests results will be published via a central web service and DistMe will gain a transparent access to this web service. Last but not least, it might be interesting to optimize the status generation phase, by combining the sequence splitting technique with highly independent random numbers sequences obtained using a “parameterization” technique and then generating the statuses for each sequence in parallel. A pseudo- random number generation library, “DistRNG”, is being implemented and already allow the use of cutting edge parallelization techniques. 5. ACKNOWLEDGMENTS The authors would like to thank the Auvergne Regional council (France) for its research sponsoring, the EGEE consortium and also ISIMA/LIMOS & IFMA/LAMI laboratories which provided an easy access to their local clusters. 6. REFERENCES [1] S. Jan et al., "GATE: a simulation toolkit for PET and SPECT," Phys. Med. Biol., vol. 49, pp 4543-4561, 2004. [2] M. Mascagni and A. Srinivasan, "Parameterizing parallel multiplicative lagged-Fibonacci generators," Parallel Computing, vol. 30, 2004, pp. 899-916. [3] M. Mascagni, D. Ceperley and A. Srinivasan, "SPRNG: a scalable library for p

本文档为【人教版小学语文一年级上册第三次月考试题】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

人教版小学语文一年级上册第三次月考试题

热门搜索

历史搜索