Parallel region division algorithm for automatic m

  • Detail

Parallel region division algorithm for automatic generation of finite element lattices

Abstract: a parallel region division algorithm based on lattice generation recursive method is proposed. The algorithm uses iterative decomposition method to divide regions in parallel according to the estimation and analysis of lattice generation cost. The running results on Dawning 1000A system show that the efficiency and speedup ratio of the lattice algorithm are better than the serial recursive algorithm

key words: finite element lattice; Parallel region partition algorithm; Lattice generation cost; Iterative decomposition method

based on the recursive method of network generation [1 ~ 3], this paper proposes a parallel region division algorithm, which meets the following four basic principles: a. the principle of task balance. A balanced set of sub regions can be generated, that is, the time to generate lattices in each sub region is approximately the same. b. Boundary minimalism principle. The boundary structure of the sub region is simple, the boundary processing time is short, and the cost of message transmission between processors is low. c. The principle of uniform lattice. The final lattice generated in parallel has uniform shape and no singular elements. d. The principle of minimum cost for regional division. The cost of region partition algorithm is as small as possible

1. Basic idea and related algorithms

in the lattice generation recursion method, if each sub region contains the same number of units, it is easier to achieve task balance. Therefore, to avoid the appearance flow trace caused by uneven thickness, first estimate the lattice generation cost of the region to be processed according to the number of units, then decompose the region according to the number of processors currently participating in parallel processing n, and process the boundaries of the decomposed sub regions, and finally obtain n parallel sub tasks that are balanced and independent of each other

1.1 lattice generation cost estimation algorithm

lattice generation cost is closely related to the number of cells distributed in the region to be processed, and the number of cells is determined by the total area s of the region and the cell distribution density in the region. The estimation formula is as follows:

g=s/stri, (1)

stri= [l2/(2M2)] sin60 °, (2)

m= Σ 2li/(di1+di2) (i=1, 2,..., n), (3)

where g represents the lattice generation cost of a three-dimensional plane region (containing N edges); S is the area of the area; Stri is a rough estimate of the average area of a single triangular element in the region; L is the perimeter of the area; M represents the total number of discrete segments on the boundary of the region; Li is the length of the ith edge; Di1 and di2 respectively represent the cell side length at the front end point and the rear end point of the ith side

according to the above formula, the estimation algorithm steps of lattice generation cost are as follows

step 1: successively calculate the length of each boundary and the number of discrete segments of the region, and calculate stri according to equations (2) and (3)

in step 2, apply the triangle accumulation algorithm to calculate s, and obtain g according to equation (1). The steps of triangle accumulation algorithm are as follows: A. set the variable s to 0; b. Select any vertex of the region as V1, and take the two points adjacent to V1 as V2 and V3 in a clockwise (or counterclockwise) direction; c. Calculate the area of the triangle composed of V1, V2 and V3, and add the variable s; d. Take V3 as the new V2, and take the first point adjacent to V2 as the new V3 in the clockwise (or counterclockwise) direction. If v3=v1, the algorithm stops; Otherwise, turn to C

1.2 region division algorithm

after estimating the lattice generation cost of the whole region, the task of the region division algorithm is to find n-1 segmentation lines and divide the region into n sub regions, so that the lattice generation cost of each sub region is roughly the same. 7. It has a number of patented technologies (zl.3; zl.8), etc. Since the number of cells in the sub region cannot be accurately determined, the lattice generation cost of each sub region is allowed to have a δ The error, that is, the lattice generation cost of the generated sub region, belongs to [g/n- δ, G/N+ δ] Interval

according to Principle B, the two ends of any division line are located on the boundary of the region, not in the region. In Fig. 1 (a), any subdomain is adjacent to other subdomains only through a complete division line; In Fig. 1 (b), the adjacent relationship between subdomain 3 and subdomains 1 and 2 is not convenient for boundary processing

(a) (b)

Fig. 1 Effect of region division

according to Principle C, multiple division lines do not intersect at the same point, because this division will lead to pole phenomenon when n is large - long and thin elements with small internal angles appear at the intersection of multiple division lines, affecting the lattice shape

in this paper, the iterative decomposition method is used to divide the region, and the steps are as follows

in step 1, the area to be decomposed is D

step 2 circulates the following steps n-2 times: A. find a dividing line in D to divide the region into two subdomains D1 and D2, so that the lattice generation cost of D1 is about equal to g/n; b. Remove D1 from D, take D2 as the new D, and turn to step a

step 3 find a dividing line in D to divide the region into two subdomains D1 and D2, so that the lattice generation costs of D1 and D2 belong to [g/n- δ, G/N+ δ]。

in this algorithm, the method to find the dividing line is: first find a suitable vertex in the region as the front end point X of the dividing line, and then start from this vertex and find the back end point y of the dividing line on the boundary of the region by the half search method in a clockwise (or counterclockwise) direction. The steps of the half search algorithm are as follows:

step 1 starts from point X and takes the two points adjacent to X as R and Y in a clockwise (or counterclockwise) direction

step 2: calculate the lattice generation cost g of the subdomain on the left (or right) side of the division line XY

step 3 if G ∈ [g/n- δ, G/N+ δ], The algorithm stops; If G> g/n+ δ, Then let t=y, take the midpoint of T and R as the new y, and turn to step 2; If G R, change X

Figure 2 is a schematic diagram of division of ABCDEF for 6 times. In Figure 2 (a), the value of R is 2. When looking for the fifth division line, because ∠ Chi is larger than ∠ HIE, X should have been positioned at point h. However, since two division lines have passed through point h, if h is also selected as the front point this time, the co location rate r increases to 3, exceeding the predefined value of R. therefore, point I is selected as X; In Figure 2 (b), since R is taken as 3, h can still be used as the front point of the fifth division line

(a) r=2 (b) r=3

Figure 2 common point control

the characteristic of the region division algorithm is that the load balance degree of parallel tasks is determined by δ Adjustment; All the segmentation points are located on the boundary of the region, because only one edge of the remaining sub domain is a newly added segmentation line in the segmentation process. No matter which endpoint of the edge is selected as X, X and Y always fall on the boundary of the region, effectively meeting the boundary simplification principle,; By using R to flexibly control the common point ratio of the dividing lines, the poles are eliminated and the uniformity of the lattice is ensured

2. Parallel implementation

the parallel region partition algorithm studied in this paper has been implemented on Dawning 1000A system. According to the current conditions and characteristics of Dawning 1000A system, PVM is selected as the parallel programming environment, and the parallel generation of finite element lattice is realized in three steps: first, the node number N in the current PVM is detected by a common lead screw master task program on node1, and the region to be processed is decomposed into n sub regions by using the region division algorithm, and each sub region is allocated to each node; Then, the slave task programs on each processor are executed in parallel to generate uniform lattices in each sub region; Finally, the master task program on node1 collects and assembles the subarea lattices generated on each processor. Here, a dynamic load balancing strategy farm mode is adopted, but one thing is different from the usual situation: in order to improve processor utilization, node1 is also divided into a sub region during sub region allocation. Therefore, after data allocation, the main processor is not idle until the sub region lattice is returned

since the boundary has been pretreated in the parallel division phase, once the sub area is allocated to each processor, the support of all processing policies is a favorable guarantee for the development of the recycled plastic granulator. At the same time, the cell will be generated independently on its own data sub block according to the predefined cell distribution density. After approximately the same period of time, they will complete their work and transmit the generated sub area grid to the master task program. The communication of the whole parallel algorithm is very small. It only includes broadcasting sub region data from node1 to other nodes, and each node sends back the generated sub region cell to node1

3. Experimental results

Table 1 illustrates the operation effect of the above parallel algorithm on Dawning 1000A system with examples (the data inside and outside the brackets in the table correspond to example 1 and example 2 respectively). In example 1, δ= G/(5N (n-1)), g=15987, and the total number of units actually generated is 12531; In example 2, δ= G/(5N (n-1)), g = 33452, and the total number of units actually generated is 28769. Because the time spent in zoning and assembling data is μ Therefore, the execution time (T) of the parallel algorithm is divided into two parts: sub region lattice generation time (T1) and communication time (T2). T1 refers to the time when each node generates a lattice in its assigned sub region; T2 includes the time when node1 sends sub area data to other nodes and the time when each node sends back sub area grid data to node1. Table 1 shows the time cost and acceleration ratio of parallel lattice generation using 2, 3, 4, 5, and 6 nodes( ν)。 When a processor is used, neither area division nor communication is required. Therefore, the time generated in the area is the total running time. Table 1 execution time and acceleration ratio of parallel algorithm

number of processors t1/st2/st/s ν 11297 (9770) 1297 (9770) 1.00 (1.00) 2172 (1138) 5 (35) 177 (1173) 7.33 (8.33) 375 (570) 5 (35) 80 (605) 16.21 (16.15) 440 (200) 5 (35) 45 (235) 28.82 (41.57) 520 (152) 5 (35) 25 (187) 51.88 (52.25) 614 (100) 5 (35) 19 (135) 68.26 (72.37)

Table 2 gives the parallel algorithm in example 1 the estimated value (n) and actual value (n) of the total number of cells generated on each node when running on six nodes, And the time (T3) at which the lattice is generated in the sub region. It can be seen from table 2 that the estimated value of the total number of cells in each sub region is consistent with the proportion distribution of the actual value. The difference in cell generation time between nodes is less than 5%, achieving a good load balance. It is worth noting that the lattice generation time is not always proportional to the number of lattice cells, because the time cost of lattice generation is not only affected by the number of lattice cells in the sub region, but also related to the distribution of lattice cells in the sub region. Table 2 load distribution table

processor No. n/n /t3/st2/s12 7362 49512.310.7522 6321 93312.740.7532 6702 03112.180.7542 592208912.910.7552 7672 07312.290.7562 5332 04513.200.75


1 facello m A. implementation of a randomized algorithm for delaunery and regular triangulations in thr ee dimensions Computer Aided Geometric Design, 1995 (12): 349~370

2 Chan C T, Anastasion K. An Automatic Tetrahedral Mesh Generation Scheme by the Advancing Front Method. Communications in Numerical Methods in Engineering, 1997, 13: 33~46

3 Zeng Yong, Cheng Gengdong. Knowledge-Based Free Mesh Generation of Quadrilateral Elements in Two-Dimensional Domains. Microcomputers in Civil Engineering, 1993 (8): 259~270(end)

Copyright © 2011 JIN SHI