Journal of Computational and Applied Mathematics 103 (1999) 77-92

Simulation of free surface flows in a distributed memory environment

José A. Cuminato a,*, Antonio Castelo Filho a, Maurilio Boaventura b, Murilo F. Tomé c

a ICMSC-USP, P.O. Box 668, 13560-970 São Carlos, SP, Brazil
b IBILCE-UNESP, P.O. Box 136, 15054-000 S. J. do Rio Preto, SP, Brazil
c Instituto Superior Técnico de Lisboa, Av. Rovisco Pais, 1096 Lisboa CODEX, Portugal

Received 29 October 1997

Abstract

A parallel technique, for a distributed memory machine, based on domain decomposition for solving the Navier-Stokes equations in cartesian and cylindrical coordinates in two dimensions with free surfaces is described. It is based on the code by Tomé and McKee (J. Comput. Phys. 110 (1994) 171-186) and Tomé (Ph.D. Thesis, University of Strathclyde, Glasgow, 1993), which in turn is based on the SMAC method by Amsden and Harlow (Report LA-4370, Los Alamos Scientific Laboratory, 1971). The method solves the Navier-Stokes equations in three steps: the momentum equations, the Poisson equation and particle movement. These equations are discretized by explicit and 5-point finite differences. The parallelization is performed by splitting the computational domain into vertical panels and assigning each of these panels to a processor, so that all the computation can be performed using nearest-neighbour communication. Test runs comparing the performance of the parallel code with the serial code, and a discussion of the load balancing question, are presented. PVM is used for communication between processes. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Free surface flow; Distributed memory; Parallel technique; Domain decomposition

1. Introduction

The marker-and-cell (MAC) method, introduced by Harlow and Welch [8], was particularly designed for the modelling of fluid flows with free surfaces. Some of its important features are the superposition of a mesh of rectangular cells onto the flow domain and the use of virtual particles to represent the existing fluid. The particles move from one cell to another as a function of the flow velocities, keeping track of the fluid location. If a cell contains a number of particles it is considered to contain fluid, which provides the localization of the free surface. So in the MAC method the flow domain is split into a number of cells which are labelled as either empty, full, surface or boundary cells, and the labels have to be updated as time evolves. Amsden and Harlow [1] developed a simplified MAC method (SMAC), where the calculation cycle is split into two parts: a provisional velocity field calculation followed by a velocity update using a potential function which is the solution of Poisson's equation and ensures mass conservation. Tomé and McKee [10] developed this technique further by adding a number of specially designed devices to deal with general domains and multiple inlets/outlets. They called it GENSMAC. This code has been successfully used to simulate different types of two-dimensional free surface flows, such as jet filling of complex shaped cavities, penstock head-gate closure, jet buckling and die swell problems, see [10].
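The cell labelling admits a compact implementation. The following C fragment is a minimal sketch under stated assumptions (row-major count and flag arrays, boundary cells pre-marked); it is illustrative only and not the MAC/GENSMAC routine itself: a cell containing particles is full, and becomes a surface cell when at least one of its four neighbours is a non-boundary cell without particles.

/* Minimal sketch of MAC cell labelling (illustrative, not the GENSMAC code).
 * count[i + j*nx] holds the number of virtual particles in cell (i,j);
 * flag[] holds the labels, with BOUNDARY cells assumed pre-marked. */
enum cell_type { EMPTY, FULL, SURFACE, BOUNDARY };

void label_cells(int nx, int ny, const int *count, int *flag)
{
    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++) {
            int k = i + j * nx;
            if (flag[k] == BOUNDARY)            /* rigid walls never change */
                continue;
            if (count[k] == 0) {                /* no particles: empty cell */
                flag[k] = EMPTY;
                continue;
            }
            /* surface cell: some 4-neighbour is a non-boundary,
             * particle-free cell */
            int empty_nb =
                (i > 0      && flag[k - 1]  != BOUNDARY && count[k - 1]  == 0) ||
                (i < nx - 1 && flag[k + 1]  != BOUNDARY && count[k + 1]  == 0) ||
                (j > 0      && flag[k - nx] != BOUNDARY && count[k - nx] == 0) ||
                (j < ny - 1 && flag[k + nx] != BOUNDARY && count[k + nx] == 0);
            flag[k] = empty_nb ? SURFACE : FULL;
        }
}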
One of the main drawbacks in the implementation of either the SMAC or the GENSMAC method is the storage space needed for the virtual particles, which can be quite large for very fine grids or problems requiring a fair amount of particles. Furthermore, moving those particles around at every computing cycle can take more than half the total computing time. In order to get round this difficulty, Tomé et al. [11] presented a new version of the GENSMAC method in which only the storage and movement of the particles sitting on the free surface is required. This brought a substantial improvement in both the storage space used by the code and the total computing time. This new version of GENSMAC also includes an extension which permits the simulation of three-dimensional axisymmetric flows. Axisymmetric problems are quite common in practical applications, mainly in the chemical and food industries for the filling of moulds.

As already mentioned, cutting down the number of particles representing the fluid was a big improvement on the previous methodologies, but for some problems it is not yet enough to bring the computing times within reasonable bounds. This may be achieved by employing parallel computation. The parallel algorithm to be described in this work is based on domain decomposition techniques and it will be implemented for both cases: particles on the whole domain and particles on the free surface only. The reason for implementing both techniques lies in the fact that the free surface version is so involved with regard to the use of complex data structures that we need to make sure its parallel version keeps the computing time gains presented by the serial version.

In this work we describe the method GENSMAC and the parallel algorithm which implements the domain decomposition technique. Both versions of the GENSMAC code, with particles on the entire domain and on the free surface only, will be considered. Issues like load balancing and efficiency will be addressed. Two numerical tests will be presented and a comparison of the computing times of the parallel code with those of the serial code will be given.

2. The SMAC method in two dimensions

The SMAC method is based on a staggered-grid finite difference discretization of the equations arising from the following procedure used to solve the Navier-Stokes equations. Assume that at time $t_0$ the velocity field $u(x, t_0)$ is given. Calculate $u(x, t)$, $t = t_0 + \delta t$, from the following:

(1) Let $\tilde{p}(x, t_0)$ be an arbitrary pressure field satisfying the correct pressure conditions on the free surface, see Eqs. (6) and (7).

(2) Calculate the intermediate velocity field $\tilde{u}(x, t)$ from

$\dfrac{\partial \tilde{u}}{\partial t} = -\dfrac{\partial u^2}{\partial x} - \dfrac{\partial (uv)}{\partial y} - \dfrac{\partial \tilde{p}}{\partial x} + \dfrac{1}{Re}\nabla^2 u + \dfrac{1}{Fr^2} g_x,$   (1)

$\dfrac{\partial \tilde{v}}{\partial t} = -\dfrac{\partial (uv)}{\partial x} - \dfrac{\partial v^2}{\partial y} - \dfrac{\partial \tilde{p}}{\partial y} + \dfrac{1}{Re}\nabla^2 v + \dfrac{1}{Fr^2} g_y,$   (2)

where $Re = UL/\nu$ and $Fr = U/\sqrt{gL}$, with $U$ and $L$ typical velocity and length scales, $g$ gravity and $\nu$ the viscosity.

(3) Solve Poisson's equation

$\nabla^2 \psi = \nabla \cdot \tilde{u}$ in $\Omega$.   (3)

(4) Update the velocities and pressure from

$u(x,t) = \tilde{u}(x,t) - \nabla\psi(x,t), \qquad p(x,t) = \tilde{p}(x,t) + \psi(x,t)/\delta t.$   (4)

(5) Update the particle positions by solving the ODEs

$\dfrac{dx}{dt} = u, \qquad \dfrac{dy}{dt} = v.$   (5)
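Step (5) amounts to one explicit Euler step per virtual particle, with the velocity interpolated at the particle position. The sketch below illustrates this; the interpolation routines passed as function pointers are an assumption made for the illustration, not the GENSMAC interface.

/* Sketch of step (5): explicit Euler advection of the virtual particles.
 * u_at and v_at are assumed routines returning the velocity components
 * interpolated from the staggered grid at an arbitrary point. */
typedef struct { double x, y; } particle;

void move_particles(particle *p, int np, double dt,
                    double (*u_at)(double x, double y),
                    double (*v_at)(double x, double y))
{
    for (int k = 0; k < np; k++) {
        double u = u_at(p[k].x, p[k].y);
        double v = v_at(p[k].x, p[k].y);
        p[k].x += dt * u;                 /* x advanced by u*dt */
        p[k].y += dt * v;                 /* y advanced by v*dt */
    }
}

The same update appears again, in discrete form, as Eqs. (13) and (14) below.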
A number of boundary conditions can be applied at the rigid boundaries, namely: no-slip, free-slip, prescribed inflow, prescribed outflow and continuative outflow. At the free surface the usual stress-free conditions apply:

$\tilde{p} - \dfrac{2}{Re}\left[\dfrac{\partial u}{\partial x} n_x^2 + \dfrac{\partial v}{\partial y} n_y^2 + \left(\dfrac{\partial u}{\partial y} + \dfrac{\partial v}{\partial x}\right) n_x n_y\right] = 0,$   (6)

$\dfrac{1}{Re}\left[2\dfrac{\partial u}{\partial x} n_x m_x + 2\dfrac{\partial v}{\partial y} n_y m_y + \left(\dfrac{\partial u}{\partial y} + \dfrac{\partial v}{\partial x}\right)(n_x m_y + n_y m_x)\right] = 0,$   (7)

where $n = (n_x, n_y)$ is the outward unit normal vector to the surface and $m = (m_x, m_y)$ is the tangential vector. For the Poisson equation the boundary conditions are of mixed type: at the free surface a homogeneous Dirichlet boundary condition applies, and at the rigid boundary the condition is of Neumann type.

The grid is staggered with the subscripts $i$ and $j$ denoting the cell centre, $i$ counting columns in the x-direction and $j$ counting rows in the y-direction. The unknowns are located as displayed in Fig. 1.

Fig. 1. Variable location in the staggered grid.

The implementation of the SMAC method effected in GENSMAC has four main parts:

(1) The velocity solver, based on explicit finite differences:

$\dfrac{\tilde{u}^{n+1}_{i+1/2,j} - u^n_{i+1/2,j}}{\delta t} = \dfrac{\tilde{p}_{i,j} - \tilde{p}_{i+1,j}}{\delta x} + \dfrac{u^n_{i-1/2,j} u^n_{i-1/2,j} - u^n_{i+3/2,j} u^n_{i+1/2,j}}{\delta x} + \dfrac{u^n_{i+1/2,j-1/2} v^n_{i+1/2,j-1/2} - u^n_{i+1/2,j+1/2} v^n_{i+1/2,j+1/2}}{\delta y}$
$\quad + \dfrac{1}{Re}\left[\dfrac{u^n_{i+1/2,j+1} - 2 u^n_{i+1/2,j} + u^n_{i+1/2,j-1}}{\delta y^2} - \dfrac{v^n_{i+1,j+1/2} - v^n_{i+1,j-1/2} - v^n_{i,j+1/2} + v^n_{i,j-1/2}}{\delta x\,\delta y}\right] + \dfrac{1}{Fr^2} g_x,$   (8)

$\dfrac{\tilde{v}^{n+1}_{i,j+1/2} - v^n_{i,j+1/2}}{\delta t} = \dfrac{\tilde{p}_{i,j} - \tilde{p}_{i,j+1}}{\delta y} + \dfrac{u^n_{i-1/2,j+1/2} v^n_{i-1/2,j+1/2} - u^n_{i+1/2,j+1/2} v^n_{i+1/2,j+1/2}}{\delta x} + \dfrac{v^n_{i,j-1/2} v^n_{i,j-1/2} - v^n_{i,j+3/2} v^n_{i,j+1/2}}{\delta y}$
$\quad + \dfrac{1}{Re}\left[\dfrac{v^n_{i+1,j+1/2} - 2 v^n_{i,j+1/2} + v^n_{i-1,j+1/2}}{\delta x^2} - \dfrac{u^n_{i+1/2,j+1} - u^n_{i+1/2,j} - u^n_{i-1/2,j+1} + u^n_{i-1/2,j}}{\delta x\,\delta y}\right] + \dfrac{1}{Fr^2} g_y.$   (9)

(2) The Poisson solver, based on the 5-point finite difference scheme:

$4\psi_{i,j} - \psi_{i+1,j} - \psi_{i-1,j} - \psi_{i,j+1} - \psi_{i,j-1} = -h^2 \tilde{D}_{i,j},$   (10)

where $h = \delta x$ (assuming $\delta x = \delta y$) and

$\tilde{D}_{i,j} = \dfrac{\tilde{u}_{i+1/2,j} - \tilde{u}_{i-1/2,j}}{\delta x} + \dfrac{\tilde{v}_{i,j+1/2} - \tilde{v}_{i,j-1/2}}{\delta y}.$   (11)

(3) Update the velocity and pressure fields:

$u(x,t) = \tilde{u}(x,t) - \nabla\psi(x,t), \qquad p(x,t) = \tilde{p}(x,t) + \psi(x,t)/\delta t.$   (12)

(4) Particle movement:

$x^{n+1} = x^n + u\,\delta t,$   (13)
$y^{n+1} = y^n + v\,\delta t.$   (14)

3. The SMAC method in cylindrical coordinates

The SMAC method in cylindrical coordinates for the solution of three-dimensional axisymmetric problems follows the same development as the above section. The equations are:

(1) The tilde velocity $\tilde{u}(r,z,t)$ is calculated from

$\dfrac{\partial \tilde{u}}{\partial t} = -\dfrac{1}{r}\dfrac{\partial (r u^2)}{\partial r} - \dfrac{\partial (uv)}{\partial z} - \dfrac{\partial \tilde{p}}{\partial r} + \dfrac{1}{Re}\left[\dfrac{1}{r}\dfrac{\partial}{\partial r}\left(r\dfrac{\partial u}{\partial r}\right) + \dfrac{\partial^2 u}{\partial z^2} - \dfrac{u}{r^2}\right] + \dfrac{1}{Fr^2} g_r,$   (15)

$\dfrac{\partial \tilde{v}}{\partial t} = -\dfrac{1}{r}\dfrac{\partial (r uv)}{\partial r} - \dfrac{\partial v^2}{\partial z} - \dfrac{\partial \tilde{p}}{\partial z} + \dfrac{1}{Re}\left[\dfrac{1}{r}\dfrac{\partial}{\partial r}\left(r\dfrac{\partial v}{\partial r}\right) + \dfrac{\partial^2 v}{\partial z^2}\right] + \dfrac{1}{Fr^2} g_z.$   (16)

(2) The Poisson equation:

$\nabla^2 \psi(r,z,t) = \nabla \cdot \tilde{u}(r,z,t).$   (17)

(3) The boundary conditions on the free surface:

$\tilde{p} - \dfrac{2}{Re}\left[\dfrac{\partial u}{\partial r} n_r^2 + \dfrac{\partial v}{\partial z} n_z^2 + \left(\dfrac{\partial u}{\partial z} + \dfrac{\partial v}{\partial r}\right) n_r n_z\right] = 0,$   (18)

$\dfrac{1}{Re}\left[2\left(\dfrac{\partial u}{\partial r} - \dfrac{\partial v}{\partial z}\right) n_r n_z + \left(\dfrac{\partial u}{\partial z} + \dfrac{\partial v}{\partial r}\right)(n_z^2 - n_r^2)\right] = 0.$   (19)

4. The parallel technique

The basic idea is the use of parallel programming by message passing, that is, a number of serial processes exchanging information (data) through messages. The public domain package PVM, see [7], will be used for message passing. At a given time t the solution domain will resemble the one depicted in Fig. 2 below, where S denotes the cells forming the free surface, B denotes the rigid boundary and F denotes the fluid. The algorithm described in steps 1, 2 and 3 is applied to the cells marked with F. On the cells marked with S the velocities are updated using the free surface stress-free conditions (6) and (7). On the rigid boundaries, inlet and outlet, the velocities are calculated from the appropriate boundary conditions applicable to each case. The region containing all the cells is split into a number of vertical panels.
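The vertical-panel split can be expressed as a simple column-range computation. The sketch below assumes a contiguous split into panels of nearly equal width; names are illustrative and not those of the actual code.

/* Sketch of the vertical-panel decomposition: processor p out of nproc
 * owns the grid columns [c0, c1). */
#include <stdio.h>

static void panel_range(int ncols, int nproc, int p, int *c0, int *c1)
{
    int base = ncols / nproc;          /* columns every panel gets           */
    int rem  = ncols % nproc;          /* leftovers, one each to the first
                                          rem panels                          */
    *c0 = p * base + (p < rem ? p : rem);
    *c1 = *c0 + base + (p < rem ? 1 : 0);
}

int main(void)
{
    int c0, c1;
    for (int p = 0; p < 4; p++) {      /* e.g. 100 columns on 4 processors   */
        panel_range(100, 4, p, &c0, &c1);
        printf("processor %d owns columns [%d, %d)\n", p, c0, c1);
    }
    return 0;
}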
Each of these panels is then assigned to one of the available processors, which goes on to apply the velocity solver to those of its grid points which are full cells, see Fig. 2. For the explicit finite difference scheme this step presents no difficulties for the parallel algorithm.

Fig. 2. Domain Ω and cell types as in GENSMAC.

Fig. 3. Stencil for the momentum equations.

Fig. 3 shows the stencil for the momentum equations. After having calculated the tilde velocities from Eqs. (8) and (9), each processor communicates the first and last columns of both u and v, see Fig. 3, to the right and left neighbour processors, respectively. Obviously the first and last processors communicate only the last and the first columns, respectively.

The next step is the solution of Poisson's equation through the 5-point finite difference scheme of step 2. This step is effected using the same processor topology used for the momentum equations. We use a parallel implementation of the conjugate gradient method presented in [3, 4] and distributed in the package Parallel Iterative Methods (PIM), see [5]. PIM implements all the code necessary for the iteration of the CG method, apart from the routines for matrix-vector products and inner products, which are supplied by the user. In this context this is one of the major difficulties, because the matrix-vector products are quite involved and require careful coding. One step of the conjugate gradient solver requires the computation of the matrix-vector product y = Ax, where for a grid point (i, j) the corresponding y_{i,j} is calculated from

$y_{i,j} = \alpha_{i,j} x_{i,j} + \beta_{i,j} x_{i+1,j} + \gamma_{i,j} x_{i-1,j} + \delta_{i,j} x_{i,j+1} + \varepsilon_{i,j} x_{i,j-1}$   (20)

and α, β, γ, δ and ε are the coefficients corresponding to the centre, north, south, east and west locations, respectively. These coefficients are derived from the PDE and the boundary conditions. The main difficulty in performing this matrix-vector product is associated with the domain being irregular. The product is formed in such a way that only points falling inside the region Ω are included in the calculations. To achieve this, indirect addressing has to be used in order to cope with the contribution of the farthest points (east and west in our case, because we are numbering the unknowns by columns) and to minimize communication, as we do not wish to communicate the coefficients of points which are not going to be used to form the product. For points inside Ω with neighbours not in Ω, the corresponding coefficient is set to zero.
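A sketch of the user-supplied local matrix-vector product of Eq. (20) under these conventions is given below. The storage layout is an assumption made for illustration, not the actual PIM user routine: local unknowns are numbered by columns, the two neighbours adjacent in that numbering sit at positions k-1 and k+1, the neighbours in adjacent columns are reached through the indirect-address arrays idx_e and idx_w (which may point into halo entries received from the neighbouring processors and appended after the n local values of x), and a zero coefficient marks a neighbour outside Ω, whose entry of x is then never read.

/* Illustrative local part of y = A x for Eq. (20); not the PIM routine.
 * a: centre coefficients; b, c: neighbours adjacent in the column
 * numbering; d, e: neighbours in adjacent columns via indirect addressing. */
void local_matvec(int n,
                  const double *a, const double *b, const double *c,
                  const double *d, const double *e,
                  const int *idx_e, const int *idx_w,
                  const double *x,      /* n local values followed by halo  */
                  double *y)
{
    for (int k = 0; k < n; k++) {
        double s = a[k] * x[k];
        if (b[k] != 0.0) s += b[k] * x[k + 1];       /* skipped outside Ω    */
        if (c[k] != 0.0) s += c[k] * x[k - 1];
        if (d[k] != 0.0) s += d[k] * x[idx_e[k]];    /* possibly a halo entry */
        if (e[k] != 0.0) s += e[k] * x[idx_w[k]];
        y[k] = s;
    }
}

Before each such product the boundary columns of x have to be exchanged with the neighbouring processors, so that the halo entries are up to date.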
To implement the correct boundary conditions it is necessary at this stage to modify the coefficients and/or the right-hand side vector as appropriate. Note that the coefficients are formed only once and are kept fixed during the iterations of the iterative method. This arrangement permits the use of nonuniform meshes. The generality of the geometry makes the nearest-neighbour communication pattern much more involved than in the case of a square region, because the lengths and positions of the vectors to be exchanged now depend on the geometry.

The parallel implementation of the particle movement is done likewise. Each processor, having updated the velocity and pressure fields, applies the ODEs (5) to find the position of its particles at that time. Particle movement across processor boundaries will be necessary and will only involve nearest-neighbour communication. The position of the fluid is updated with each processor communicating the particles which are to be moved across the boundaries. At the end of a computing cycle all processors hold the correct information for performing the next step. The algorithm for cylindrical coordinates is exactly the same as that for cartesian coordinates, using the appropriate equations for each case.

5. A new technique for representing the free surface

The most time consuming parts of the GENSMAC code are the routines implementing virtual particle movement. They account for about 50% of the total work load. In addition, for relatively fine meshes the number of particles needed to represent the fluid properly is very large, requiring a large memory storage space. These two facts present a serious restriction on the use of GENSMAC for large problems and/or fine meshes. Recently, Tomé et al. [11] developed a new version of GENSMAC in which a new technique for representing the free surface is implemented. We shall call this version of the code the free surface particle version. It stores and moves particles on the free surface only, as opposed to the previous version of GENSMAC (which we call here the full particle version), where particles were used in the bulk of the fluid including the free surface.

In the sequential version of the free surface particle implementation, each body of fluid is represented as an oriented closed list, which records the positions of the particles on the free surface. The orientation is crucial for determining where the fluid lies. Fig. 4 illustrates one such list.

Fig. 4. Free surface representation as a closed oriented list.

As can be noted from the picture, the initial point P1 coincides with the last one PL. The arrows give the orientation of the list, which in turn determines the position of the fluid: fluid always lies to the left of the arrow's direction. This data structure, unfortunately, does not carry over to the parallel implementation, for the domain decomposition may split each body of fluid into smaller pieces which are assigned to different processors. Each of these pieces will possibly have to be continued in a neighbouring processor, so it cannot be a closed list to start with. We used an open oriented list instead, which is composed of several components; each processor may hold one, more than one or even none of them.
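One possible realisation of such an open oriented list is sketched below. The type and field names are illustrative assumptions, not those of the GENSMAC code: each processor keeps a chain of components, each component is a doubly linked chain of free-surface particles, and each component records on which processor, and in which component, the curve continues.

/* Sketch of an open oriented free-surface list (illustrative only). */
#include <stdlib.h>

typedef struct spart {
    double x, y;               /* particle position                        */
    struct spart *next, *prev; /* neighbours along the orientation         */
} spart;

typedef struct component {
    spart *head, *tail;        /* first/last particle held locally         */
    int prev_proc, prev_comp;  /* where this piece of the curve comes from */
    int next_proc, next_comp;  /* where it continues                       */
    struct component *next;    /* next component held by this processor    */
} component;

/* append a particle at the local tail of a component; fluid lies to the
 * left of the traversal direction, as in the closed-list version */
static int append_particle(component *c, double x, double y)
{
    spart *p = malloc(sizeof *p);
    if (!p) return -1;
    p->x = x; p->y = y; p->next = NULL; p->prev = c->tail;
    if (c->tail) c->tail->next = p; else c->head = p;
    c->tail = p;
    return 0;
}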
Fig. 5 shows an example of such an arrangement: the situation of a flow with one body of fluid spread over three processors.

Fig. 5. Components of a list.

Note that processor 1 holds information on three components of the fluid: component 0 starts at component 0 of processor 2 and ends at component 0 of processor 0; component 1 starts at component 1 and ends at component 1, both in processor 2; and component 2 starts at component 0 of processor 0 and ends at component 1 of processor 2.

As time evolves the fluid geometry can vary considerably, so that the start, the end, a part or even the whole of a component can transfer from one processor to another. When this happens, information is exchanged between the processors involved so that the new situation is accommodated within the data structure and a new time step can begin. For instance, when the start of a component, together with some part of it, moves to a neighbouring processor, it is appended at the end of a component there, which corresponds to its continuation in the destination processor. When a part of a component which contains neither its start nor its end needs to be transferred to another processor, a new component is created at the destination, while in the original processor the donor component is broken into two new ones. It can also happen that several components group together to form a single new one. Figs. 6 and 7 illustrate some of these situations.

Consider the situation where at a given time instant the fluid has the shape depicted in Fig. 5, and as time evolves the fluid assumes the form shown in Fig. 6; that is, the start, the end and a central part of components 0 and 1 in processor 2 must be transferred to processor 1. In this case components 0 and 1 of processor 2 turn into components 0-2 and 1-3, respectively, of processor 2, while in processor 1 two new components are created, 3 and 4. The blank spaces shown in the pictures, between the components, are left on purpose in order to make clear which components belonged to which processors. Fig. 7 illustrates the situation where all the components of a processor migrate to another: processor 2 ceased holding fluid, so it holds no components of the list, while processor 1 used to hold 5 components, which collapsed into only one.

Fig. 6. Fluid movement.

Fig. 7. Fluid movement.

The computer implementation of the data structure storing the free surface and its migration from one processor to another is relatively involved, because the fluid can take almost any shape, which is modified at each time step. It has been implemented for the GENSMAC code in cartesian as well as in cylindrical coordinates.

6. Test problems

The sequential version of GENSMAC has been thoroughly tested throughout its years of use, including its validation against experiments [9]. The parallel version was tested on several examples, and two of them were chosen to be presented here. They were run on a parallel machine using a minimum of two processors and a maximum of eight.

Problem 1 (Waterfall onto a step). In this example the simulation of the flow of water falling onto a round obstacle is presented.
The fluid enters the container through an inflow, starting at time t0, falls onto the round obstacle under gravity and leaves through a continuative outflow. The boundary conditions on the container and obstacle walls are no-slip. The plots presented show the particle configuration at different times. The Reynolds and Froude numbers for this example are Re = 10.0 and Fr = 1.0. Figs. 8 and 9 present the numerical results for the same problem as above using the free surface particles version of the code, so that the fluid is represented by its free surface only.

Fig. 8. Particle plots for the waterfall problem at times t = 2.349 s, 7.047 s, 10.179 s and 18.009 s.

Fig. 9. Particle plots for the waterfall problem at different times.

Problem 2. This problem is concerned with an axisymmetric flow into a square box of side 5 cm which contains fluid. An inlet of diameter 5 mm is placed above this box and a coloured jet flows from it into the fluid in the box with a velocity of 1 m/s. The fluid viscosity is ν = 0.005, giving Re = 10.0 and Fr² = 4.52. The results shown below were produced by the code implementing the use of particles on the free surface only.

The examples presented above are classical test problems in the literature. Comparisons between the numerical solutions obtained by the methods used in this work and those in the literature were given in [9, 10], and we have no intention of discussing them here again. The sole purpose of Fig. 10 is to illustrate the problems we are solving, not to discuss how good the solutions are. We shall concentrate the discussion on the performance of the parallel code instead.

Fig. 10. Particle plots for the jet flow experiment at times t = 4.000 s, 12.000 s, 21.000 s and 29.500 s.

7. Load balancing

An important aspect of parallel computing which needs consideration is load balancing. Balancing the work load fairly among the available processors is crucial for the good performance of the parallel code. The ideal load balance is achieved when all the available processors are assigned an equal share of the work. Nevertheless, this is not always easily accomplished, especially in the case of problems with free surfaces. The dynamics of load balancing presents serious difficulties in this case, as the domain varies with time, making the total work load time dependent. The problem geometry can vary extensively with time and the region occupied by the fluid can be very small for a substantial time interval, so that distributing the work load will not be efficient. As time evolves this region increases, and so does the problem size, so that it becomes worth having more processors cooperating. The technique we have been using, of splitting the domain into vertical strips of approximately the same width and assigning each of these strips to a processor which is responsible for the calculations on that strip throughout the whole computation, forces those processors holding parts of the domain where fluid has not yet arrived to remain idle until fluid reaches them. This is an inherent weakness of free surface problems and no parallel technique can avoid it.

8. Performance validation

Throughout this section we shall denote by Tp the time the parallel code using p processors takes to solve a given problem and by Ts the time the sequential code takes to solve the same problem.
The efficiency of the parallel code is then defined by E = Ts/(p Tp). Note that by this definition, when the problem is embarrassingly parallel, Tp = Ts/p and the efficiency is 1; this is the best one can hope for. In this section we present the results of applying the parallel code to the problems of the previous section.

Table 1
Time in seconds

NEQ      Ts       T2       T4       T6       T8
500      226      290      297      272      262
2000     2570     2189     1750     1795     1593
3000     4976     3880     2839     2725     2597
8000     28621    21594    12499    11449    10328
12000    65708    44505    25886    19585    17119

Table 2
Efficiency

NEQ      Ts/2T2   Ts/4T4   Ts/6T6   Ts/8T8
500      0.39     0.19     0.12     0.08
2000     0.58     0.38     0.24     0.20
3000     0.64     0.44     0.30     0.24
8000     0.66     0.57     0.42     0.35
12000    0.73     0.63     0.56     0.48

Table 1 displays the times for the solution of Problem 1 using the serial code and the parallel code with 2, 4, 6 and 8 processors. The version of the code used is the free surface particles one. This is a typical example of a transient problem: in the beginning there is hardly any fluid in the container and the work load is very low, making the use of several processors unjustified. As more fluid enters the container, more calculation needs to be performed and the cooperation of several processors makes sense. The overall performance of the parallel code, however, is greatly affected by this initial phase. The purpose of the example is mainly to give the reader an idea of the sort of times we are talking about.

Table 2 displays the variation of code efficiency with problem size. The purpose of this example is to produce some evidence that the code becomes more efficient for large problems. It can be noted in Table 2 that the efficiency grows with problem size; that is, as the problem size increases the work load becomes more evenly balanced among the processors. This brings the possibility of solving really large problems, provided that a sufficient number of processors is available.

Table 3 displays the time (in seconds) and the efficiency for Problem 1 in the particular situation when the maximum number of cells containing fluid, i.e. 12 000, is achieved; NPROC is the number of processors and the full particles version of the code is being used. Note from Table 3 that for this problem we have better efficiency. This happens because the number of particles to be moved around is quite large and this improves the computation to communication ratio. Nevertheless, the total time taken by both the sequential and parallel codes is around three times that of the same codes in the free surface particles version, showing that the free surface particles code is much faster.

The efficiency of the code is studied for Problem 2 in Table 4, using the free surface particles version of the code with two problem sizes: NEQ = 4000 and NEQ = 16000. Note that for this problem the work load is well balanced among the processors from the start of the computation: all the processors have approximately the same work load. If the number of processors is increased, the work load of each
processor decreases, so that the computation to communication ratio decreases and consequently so does the efficiency. On the other hand, if the problem size is increased the opposite situation occurs and so the efficiency increases.

Table 3
Time in seconds and efficiency

NPROC    Time      Efficiency
1        198018    1.00
2        121740    0.81
4        75009     0.66
6        55302     0.60
8        41555     0.60

Table 4
Efficiency for Problem 2

NEQ      Ts/2T2   Ts/4T4   Ts/6T6   Ts/8T8
4000     0.78     0.58     0.48     0.45
16000    0.83     0.73     0.70     0.63

Fig. 11. Speed-up for Problem 1.

Fig. 12. Speed-up for Problem 2.

Figs. 11 and 12 illustrate the speed-up obtained by the parallel code in the solution of Problems 1 and 2 above with the maximum number of cells allowed (12 000 for Problem 1 and 16 000 for Problem 2). It can be observed that again the performance of the parallel code is rather good for truly large problems, where the use of parallel processing can be justified. The decrease in efficiency with an increasing number of processors is expected, due to the very nature of the parallel technique being employed.

9. Conclusions

A parallel implementation of the SMAC method has been presented. This implementation is based on the code GENSMAC, which has been successfully tested on a variety of problems. The parallel code is based on domain decomposition and is implemented using PVM, a public domain software package that permits the use of a cluster of workstations as a parallel computer. The tests showed that the parallel implementation can harness the power of the cluster, cutting running times to at least half those of the serial code. However, as the tests also showed, one cannot hope to increase the number of processes indefinitely for a fixed-size problem. This is a characteristic inherent to the type of problem we are dealing with, where the solution domain varies with time, starting from a very small region and growing as time evolves.

Acknowledgements

This work is part of an on-going project supported by FAPESP and CNPq under grants 94/6019-2 and 523141/94-7, respectively. The author Maurilio Boaventura is supported by CAPES under a Ph.D. grant. The code is implemented in C using PVM for message passing, and the test runs were performed on an IBM/SP2 with eight processors, available at CENAPAD-SP, at Campinas University (UNICAMP). The computer was not dedicated to our use and the network was shared with other applications.

References

[1] A.A. Amsden, F.H. Harlow, The SMAC method: a numerical technique for calculating incompressible fluid flows, Report LA-4370, Los Alamos Scientific Laboratory, 1971.
[2] J.A. Cuminato et al., Parallel solution of free surface flows, in: VI Congresso Brasileiro de Engenharia e Ciências Térmicas, VI Congreso Latinoamericano de Transferencia de Calor y Materia, Proceedings, Florianópolis, vol. 6, 1996, pp. 583-588.
[3] R.D. Cunha, A study of iterative methods for the solution of systems of linear equations on transputer networks, Ph.D. Thesis, University of Kent at Canterbury, UK, 1992.
[4] R.D. Cunha, T. Hopkins, The parallel solution of linear equations using iterative methods on transputer networks, Technical Report 16/92, University of Kent at Canterbury, UK, 1992.
[5] R.D. Cunha, T. Hopkins, PIM 1.1, the parallel iterative methods package for systems of linear equations, User's guide, Technical Report, University of Kent at Canterbury, UK, 1993.
[6] M.O. Deville, Numerical experiments on the MAC code for a slow flow, J. Comput. Phys. 15 (1974) 362-374.
[7] A. Geist et al., PVM3 User's Guide and Reference Manual, Oak Ridge National Laboratory, Oak Ridge, TN, 1994.
[8] F. Harlow, J.E. Welch, Numerical calculation of time-dependent viscous incompressible flow of fluid with free surfaces, Phys. Fluids 8 (1965) 2182-2189.
[9] M.F. Tomé, GENSMAC: a multiple free surface fluid flow solver, Ph.D. Thesis, University of Strathclyde, Glasgow, Scotland, 1993.
[10] M.F. Tomé, S. McKee, GENSMAC: a computational marker and cell method for free surface flows in general domains, J. Comput. Phys. 110 (1994) 171-186.
[11] M.F. Tomé et al., Numerical simulation of axisymmetric free surface flows, Notas do ICMSC-USP, No. 30, São Carlos, 1996.
[12] J.A. Viecelli, A method for including arbitrary external boundaries in the MAC incompressible fluid computing technique, J. Comput. Phys. 4 (1969) 543-551.