2.1 Governing equations and numerical approach
The compressible Navier-Stokes equations in the integral and conservation form are considered, which can be written as follows:
$$ \frac{\partial}{\partial{t}} \int_{\Omega} \mathbf{W}d\Omega + \oint_{\partial{\Omega}} (\mathbf{F}_{c} - \mathbf{F}_{v}) d\mathbf{S}=0, $$
(1)
where ∂Ω is the boundary of the control volume Ω, W is the vector of conserved variables, and Fc and Fv are the vectors of the inviscid and viscous fluxes, respectively. The vectors are given as:
$$ \begin{aligned} \mathbf{W}=\left[\begin{array}{c}\rho\\\rho{u}\\\rho{v}\\\rho{w}\\\rho{E} \end{array}\right], \mathbf{F}_{c}=\left[\begin{array}{c}\rho{V}\\\rho{u}{V}+n_{x}{P}\\\rho{v}{V}+n_{y}{P}\\\rho{w}{V}+n_{z}{P}\\\rho{H}{V} \end{array}\right], \mathbf{F}_{v}=\left[\begin{array}{c}{0}\\n_{x}{\tau_{{x}{x}}} + n_{y}{\tau_{{x}{y}}} + n_{z}{\tau_{{x}{z}}}\\ n_{x}{\tau_{{y}{x}}} + n_{y}{\tau_{{y}{y}}} + n_{z}{\tau_{{y}{z}}}\\ n_{x}{\tau_{{z}{x}}} + n_{y}{\tau_{{z}{y}}} + n_{z}{\tau_{{z}{z}}}\\ n_{x}{\Theta_{x}} + n_{y}{\Theta_{y}} + n_{z}{\Theta_{z}} \end{array}\right], \end{aligned} $$
(2)
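where ρ is the density; u, v, w are the velocity components in the x, y, z directions, respectively; \(V=n_{x}u+n_{y}v+n_{z}w\) is the velocity normal to the face, with \((n_{x},n_{y},n_{z})\) the unit normal vector of the face; P is the pressure; E and H are the total energy and the total enthalpy per unit mass, respectively; and Θx, Θy, Θz collect the work of the viscous stresses and the heat conduction. τij are the components of the viscous stress tensor for a Newtonian fluid, defined as: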
$$ \begin{aligned} &\tau_{xx}=\lambda\left(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}\right) + 2\mu\frac{\partial{u}}{\partial{x}},\\ &\tau_{yy}=\lambda\left(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}\right) + 2\mu\frac{\partial{v}}{\partial{y}},\\ &\tau_{zz}=\lambda\left(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}\right) + 2\mu\frac{\partial{w}}{\partial{z}},\\ &\tau_{xy}=\tau_{yx}=\mu\left(\frac{\partial{u}}{\partial{y}} + \frac{\partial{v}}{\partial{x}}\right),\\ &\tau_{xz}=\tau_{zx}=\mu\left(\frac{\partial{u}}{\partial{z}} + \frac{\partial{w}}{\partial{x}}\right),\\ &\tau_{yz}=\tau_{zy}=\mu\left(\frac{\partial{v}}{\partial{z}} + \frac{\partial{w}}{\partial{y}}\right), \end{aligned} $$
(3)
where μ is the molecular viscosity coefficient calculated by the Sutherland law [24], and λ=−(2/3)μ according to the Stokes hypothesis.
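As an illustration of Eq. (3), the following minimal sketch evaluates the Sutherland viscosity and the stress tensor from a given velocity gradient. The function names and the Sutherland constants (commonly quoted values for air) are assumptions made for this example and are not taken from the solver.

```python
def sutherland_viscosity(T, mu_ref=1.716e-5, T_ref=273.15, S=110.4):
    """Dynamic viscosity [Pa s] from Sutherland's law.

    The default constants are commonly quoted values for air (an assumption).
    """
    return mu_ref * (T / T_ref) ** 1.5 * (T_ref + S) / (T + S)


def viscous_stress(grad_u, mu):
    """Return the 3x3 viscous stress tensor of Eq. (3).

    grad_u[i][j] is d(u_i)/d(x_j), with (u_0, u_1, u_2) = (u, v, w).
    """
    lam = -2.0 / 3.0 * mu  # Stokes hypothesis
    div = grad_u[0][0] + grad_u[1][1] + grad_u[2][2]
    # Symmetric part: mu * (du_i/dx_j + du_j/dx_i); gives 2*mu*du_i/dx_i on the diagonal.
    tau = [[mu * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]
    for i in range(3):
        tau[i][i] += lam * div  # bulk term lambda * div(V)
    return tau
```

For a pure shear du/dy = 1 only the xy and yx components are non-zero, while for a uniform dilatation the Stokes hypothesis makes the diagonal terms cancel.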
For the compressible Euler/Navier-Stokes equations, the flow states need to be reconstructed on the left and right sides of an interface of neighboring control volumes, as sketched in Fig. 1. The governing equations are discretized using the finite volume formulation, and a cell-centered, second-order method is used in this paper [24]:
$$ \mathbf{U}_{L}=\mathbf{U}_{i} + \mathbf{\Phi}_{i}\cdot(\nabla \mathbf{U}_{i}\cdot \mathbf{r}_{L}), \mathbf{U}_{R}=\mathbf{U}_{j} + \mathbf{\Phi}_{j}\cdot(\nabla \mathbf{U}_{j}\cdot \mathbf{r}_{R}), $$
(4)
where rL and rR are the vectors from the left and right cell centers to the face midpoint, ∇Ui and ∇Uj are the gradients at cell centers i and j, both calculated by the Green-Gauss method [25], and Φi and Φj are the limiters for cells i and j, both calculated by the Venkatakrishnan limiter [26]. The inviscid flux Fc at each cell interface is computed by the HLLC scheme developed by Toro et al. [27], and the viscous flux Fv is approximated by the second-order accurate central difference scheme of Ref. [28]. The solution is advanced in time by the explicit three-stage third-order Runge-Kutta method [24], and the CFL number is set to 0.8 for all examples in this paper.
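The limited reconstruction of Eq. (4) can be sketched for a single scalar variable as follows. This is a minimal illustration that uses the commonly published form of the Venkatakrishnan limiter; the function names, the default value of the constant K, and the choice of one limiter value per cell (the minimum over its faces) are assumptions for this example.

```python
def venkat_phi(d1_max, d1_min, d2, eps2):
    """Venkatakrishnan limiter value for one face.

    d1_max = U_max - U_i and d1_min = U_min - U_i, taken over the cell and
    its neighbors; d2 is the unlimited increment grad(U_i) . r at the face;
    eps2 = (K * h)**3 controls how much limiting is applied in smooth regions.
    """
    if d2 == 0.0:
        return 1.0
    d1 = d1_max if d2 > 0.0 else d1_min
    num = (d1 * d1 + eps2) * d2 + 2.0 * d2 * d2 * d1
    den = d1 * d1 + 2.0 * d2 * d2 + d1 * d2 + eps2
    return num / den / d2


def reconstruct(u_i, grad_i, r_faces, u_neighbors, K=1.0, h=1.0):
    """Return the limited face values U_i + Phi_i * (grad(U_i) . r) and Phi_i."""
    eps2 = (K * h) ** 3
    d1_max = max([u_i] + list(u_neighbors)) - u_i
    d1_min = min([u_i] + list(u_neighbors)) - u_i
    # Unlimited increments from the cell center to each face midpoint.
    incs = [sum(g * r for g, r in zip(grad_i, rf)) for rf in r_faces]
    # One limiter value per cell: the most restrictive over all faces.
    phi = min(venkat_phi(d1_max, d1_min, d2, eps2) for d2 in incs)
    return [u_i + phi * d2 for d2 in incs], phi
```

At a local extremum (d1_max = 0 with a positive increment, and eps2 = 0) the limiter returns zero, so the reconstruction falls back to first order there.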
2.2 Mesh generation
The process for generating an adaptive Cartesian grid is shown in Fig. 2. The entire process is highly parallelized and automated: the user only needs to specify the input geometry file, the computational domain and the maximum level of refinement. Figure 3 shows how the grid changes during the generation process for the DLR-F6 model. The second and fifth steps of the process are critical to both the efficiency of grid generation and the quality of the grid. Therefore, the detailed strategies for these two steps are given below.
The second step in the process is to determine the intersection of the cells and the object surface. For complex three-dimensional geometries, the number of cells in a high-quality Cartesian grid often reaches the order of tens of millions. In such a situation, an efficient algorithm for detecting the intersection of the object surface and a cell is essential. Here, a fast intersection test [23] based on the axis-aligned bounding box (AABB) theorem is employed. It is based on Plücker coordinates and tests the ray against the silhouette of the AABB, instead of testing against the individual faces of the box or comparing intersection intervals. The algorithm uses only dot products and comparisons, whereas the classic algorithm requires division. This computational simplicity results in excellent performance. Once the cells that intersect the wall boundary have been identified, they are refined recursively until the maximum refinement level is reached, as shown in Fig. 3(c).
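For comparison, the classic interval-based ("slab") ray-box test that the paragraph above contrasts with can be sketched as follows. This is not the Plücker-coordinate test of Ref. [23]; it is the baseline that needs one division per axis, and the function name and argument layout are assumptions for this example.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Classic slab test: does the ray origin + t * direction (t >= 0) hit the box?

    Intersects the parameter intervals of the three axis-aligned slabs.
    Note the division per axis, which the Pluecker-based test avoids.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # the slab intervals do not overlap
    return True
```

A ray starting at (-1, 0.5, 0.5) and pointing in +x hits the unit box, while the same ray pointing in -x does not.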
The fifth step in Fig. 2 is to establish the transition zone of the grid. Refining only the intersecting cells may leave the cells in the boundary layer too large, so the obtained mesh needs to be refined further. The precise distance to the object surface is calculated for the cells of level R and R−1, where R is the maximum level of refinement. For the remaining cells, only an approximate distance to the object surface is calculated to save computation. The cells are then refined recursively if the distance satisfies the following relationship:
where D is the distance to the object surface, r is the level of the cell and h is the length of the cell. Then, the resulting grid fits the object surface model to a large extent, and the size of the grid is guaranteed in the boundary layer and nearby areas. Figure 3(d) shows the grid with transition zone.
In our experience, the maximum refinement level and the number of surface triangles of the object are the key factors that determine the mesh generation time. In order to capture the main flow phenomena of a specific flow, the minimum cell size should be estimated in advance, which also determines the maximum refinement level. The minimum size of the surface triangles should preferably be chosen to match this level, so that the mesh generation time is minimized.
After the entire process, a high-quality computational grid is obtained. The DLR-F6 model contains 16,280 surface triangles, and it takes about 600 seconds to generate the final grid of 15 million cells. For models with a more complex surface, such as the COVID-19 model with 188,280 surface triangles, it takes about 500 seconds to generate a grid of 3 million cells. The final adaptive grid is shown in Fig. 4. Furthermore, for arbitrary shapes, CABA can automatically and efficiently generate high-quality computational grids without any manual intervention. This is an important part of solving large-scale and complex problems on the Cartesian grid. The cases were tested on a server with two Intel(R) Xeon(R) E5-2680 V3 CPUs (48 cores).
The above process yields a high-quality grid around the geometry. However, such a grid alone is not sufficient for simulating complex flows. In order to accurately capture flow phenomena such as shock waves and vortices, we perform mesh refinement based on the characteristics of the flow field. The following criteria are mainly used in this paper to capture these flow structures: the divergence of the velocity, the curl of the velocity, or both. Their specific expressions are as follows [29]:
$$ \tau_{ci}=|\nabla \times V|h_{i}^{\frac{r+1}{r}}, \tau_{di}=|\nabla \cdot V|h_{i}^{\frac{r+1}{r}}, \sigma_{c}=\sqrt{\frac{\sum_{i=1}^{N}{\tau_{ci}^{2}}}{N}}, \sigma_{d}=\sqrt{\frac{\sum_{i=1}^{N}{\tau_{di}^{2}}}{N}}, $$
(6)
where N is the total number of cells and hi is the length scale of cell i, computed as \(h_{i}=\sqrt [r]{\Omega _{i}}\) with \(\Omega _{i}\) being the volume of the cell; r here denotes the number of spatial dimensions, so that hi has the dimension of a length. We use the standard deviations of the divergence and the curl as the sensors, and the adaptation conditions are:
(1) refine: when \(\tau_{ci}>w_{1}\sigma_{c}\) or \(\tau_{di}>w_{2}\sigma_{d}\),
(2) coarsen: when \(\tau_{ci}<w_{3}\sigma_{c}\) and \(\tau_{di}<w_{4}\sigma_{d}\),
where wi (i=1,2,3,4) are problem-dependent adjustable coefficients.
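The sensors of Eq. (6) and the two conditions above can be sketched as follows. This is a minimal example under stated assumptions: the per-cell curl magnitude and divergence are taken as given inputs, and the function name, the argument layout and the "keep" label for cells that satisfy neither condition are choices made for this illustration.

```python
import math


def adaptation_flags(curl_mag, div_val, volumes, w, r=3):
    """Flag each cell as 'refine', 'coarsen' or 'keep' following Eq. (6).

    curl_mag[i] = |curl V| and div_val[i] = div V in cell i,
    volumes[i] = cell volume, w = (w1, w2, w3, w4), r = spatial dimension.
    """
    w1, w2, w3, w4 = w
    n = len(volumes)
    expo = (r + 1.0) / r
    h = [vol ** (1.0 / r) for vol in volumes]  # cell length scale h_i
    tau_c = [abs(c) * hi ** expo for c, hi in zip(curl_mag, h)]
    tau_d = [abs(d) * hi ** expo for d, hi in zip(div_val, h)]
    sigma_c = math.sqrt(sum(t * t for t in tau_c) / n)
    sigma_d = math.sqrt(sum(t * t for t in tau_d) / n)

    flags = []
    for tc, td in zip(tau_c, tau_d):
        if tc > w1 * sigma_c or td > w2 * sigma_d:
            flags.append("refine")
        elif tc < w3 * sigma_c and td < w4 * sigma_d:
            flags.append("coarsen")
        else:
            flags.append("keep")
    return flags
```

With four unit cells, of which only the last carries vorticity and dilatation, the last cell is flagged for refinement and the quiescent cells for coarsening.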
After these two kinds of adaptation, large-scale, high-quality meshes are generated for arbitrary complex shapes, and the steady and unsteady flow phenomena are captured automatically through mesh refinement. Note that the whole process is automatic and efficient, without manual intervention.
2.3 Parallel computing of adaptive Cartesian grid
Since the adaptive Cartesian grid is continuously refined and coarsened according to the flow characteristics, it poses many difficulties for large-scale high-performance computing, such as load balancing, reduction of communication cost, and search algorithms. In addition, the tree structure used in AMR is generally stored in linear arrays to increase efficiency. However, this method leads to poor cache locality, which makes it difficult to parallelize. Up to now, there are many cell-based parallel AMR libraries, such as CHOMBO [16], Dendro [30], and p4est [17, 18]. Among them, only p4est does not have strict modularity restrictions. Therefore, in this paper, the open-source library p4est [17, 18] is employed in the in-house CABA solver.
In the solver, multiple original trees represent a discretization of the physical space Ω. The trees define a macro layer, their refined cells define a micro layer, and these two layers make up the domain. The data in the domain is stored in a linear tree structure, whose order is determined by the Z-order curve (a space-filling curve). A property shared by all kinds of space-filling curves, called compactness, means that cells that are close along the curve index are also close in the Cartesian grid. Thus, the Z-order curve provides an efficient way of partitioning data for load balancing. Meanwhile, it helps to number the nodes by managing the data memory layout in p4est. As shown in Fig. 5, the Z-order curve covers both the macro layer and the micro layer, which gives a one-to-one mapping from the spatial coordinates to the index in the linear tree storage. The figure also shows the order of the index and the load balancing between processes (different colors indicate different processes).
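The idea of ordering cells along a Z-order curve and cutting the resulting list into contiguous pieces can be sketched as follows. This is an illustration for a uniform grid only, not the p4est implementation; the function names, the bit convention (x in the lowest bit) and the equal-count partition are assumptions for this example.

```python
def morton3d(ix, iy, iz, bits=10):
    """Z-order (Morton) index obtained by interleaving the bits of (ix, iy, iz).

    Assumed convention: x occupies the lowest bit of each 3-bit group.
    """
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def partition(cells, nproc):
    """Sort cells along the Z-order curve and cut them into nproc contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton3d(*c))
    n = len(ordered)
    return [ordered[p * n // nproc:(p + 1) * n // nproc] for p in range(nproc)]


# Usage: the eight cells of a 2x2x2 block split between two processes.
cells = [(i, j, k) for k in range(2) for j in range(2) for i in range(2)]
chunks = partition(cells, 2)
```

Because neighboring indices along the curve tend to be neighboring cells in space, each chunk forms a spatially compact piece of the grid; in this small usage example the first process receives the four cells with k = 0.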
2.4 Ghost-cell method
For Cartesian grids, an immersed boundary method is generally used to simulate flow problems because the grid lines are not always aligned with the body [5, 6]. Figure 6 shows a schematic diagram of the ghost-cell method in a two-dimensional case. For closed curves, the Cartesian cells are classified into three categories by the ray-tracing technique [31]: flow cells, which are completely inside the fluid; boundary cells, which intersect the wall boundary; and solid cells, which are completely inside the solid. The primitive variables of the ghost-cells are determined by the variables at the symmetric point.
For example, the symmetric point of ghost-cell A is on the extension line of AD, where D is the point on the body surface closest to A. Then C is the symmetric point of cell A, obtained by mirroring A about D, and the primitive variables at the symmetric point are interpolated from the cell in which it is located. By using a first-order pressure extrapolation, the wall pressure is taken as the value associated with the nearest cell center, which means the normal pressure gradient is zero. Therefore, the relationship can be expressed as:
$$ P_{A}=P_{D}=P_{C}, \rho_{A}=\rho_{D}=\rho_{C}, $$
(7)
where PD and ρD represent the wall pressure and the wall density at the point D. For inviscid flow, the classical non-penetration and slip wall boundary conditions are considered, which give:
$$ V_{t,A}=V_{t,C}, V_{n,A}=-V_{n,C}, $$
(8)
For viscous flow, the no-slip wall boundary condition is considered and the equations are:
$$ V_{t,A}=-V_{t,C}, V_{n,A}=-V_{n,C}, $$
(9)
In this way, a relationship between the variables of ghost-cell A and symmetric point C is established [7, 12]. It should be noted that for high Reynolds number compressible flow, a wall function method is needed to deal with the boundary conditions on the object surface. This part is still under development and will not be introduced in this article.
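Equations (7)-(9) can be sketched as follows, assuming the state at the symmetric point C has already been interpolated and the unit wall normal at D is known. The function names and the tuple-based data layout are assumptions made for this illustration.

```python
def mirror_point(a, d):
    """Symmetric point C of ghost-cell center A about the surface point D: C = 2D - A."""
    return tuple(2.0 * di - ai for ai, di in zip(a, d))


def ghost_state(p_c, rho_c, vel_c, normal, viscous):
    """Ghost-cell state (P_A, rho_A, V_A) from the state at the symmetric point C.

    normal is the unit wall normal at D. Pressure and density are copied
    (Eq. (7)). The normal velocity is always reversed; the tangential
    velocity is kept for a slip wall (Eq. (8)) and reversed for a
    no-slip wall (Eq. (9)).
    """
    vn = sum(v * n for v, n in zip(vel_c, normal))  # normal velocity component at C
    if viscous:
        vel_a = tuple(-v for v in vel_c)  # V_t and V_n both reversed
    else:
        vel_a = tuple(v - 2.0 * vn * n for v, n in zip(vel_c, normal))  # only V_n reversed
    return p_c, rho_c, vel_a
```

For a wall with normal (0, 1, 0) and a velocity (1, 2, 0) at C, the slip condition gives (1, -2, 0) in the ghost-cell and the no-slip condition gives (-1, -2, 0), so that the velocity interpolated at D has zero normal component in both cases and vanishes entirely in the viscous case.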
It should be emphasized that implementing the GCM under the MPI parallel framework raises some challenges. The first is that establishing the relationship between a ghost-cell and its symmetric point may become very difficult due to distributed storage. As shown in Fig. 7, ghost-cell A and symmetric point B are not in the same process, and point B is even beyond the ghost layer used for general parallel communication. In fact, for a three-dimensional grid of tens of millions of cells, as many as one hundred thousand ghost-cells might not be able to find their symmetric points in the local process. Especially in three dimensions, simply increasing the ghost layer to two layers cannot cover all possible symmetric point distributions, let alone the greatly increased communication cost. The second challenge is that the multi-valued ghost-cell method [22] used to handle thin objects requires additional sets of data to be stored in each ghost-cell. This approach increases the size of the data structure of all cells several times. As a result, the communication cost increases greatly, while only the part of the delivered information that lies around the thin object is useful.
For the first challenge, in order to efficiently transfer information between ghost-cell and symmetric point after parallelization, we established a special point-to-cell relationship for each group of ghost-cell and symmetric point that are not in the same process. As long as the grid does not change, these relationships will not change, so the information can be delivered efficiently. The specific process is as follows:
- Collect all the symmetric points that cannot be found locally and search for them globally. The coordinates of these points need to be temporarily shared globally.
- Establish unique relationships between the original processes and the processes where the symmetric points are located. For a three-dimensional grid of tens of millions of cells, there might be thousands of relationships that need to be established for each process.
- Dynamically allocate storage space for the information to be sent and received according to the established relationships.
- Connect the symmetric points, the ghost-cells and these storage spaces through pointers.
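The steps above can be sketched in serial form as follows. This is only an illustration of the bookkeeping, not the CABA implementation: the global search is assumed to have already produced the owner rank of each symmetric point, plain lists stand in for the dynamically allocated buffers, a (rank, index) pair stands in for the pointer, and all names are hypothetical.

```python
from collections import defaultdict


def build_exchange_plan(my_rank, ghost_to_owner):
    """Group remote symmetric points by owning rank and allocate receive buffers.

    ghost_to_owner maps a ghost-cell id to the rank that owns the cell
    containing its symmetric point (assumed result of the global search).
    Returns (plan, buffers, slot):
      plan[rank]    -> ghost-cell ids whose symmetric point lives on that rank
      buffers[rank] -> one receive slot per ghost-cell in plan[rank]
      slot[ghost]   -> (rank, index) locating the slot that feeds this ghost-cell
    """
    plan = defaultdict(list)
    for ghost_id, owner in sorted(ghost_to_owner.items()):
        if owner != my_rank:  # local symmetric points need no communication
            plan[owner].append(ghost_id)
    buffers = {owner: [None] * len(ids) for owner, ids in plan.items()}
    slot = {gid: (owner, k) for owner, ids in plan.items() for k, gid in enumerate(ids)}
    return dict(plan), buffers, slot


# Usage: rank 0 owns ghost-cells 10..13; three symmetric points are remote.
plan, buffers, slot = build_exchange_plan(0, {10: 0, 11: 2, 12: 1, 13: 2})
```

Since the plan depends only on the grid and its partition, it can be reused at every time step until the grid changes, so each step needs a single exchange per neighboring rank.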
In this way, only one simple communication is required for each time step, and all the information required by the GCM is available. However, unsteady problems involving complex flow require AMR. The continuous change of the grid means that the above relationships need to be rebuilt frequently, which requires frequent searches for points. Thus, in order to reduce the computational cost, we optimized the subsequent point-finding logic. Because recursive mesh refinement is prohibited during iteration in CABA, an adaptation does not cause drastic changes in the mesh partition. The IDs of the processes from which information was received during the first point search are recorded. Searching these "neighboring processes" first in each subsequent point search effectively reduces the computational cost.
We now turn to the second challenge encountered when parallelizing the GCM. Treating thin bodies, such as the trailing edge of an airfoil or the leading edge of a delta wing, has always been a difficulty of the GCM. If the thickness of a body becomes smaller than 1.5 times the cell size, some ghost-cells have to handle both sides of the body, as shown in Fig. 7. Cell A inside the geometry is the ghost-cell for the upper side of the corner surface with symmetric point B, as well as for the lower side with symmetric point C. Ignoring the multi-value points will cause unavoidable errors in the flow simulation of shapes such as a delta wing. A multi-valued ghost-cell [22] is usually employed to handle this problem. By sweeping in the three coordinate directions, the ghost-cell A can hold sets of properties computed from both sides of the trailing edge. In a three-dimensional problem, a ghost-cell may have 3, 4 or even more symmetric points. This method needs to allocate storage space for all possible data of all ghost-cells, which greatly increases the cost of parallel communication, while only a small part of the additional information is used in the simulation of the flow field near the multi-value points.
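The neighbor-first search described above can be sketched as follows. In this simplified illustration each process's subdomain is represented by an axis-aligned box, which is an assumption (a Z-order partition is generally not box-shaped), and the function and variable names are hypothetical.

```python
def find_owner(point, rank_boxes, cached_ranks):
    """Return the rank whose subdomain contains point, trying cached ranks first.

    rank_boxes maps rank -> (box_min, box_max), a stand-in for the real
    subdomain test. cached_ranks is a list of ranks found in earlier
    searches; it is updated in place when a new owner is discovered.
    """
    def contains(rank):
        lo, hi = rank_boxes[rank]
        return all(l <= x < h for x, l, h in zip(point, lo, hi))

    # Fast path: ranks that supplied symmetric points before.
    for rank in cached_ranks:
        if contains(rank):
            return rank
    # Fallback: global search over the remaining ranks.
    for rank in rank_boxes:
        if rank not in cached_ranks and contains(rank):
            cached_ranks.append(rank)
            return rank
    return None
```

After the first search has populated the cache, most later queries are resolved in the fast path, as long as the adaptation does not change the partition drastically.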
In order to enhance the accuracy and robustness of the algorithm for three-dimensional thin shapes, the multi-valued method is improved. We collect the intersecting surfaces of all surrounding boundary cells and search for the symmetric point of each surface. By matching the vectors from the center of the cell to the symmetric points with the normal vectors of the cell faces, each ghost-cell can match up to 6 symmetric points in the three-dimensional case. The information of a local symmetric point is accessed by pointer, and the information of a symmetric point in another process is passed through the point-to-cell relationship mentioned above.
In this way, CABA can obtain as much information as possible to fit the surface of the object when simulating the flow field. Compared with no special treatment of the multi-value points, this method preserves the fidelity of the flow simulation near thin objects. This is of great significance for dealing with three-dimensional pointed objects such as supersonic waveriders.
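The matching of symmetric points to the six cell faces can be sketched as follows. The selection rule used here (assign each point to the face whose outward normal is best aligned with the vector from the cell center, and keep the best-aligned point per face) is an assumption made for illustration; the text does not specify how ties or multiple candidates are resolved.

```python
import math

# Outward unit normals of the six faces of a Cartesian cell.
FACE_NORMALS = {
    "-x": (-1.0, 0.0, 0.0), "+x": (1.0, 0.0, 0.0),
    "-y": (0.0, -1.0, 0.0), "+y": (0.0, 1.0, 0.0),
    "-z": (0.0, 0.0, -1.0), "+z": (0.0, 0.0, 1.0),
}


def match_symmetric_points(center, points):
    """Assign candidate symmetric points to cell faces, at most one per face."""
    best = {}  # face -> (cosine of the angle to the face normal, point)
    for p in points:
        v = tuple(pi - ci for pi, ci in zip(p, center))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue  # point coincides with the cell center
        dots = {f: sum(a * b for a, b in zip(v, n)) for f, n in FACE_NORMALS.items()}
        face = max(dots, key=dots.get)
        cos = dots[face] / norm
        if face not in best or cos > best[face][0]:
            best[face] = (cos, p)
    return {face: p for face, (cos, p) in best.items()}
```

A ghost-cell inside a thin plate lying in the x-z plane would typically end up with one symmetric point matched to "+y" and one to "-y", corresponding to the upper and lower sides of the body.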