2.1 Governing equations and numerical approach
The compressible Navier-Stokes equations are considered in integral conservation form, which can be written as follows:
$$ \frac{\partial}{\partial{t}} \int_{\Omega} \mathbf{W}\,d\Omega + \oint_{\partial{\Omega}} (\mathbf{F}_{c} - \mathbf{F}_{v})\, d\mathbf{S}=0, $$
(1)
where ∂Ω is the boundary of the control volume, W is the vector of conserved variables, and F_{c} and F_{v} are the vectors of the inviscid and viscous fluxes, respectively. The vectors are given as:
$$ \begin{aligned} \mathbf{W}=\left[\begin{array}{c}\rho\\\rho{u}\\\rho{v}\\\rho{w}\\\rho{E} \end{array}\right], \quad \mathbf{F}_{c}=\left[\begin{array}{c}\rho{V}\\\rho{u}{V}+n_{x}{P}\\\rho{v}{V}+n_{y}{P}\\\rho{w}{V}+n_{z}{P}\\\rho{H}{V} \end{array}\right], \quad \mathbf{F}_{v}=\left[\begin{array}{c}{0}\\n_{x}{\tau_{{x}{x}}} + n_{y}{\tau_{{x}{y}}} + n_{z}{\tau_{{x}{z}}}\\ n_{x}{\tau_{{y}{x}}} + n_{y}{\tau_{{y}{y}}} + n_{z}{\tau_{{y}{z}}}\\ n_{x}{\tau_{{z}{x}}} + n_{y}{\tau_{{z}{y}}} + n_{z}{\tau_{{z}{z}}}\\ n_{x}{\Theta_{x}} + n_{y}{\Theta_{y}} + n_{z}{\Theta_{z}} \end{array}\right], \end{aligned} $$
(2)
where ρ is the density; u, v, w are the velocity components in the x, y, z directions, respectively; V = n_{x}u + n_{y}v + n_{z}w is the velocity normal to the face; P is the pressure; E and H are the total energy and total enthalpy per unit mass. τ_{ij} are the components of the viscous stress tensor for Newtonian fluids, defined as:
$$ \begin{aligned} &\tau_{{x}{x}}=\lambda{(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}) + 2{\mu}\frac{\partial{u}}{\partial{x}}},\\ &\tau_{{y}{y}}=\lambda{(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}) + 2{\mu}\frac{\partial{v}}{\partial{y}}},\\ &\tau_{{z}{z}}=\lambda{(\frac{\partial{u}}{\partial{x}} + \frac{\partial{v}}{\partial{y}} + \frac{\partial{w}}{\partial{z}}) + 2{\mu}\frac{\partial{w}}{\partial{z}}},\\ &\tau_{{x}{y}}=\tau_{{y}{x}}=\mu(\frac{\partial{u}}{\partial{y}} + \frac{\partial{v}}{\partial{x}}),\\ &\tau_{{x}{z}}=\tau_{{z}{x}}=\mu(\frac{\partial{u}}{\partial{z}} + \frac{\partial{w}}{\partial{x}}),\\ &\tau_{{y}{z}}=\tau_{{z}{y}}=\mu(\frac{\partial{v}}{\partial{z}} + \frac{\partial{w}}{\partial{y}}), \end{aligned} $$
(3)
where μ is the molecular viscosity coefficient, calculated by the Sutherland law [24], and λ = −(2/3)μ under the Stokes hypothesis.
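As a concrete illustration (not part of the CABA solver), Sutherland's law and the Stokes hypothesis can be evaluated as follows; the constants are the standard reference values for air, which are assumptions here rather than values stated in [24]:

```python
def sutherland_viscosity(T, mu_ref=1.716e-5, T_ref=273.15, S=110.4):
    """Dynamic viscosity mu(T) from Sutherland's law (SI units).

    mu_ref, T_ref and S are the usual air constants; substitute other
    values for other gases.
    """
    return mu_ref * (T / T_ref) ** 1.5 * (T_ref + S) / (T + S)

def second_viscosity(mu):
    """Second viscosity coefficient lambda = -(2/3) mu (Stokes hypothesis)."""
    return -2.0 / 3.0 * mu
```

At the reference temperature the formula returns the reference viscosity, and viscosity grows with temperature, as expected for a gas.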
For the compressible Euler/Navier-Stokes equations, the flow states need to be reconstructed on the left and right sides of an interface between neighboring control volumes, as sketched in Fig. 1. The governing equations are discretized using the finite volume formulation, and a cell-centered, second-order method is used in this paper [24]:
$$ \mathbf{U}_{L}=\mathbf{U}_{i} + \mathbf{\Phi}_{i}\cdot(\nabla \mathbf{U}_{i}\cdot \mathbf{r}_{L}), \mathbf{U}_{R}=\mathbf{U}_{j} + \mathbf{\Phi}_{j}\cdot(\nabla \mathbf{U}_{j}\cdot \mathbf{r}_{R}), $$
(4)
where r_{L} and r_{R} are the vectors from the left and right cell centers to the face midpoint. ∇U_{i} and ∇U_{j} are the gradients at cell centers i and j, both calculated by the Green-Gauss method [25] in this paper. Φ_{i} and Φ_{j} are the limiters for cells i and j, both evaluated with the Venkatakrishnan limiter [26]. The inviscid flux F_{c} at each cell interface is computed by the HLLC scheme developed by Toro et al. [27], and the viscous flux F_{v} is approximated by a second-order accurate central-difference scheme [28]. The solution is advanced in time with the explicit three-stage, third-order Runge-Kutta method [24], and the CFL number is set to 0.8 for all examples in this paper.
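A minimal scalar sketch of the limited reconstruction in Eq. (4) with the Venkatakrishnan limiter follows. The function names, the fixed ε² constant (the original formulation uses ε² = (Kh)³), and the neighbor-based admissible variation are illustrative assumptions, not the exact implementation of [26]:

```python
import numpy as np

def venkat(delta_adm, delta_face, eps2=1e-12):
    """Venkatakrishnan limiter function phi.

    delta_adm  : admissible variation (neighbor max/min minus cell value)
    delta_face : unlimited face extrapolation, grad(u) . r
    """
    if delta_face == 0.0:
        return 1.0
    num = delta_adm**2 + eps2 + 2.0 * delta_adm * delta_face
    den = delta_adm**2 + 2.0 * delta_face**2 + delta_adm * delta_face + eps2
    # clamp: phi can go negative at a local extremum
    return max(0.0, num / den)

def reconstruct_face(u_i, grad_i, r_L, u_neighbours):
    """Limited second-order face value: U_L = U_i + Phi_i (grad U_i . r_L)."""
    d_face = float(np.dot(grad_i, r_L))
    if d_face > 0.0:
        d_adm = max(u_neighbours) - u_i
    else:
        d_adm = min(u_neighbours) - u_i
    return u_i + venkat(d_adm, d_face) * d_face
```

On smooth data the limiter stays near one and the full gradient is used; near an extremum it reduces the extrapolation so the face value does not overshoot the neighbor range.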
2.2 Mesh generation
The process for generating an adaptive Cartesian grid is shown in Fig. 2. The entire process is highly parallelized and automated. All the user needs to do is specify the input geometry file, the computational domain, and the maximum refinement level. Figure 3 shows how the grid evolves during the generation process for the DLR-F6 model. It should be emphasized that the second and fifth steps of the process are critical to the efficiency of grid generation and to the quality of the resulting grid; the detailed strategies for these two steps are therefore given below.
The second step in the process is to determine the intersection of a cell with the object surface. For complex three-dimensional geometries, the number of cells in a high-quality Cartesian grid often reaches tens of millions. In such cases, an efficient algorithm for judging whether a cell intersects the object surface is essential. Here, a fast intersection test [23] based on the axis-aligned bounding box (AABB) is employed in our work. It uses Plücker coordinates and tests the ray against the silhouette of the AABB, instead of testing against individual faces of the box or comparing intersection intervals. The algorithm requires only dot products and comparisons, whereas the classic algorithm requires divisions; this computational simplicity results in excellent performance. After quickly identifying the cells that intersect the wall boundary, these cells are refined recursively until the maximum refinement level is reached, as shown in Fig. 3(c).
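For reference, the classic division-based alternative that the Plücker-coordinate silhouette test of [23] improves upon can be sketched as the standard slab test; note this is explicitly not the algorithm used in the paper, only the conventional baseline it is compared against:

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Classic slab test: does the ray origin + t*direction (t >= 0)
    hit the axis-aligned box?  One division per axis, which is exactly
    the cost the silhouette test avoids.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-300:          # ray parallel to this pair of slabs
            if o < lo or o > hi:     # outside the slab: miss
                return False
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:           # slab intervals do not overlap
            return False
    return True
```

A ray aimed through the box reports a hit; one offset to the side reports a miss.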
The fifth step in Fig. 2 is to establish the transition zone of the grid. Refining only the intersecting cells may leave the cell size in the boundary layer too large, so the mesh needs further refinement. The precise distance to the object surface is computed for cells at levels R and R−1, where R is the maximum refinement level. For the remaining cells, only a rough distance to the object surface is computed, to save computation. The cells are then recursively refined if the distance satisfies the following relationship:
where D is the distance to the object surface, r is the refinement level of the cell, and h is the cell size. The resulting grid then fits the object surface closely, and the cell size in the boundary layer and nearby regions is controlled. Figure 3(d) shows the grid with the transition zone.
In our experience, the maximum refinement level and the number of surface elements on the object are the key factors determining the mesh generation time. In order to capture the main flow phenomena of a specific flow, the minimum cell size should be estimated in advance, which fixes the maximum refinement level. The minimum size of the triangular surface mesh should preferably be chosen to match that level, so that the mesh generation time is minimized.
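The back-of-the-envelope estimate described above can be written down explicitly. This is a sketch under the assumption (not stated in the paper) that the root cell spans the computational domain and each refinement level halves the cell size:

```python
import math

def max_refinement_level(domain_size, h_min):
    """Smallest level R such that a root cell of size `domain_size`,
    halved R times, is no larger than the target cell size h_min:
        domain_size / 2**R <= h_min.
    """
    return max(0, math.ceil(math.log2(domain_size / h_min)))
```

For example, a 100 m domain with a target minimum cell size of 0.01 m needs R = 14, since 100/2^13 ≈ 0.0122 is still too coarse while 100/2^14 ≈ 0.0061 suffices.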
After the entire process, a high-quality computational grid is obtained. For the DLR-F6 model, the surface contains 16,280 triangular elements, and it takes about 600 seconds to generate the final grid of 15 million cells. For models with more complex surfaces, such as the COVID-19 model with 188,280 triangular surface elements, generating 3 million cells takes about 500 seconds. The final adaptive grid is shown in Fig. 4. Furthermore, for arbitrary shapes, CABA can automatically and efficiently generate high-quality computational grids without any manual intervention, which is essential for solving large-scale, complex problems on Cartesian grids. The cases were tested on a server with two Intel(R) Xeon(R) E5-2680 v3 CPUs (48 cores).
The grids obtained from the above process are of high quality, but on their own they are not sufficient for simulating complex flows. In order to accurately capture flow features such as shock waves and vortices, we perform mesh refinement based on the characteristics of the flow field. The following criteria are mainly used in this paper to capture special flow structures: the divergence of the velocity, the curl of the velocity, or both. Their expressions are as follows [29]:
$$ \tau_{ci}=\left|\nabla \times \mathbf{V}\right| h_{i}^{\frac{r+1}{r}}, \quad \tau_{di}=\left|\nabla \cdot \mathbf{V}\right| h_{i}^{\frac{r+1}{r}}, \quad \sigma_{c}=\sqrt{\frac{\sum_{i=1}^{N}{\tau_{ci}^{2}}}{N}}, \quad \sigma_{d}=\sqrt{\frac{\sum_{i=1}^{N}{\tau_{di}^{2}}}{N}}, $$
(6)
where N is the total number of cells, r is the spatial dimension, and h_{i} is the length scale of cell i, computed as \(h_{i}=\sqrt [r]{\Omega _{i}}\) with Ω_{i} the volume of the cell. Here the standard deviations of divergence and curl are used as sensors, and the conditions can be described as:

(1) refine: when τ_{ci}>w_{1}σ_{c} or τ_{di}>w_{2}σ_{d};

(2) coarsen: when τ_{ci}<w_{3}σ_{c} and τ_{di}<w_{4}σ_{d},

where w_{i} (i=1,2,3,4) are adjustable coefficients chosen according to the problem.
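The sensor evaluation and the refine/coarsen tests above can be sketched as follows; the coefficient values in `w` are placeholders to be tuned per problem, and the precomputed curl/divergence magnitudes are passed in as arrays:

```python
import numpy as np

def adaptation_flags(curl_mag, div_mag, h, r, w=(1.0, 1.0, 0.5, 0.5)):
    """Refine/coarsen flags from the divergence and curl sensors.

    curl_mag, div_mag : |curl V| and |div V| per cell
    h                 : length scale h_i per cell
    r                 : spatial dimension (exponent (r+1)/r)
    w                 : adjustable coefficients w1..w4 (illustrative values)
    """
    scale = h ** ((r + 1.0) / r)
    tau_c = curl_mag * scale
    tau_d = div_mag * scale
    sigma_c = np.sqrt(np.mean(tau_c**2))   # standard-deviation sensors
    sigma_d = np.sqrt(np.mean(tau_d**2))
    refine = (tau_c > w[0] * sigma_c) | (tau_d > w[1] * sigma_d)
    coarsen = (tau_c < w[2] * sigma_c) & (tau_d < w[3] * sigma_d)
    return refine, coarsen
```

A cell whose sensor value stands far above the field-wide RMS is flagged for refinement; quiet cells below both coarsening thresholds are flagged for coarsening.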
After these two kinds of adaptation, large-scale, high-quality meshes for arbitrarily complex shapes are generated, and through flow-based refinement both steady and unsteady flow phenomena can be captured automatically. Note that the whole process is automatic and efficient, with no manual intervention.
2.3 Parallel computing of adaptive Cartesian grid
Since the adaptive Cartesian grid is continuously refined and coarsened with the flow, large-scale high-performance computing faces many difficulties, such as load balancing, reducing communication cost, and search algorithms. Generally, the tree structure used in AMR is stored in linear arrays to increase efficiency; however, this storage can suffer from poor cache locality, which makes it difficult to parallelize. To date, there are many cell-based parallel AMR libraries, such as CHOMBO [16], Dendro [30], and p4est [17, 18]. Among them, only p4est does not impose strict modularity restrictions. Therefore, in this paper, the open-source library p4est [17, 18] is employed in the in-house CABA solver.
In the solver, multiple original trees represent a discretization of the physical space Ω. The trees define a macro layer, their refined cells define a micro layer, and these two layers make up the domain. The data in the domain are stored in a linear tree structure ordered by the Z-order curve (a space-filling curve). The compactness property of space-filling curves means that cells that are contiguous along the curve index are also close in the Cartesian grid. Thus, the Z-order curve provides an efficient way of partitioning data for load balancing; it also helps to number the nodes by managing the data memory layout in p4est. As shown in Fig. 5, the Z-order curve covers both the macro layer and the micro layer, defining a one-to-one mapping from spatial coordinates to indices in the linear tree storage. The figure also shows the index ordering and the load balancing between processes (different colors denote different processes).
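The idea behind the Z-order index can be sketched by bit interleaving; this is a minimal two-dimensional illustration of the mapping, not p4est's actual implementation:

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of non-negative integer coordinates (x, y)
    into a 2-D Z-order (Morton) index: cells that are contiguous along
    the index tend to be close in space, which is what makes splitting
    the sorted index range into equal chunks a good load-balancing rule.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits -> even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits -> odd positions
    return code
```

Sorting all cells by `morton2d` and handing each process a contiguous slice of the sorted list is the partitioning scheme the Z-order curve enables.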
2.4 Ghost-cell method
For Cartesian grids, the immersed boundary method is generally adopted to simulate flow problems, because the grid lines are not aligned with the body [5, 6]. Figure 6 shows a schematic of the ghost-cell method in a two-dimensional case. For closed curves, the Cartesian cells are classified into three categories by the ray-tracing technique [31]: flow cells, completely inside the fluid; boundary cells, which intersect the wall boundary; and solid cells, completely inside the solid. The primitive variables of the ghost cells are determined from the variables at their symmetric points.
For example, the symmetric point of ghost cell A lies on the extension line of AD, where D is the closest point on the body surface to A. C is then the symmetric point of cell A, and the primitive variables at the symmetric point are interpolated from the cell containing it. Using a first-order pressure extrapolation, the wall pressure is taken as the value at the nearest cell center, i.e., the normal pressure gradient is zero. The relationship can therefore be expressed as:
$$ P_{A}=P_{D}=P_{C}, \rho_{A}=\rho_{D}=\rho_{C}, $$
(7)
where P_{D} and ρ_{D} represent the wall pressure and wall density at point D. The classical non-penetration, slip-wall boundary condition is then applied for inviscid flow, giving:
$$ V_{t,A}=V_{t,C}, \quad V_{n,A}=-V_{n,C}. $$
(8)
For viscous flow, the no-slip wall boundary condition is applied instead:
$$ V_{t,A}=-V_{t,C}, \quad V_{n,A}=-V_{n,C}. $$
(9)
A relationship between the variables of ghost cell A and symmetric point C is thus established [7, 12]. It should be noted that for high-Reynolds-number compressible flow, a wall-function method is needed to handle the wall boundary conditions; this part is still under development and is not covered in this article.
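The standard slip/no-slip mirroring described above can be sketched as follows. This is an illustrative decomposition into wall-normal and tangential components (function and variable names are ours), with pressure and density simply copied from the symmetric point:

```python
import numpy as np

def ghost_state(rho_c, p_c, v_c, n, viscous):
    """Ghost-cell state from the symmetric-point state (rho_c, p_c, v_c).

    n is the unit wall normal.  Pressure and density are copied (zero
    normal gradient at the wall); the normal velocity is reflected so
    the wall-normal velocity vanishes at the wall, and for viscous
    no-slip walls the tangential velocity is reflected as well.
    """
    v_n = np.dot(v_c, n) * n        # wall-normal component
    v_t = v_c - v_n                 # tangential component
    if viscous:
        v_g = -v_t - v_n            # no-slip: full reflection
    else:
        v_g = v_t - v_n             # slip: reflect normal part only
    return rho_c, p_c, v_g
```

Averaging the ghost and symmetric-point velocities then yields zero normal velocity at the wall in the inviscid case, and zero total velocity in the viscous case.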
In particular, it should be emphasized that implementing the GCM under an MPI parallel framework raises some challenges. One is that establishing the relationship between a ghost cell and its symmetric point may become very difficult due to distributed storage. As shown in Fig. 7, ghost cell A and symmetric point B are not in the same process; point B even lies beyond the ghost layer used for general parallel communication. In fact, for a three-dimensional grid of tens of millions of cells, a hundred thousand ghost cells may fail to find their symmetric points within the local process. Especially in three dimensions, simply widening the ghost layer to two layers cannot cover all possible symmetric-point distributions, let alone the greatly increased communication cost. The second challenge is that the multi-valued ghost-cell method [22] used to handle thin objects requires additional sets of data to be stored in each ghost cell. This enlarges the data structure of all cells several times over, greatly increasing communication costs, even though only the part of the transferred information near the thin object is actually useful.
For the first challenge, in order to transfer information between ghost cells and symmetric points efficiently after parallelization, we establish a dedicated point-to-cell relationship for each pair of ghost cell and symmetric point that do not reside in the same process. As long as the grid does not change, these relationships do not change, so the information can be delivered efficiently. The specific process is as follows:

(1) Collect all the symmetric points that cannot be found locally and search for them globally; the coordinates of these points are temporarily shared among all processes.

(2) Establish unique relationships between the original processes and the processes where the symmetric points are located. For a three-dimensional grid of tens of millions of cells, thousands of relationships may need to be established for each process.

(3) Dynamically allocate storage space for the information to be sent and received, according to the established relationships.

(4) Connect the symmetric points, the ghost cells, and these storage spaces through pointers.
In this way, only one simple communication is required per time step, and all the information required by the GCM becomes available. For unsteady problems involving complex flow, however, AMR causes the grid to change continuously, so the above relationships must be rebuilt frequently, which requires repeated point searches. To reduce this cost, we optimized the follow-up point-finding logic. Because recursive mesh refinement during iteration is prohibited in CABA, each adaptation does not drastically change the mesh partition. The IDs of the source processes found during the first point search are therefore recorded, and searching these "neighboring processes" first in each subsequent search effectively reduces the cost.
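The neighbor-process caching described above can be sketched as follows. This is a serial stand-in: the `owners` map (rank to the set of point keys that rank holds) models the distributed mesh, and no actual MPI calls are shown:

```python
class SymmetricPointLocator:
    """Sketch of the follow-up point-finding optimization: the ranks that
    answered earlier searches are cached, and later searches try those
    "neighboring processes" first before falling back to a global pass.
    """
    def __init__(self, owners):
        self.owners = owners        # rank -> set of point keys (stand-in)
        self.known_ranks = set()    # source ranks recorded so far

    def find(self, point):
        # cheap pass: ranks that supplied points previously
        for rank in self.known_ranks:
            if point in self.owners[rank]:
                return rank
        # expensive pass: global search, then remember the source rank
        for rank, pts in self.owners.items():
            if point in pts:
                self.known_ranks.add(rank)
                return rank
        return None
```

After the first global search, repeated lookups against the same neighboring ranks are resolved by the cheap pass alone.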
Next, the second challenge encountered when parallelizing the GCM is discussed. Treating thin bodies, such as the trailing edge of an airfoil or the leading edge of a delta wing, has always been a difficulty for the GCM. If the thickness of a body becomes smaller than 1.5 times the cell size, some ghost cells have to handle both sides of the body, as shown in Fig. 7. Cell A inside the geometry is the ghost cell for the upper side of the corner surface with symmetric point B, as well as for the lower side with symmetric point C. Ignoring such multi-valued points causes unavoidable errors in flow simulations of shapes such as delta wings. A multi-valued ghost-cell method [22] is usually employed to handle this problem: by sweeping in the three coordinate directions, ghost cell A obtains sets of properties computed from both sides of the trailing edge. In a three-dimensional problem, a ghost cell may have three, four, or even more symmetric points. This method must allocate storage for all possible data of all ghost cells, which greatly increases the cost of parallel communication, while only a small fraction of the additional information is used in simulating the flow near the multi-valued points.
In order to enhance the accuracy and robustness of the algorithm for thin three-dimensional shapes, the multi-valued method is improved. We collect the intersecting surfaces of all surrounding boundary cells and search for the symmetric point of each surface. By matching the vector from the cell center to each symmetric point against the normal vectors of the cell faces, each ghost cell can match up to six symmetric points in the three-dimensional case. Local symmetric-point information is accessed by pointer, and symmetric-point information from other processes is passed through the point-to-cell relationship described above.
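The face-normal matching can be sketched as follows; the function name, the first-hit tie-breaking rule, and the data layout are illustrative assumptions:

```python
import numpy as np

# The six axis-aligned outward face normals of a Cartesian cell.
FACE_NORMALS = np.array([[ 1, 0, 0], [-1, 0, 0],
                         [ 0, 1, 0], [ 0,-1, 0],
                         [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

def match_symmetric_points(cell_center, sym_points):
    """Assign each candidate symmetric point to the cell face whose
    outward normal best aligns with the direction from the cell center
    to the point.  Returns {face_index: point}; at most six points
    survive, one per face.
    """
    matched = {}
    for p in sym_points:
        d = np.asarray(p, dtype=float) - cell_center
        d = d / np.linalg.norm(d)
        face = int(np.argmax(FACE_NORMALS @ d))  # best-aligned face
        matched.setdefault(face, p)              # keep first hit per face
    return matched
```

Candidates on opposite sides of a thin body thus end up attached to opposite faces of the same ghost cell, instead of overwriting each other.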
In this way, CABA can use as much information as possible to fit the object surface when simulating the flow field. Compared with treating multi-valued points without special handling, this method preserves the fidelity of the flow simulation near thin objects, which is of great significance for three-dimensional pointed bodies such as supersonic waveriders.