An ultralight geometry processing library for parallel mesh refinement

In applications such as parallel mesh refinement, it remains a challenging issue to ensure the refined surface respects the original Computer-Aided Design (CAD) model accurately. In this paper, an ultralight geometry processing library is developed to resolve this issue effectively and efficiently. Here, we say the kernel is ultralight because it has a very small set of data-structures and algorithms by comparison with industrial-level geometry kernels. Within the library, a simplified surface boundary representation (B-rep) and a radial edge structure are developed respectively to depict the geometry model and the surface mesh, plus hash tables that record the connections between the geometry model and the surface mesh. Based on these data structures, a set of efficient algorithms are developed, which initializes the connection tables, projects a point back to the original geometry, etc. With these data-structure and algorithmic infrastructures set up, the callings of eight well-designed Application Programming Interfaces (APIs) are powerful enough to enable the parallel mesh refinement algorithm outputs a mesh respecting the input CAD model accurately. Numerical experiments will be finally presented to evaluate the performance of the overall parallel mesh refinement algorithm and the algorithms in relation with the developed library.

large-scale mesh generation. By comparison, the later approaches are suitable for shared memory architecture and its capability is limited by the size of available memories.
Parallel refinement is another simple but powerful approach to create large-scale meshes. With a coarse mesh as input, parallel refinement subdivides each element according to templates in parallel. Commonly, large-scale meshes contain different types of elements, for instance, the hybrid prism-tetrahedra mesh for viscous simulations, in which layered prismatic elements are aligned with domain boundaries, tetrahedral elements are filled in far fields, and a few pyramids are used in the transition region. It is challenging to create a hybrid prism-tetrahedra mesh having billions of elements by domain decomposition approaches. However, much larger meshes have been demonstrated by using parallel refinement techniques [11].
One drawback of refinement-based approaches is that the new generated surface meshes will deviate from the original geometry. Reconstructing a high-order representation locally is a solution [12]; however, its accuracy depends on how accurately the initial surface mesh approximates the original geometry. Another choice is to respect the original geometry. The core issue here is how to represent the geometry and implement computations such as mapping and projection efficiently on that geometry. Zhao et al. [13] employ OpenCascade (OCC) [14], an open-source computer-aided design (CAD) kernel to represent the original geometry and reuse the algorithms provided by the kernel to accomplish the projection procedure. However, using industrial-level CAD kernels like OCC in this context is too heavy, not only because their library size is too big and their functions are too redundancy, but also because their learning curve is very high. Meanwhile, these CAD kernels are designed for general applications and its efficiency is not acceptable in critical parallel applications [13,15].
In this study, we suggest the development of an ultralight geometry processing library (also called ultralight geometry kernel) for parallel mesh refinement. Here, we say the geometry library is ultralight because it has a very small set of data-structures and algorithms by comparison with industrial-level geometry kernels: 1. Data structures. A simplified surface boundary representation (B-rep) is used to record the topology objects. Ferguson curves and Coons surfaces, for their simplicity, are presently employed to define the curve and surface geometry [16][17][18][19][20]. The connections between B-rep objects (points/curves/faces) and surface mesh objects (vertices/edges/facets) are maintained to support the implementation of accurate and efficient projection algorithms [21]. 2. Algorithms. Two key algorithms are developed: one algorithm initializes the connection between surface B-rep objects and surface mesh objects, the other one projects a point back to the original geometry. Techniques to improve the robustness and efficiency of both algorithms will be depicted.
An essential set of programming interfaces (APIs) is provided to perform the data query and key geometry algorithms. Examples show that a few callings of these APIs are powerful enough to enable the refined mesh to respect the original geometry. We will demonstrate this desirable feature by applying the kernel in a refinement-based parallel mesh generator. It is worth noting that the similar approach can be naturally extended for applications such as high-order mesh generation [22,23] and surface mesh adaptation [24].
The following discussion is organized as below. In Section 2, the layered structure of the developed geometry kernel is introduced. In Sections 3 and 4, the focuses are on the implementation of data structures and key algorithms, respectively. In Section 5, the set of APIs for external callings is listed and how they are used in parallel mesh refinement is demonstrated. Section 6 presents numerical studies on the proposed approach. Section 7 summarizes the article and presents a few suggestions on future work.

Layered structure of the kernel
As seen in Fig. 1, the ultralight kernel is organized into three layers: 1. The data structures layer, in which the geometry model, the surface mesh and their connections are represented by a simplified surface B-rep, a radial-edge structure and some hash tables. 2. The algorithms layer, which consists of the algorithms projecting a point to a curve and a surface, the algorithm setting up the connections between the geometry model and the surface mesh, the input/out algorithms and the basic query algorithms of the fundamental data structures. 3. The APIs layer, which, for the users' convenience, consists of a set of user functions that are implemented by warping or combining the algorithms implemented in the algorithms layer.

The surface B-rep
The surface B-rep is introduced in [21], which refers to a subset of the solid B-rep and it includes three basic topology entities: face, curve and point, as illustrated in Fig. 2. Meanwhile, a specific topology entity named loop is used to limit the valid region of a face. Internally, a loop refers to a set of boundary curves and is a group entity that distinguishes from other topology entities. With respect to the geometric description of curves and surfaces, Ferguson curves and Coons surfaces [25,26] are selected for their simplicity and powerful capability for geometry representation. For completeness, their analytic definitions are presented as below.
A Ferguson curve is composed of many end-to-end connected curve segments. Each segment is analytically defined as below (see Fig. 3).
where P(0) and P(1) refer to starting and ending points of the segment, respectively, and P ' (0) and P ' (1) refer to tangent vectors at respective points. Given a set of interpolation points, the tangent vectors at these points could be computed by introducing two-order continuous conditions and two boundary conditions at the starting and ending points of the entire curve.
A Coons surface is composed of a matrix of surface patches, and each patch is analytically defined as below (see Fig. 4).
Illustration for the surface B-rep [21] where ) is a vector of values computed by four Hermite interpolation functions as below, .

Fig. 4 A Coons surface
Given a matrix of interpolation points, the partial differentials defined in the matrix A could be computed by introducing two-order continuous conditions and four boundary conditions at the corner points of the entire surface. Interested readers are referred to [25,26] for more details.
For simplicity, only one type of curve and one type of surface are supported in our geometry kernel, far fewer than that number supported in industry geometry kernels. For instance, in OpenCascade 9 types of curves and 11 types of surfaces are respectively supported. To use this library, it is necessary to convert other types of curves and surfaces to their Ferguson counterparts at first. The deviation error between the Coons representation and the original one can be controlled by the resolutions of sample points. Table 1 shows the deviation error between the Coons representation of a sphere with radius 50 and the analytic representation of the sphere. The error is evaluated by the distance between the sample points on Coons surface and the analytic surface. About 2 million points are uniformly sampled on the surface. Given the fact that the error is usually smaller by a few factors than the resolution of the required mesh, we say it is a reasonable compromise to use Coons surface instead of the more general but complex NURBS representation.

The data structure for surface mesh
Here, the data structure introduced in [27] is reused, in which the surface mesh is represented by a list of edges and a list of facets. The following codes present the data structures used to define surface edges and facets: To consider non-manifold cases, the number of facets adjacent to one edge is not fixed. Figure 5 explains how to link the adjacent faces of an edge cyclically, where e.faceH is the head of the link. For the case shown in Fig. 5, e is supposed to be the first edge of its adjacent faces; therefore, the first neighboring index of each face is used to point to the next face.
A frequently employed routine is the search of a facet with its three corner nodes as inputs. Its brute-force implementations need to traverse the facet list. Therefore, a hash table is created to improve the performance of this routine, where the smallest index of the corner nodes is the key value of a facet, and the facets having an identical key value form a backup list. Likely, this technique is used to speed up the routine that searches a surface edge by its corner nodes.

Classifications and reverse classifications
Three basic mappings between their topology entities are defined as follows to connect the B-rep and the surface mesh in [21]: 1. The face-facet mapping. A face corresponds to a set of facets. 2. The curve-edge mapping. A curve corresponds to a set of edges.

The point-vertex mapping. A point corresponds to a vertex.
Other mappings can be defined as well, e.g., between a curve and all vertices that lie on the curve, or between a face and all edges that bound the face. As these additional mappings can be derived from the basic mappings, they are not explicitly represented.
Two definitions are introduced below to describe the above mappings [21,28]:

Setting up connections between CAD model and surface mesh
This algorithm is employed to set up three basic mappings between a CAD model and a surface mesh. A bottom-up workflow is thus developed, which sets up the point-vertex mapping first, then the curve-edge mapping, and finally the face-facet mapping (see Fig. 6).
Here we depict the respective procedures setting up these three mappings in details as below.

Setting up point-vertex mapping
This procedure traverses all the CAD points and attempts to classify a surface mesh vertex on each point. Here, we say a vertex is classified on a point when their distance is smaller than a user-specified tolerance. The timing performance of a brute-force implementation is evidently unacceptable. An octree is used to speed up the computation presently.

Setting up curve-edge mapping
This procedure traverses all the CAD curves and attempts to classify a set of surface edges on each curve. Here, we say an edge is classified on a curve when the distances between the ending points of the edge and the curve are all smaller than a user-specified tolerance. The timing performance of a brute-force implementation by computing the distances of each pair of a curve and a mesh vertex is unacceptable. Algorithm 1 presents the improved version of this procedure.

Setting up face-facet mapping
This procedure traverses all the CAD faces and attempts to classify a set of surface facets on each face. Here, we say a facet is classified on a face when all the corner vertices of the facet are classified on the face. Since the surface is trimmed by its boundary loop (see Fig. 2), we say a point is classified on a face if and only if: 1. The distance between the vertex and the supporting surface of the face is smaller than a user-specified tolerance; and 2. The vertex is located within the valid trimmed region of the face.
To investigate whether the second condition is met, a winding number algorithm is presently implemented in the parametric space of the face [29].
The timing performance of a brute-force implementation by computing the distances of each pair of a face and a mesh vertex is unacceptable. We implemented an improved version, which identifies one facet classified on the face first and then employs the coloring algorithm to search all the other facets classified on the face. Algorithms 2 and 3 present the procedure identifying the first facet and the coloring procedure, respectively.

Projecting point to curve
Given a parametric curve r = r(u), the expression of the problem becomes: subjecting to Here, r * is the physical coordinates of the point, u is the parametric coordinate of r * , and ε is a distance tolerance for duplicate points. Presently, the Brent's algorithm [30] is employed to solve the above nonlinear programming (NLP) problem.

Projecting point to surface
Given a parametric surface r = r(u, v), the expression of the problem becomes: (1) min | r * − r(u) |,

subjecting to
Here, r * is the physical coordinates of the point, (u, v) are the parametric coordinates of r * , and ε is a distance tolerance for duplicate points.
Solving the above NLP problem takes the following steps [31]: 1. Check if any singular point of the surface meets Inequality (4). If no singular point meets Inequality (4), seek the solution of NLP (1) and (2) in the parametric spaces of the boundary curves as mentioned in Section 4.2.

If no solution is targeted in
Step 1, seek the solution in the interior of the parametric space of the surface by using the GSA (Geometric Strategy Algorithm) for orthogonal projection [32].

If
Step 2 still fails to present a solution, a direct searching procedure is executed, which is more robust but less efficient. This procedure creates a quadtree to cover the parametric space by recursive refinement: amongst existing quadrants, the one that is the closest to r * (e.g., evaluated by the distance of the centroid of the quadrant and r * ) is subdivided into four child quadrants. This refinement procedure is repeated until the centroid of a quadrant meets Inequality (4) or the size of the quadrant is smaller than the floating-point precision.

List of APIs for parallel mesh refinement
For the convenience of external callings, a few APIs are implemented in the kernel to wrap the query of data structures and the key algorithms presented in Section 4. Listed as below are 8 APIs that will be called by the parallel mesh refinement algorithm. As discussed in Section 5.2, a few callings of these APIs are powerful enough to enable the refined mesh to respect the original geometry. Figure 7 presents the overall workflow of the parallel mesh refinement algorithm, which adopts a manager/worker structure. The manager inputs a CAD model and a coarse hybrid prism-tetrahedra mesh, then extracts the surface of the volume mesh and sets up the connections between the CAD model and the surface mesh. After that, the manager employs a graph partition tool named Metis [33,34] to subdivide the volume mesh into sub-meshes and distribute each sub-mesh onto the workers. Note that the manager will get one sub-mesh for itself such that it can do the same work as the workers. As will be demonstrated in Section 6, the memory consumption of lightweight kernel is neglected in real applications. Therefore, each worker also gets all the data stored in the ultralight geometry kernel such that the subsequent projection procedure could be executed in parallel. See Fig. 8 for an example illustrating the above workflow. After the sequential mesh partitioning, all the following steps are conducted in parallel. Each worker refines its sub-mesh by using the subdivision templates shown in Fig. 9,   Fig. 7 Overall workflow of the parallel mesh refinement algorithm and then projects newly inserted surface points onto the input CAD model. To improve the mesh quality after projection, a moving mesh technique by using radial basis functions (RBF) is employed [35][36][37][38]. If the available memory on the manager process is large enough, the distributed refined mesh could be sent back to the manager for combination. However, since the algorithm is developed to deal with large-scale meshes, the output is a distributed mesh in default, which is produced by repartitioning the refined mesh to achieve a better tradeoff between loading balance and communications. The parallel version of Metis, namely ParMetis, is presently employed to achieve this goal.

Parallel mesh refinement
Three procedures of the parallel mesh refinement algorithm require the callings of APIs listed in Section 5.1.
1. In the initialization procedure, the manager calls Initialize_map(…) to set up the connections between the CAD model. 2. In the refinement procedure, the workers call Attach_face(…) to classify newly formed surface facets onto CAD faces and Detach_face(…) to remove the connections between split facets and CAD faces. Fig. 8 An example illustrating the parallel mesh refinement workflow. a is the input coarse mesh. b is the partitioned result of the input mesh. c is the refinement result of (b). d is the final mesh after projecting new surface points onto the original CAD model. Note that a mesh deformation is carried on after projection to improve shape quality of volume elements   Fig. 11 Comparison of the refinement results with and without the projection step enabled. a is the mesh with this step disabled, b is with this step enabled, and c presents a color map describing how far the mesh (a) deviates from the original CAD model

Numerical experiments
To prove the effectiveness and efficiency of the proposed method and library, numerical tests were carried out on TianHe Exascale Prototype System, in which each computer node contains a Matrix-2000+ CPU configured with 32 computer cores and 16 GB memory. Three typical aerodynamics inputs (see Fig. 10) are selected and their info is listed in Table 2.
The mapping and projection are accomplished by the ultralight geometry kernel. The mapping step is a serial process and only depends on the complexity of the input geometry and the size of initial surface mesh and it is not a compute-intensive task.
The projection process is a critical step in mesh refinement. Figure 11 shows how the projection process improves the quality of a mesh on the missile model. Figure 11a is the one-level refinement result before projection and 11b is its counterpart after projection. The mesh in Fig. 11b is visually smoother than the mesh in Fig. 11a because the mesh after projection respects the original geometry more accurately. To quantify the movement distance resulted by the projection, Fig. 11c presents a color map of that distance, in which each node is assigned a value representing the distance between the origin position of a mesh point and the original geometry. Table 3 lists the timing data of the overall refinement algorithm. The timing cost of the projection step is relatively small in all tests. By comparison, if a general-purpose CAD kernel such as OpenCascade is employed, this cost can be much larger, as reported in [13]. To evaluate the efficiency of our projection algorithm, we compared our method with another open-source lightweight geometry kernel named EGADSlite, which is reported much faster than OpenCascade in such a kind of computation [15].
The geometry models provided in the EGADSlite project are used as inputs. Testing meshes are generated separately by the two libraries with similar element number and the projection of EGADSlite is accomplished by the EGlite_invEvaluate() function. Table 4 shows the time cost of the two projection algorithms on a personal computer with an Intel I5-6500 CPU and 16 GB memory. Table 4 shows that our projection algorithm is faster than EGADSlite in all cases. The last step in the parallel refinement is mesh deformation, which takes the largest portion of the total time in all the tests listed in Table 3. The primary reason is that we presently employ an RBF algorithm without incorporating any data reduction techniques [38], which ensures a better mesh deformation effect at a rather huge cost of computing time. Nevertheless, the percentage of this portion deceases evidently when more computer cores are invested owing to the efficient parallelization of this step.
The projection step directly replaces newly inserted surface mesh nodes with their projections on the original CAD model. This process may degrade or even invert the elements connected to these nodes, in particular when the elements are stretched prisms with very small lateral edges. Figure 12b presents a case in which some prisms near the wall of the missile model are inverted due to the projection. After performing the RBF-based mesh deformation, the quality of these elements is remarkably improved, as seen in Fig. 12c.
Finally, we analyze the mesh quality statistics with the metric Equiangular skewness [39] which is used to evaluate the shape quality of elements. This value varies between 0 and 1: 0 stands for an element with the best quality and 1 for a degenerate element. In real simulations, it is required that the skewness values of the majority of elements are below 0.8 and the elements with their skewness values above 0.9 should be removed as many as possible. For the F6 case, Fig. 13 presents the prism element quality of the initial mesh, the projected mesh and the final mesh (after deformation). In general, the quality of the refined elements highly depends on the input ones. However, projection and deformation can make certain improvement of mesh quality. In this case, the projection and deformation procedure increases the percentage of elements with satisfactory skewness values (i.e., below 0.2) and decreases unsatisfactory skewness values (i.e., above 0.8) remarkably.

Conclusions
In this study, it is shown that an ultralight geometry processing library is powerful enough to ensure the refined surface respects the original CAD model accurately. A very small set of data-structures and algorithms is included in the kernel. Techniques are developed to improve the robustness and efficiency of these data-structures and algorithms. Meanwhile, with the aid of this kernel, a parallel mesh refinement algorithm is enhanced with the ability to respect the original CAD model at a small computing cost. Numerical experiments configured with geometry with industry-level complexity are presented to certify the performance of the developed algorithms.