A new multi-level algorithm for balanced partition problem on large scale directed graphs

Graph partition is a classical problem in combinatorial optimization and graph theory with many applications, such as scientific computing, VLSI design and clustering. In this paper, we study the partition problem on large scale directed graphs under a new objective function, which gives a new instance of the graph partition problem. We first model this problem, then design an algorithm based on a multi-level strategy and a recursive partition method, and finally conduct extensive simulation experiments. The experimental results verify the stability of our algorithm and show that it performs as well as METIS in maximum load. In addition, our algorithm outperforms METIS in unbalanced ratio.

efficient heuristic algorithm for 2-BGP with time complexity $O(n^2 \log n)$. Then, Fiduccia and Mattheyses [10] developed a linear-time heuristic algorithm. The spectral method [11] is also an important method for solving BGP. This method divides the given graph into two parts using the eigenvalues and eigenvectors of its adjacency matrix or Laplacian matrix. At present, there are many graph partition algorithms based on the spectral method [12,13], which can solve 2-BGP, or general k-BGP iteratively.
On the other hand, as problem scales grow and computing power improves, the graphs to be partitioned are becoming larger and larger, with 100,000,000 or more vertices. Thus, it is impractical to use the previous algorithms for large scale graph partition problems, and researchers have proposed multi-level methods and streaming algorithms to solve them. The main idea of the multi-level method is to first convert the original graph into a small graph by repeated contraction, then divide the contracted graph into k parts, and finally map the partition back and refine it to obtain a partition of the original graph. The popular graph partition packages METIS [14] and KaHIP [15] are based on this method. The main idea of the streaming algorithm is to assign the vertices of the graph one by one to suitable parts according to a specific potential function. Streaming algorithms are fast and memory-saving, which makes them well suited to large scale graph partition problems. The graph partition software FENNEL is based on a streaming algorithm [16].
Although many theoretical results and algorithms for graph partition have been obtained, some problems remain unexplored. The first is partition on directed graphs. Most previous work concerns undirected graphs, but some practical applications, such as the multi-subject coupling problem, are naturally modeled by directed graphs. Therefore, it is necessary to study partition on directed graphs. The second concerns the objective function. In the past, researchers usually treated vertex weights and edge weights separately, that is, they optimized an edge-weight objective function under vertex-weight constraints. Few works consider objective functions that combine the two weights. Motivated by these two points, we study the partition problem on directed graphs with a combined weight function.
The paper is organized as follows. Section 2 presents some basic concepts of graph theory and the mathematical model of the problem. In Section 3, we introduce the main idea and procedure of our algorithm. The experimental results are presented in Section 4, where we verify the stability of our algorithm, determine its parameters and compare it with METIS. Finally, conclusions and future work are given in Section 5.

Basic concepts and mathematical modeling
In this section, we introduce some concepts in graph theory and formulate the mathematical program for the new balanced graph partition problem.
An (undirected) graph $G$ is an ordered pair $(V(G), E(G))$ consisting of a set $V(G)$ of vertices and a set $E(G)$ of edges. Each edge of $G$ is an unordered pair of vertices. If an edge $e$ joins vertices $u$ and $v$, then $u$ and $v$ are called the ends of $e$. A directed graph (digraph) $D$ is an ordered pair $(V(D), A(D))$ consisting of a set $V(D)$ of vertices and a set $A(D)$ of arcs (directed edges). Each arc of $D$ is an ordered pair of vertices. If an arc $a$ joins vertex $u$ to vertex $v$, then $u$ is the tail of $a$, $v$ is the head of $a$, and $u$ and $v$ are the ends of $a$. Any undirected graph becomes a directed graph if we regard each edge $e = uv$ as the two arcs $(u, v)$ and $(v, u)$; thus, undirected graphs can be considered a special class of directed graphs. For any vertex $v$ of $D$, $A^-_D(\{v\})$ denotes the set of arcs whose head is $v$, and $A^+_D(\{v\})$ denotes the set of arcs whose tail is $v$. More generally, for any vertex subset $X$, $A^-_D(X)$ (respectively $A^+_D(X)$) is the set of arcs whose heads (tails) are in $X$ but whose tails (heads) are not in $X$. A set $M$ of independent arcs (arcs with no common ends) in a digraph $D$ is called a matching.
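The arc-set notation can be made concrete with a small Python sketch. It assumes a digraph is stored as a set of (tail, head) pairs without loops; the helper names `in_arcs`, `out_arcs`, `is_matching` and `undirected_as_directed` are hypothetical and are not taken from the authors' implementation.

```python
from typing import Hashable, Iterable, Set, Tuple

Arc = Tuple[Hashable, Hashable]

def in_arcs(arcs: Iterable[Arc], X: Set[Hashable]) -> Set[Arc]:
    """A^-_D(X): arcs whose head is in X but whose tail is not."""
    return {(u, v) for (u, v) in arcs if v in X and u not in X}

def out_arcs(arcs: Iterable[Arc], X: Set[Hashable]) -> Set[Arc]:
    """A^+_D(X): arcs whose tail is in X but whose head is not."""
    return {(u, v) for (u, v) in arcs if u in X and v not in X}

def is_matching(M: Iterable[Arc]) -> bool:
    """True if the arcs of M are pairwise independent (no common ends)."""
    seen = set()
    for (u, v) in M:
        if u == v or u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def undirected_as_directed(edges: Iterable[Arc]) -> Set[Arc]:
    """Regard each undirected edge uv as the two arcs (u, v) and (v, u)."""
    edges = list(edges)
    return {(u, v) for (u, v) in edges} | {(v, u) for (u, v) in edges}
```

For a single vertex, `in_arcs(arcs, {v})` returns exactly the arcs whose head is `v`, which matches the definition of $A^-_D(\{v\})$ for loopless digraphs.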
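Given a specific k-partition $P = \{P_1, P_2, \ldots, P_k\}$ of $D$, we define the load of each part $P_j$ as
$$L(P_j) = \sum_{v \in P_j} w(v) + \sum_{a \in A^-_D(P_j)} w(a).$$
Let $L^P_M$ and $L^P_m$ be the maximum and minimum loads among all parts of $P$, that is,
$$L^P_M = \max_{1 \le j \le k} L(P_j), \qquad L^P_m = \min_{1 \le j \le k} L(P_j).$$
Thus, we model the balanced graph partition problem as the following unconstrained two-objective program,
$$\min_{P \in \mathcal{P}} \left( L^P_M, \ \rho_P \right),$$
where $\mathcal{P}$ is the set of all k-partitions of $D$ and $\rho_P$ is the unbalanced ratio of the partition $P$.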
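A minimal sketch of the load computation follows. It assumes vertex weights and arc weights are stored in dictionaries, that a k-partition is a map from each vertex to a part index in 0..k-1, and that the load of a part is the total weight of its vertices plus the total weight of arcs entering it from other parts. The function names are hypothetical; only $L^P_M$ and $L^P_m$ are returned, not $\rho_P$.

```python
from typing import Dict, Hashable, List, Tuple

def part_loads(vw: Dict[Hashable, float],
               aw: Dict[Tuple[Hashable, Hashable], float],
               part: Dict[Hashable, int], k: int) -> List[float]:
    """Load of each part: its vertex weight plus the weight of its in-arcs
    whose tails lie in other parts (the arcs of A^-_D(P_j))."""
    load = [0.0] * k
    for v, w in vw.items():
        load[part[v]] += w
    for (u, v), w in aw.items():
        if part[u] != part[v]:        # arc enters the part containing v
            load[part[v]] += w
    return load

def max_min_load(vw, aw, part, k) -> Tuple[float, float]:
    """Return (L^P_M, L^P_m) for the partition `part`."""
    load = part_loads(vw, aw, part, k)
    return max(load), min(load)
```

For example, with vertex weights `{0: 1, 1: 2, 2: 3}`, arcs `{(0, 1): 5, (1, 2): 7, (2, 0): 1}` and parts `{0: 0, 1: 0, 2: 1}`, part 0 carries 1 + 2 plus the arc (2, 0), and part 1 carries 3 plus the arc (1, 2).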
As mentioned in Section 1, our problem differs from the one solved by METIS in two respects. First, METIS only deals with undirected graphs, whereas our problem is defined on directed graphs. Second, the objective functions differ. The optimization problem of METIS is
$$\min_{P \in \mathcal{P}} \sum_{e \in E_C} w(e) \quad \text{s.t.} \quad \max_{1 \le j \le k} \sum_{v \in P_j} w(v) \le \frac{\rho}{k} \sum_{v \in V(G)} w(v),$$
where $E_C$ is the set of edges whose ends are in distinct parts, and $\rho \ge 1$ is the unbalanced ratio of the vertex weights. That is to say, the model of METIS considers vertex weights and edge weights separately, whereas we consider them together.
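To contrast the two models, the sketch below evaluates the METIS-style objective on an undirected graph: the total weight of cut edges and the vertex-weight balance factor that the constraint bounds by $\rho$. It assumes undirected edges are stored once as `(u, v)` keys; the function names are hypothetical and do not correspond to the METIS API.

```python
from typing import Dict, Hashable, Tuple

def edge_cut(ew: Dict[Tuple[Hashable, Hashable], float],
             part: Dict[Hashable, int]) -> float:
    """Total weight of the edges in E_C, i.e. edges whose ends are in distinct parts."""
    return sum(w for (u, v), w in ew.items() if part[u] != part[v])

def vertex_balance(vw: Dict[Hashable, float],
                   part: Dict[Hashable, int], k: int) -> float:
    """max_j w(P_j) divided by the average part weight w(V)/k.
    A partition satisfies the balance constraint when this value is <= rho."""
    totals = [0.0] * k
    for v, w in vw.items():
        totals[part[v]] += w
    return max(totals) / (sum(totals) / k)
```

Note that neither quantity mixes vertex weights with edge weights, which is the separation the paragraph above describes.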

Algorithm
Since the graphs we deal with are very large (up to 100,000,000 vertices) and the number of parts is also large (up to 100,000), we design our algorithm by combining the classical multi-level method with the recursive partition method.

Multi-level stage
The multi-level method is currently the most popular approach to partitioning large scale graphs. It consists of three phases: iterative contraction, initial partition and modification, and backward mapping. We introduce each phase in detail below.
PHASE 1: Iterative Contraction.
In this phase, we construct a sequence of directed graphs $D_0 = D, D_1, \ldots, D_m$ with decreasing numbers of vertices. To do this, we apply the standard strategy to the current graph $D_i$: we compute a maximal matching $M_i$ and contract every arc of $M_i$ into a new vertex to obtain the next graph $D_{i+1}$. In detail, for any arc $a = (u, v)$ of $M_i$, contraction removes $a$ and $a' = (v, u)$ (if it exists) and identifies $u$ and $v$ as a new vertex $x$, which is incident with those arcs (other than $a$ and $a'$) that were originally incident with $u$ or $v$ or both. The weight of the new vertex $x$ is the sum of the weights of $u$ and $v$, and the weight of each arc of $D_{i+1}$ is the sum of the weights of the arcs of $D_i$ that it replaces (parallel arcs with the same tail and head are merged). This phase ends when one of the following occurs: (i) the number of vertices of the current graph is less than $ck$, where $k$ is the number of parts and $c = 90$ is the contraction parameter chosen by our experiments in the next section; (ii) the contraction ratio $|V(D_{i+1})|/|V(D_i)|$ exceeds a threshold, that is, contraction no longer shrinks the graph effectively. To compute the maximal matching, we use the following two random methods.

Random Maximum Weight Matching (RMWM).
This classical method is used in METIS [14] and other multi-level algorithms [15]. RMWM proceeds as follows. The vertices of the graph are visited in a random order. For a visited vertex $u$, if $u$ is already matched with another vertex or all of its in-neighbors are already matched, we move on to the next vertex. Otherwise, $u$ is matched with the unmatched in-neighbor $v$ that maximizes the arc weight $w((v, u))$. When all vertices have been visited, we obtain a maximal matching.
Random Maximum Ratio Matching (RMRM).
This matching is motivated by the new objective function. RMRM differs from RMWM only in how the vertex matched with $u$ is chosen from its in-neighbors. Since the objective function considers vertex weights and arc weights together, $u$ is matched with the unmatched in-neighbor $v$ that maximizes the ratio of arc weight to vertex weight, that is,
$$v = \arg\max \left\{ \frac{w((v', u))}{w(v')} : v' \text{ is an unmatched in-neighbor of } u \right\}.$$
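Phase 1 can be sketched in Python as follows, under stated assumptions: graphs are dictionaries of vertex weights and arc weights, RMRM divides by the weight of the candidate in-neighbor, parallel arcs created by contraction are merged by summing their weights, and the stopping threshold `stall` on $|V(D_{i+1})|/|V(D_i)|$ is a hypothetical value. All function names are illustrative, not the authors' code.

```python
import random

def random_matching(vw, aw, score, seed=None):
    """Randomized maximal matching: visit vertices in random order and match
    each unmatched u with the unmatched in-neighbor v maximizing score(v, u)."""
    rng = random.Random(seed)
    in_nbrs = {v: [] for v in vw}
    for (t, h) in aw:
        if t != h:
            in_nbrs[h].append(t)
    order = list(vw)
    rng.shuffle(order)
    matched, M = set(), set()
    for u in order:
        if u in matched:
            continue
        cands = [v for v in in_nbrs[u] if v not in matched]
        if cands:
            v = max(cands, key=lambda x: score(x, u))
            M.add((v, u))              # arc (v, u) joins the matching
            matched.update((u, v))
    return M

def rmwm(vw, aw, seed=None):
    """RMWM: prefer the in-arc (v, u) of maximum weight."""
    return random_matching(vw, aw, lambda v, u: aw[(v, u)], seed)

def rmrm(vw, aw, seed=None):
    """RMRM: prefer the maximum ratio w((v, u)) / w(v) (assumed denominator)."""
    return random_matching(vw, aw, lambda v, u: aw[(v, u)] / vw[v], seed)

def contract(vw, aw, M):
    """Contract every arc (v, u) of matching M into one vertex, named u.
    Arcs between the pair are dropped; parallel arcs are merged by summing."""
    rep = {x: x for x in vw}
    for (v, u) in M:
        rep[v] = u
    new_vw, new_aw = {}, {}
    for x, w in vw.items():
        new_vw[rep[x]] = new_vw.get(rep[x], 0) + w
    for (t, h), w in aw.items():
        a, b = rep[t], rep[h]
        if a != b:
            new_aw[(a, b)] = new_aw.get((a, b), 0) + w
    return new_vw, new_aw

def coarsen(vw, aw, k, c=90, stall=0.95, match=rmwm, seed=None):
    """Build D_0, D_1, ..., D_m; stop when |V| < c*k or contraction stalls."""
    levels = [(vw, aw)]
    while len(vw) >= c * k:
        nvw, naw = contract(vw, aw, match(vw, aw, seed))
        ratio = len(nvw) / len(vw)
        vw, aw = nvw, naw
        levels.append((vw, aw))
        if ratio > stall:              # hypothetical stall criterion (ii)
            break
    return levels
```

Because a vertex is skipped only when all of its in-neighbors are already matched, no arc is left with two unmatched ends, so the result is a maximal matching. With a vertex `1` whose in-neighbors are `0` (weight 1, arc weight 2) and `2` (weight 10, arc weight 5), RMWM picks `2` while RMRM picks `0`.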

PHASE 2: Initial Partition and Modification.
After iterative contraction, the final graph $D_m$ is small (fewer than $ck$ vertices when contraction stops by condition (i)), so we can quickly obtain a good initial partition with a greedy strategy. Specifically, we use a best fit decreasing (BFD) algorithm similar to the one used for the bin-packing problem. First, we set $P_j = \emptyset$ for every $j = 1, 2, \ldots, k$ and sort the vertices in decreasing order of vertex weight. At each step, if we put the current vertex $v$ into the $j$-th part, then the load of the $j$-th part becomes
$$L(P_j) + w(v) + \sum_{(u, v) \in A^-_{D_m}(\{v\}),\ u \in \bigcup_{l \ne j} P_l} w((u, v)),$$
and the load of every other part $P_i$ ($i \ne j$) becomes
$$L(P_i) + \sum_{(v, u) \in A^+_{D_m}(\{v\}),\ u \in P_i} w((v, u)).$$
We put $v$ into the part for which the resulting maximum load is minimum. When all vertices have been visited, the initial partition $P$ is obtained.
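A minimal sketch of the greedy BFD step, assuming the load of a part is its vertex weight plus the weight of arcs entering it from other parts. An arc is charged only once both of its ends have been placed in different parts; the function name and tie-breaking (lowest part index wins) are illustrative assumptions.

```python
def bfd_initial_partition(vw, aw, k):
    """Best-fit-decreasing initial k-partition of the coarsest graph.
    Returns (part, load) with part[v] in 0..k-1."""
    in_nbrs = {v: [] for v in vw}
    out_nbrs = {v: [] for v in vw}
    for (u, v), w in aw.items():
        out_nbrs[u].append((v, w))
        in_nbrs[v].append((u, w))
    part, load = {}, [0.0] * k
    for v in sorted(vw, key=lambda x: -vw[x]):     # decreasing vertex weight
        best = None
        for j in range(k):
            delta = [0.0] * k
            delta[j] += vw[v]
            for u, w in in_nbrs[v]:        # arc (u, v) from a placed u outside P_j
                if u in part and part[u] != j:
                    delta[j] += w
            for x, w in out_nbrs[v]:       # arc (v, x) into a placed x outside P_j
                if x in part and part[x] != j:
                    delta[part[x]] += w
            new_max = max(l + d for l, d in zip(load, delta))
            if best is None or new_max < best[0]:
                best = (new_max, j, delta)
        _, j, delta = best
        part[v] = j
        load = [l + d for l, d in zip(load, delta)]
    return part, load
```

The per-candidate `delta` vector mirrors the two update formulas above: the chosen part gains $w(v)$ plus the in-arcs from other parts, and each other part gains the arcs from $v$ into it.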
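The aim of modification is to turn the initial partition into a local optimum. The main strategy is local search, that is, iteratively moving a vertex from the part with the maximum load into another part to reduce the maximum load. In detail, in each iteration we first choose a part $P_j$ with the maximum load. Then, for every vertex $v$ in $P_j$ and every part $P_l$, we calculate the in-arc weight and out-arc weight of $v$ with respect to $P_l$,
$$w^-_l(v) = \sum_{(u, v) \in A(D),\ u \in P_l} w((u, v)), \qquad w^+_l(v) = \sum_{(v, u) \in A(D),\ u \in P_l} w((v, u)),$$
and let $w^-(v) = \sum_{l} w^-_l(v)$. If we move $v$ from part $P_j$ into part $P_i$, the load of every part other than $P_i$ and $P_j$ is unchanged, and the new loads $L'_j$ and $L'_i$ become
$$L'_j = L(P_j) - w(v) - \left(w^-(v) - w^-_j(v)\right) + w^+_j(v),$$
$$L'_i = L(P_i) + w(v) + \left(w^-(v) - w^-_i(v)\right) - w^+_i(v).$$
For every pair $(v, P_i)$, we can then calculate the maximum load and the sum of loads of the resulting (swapped) partition.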
If some swapped partitions have a smaller maximum load than the current partition, we replace the current partition with the swapped partition of minimum maximum load and repeat this operation. Otherwise, if some swapped partitions have the same maximum load as the current partition but a smaller sum of loads, we replace the current partition with the one of minimum sum of loads and repeat this operation; else, the current partition is a local optimum and the modification process terminates.
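The acceptance rule above amounts to comparing pairs (maximum load, sum of loads) lexicographically. The sketch below assumes the same load definition as before (vertex weight plus in-arcs from other parts) and, for clarity, recomputes all loads for every candidate move instead of using the incremental in-arc and out-arc weights; the names `loads` and `refine` and the `max_iter` guard are hypothetical.

```python
def loads(vw, aw, part, k):
    """Part loads: vertex weight plus weight of arcs entering from other parts."""
    L = [0.0] * k
    for v, w in vw.items():
        L[part[v]] += w
    for (u, v), w in aw.items():
        if part[u] != part[v]:
            L[part[v]] += w
    return L

def refine(vw, aw, part, k, max_iter=1000):
    """Local search: move one vertex out of a maximum-load part per step,
    accepting the move with the lexicographically smallest (max, sum)."""
    part = dict(part)
    for _ in range(max_iter):
        L = loads(vw, aw, part, k)
        cur = (max(L), sum(L))
        j = L.index(cur[0])                    # a part with the maximum load
        best = None
        for v in [x for x in part if part[x] == j]:
            for i in range(k):
                if i == j:
                    continue
                part[v] = i                    # tentatively move v to P_i
                L2 = loads(vw, aw, part, k)
                cand = (max(L2), sum(L2))
                part[v] = j                    # undo the tentative move
                if cand < cur and (best is None or cand < best[0]):
                    best = (cand, v, i)
        if best is None:                       # local optimum reached
            break
        part[best[1]] = best[2]
    return part
```

Since each accepted move strictly decreases the pair, the loop terminates; `max_iter` is only a safety bound.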

Recursive partition stage
As stated in the previous subsection, iterative contraction ends when the number of vertices of the contracted graph $D_m$ is less than $90k$, where $k$ is the number of parts of the desired partition. Hence, if $k$ is large, $D_m$ is also large, which can result in poor partition quality and long running time. We use the recursive partition strategy to avoid this. The main idea of the recursive partition method is as follows. At the beginning, we factorize $k$ into several small numbers, say $k = k_1 k_2 \cdots k_t$ with $k_i \le 20$. This is usually possible, because in practice $k$ is often chosen to have many factors. In the first step, we use the multi-level method to obtain a $k_1$-partition $P$ of the original graph. Since $k_1$ is small, we can guarantee good performance and short running time. Based on the partition $P$, the whole graph is decomposed into $k_1$ subgraphs, each induced by a part of $P$. Note that the arc weights in the subgraphs are the same as in the original graph, but the weight of every vertex $v$ is changed as follows,
$$w'(v) = w(v) + \sum_{(u, v) \in A^-_D(\{v\}),\ u \notin P[v]} w((u, v)),$$
where $P[v]$ is the part of $P$ containing $v$. The purpose of changing vertex weights is to ensure that the objective values of the subgraphs are consistent with the objective value of the whole graph. In the second step, we divide every subgraph into $k_2$ parts and obtain $k_1 k_2$ new subgraphs by decomposing all the old subgraphs. Continuing in this way, in the last step we have $k_1 k_2 \cdots k_{t-1}$ subgraphs and obtain a $k_t$-partition of each of them. That is, we obtain a partition of the original graph into $k_1 k_2 \cdots k_t = k$ parts.
Fig. 1 The unbalanced ratios of the two types of maximal matching
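The decomposition step can be sketched as below. It assumes the vertex re-weighting adds to $w(v)$ the weight of arcs entering $v$ from outside its part, so that the loads measured inside a subgraph agree with the loads in the whole graph. `partition_fn` is a hypothetical stand-in for the multi-level k-partition of one subgraph, and the global part numbering `p * n_rest + q` is an illustrative choice.

```python
def split_subgraphs(vw, aw, part):
    """Split D into the subgraphs induced by the parts of `part`.
    Arcs inside a part are kept; each cut arc (u, v) is folded into w'(v)."""
    subs = {}
    for v, w in vw.items():
        subs.setdefault(part[v], ({}, {}))[0][v] = w
    for (u, v), w in aw.items():
        if part[u] == part[v]:
            subs[part[v]][1][(u, v)] = w
        else:
            subs[part[v]][0][v] += w          # re-weighting w'(v)
    return subs

def recursive_partition(vw, aw, factors, partition_fn):
    """Split into k_1 parts, then each subgraph into k_2 parts, and so on.
    Returns a global part id in 0..k-1 for every vertex."""
    if not factors:
        return {v: 0 for v in vw}
    k1, rest = factors[0], factors[1:]
    part = partition_fn(vw, aw, k1)
    n_rest = 1
    for f in rest:
        n_rest *= f
    result = {}
    for p, (svw, saw) in split_subgraphs(vw, aw, part).items():
        sub = recursive_partition(svw, saw, rest, partition_fn)
        for v, q in sub.items():
            result[v] = p * n_rest + q
    return result
```

For instance, with vertex weights `{0: 1, 1: 2, 2: 3}`, arcs `{(0, 1): 5, (1, 2): 7, (2, 0): 1}` and parts `{0: 0, 1: 0, 2: 1}`, vertex 2 receives the cut arc (1, 2) and vertex 0 receives the cut arc (2, 0).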
How should the recursive partition strategy be chosen? Our experiments in the next section show that different strategies give very similar results. Thus, if $k$ is a power of some integer $b \le 20$, that is, $k = b^t$, we factorize $k$ as $b \times b \times \cdots \times b$ ($t$ factors).
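A small sketch of choosing the factors $k_1, \ldots, k_t$: it first looks for $k = b^t$ with $b \le 20$ and the fewest stages, as the rule above prescribes. The fallback for other values of $k$ (repeatedly taking the largest divisor not exceeding 20) is an assumption of this sketch, not part of the described method, and `recursive_plan` is a hypothetical name.

```python
def recursive_plan(k, max_factor=20):
    """Return factors k_1..k_t with product k, each at most max_factor."""
    # Preferred case: k = b**t with 2 <= b <= max_factor, fewest stages first.
    for t in range(1, k.bit_length() + 1):
        b = round(k ** (1.0 / t))
        for cand in (b - 1, b, b + 1):       # guard against float rounding
            if 2 <= cand <= max_factor and cand ** t == k:
                return [cand] * t
    # Assumed fallback: greedily peel off the largest small divisor.
    factors = []
    while k > 1:
        d = next((d for d in range(min(k, max_factor), 1, -1) if k % d == 0), None)
        if d is None:
            raise ValueError("k has a prime factor larger than max_factor")
        factors.append(d)
        k //= d
    return factors
```

With this rule, `recursive_plan(1000)` gives three stages of 10-partitions, matching the example given in Section 4.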

Experimental results
In this section, the experiments are divided into two parts: algorithm design and comparison with other algorithms. In the algorithm design part, we test the performance of the two random matching methods, verify the stability of the randomized algorithm, and determine the contraction parameter $c$ and the recursive partition strategy. In the comparison part, we compare our algorithm with the k-way partition algorithm in METIS in terms of unbalanced ratio, maximum load and running time.
The directed graphs used in the experiments fall into two classes: theoretical and practical models. We use the grid graph to represent the theoretical model; it can also be regarded as the inner dual graph of a square grid in the plane. We consider grid graphs of three sizes: Grid-1 with 1,000,000 vertices and 3,996,000 arcs, Grid-2 with 10,890,000 vertices and 43,546,800 arcs, and Grid-3 with 100,000,000 vertices and 399,600,000 arcs. Each has random vertex weights between 120 and 150, and the weight … Table 1. All the experiments were performed on a Dell T7610 graphics workstation with an Intel Xeon 2.6 GHz CPU (6 cores) and 32 GB of 1866 MHz DDR3 memory.
Fig. 2 The ratio of the max-load of RMRM to that of RMWM. Bars above the baseline indicate that RMRM performs worse than RMWM

Matching comparison
The aim of this subsection is to test the performance of the two matching methods for contraction, RMWM and RMRM, described in Subsec. 3.1. We run the experiment on five graphs: Grid-1, Grid-2, MDual, FEM-1 and FEM-3. The small-scale graphs (Grid-1, … Table 2, Figs. 1 and 2. Figure 1 shows that the unbalanced ratios of RMWM are better than those of RMRM, except for the maximum unbalanced ratio of the 100-partition on MDual. Figure 2 shows that in terms of max-load, RMWM also performs better than RMRM, although the gap is very small and the maximum ratio is less than 1.012. Hence, we use RMWM in the following experiments.
Fig. 4 The ratios of best and worst max-load and running time to the relative average results. The baseline is the relative average value

Stability verification
In this subsection, we test the stability of the algorithm, that is, whether its randomness introduces a large deviation in the output. The algorithm is run repeatedly on the same graphs with the same numbers of parts, and we compare the results in three aspects: unbalanced ratio, max-load and running time. Details are given in Table 3.
Fig. 3 shows that the gap between the best and worst unbalanced ratios is very small and does not exceed 0.70%. Furthermore, the unbalanced ratio in every test case is small, below 2.00%, except for the worst result of the 10000-partition on FEM-3. Figure 4 shows the max-load and running time, with the average values as the baseline. For each example, the worst max-load is almost equal to the best one, and the difference in running time is also small, with a maximum ratio of about 1.10. Hence, the randomness of our algorithm does not introduce much deviation, and the algorithm is very stable.
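The normalization used in Fig. 4 (best and worst run divided by the average over repeated runs) can be written as a one-line helper. This is a sketch for a minimized metric such as max-load or running time; the name `spread_vs_average` and the sample values are hypothetical, not data from Table 3.

```python
from statistics import mean

def spread_vs_average(values):
    """Return (best / average, worst / average) over repeated runs of a
    metric that is minimized, such as max-load or running time."""
    avg = mean(values)
    return min(values) / avg, max(values) / avg
```

For three hypothetical running times 9, 10 and 11 seconds, the function returns ratios 0.9 and 1.1, so a worst-to-average ratio of about 1.10 corresponds to a worst run roughly 10% slower than average.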

Determining parameters
In our algorithm, one parameter and one strategy need to be determined. First, we determine the contraction parameter $c$ mentioned in Subsec. 3.1. Figure 5 shows the unbalanced ratios for different values of $c$. Figures 6 and 7 show the ratios of the maximum load and running time, respectively, for other values of $c$ to those for $c = 90$. These figures show that the unbalanced ratio generally decreases as $c$ increases, whereas the max-load and running time often increase with $c$. Overall, good performance is obtained for $c = 70$, 90 and 110, so we choose $c = 90$.
For the recursive partition strategy, we factorize $k$ in different ways, run the corresponding experiments, and find little difference between the results: the deviations of the unbalanced ratio and of the max-load ratio are at most 0.5% and 0.2%, respectively. Hence, we choose the simplest strategy, that is, factorize $k$ as a power of some integer $b \le 20$. For example, if $k = 1000$, our algorithm runs in three stages, each computing a 10-partition.

Comparison with METIS
In this subsection, we compare our algorithm (Graph Partition) with the k-way partition in METIS through experiments on the 11 graphs in Table 1. Since METIS can only deal with undirected graphs, we transform each directed graph in Table 1 into an undirected graph by setting the weight of every edge $uv$ to $w((u, v)) + w((v, u))$, where the weight of a missing arc is taken as 0. The resulting undirected graphs are then partitioned by the k-way partition, and finally we calculate the unbalanced ratio and max-load of each graph with respect to the resulting partition. The experimental results are given in Table 4, and the comparison is shown in the following figures. Note that since the graph Grid-3 is huge (100,000,000 vertices and 399,600,000 arcs), METIS fails to return a feasible result for it.
Figure 8 shows the unbalanced ratios of the partitions produced by the two algorithms. For each graph, the unbalanced ratio is better when the number of parts is small than when it is large, which is a natural phenomenon. Most unbalanced ratios of our algorithm are below 2%, while most of those of METIS are between 6% and 9%. Clearly, our algorithm is better than METIS in unbalanced ratio. All unbalanced ratios of the graph Copter are worse, because the average degree of Copter is much larger than that of the other graphs.
Figures 9 and 10 show the ratios of the max-load and running time of our algorithm to those of METIS. Figure 9 shows that most max-load ratios are between 0.94 and 1.06, which implies that there is little difference between the two algorithms in terms of maximum load. Moreover, the ratio increases with the number of parts, mainly because we do not use multi-level modification in the backward mapping phase; this is a key direction of our future work. From Fig. 10, for small k our algorithm often runs longer than METIS, whereas for large k it often runs faster than METIS. This difference is related to the number of iterations and the average number of vertices in each part.
Fig. 9 The ratios of max-load of our algorithm to that of METIS
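The directed-to-undirected transformation used for the METIS comparison can be sketched as follows, assuming each edge weight is the sum of the two opposite arc weights (a missing arc contributes 0) and that vertex labels are mutually comparable. The helper name `to_undirected` is hypothetical; the resulting dictionary would still need to be converted into the METIS input format, which is not shown here.

```python
def to_undirected(aw):
    """Merge arcs (u, v) and (v, u) into one edge {u, v} stored under the key
    (min(u, v), max(u, v)); its weight is the sum of the two arc weights."""
    ew = {}
    for (u, v), w in aw.items():
        key = (min(u, v), max(u, v))
        ew[key] = ew.get(key, 0) + w
    return ew
```

Because the merged edge weight no longer records direction, the edge cut METIS minimizes on this graph equals the total weight of cut arcs, not the per-part in-arc loads of our model.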

Conclusions and future work
In this paper, we consider the balanced partition problem on large scale directed graphs. First, we present a new mathematical model with new objective functions for this problem. Then, we combine a multi-level strategy and a recursive partition method to design an algorithm for it. Finally, through extensive experiments, we determine the parameters, verify the stability of the algorithm, and compare it with the k-way partition in METIS in three aspects: unbalanced ratio, maximum load and running time. The experimental results show that, compared with METIS, our algorithm is better in unbalanced ratio and of the same quality in maximum load. Furthermore, our algorithm can handle graphs of huge scale for which METIS cannot return a feasible result.
There are two possible directions for future work. The first is to add modification to the backward mapping phase, that is, to map the partition of $D_m$ back to that of $D_0$ level by level and refine the partition at each level to a local optimum. The second is to ensure the connectivity of each part. Furthermore, finding a new, good and efficient graph contraction method is also a meaningful direction.