In the last post, I reviewed the code of EPLB (Expert Parallelism Load Balancer). As a quick recap, EPLB is a toolbox for expert load balancing in the MoE architecture: it outputs an expert replication plan, a grouping plan, and a reallocation plan. The main idea of the algorithm is to decompose the joint decision problem into three subproblems and solve each with a greedy method.
The main algorithms used are balanced expert replication and balanced packing. In EPLB, both are solved by greedy algorithms (see my last post for the pseudocode). In this post, I will show how to solve these problems by optimization, in particular by integer (conic) linear programming. Besides the optimization formulations, I also implemented these algorithms in Python. The interface is the same as EPLB's, and I call it IPLB (Integer Programming based Load Balancer). I have open-sourced the algorithm on GitHub:
Any feedback is welcome!
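As a usage sketch only: assuming IPLB exposes the same rebalance_experts(weight, num_replicas, num_groups, num_nodes, num_gpus) entry point as EPLB, calling it might look like the following (the iplb module name and the example numbers are hypothetical; see the repo for the actual import path).

```python
import torch
import iplb  # hypothetical module name; check the GitHub repo for the real one

# weight[l, e]: workload of logical expert e in MoE layer l (made-up numbers)
weight = torch.tensor([[ 90, 132,  40,  61, 104, 165,  39,   4],
                       [ 20, 107, 104,  64,  19, 197, 187, 157]])

# Replicate 8 logical experts onto 12 physical experts, organized as
# 4 expert groups placed on 2 nodes with 2 GPUs each (4 GPUs total).
phy2log, log2phy, logcnt = iplb.rebalance_experts(
    weight, num_replicas=12, num_groups=4, num_nodes=2, num_gpus=4)

print(phy2log)  # physical-to-logical expert mapping per layer
print(logcnt)   # number of replicas per logical expert per layer
```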
Function: replica_experts
Input: weights (num_layers, num_log_experts), num_phy_experts
Objective: replicate experts; for each layer, minimize the maximal workload per replica.
For each layer $l$, the weight (workload) vector is $\mathbf{w}_l = (w_{l1}, \ldots, w_{le})$, and the number of physical experts (replicas) is $c$. The decision variables are the numbers of replicas of each logical expert, $\mathbf{r}_l = (r_{l1}, \ldots, r_{le})$. The optimization problem is,
\[\begin{array}{rcl} \min_{\mathbf{r}_l} & \max_{1 \leq j \leq e} \frac{w_{l j}}{r_{l j}} & \\ \text{s.t.} & \sum_{j = 1}^e r_{l j} \leq c & \\ & r_{l j} \in \mathbb{Z}_+ & \forall\, 1 \leq j \leq e \end{array}\]Introducing an auxiliary variable $\alpha$ to upper-bound the per-replica workloads (an epigraph reformulation), this becomes
\[\begin{array}{rcl} \min_{\alpha, {\mathbf{r}}_l} & \alpha & \\ \text{s.t.} & w_{l j} \leq \alpha r_{l j} & \forall\, 1 \leq j \leq e\\ & \sum_{j = 1}^e r_{l j} \leq c & \\ & r_{l j} \in \mathbb{Z}_+ & \forall\, 1 \leq j \leq e \end{array}\]The constraint $w_{l j} \leq \alpha r_{l j}$ is bilinear. However, since $\alpha$ and $r_{l j}$ are both nonnegative, this constraint is actually convex, and it can be written either as a semidefinite cone constraint or as a second-order cone constraint.
- Semidefinite cone:
\[\left(\begin{array}{cc} \alpha & \sqrt{w_{l j}}\\ \sqrt{w_{l j}} & r_{l j} \end{array}\right) \succeq 0,\]which is equivalent to $\alpha \geq 0$, $r_{l j} \geq 0$, $w_{l j} \leq \alpha r_{l j}$ (the diagonal entries are nonnegative and the determinant $\alpha r_{l j} - w_{l j}$ is nonnegative).
- Second-order cone:
\[\left(\begin{array}{c} \alpha\\ r_{l j}\\ \sqrt{2 w_{l j}} \end{array}\right) \in \mathcal{Q}_r^3,\]where $\mathcal{Q}_r^3 = \{ (x, y, z) \in \mathbb{R}^3 : 2 x y \geq z^2, x \geq 0, y \geq 0 \}$ is the rotated second-order cone, so membership is exactly $2 \alpha r_{l j} \geq 2 w_{l j}$. Equivalently, this is a standard second-order cone constraint after a linear transformation:
\[\left(\begin{array}{c} \sqrt{w_{l j}}\\ \frac{\alpha - r_{l j}}{2}\\ \frac{\alpha + r_{l j}}{2} \end{array}\right) \in \mathcal{Q}^3,\]where $\mathcal{Q}^3 = \{ (x, y, z) \in \mathbb{R}^3 : z^2 \geq x^2 + y^2, z \geq 0 \}$.
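Expanding the squares confirms that this recovers the original bilinear constraint:
\[\left(\frac{\alpha + r_{l j}}{2}\right)^2 - \left(\frac{\alpha - r_{l j}}{2}\right)^2 = \alpha r_{l j} \geq \left(\sqrt{w_{l j}}\right)^2 = w_{l j}.\]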
Therefore, the problem replica_experts can be solved by integer conic linear programming, which is supported by GUROBI and COPT!
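To make this concrete, below is a minimal per-layer sketch of this mixed-integer SOCP in cvxpy. It is not the IPLB implementation itself: the function name, the explicit SOC encoding, and the choice of GUROBI (any MISOCP-capable solver such as MOSEK or SCIP also works) are my assumptions.

```python
import cvxpy as cp
import numpy as np

def replicate_experts_misocp(weights, num_phy_experts):
    """Sketch: solve the per-layer replication problem as a mixed-integer SOCP.

    weights: (num_layers, num_log_experts) array of logical-expert workloads.
    Returns an integer array of replica counts with the same shape.
    """
    num_layers, num_log = weights.shape
    replicas = np.zeros((num_layers, num_log), dtype=int)
    for l in range(num_layers):
        w = weights[l].astype(float)
        r = cp.Variable(num_log, integer=True)  # r_{lj} in Z_+
        alpha = cp.Variable(nonneg=True)        # epigraph variable
        constraints = [r >= 1, cp.sum(r) <= num_phy_experts]
        for j in range(num_log):
            # w_j <= alpha * r_j, written as the standard SOC constraint
            # ||(2*sqrt(w_j), alpha - r_j)||_2 <= alpha + r_j.
            constraints.append(
                cp.SOC(alpha + r[j],
                       cp.hstack([2.0 * np.sqrt(w[j]), alpha - r[j]]))
            )
        prob = cp.Problem(cp.Minimize(alpha), constraints)
        prob.solve(solver=cp.GUROBI)  # assumes a MISOCP-capable solver is installed
        replicas[l] = np.rint(r.value).astype(int)
    return replicas
```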
Function: balanced_packing
Input: weights (num_layers, num_obj), num_packs
Objective: assign num_obj objects to num_packs packs; for each layer, minimize the maximal workload among packs.
For each layer (the layer index is dropped for brevity), the weight is ${\mathbf{w}} = (w_1, \ldots, w_n)$, where $n$ is the number of objects and $m$ is the number of packs. The decision variables are the binary assignments $a_{i j}$, with $a_{i j} = 1$ if object $i$ is assigned to pack $j$. The optimization problem is,
\[\begin{array}{rcl} \min_{ a } & \max_{1 \leq j \leq m} \sum_{i = 1}^n w_i a_{i j} & \\ \text{s.t.} & \sum_{j = 1}^m a_{i j} = 1 & \forall\, 1 \leq i \leq n\\ & \sum_{i = 1}^n a_{i j} \leq \frac{n}{m} & \forall\, 1 \leq j \leq m\\ & a_{i j} \in \{ 0, 1 \} & \forall\, i, j \end{array}\]After the same epigraph reformulation,
\[\begin{array}{rcl} \min_{\alpha, a} & \alpha & \\ \text{s.t.} & \alpha \geq \sum_{i = 1}^n w_i a_{i j} & \forall\, 1 \leq j \leq m\\ & \sum_{j = 1}^m a_{i j} = 1 & \forall\, 1 \leq i \leq n\\ & \sum_{i = 1}^n a_{i j} \leq \frac{n}{m} & \forall\, 1 \leq j \leq m\\ & a_{i j} \in \{ 0, 1 \} & \forall\, i, j \end{array}\]which is a standard integer linear program.
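Again as a sketch rather than the IPLB code, the per-layer packing ILP can be written in cvxpy as follows; the function name, the even-divisibility assumption on $n / m$, and the solver choice are mine.

```python
import cvxpy as cp
import numpy as np

def balanced_packing_ilp(weights, num_packs):
    """Sketch: solve the per-layer balanced packing problem as an ILP.

    weights: (num_layers, num_obj) array of object workloads.
    Returns pack_index of shape (num_layers, num_obj), where
    pack_index[l, i] is the pack that object i is assigned to in layer l.
    """
    num_layers, n = weights.shape
    assert n % num_packs == 0, "sketch assumes objects divide evenly across packs"
    cap = n // num_packs
    pack_index = np.zeros((num_layers, n), dtype=int)
    for l in range(num_layers):
        w = weights[l].astype(float)
        a = cp.Variable((n, num_packs), boolean=True)  # a_{ij} = 1 iff object i -> pack j
        alpha = cp.Variable()                          # epigraph variable
        constraints = [
            cp.sum(a, axis=1) == 1,    # each object is assigned to exactly one pack
            cp.sum(a, axis=0) <= cap,  # at most n/m objects per pack
            w @ a <= alpha,            # alpha upper-bounds every pack's total workload
        ]
        prob = cp.Problem(cp.Minimize(alpha), constraints)
        prob.solve(solver=cp.GUROBI)   # any MILP solver supported by cvxpy works
        pack_index[l] = np.argmax(a.value, axis=1)
    return pack_index
```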