In the last post, I reviewed the code of EPLB (Expert Parallelism Load Balancer). As a quick recap, EPLB is a toolbox for expert load balancing in the MoE architecture: it outputs an expert replication plan, a grouping plan, and a reallocation plan. The main idea of the algorithm is to decompose the joint decision problem into three subproblems and solve each with a greedy method.
The main algorithms used are balanced expert replication and balanced packing. In EPLB, both are solved by greedy algorithms (see my last post for the pseudocode). In this post, I will show how to solve these problems by optimization, in particular by integer (conic) linear programming. Besides the optimization formulations, I also implemented these algorithms in Python. The interface is the same as EPLB's, and I call it IPLB (Integer Programming based Load Balancer). I have open-sourced the algorithm on GitHub:
Any feedback is welcome!
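As a usage sketch only: assuming IPLB exposes the same rebalance_experts(weight, num_replicas, num_groups, num_nodes, num_gpus) entry point as EPLB, calling it might look like the following (the iplb module name and the example numbers are hypothetical; see the repo for the actual import path).

```python
import torch
import iplb  # hypothetical module name; check the GitHub repo for the real one

# weight[l, e]: workload of logical expert e in MoE layer l (made-up numbers)
weight = torch.tensor([[ 90, 132,  40,  61, 104, 165,  39,   4],
                       [ 20, 107, 104,  64,  19, 197, 187, 157]])

# Replicate 8 logical experts onto 12 physical experts, organized as
# 4 expert groups placed on 2 nodes with 2 GPUs each (4 GPUs total).
phy2log, log2phy, logcnt = iplb.rebalance_experts(
    weight, num_replicas=12, num_groups=4, num_nodes=2, num_gpus=4)

print(phy2log)  # physical-to-logical expert mapping per layer
print(logcnt)   # number of replicas per logical expert per layer
```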
Function: replica_experts
Input: weights (num_layers, num_log_experts), num_phy_experts
Objective: replicate experts; for each layer, minimize the maximal workload per replica.
For each layer $l$, the weight (workload) vector is $\mathbf{w}_l = (w_{l1}, \ldots, w_{le})$, and the number of physical experts (replicas) is $c$. The decision variables are the numbers of replicas of each logical expert, $\mathbf{r}_l = (r_{l1}, \ldots, r_{le})$. The optimization problem is,
\[\begin{array}{rcl} \min_{\mathbf{r}_l} & \max_{1 \leq j \leq e} \frac{w_{l j}}{r_{l j}} & \\ \text{s.t.} & \sum_{j = 1}^e r_{l j} \leq c & \\ & r_{l j} \in \mathbb{Z}_+ & \forall\, 1 \leq j \leq e \end{array}\]Introducing an auxiliary variable $\alpha$ to upper-bound the per-replica workloads (an epigraph reformulation), this becomes
\[\begin{array}{rcl} \min_{\alpha, {\mathbf{r}}_l} & \alpha & \\ \text{s.t.} & w_{l j} \leq \alpha r_{l j} & \forall\, 1 \leq j \leq e\\ & \sum_{j = 1}^e r_{l j} \leq c & \\ & r_{l j} \in \mathbb{Z}_+ & \forall\, 1 \leq j \leq e \end{array}\]The constraint $w_{l j} \leq \alpha r_{l j}$ is bilinear. However, since $\alpha$ and $r_{l j}$ are both nonnegative, this constraint is actually convex, and it can be written either as a semidefinite cone constraint or as a second-order cone constraint.
- Semidefinite cone:
\[\left(\begin{array}{cc} \alpha & \sqrt{w_{l j}}\\ \sqrt{w_{l j}} & r_{l j} \end{array}\right) \succeq 0,\]which is equivalent to $\alpha \geq 0$, $r_{l j} \geq 0$, $w_{l j} \leq \alpha r_{l j}$ (the diagonal entries are nonnegative and the determinant $\alpha r_{l j} - w_{l j}$ is nonnegative).
- Second-order cone:
\[\left(\begin{array}{c} \alpha\\ r_{l j}\\ \sqrt{2 w_{l j}} \end{array}\right) \in \mathcal{Q}_r^3,\]where $\mathcal{Q}_r^3 = \{ (x, y, z) \in \mathbb{R}^3 : 2 x y \geq z^2, x \geq 0, y \geq 0 \}$ is the rotated second-order cone, so membership is exactly $2 \alpha r_{l j} \geq 2 w_{l j}$. Equivalently, this is a standard second-order cone constraint after a linear transformation:
\[\left(\begin{array}{c} \sqrt{w_{l j}}\\ \frac{\alpha - r_{l j}}{2}\\ \frac{\alpha + r_{l j}}{2} \end{array}\right) \in \mathcal{Q}^3,\]where $\mathcal{Q}^3 = \{ (x, y, z) \in \mathbb{R}^3 : z^2 \geq x^2 + y^2, z \geq 0 \}$.
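Expanding the squares confirms that this recovers the original bilinear constraint:
\[\left(\frac{\alpha + r_{l j}}{2}\right)^2 - \left(\frac{\alpha - r_{l j}}{2}\right)^2 = \alpha r_{l j} \geq \left(\sqrt{w_{l j}}\right)^2 = w_{l j}.\]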
Therefore, the problem replica_experts can be solved by integer conic linear programming, which is supported by GUROBI and COPT!
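To make this concrete, below is a minimal per-layer sketch of this mixed-integer SOCP in cvxpy. It is not the IPLB implementation itself: the function name, the explicit SOC encoding, and the choice of GUROBI (any MISOCP-capable solver such as MOSEK or SCIP also works) are my assumptions.

```python
import cvxpy as cp
import numpy as np

def replicate_experts_misocp(weights, num_phy_experts):
    """Sketch: solve the per-layer replication problem as a mixed-integer SOCP.

    weights: (num_layers, num_log_experts) array of logical-expert workloads.
    Returns an integer array of replica counts with the same shape.
    """
    num_layers, num_log = weights.shape
    replicas = np.zeros((num_layers, num_log), dtype=int)
    for l in range(num_layers):
        w = weights[l].astype(float)
        r = cp.Variable(num_log, integer=True)  # r_{lj} in Z_+
        alpha = cp.Variable(nonneg=True)        # epigraph variable
        constraints = [r >= 1, cp.sum(r) <= num_phy_experts]
        for j in range(num_log):
            # w_j <= alpha * r_j, written as the standard SOC constraint
            # ||(2*sqrt(w_j), alpha - r_j)||_2 <= alpha + r_j.
            constraints.append(
                cp.SOC(alpha + r[j],
                       cp.hstack([2.0 * np.sqrt(w[j]), alpha - r[j]]))
            )
        prob = cp.Problem(cp.Minimize(alpha), constraints)
        prob.solve(solver=cp.GUROBI)  # assumes a MISOCP-capable solver is installed
        replicas[l] = np.rint(r.value).astype(int)
    return replicas
```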
Function: balanced_packing
Input: weights (num_layers, num_obj), num_packs
Objective: assign num_obj objects to num_packs packs; for each layer, minimize the maximal workload among packs.
For each layer (the layer index is dropped for brevity), the weight is ${\mathbf{w}} = (w_1, \ldots, w_n)$, where $n$ is the number of objects and $m$ is the number of packs. The decision variables are the binary assignments $a_{i j}$, with $a_{i j} = 1$ if object $i$ is assigned to pack $j$. The optimization problem is,
\[\begin{array}{rcl} \min_{ a } & \max_{1 \leq j \leq m} \sum_{i = 1}^n w_i a_{i j} & \\ \text{s.t.} & \sum_{j = 1}^m a_{i j} = 1 & \forall\, 1 \leq i \leq n\\ & \sum_{i = 1}^n a_{i j} \leq \frac{n}{m} & \forall\, 1 \leq j \leq m\\ & a_{i j} \in \{ 0, 1 \} & \forall\, i, j \end{array}\]After the same epigraph reformulation,
\[\begin{array}{rcl} \min_{\alpha, a} & \alpha & \\ \text{s.t.} & \alpha \geq \sum_{i = 1}^n w_i a_{i j} & \forall\, 1 \leq j \leq m\\ & \sum_{j = 1}^m a_{i j} = 1 & \forall\, 1 \leq i \leq n\\ & \sum_{i = 1}^n a_{i j} \leq \frac{n}{m} & \forall\, 1 \leq j \leq m\\ & a_{i j} \in \{ 0, 1 \} & \forall\, i, j \end{array}\]which is a standard integer linear program.
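Again as a sketch rather than the IPLB code, the per-layer packing ILP can be written in cvxpy as follows; the function name, the even-divisibility assumption on $n / m$, and the solver choice are mine.

```python
import cvxpy as cp
import numpy as np

def balanced_packing_ilp(weights, num_packs):
    """Sketch: solve the per-layer balanced packing problem as an ILP.

    weights: (num_layers, num_obj) array of object workloads.
    Returns pack_index of shape (num_layers, num_obj), where
    pack_index[l, i] is the pack that object i is assigned to in layer l.
    """
    num_layers, n = weights.shape
    assert n % num_packs == 0, "sketch assumes objects divide evenly across packs"
    cap = n // num_packs
    pack_index = np.zeros((num_layers, n), dtype=int)
    for l in range(num_layers):
        w = weights[l].astype(float)
        a = cp.Variable((n, num_packs), boolean=True)  # a_{ij} = 1 iff object i -> pack j
        alpha = cp.Variable()                          # epigraph variable
        constraints = [
            cp.sum(a, axis=1) == 1,    # each object is assigned to exactly one pack
            cp.sum(a, axis=0) <= cap,  # at most n/m objects per pack
            w @ a <= alpha,            # alpha upper-bounds every pack's total workload
        ]
        prob = cp.Problem(cp.Minimize(alpha), constraints)
        prob.solve(solver=cp.GUROBI)   # any MILP solver supported by cvxpy works
        pack_index[l] = np.argmax(a.value, axis=1)
    return pack_index
```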