Over the Rainbow

Helpful Resources for Grad School and Beyond

I’ve benefited greatly from reading advice posts—especially during my graduate school application process. In this post, I’ve collected some of the most helpful resources I’ve come across, covering...

Routines for Setting Up a New Server

Lately, I’ve been running deep learning experiments across different computing clusters. Every time I switch to a new server, I have to go through a series of setup steps to get my environment read...

Optimizing EPLB by Integer (Conic) Linear Programming

In the last post, I reviewed the code of EPLB (Expert Parallelism Load Balancer). As a quick recap, EPLB is a toolbox for expert load balancing in the MoE architecture; it outputs the expert replic...

Code Review | Expert Parallelism Load Balancer

DeepSeek recently released a simple yet effective toolbox for load balancing in Mixture of Experts (MoE) architectures. The EPLB toolbox consists of only one Python file and has already received 1....

Writing LaTeX Locally on macOS

Previously, I used Overleaf to write .tex files. It’s convenient, beginner-friendly, and great for collaboration. However, it only works online, which means you can’t draft your paper on a flight (...

A Random Trip in Beijing

Visa Interview Tips & Travel Snapshots

About the visa interview: I’m traveling to Beijing for my U.S. F-1 visa interview. Good news first: my visa was approved within 60 seconds of the interview starting, and it went very smoothly! I wanted to share a q...

High Probability Analysis for SGD

Beyond Bounded Domain and Bounded Gradients

Long time no see! This is the longest post I have written so far, so grab a drink; it will take a while to read. For better readability, you can refer to the PDF version. I am learning ho...

A Short Escape to Jeju

I recently took a three-day trip to Jeju Island with my friends—a much-needed escape from our busy routines. On the first day, the sky was overcast, and the ocean looked dark and moody. But the se...

Proof of the Contraction Properties of PDHG

In this post, we show a simple way to derive the nonexpansiveness and contraction properties of the primal-dual hybrid gradient (PDHG) iteration using the language of operator theory. In [...

What is Good Research? A Catalog of Professional Views

I’ve been quite busy with PhD interviews recently, and I’ve found the experience to be very rewarding. I see interviews as a great opportunity to engage in meaningful conversations with experts. Du...