Blogs

A Step-by-step Derivation of ADMM from DRS

In this note, we present a step-by-step derivation of the Alternating Direction Method of Multipliers (ADMM) from Douglas-Rachford Splitting (DRS). This derivation is adapted from the book below an...

Generating Lyapunov Functions for Gradient Descent by SDP

This post is a reading note on the following paper: [1] Taylor, Adrien, Bryan Van Scoy, and Laurent Lessard. "Lyapunov functions for first-order methods: Tight automated convergence guarante...

Reflections from My First Academic Talk

I just gave my first academic talk at a conference at SJTU IIC yesterday. I presented an ongoing project, and I had hesitated for a long time about whether to give a talk on an “incomplete” work. I...

Helpful Resources in Grad School

I’ve benefited greatly from reading advice posts—especially during my graduate school application. In this post, I’ve collected some of the most helpful resources I’ve come across, covering both gr...

Routines for Setting Up a New Server

Lately, I’ve been running deep learning experiments across different computing clusters. Every time I switch to a new server, I have to go through a series of setup steps to get my environment read...

Optimizing EPLB by Integer (Conic) Linear Programming

In the last post, I reviewed the code of EPLB (Expert Parallelism Load Balancer). As a quick recap, EPLB is a toolbox for expert load balancing in the MoE architecture; it outputs the expert replic...

Code Review | Expert Parallelism Load Balancer

DeepSeek recently released a simple yet effective toolbox for load balancing in Mixture of Experts (MoE) architectures. The EPLB toolbox consists of only one Python file and has already received 1....

Writing LaTeX Locally on macOS

Previously, I used Overleaf to write .tex files. It’s convenient, beginner-friendly, and great for collaboration. However, it only works online, which means you can’t draft your paper on a flight (...

High Probability Analysis for SGD

Beyond Bounded Domain and Bounded Gradients

Long time no see! This is the longest post I have written so far, so grab a drink; it will take a little time to read! For better readability, you can refer to the PDF version. I am learning ho...

Proof of the Contraction Properties of PDHG

In this post, we show a simple way to derive the nonexpansiveness and contraction properties of the primal-dual hybrid gradient (PDHG) iteration using the language of operator theory. In [...

What is Good Research? A Catalog of Professional Views

I’ve been quite busy with PhD interviews recently, and I’ve found the experience to be very rewarding. I see interviews as a great opportunity to engage in meaningful conversations with experts. Du...

TeXmacs Tips

Efficient Math Typing, Crash Fixes, and More

About TeXmacs: TeXmacs is my favorite text editor, especially useful for those who frequently need to type mathematical formulas. I highly recommend giving it a try! In case you need help getting ...

Performance Estimation Problems II

Convergence Proofs and Stepsize Optimization

This is the second post in a series on Performance Estimation Problems (PEP). In this post, I’ll introduce applications of the PEP framework, particularly in convergence proofs and stepsize optimiz...

Performance Estimation Problems I

Methodology Review

This is the first post in a new series on Performance Estimation Problems (PEP). I’ve divided the series into two parts: the first introduces the PEP framework, and the second covers applications o...

Polynomial Optimization II

Multivariate problems

This note is taken from a summer course taught by Prof. Cédric Josz, who makes everything clear and intuitive! This blog is about multivariate polynomial optimization, including both unconstrained and ...

Polynomial Optimization I

Univariate unconstrained problems

This note is taken from a summer course taught by Prof. Cédric Josz, who makes everything clear and intuitive! This blog is about univariate unconstrained polynomial optimization, and the multivariate ...

Equivalence of PDHG and DRS

It is common to hear that the Primal-Dual Hybrid Gradient (PDHG) method and Douglas-Rachford Splitting (DRS) are equivalent, and this post explains why. I read O'Connor and Vandenberghe's paper [1] and organize...