DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems

Zhen Tong Dec-20-2023
https://arxiv.org/pdf/2210.04123.pdf

Introduction

This paper was written by Ruizhong Qiu from UIUC, and Zhiqing Sun and Yiming Yang from the CMU Language Technologies Institute. The paper was accepted at NeurIPS 2022.

This research addresses the scalability problem in graph combinatorial optimization. Deep RL approaches suffer from the costly decoding of COP solutions, and the model encounters a sparse-reward problem on large graphs. You can understand this by viewing the construction of a solution (or its iterative improvement toward a final state) as a very long trajectory, because the graph is very large, and the reward arrives only at the very end. The reward is therefore very sparse relative to the trajectory length.

The researchers introduce a compact continuous space to parameterize the underlying distribution of candidate solutions, allowing massively parallel on-policy sampling without the costly decoding process. This effectively reduces the variance of the gradients estimated by the REINFORCE algorithm. They also use meta-learning to provide a good initialization of the model parameters for each instance.

DIMES generalizes to locally decomposable combinatorial optimization problems, such as the Maximal Independent Set (MIS) problem, evaluated both on synthetic graphs and on graphs reduced from satisfiability (SAT) problems.

Details

The novelty of DIMES compared to other learned construction heuristics is that DIMES uses improving heuristics and unsupervised reinforcement learning (no labeled optimal solutions are required).

Now we define the problem formally:

f_s^* = \arg\min_{f\in \mathcal{F}_s} c_s(f)

\mathcal{F}_s is the set of all feasible solutions, f is one solution, s stands for a problem instance, and c_s(\cdot) is the cost function of a solution. The solution space of a COP is discrete, for example, the tours of TSP or the node subsets of MIS. However, the paper parameterizes the solution space with a continuous and differentiable vector \theta\in\mathbb{R}^{|\mathcal{V}_s|}, where \mathcal{V}_s denotes the variables of problem instance s (edges in TSP, nodes in MIS), and estimates the probability of each feasible solution f as:

p_\theta(f|s) \propto \exp\left(\sum_{i=1}^{|\mathcal{V}_s|} f_i\cdot\theta_i\right)

where p_\theta is a distribution defined by an energy function over the discrete feasible solution space, f is a |\mathcal{V}_s|-dimensional vector with elements f_i\in\{0,1\} indicating whether the i^{th} variable is included in feasible solution f, and a higher value of \theta_i means a higher probability that the i^{th} variable is selected under p_\theta(f|s).
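To make the energy-based parameterization concrete, here is a small illustration of my own (not the paper's code): for a tiny MIS instance we can enumerate every feasible solution and normalize exp(Σ f_i θ_i) exactly. The graph, the θ values, and all names are hypothetical; exact enumeration is only tractable at toy sizes, which is precisely why the paper needs an auxiliary sampling distribution.

```python
import itertools
import math

# Toy MIS instance: a 4-node path graph 0-1-2-3 (assumption for illustration).
edges = [(0, 1), (1, 2), (2, 3)]
theta = [1.0, 0.5, -0.2, 0.8]   # hypothetical per-node scores theta_i

def is_independent(f):
    """A 0/1 vector f is feasible iff no edge has both endpoints selected."""
    return all(not (f[u] and f[v]) for u, v in edges)

# Enumerate the discrete feasible solution space F_s.
feasible = [f for f in itertools.product([0, 1], repeat=4) if is_independent(f)]

# p_theta(f|s) ∝ exp(sum_i f_i * theta_i): exponentiate and normalize.
energies = [sum(fi * ti for fi, ti in zip(f, theta)) for f in feasible]
Z = sum(math.exp(e) for e in energies)   # partition function (toy sizes only)
probs = {f: math.exp(e) / Z for f, e in zip(feasible, energies)}

best = max(probs, key=probs.get)
# Nodes with higher theta_i are favored: {0, 3} has the largest total score here.
```

Note how a higher θ_i directly raises the probability mass of every feasible solution containing variable i, matching the interpretation in the text.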

Now the objective function is the expected cost over the continuous solution distribution, for f\sim p_\theta(\cdot|s):

\ell(\theta|s) = \mathbb{E}_{f\sim p_\theta(\cdot|s)}[c_s(f)]

According to the REINFORCE-based update rule:

\nabla_\theta\mathbb{E}_{f\sim p_\theta(\cdot|s)}[c_s(f)] = \mathbb{E}_{f\sim p_\theta(\cdot|s)}[(c_s(f)-b(s))\nabla_\theta\log p_\theta(f|s)]
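The update rule above can be sketched end to end on a deliberately trivial problem. This is my own illustration, not the paper's code: I use 5 binary variables with independent Bernoulli factors (so sampling and the score function are closed-form), cost equal to the number of selected variables, and a mean-cost baseline b(s); the optimum is the all-zeros vector.

```python
import numpy as np

# Toy REINFORCE-with-baseline sketch (assumptions: independent Bernoulli
# factors, cost c_s(f) = sum_i f_i, so the optimum is f = 0).
rng = np.random.default_rng(0)
theta = np.zeros(5)                               # logits, one per variable

for step in range(300):
    p = 1.0 / (1.0 + np.exp(-theta))              # Bernoulli probabilities
    f = (rng.random((64, 5)) < p).astype(float)   # 64 parallel on-policy samples
    c = f.sum(axis=1)                             # cost of each sample
    b = c.mean()                                  # baseline b(s): mean cost
    # For Bernoulli logits, grad log p_theta(f) = f - p.
    grad = ((c - b)[:, None] * (f - p)).mean(axis=0)
    theta -= 0.5 * grad                           # descend the expected cost

# theta should now be strongly negative, i.e. p_theta concentrates on f = 0.
```

The many parallel samples per step are the point: they cheaply shrink the variance of the gradient estimate, which is exactly what DIMES's decoding-free sampling enables at scale.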

Nevertheless, the common practice for sampling from the energy-based distribution p_\theta requires MCMC (as in energy-based learning), which is not efficient enough. Hence the authors design an auxiliary distribution q_\theta over the feasible solutions \mathcal{F}_s such that the following conditions hold: 1) sampling from q_\theta is efficient, and 2) q_\theta and p_\theta converge to the same optimal \theta^*. Then p_\theta can be replaced with q_\theta in the objective function. Here b(s) is a baseline that reduces the variance of the gradient estimate.

Auxiliary Distribution For TSP

A vector of edge scores alone is not enough to describe a TSP solution; we need an additional distribution to describe the path. Because a TSP tour is a ring, it can start from any node, so the start node is drawn uniformly. Therefore:

q_\theta^{TSP}(\pi_f(0) = j) := \frac{1}{n}
q_\theta^{TSP}(f) := \sum_{j=0}^{n-1}\frac{1}{n}\, q_\theta^{TSP}(\pi_f \mid \pi_f(0)=j)

Given the start node \pi_f(0), we factorize the probability via the chain rule in visiting order, taking a softmax over the not-yet-visited nodes at each step:

q_\theta^{TSP}(\pi_f \mid \pi_f(0)) = \prod_{i=1}^{n-1}\frac{\exp(\theta_{\pi_f(i-1),\pi_f(i)})}{\sum_{j=i}^{n-1}\exp(\theta_{\pi_f(i-1),\pi_f(j)})}
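The chain-rule factorization corresponds to a simple autoregressive sampler. Below is a minimal sketch of my own (names and the random θ matrix are assumptions, not the paper's implementation): pick a uniform start node, then repeatedly softmax the edge scores from the current node to the unvisited nodes.

```python
import numpy as np

def sample_tour(theta, rng):
    """Sample one tour from the auxiliary TSP distribution q_theta.

    theta: (n, n) matrix of edge scores (a "heatmap" over edges).
    Returns the tour as a node list plus its log-probability under q_theta.
    """
    n = theta.shape[0]
    start = int(rng.integers(n))                  # q(pi_f(0) = j) = 1/n
    tour = [start]
    remaining = sorted(set(range(n)) - {start})
    log_prob = -np.log(n)
    while remaining:
        logits = theta[tour[-1], remaining]
        p = np.exp(logits - logits.max())
        p /= p.sum()                              # softmax over unvisited nodes
        idx = rng.choice(len(remaining), p=p)
        log_prob += np.log(p[idx])
        tour.append(remaining.pop(idx))
    return tour, log_prob

rng = np.random.default_rng(1)
theta = rng.normal(size=(6, 6))                   # hypothetical 6-city instance
tour, lp = sample_tour(theta, rng)
```

Each draw touches every node exactly once, so sampling a full tour is O(n²) with no learned decoder in the loop, which is what makes massively parallel sampling cheap.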

Auxiliary Distribution For MIS

In the MIS problem, the solution is a set, but our sampler generates an ordered sequence, so we describe the set's probability as a sum over orderings:

q_\theta^{MIS}(f) = \sum_{a\in\{a\}_f} q_\theta^{MIS}(a)

where a is an ordering in which the independent nodes of solution f are selected, and \{a\}_f is the set of all possible orderings of the nodes in f.
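A sampler for one such ordering can be sketched as follows (my own toy illustration, not the paper's code): repeatedly softmax-pick a node from the still-selectable pool, then remove that node's neighbours from the pool, so the result is an independent set by construction.

```python
import numpy as np

def sample_mis(theta, adj, rng):
    """Sample one ordering a of an independent set under node scores theta.

    theta: (n,) vector of node scores; adj: (n, n) 0/1 adjacency matrix.
    """
    alive = sorted(range(len(theta)))             # nodes still selectable
    chosen = []
    while alive:
        logits = theta[alive]
        p = np.exp(logits - logits.max())
        p /= p.sum()                              # softmax over selectable nodes
        v = alive[rng.choice(len(alive), p=p)]
        chosen.append(v)
        # Drop v and all of its neighbours from the candidate pool, so no two
        # chosen nodes can ever be adjacent.
        blocked = {v} | set(int(u) for u in np.flatnonzero(adj[v]))
        alive = [u for u in alive if u not in blocked]
    return chosen

# Hypothetical instance: path graph 0-1-2-3 with high scores on the endpoints.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1
theta = np.array([2.0, 0.1, 0.1, 2.0])
rng = np.random.default_rng(2)
s = sample_mis(theta, adj, rng)
```

Because the loop only stops when no selectable node remains, the sampled set is maximal as well as independent; the same set can arise from several orderings a, which is exactly why q_θ^{MIS}(f) sums over \{a\}_f.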

Meta-Learning

They train the GNN with a meta-learning scheme. The idea is simple: treat each graph instance as a task, sample solutions for each graph, and, after collecting many samples, update the GNN parameters with the accumulated gradients.

\Phi is the parameter of the GNN, K_s is the input feature matrix for a graph instance s in the collection C, and A_s is the adjacency matrix of the graph.

The meta-learning update is MAML-style: the GNN output \theta_s = \mathrm{GNN}_\Phi(K_s, A_s) serves as a per-instance initialization, which is fine-tuned for T inner gradient steps on \ell(\cdot\mid s) before the meta-parameters are updated on the post-fine-tuning loss:

\Phi \leftarrow \Phi - \eta\,\nabla_\Phi \sum_{s\in C} \ell(\theta_s^{(T)} \mid s)

where \theta_s^{(T)} denotes \theta_s after the T inner gradient-descent steps.
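The two-level structure can be illustrated with a deliberately tiny stand-in (all names, features, and the quadratic per-instance loss are my assumptions; the paper uses a GNN and the REINFORCE loss instead): Φ maps instance features to an initialization θ_s, the inner loop fine-tunes θ_s, and Φ is updated with a first-order (FOMAML-style) meta-gradient on the fine-tuned loss.

```python
import numpy as np

# Toy MAML-style sketch. "GNN_Phi" is replaced by a linear map Phi @ k, and
# the per-instance loss l(theta|s) = (theta - target_s)^2 has a known optimum,
# so we can check that Phi learns to produce good initializations.
Phi = np.zeros(2)                                  # meta-parameters
instances = [np.array([1.0, t]) for t in (-1.0, 0.0, 1.0)]  # toy features K_s
target = lambda k: 3.0 * k[1]                      # instance-specific optimum

alpha, eta, T = 0.1, 0.05, 5                       # inner lr, meta lr, inner steps
for it in range(500):
    meta_grad = np.zeros_like(Phi)
    for k in instances:
        theta = Phi @ k                            # theta_s = GNN_Phi(K_s, A_s) analogue
        for _ in range(T):                         # inner fine-tuning on l(theta|s)
            theta = theta - alpha * 2 * (theta - target(k))
        # First-order meta-gradient: d l(theta_T|s) / d Phi, ignoring the
        # inner-loop Jacobian (FOMAML approximation).
        meta_grad += 2 * (theta - target(k)) * k
    Phi -= eta * meta_grad                         # meta-update of Phi
```

After training, Φ ≈ [0, 3], i.e. the meta-learner outputs initializations that are already near each instance's optimum, so only a few fine-tuning steps are needed per new instance, which is the point of the meta-learning component.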

Performance