In this appendix, we provide a brief overview of two optimization techniques that are used repeatedly in this book: Lagrangian relaxation and column generation.
Consider an optimization problem of the form
$$\text{(P)} \quad \begin{aligned}
\text{minimize} \quad & c^\top x && \text{(D.1)} \\
\text{subject to} \quad & Ax = b && \text{(D.2)} \\
& Dx \ge e && \text{(D.3)} \\
& x \in \{0,1\}^n && \text{(D.4)}
\end{aligned}$$
Here, x is a vector of decision variables, b, c, and e are vectors of coefficients, and A and D are matrices. (It's not necessary that all of the x variables be binary; some or all can be continuous.) Suppose that (P) itself is hard to solve, but that the problem obtained by omitting constraints (D.2) is easier. In this section, we discuss Lagrangian relaxation, a method that is well suited to solve problems like this one. Similar approaches can also be applied to other types of problems, such as nonlinear programming problems.
There are many sources of additional information about Lagrangian relaxation in journal articles and textbooks; among the most user-friendly treatments are the articles by Fisher (1981, 1985).
The idea behind Lagrangian relaxation is to relax (i.e., remove) the hard constraints (D.2) to produce an easier problem. When we remove the constraints, we add a term to the objective function that penalizes solutions for violating the relaxed constraints. This penalty term uses a vector $\lambda$ of Lagrange multipliers, one per relaxed constraint, that dictates the magnitude of the penalty. The Lagrangian subproblem is then given by
$$\text{(P-LR}_\lambda\text{)} \quad \begin{aligned}
\text{minimize} \quad & c^\top x + \lambda^\top (b - Ax) \\
\text{subject to} \quad & Dx \ge e \\
& x \in \{0,1\}^n
\end{aligned}$$
Problem (P‐LRλ) is easier to solve than problem (P). This, by itself, does not help us very much, because solutions to (P‐LRλ) will typically be infeasible for (P). But it turns out that the optimal solution to (P‐LRλ) provides us with a lower bound on the optimal objective value of (P). Feasible solutions to (P) each provide an upper bound on the optimal objective value. Such solutions must be found using some other method, typically using a heuristic that is executed once per iteration of the Lagrangian relaxation procedure. When the upper and lower bounds are close (say, within 0.1%), we know that the feasible solution we have found is close to optimal.
When choosing which constraints to relax, i.e., which constraints to label as “hard,” there are three main considerations:

- The relaxed subproblem should be significantly easier to solve than (P); otherwise, little is gained by relaxing.
- The lower bound provided by the relaxation should be as tight as possible.
- It should be possible to convert solutions of the subproblem into good feasible solutions for (P), in order to obtain upper bounds.

Choosing which constraints to relax is not straightforward, and often some trial and error is required.
Let $z^*$ be the optimal objective value of (P) and let $z_{\mathrm{LR}}(\lambda)$ be the optimal objective value of (P-LR$_\lambda$). Let $m$ be the number of rows in $A$, that is, the number of constraints in (D.2). Then we have the following result:

Theorem D.1. For any $\lambda \in \mathbb{R}^m$, $z_{\mathrm{LR}}(\lambda) \le z^*$.
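The reasoning behind Theorem D.1 is short; here is a sketch in the notation above, assuming (as in (D.2)) that the relaxed constraints are equalities:

```latex
\begin{aligned}
& \text{Let } x \text{ be any feasible solution of (P). Since } Ax = b, \text{ the penalty vanishes:} \\
& \qquad c^\top x + \lambda^\top (b - Ax) = c^\top x. \\
& \text{Since } x \text{ also satisfies } Dx \ge e \text{ and } x \in \{0,1\}^n,
  \text{ it is feasible for (P-LR}_\lambda\text{), so} \\
& \qquad z_{\mathrm{LR}}(\lambda) \;\le\; c^\top x + \lambda^\top (b - Ax) \;=\; c^\top x. \\
& \text{Taking the minimum over all } x \text{ feasible for (P) gives } z_{\mathrm{LR}}(\lambda) \le z^*.
\end{aligned}
```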
If (P) has a different structure than given in (D.1)–(D.4)—for example, if it is a maximization problem, or if (D.2) are inequality constraints—then we must make some modifications to Theorem D.1 (and the results that follow); see Section D.1.5.
Since (P) is a minimization problem, we want lower bounds that are as large as possible; these are the most accurate and useful bounds. Different values of $\lambda$ will give different values of $z_{\mathrm{LR}}(\lambda)$, and hence different bounds. We'd like to find the $\lambda$ that gives the largest possible bound. That is, we want to solve
$$\text{(LR)} \quad \max_{\lambda \in \mathbb{R}^m} z_{\mathrm{LR}}(\lambda).$$
Suppose for now that we have found the $\lambda^*$ that solves (LR). (We'll discuss one way to find such a $\lambda^*$ in Section D.1.3.) Let $z_{\mathrm{LR}} = z_{\mathrm{LR}}(\lambda^*)$. How good a bound is $z_{\mathrm{LR}}$? For example, is it better or worse than the bound obtained from the LP relaxation of (P)? The answer turns out to be, “at least as good”: if $z_{\mathrm{LP}}$ denotes the optimal objective value of the LP relaxation of (P), then
$$z_{\mathrm{LP}} \le z_{\mathrm{LR}} \le z^*.$$
An optimization problem with binary variables is said to have the integrality property if its LP relaxation always has optimal solutions that are binary. If the Lagrangian subproblem has the integrality property for all $\lambda$, then the bound from Lagrangian relaxation is exactly equal to the bound from LP relaxation:
$$z_{\mathrm{LR}} = z_{\mathrm{LP}}.$$
In a typical Lagrangian relaxation algorithm, we solve (P-LR$_\lambda$) for a given $\lambda$ and then find a solution $x$ that is feasible for (P). This is often done by modifying the solution to (P-LR$_\lambda$), converting it somehow from an infeasible solution to a feasible one. We then choose new multipliers $\lambda$ in the hopes of improving the lower bound. Therefore, each iteration of the procedure consists of (1) solving (P-LR$_\lambda$), (2) finding an upper bound, and (3) updating the multipliers. In Section D.1.3, we discuss one common method for step (3).
To summarize what we have covered so far, at any given iteration of the Lagrangian relaxation procedure, we have
$$z_{\mathrm{LR}}(\lambda) \le z^* \qquad \text{(D.9)}$$
$$z^* \le \hat{z} \qquad \text{(D.10)}$$
where $\lambda$ is the current vector of multipliers and $\hat{z}$ is the objective value of the best feasible solution for (P) found so far.
If (P-LR$_\lambda$) has the integrality property for all $\lambda$, then at $\lambda = \lambda^*$, (D.9)–(D.10) reduce to
$$z_{\mathrm{LP}} \le z^* \le \hat{z}.$$
At the end of each iteration of the Lagrangian relaxation procedure, we want to update the Lagrange multipliers to coax the subproblem solution toward feasibility for (P). Let $x$ be the optimal solution to the Lagrangian subproblem for a given $\lambda$. Consider a given constraint $i$ and its multiplier $\lambda_i$. Should we make $\lambda_i$ larger or smaller? The answer depends on whether, and how, the constraint is violated. We can write constraint $i$ as
$$\sum_{j=1}^n a_{ij} x_j = b_i \qquad \text{(D.12)}$$
where $n$ is the number of variables. We are trying to encourage the solution to satisfy this constraint by adding the penalty term
$$\lambda_i \left( b_i - \sum_{j=1}^n a_{ij} x_j \right) \qquad \text{(D.13)}$$
to the objective function. If $\lambda_i$ is too small, then there's no real penalty for making $\sum_j a_{ij} x_j$ small, and it's likely that the left-hand side of (D.12) will be too small. On the other hand, if $\lambda_i$ is too large, there will be an incentive to make $\sum_j a_{ij} x_j$ large, making the term inside the parentheses in (D.13) negative and the overall penalty large and negative. (Remember that (P) is a minimization problem.) By changing $\lambda_i$, we can encourage $\sum_j a_{ij} x_j$ to be larger or smaller (hopefully equal to $b_i$) at the next iteration.
So:

- If $\sum_j a_{ij} x_j < b_i$, we should make $\lambda_i$ larger, to encourage $\sum_j a_{ij} x_j$ to increase.
- If $\sum_j a_{ij} x_j > b_i$, we should make $\lambda_i$ smaller, to encourage $\sum_j a_{ij} x_j$ to decrease.
Now, $z_{\mathrm{LR}}(\lambda)$ is a piecewise-linear concave function of $\lambda$. Solving problem (LR) involves maximizing this function. Since it's piecewise-linear (and therefore nondifferentiable at some points), we can't just take a derivative with respect to $\lambda$. Somehow, though, we want to move from our current value of $\lambda$ to a better one, over and over, until we're near the maximum of the function.
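To see why the function has this shape, note that the subproblem's feasible region contains only finitely many binary solutions; a sketch:

```latex
z_{\mathrm{LR}}(\lambda)
  = \min_{x \in X} \left\{ c^\top x + \lambda^\top (b - Ax) \right\},
  \qquad X = \{\, x \in \{0,1\}^n : Dx \ge e \,\}.
```

Since $X$ is finite and each term in the minimum is affine in $\lambda$, $z_{\mathrm{LR}}(\lambda)$ is the pointwise minimum of finitely many affine functions, which is piecewise-linear and concave.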
We will use a common method for updating the Lagrange multipliers called subgradient optimization. (Other methods for nonlinear optimization, such as the volume algorithm and bundle methods, have also proved to be very effective for updating Lagrange multipliers.) In subgradient optimization, each move consists of a step size (which is the same for all $i$) and a step direction (which is different for each $i$).
The step size at iteration $t$ (denoted $\Delta_t$) is computed as follows. Let $\mathrm{LB}_t$ be the lower bound found at iteration $t$ (i.e., the value of $z_{\mathrm{LR}}(\lambda)$ for the current value of $\lambda$) and let $\mathrm{UB}$ be the best upper bound found (i.e., the objective value of the best feasible solution found so far, by any method). Note that while $\mathrm{LB}_t$ is the last lower bound found, $\mathrm{UB}$ is the best upper bound found. Then the step size is given by
$$\Delta_t = \frac{\alpha_t \left( \mathrm{UB} - \mathrm{LB}_t \right)}{\sum_{i=1}^m \left( b_i - \sum_{j=1}^n a_{ij} x_j \right)^2}$$
$\alpha_t$ is a constant that is generally set to 2 at iteration 1 and divided by 2 after a given number (say, 20) of consecutive iterations have passed during which the best known lower bound has not improved. The numerator is proportional to the difference between the upper and lower bounds—as we get closer to the maximum of the function, the steps should get smaller. The denominator is simply the sum of the squares of the constraint violations.
The step direction for constraint $i$ is simply given by $b_i - \sum_{j=1}^n a_{ij} x_j$ (the violation in the constraint).
To obtain the new multipliers (call them $\lambda_i^{t+1}$) from the old ones ($\lambda_i^t$), we set
$$\lambda_i^{t+1} = \lambda_i^t + \Delta_t \left( b_i - \sum_{j=1}^n a_{ij} x_j \right)$$
Note that since $\Delta_t > 0$, this update step follows the rules given above: if $\sum_j a_{ij} x_j < b_i$, then $\lambda_i$ increases, while if $\sum_j a_{ij} x_j > b_i$, then $\lambda_i$ decreases.
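As a concrete sketch, the subgradient update can be written in a few lines of Python. The function and variable names here are ours, not from the text: `A` and `b` hold the relaxed-constraint data, `x` is the current subproblem solution, and `alpha` plays the role of $\alpha_t$.

```python
def update_multipliers(lmbda, x, A, b, alpha, LB, UB):
    """One subgradient step: lambda_i <- lambda_i + Delta_t * (b_i - sum_j a_ij x_j)."""
    m, n = len(A), len(x)
    # step directions: the violation b_i - sum_j a_ij * x_j of each relaxed constraint
    d = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    denom = sum(di ** 2 for di in d)
    if denom == 0:                        # subproblem solution satisfies all relaxed constraints
        return list(lmbda)
    step = alpha * (UB - LB) / denom      # the step size Delta_t
    return [lmbda[i] + step * d[i] for i in range(m)]
```

For instance, with two relaxed constraints, a subproblem solution that violates both by 1, bounds LB = 5 and UB = 7, and alpha = 2, both multipliers move up by 2.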
At the first iteration, $\lambda$ can be initialized in a variety of ways: for example, set $\lambda_i = 0$ for all $i$, set each $\lambda_i$ to some random number, or set $\lambda$ according to some other ad hoc rule.
The process of solving (P-LR$_\lambda$), finding a feasible solution, and updating $\lambda$ is continued until some stopping criteria are met. For example, we might stop the procedure when any of the following is true:

- The gap between the best upper and lower bounds is smaller than some prespecified tolerance.
- A maximum number of iterations has been reached.
- The step-size constant $\alpha_t$ has become so small that the multipliers barely change from one iteration to the next.
Lagrangian relaxation is a general tool that can be used for any IP. However, some of the rules discussed above change when applied to IPs that have a form other than that given in (D.1)–(D.4).
The constraints relaxed may be inequality or equality constraints:

- If the relaxed constraints are equality constraints, as in (D.2), the multipliers $\lambda_i$ are unrestricted in sign.
- If they are $\ge$ constraints, then we must require $\lambda_i \ge 0$ in order for (P-LR$_\lambda$) to yield a valid lower bound.
- If they are $\le$ constraints, then we must require $\lambda_i \le 0$.
(Note: These rules assume the penalty in the objective function is written as $\lambda^\top (b - Ax)$, i.e., the left-hand side of the constraint is subtracted from the right-hand side. If, instead, the right-hand side is subtracted from the left-hand side, i.e., the penalty is $\lambda^\top (Ax - b)$, these rules are reversed.)
If the IP is a maximization problem, then the roles of the bounds are reversed: the Lagrangian subproblem provides an upper bound on the optimal objective value, feasible solutions provide lower bounds, and (LR) becomes a minimization problem over $\lambda$.
If the Lagrangian procedure stops before the upper and lower bounds are close to each other, there is no guarantee that the solution found is near-optimal. If this happens, we could stop and accept the best feasible solution found without a guarantee of optimality (this treats Lagrangian relaxation as a heuristic), or we could close the optimality gap using branch-and-bound. The branch-and-bound process is like the standard process for solving IPs, except that (a) lower bounds are obtained by solving the Lagrangian subproblem, not the LP relaxation, and (b) upper bounds are found using the upper-bounding method that is embedded in the Lagrangian procedure, instead of when LP solutions happen to be integer-feasible. At each node of the branch-and-bound tree, a variable is chosen for branching, and that variable is fixed first to 0, then to 1. The mechanics of branching and fathoming are just like those in standard branch-and-bound.
The Lagrangian relaxation algorithm is summarized in Algorithm D.1.
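To make the overall procedure concrete, here is a minimal, self-contained sketch on a deliberately trivial problem: minimize $\sum_j c_j x_j$ subject to $\sum_j x_j = k$, $x_j \in \{0,1\}$, relaxing the single cardinality constraint. All names are illustrative, and the step-size rule is simplified to a geometric decay rather than the halving rule described above.

```python
def lagrangian_relaxation(c, k, max_iter=100, alpha=2.0):
    """Lagrangian relaxation of: min sum_j c_j x_j  s.t.  sum_j x_j = k, x_j in {0,1}."""
    lam, best_lb = 0.0, float("-inf")
    # upper bound from a simple heuristic: the k cheapest items form a feasible solution
    ub = sum(sorted(c)[:k])
    for _ in range(max_iter):
        # (1) solve the Lagrangian subproblem: min lam*k + sum_j (c_j - lam) x_j,
        #     which separates by variable: set x_j = 1 iff its coefficient is negative
        x = [1 if cj - lam < 0 else 0 for cj in c]
        lb = lam * k + sum((cj - lam) * xj for cj, xj in zip(c, x))
        best_lb = max(best_lb, lb)
        # (2) stop when the bounds meet
        if ub - best_lb < 1e-6:
            break
        # (3) subgradient update of the single multiplier
        d = k - sum(x)                      # violation of the relaxed constraint
        if d == 0:
            break
        lam += alpha * (ub - best_lb) / d ** 2 * d
        alpha *= 0.95                       # simplified step-size decay
    return best_lb, ub
```

On `c = [4, 2, 7, 1]` and `k = 2`, the heuristic upper bound (3, from picking the two cheapest items) and the Lagrangian lower bound coincide after two iterations, certifying optimality.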
Column generation is a useful technique for solving optimization problems, especially those in which the number of variables is much larger than the number of constraints. The number of variables may even be exponentially large—too large to enumerate all of the variables and their coefficients. The basic idea behind column generation is to optimize the problem using only a subset of the variables (the columns), and to generate new columns as needed during the algorithm. Because the vast majority of the columns will be nonbasic, i.e., will equal zero, in the optimal solution, this idea makes sense, but we must be smart and efficient in determining which new columns to generate.
Column generation was first developed for linear programming (LP) problems (Ford and Fulkerson, 1958; Dantzig and Wolfe, 1960), but it has since become a popular tool for solving integer programming (IP) problems, as well. It is indispensable for certain classes of IPs, such as airline crew scheduling, that were previously considered intractable.
The column generation process works as follows. The problem being solved is decomposed into two problems, called the master problem and the pricing problem. The master problem is the original problem, but we usually work with a version of it that contains only a subset of the original variables and is therefore called the restricted master problem. The pricing problem, also called the subproblem, uses dual information from the master problem to identify a column to be generated and added to the master problem. The master problem is solved again, new dual information is obtained, the pricing problem identifies a new column, and so on, until the pricing problem cannot identify a new column to add, in which case the current solution to the master problem is optimal.
Suppose we wish to solve the following LP, which we will refer to as the master problem:
$$\text{(MP)} \quad \begin{aligned}
\text{minimize} \quad & \sum_{j \in J} c_j x_j && \text{(D.18)} \\
\text{subject to} \quad & \sum_{j \in J} a_j x_j = b && \text{(D.19)} \\
& x_j \ge 0 \quad \forall j \in J && \text{(D.20)}
\end{aligned}$$
where $J$ is the set of decision variables, $c_j$ is a (scalar) objective function coefficient, and $a_j$ and $b$ are vectors of constraint coefficients. Let $\pi$ be the vector of dual variables associated with the constraints (D.19).
Recall that a standard decision rule in the simplex method is to find the variable with the most negative reduced cost, i.e., to solve
$$\min_{j \in J} \; c_j - \pi^\top a_j \qquad \text{(D.21)}$$
where the superscript $\top$ stands for transpose. The $j$ that minimizes (D.21) is selected as the next variable to enter the basis. If the minimum is nonnegative, i.e., $c_j - \pi^\top a_j \ge 0$ for all $j \in J$, then the current solution to the LP is optimal.
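As a small sketch, the entering-variable rule in (D.21), applied to an explicit list of columns (the function and argument names are ours):

```python
def most_negative_reduced_cost(c, columns, pi):
    """Return (index, reduced cost) of the column minimizing c_j - pi^T a_j,
    or (None, 0.0) if every reduced cost is nonnegative (current solution optimal)."""
    best_j, best_rc = None, 0.0
    for j, (cj, aj) in enumerate(zip(c, columns)):
        rc = cj - sum(p * a for p, a in zip(pi, aj))   # reduced cost of column j
        if rc < best_rc:
            best_j, best_rc = j, rc
    return best_j, best_rc
```

When the column set is too large to scan like this, the same minimization must instead be carried out implicitly by a pricing problem, as discussed next.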
Because we have assumed that the number of columns is very large, it may not be practical to solve (D.21) explicitly. Therefore, we will work with the restricted master problem, which considers only a subset $J' \subseteq J$ of the columns:
$$\text{(RMP)} \quad \begin{aligned}
\text{minimize} \quad & \sum_{j \in J'} c_j x_j && \text{(D.22)} \\
\text{subject to} \quad & \sum_{j \in J'} a_j x_j = b && \text{(D.23)} \\
& x_j \ge 0 \quad \forall j \in J' && \text{(D.24)}
\end{aligned}$$
Let $x^*$ be the optimal solution to (RMP) and let $\pi^*$ be the corresponding optimal dual solution.
Suppose we could solve the following pricing problem:
$$\min_{j \in J} \; c_j - (\pi^*)^\top a_j \qquad \text{(D.25)}$$
(Note that the minimization is over all of $J$, not just $J'$.) If the optimal value of (D.25) is nonnegative, then the current solution is optimal not only for (RMP), but also for (MP), since all of the reduced costs are nonnegative. If, instead, the optimal value is negative, then the $j$ that attains the minimum in (D.25) will improve the objective function value of (RMP), and therefore we should add it to $J'$ and re-solve (RMP).
It seems unrealistic to hope to be able to solve the pricing problem, since we cannot enumerate the elements of J. However, in many cases, the pricing problem can be solved by exploiting the structure of the original problem, even without enumerating J. These are the cases in which column generation is most useful. Examples of tractable pricing problems include the cutting stock problem (Section D.2.3), the vehicle routing problem (VRP) (Section 11.2.4), and the location model with risk pooling (LMRP) (Section 12.2.7).
We next introduce an example to illustrate how column generation works in the context of a classic operations research (OR) problem: the one‐dimensional cutting stock problem, introduced by Gilmore and Gomory (1965). Paper manufacturers often produce very wide rolls of paper. Their customers want narrower rolls, so the problem is to determine how to cut up the wide rolls into smaller ones while minimizing the amount of waste.
Let $W$ be the width of each roll of paper, and let $K$ be the set of available rolls. Let $m$ be the number of types of rolls that are required by the customers; there is a demand $d_i$ for type $i$, which has width $w_i$, for $i = 1, \ldots, m$. We define two sets of decision variables: $x_k = 1$ if we use roll $k \in K$, 0 otherwise; and $y_{ik}$ equals the number of times type $i$ is cut from roll $k$. The objective is to minimize the total number of rolls to be cut, while satisfying all of the customers' demands.
One way to formulate the cutting stock problem is as follows:
$$\text{(CS)} \quad \begin{aligned}
\text{minimize} \quad & \sum_{k \in K} x_k && \text{(D.26)} \\
\text{subject to} \quad & \sum_{k \in K} y_{ik} \ge d_i \quad \forall i = 1, \ldots, m && \text{(D.27)} \\
& \sum_{i=1}^m w_i y_{ik} \le W x_k \quad \forall k \in K && \text{(D.28)} \\
& x_k \in \{0,1\} \quad \forall k \in K && \text{(D.29)} \\
& y_{ik} \ge 0 \text{ and integer} \quad \forall i, k && \text{(D.30)}
\end{aligned}$$
The objective function (D.26) counts the total number of rolls used. Constraints (D.27) require the total demand for each roll type to be satisfied. Constraints (D.28) ensure that the total width of the rolls cut from roll $k$ does not exceed $W$, and that no rolls are cut from roll $k$ if $x_k = 0$. Constraints (D.29) and (D.30) are integrality and non-negativity constraints.
The formulation above, given by Kantorovich (1960), is natural and straightforward. However, it is not practical, because its LP relaxation is very weak. Even for moderately sized instances, it may take a very long time for an off‐the‐shelf IP solver to solve (D.26)–(D.30). Therefore, Gilmore and Gomory (1965) propose an alternative formulation that treats the cutting stock problem as a set covering problem.
Define a cutting pattern as a collection of roll types and quantities that can be cut from a single large roll. Let $J$ denote the set of all feasible cutting patterns, and let $a_{ij}$ be a parameter that equals the number of times that roll type $i$ is cut in pattern $j$. A cutting pattern $j$ is feasible if it satisfies
$$\sum_{i=1}^m w_i a_{ij} \le W \qquad \text{(D.31)}$$
$$a_{ij} \ge 0 \text{ and integer} \quad \forall i = 1, \ldots, m \qquad \text{(D.32)}$$
For example, suppose the width of the original roll is $W$, and there are two roll types ($m = 2$), with demands $d_1, d_2$ and widths $w_1, w_2$. Pattern 1 might entail the original roll being cut into three rolls of width $w_1$ and one roll of width $w_2$, in which case $a_{11} = 3$ and $a_{21} = 1$. Similarly, the large roll can also be cut into pattern 2, which might consist of two rolls of width $w_1$ and three of width $w_2$, implying that $a_{12} = 2$ and $a_{22} = 3$. Thus, each cutting pattern $j$ is represented by a column $(a_{1j}, \ldots, a_{mj})^\top$. The problem, of course, is that there is an exponential number of cutting patterns—but let's ignore that concern for a moment.
In our new formulation, let $x_j$ denote the number of times that cutting pattern $j \in J$ is used, i.e., the number of large rolls that will be cut using pattern $j$. Then we can reformulate the cutting stock problem as follows:
$$\text{(CSE)} \quad \begin{aligned}
\text{minimize} \quad & \sum_{j \in J} x_j && \text{(D.33)} \\
\text{subject to} \quad & \sum_{j \in J} a_{ij} x_j \ge d_i \quad \forall i = 1, \ldots, m && \text{(D.34)} \\
& x_j \ge 0 \text{ and integer} \quad \forall j \in J && \text{(D.35)}
\end{aligned}$$
This formulation is sometimes called the extensive formulation, since it contains many more decision variables, whereas (CS) is called the compact formulation.
The idea is to solve the LP relaxation of (CSE) using column generation. The LP relaxation will be our master problem. It turns out that the pricing problem is relatively easy—it is a knapsack problem. Of course, the LP relaxation may not provide a feasible integer solution, but it might provide a tight lower bound for the original problem, because the extensive formulation has a tighter LP bound than the compact formulation does. The LP relaxation can also be used to develop heuristic solutions to the integer problem.
As we noted above, $J$ is exponentially large. Therefore, we will work with the restricted master problem, in which the LP relaxation uses only the patterns in $J' \subseteq J$, which is typically much smaller than $J$. The restricted master problem is given by:
$$\begin{aligned}
\text{minimize} \quad & \sum_{j \in J'} x_j && \text{(D.36)} \\
\text{subject to} \quad & \sum_{j \in J'} a_{ij} x_j \ge d_i \quad \forall i = 1, \ldots, m && \text{(D.37)} \\
& x_j \ge 0 \quad \forall j \in J' && \text{(D.38)}
\end{aligned}$$
The initial subset $J'$ can be constructed using a simple approach. For example, we might generate a column for each roll type $i$, with $\lfloor W / w_i \rfloor$ rolls of type $i$ cut from the original roll, i.e., $a_{ij} = \lfloor W / w_i \rfloor$ for the column $j$ corresponding to type $i$, and 0 for all other types.
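This initialization takes only a few lines to code; a sketch (the function name is ours):

```python
def initial_patterns(W, widths):
    """One starting pattern per roll type: cut as many rolls of type i as fit in width W."""
    m = len(widths)
    patterns = []
    for i, w in enumerate(widths):
        col = [0] * m
        col[i] = W // w          # floor(W / w_i) rolls of type i, nothing else
        patterns.append(col)
    return patterns
```

Each returned list is one column $(a_{1j}, \ldots, a_{mj})$ of the restricted master problem; together they guarantee that every demand constraint can be satisfied by some feasible solution.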
We can solve the restricted master problem using standard LP algorithms. Let $\pi^*$ be the vector of optimal dual variables. Then the reduced cost of a primal variable $x_j$ is
$$\bar{c}_j = 1 - \sum_{i=1}^m \pi_i^* a_{ij}.$$
Our task, then, is to identify a column (a cutting pattern) in $J \setminus J'$ that would improve the objective function of the restricted master problem if we were to add it to $J'$. Such a column would have a negative reduced cost. Thus, we would like to solve a problem like (D.25), but we need to do it without enumerating all of $J$.
Recall that a cutting pattern is feasible if it satisfies (D.31)–(D.32). We can formulate an optimization problem for “pricing out” the desired column, i.e., the one with the most negative reduced cost. This pricing problem is given by:
$$\begin{aligned}
\text{minimize} \quad & 1 - \sum_{i=1}^m \pi_i^* a_i && \text{(D.39)} \\
\text{subject to} \quad & \sum_{i=1}^m w_i a_i \le W && \text{(D.40)} \\
& a_i \ge 0 \text{ and integer} \quad \forall i = 1, \ldots, m && \text{(D.41)}
\end{aligned}$$
where we replace $a_{ij}$ with $a_i$ for convenience, since we are only concerned with finding a single column $j$. Fortunately, (D.39)–(D.41) is easy to solve—it is simply an integer knapsack problem. (Think of multiplying the objective function by $-1$ to convert it to a maximization problem and omitting the constant 1.) Note that the $a_i$ are now the decision variables: We are trying to find a cutting pattern, as defined by the $a_i$, that is feasible and that minimizes the reduced cost. If the optimal objective function value of (D.39) is negative, then we have found a new column, defined by the $a_i$, to add to the restricted master problem. On the other hand, if the optimal objective of the pricing problem is nonnegative, then we know there is no cutting pattern that has a negative reduced cost and that will improve the restricted master problem. We can conclude that we have found the optimal solution to the LP relaxation of the (full) master problem.
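When the widths are integers, this knapsack can be solved by standard unbounded-knapsack dynamic programming over the capacities $0, \ldots, W$. A sketch, with illustrative names (`pi` is the dual vector $\pi^*$):

```python
def price_pattern(W, widths, pi):
    """Solve (D.39)-(D.41): maximize sum_i pi[i]*a[i] subject to
    sum_i widths[i]*a[i] <= W, a[i] >= 0 integer. Returns (pattern, reduced cost)."""
    best = [0.0] * (W + 1)   # best[c] = max dual value achievable within capacity c
    take = [-1] * (W + 1)    # roll type added at capacity c (-1: carry over from c-1)
    for cap in range(1, W + 1):
        best[cap], take[cap] = best[cap - 1], -1
        for i, w in enumerate(widths):
            if w <= cap and best[cap - w] + pi[i] > best[cap]:
                best[cap], take[cap] = best[cap - w] + pi[i], i
    # trace back the take[] decisions to recover the pattern itself
    a, cap = [0] * len(widths), W
    while cap > 0:
        i = take[cap]
        if i == -1:
            cap -= 1
        else:
            a[i] += 1
            cap -= widths[i]
    return a, 1.0 - best[W]
```

If the returned reduced cost is negative, the pattern defines a new column to add to $J'$; otherwise the current LP solution is optimal for the full master problem.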
Moreover, the LP solution can be converted to a feasible integer solution by rounding the fractional $x_j$ up. Since (D.34) holds for the fractional solution, it also holds for the rounded-up solution.
Column generation was originally designed for solving linear programs. However, it can also be used to solve large‐scale integer programs together with the classical branch‐and‐bound framework. This method is known as branch‐and‐price. In particular, the original integer program is converted to an extensive formulation using an approach known as Dantzig–Wolfe decomposition. The extensive formulation typically has tighter LP relaxations and often eliminates symmetry. In the branch‐and‐price algorithm, we use column generation to solve the LP relaxation at each node of the branch‐and‐bound tree. Other aspects are similar to classical branch‐and‐bound, although the branching strategy is often somewhat different. A generic procedure for the branch‐and‐price algorithm is introduced by Barnhart et al. (1998).
Other practical issues arise when implementing column generation for integer programs. For example, one must consider how to reformulate the original problem into an appropriate extensive form so that both the restricted master problem and the pricing subproblem are tractable. Moreover, the solution process may exhibit a so-called tailing-off effect, in which convergence becomes significantly slower after a near-optimal solution is found. For further discussion of column generation for integer programming, we refer the reader to tutorials such as Wilhelm (2001) and Lübbecke and Desrosiers (2005).