Branch & Bound



Problem Statement

Branch-and-bound is an approach developed for solving discrete and combinatorial optimization problems. The discrete optimization problems are problems in which the decision variables assume discrete values from a specified set; when this set is set of integers, we have an integer programming problem. The combinatorial optimization problems, on the other hand, are problems of choosing the best combination out of all possible combinations. Most combinatorial problems can be formulated as integer programs. [For an excellent review of these problems, the reader is referred to Chapter 9 of K. G. Murty's book.]

The major difficulty with these problems, as Murty states, is that we do not have any optimality conditions to check if a given (feasible) solution is optimal or not. For example, in linear programming we do have an optimality condition: when you give me a candidate solution, I'll check if there exists an "improving feasible direction" to move, if there isn't, then your solution is optimal. If I can find a direction to move that results in a better solution, then your solution is not optimal. There is no such global optimality conditions in discrete or combinatorial optimization problems. In order to guarantee a given feasible solution's optimality is "to compare" it with every other feasible solution. To do this explicitly, amounts to total enumeration of all possible alternatives which is computationally prohibitive due to the NP-Completeness of integer programming problems. Therefore, this comparison must be done implicitly, resulting in partial enumeration of all possible alternatives.

As an example consider the following problem from machine scheduling. We have n jobs each of which can be processed either on machine 1 or machine 2. But the processing times may differ depending on the machine the job is to be processed. Suppose we have four jobs with the following processing times:

Job 1 Job 2 Job 3 Job 4
Mach. 1 4 4 3 5
Mach. 2 2 3 4 4

We would like to assign each job to a machine such that all the jobs are processed as soon as possible. In other words, we would like to minimize the completion time of the job that is processed last. In the machine scheduling literature this problem is referred to as the "minimizing the makespan in unrelated parallel machines" and denoted by R2||Cmax.

This problem has an integer programming formulation. If we define the decision variables as

xj = 1, if job j is assigned to machine 1, and
= 0, if job j is assigned to machine 2.

Then the following mixed integer program solves the above R2||Cmax problem:

MIN z,
z > = 4x1 + 4x2 + 3x3 + 5x4,
z > = 2(1 - x1) + 3(1 - x2) + 4(1 - x3) + 4(1 - x4),
z > = 0, xj = 0 or 1.

The right-hand-side of the first constraint is the time machine 1 completes processing, and the right-hand-side of the second constraint is the time machine 2 completes processing. Then z is the maximum of these completion times, which we would like to minimize.

If we were to explicitly enumerate all possible solutions, since we have n = 4 jobs and each job has two possibilities for assignment, either to machine one or to machine two, then there are 2n = 24 = 16 possible assignments. A systematic way to generate all these possible assignments is by means of a "total enumeration" tree, as shown below:

Total Enumeration Tree

The root node, Node 0, is where we start; no jobs yet are assigned to any machine. All the consecutive nodes that are created represent "partial schedules" in which a subset of jobs are assigned to machines; while the remaining jobs are yet to be assigned. At each "level" of the tree we assign one job to a machine. We start at the root node, Node 0; at level 1 we branch into two nodes, Node 1 and Node 2. Branching to Node 1, we set x1 = 1; that is, we assign job 1 to machine 1. In all the "descendents" of Node 1, that is Node 3 and Node 4, and their descendents, job 1 is assigned to machine 1. Proceeding in a similar fashion, at Node 3, we branch into two nodes, Node 7 and Node8. We assign job 2 to machine 1 (x2 = 1) at Node 7 and assign job 2 to machine 2 (x2 = 0) at Node 8.

Only the last nodes, so called "leaves" of the tree, (Node 15 through Node 30) represent full schedules. For example, Node 15 represents the schedule in which all jobs are assigned to machine 1 and no jobs are scheduled on machine 2, certainly not a sensible decision to make in our problem! Hence the role of the total enumeration tree is to systematically generate all possible solutions in a combinatorial problem.

In this small example with four jobs, it is possible to totally evaluate all 16 alternatives. But as the number of jobs, n, gets larger, the total number of alternatives, 2n, grows exponentially; making total enumeration computationally infeasible even for the fastest computers. It is clear that in most real life combinatorial problems we cannot explicitly enumerate all possible alternatives. Is it possible, then, to have a way to partially, or implicitly, enumerate the tree? In other words, is there a way to find the optimal solution, without exhaustively searching the entire tree? The branch-and-bound approach provides such a means for finding the optimal solution.

Solution to the Problem

The essence of the branch-and-bound approach is the following observation: in the total enumeration tree, at any node, if I can show that the optimal solution cannot occur in any of its descendents, then there is no need for me to consider those descendent nodes. Hence, I can "prune" the tree at that node. If I can prune enough branches of the tree in this way, I may be able to reduce it to a computationally manageable size. Note that, I am not ignoring those solutions in the leaves of the branches that I have pruned, I have left them out of consideration after I have made sure that the optimal solution cannot be at any one of these nodes. Thus, the branch-and-bound approach is not a heuristic, or approximating, procedure, but it is an exact, optimizing procedure that finds an optimal solution.

How can we make sure that the optimal solution cannot be at one of the descendents of a particular node on the tree? An ingenious answer to this question was given, independently by K. G. Murty, C. Karel, and J. D. C. Little in 1962 in an unpublished paper at Case Institute of Technology: "The Traveling Salesman Problem: Solution by a Method of Ranking Assignments" in the context of a combinatorial problem, and by A. H. Land and A. G. Doig in 1960 in a paper published in Econometrica, Vol.28, pp. 497-520: "An Automatic Method for Solving Discrete Programming Problems" in the context of integer programming.

It is always possible to find a feasible solution to a combinatorial or discrete optimization problem. If available, one can use some heuristics to obtain, usually, a "reasonably good" solution. Let us call this solution the incumbent. Then at any node of the tree, if we can compute a "bound" on the best possible solution that can expected from any descendent of that node, we can compare the "bound" with the objective value of the incumbent. If what we have on hand, the incumbent, is better than what we can ever expect from any solution resulting from that node, then it is safe to stop branching from that node. In other words, we can discard that part of the tree from further consideration.

Let us try to clarify these concepts in the context of the above example. Recall that in this example, we are trying to minimize the makespan of four jobs on two (unrelated) parallel machines. Let us find a feasible solution, the incumbent, using a heuristic. Suppose our heuristic assigns job 1 to machine 1, job 2 to machine 2, job 3 to machine 1, and job 4 to machine 2 (using the previously introduced notation: x1 = 1, x2 =  0, x3 = 1, and x4 = 0). Hence machine 1 completes its processing at 4x1 + 4x2 + 3x3 + 5x4 = 4(1)+4(0) + 3(1) + 5(0) = 7, and machine 2 at 2(1 - x1) + 3(1 - 2(1 - x1) + 3(1 - x2) + 4(1 - x3) + 4(1 - x4) = 2(1 - (1)) + 3(1 - (0)) + 4(1 - (1)) + 4(1 - (0)) = 7; resulting in a makespan of z = MAX {7, 7} = 7. With this solution we can safely discard any part of the tree which we know it cannot result in solution better than this. What we need now is a way of computing a "lower bound" on the value of the makespan when a partial schedule at an intermediate node is completed.

Consider the following very "loose" lower bound: the makespan of "partial" schedule at node; assuming as if jobs not yet assigned will not contribute to the makespan. For example at node 3, jobs 1 and 2 are assigned to machine 1 to be processed for 4 units of time each, and none of the other jobs yet assigned. Thus the makespan of the (partial) schedule in this node is 8. In any descendent of this node, the makespan of the corresponding (partial) schedule will be equal to (at best) or greater than (more likely) 8. But we already have a solution, the incumbent, with a makespan of 7. Hence, we can prune the tree at this node, knowing that the optimal solution cannot be at any one of the nodes 15, 16, 17, or 18.

We can also prune the tree at node 14, whose lower bound is 9 which greater than our incumbent solution's makespan of 7. From the above example two points should be clear: better ("tighter") the lower bound, the more we can prune off the tree, and regardless of the lower bounding scheme used, how much we can prune is highly data dependent. A same branch-and-bound procedure that behaves very well with a specific set of data, can do very poorly with a different set of data.

How well a branch-and-bound algorithm solves a specific discrete or a combinatorial optimization problem depends on how branching is going to take place and which bounding scheme is to be used. Starting at the root node, we partition the solution space of the problem. In the above example, at node 0 we partition the solution space, the set of all possible assignments, into two: all assignments in which job 1 is assigned to machine 1 is the partition (or the branch) denoted by node 1, and all assignments in which job 1 is assigned to machine 2 is the partition denoted by node 2. In different problems this partitioning or branching can take various forms. It is a major algorithm design issue to decide which one to use.

In order to compute a bound at a particular node, what we essentially do is to solve a relaxed problem. We do this by removing a number of complicating constraints. For instance, in integer programming a bound can be found by ignoring the integrality requirement on the variables and solving the problem as a linear program. The optimal objective function value of the linear program thus obtained, can never be worse than the case when integrality constraint is imposed on the decision variables. In the above machine scheduling example, we have used an "extreme" relaxation: we have simply ignored all the remaining jobs that are required to be assigned to one of the machines, resulting in a "loose" lower bound. In order to be able to prune more of the tree, thus having less alternatives to explicitly evaluate, we need to have tighter bounds. To be able to have a tighter bound, one has to solve the problem with minimal relaxation. (Tightest bound is obtained when one solves the problem as is, i.e. without any relaxation.) In order to have a tighter bound, we have to spend more computational effort at each node, but we need to evaluate fewer nodes, since we would have pruned most of the tree. On the other hand, if our bounding scheme results in not very tight bounds, we would not be able to prune much of the tree, thus having have to evaluate large number of the nodes -at the extreme all of the nodes in the tree, which is worse than total enumeration of all alternatives. How best one resolves this trade off is the indicator of the quality of the branch-and-bound algorithm designed for that specific problem.

It should be clear from the above discussion that, like dynamic programming, branch-and-bound is an "approach" for solving computationally difficult problems. We cannot talk about a branch-and-bound algorithm that can solve all discrete and combinatorial optimization problems. One has to design a specific algorithm for a specific class of problems. This has been a brief introduction to the branch-and-bound approach. For a more detailed discussion, the reader is referred to the section on branch-and-bound in Neos Guide by Optimization Technology Center and Chapter 10 of K. G. Murty's book on deterministic optimization models.


Ömer S. Benli
Bilkent University