A Constructive Genetic Algorithm For The
Generalised Assignment Problem
Research paper
Luiz A. N. LORENA lorena@lac.inpe.br
LAC/INPE- Instituto Nacional de Pesquisas Espaciais, Caixa Postal 515,
12201-970 - São José dos Campos – SP, Brazil
Marcelo G. NARCISO narciso@cnptia.embrapa.br
Embrapa Informática Agropecuária, Av. Dr. André tosello, s/n, Unicamp,
13083-970 – Campinas - SP, Brazil
J.E. BEASLEY j.beasley@ic.ac.uk
The Management School, Imperial College, London SW7 2AZ, England
Abstract. We present in this paper an application of the Constructive Genetic Algorithm (CGA) to the
Generalised Assignment Problem (GAP). The GAP can be described as a problem of assigning n
items to m knapsacks, n>m, such that each item is assigned to exactly one knapsack, but with capacity
constraints on the knapsacks. The CGA has a number of new features compared to a traditional
genetic algorithm. These include a dynamic population size and the possibility of using heuristics. In
our application of CGA to GAP we use a binary representation and an assignment heuristic which
allocates items to knapsacks. Computational tests are presented using large publicly available problem
instances taken from the literature.
Keywords: Constructive genetic algorithm, generalised assignment problem
1. Introduction
In this paper we consider an important combinatorial optimisation problem, the
Generalised Assignment Problem (GAP). This is the problem of assigning at minimum cost
n items to m knapsacks (n>m), such that each item is assigned to exactly one knapsack
subject to capacity constraints on the knapsacks.
Many real life applications can be formulated as a GAP, e.g. resource scheduling, the
allocation of memory space in a computer, the design of a communications network with
capacity constraints at each network node, assigning software development tasks to
programmers, assigning jobs to computers in a network, vehicle routing problems, and
others, see De Maio and Roveda (1971), Balachandran (1976), Fisher and Jaikumar (1981)
and Catrysse and Van Wassenhove (1992).
GAP is NP-hard, e.g. see Narciso and Lorena (1999). A number of algorithms in the
literature are exact tree search algorithms [Ross and Soland (1975), Martello and Toth
(1990)] and there are also a number of heuristics for the problem [Klastorin (1979), Fisher,
Jaikumar and Van Wassenhove (1986), Jornsten and Varbrand (1990), Martello and Toth
(1990), Catrysse and Van Wassenhove (1992), Lorena and Narciso (1996), Narciso and
Lorena (1999)].
Genetic Algorithms (GA) have become popular in recent years as effective heuristics for
NP-hard combinatorial optimisation problems such as the GAP. Today there are many
variations on the general GA theme, but all such variations can be classified generically as
population heuristics [Beasley (2002)], that is as heuristics which progressively evolve a
population of solutions. Such heuristics are in marked contrast to other approaches, such as
tabu search and simulated annealing, that operate on just a single solution. We shall assume
throughout this paper some familiarity on the part of the reader with genetic algorithms.
Holland (1975) originated GA’s . For application to optimisation problems the first step in a
GA is the definition of a representation – some way of encoding a solution to the problem
under consideration, typically into a binary bit string. For example, if there are 7 binary
(zero-one) variables in an optimisation problem then solutions can be represented as
0100010, 1100110, 1010101, etc – all of which are 7 digit binary bit strings. Solutions are
evaluated though use of a fitness function and the fitness value associated with a solution
indicates “how good” the solution is in terms of the original optimisation problem that is
being considered.
Suppose now that, although we are sure of certain variable values in a solution, we are
unsure of others. For example 101#1#1 represents the situation where we are sure of
variable values for all except the fourth and sixth variables, unsure variable values being
conventionally represented by #. 101#1#1 is an example of a schema. Clearly the schema:
101#1#1
comprises four solutions, namely:
1010101
1010111
1011101
1011111
which are formed by enumerating all combinations of possible values for each # in
101#1#1.
Holland put forward the building block hypothesis (schemata formation and conservation)
as a theoretical basis for the GA mechanism. In this view avoiding disruption of good
schemata is the basis for the good behaviour of a GA. One major difficulty with the
building block hypothesis however is that schemata are only evaluated indirectly, via
evaluation of specific solutions. This raises two problems:
1. a schema with v #’s has 2v solutions (replace each # by zero or one) and inevitably, for
computational reasons, we typically only gain limited information about the schema by
examination of a limited number of the 2v possible solutions
2. a solution with v decided variables is a member of 2v-1 schemata (replace each decided
variable by # or not) and so when using an evaluated solution we are attempting to use
information about many schemata.
In this paper we use the word structure as a generic term to cover both solutions and
schemata.
The Constructive Genetic Algorithm (CGA) [Furtado (1998), Ribeiro Filho and Lorena
(1998), Lorena and Furtado (2001)] was proposed recently as an alternative to the
traditional GA approach. One of the objectives of a CGA is the direct evaluation of
schemata. In this paper we present a CGA for the GAP.
This paper is organised as follow. Application of the CGA to the GAP is presented in
Section 2. In a CGA the original problem is regarded as a bi-objective optimisation problem
that drives the evolutionary search for well adapted structures. The relevant aspects of our
CGA for the GAP are explained: structure representation, the evolution process, selection,
recombination and mutation, and CGA pseudo-code is presented. Section 3 presents
computational tests using large publicly available problem instances from the literature,
providing insights into CGA performance.
2. CGA for the GAP
2.1 GAP formulation
The GAP is best described using knapsack problems. Readers unfamiliar with knapsack
problems (either single dimensional, bidimensional or multi-dimensional are referred to
Martello and Toth (1979,1987,1990), Chu and Beasley (1998a), Lin (1998) and Beasley
(2000). Given n items and m knapsacks, with pij as the cost associated with assigning item j
to knapsack i, wij as the weight of assigning item j to knapsack i, and ci as the capacity of
knapsack i, assign each item j to exactly one knapsack i, not exceeding knapsack capacities.
Defining x ij=1 if item j is assigned to knapsack i, =0 otherwise, the GAP can be formulated
as:
minimise åå
==
n
j
m
i 11
pijxij {minimise total cost}
subject to: å
=
n
j 1
wijxij £ ci "iÎM = {1,…,m} {respect knapsack capacities}
å
=
m
i 1
xij = 1 "jÎN = {1,…,n} {all items assigned}
x ij Î {0,1} "iÎM ,"jÎN
There have been relatively few papers presented in the literature applying genetic
algorithms to the GAP. Chu and Beasley (1997) presented an algorithm where a solution
was represented by an assignment of all items to knapsacks. Violation of knapsack
capacities was dealt with in their genetic algorithm by use of unfitness (see Chu and
Beasley (1998b)), which separates the measurement of constraint violation from the
measurement of objective function quality (fitness). They presented computational results
for a large number of randomly generated problems involving up to 20 knapsacks and 200
items.
Wilson (1997), working independently from Chu and Beasley (1997), used the same
representation. In his approach the initial population is composed of solutions that perform
well with respect to the GAP objective function, but are infeasible. He then uses a genetic
algorithm to evolve a feasible solution. Once a feasible solution has been found the genetic
algorithm stops and that feasible solution is subjected to a local improvement procedure in
an attempt to improve its GAP objective function value, whilst maintaining feasibility. He
presented computational results for a large number of randomly generated problems
involving up to 50 knapsacks and 500 items.
In order to apply the CGA to the GAP we will view the problem as a bi-objective
optimisation problem. Note here that our CGA approach is distinctly different from the
approaches adopted in Chu and Beasley (1997) and Wilson (1997). We first describe our
structure representation.
2.2 Structure representation
For structure representation we used a sequence of n characters, where n is the number of
items. This n character representation contains just three symbols:
1 to indicate that the item is a seed item and has been assigned to a knapsack
0 to indicate that the item is a non-seed item which will be assigned to a knapsack by
a heuristic
# to indicate that the item is temporarily out of the problem (an undecided item).
Seed items are initially assigned to the m knapsacks, exactly one per knapsack, so there are
always precisely m 1’s in each structure.
For example, considering a problem with n=7 items and m=3 knapsacks, we could have a
structure S = (#1#01#1), where the 3 seed items are: item number 2 assigned to knapsack 1,
item number 5 assigned to knapsack 2, and item number 7 assigned to knapsack 3. It em 4
has the label 0 and will be assigned to one of the knapsacks according to an assignment
heuristic. Items 1, 3 and 6 have the label # and are temporarily not being considered. As S
contains #’s it is a schema. By contrast the structure S =(0100101) is not a schema but a
solution.
The following assignment heuristic is used to translate any structure S into a solution to the
underlying GAP.
Assignment Heuristic - AH
1 – Assign the m items with label 1 to the m knapsacks, and update the knapsack capacities
accordingly
2 – Assigning the other n-m items to the knapsacks (labels 0 and #):
2.1 – Solve the m knapsack problems separately exactly
2.2 – Update the knapsack capacities for the items assigned to exactly one knapsack
2.3 – Resolve the m knapsack problems separately exactly for the remaining items
2.4 – Update the knapsack capacities for the items assigned to exactly one knapsack
2.5 – For each item j remaining, assign it to the knapsack i* that minimises wi*j
2.6 – If the solution obtained is not feasible for the GAP, restart the assignment of the
n-m items (the m seed items were already assigned in step 1), assigning (if possible)
item j to knapsack i* that minimises wi*j and for which capacity is not violated
3 – If the solution is feasible for the GAP improve the solution with the second part of
MMTH (see Lorena and Narciso (1996))
4 – Discard from knapsacks any items with label # in S
2.3 The bi-objective problem
Let C be the set of all 3n structures that can be generated by the 0-1-# representation we
have adopted, and consider two functions f and g, defined as f:C®Â+ and g:C®Â+
such that g(S) ³ f(S) "SÎC. We define the double fitness evaluation of a structure due to
functions f and g as fg-fitness.
For the GAP the function g represents the cost of items assigned to knapsacks after
application of the assignment heuristic AH. Letting C i(S)AH represent the items assigned to
knapsack i after application of AH to structure S we have that g(S) = å å
= Î
m
i SCj AHi1 )(
pij. Note
here that from step 4 in AH above any item j assigned a label # in S is not assigned to any
knapsack at the end of AH and so is not included in g(S).
For the GAP the function f represents the cost of items assigned to knapsacks after taking
the solution produced by AH and moving a single item between knapsacks. To define f the
following MAH heuristic is applied to S, producing an additional move of one item between
two knapsacks:
Modified Assignment Heuristic – MAH
1. Apply AH to S
2. Over all the items in knapsacks presenting label 0 in S let j* be the item with the most
costly assignment (ties broken arbitrarily)
3. Let i* be the knapsack corresponding to the least cost assignment of item j* (ties broken
arbitrarily)
4. Assign (move) item j* to knapsack i*
Letting Ci(S)MAH represent the items assigned to knapsack i after application of MAH to
structure S we have that f(S) =å å
= Î
m
i SCj MAHi1 )(
pij. Clearly we have g(S) ³ f(S). The difference
g(S) - f(S) ³ 0 can be interpreted as the cost of a wrong assignment if the resulting GAP
solution is feasible. Considering our representation we have that fg-fitness values increase
as the number of # labels decrease and therefore structures with few # labels have higher
fg-fitness.
In our CGA for the GAP fg-fitness plays two roles:
· interval minimisation we would like to search for a SÎC that minimises g(S) - f(S),
since obviously a good quality solution to the GAP cannot be improved by moving a
single item between knapsacks.
· g maximisation we would like to search for a SÎC that maximises g(S), since we need
to increase cost to ensure feasibility. A structure that has very few items assigned to
knapsacks, e.g. because it is a schema in which most items are labelled #, will have low
cost, but will not be a feasible solution to the GAP. This objective can be viewed as
encouraging the process to move from schemata to solutions.
Hence our CGA implicitly considers the following Bi-objective Optimisation Problem
(BOP):
minimise g(S) - f(S)
maximise g(S)
subject to: g(S) ³ f(S) "S
S ÎC
Zitzler and Thiele (1999), amongst others, have recognised that genetic algorithms
(evolutionary algorithms) are a common approach to dealing with multiobjective problems.
One genetic algorithm approach to dealing with such problems is simply to aggregate
objectives together, but this does require a scaling between differing objectives to be
defined. As Zitzler and Thiele (1999) have noted this “requires profound domain
knowledge that is often not available”. The approach taken in our CGA is not to aggregate
g(S) and f(S) together but to consider them separately.
2.4 The evolution process
The BOP defined above is not directly considered as the set X is not completely known.
Instead we consider an evolution process to attain the objectives (interval minimisation and
g maximisation) of the BOP.
At the beginning of the process, two expected values are given to these objectives:
· for the g maximisation objective we use a value gmax > maxSÎX g(S) that is an upper
bound on the objective value
· for the interval minimisation objective we use a value dgmax, obtained from gmax using a
real number 0 < d £ 1.
The evolution process proceeds using an adaptive rejection threshold, which considers both
objectives. Given a parameter a ³ 0 a structure S is discarded from the population if:
g(S) - f(S) ³ dgmax - ad(gmax – g(S)) (1)
The right-hand side of equation (1) is the threshold for structure removal from the
population and is composed of the expected value dgmax associated with interval
minimisation and the measure (gmax – g(S)), which is the difference between gmax and g(S)
evaluations. For a=0 equation (1) is equivalent to comparing the interval length associated
with S against the expected length dgmax. When a > 0 structures containing a high number
of # labels (i.e. structures which are schemata) have a higher probability of being discarded
as, in general, they have higher differences (gmax – g(S)) since gmax is fixed and g(S) is
smaller for schemata than for solutions.
In our approach the value of the evolution parameter a is related to time in the evolution
process. As initially good schemata need to be preserved for recombination a starts from
zero and then slowly increases (in steps of e) from generation to generation. The population
at evolution time a (denoted by Pa) is dynamic in size according to the value of the
adaptive parameter a. The population can shrink to zero during the evolution process
(extinction).
Rearranging equation (1) a structure S should be discarded from the population if
d(S) = [dgmax – (g(S) – f(S))]/[d(gmax – g(S))] £ a
A key computational point here is that d(S) (which we refer to as the rank of a structure) is
fixed in value at the time the structure is created. The evolution parameter a on the other
hand does increase over time. Hence we need only compute the rank d(S) of a structure
once and then continually compare it against the changing evolution parameter in deciding
whether to discard the structure or not. Obviously the la rger the rank the longer a structure
will exist in the population, since a structure is only discarded once a becomes large
enough to satisfy d(S) £ a.
2.5 Selection, recombination and mutation
Selection of individuals can be made in several ways. CGA has been tested with a number
of optimisation problems and in all cases an appropriate approach is that the population is
kept ordered using a key value that involves for each structure its fg-fitness and its
proximity to a feasible solution. Then, several times in a generation, two structures are
randomly selected, one from the best part of the population and the other from the whole
population, and these are recombined to form (one or more) new structures (see Lorena and
Furtado (2001)).
The manner of recombination depends on the problem and the way the structure represents
a solution. The main goal of recombination is population diversification. Structures
representing feasible solutions can be generated not only by recombination, but also by
complementation of a selected schema. The best results found with the CGA approach use
mutation over structures that represent feasible solutions for the problem (see Lorena and
Furtado (2001)).
In our CGA for the GAP the population is ordered in increasing value of D(S)=(1+d(S))/(n-
n#(S)), where d(S)= (g(S)-f(S))/g(S) and n#(S) is the number of # labels in S. Structures with
small n#(S) and/or with small d(S) appear in first order positions.
The method used for selecting structures for recombination takes one structure from the n
first positions in the ordered population (base) and the second structure from the whole
population (guide). Before recombination the first structure is changed to generate a
structure representing a solution, i.e. all #’s are replaced by 0’s. This complete structure
suffers mutation and is compared to the best solution found so far (which is kept throughout
the process).
Recombination merges information from both selected structures, but preserves the number
of 1 labels (number of knapsacks) in the new generated structure.
Recombination
if Sbase(j) = Sguide(j) then Snew(j) ¬ Sbase(j)
if Sguide(j)= # then Snew (j) ¬ Sbase(j)
if Sbase(j) = # and Sguide(j)=0 then Snew (j) ¬ 0
if Sbase(j) = # or 0 and Sguide(j)=1 then Snew (j) ¬ 1 and Snew(i) ¬ 0 for some Snew (i)=1
if Sbase(j) = 1 and Sguide(j)=0 then Snew (j) ¬ 0 and Snew(i) ¬ 1 for some S new(i)=0
At each generation, exactly n new structures are created by recombination. If a new
structure does not represent a feasible solution, then it is inserted into the population,
otherwise it suffers mutation and is compared to the best solution found so far. The
following pseudo-code describes the mutation process.
Mutation
For each position j with label 1 do
Fo