Xref: info.physics.utoronto.ca comp.ai.genetic:3893 comp.answers:7400 news.answers:29526 S

Master Index Current Directory Index Go to SkepticTank Go to Human Rights activist Keith Henson Go to Scientology cult

Skeptic Tank!

Xref: info.physics.utoronto.ca comp.ai.genetic:3893 comp.answers:7400 news.answers:29526 Newsgroups: comp.ai.genetic,comp.answers,news.answers Path: cf-cm!David.Beasley From: David.Beasley@cm.cf.ac.uk (David Beasley) Subject: FAQ: comp.ai.genetic part 6/6 (A Guide to Frequently Asked Questions) Message-ID: Followup-To: comp.ai.genetic Summary: This is part 6 of a entitled "The Hitch-Hiker's Guide to Evolutionary Computation". A periodically published list of Frequently Asked Questions (and their answers) about Evolutionary Algorithms, Life and Everything. It should be read by anyone who whishes to post to the comp.ai.genetic newsgroup, preferably *before* posting. Originator: David.Beasley@cm.cf.ac.uk (David Beasley) Sender: David.Beasley@cm.cf.ac.uk (David Beasley) Supersedes: Organization: University of Wales College of Cardiff, Cardiff, WALES, UK. References: Date: Tue, 20 Sep 94 09:08:05 GMT Approved: news-answers-request@MIT.Edu Expires: 23 Dec 1994 09:04:52 GMT Lines: 1323 Archive-name: ai-faq/genetic/part6 Last-Modified: 9/20/94 Issue: 2.3 TABLE OF CONTENTS OF PART 6 Q21: What are Gray codes, and why are they used? Q22: What test data is available? Q42: What is Life all about? Q42b: Is there a FAQ to this group? Q98: Are there any patents on EAs? Q99: A Glossary on EAs? ---------------------------------------------------------------------- Subject: Q21: What are Gray codes, and why are they used? The correct spelling is "Gray"---not "gray", "Grey", or "grey"--- since Gray codes are named after the Frank Gray who patented their use for shaft encoders in 1953 [1]. Gray codes actually have a longer history, and the inquisitive reader may want to look up the August, 1972, issue of Scientific American, which contains two articles of interest: one on the origin of binary codes [2], and another by Martin Gardner on some entertaining aspects of Gray codes [3]. Other references containing descriptions of Gray codes and more modern, non-GA, applications include the second edition of Numerical Recipes [4], Horowitz and Hill [5], Kozen [6], and Reingold [7]. A Gray code represents each number in the sequence of integers {0...2^N-1} as a binary string of length N in an order such that adjacent integers have Gray code representations that differ in only one bit position. Marching through the integer sequence therefore requires flipping just one bit at a time. Some call this defining property of Gray codes the "adjacency property" [8]. Example (N=3): The binary coding of {0...7} is {000, 001, 010, 011, 100, 101, 110, 111}, while one Gray coding is {000, 001, 011, 010, 110, 111, 101, 100}. In essence, a Gray code takes a binary sequence and shuffles it to form some new sequence with the adjacency property. There exist, therefore, multiple Gray codings or any given N. The example shown here belongs to a class of Gray codes that goes by the fancy name "binary-reflected Gray codes". These are the most commonly seen Gray codes, and one simple scheme for generationg such a Gray code sequence says, "start with all bits zero and successively flip the right-most bit that produces a new string." Hollstien [9] investigated the use of GAs for optimizing functions of two variables and claimed that a Gray code representation worked slightly better than the binary representation. He attrributed this difference to the adjacency property of Gray codes. Notice in the above example that the step from three to four requires the flipping of all the bits in the binary representation. In general, adjacent integers in the binary representaion often lie many bit flips apart. This fact makes it less likely that a MUTATION operator can effect small changes for a binary-coded INDIVIDUAL. A Gray code representation seems to improve a MUTATION operator's chances of making incremental improvements, and a close examination suggests why. In a binary-coded string of length N, a single mutation in the most significant bit (MSB) alters the number by 2^(N-1). In a Gray-coded string, fewer mutations lead to a change this large. The user of Gray codes does, however, pay a price for this feature: those "fewer mutations" lead to much larger changes. In the Gray code illustrated above, for example, a single mutation of the left-most bit changes a zero to a seven and vice-versa, while the largest change a single mutation can make to a corresponding binary- coded INDIVIDUAL is always four. One might still view this aspect of Gray codes with some favor: most mutations will make only small changes, while the occasional mutation that effects a truly big change may initiate EXPLORATION of an entirely new region in the space of CHROMOSOMEs. The algorithm for converting between the binary-reflected Gray code described above and the standard binary code turns out to be surprisingly simple to state. First label the bits of a binary-coded string B[i], where larger i's represent more significant bits, and similarly label the corresponding Gray-coded string G[i]. We convert one to the other as follows: Copy the most significant bit. Then for each smaller i do either G[i] = XOR(B[i+1], B[i])---to convert binary to Gray---or B[i] = XOR(B[i+1], G[i])---to convert Gray to binary. One may easily implement the above algorithm in C. Imagine you do something like typedef unsigned short ALLELE; and then use type "allele" for each bit in your CHROMOSOME, then the following two functions will convert between binary and Gray code representations. You must pass them the address of the high-order bits for each of the two strings as well as the length of each string. (See the comment statements for examples.) NB: These functions assume a chromosome arranged as shown in the following illustration. index: C[9] C[0] *-----------------------------------------------------------* Char C: | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | *-----------------------------------------------------------* ^^^^^ ^^^^^ high-order bit low-order bit C CODE /* Gray <==> binary conversion routines */ /* written by Dan T. Abell, 7 October 1993 */ /* please send any comments or suggestions */ /* to dabell@quark.umd.edu */ void gray_to_binary (Cg, Cb, n) /* convert chromosome of length n from */ /* Gray code to binary representation */ allele *Cg,*Cb; int n; { int j; *Cb = *Cg; /* copy the high-order bit */ for (j = 0; j < n; j++) { Cb--; Cg--; /* for the remaining bits */ *Cb= *(Cb+1)^*Cg; /* do the appropriate XOR */ } } void binary_to_gray(Cb, Cg, n) /* convert chromosome of length n from */ /* binary to Gray code representation */ allele *Cb, *Cg; int n; { int j; *Cg = *Cb; /* copy the high-order bit */ for (j = 0; j < n; j++) { Cg--; Cb--; /* for the remaining bits */ *Cg= *(Cb+1)^*Cb; /* do the appropriate XOR */ } } References [1] F. Gray, "Pulse Code Communication", U. S. Patent 2 632 058, March 17, 1953. [2] F. G. Heath, "Origins of the Binary Code", Scientific American v.227,n.2 (August, 1972) p.76. [3] Martin Gardner, "Mathematical Games", Scientific American v.227,n.2 (August, 1972) p.106. [4] William H. Press, et al., Numerical Recipes in C, Second Edition (Cambridge University Press, 1992). [5] Paul Horowitz and Winfield Hill, The Art of Electronics, Second Edition (Cambridge University Press, 1989). [6] Dexter Kozen, The Design and Analysis of Algorithms (Springer- Verlag, New York, NY, 1992). [7] Edward M. Reingold, et al., Combinatorial Algorithms (Prentice Hall, Englewood Cliffs, NJ, 1977). [8] David E. Goldberg, GENETIC ALGORITHMs in Search, OPTIMIZATION, and Machine Learning (Addison-Wesley, Reading, MA, 1989). [9] R. B. Hollstien, Artificial Genetic Adaptation in Computer Control Systems (PhD thesis, University of Michigan, 1971). ------------------------------ Subject: Q22: What test data is available? TSP DATA There is a TSP library (TSPLIB) available which has many solved and semi-solved TSPs and different variants. The library is maintained by Gerhard Reinelt . It is available from various FTP sites, including: softlib.cs.rice.edu:/pub/tsplib/tsblib.tar OTHER DATA Information about Operational Research test problems in any of the areas listed below can be obtained by emailing with the body of the email message being just the word "info". The files in OR-Library are also available via anonymous FTP from mscmga.ms.ic.ac.uk:/pub/ . Instructions on how to use OR-Library can be found in the file "paper.txt", or in the article: J.E.Beasley, "OR-Library: distributing test problems by electronic mail", Journal of the Operational Research Society 41(11) (1990) pp1069-1072. File Problem area assigninfo.txt Assignment problem cspinfo.txt Crew scheduling deainfo.txt Data envelopment analysis gapinfo.txt Generalised assignment problem mipinfo.txt Integer programming lpinfo.txt Linear programming Location: capinfo.txt capacitated warehouse location pmedinfo.txt p-median uncapinfo.txt uncapacitated warehouse location mknapinfo.txt Multiple knapsack problem qapinfo.txt Quadratic assignment problem rcspinfo.txt Resource constrained shortest path Scheduling: flowshopinfo.txt flow shop jobshopinfo.txt job shop openshopinfo.txt open shop scpinfo.txt Set covering sppinfo.txt Set partitioning Steiner: esteininfo.txt Euclidean Steiner problem rsteininfo.txt Rectilinear Steiner problem steininfo.txt Steiner problem in graphs tspinfo.txt Travelling salesman problem Two-dimensional cutting: assortinfo.txt assortment problem cgcutinfo.txt constrained guillotine ngcutinfo.txt constrained non-guillotine gcutinfo.txt unconstrained guillotine Vehicle routing: areainfo.txt fixed areas fixedinfo.txt fixed routes periodinfo.txt period routing vrpinfo.txt single period ------------------------------ Subject: Q42: What is Life all about? 42 References Adams, D. (1979) "The Hitch Hiker's Guide to the Galaxy", London: Pan Books. Adams, D. (1980) "The Restaurant at the End of the Universe", London: Pan Books. Adams, D. (1982) "Life, the Universe and Everything", London: Pan Books. Adams, D. (1984) "So long, and thanks for all the Fish", London: Pan Books. Adams, D. (1992) "Mostly Harmless", London: Heinemann. ------------------------------ Subject: Q42b: Is there a FAQ to this group? Yes. ------------------------------ Subject: Q98: Are there any patents on EAs? Process patents have been issued both for the Bucket Brigade Algorithm (BBA) in CLASSIFIER SYSTEMs: U.S. patent #[..] (to John Holland) and for GP: U.S. patent #4,935,877 (to John Koza). This FAQ does not attempt to provide legal advice. However, use of the Lisp code in the book [KOZA92] is freely licensed for academic use. Although those wishing to make commercial use of any process should obviously consult any patent holders in question, it is pretty clear that it's not in anyone's best interests to stifle GA/GP research and/or development. Commercial licenses much like those used for CAD software can presumably be obtained for the use of these processes where necessary. Jarmo Alander's massive bibliography of GAs (see Q10.8) includes a (probably) complete list of all currently know patents. There is also a periodic posting on comp.ai.neural-nets by Gregory Aharonian about patents on Artificial Neural Networks (ANNs). ------------------------------ Subject: Q99: A Glossary on EAs? 1/5 SUCCESS RULE: Derived by I. Rechenberg, the suggestion that when Gaussian MUTATIONs are applied to real-valued vectors in searching for the minimum of a function, a rule-of-thumb to attain good rates of error convergence is to adapt the STANDARD DEVIATION of mutations to generate one superior solution out of every five attempts. A ADAPTIVE BEHAVIOUR: "...underlying mechanisms that allow animals, and potentially, ROBOTs to adapt and survive in uncertain environments" --- Meyer & Wilson (1991), [SAB90] AI: See ARTIFICIAL INTELLIGENCE. ALIFE: See ARTIFICIAL LIFE. ALLELE : (biol) Each GENE is able to occupy only a particular region of a CHROMOSOME, it's locus. At any given locus there may exist, in the POPULATION, alternative forms of the gene. These alternative are called alleles of one another. (EC) The value of a GENE. Hence, for a binary representation, each gene may have an ALLELE of 0 or 1. ARTIFICIAL INTELLIGENCE: "...the study of how to make computers do things at which, at the moment, people are better" --- Elaine Rich (1988) ARTIFICIAL LIFE: Term coined by Christopher G. Langton for his 1987 [ALIFEI] conference. In the preface of the proceedings he defines ALIFE as "...the study of simple computer generated hypothetical life forms, i.e. life-as-it-could-be." B BUILDING BLOCK: (EC) A small, tightly clustered group of GENEs which have co- evolved in such a way that their introduction into any CHROMOSOME will be likely to give increased FITNESS to that chromosome. The "building block hypothesis" [GOLD89] states that GAs find solutions by first finding as many BUILDING BLOCKs as possible, and then combining them together to give the highest FITNESS. C CENTRAL DOGMA: (biol) The dogma that nucleic acids act as templates for the synthesis of proteins, but never the reverse. More generally, the dogma that GENEs exert an influence over the form of a body, but the form of a body is never translated back into genetic code: acquired characteristics are not inherited. cf LAMARCKISM. (GA) The dogma that the behaviour of the algorithm must be analysed using the SCHEMA THEOREM. (life in general) The dogma that this all is useful in a way. "You guys have a dogma. A certain irrational set of believes. Well, here's my irrational set of beliefs. Something that works." --- Rodney A. Brooks, [LEVY92] CFS: See CLASSIFIER SYSTEM. CHROMOSOME: (biol) One of the chains of DNA found in cells. CHROMOSOMEs contain GENEs, each encoded as a subsection of the DNA chain. Chromosomes are usually present in all cells in an organism, even though only a minority of them will be active in any one cell. (EC) A datastructure which holds a `string' of task parameters, or GENEs. This may be stored, for example, as a binary bit- string, or an array of integers. CLASSIFIER SYSTEM: A system which takes a (set of) inputs, and produces a (set of) outputs which indicate some classification of the inputs. An example might take inputs from sensors in a chemical plant, and classify them in terms of: 'running ok', 'needs more water', 'needs less water', 'emergency'. See Q1.4 for more information. COMBINATORIAL OPTIMIZATION: Some tasks involve combining a set of entities in a specific way (e.g. the task of building a house). A general combinatorial task involves deciding (a) the specifications of those entities (e.g. what size, shape, material to make the bricks from), and (b) the way in which those entities are brought together (e.g. the number of bricks, and their relative positions). If the resulting combination of entities can in some way be given a FITNESS score, then COMBINATORIAL OPTIMIZATION is the task of designing a set of entities, and deciding how they must be configured, so as to give maximum fitness. cf ORDER-BASED PROBLEM. COMMA STRATEGY: Notation originally proposed in EVOLUTION STRATEGIEs, when a POPULATION of "mu" PARENTs generates "lambda" OFFSPRING and the mu parents are discarded, leving only the lambda INDIVIDUALs to compete directly. Such a process is written as a (mu,lambda) search. The process of only competing offspring then is a "comma strategy." cf. PLUS STRATEGY. CONVERGED: A GENE is said to have CONVERGED when 95% of the CHROMOSOMEs in the POPULATION all contain the same ALLELE for that gene. In some circumstances, a population can be said to have converged when all genes have converged. (However, this is not true of populations containing multiple SPECIES, for example.) Most people use "convergence" fairly loosely, to mean "the GA has stopped finding new, better solutions". Of course, if you wait long enough, the GA will *eventually* find a better solution (unless you have already found the global optimum). What people really mean is "I'm not willing to wait for the GA to find a new, better solution, because I've already waited longer than I wanted to and it hasn't improved in ages." An interesting discussion on convergence by Michael Vose can be found in GA-Digest v8n22, available from ftp.aic.nrl.navy.mil:/pub/galist/digests/v8n22 CONVERGENCE VELOCITY: The rate of error reduction. COOPERATION: The behavior of two or more INDIVIDUALs acting to increase the gains of all participating individuals. CROSSOVER: (EC) A REPRODUCTION OPERATOR which forms a new CHROMOSOME by combining parts of each of two `parent' chromosomes. The simplest form is single-point CROSSOVER, in which an arbitrary point in the chromosome is picked. All the information from PARENT A is copied from the start up to the crossover point, then all the information from parent B is copied from the crossover point to the end of the chromosome. The new chromosome thus gets the head of one parent's chromosome combined with the tail of the other. Variations exist which use more than one crossover point, or combine information from parents in other ways. (biol) A complicated process whereby CHROMOSOMEs, while engaged in the production of GAMETEs, exchange portions of genetic material. The result is that an almost infinite variety of gametes may be produced. Subsequently, during sexual REPRODUCTION, male and female gametes (i.e. sperm and ova) fuse to produce a new cell with a complete set of DIPLOID CHROMOSOMEs. In [HOLLAND92] the sentence "When sperm and ova fuse, matching CHROMOSOMEs line up with one another and then cross over partway along their length, thus swapping genetic material" is thus wrong, since these two activities occur in different parts of the life cycle. [eds note: If sexual REPRODUCTION (the Real Thing) worked like in GAs, then Holland would be right, but as we all know, it's not the case. We just encountered a Freudian slip of a Grandmaster. BTW: even the German translation of this article has this "bug", although it's well-hidden by the translator.] CS: See CLASSIFIER SYSTEM. D DARWINISM: (biol) Theory of EVOLUTION, proposed by Darwin, that evolution comes about through random variation of heritable characteristics, coupled with natural SELECTION (survival of the fittest). A physical mechanism for this, in terms of GENEs and CHROMOSOMEs, was discovered many years later. cf LAMARCKISM. (EC) Theory which inspired all branches of EC. DECEPTION: The condition where the combination of good BUILDING BLOCKs leads to reduced FITNESS, rather than increased fitness. Proposed by [GOLD89] as a reason for the failure of GAs on many tasks. DIPLOID CHROMOSOME: (biol) A CHROMOSOME which consists of a pair of homologous chromosomes, i.e. two chromosomes containing the same GENEs in the same sequence. In sexually reproducing SPECIES, the genes in one of the chromosomes will have been inherited from the father's GAMETE (sperm), while the genes in the other chromosome are from the mother's gamete (ovum). DNA: (biol) Deoxyribonucleic Acid, a double stranded macromolecule of helical structure (comparable to a spiral staircase). Both single strands are linear, unbranched nucleic acid molecules build up from alternating deoxyribose (sugar) and phosphate molecules. Each deoxyribose part is coupled to a nucleotide base, which is responsible for establishing the connection to the other strand of the DNA. The 4 nucleotide bases Adenine (A), Thymine (T), Cytosine (C) and Guanine (G) are the alphabet of the genetic information. The sequences of these bases in the DNA molecule determines the building plan of any organism. [eds note: suggested reading: James D. Watson (1968) "The Double Helix", London: Weidenfeld and Nicholson] (literature) Douglas Noel Adams, contemporary Science Fiction comedy writer. Published "The Hitch-Hiker's Guide to the Galaxy" when he was 25 years old, which made him one of the currently most successful British authors. [eds note: interestingly Watson was also 25 years old, when he discovered the DNA; both events are probably not interconnected; you might also want to look at: Neil Gaiman's (1987) "DON'T PANIC -- The Official Hitch-Hiker's Guide to the Galaxy companion", and of course get your hands on the wholly remarkable FAQ in alt.fan.douglas- adams] DNS: (biol) Desoxyribonukleinsaeure, German for DNA. (comp) The Domain Name System, a distributed database system for translating computer names (e.g. lumpi.informatik.uni- dortmund.de) into numeric Internet, i.e. IP-addresses ( and vice-versa. DNS allows you to hook into the net without remembering long lists of numeric references, unless your system administrator has incorrectly set-up your site's system. E EC: See EVOLUTIONARY COMPUTATION. ENCORE: The EvolutioNary Computation REpository Network. An network of anonymous FTP sites holding all manner of interesting things related to EC. The default "EClair" node is at the Santa Fe Institute (USA): alife.santafe.edu:/pub/USER-AREA/EC/ mirror sites include The Chinese University of Hong Kong: ftp.cs.cuhk.hk:/pub/EC/ The University of Warwick (United Kingdom): ftp.dcs.warwick.ac.uk:/pub/mirrors/EC/ EUnet Deutschland GmbH: ftp.Germany.EU.net:/pub/research/softcomp/EC/ and The California Institute of Technology: ftp.krl.caltech.edu:/pub/EC/ See Q15.3 for more information. ENVIRONMENT: (biol) That which surrounds an organism. Can be 'physical' (abiotic), or biotic. In both, the organism occupies a NICHE which influences its FITNESS within the total ENVIRONMENT. A biotic environment may present frequency-dependent fitness functions within a POPULATION, that is, the fitness of an organism's behaviour may depend upon how many others are also doing it. Over several GENERATIONs, biotic environments may foster co-evolution, in which fitness is determined with SELECTION partly by other SPECIES. EP: See EVOLUTIONARY PROGRAMMING. EPISTASIS: (biol) A "masking" or "switching" effect among GENEs. A biology textbook says: "A gene is said to be epistatic when its presence suppresses the effect of a gene at another locus. Epistatic genes are sometimes called inhibiting genes because of their effect on other genes which are described as hypostatic." (EC) When EC researchers use the term EPISTASIS, they are generally referring to any kind of strong interaction among GENEs, not just masking effects. A possible definition is: EPISTASIS is the interaction between different GENEs in a CHROMOSOME. It is the extent to which the contribution to FITNESS of one gene depends on the values of other genes. Problems with little or no EPISTASIS are trivial to solve (hillclimbing is sufficient). But highly epistatic problems are difficult to solve, even for GAs. High epistasis means that BUILDING BLOCKs cannot form, and there will be DECEPTION. ES: See EVOLUTION STRATEGY. EVOLUTION: That process of change which is assured given a reproductive POPULATION in which there are (1) varieties of INDIVIDUALs, with some varieties being (2) heritable, of which some varieties (3) differ in FITNESS (reproductive success). "Don't assume that all people who accept EVOLUTION are atheists" --- Talk.origin FAQ EVOLUTION STRATEGIE: EVOLUTION STRATEGY: A type of evolutionary algorithm developed in the early 1960s in Germany. It employs real-coded parameters, and in its original form, it relied on MUTATION as the search operator, and a POPULATION size of one. Since then it has evolved to share many features with GENETIC ALGORITHMs. See Q1.3 for more information. EVOLUTIONARILY STABLE STRATEGY: A strategy that does well in a POPULATION dominated by the same strategy. (cf Maynard Smith, 1974) EVOLUTIONARY COMPUTATION: Encompasses methods of simulating EVOLUTION on a computer. The term is relatively new and represents an effort bring together researchers who have been working in closely related fields but following different paradigms. The field is now seen as including research in GENETIC ALGORITHMs, EVOLUTION STRATEGIEs, EVOLUTIONARY PROGRAMMING, ARTIFICIAL LIFE, and so forth. For a good overview see the editorial introduction to Vol. 1, No. 1 of "Evolutionary Computation" (MIT Press, 1993). That, along with the papers in the issue, should give you a good idea of representative research. EVOLUTIONARY PROGRAMMING: An evolutionay algorithm developed in the mid 1960s. It is a stochastic OPTIMIZATION strategy, which is similar to GENETIC ALGORITHMs, but dispenses with both "genomic" representations and with CROSSOVER as a REPRODUCTION OPERATOR. See Q1.2 for more information. EVOLUTIONARY SYSTEMS: A process or system which employs the evolutionary dynamics of REPRODUCTION, MUTATION, competition and SELECTION. The specific forms of these processes are irrelevant to a system being described as "evolutionary." EXPECTANCY: Or expected value. Pertaining to a random variable X, for a continuous random variable, the expected value is: E(X) = INTEGRAL(-inf, inf) [X f(X) dX]. The discrete expectation takes a similar form using a summation instead of an integral. EXPLOITATION: When traversing a SEARCH SPACE, EXPLOITATION is the process of using information gathered from previously visited points in the search space to determine which places might be profitable to visit next. An example is hillclimbing, which investigates adjacent points in the search space, and moves in the direction giving the greatest increase in FITNESS. Exploitation techniques are good at finding local maxima. EXPLORATION: The process of visiting entirely new regions of a SEARCH SPACE, to see if anything promising may be found there. Unlike EXPLOITATION, EXPLORATION involves leaps into the unknown. Problems which have many local maxima can sometimes only be solved by this sort of random search. F FAQ: Frequently Asked Questions. See definition given near the top of part 1. FITNESS: (biol) Loosely: adaptedness. Often measured as, and sometimes equated to, relative reproductive success. Also proportional to expected time to extinction. "The fit are those who fit their existing ENVIRONMENTs and whose descendants will fit future environments." (J. Thoday, "A Century of Darwin", 1959). Accidents of history are relevant. (EC) A value assigned to an INDIVIDUAL which reflects how well the individual solves the task in hand. A "fitness function" is used to map a CHROMOSOME to a FITNESS value. A "fitness landscape" is the hypersurface obtained by applying the fitness function to every point in the SEARCH SPACE. FUNCTION OPTIMIZATION: For a function which takes a set of N input parameters, and returns a single output value, F, FUNCTION OPTIMIZATION is the task of finding the set(s) of parameters which produce the maximum (or minimum) value of F. Function OPTIMIZATION is a type of VALUE-BASED PROBLEM. FTP: File Transfer Protocol. A system which allows the retrieval of files stored on a remote computer. Basic FTP requires a password before access can be gained to the remote computer. Anonymous FTP does not require a special password: after giving "anonymous" as the user name, any password will do (typically, you give your email address at this point). Files available by FTP are specified as : See Q15.5. FUNCTION SET: (GP) The set of operators used in GP. These functions label the internal (non-leaf) points of the parse trees that represent the programs in the POPULATION. An example FUNCTION SET might be {+, -, *}. G GA: See GENETIC ALGORITHM. GAME THEORY: A mathematical theory originally developed for human games, and generalized to human economics and military strategy, and to EVOLUTION in the theory of EVOLUTIONARILY STABLE STRATEGY. GAME THEORY comes into it's own wherever the optimum policy is not fixed, but depends upon the policy which is statistically most likely to be adopted by opponents. GAMETE: (biol) Cells which carry genetic information from their PARENTs for the purposes of sexual REPRODUCTION. In animals, male GAMETEs are called sperm, female gametes are called ova. Gametes have HAPLOID CHROMOSOMEs. GAUSSIAN DISTRIBUTION: See NORMALLY DISTRIBUTED. GENE: (EC) A subsection of a CHROMOSOME which (usually) encodes the value of a single parameter. (biol) A unit of heredity. May be defined in different ways for different purposes. Molecular biologists sometimes use it in a more abstract sense. Following Williams (cf Q10.7) `any hereditary information for which there is a favorable or unfavorable SELECTION bias equal to several or many times its rate of endogenous change.' cf CHROMOSOME. GENE-POOL: The whole set of GENEs in a breeding POPULATION. The metaphor on which the term is based de-emphasizes the undeniable fact that genes actually go about in discrete bodies, and emphasizes the idea of genes flowing about the world like a liquid. "Everybody out of the GENE-POOL, now!" --- Author prefers to be anonymous GENERATION: (EC) An iteration of the measurement of FITNESS and the creation of a new POPULATION by means of REPRODUCTION OPERATORs. GENETIC ALGORITHM: A type of EVOLUTIONARY COMPUTATION devised by John Holland [HOLLAND92]. A model of machine learning that uses a genetic/evolutionary metaphor. Implementations typically use fixed-length character strings to represent their genetic information, together with a POPULATION of INDIVIDUALs which undergo CROSSOVER and MUTATION in order to find interesting regions of the SEARCH SPACE. See Q1.1 for more information. GENETIC DRIFT: Changes in gene/allele frequencies in a POPULATION over many GENERATIONs, resulting from chance rather than SELECTION. Occurs most rapidly in small populations. Can lead to some ALLELEs becoming `extinct', thus reducing the genetic variability in the population. GENETIC PROGRAMMING: GENETIC ALGORITHMs applied to programs. GENETIC PROGRAMMING is more expressive than fixed-length character string GAs, though GAs are likely to be more efficient for some classes of problems. See Q1.5 for more information. GENETIC OPERATOR: A search operator acting on a coding structure that is analogous to a GENOTYPE of an organism (e.g. a CHROMOSOME). GENOTYPE: The genetic composition of an organism: the information contained in the GENOME. GENOME: The entire collection of GENEs (and hence CHROMOSOMEs) possessed by an organism. GLOBAL OPTIMIZATION: The process by which a search is made for the extremum (or extrema) of a functional which, in EVOLUTIONARY COMPUTATION, corresponds to the FITNESS or error function that is used to assess the PERFORMANCE of any INDIVIDUAL. GP: See GENETIC PROGRAMMING. H HAPLOID CHROMOSOME: (biol) A CHROMOSOME consisting of a single sequence of GENEs. These are found in GAMETEs. cf DIPLOID CHROMOSOME. In EC, it is usual for CHROMOSOMEs to be haploid. HARD SELECTION: SELECTION acts on competing INDIVIDUALs. When only the best available individuals are retained for generating future progeny, this is termed "hard selection." In contrast, "soft selection" offers a probabilistic mechanism for maitaining individuals to be PARENTs of future progeny despite possessing relatively poorer objective values. I INDIVIDUAL: A single member of a POPULATION. In EC, each INDIVIDUAL contains a CHROMOSOME (or, more generally, a GENOME) which represents a possible solution to the task being tackled, i.e. a single point in the SEARCH SPACE. Other information is usually also stored in each individual, e.g. its FITNESS. INVERSION: (EC) A REORDERING operator which works by selecting two cut points in a CHROMOSOME, and reversing the order of all the GENEs between those two points. L LAMARCKISM: Theory of EVOLUTION which preceded Darwin's. Lamarck believed that evolution came about through the inheritance of acquired characteristics. That is, the skills or physical features which an INDIVIDUAL acquires during its lifetime can be passed on to its OFFSPRING. Although Lamarckian inheritance does not take place in nature, the idea has been usefully applied by some in EC. cf DARWINISM. LCS: See LEARNING CLASSIFIER SYSTEM. LEARNING CLASSIFIER SYSTEM: A CLASSIFIER SYSTEM which "learns" how to classify its inputs. This often involves "showing" the system many examples of input patterns, and their corresponding correct outputs. See Q1.4 for more information. M MIGRATION: The transfer of (the GENEs of) an INDIVIDUAL from one SUB- POPULATION to another. MOBOT: MOBile ROBOT. cf ROBOT. MUTATION: (EC) a REPRODUCTION OPERATOR which forms a new CHROMOSOME by making (usually small) alterations to the values of GENEs in a copy of a single, PARENT chromosome. N NICHE: (biol) In natural ecosystems, there are many different ways in which animals may survive (grazing, hunting, on the ground, in trees, etc.), and each survival strategy is called an "ecological niche." SPECIES which occupy different NICHEs (e.g. one eating plants, the other eating insects) may coexist side by side without competition, in a stable way. But if two species occupying the same niche are brought into the same area, there will be competition, and eventually the weaker of the two species will be made (locally) extinct. Hence diversity of species depends on them occupying a diversity of niches (or on geographical separation). (EC) In EC, we often want to maintain diversity in the POPULATION. Sometimes a FITNESS function may be known to be multimodal, and we want to locate all the peaks. We may consider each peak in the fitness function as analogous to a NICHE. By applying techniques such as fitness sharing (Goldberg & Richardson, [ICGA87]), the population can be prevented from converging on a single peak, and instead stable SUB-POPULATIONs form at each peak. This is analogous to different SPECIES occupying different niches. See also SPECIES, SPECIATION. NORMALLY DISTRIBUTED: A random variable is NORMALLY DISTRIBUTED if its density function is described as f(x) = 1/sqrt(2*pi*sqr(sigma)) * exp(-0.5*(x-mu)*(x- mu)/sqr(sigma)) where mu is the mean of the random variable x and sigma is the STANDARD DEVIATION. O OBJECT VARIABLES: Parameters that are directly involved in assessing the relative worth of an INDIVIDUAL. OFFSPRING: An INDIVIDUAL generated by any process of REPRODUCTION. OPTIMIZATION: The process of iteratively improving the solution to a problem with respect to a specified objective function. ORDER-BASED PROBLEM: A problem where the solution must be specified in terms of an arrangement (e.g. a linear ordering) of specific items, e.g. TRAVELLING SALESMAN PROBLEM, computer process scheduling. ORDER-BASED PROBLEMs are a class of COMBINATORIAL OPTIMIZATION problems in which the entities to be combined are already determined. cf VALUE-BASED PROBLEM. ONTOGENESIS: Refers to a single organism, and means the life span of an organism from it's birth to death. cf PHYLOGENESIS. P PANMICTIC POPULATION: (EC, biol) A mixed POPULATION. A population in which any INDIVIDUAL may be mated with any other individual with a probability which depends only on FITNESS. Most conventional evolutionary algorithms have PANMICTIC POPULATIONs. The opposite is a POPULATION divided into groups known as SUB- POPULATIONs, where INDIVIDUALs may only mate with others in the same sub-population. cf SPECIATION. PARENT: An INDIVIDUAL which takes part in REPRODUCTION to generate one or more other individuals, known as OFFSPRING, or children. PERFORMANCE: cf FITNESS. PHENOTYPE: The expressed traits of an INDIVIDUAL. PHYLOGENESIS: Refers to a POPULATION of organisms. The life span of a POPULATION of organisms from pre-historic times until today. cf ONTOGENESIS. PLUS STRATEGY: Notation originally proposed in EVOLUTION STRATEGIEs, when a POPULATION of "mu" PARENTs generates "lambda" OFFSPRING and all mu and lambda INDIVIDUALs compete directly, the process is written as a (mu+lambda) search. The process of competing all parents and offspring then is a "plus strategy." cf. COMMA STRATEGY. POPULATION: A group of INDIVIDUALs which may interact together, for example by mating, producing OFFSPRING, etc. Typical POPULATION sizes in EC range from 1 (for certain EVOLUTION STRATEGIEs) to many thousands (for GENETIC PROGRAMMING). cf SUB- POPULATION. R RECOMBINATION: cf CROSSOVER. REORDERING: (EC) A REORDERING operator is a REPRODUCTION OPERATOR which changes the order of GENEs in a CHROMOSOME, with the hope of bringing related genes closer together, thereby facilitating the production of BUILDING BLOCKs. cf INVERSION. REPRODUCTION: (biol, EC) The creation of a new INDIVIDUAL from two PARENTs (sexual REPRODUCTION). Asexual reproduction is the creation of a new individual from a single parent. REPRODUCTION OPERATOR: (EC) A mechanism which influences the way in which genetic information is passed on from PARENT(s) to OFFSPRING during REPRODUCTION. REPRODUCTION OPERATORs fall into three broad categories: CROSSOVER, MUTATION and REORDERING operators. REQUISITE VARIETY: In GENETIC ALGORITHMs, when the POPULATION fails to have a "requisite variety" CROSSOVER will no longer be a useful search operator because it will have a propensity to simply regenerate the PARENTs. ROBOT: "The Encyclopedia Galactica defines a ROBOT as a mechanical apparatus designed to do the work of man. The marketing division of the Sirius Cybernetics Corporation defines a robot as `Your Plastic Pal Who's Fun To Be With'." --- Douglas Adams (1979) S SAFIER: An EVOLUTIONARY COMPUTATION FTP Repository, now defunct. Superceeded by ENCORE. SCHEMA: A pattern of GENE values in a CHROMOSOME, which may include `dont care' states. Thus in a binary chromosome, each SCHEMA (plural schemata) can be specified by a string of the same length as the chromosome, with each character one of {0, 1, #}. A particular chromosome is said to `contain' a particular schema if it matches the schema (e.g. chromosome 01101 matches schema #1#0#). The `order' of a SCHEMA is the number of non-dont-care positions specified, while the `defining length' is the distance between the furthest two non-dont-care positions. Thus #1#0# is of order 2 and defining length 3. SCHEMA THEOREM: Theorem devised by Holland [HOLLAND92] to explain the behaviour of GAs. In essence, it says that a GA gives exponentially increasing reproductive trials to above average schemata. Because each CHROMOSOME contains a great many schemata, the rate of SCHEMA processing in the POPULATION is very high, leading to a phenomenon known as implicit parallelism. This gives a GA with a population of size N a speedup by a factor of N cubed, compared to a random search. SEARCH SPACE: If the solution to a task can be represented by a set of N real- valued parameters, then the job of finding this solution can be thought of as a search in an N-dimensional space. This is referred to simply as the SEARCH SPACE. More generally, if the solution to a task can be represented using a representation scheme, R, then the search space is the set of all possible configurations which may be represented in R. SEARCH OPERATORS: Processes used to generate new INDIVIDUALs to be evaluated. SEARCH OPERATORS in GENETIC ALGORITHMs are typically based on CROSSOVER and point MUTATION. Search operators in EVOLUTION STRATEGIEs and EVOLUTIONARY PROGRAMMING typically follow from the representation of a solution and often involve Gaussian or lognormal perturbations when applied to real-valued vectors. SELECTION: The process by which some INDIVIDUALs in a POPULATION are chosen for REPRODUCTION, typically on the basis of favoring individuals with higher FITNESS. SELF-ADAPTATION: The inclusion of a mechanism not only to evolve the OBJECT VARIABLES of a solution, but simultaneously to evolve information on how each solution will generate new OFFSPRING. SIMULATION: The act of modeling a natural process. SOFT SELECTION: The mechanism which allows inferior INDIVIDUALs in a POPULATION a non-zero probability of surviving into future GENERATIONs. See HARD SELECTION. SPECIATION: (biol) The process whereby a new SPECIES comes about. The most common cause of SPECIATION is that of geographical isolation. If a SUB-POPULATION of a single species is separated geographically from the main POPULATION for a sufficiently long time, their GENEs will diverge (either due to differences in SELECTION pressures in different locations, or simply due to GENETIC DRIFT). Eventually, genetic differences will be so great that members of the sub-population must be considered as belonging to a different (and new) species. SPECIATION is very important in evolutionary biology. Small SUB- POPULATIONs can evolve much more rapidly than a large POPULATION (because genetic changes don't take long to become fixed in the population). Sometimes, this EVOLUTION will produce superior INDIVIDUALs which can outcompete, and eventually replace the SPECIES of the original, main population. (EC) Techniques analogous to geographical isolation are used in a number of GAs. Typically, the POPULATION is divided into SUB- POPULATIONs, and INDIVIDUALs are only allowed to mate with others in the same sub-population. (A small amount of MIGRATION is performed.) This produces many SUB-POPULATIONs which differ in their characteristics, and may be referred to as different "species". This technique can be useful for finding multiple solutions to a problem, or simply maintaining diversity in the SEARCH SPACE. Most biology/genetics textbooks contain information on SPECIATION. A more detailed account can be found in "Genetics, Speciation and the Founder Principle", L.V. Giddings, K.Y. Kaneshiro and W.W. Anderson (Eds.), Oxford University Press 1989. SPECIES: (biol) There is no universally-agreed firm definition of a SPECIES. A species may be roughly defined as a collection of living creatures, having similar characteristics, which can breed together to produce viable OFFSPRING similar to their PARENTs. Members of one species occupy the same ecological NICHE. (Members of different species may occupy the same, or different niches.) (EC) In EC the definition of "species" is less clear, since generally it is always possible for a pair INDIVIDUALs to breed together. It is probably safest to use this term only in the context of algorithms which employ explicit SPECIATION mechanisms. (biol) The existence of different SPECIES allows different ecological NICHEs to be exploited. Furthermore, the existence of a variety of different species itself creates new niches, thus allowing room for further species. Thus nature bootstraps itself into almost limitless complexity and diversity. Conversely, the domination of one, or a small number of SPECIES reduces the number of viable NICHEs, leads to a decline in diversity, and a reduction in the ability to cope with new situations. "Give any one species too much rope, and they'll fuck it up" --- Roger Waters, "Amused to Death", 1992 STANDARD DEVIATION: A measurement for the spread of a set of data; a measurement for the variation of a random variable. STATISTICS: Descriptive measures of data; a field of mathematics that uses probability theory to gain insight into systems' behavior. STEPSIZE: Typically, the average distance in the appropriate space between a PARENT and its OFFSPRING. STRATEGY VARIABLE: Evolvable parameters that relate the distribution of OFFSPRING from a PARENT. SUB-POPULATION: A POPULATION may be sub-divided into groups, known as SUB- POPULATIONs, where INDIVIDUALs may only mate with others in the same group. (This technique might be chosen for parallel processors). Such sub-divisions may markedly influence the evolutionary dynamics of a population (e.g. Wright's 'shifting balance' model). Sub-populations may be defined by various MIGRATION constraints: islands with limited arbitrary migration; stepping-stones with migration to neighboring islands; isolation-by-distance in which each individual mates only with near neighbors. cf PANMICTIC POPULATION, SPECIATION. SUMMERSCHOOL: (USA) One of the most interesting things in the US educational system: class work during the summer break. T TERMINAL SET: (GP) The set of terminal (leaf) nodes in the parse trees representing the programs in the POPULATION. A terminal might be a variable, such as X, a constant value, such as 42, (cf Q42) or a function taking no arguments, such as (move-north). TRAVELLING SALESMAN PROBLEM: The travelling salesperson has the task of visiting a number of clients, located in different cities. The problem to solve is: in what order should the cities be visited in order to minimise the total distance travelled (including returning home)? This is a classical example of an ORDER-BASED PROBLEM. TSP: See TRAVELLING SALESMAN PROBLEM. U USENET: "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." --- Gene Spafford (1992) V VALUE-BASED PROBLEM: A problem where the solution must be specified in terms of a set of real-valued parameters. FUNCTION OPTIMIZATION problems are of this type. cf SEARCH SPACE, ORDER-BASED PROBLEM. VECTOR OPTIMIZATION: Typically, an OPTIMIZATION problem wherein multiple objectives must be satisfied. Z ZEN NAVIGATION: A methodology with tremendous propensity to get lost during a hike from A to B. ZEN NAVIGATION simply consists in finding something that looks as if it knew where it is going to and follow it. The results are more often surprising than successful, but it's usually being worth for the sake of the few occasions when it is both. Sometimes ZEN NAVIGATION is referred to as "doing scientific research," where A is a state of mind being considered as pre- PhD, and B (usually a different) state of mind, known as post- PhD. While your time spent in state C, somewhere inbetween A and B, is usually referred to as "being a nobody." ACKNOWLEDGMENTS Finally, credit where credit is due. I'd like to thank all the people who helped in assembling this guide, and their patience with my "variations on English grammar". In the order I received their contributions, thanks to: Contributors, Lutz Prechelt (University of Karlsruhe) the comp.ai.neural-nets FAQmeister, for letting me strip several ideas from "his" FAQ. Ritesh "peace" Bansal (CMU) for lots of comments and references. David Beasley (University of Wales) for a valuable list of publications (Q12), and many further additions. David Corne, Peter Ross, and Hsiao-Lan Fang (University of Edinburgh) for their TIMETABLING and JSSP entries. Mark Kantrowitz (CMU) for mocking about this-and-that, and being a "mostly valuable" source concerning FAQ maintenance; parts of A11 have been stripped from "his" ai- faq/part4 FAQ; Mark also contributed the less verbose ARCHIVE SERVER infos. The texts of Q1.1, Q1.5, Q98 and some entries of Q99 are courtesy by James Rice (Stanford University), stripped from his genetic-programming FAQ (Q15). Jonathan I. Kamens (MIT) provided infos on how-to-hook-into the USENET FAQ system. Una Smith (Yale University) contributed the better parts of the Internet resources guide (Q15.5). Daniel Polani (Gutenberg University, Mainz) "contributed" the Alife II Video proceedings info. Jim McCoy (University of Texas) reminded me of the GP archive he maintains (Q20). Ron Goldthwaite (UC Davis) added definitions of Environment, Evolution, Fitness, and Population to the glossary, and some thoughts why Biologists should take note of EC (Q3). Joachim Geidel (University of Karlsruhe) sent a diff of the current "navy server" contents and the software survey, pointing to "missing links" (Q20). Richard Dawkins "Glossary" section of "The extended phenotype" served for many new entries, too numerous to mention here (Q99). Mark Davis (New Mexico State University) wrote the part on Evolutionary Programming (Q1.2). Dan Abell (University of Maryland) contributed the section on efficient greycoding (Q21). Walter Harms (University of Oldenburg) commented on introductory EC literature. Lieutenant Colonel J.S. Robertson (USMA, West Point), for providing a home for this subversive posting on their FTP server euler.math.usma.edu/pub/misc/GA Rosie O'Neill for suggesting the PhD thesis entry (Q10.11). Charlie Pearce (University of Nottingham) for critical remarks concerning "tables"; well, here they are! J. Ribeiro Filho (University College London) for the pointer to the IEEE Computer GA Software Survey; the PeGAsuS description (Q20) was stripped from it. Paul Harrald for the entry on game playing (Q2). Reviewers, Robert Elliott Smith (The University of Alabama) reviewed the TCGA infos (Q14), and Nici Schraudolph (UCSD) first unconsciously, later consciously, provided about 97% of Q20* answers. Nicheal Lynn Cramer (BBN) adjusted my historic view of GP genesis. David Fogel (ORINCON) commented and helped on this-and-that (where this-and-that is closely related to EP), and provided many missing entries for the glossary (Q99). Kazuhiro M. Saito (MIT) and Mark D. Smucker (Iowa State) caught my favorite typo(s). Craig W. Reynolds was the first who solved one of the well-hidden puzzles in the FAQ, and also added some valuable stuff. Joachim Born (TU Berlin) updated the Evolution Machine (EM) entry and provided the pointer to the Bionics technical report ftp site (Q14). Pattie Maes (MIT Media Lab) reviewed the ALIFE IV additions to the list of conferences (Q12). Scott D. Yelich (Santa Fe Institute) reviewed the SFI connectivity entry (Q15). Rick Riolo (MERIT) reviewed the CFS-C entry (Q20). Davika Seunarine (Acadia Univ.) for smoothing out this and that. Paul Field (Queen Mary and Westfield College) for correcting typos, and providing insights into the blindfold pogo-sticking nomads of the Himalayas. and Everybody... Last not least I'd like to thank Hans-Paul Schwefel, Thomas Baeck, Frank Kursawe, Guenter Rudolph for their contributions, and the rest of the Systems Analysis Research Group for wholly remarkable patience and almost incredible unflappability during my various extravangances and ego-trips during my time (1990-1993) with this group. It was a tremendously worthwhile experience. Thanks! "And all our yesterdays have lighted fools; the way to dusty death; out, out brief candle; life's but a walking shadow; a poor player; that struts and gets his hour upon the stage; and then is heared no more; it is a tale; told by an idiot, full of sound an fury, signifying nothing." --- Shakespeare, Macbeth EPILOGUE "Natural selection is a mechanism for generating an exceedingly high degree of improbability." --- Sir Ronald Aylmer Fisher (1890-1962) This is a GREAT quotation, it sounds like something directly out of a turn of the century Douglas Adams: Natural selection: the original "Infinite Improbability Drive" --- Craig Reynolds, on reading the previous quote "`The Babel fish,' said The Hitch Hiker's Guide to the Galaxy quietly, `is small, yellow and leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from his own carrier but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centers of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish. `Now it is such a bizarrely improbable coincidence than anything so mindbogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God. `The argument goes something like this: ``I refuse to prove that I exist,'' says God, ``for proof denies faith, and without faith I am nothing.'' ``But,'' says Man, ``The Babel fish is a dead giveaway isn't it? It could not have evolved by chance. It proves you exist, and so therefore, by your own arguments, you don't. QED.'' ``Oh dear,'' says God, ``I hadn't thought of that,'' and promptly vanishes in a puff of logic. ``Oh, that was easy,'' says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing." --- Douglas Adams (1979) "Well, people; I really wish this thingie to turn into a paper babel- fish for all those young ape-descended organic life forms on this crazy planet, who don't have any clue about what's going on in this exciting "new" research field, called Evolutionary Computation. However, this is just a start, I need your help to increase the usefulness of this guide, especially its readability for natively English speaking folks; whatever it is: I'd like to hear from you...!" --- The Editor, Joerg Heitkoetter (1993) "Parents of young organic life forms should be warned, that paper babel-fishes can be harmful, if stuck too deep into the ear." --- Encyclopedia Galactica ------------------------------ End of ai-faq/genetic/part6 ***************************


E-Mail Fredric L. Rice / The Skeptic Tank