Technical Article

Applications and Limitations of Genetic Programming

December 27, 2019 by Greg Schmidt

Learn about the applications and future directions of genetic programming.

At this point in the genetic programming series (GP), we've learned about what genetic programming is and how it represents information, how genetic operators work in evolutionary algorithms, and worked through evolving a sorting program through symbolic regression.

Here, we'll take a high-level look at what this technology can accomplish as it develops.

 

Practical Considerations

In order to understand this final chapter of our series, let's remember a XOR example we discussed in the first part of this series:

 

A perfect solution to the XOR problem discovered by GP:
(defun program ()
      (AND (OR X1 X2)
            (NOT (AND X1 X2) )
      )
)

Listing 1. XOR result

 

Let's also look back at the symbolic regression example from the previous article:

 

A polynomial approximation to sin(x) in the interval (0 <= x <= pi/2)

(defun program ()
      (+ (* (* (* X X) X)
            (* 0.2283 -0.6535))
      X)
)

Simplifying the above program yields the following equivalent equation:

polysin(x) = -.1492 x3 + x

Results:

 

x sin(x) polysin(x)
0.000 0.000 0.000
0.078 0.077 0.077
0.156 0.155 0.155
0.234 0.231 0.232
0.312 0.306 0.307
0.390 0.380 0.381
0.468 0.451 0.452
0.546 0.519 0.521
0.624 0.584 0.587
0.702 0.645 0.650
0.780 0.703 0.709
0.858 0.756 0.763
0.936 0.805 0.813
1.014 0.848 0.858
1.092 0.887 0.897
1.170 0.920 0.931
1.248 0.948 0.957
1.326 0.970 0.978
1.404 0.986 0.991
1.482 0.996 0.996
1.560 0.999 0.993

Listing 4. Symbolic regression

 

Notice that both the XOR and the symbolic regression examples presented here return a single value when evaluated.

This characteristic need not be a restriction as it is certainly possible for a function or terminal to have a side effect when executed. In fact, this is the case for the sorting program which contains a function with the potential side effect of exchanging a pair of elements in a vector. In practice, it is common for side effects to be present. Some additional examples of useful side effects are assigning one variable to another or changing the direction a robot is facing.

Our function set may include conditional functions that provide the evolved programs with the ability to make decisions. Conditional functions selectively evaluate their arguments. As an example, consider a function, with an arity of three, such as (if arg1 arg2 arg3). The function is evaluated by evaluating the first argument and if the result is true, the second argument is evaluated; otherwise, the third argument is evaluated. Iterative constructs are possible, too, as a function may evaluate one of its arguments multiple times. Additional complexity is introduced, however, by the necessity to limit the number of iterations and nesting level so as to avoid a situation where the evaluation of an individual can take an excessively long time. Some work has been done to allow recursive formulations, although success in this area has been somewhat limited. 

Although the results of GP systems tend to be LISP-like programs, a GP system need not be implemented in LISP. In fact, many systems are implemented in C or C++. A linearized representation of the program tree can be employed and the overhead of dynamic memory along with expensive garbage collection can be avoided. The efficiency of the fitness function deserves special attention as it is often a bottleneck due to the large number of times it is invoked during each generation. An excellent paper discussing various implementation techniques can be found in Advances in Genetic Programming (cited in the Suggested Reading section below).

As in other machine-learning paradigms such as neural networks, the potential to over fit the training data (GP test cases) exists. Overfitting can occur when the solution effectively "memorizes" the data, thus providing little more than an elaborate lookup table. One simple way to help curtail this effect is through the use of a parsimony factor. A parsimony factor is typically a small fraction multiplied by the number of nodes within the program tree, the result of which is incorporated into the fitness function. The idea is to reward smaller, presumably more general solutions. In addition, you are encouraged to use appropriate experimental design techniques. For example, if you are attempting to develop a model to be used for prediction, it is a good idea to restrict the fitness evaluation to a subset of the available data. In this way, the remaining data can be used to measure the predictive performance of the resulting model.

As is the case with evolutionary algorithms, GP provides no guarantee of finding an exact solution or even an acceptable solution. Results may vary greatly from one run to another. Often the process prematurely converges upon a local minima. Performance is highly dependent upon the problem complexity, its representation as characterized by the choice of functions and terminals, and the properties of the fitness function.

 

Applications for Genetic Programming

Genetic programming has been successfully applied to problems occurring in such areas as:

  • Circuit design
  • Combinatorial optimization
  • Control systems
  • Curve fitting/regression analysis
  • Data compression
  • Economic forecasting/financial modeling
  • Game strategy acquisition
  • Mathematical sequence induction
  • Neural network design
  • Pattern recognition and classification
  • Random number generation
  • Robot navigation
  • Symbolic integration and differentiation
  • Three-dimensional design

 

Future Directions

Depending upon the complexity of the problem, several GP runs may be required to find an acceptable solution if one can be found at all. Ideally, we would like GP to "scale up" as the complexity of the problem increases. Finding good ways to accomplish this goal is an active area of research. As in conventional programming, the notion of building higher- level representations through the use of subroutines is one way to approach this problem. In genetic programming II, Koza discusses methods that can be used to discover reusable subroutines and presents results that support the ability of hierarchical, modular programs to solve more complex problems. 

As we have seen, genetic programming couples the self-organizing characteristics of the genetic algorithm with the representational power and generality of S-expressions. The elegance of this approach simplifies problem specification to that of providing a domain-specific fitness function and an appropriate function and terminal set. Applicable to a diverse range of problems, genetic programming continues to be a fertile ground for research.

Still in its infancy, future breakthroughs may lead us one step further to that Holy Grail of systems capable of authoring their own programs.

 

Suggested Reading

  • Kinnear, Jr., K. E. (ed.). Advances in Genetic Programming. Cambridge, Mass.: The MIT Press, 1994.
  • Knuth, D. E. The Art of Computer Programming, Volume 3, Sorting and Searching. Reading, Mass.: Addison-Wesley, 1973
  • Koza, J. R. Genetic Programming. Cambridge, Mass.: The MIT Press, 1992.
  • Koza, J. R. Genetic Programming II. Cambridge, Mass.: The MIT Press, 1994.
  • Montana, D. J. “Strongly Typed Genetic Programming.” BBN Technical Report #7866, May 1993.
  • Mitchell, Melanie, An Introduction to Genetic Algorithms, The MIT Press, 1998.