Parameter Estimations for Industrial Enzyme Processes Using Genetic Algorithms D. M. Zuercher Abstract- A parallel genetic algorithm used to simulate and optimize initial parameters for an industrial enzymatic batch process is described. The goal was to determine initial dosing parameters by simulating the enzyme activity. The optimization of the dosing scheme was completed using a genetic algorithm. Application of the model to the production of 65% maltose using Genencor? Spezyme? BBA and Genencor? Spezyme? AAL from corn starch was also performed. 1. Introduction An industrial enzyme system is a large scale manufacturing process in which chemical conversions are performed using a dosing schema of enzymes to achieve the desired product. The production of maltose is performed using an industrial enzyme system. This product is manufactured using a multi-step enzyme dosing scheme with amylose and amylopectin derived from corn starch as the substrate. The enzymes modeled are Genencor? Spezyme? BBA [1997] and Genencor? Spezyme? AAL [1997]. The dosing amount, order of enzyme addition, and reaction time are a few of the many constraints that must be considered. Nehete [1992] has identified one dosing protocol for the production of high purity maltose from tapioca starch by optimizing parameters sequentially. This project is intended to present an algorithm to determine an optimized protocol for generic batch enzymatic processes using the industrial batch production of maltose as an example. The mathematical model of the enzyme kinetics proposed is presented with the introduction and definition of parameters (Section 2). Section 3 describes the genetic algorithm and the population control function used to control the rate of convergence. Section 4 discusses the mathematical model of an ideal enzyme system and the model of the two enzymes used in the production of maltose. An application of the model to maltose production is presented in Section 5. The results are presented in Section 6 and summarized in Section 7. 2. Mathematical Model The enzyme kinetic system, EKS, is defined as the kinetic modeling of an enzyme system. The kinetic expression of the enzyme is mathematically defined as the product concentration gradient of a specific species ci with respect to time t and the function, f, dependent on numerous non-negative floating point variables. This can be mathematically defined as: In this definition, ci is the concentration of a specific chemical component, and ?i are environmental factors which affect enzyme expression. For example, the production of maltose by an enzymatic reaction would include the polysaccharide profile (including maltose) represented by ci with environmental influences represented by??i such as pH, DS, enzyme bindings, cofactors, and inhibitors. Mathematically, this example can be represented as: Froment [1979] explains that a chemical reactor can be described through the completion of three phenomena: reaction rate equation (continuity), energy, and momentum equations. The momentum equations involving pressure drops and friction forces can be ignored in this model since these phenomena are not simulated. Furthermore, the reaction temperature remains constant throughout the simulation (and heat loss is assumed to be negligible). With these assumptions, the energy balance can be neglected. Equation 1 and equation 2 are expected to meet the requirements for the conservation of mass (continuity). In order to simulate such a sequence of differential equations as defined in (1) and (2), a design of experiment would need to be performed with multivariate relationships. This would prove to be economically infeasible due to the large industrial scale and time consuming nature that multivariate experiments would require. Furthermore, due to the nature of industrial processes, this may become impossible when large "swings" or deviations occur in the manufacturing process which cannot be easily controlled or immediately identified. In comparison, experiments involving a single variable with constant parameters (or quasi-constant parameters) occur more often in quality control and other laboratories. When considering this data, it must be noted that dependencies between multiple variables are not expressed. However, for estimation of initial enzyme dosing with continual evaluation of the enzyme's expression over time, this data can be of tremendous value. Furthermore, the application of this model will be implemented as an individual parallel process consisting of the tuple: [Rus, 1996]. It is expected that competing enzymes and other multiple enzyme phenomena included in equations (1) and (2) will be expressed as activity by the process. Figure 1 Figure 1 identifies a typical graph of the effect of a single parameter on the relative activity of the enzyme of interest. Activity is defined as the rate at which substrate is converted to product. Introducing the local response factor, ??, allows the estimation of the effect of parameter ? on the relative activity. Mathematically, The rate at which the enzyme converts substrate to product at time t, Et, can be estimated if the local response factors, ??, for the enzyme of interest are quantified. Equation 4 defines the model for the simulation of each enzyme. An additional parameter that must be considered is half-life. Half-life is defined as the time required for half of the activity in an enzyme to decay. By simulating time, the activity can be adjusted dependent on the half-life of the enzyme. A pseudocode representation of an enzyme is defined in Code 1. CODE 1: elapsed_time=0; elapsed_halftime=0; determine initial activity corrected for ??; determine half-life; repeat for(j=0;jhalf-life) activity=activity/2; elapsed_halftime=0; until elapsed_time>=desired time 3. Genetic Algorithm The complexity of the factors in the EKS model require an algorithm be given the ability to adapt to a range of environments in the determination of the optimized scheme. Furthermore, an algorithm for this model would need the ability to simulate the EKS using different parameters to determine the resulting polysaccharide profile. This requirement allows an algorithm to create variations in the parameters to yield polysaccharide profiles which can be analyzed for optimum. A genetic algorithm fulfills these requirements. 3.1 Structure of Genetic Algorithms Genetic algorithms use the principles of evolution to evaluate a function and determine an optimized solution. The solution is refined using genetic operators which manipulate the numerical values in a genetic code representing the variables in a specific problem. The algorithm is allowed to produce children which contain small genetic differences from their parent. The child is then evaluated using a selection function and compared with its parent. Only children which meet the specific selection criteria are allowed to survive. The implementation of the EKS by a Genetic Algorithm with Varying Population Size, GAVaPS, is explained in Michalewicz [1996]. Code 2 details this approach in pseudocode. CODE 2: t=0 initialize P(t) (Parallel) evaluate P(t) while(not termination) do begin t=t+1 increase age of chromosomes recombine P(t) (Parallel) evaluate P(t) kill: chromosomes where (age >= lifetime) end In contrast to other evolutionary programs, the population in GAVaPS is allowed to fluctuate as the algorithm proceeds introducing the concept of "age" of a chromosome. This fluctuation is the convergence of an algorithm moving from poor or infeasible solutions which are slowly removed from the population to an optimal solution. 3.2 Defined Parameters for GAVaPS The term population defines the number of chromosomes which survive into the next generation. In GAVaPS, the size of the population is allowed to vary with a defined maximum population size, pop_size. An initial population size, init_pop, is randomly selected. A chromosome is defined as a dosing scheme encapsulated into a digit string. This dosing scheme represents the set of parameters in which the simulation is performed including the order of enzyme addition. In this model, the genetic algorithm can be summarized by three different data components: gene, chromosome, and population. The gene is defined by a short sequence of digits which represent a dosing of enzyme into the EKS. This addition can be one or two specific enzyme(s). The chromosome is the sequence of three genes which depicts three additions allowing for the simulation of six enzyme activities. The population is the set of chromosomes which are evaluated. Using GAVaPS, the fluctuation of the population directly relates with the number of feasible dosing schemes determined by the algorithm so far. Figure 2 shows a chromosome depicting a typical dosing scheme with three processes of enzyme addition. The chromosomes in the algorithm compose the population. Figure 2 Gene# 1 Description BBA Dosage AAL Dosage time of addition pH adjustment 2 BBA Dosage AAL Dosage time of addition pH adjustment 3 BBA Dosage AAL Dosage time of addition pH adjustment Gene 1 time of addition <= Gene 2 time of addition <= Gene 3 time of addition Each chromosome is allocated a lifetime interval which is the maximum number of generations the chromosome is allowed to survive. The fitness of an individual is the evaluation of the function to be optimized using the parameters encoded in the genetic material of the chromosome. In this model, fitness is the cost of a specific dosing scheme (chromosome) if the resulting polysaccharide profile meets the product requirements. If the chromosome does not meet the product requirements, the fitness is set to zero and the chromosome is killed. 3.3 Initialization function The initialization function is responsible for the generation of the initial population of chromosomes. The first chromosome is represented as the current dosing scheme used in real-time experiments, if possible. The remaining chromosomes are randomly generated. The maximum life of each chromosome is determined. The allocation of the lifetime parameter is determined using the following linear allocation function: where AbsFitMin and AbsFitMax are the maximal and minimal values found at any point in the execution of the algorithm, MinLT is a predefined constant representing the minimum lifetime, and fitness[i] is the calculated fitness. 3.4 Evaluate function The evaluation of the chromosome requires that each dosing schema is executed to determine the result. Each enzyme in each addition is represented as a thread which manipulates substrate cells in a grid. The initial concentration of substrate in the grid can be controlled and the resulting product in the grid can be computed as the interaction of the enzymes with the grid proceeds. The simulation of the schema is computationally expensive and will be computed in parallel to be discussed later. The evaluation of each chromosome requires that the chromosome meet the desired polysaccharide profile at the end of the experimental simulation. Chromosomes which survive are given a cost value using proprietary information. For example, if one addition of x grams of enzyme at y cost per gram yields the desired polysaccharide profile, the evaluate function would return a fitness value of xy. Furthermore, the local response factor of a specific substrate species on a specific product species can be determined and adjustments can be made to reflect the concentration change. The concentration of a species in the grid and its effect on the other cells in the grid is a direct modification of the cellular automata paradigm presented by Hansen [1995]. 3.5 Recombine function Recombination of chromosomes allows the individuals to adapt to their environment. This adaptation occurs in the form of increased diversity and an increased likelihood of survival in an optimized solution. The chromosomes are recombined using three genetic operators to create diversity among the individuals. The first genetic operator, mutation, causes a single digit change in an individual to occur with the predefined probability, pm. This results in small genetic changes to an individual. Pattern crossovers of two chromosomes between two specific loci represent the second genetic operator and are controlled using the probability of pattern crossover, ppc. This operator enables beneficial genes (dosing addition) to be transferred to another individual. Finally, chromosome crossovers occur at the rate defined by pcc and are defined as the exchange of genetic material between two chromosomes at a specific randomly generated loci. Figure 3 explains an implementation of the three genetic operators. In addition, to ensure that the population meets the constraints for time and allowable enzyme dosages, a gene ordering operation is performed on each chromosome. This operator orders the addition of enzyme by time and ensures the enzyme dosage is within predefined allowable values. see Figure 3 definition: each position has pm probability of being changed effect on algorithm: individuals with diversity are created which allow the algorithm to search within a solution space definition: a gene is determined and the genetic material contained in the gene is exchanged between the individuals effect on algorithm: this operator allows the population to transfer genes to each other resulting in increased diversity and duplication of beneficial definition: the genes after a specific loci are exchanged between the individuals effect on algorithm: operator allows for increased diversity in the population 3.6 Parallel Evaluate function Due to the computational requirements of this simulation, a necessity for parallelization exists. Considering the functional unit, two methods of parallelization exist to complete the simulation. A. Functional Unit I: Enzymes as Parallel Entities Spawned by Evaluate Enzymes involved in the experiment are spawned from each parallel evaluation() function and interact with the other enzymes in the experiment through shared memory and semaphores on a substrate grid. This method is depicted in Figure 4. In this depiction, two parallel evaluate functions (1) and (2) each spawn two parallel enzyme children which will react with two different substrate grids. Figure 4: B. Functional Unit II: Enzymes as an Integrated Unit of Evaluate Enzymes involved in the experiment are integrated into the parallel evaluation() function. The interaction with other enzymes occurs serially within the evaluate function. This method is depicted in Figure 5. In this depiction, two parallel evaluate functions (1) and (2) with two enzymes react with two different substrate grids. Figure 5: 3.7 Kill function In essence, the kill function is the means by which the genetic algorithm can be manipulated to control convergence. The kill function will destroy the chromosome when the age age exceeds the predetermined lifetime determined by the linear allocation function. 4. Application to Ideal Enzyme System To identify the effect of the genetic operators and reproduction rate of the genetic algorithm, two ideal enzymes were modeled. The models have been intentionally simplified. The effect of temperature on the experiments is calculated initially and then expected to be constant. The pH of the experiment is allowed to change throughout the reaction. Furthermore, the lifetime of the enzyme is expected to be dependent of temperature and pH only. Enzyme 1 randomly attacks the starch molecule cutting at the alpha-1,4 linkage to produce oligosaccharides similar to Reaction 1 in Section 5 below. Additional restrictions on the enzyme are not made. The activity dependency of temperature is defined as: if(temp>66) activity =activity? * (100-(110-temp))/100; else activity =activity? * TIME_DIFF*0.98; The activity dependency on pH is defined by: activity2 =activity1 *(fabs(pH-6.5)/0.5*0.15); The half-life equation is: if(pH>=4.5 && pH<=7.5 && temp>66) halflife=(-0.1125*temp+35.375); else halflife=log(temp)*25; Enzyme 2 catalyzes the release of successive maltose units by hydrolyzing 1,4-?-D-glucosidic linkages from the non-reducing end of the dextrin chain. Enzyme 2 cleaves maltose from the end of the oligosaccharide resulting in a maltose molecule and an oligosaccharide that is two dextrose polymers shorter. This reaction is depicted in Reaction 2 in Section 5 below. The activity dependent of temperature is defined as: if(temp>90) activity= activity? * (3.75*temp-312.5)/100; else activity=AALmaxact*dose*TIME_DIFF*0.10; The activity dependency on pH is defined by: activity=activity*(100-fabs(pH-6.5)/0.5*0.15); The half-life equation is: if(pH>=4.5 && pH<=7.5) halflife=(-0.1125*temp+35.375); else halflife=0; 5. Application to Maltose Production The production of maltose using the maltose producing enzyme, Genencor? Spezyme? BBA, and an amylase, Genencor? Spezyme? AAL, to catalyze the hydrolysis of the ?-1,4-glucosidic linkages in soluble starch will be investigated as an example application of this batch modeling scheme. Spezyme? AAL randomly attacks the starch molecule cutting at the alpha- 1,4 linkage to produce oligosaccharides. In maltose production, the enzyme is used to further reduce the higher sugars to smaller oligosaccharides and ensure yellow color with iodine test. The chemical reaction is depicted in Reaction 1. Spezyme? BBA catalyzes the release of successive maltose units by hydrolyzing 1,4-?-D-glucosidic linkages from the non-reducing end of the dextrin chain and long chain oligosaccharides. BBA cleaves maltose from the end of the oligosaccharide resulting in a maltose molecule and an oligosaccharide that is two dextrose polymers shorter. BBA is unable to pass through the 1,6-?-D- glucosidic linkage. Reaction 2 explains the chemical reaction of BBA. Reaction 1: chemical reaction of AAL: where 35?k?l,m,n and p? 2 and n+p=m cleavage occurs only at 1,4-?-D-glucosidic linkages Reaction 2: chemical reaction of BBA: where l+m=n, 35?n, l>0, m>0 maltose cleavage cannot occur at the 1,6-?-D-glucosidic linkage For each enzyme, the parameters of interest are pH and temperature. The temperature is considered in the initial activity, but remains constant throughout the simulation. Activity dependencies on pH occur throughout the simulation. The application of this model by industry allows other parameters to be considered to be constant. Application of the model to other enzymatic reactions may find sensitivity analysis an aid in the determination of the parameters of interest which should be included in the simulation. [Carmichael 1997]. The kinetic equations for these specific enzymes are proprietary information and will not be published here. The equations were used for experimental test runs similar to the ideal enzyme model defined in Section 4. 6. Results The results of the ideal enzyme system in determination of the best fit chromosome convergence can be seen in Figure 6. In this example, the genetic algorithm must conform to the desired time. The algorithm is not allowed to stop the simulation when the desired profile has resulted, but must continue through the completion of the dosing scheme defined in the chromosome. Figure 6 Figure 7 Figure 7 can be achieved when the genetic algorithm is allowed to stop the simulation with the desired profile. A considerable reduction in costs can result since continuing the experiment would only increase the costs. However, ensuring a timely order of batch processes in the industrial setting is commonly desired. Other costs may be incurred in an industrial facility if batch order is disrupted. Figure 8 Figure 9 The production of 65% maltose using Genencor? Spezyme? BBA and Genencor? Spezyme? AAL was optimized in the same manner. Figure 8 and Figure 9 are graphs of the results when requiring time conformance and allowing the optimization to manipulate the time, respectively. The resulting best fit chromosome compares well with the dosing scheme currently used in maltose production. These results are not presented and are considered proprietary. The optimization identifies several opportunities. First, the solution space is extremely large. The solution space directly affects the convergence of this system. Finally, the large number of individuals that are allowed to survive from one generation to the next requires a considerable amount of computation time. The large number of individuals are required to ensure a local optimum is not reached. Figure 9 represents a population having a best fit which has stabilized and an average fitness slowly converging to the best fit. It would be expected that the convergence should be slow to ensure that multiple areas of the solution space are searched decreasing the probability that a local solution will result. 7. Discussion This project is intended to provide a general paradigm by which to model enzymatic batch reactions in constant volume batch reactors. Gonzalez-Tello [1996] has outlined methods which can be used to obtain kinetic equations for enzymatic hydrolysis of biopolymers to be applied to other examples using this model. Specifically, it is anticipated that the ability to map an enzyme's activity and function as the activity of a "processor" is a valuable tool for multiple enzyme dosing studies and investigations. Furthermore, the effect of competing enzymes is a natural progression of this paradigm through the competition of the processor(s). Finally, the optimization of a dosing scheme using a genetic algorithm allows for convergence of the system to a minimal cost. However, the paradigm's dependence on enzyme reaction rates is limiting. The results of the paradigm can be expected to be only as good as the mathematical model of each enzyme. Furthermore, multiple variable dependencies exist in many enzyme systems that are difficult to model. This paradigm would allow for these dependencies to be expressed if this mathematical data was available for the enzymes of interest. The two methods of simulating the evaluate function, explained in Section 3.6, have important consequences. The advantage of Functional Unit I is the similarity between real chemical properties and the direct representation of enzymes as parallel entities in the simulation. However, the simulation does not complete. This is believed to be due to operating system algorithms used for paging. While Functional Unit II does not have the direct representation of the enzymes compared to Functional Unit I, the simulation does not have the same shortfalls. In conclusion, the ability to generate initial parameter estimations in a multiple dosing schema without repeated wet chemistry methods seems feasible if the kinetics of the enzyme are known. Implementation of this project was completed on a SGI Power Challenge XL located at the Weeg Center, University of Iowa, Iowa City and purchased by from a grant from the NSF and financial support of the University of Iowa. Sources: Charmichael, G.R., Sandu, A., Potra, F.A. (1997) Sensitivity Analysis for Atmospheric Chemistry Models via Automatic Differentiation. Atmospheric Environment, 31, 475-489 Hansen, B. (1995) Studies in Parallel Programming: Parallel Program Paradigms. Prentice Hall, Englewoods Cliffs, NJ. Genencor International? Spezyme? AA-L. (1997) Product Specification Sheet Genencor International? Spezyme? BBA 1500. (1997) Product Specification Sheet Glowinski, J., Stocki, J. (1981) Estimation of Kinetic Parameters-Initial Guess Generation. American Institute of Chemical Engineers Journal, 27, 1041- 1043 Gonzales-Tello, P., Camacho, F., Jurado, E., Guadix, E.M. (1996) A Simple Method for Obtaining Kinetic Equations to Describe the Enzymatic Hydrolysis of Biopolymers. Journal of Chemical Technology and Biotechnology, 67, 286-290 Michalewics, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs. 3rd revised and extended ed., Springer-Verlag, New York, NY Moros, R., Kalies, H., Rex, H.G., Schaffarczyk, St. (1996) A Genetic Algorithm for Generating Initial Parameter Estimations for Kinetic Models of Catalytic Processes. Computer and Chemical Engineering, 20, 1257-1270 Nehete, P.N., Shah, N.K., Ramamurthy, V., Kothari, R.M. (1992) An optimized protocol for the production of high purity maltose. World Journal of Microbiology and Biotechnology, 8, 446-450 Rus, T. (1996) Parallel Programming for the Computational Sciences: Lecture Notes, Aug. 1996 Shiraishi, F., Kawakami, K., Yuasa, A., Kojima, T., Kusunoki, K. (1986) Kinetic Expression for Maltose Production from Soluble Starch by Simultaneous Use of B-Amylase and Debranching Enzymes. Biotechnology and Bioengineering, 30, 374-380 2