Pathway Modeling Glossary
Because bioinformatics is a multi-disciplinary endeavor, a lot of terminology is likely to be unfamiliar to most newcomers to the field. Here is a basic primer for the vocabulary we use, including terms from biology, chemistry, mathematics and computer science, as well as terms peculiar to our own project and the BioSPICE community.
- apoptosis: Programmed cell death. A well-regulated process of cell suicide, through the activation of enzymes ("caspases") that break down cell proteins.
- basal parameter set: The reference set of parameter values for a model, usually corresponding to the "wild type" variant. The parameter values for mutants are derived from the basal set, and those derived parameters are often defined in terms of differences from the basal set.
- bifurcation analysis: Systems of algebraic or ordinary differential equations can have multiple solutions, depending on the values of the parameters in the equations. Parameter combinations at which the number (or stability) of these solutions changes form (n-1)-dimensional manifolds in the system's n-dimensional parameter space. Bifurcation theory is the study of the geometry of these manifolds.
- BioPack: A simulator used in JigCell. The heart of BioPack is the widely used ODE solver LSODAR.
- BioSPICE: A DARPA project that has had a major impact on the systems biology community by funding projects to develop new software and models, and also to develop standards for model representation and software interoperability.
- budding yeast: The yeast commonly used for baking and brewing. Reproduces by a budding process. A favorite organism for genetic characterization of cell physiology.
- caspase: An enzyme that breaks down cell proteins during apoptosis.
- cell cycle: The sequence of events by which a growing cell replicates all its components and divides them more-or-less evenly between two daughter cells, so that each daughter contains all the information and machinery necessary to repeat the process.
- CellML: A language for describing biological models. Similar in some ways to SBML, though it attempts to capture a wider modeling domain.
- circadian rhythms: Physiological rhythms, with a period close to 24 hours, that persist under conditions of constant light, temperature, feeding, etc.
- comparator: A JigCell software component that supports assessment of the quality of a model by quantitative comparison between experimental data and numerical results from an ensemble of simulation runs.
- computational cell biology: Computational methods for addressing important questions in cell biology (e.g., metabolism, growth and division, motility, signaling), based on mathematical representations of the component processes.
- conserved attribute, behavior, gene, etc.: A molecule or physiological property that seems to have been conserved (little changed) in the course of evolution. For instance, the gene encoding cyclin-dependent kinase-1, the "master regulator" of the eukaryotic cell cycle, has changed very little over 1 billion years of evolution, as evidenced by the fact that a mammalian CDK1 gene can rescue otherwise-lethal mutations of the CDK1 gene in yeast cells.
- conserved quantity: A function of the state variables of a dynamical system whose value stays constant in time along every solution. The function f(x1, x2, ...) is a conserved quantity if df/dt = ∂f/∂x1 · dx1/dt + ∂f/∂x2 · dx2/dt + ... = 0.
- dashboard: The name given by BioSPICE to the frontend to the BioSPICE software. Alternatively, any integrated frontend used by software related to BioSPICE (for example, the JigCell Project Manager is sometimes referred to as a "dashboard" in this way).
- differential equations: Equations involving derivatives, for example, the time-rate of change of a protein concentration in a cell. The solution of a differential equation is a function, for example, the protein's concentration as a function of time. The phenomenon that a pathway model most typically tries to match is a time series for some species concentration. Thus, most pathway modeling involves some form of differential equation, often augmented with some sort of discrete switching structure and/or algebraic equations.
- ensemble: A collection of simulations run on a particular model to reproduce a variety of physiological conditions, for example, a set of biochemical experiments carried out in a frog egg extract or a set of mutant phenotypes of budding yeast.
- enzyme: A protein that catalyzes (accelerates) a chemical reaction.
- experiment: In our context, a physical scientific experiment such as the controlled manipulation of cells in order to make precise observations or measurements of their properties. To keep the distinction clear, we avoid speaking of a "numerical experiment", preferring "computer simulation" instead. The output of an experiment is experimental data.
- experimental data: The measurements or precise observations produced by an experiment. In our context, these constitute the phenomenological information that a pathway model is intended to match. Most convenient for modeling purposes are experimental measurements of the time courses of concentrations of key molecular components in the mechanism under consideration. More commonly, the observations are indirect indications of the effects of these components, for example, the velocity of motion of a cell, or the fraction of cells that die in response to a certain treatment, or the phase of the cell cycle at which a mutant cell arrests.
- flow: The state of mind of a worker engaged in creative action without conscious attention to the tools involved in performing the action. A goal of good software design is to enable users to enter a flow state when using the software so that they can focus on their tasks rather than their tools.
- frog egg extract: A preparation of cytoplasm made by crushing hundreds of frog eggs and centrifuging the mixture to separate the soluble components of the cell (enzymes, metabolites, mRNA, ribosomes, etc.) from membranes, yolk granules, etc. The cytoplasmic extract contains all protein components of the cell cycle control system and will recapitulate cycles of DNA synthesis and mitosis.
- gene: A unit of inheritance. A sequence of nucleotides in an organism's DNA that influences some physiological characteristic of the organism. Most commonly, the gene sequence encodes an amino acid sequence, which folds up into a functional protein, but all organisms have a few genes that encode RNA sequences (ribosomal and transfer RNA genes). In addition to the "coding" sequences, a gene also includes non-coding sequences that regulate the expression of the coding region.
- gene expression: The rate at which a gene is used to produce functional RNA transcripts (messenger RNA, ribosomal RNA, transfer RNA).
- gene network: The conceptual network of direct cause-and-effect relationships among gene expression events. For example, expression of Gene 1 causes increased expression of Gene 2 and decreased expression of Gene 3, etc. The notion of "direct causation" is not well defined. In the simplest case, Gene 1 encodes a "transcription factor" that binds to sequences in Gene 2 and promotes expression of Gene 2. In another case, Gene 1 may encode an enzyme that phosphorylates and inactivates a transcription factor necessary for efficient expression of Gene 3.
- hybrid models: Mathematical models that contain two sorts of mathematical equations. For example, ordinary differential equations + stochastic differential equations, or ordinary differential equations + discrete switching networks.
- initial condition: To solve an ordinary differential equation of the form dx/dt = f(x,t), for t >= 0, one must specify the starting value of x at t = 0. This value, x(0) = a, is called the initial condition.
- interface: A computer science term commonly used for two different purposes. A "user interface" is the means by which a human communicates with the computer program, by some combination of keystrokes and mouse clicks as organized by screen prompts (blinkers, icons, spreadsheets, wizards, etc.). (A "graphical user interface," GUI, is a user interface that relies more on point-and-click devices than on keystrokes.) An "application program interface" (API) is a collection of function definitions and their associated parameters for accessing a software component from another software component. To avoid confusion, one should refer specifically to "user interface," GUI ("gooey") or "API".
- interface metaphor: A specific thematic approach to a graphical user interface. For example, a model builder is a software component for entering models. Some model builders use a spreadsheet metaphor for the user interface, meaning that the user interface attempts to look and behave like a typical spreadsheet system familiar to most computer users. An alternative interface metaphor for model builders is a graphical sketchpad by which the user draws a wiring diagram with boxes and arrows.
- interface paradigm: Another term for an interface metaphor.
- JigCell: The problem solving environment developed and used by the Virginia Tech Team for doing pathway modeling.
- DNA microarray: An experimental tool for obtaining high-throughput gene expression data. At each position in an array of locations on a glass slide is spotted a sample of identical DNA molecules (a "probe") whose sequence is complementary to a messenger RNA sequence uniquely determined by a specific gene of a particular organism. A sample of messenger RNA sequences from the cells or tissues of the organism is hybridized to the probes, and the amount of RNA that binds to each probe is measured. By comparing a treated sample to a control sample, one can quantify which genes are expressed a great deal more or a great deal less in the treated sample relative to the control.
- model: A mathematical representation of the molecular machinery that is thought to carry out some interesting physiological behavior of an organism. For example, the molecular mechanism that generates 24-hour rhythms, or the mechanism that relays signals from hormone binding on the cell surface to altered gene expression in the cell's nucleus.
- model builder: A software component that helps a user to enter a model into the computer.
- mutant: An organism that has undergone a genetic change from the "wild type." For example, the nucleic acid sequence of a gene may be altered so that the gene produces a protein of altered functionality (less or more activity). In a "deletion mutant," a specific gene has been removed from the genome. In the context of pathway modeling, a mutant is a variation on the wild type (or reference) organism. Within a model, a mutant is typically represented by changing the values of specific parameters or by modifying specific equations.
- objective function: A criterion (function) to be optimized. For example, we seek the maximum (or minimum) value of some function G(x,y,z) over a domain a <= x <= b, c <= y <= d, e <= z <= f.
- ordinary differential equation (ODE), system of: A set of differential equations for one or more dependent variables, x(t), y(t), z(t)..., as functions of a single independent variable, t.
- optimization: The act or process of minimizing (or maximizing) an objective function f(x) over all x in a set S, called the feasible set. Strictly speaking, this is "global optimization," which is a difficult problem with no general, practical, reliable numerical method. "Local optimization," finding a point of S at which f is smaller (or larger) than at all nearby points of S, is a much easier problem computationally.
- parameter: A constant appearing in a differential equation. For example, dx/dt = -kx, x(0)=a. In this case, x(t) is the dependent variable, t the independent variable, k a parameter (rate constant), and a the initial condition.
- parameter estimation: The process of finding parameter values that fit a mathematical model to experimental data. Often the process is framed as an optimization problem, where the objective function is a measure of the "distance" between model predictions and experimental observations. For instance, we might estimate the parameter k by fitting dx/dt = -kx to measurements of the disappearance of some protein X from a cell after X's proteolytic machinery is activated.
- parameter set: A collection of parameter values used for a particular simulation run. This is a misnomer, since the parameters under consideration actually define a vector, not a set.
- pathway model
- phase plane analysis: For a pair of nonlinear ODEs, dx/dt = f(x,y) and dy/dt = g(x,y), the Cartesian coordinate system (x,y) is called the phase plane. The ODEs define a vector field on the phase plane, and the characterization of the geometric properties of this vector field is called phase plane analysis. These ideas can be generalized to N dependent variables in an N-dimensional phase space. N=2 is special because the asymptotic solutions of two ODEs can be only points and Jordan curves (closed and non-self-intersecting), and, furthermore, Jordan curves divide the plane into interior and exterior regions. These properties severely constrain the geometry of vector fields in the phase plane.
- PDE: Partial Differential Equation. A set of differential equations involving partial derivatives of one or more dependent variables, u(x,y,t), v(x,y,t) ..., with respect to several independent variables, x, y, t. Typically, u, v, etc. are local concentrations of chemical species, x and y are spatial coordinates, and t is time. Modeling and simulation of PDEs is supported by Virtual Cell (U. Conn. Health Science) and XPP (U. Pittsburgh).
- project manager: Within JigCell, this is a software component whose purpose is to act as a front end to access the various JigCell components. It is modeled after so-called Integrated Development Environments (IDEs) commonly used for computer program development. The project manager is centered around the collection of files (a "project") needed to specify all the various aspects of a single model.
- problem solving environment: A software system that (ideally) supports a user in all phases of posing and solving a problem within a given domain, from initial problem description through analysis of the problem. JigCell is an example of a problem solving environment for the domain of reaction network modeling.
- reaction: A chemical reaction specifies the transformation of reactants (A,B,...) into products (C,D,...). In general, pA + qB + ... -> rC + sD + ..., where p,q,r,s,... are integers, called stoichiometric coefficients, that are constrained by the requirement of balancing all atoms and electric charges that appear on the two sides of the arrow. If a reaction is not at equilibrium, then it will proceed at a rate v that depends on the current concentrations of all the reagents, [A],[B],[C],[D],..., and possibly on the concentrations of catalysts (enzymes) in the reaction mixture. (v > 0 means reactants converted into products, v < 0 means products converted into reactants.)
- reaction network: A collection of chemical reactions. A reaction network is expressed most naturally as a list (for example, in a spreadsheet), but it is often intuitively appealing to try to represent the network graphically in terms of boxes (reagents) and arrows (reactions). The time evolution of a reaction network is governed by a system of nonlinear ODEs, dx/dt = R.v, where x is a vector of concentrations of all the reagents in the network, v is a vector of the velocities of all the reactions in which the reagents take part (v is a nonlinear function of x), and R is a rectangular matrix of stoichiometric coefficients (one row for each reagent and one column for each reaction).
- regulatory network: A reaction network with feedback that serves to regulate the time evolution of the chemical components in response to certain classes of input signals to the network. For example, the network regulating an organism's circadian rhythm must generate temperature-compensated oscillations with period close to 24 hours in the absence of any external signals, and it must entrain to a sufficiently strong driving signal ("Zeitgeber") of period close to 24 hours.
- robustness: The ability of a regulatory network to generate a certain qualitative response over a broad range of parameter values. For example, the cell cycle control system should reliably drive a yeast cell through a specific sequence of events (DNA synthesis, mitosis, cell division) at a specific rate (one division for each mass doubling) over a broad range of the rate constants determining velocities of the component biochemical reactions. If this were not so, then small mutations to the control system (small changes to rate constants) would have large (often fatal) consequences for the organism.
- run: Short for simulation run.
- run manager: A software component that helps a modeler construct and manage an ensemble of simulation runs that collectively define the execution of a given model.
- simulation: An abstraction of a physical phenomenon. In pathway modeling, it is the numerical solution of the mathematical equations that determine the temporal evolution of a reaction network. Typically this involves the execution of a solver for a set of differential equations, which produces time series output for the relevant species.
- simulation parameters: Constants that control the operation of a simulator (as distinguished from model parameters that identify a mathematical model with a particular organism or process). Examples include the size of a time step, the number of time steps to run, the type of integrator or equation solver to use (such as a stiff or non-stiff solver), and the desired accuracy of the solver.
- simulation output: The output from a computer simulation of a model. This output is typically compared to experimental data for the organism or process being simulated to determine how good the model is. The most common form of output from our simulators is time series for each dependent variable.
- simulation run: The execution of a computer program that solves the mathematical model of a reaction network under a particular set of conditions. To simulate one of our models, the simulator needs to be given a model in the form of an SBML file, a parameter set, an initial condition set, and a simulation parameter set.
- species: A chemical entity involved in a reaction.
- stochastic differential equations (SDE): Differential equations involving random variables.
- stochastic models: Any model involving random variables.
- stoichiometry: The numerical coefficients in front of each chemical species in a balanced chemical reaction.
- systems biology: The study of the mechanisms underlying complex biological processes as integrated systems of many, diverse, interacting components. Systems biology involves (1) collection of large sets of experimental data (by high-throughput technologies and/or by mining the literature of reductionist molecular biology and biochemistry), (2) proposal of mathematical models that might account for at least some significant aspects of this data set, (3) accurate computer solution of the mathematical equations to obtain numerical predictions, and (4) assessment of the quality of the model by comparing numerical simulations with the experimental data.
- Systems Biology Markup Language (SBML): The standardized markup language we use for describing mathematical models of reaction networks.
- time series: A collection of ordered pairs (t, X(t)), where t = time and X(t) = concentration (or activity) of a particular chemical species in the reaction network being modeled. This is the typical output from a simulation run. If the experimental data are also obtained as time series of species concentrations (or activities), then the comparison of simulations to experiments is straightforward. In many circumstances, however, the experimental data are some indirect measure of the effects of these time-varying chemicals, e.g., the percentage of cells in an asynchronous population that are in G1 phase of the cell cycle.
- transform: A computer program that converts time-series simulations into experimentally observed consequences. For example, if the observation is "percentage of cells in an asynchronous population that are in G1 phase of the cell cycle", then the transform must examine how the cell cycle control molecules vary with time (in a simulation) and compute the probability that a cell chosen randomly from an asynchronous population will be in G1 phase.
- use case: A description of the process used to perform a particular modeling task on a particular model. It is a user-centered description of the activities performed by a user to accomplish a particular goal. Contrast with a walkthrough.
- walkthrough: A software-centered description of the steps taken when using a particular software system to accomplish a particular task or series of tasks on a particular model. Walkthroughs are useful for explaining how a particular software system is meant to be used. Contrast with a use case.
- wild type: A particular genetic variant of an organism that is typically reared in the laboratory. Often this variant is derived quite directly from individuals collected from a natural habitat, and it is selected to be genetically uniform and physiologically robust. Genetic mutations are then introduced into the wild type genetic background, in order to identify the genes involved in particular biological processes.
- wiring diagram: A graphical representation of a reaction network. Commonly drawn with icons representing molecular species and arrows representing chemical reactions. This is typically the initial (qualitative) description of a model produced by a molecular biologist.
- XPP: "Phase Plane for X-windows." A software package that supports many common methods of nonlinear dynamical systems theory on sets of nonlinear evolution equations (ODEs, PDEs, SDEs, difference equations, delay-differential equations, integrodifferential equations). Phase plane analysis of pairs of nonlinear ODEs is particularly convenient. XPP was written by Bard Ermentrout of the Univ. of Pittsburgh and runs on any computing system with an X-server.
- XPP-Aut: XPP with Auto. Auto is a software package for numerical bifurcation analysis.
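To make the reaction network and stoichiometry entries concrete, here is a minimal Python sketch (not JigCell or SBML code) that evaluates dx/dt = R.v for a hypothetical two-reaction network, A + B -> C and C -> A + B, with mass-action rate laws and made-up rate constants. In this sketch R has one row per species and one column per reaction, and each entry is the species' product coefficient minus its reactant coefficient, so R.v has one entry per species.

```python
# Minimal sketch: time derivatives of a mass-action reaction network, dx/dt = R.v.
# The network, species names and rate constants are hypothetical examples.

species = ["A", "B", "C"]

# Each reaction: (reactant coefficients, product coefficients, rate constant)
reactions = [
    ({"A": 1, "B": 1}, {"C": 1}, 2.0),  # A + B -> C, assumed k1 = 2.0
    ({"C": 1}, {"A": 1, "B": 1}, 0.5),  # C -> A + B, assumed k2 = 0.5
]


def stoichiometric_matrix(species, reactions):
    """R[i][j] = net change in species i when reaction j occurs once."""
    return [
        [prod.get(s, 0) - reac.get(s, 0) for (reac, prod, _k) in reactions]
        for s in species
    ]


def velocities(x, reactions):
    """Mass-action velocity of each reaction: k times product of [reactant]**coefficient."""
    v = []
    for reac, _prod, k in reactions:
        rate = k
        for s, p in reac.items():
            rate *= x[s] ** p
        v.append(rate)
    return v


def derivatives(x, species, reactions):
    """Return dx/dt = R.v as a dict keyed by species name."""
    R = stoichiometric_matrix(species, reactions)
    v = velocities(x, reactions)
    return {s: sum(R[i][j] * v[j] for j in range(len(v))) for i, s in enumerate(species)}


x = {"A": 1.0, "B": 0.5, "C": 0.0}
print(stoichiometric_matrix(species, reactions))  # [[-1, 1], [-1, 1], [1, -1]]
print(derivatives(x, species, reactions))         # {'A': -1.0, 'B': -1.0, 'C': 1.0}
```

Because the rows of R for A and C sum to zero, A + C (and likewise B + C) is a conserved quantity of this network.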
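The chain-rule condition in the conserved quantity entry can be checked numerically. The sketch below assumes a hypothetical reversible reaction A <-> B with mass-action rate constants KF and KB (values made up), estimates the partial derivatives of a candidate function f by central differences, and evaluates df/dt = ∂f/∂a · da/dt + ∂f/∂b · db/dt at a few states. The total a + b gives zero (up to rounding error); the fraction a/(a + b) generally does not.

```python
# Minimal sketch: testing whether f(a, b) is a conserved quantity of
#   da/dt = -v, db/dt = +v, with v = KF*a - KB*b   (hypothetical A <-> B)
# by evaluating the chain rule df/dt = (df/da)(da/dt) + (df/db)(db/dt).

KF, KB = 1.5, 0.4  # assumed forward/backward mass-action rate constants


def rhs(a, b):
    """Return (da/dt, db/dt) for the reversible reaction A <-> B."""
    v = KF * a - KB * b  # net reaction velocity
    return -v, v


def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def time_derivative(f, a, b):
    """Chain-rule value of df/dt along solutions, evaluated at state (a, b)."""
    dadt, dbdt = rhs(a, b)
    return partial(f, (a, b), 0) * dadt + partial(f, (a, b), 1) * dbdt


def total(a, b):
    """Total amount of material: the candidate conserved quantity."""
    return a + b


def fraction_a(a, b):
    """Fraction of material in form A: not conserved away from equilibrium."""
    return a / (a + b)


for state in [(1.0, 0.0), (0.3, 2.0), (5.0, 5.0)]:
    print(state, time_derivative(total, *state), time_derivative(fraction_a, *state))
```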
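The parameter, initial condition, simulation parameters, simulation run and time series entries all meet in even the simplest simulation. The sketch below uses a hand-written fixed-step classical Runge-Kutta (RK4) integrator, an illustrative stand-in for a production solver such as LSODAR, to integrate the glossary's example dx/dt = -kx, x(0) = a. The values of k, a, the step size and the number of steps are assumptions. The model parameter, the initial condition and the simulation parameters are kept separate, and the output is a time series of (t, x(t)) pairs that can be checked against the exact solution x(t) = a·exp(-kt).

```python
import math


def rk4_time_series(f, x0, params, dt, n_steps):
    """Integrate dx/dt = f(x, t, params) with fixed-step classical RK4.

    Returns the time series as a list of (t, x(t)) pairs, starting at t = 0.
    """
    t, x = 0.0, x0
    series = [(t, x)]
    for i in range(n_steps):
        s1 = f(x, t, params)                              # RK4 stage slopes
        s2 = f(x + 0.5 * dt * s1, t + 0.5 * dt, params)
        s3 = f(x + 0.5 * dt * s2, t + 0.5 * dt, params)
        s4 = f(x + dt * s3, t + dt, params)
        x += dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
        t = (i + 1) * dt                                  # avoid accumulating round-off in t
        series.append((t, x))
    return series


def decay(x, t, params):
    """Model equation dx/dt = -k x (first-order decay)."""
    return -params["k"] * x


model_params = {"k": 0.3}   # model parameter (rate constant), assumed value
initial_condition = 2.0     # x(0) = a, assumed value
dt, n_steps = 0.1, 100      # simulation parameters: step size, number of steps

series = rk4_time_series(decay, initial_condition, model_params, dt, n_steps)
t_end, x_end = series[-1]
exact = initial_condition * math.exp(-model_params["k"] * t_end)
print(f"t = {t_end:.1f}, RK4 x = {x_end:.8f}, exact x = {exact:.8f}")
```

A fixed step keeps the sketch short; real pathway models are often stiff, which is one reason production simulators use adaptive solvers with stiff/non-stiff switching.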
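The parameter estimation, objective function and optimization entries can be illustrated together with the decay model dx/dt = -kx, x(0) = a, whose exact solution is x(t) = a·exp(-kt). This is a minimal sketch under stated assumptions: a is known; the "measurements" are synthetic, noise-free values generated with k = 0.3 (not experimental data); the objective function is the sum of squared residuals; and k is found by golden-section search, a simple local minimization method chosen here only for illustration, over an assumed bracket [0.01, 2.0].

```python
import math


def model(t, k, a):
    """Exact solution of dx/dt = -k x with x(0) = a."""
    return a * math.exp(-k * t)


def sse(k, times, data, a):
    """Objective function: sum of squared residuals between model and data."""
    return sum((model(t, k, a) - y) ** 2 for t, y in zip(times, data))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c = hi - invphi * (hi - lo)
    d = lo + invphi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        c = hi - invphi * (hi - lo)
        d = lo + invphi * (hi - lo)
    return (lo + hi) / 2


# Synthetic, noise-free "measurements" generated with k = 0.3 (illustration only).
a = 1.0
times = [0.0, 1.0, 2.0, 4.0, 8.0]
data = [model(t, 0.3, a) for t in times]

k_hat = golden_section_min(lambda k: sse(k, times, data, a), 0.01, 2.0)
print(f"estimated k = {k_hat:.6f}")
```

Golden-section search works here because this particular objective decreases up to k = 0.3 and increases after it; with noisy data or several parameters, the objective can have multiple local minima, which is the global-versus-local distinction drawn in the optimization entry.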