A stochastic process whose probabilistic characteristics may be changed (controlled) in the course of its evolution in pursuance of some objective, normally the minimization (maximization) of a functional (the control objective) representing the quality of the control. Various types of controlled processes arise, depending on how the process is specified or on the nature of the control objective. The greatest progress has been made in the theory of controlled jump (or stepwise) Markov processes and controlled diffusion processes when the complete evolution of the process is observed by the controller. A corresponding theory has also been developed in the case of partial observations (incomplete data).
A controlled jump (or stepwise) Markov process is a controlled stochastic process with continuous time and piecewise-constant trajectories whose infinitesimal characteristics are influenced by the choice of control. For the construction of such a process the following are usually specified (cf. [1], [2]): 1) a Borel set of states; 2) a Borel set of controls and, for every state, a Borel subset of controls admissible when the process is in that state, together with, in some cases, a measurable selector assigning to every state one of its admissible controls; 3) a jump measure in the form of a transition function, defined for every time, state, admissible control and Borel set of states, which is a Borel function of the time, state and control for every fixed set, is countably additive in the set for every fixed time, state and control, and is bounded and subject to the usual normalization conditions of a jump intensity. Roughly speaking, the jump measure of a set, multiplied by the length of a short time interval, is approximately the probability that the process jumps into that set during this interval when the indicated state is occupied and the indicated control action is applied.
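For a finite model the data above reduce to a table of jump intensities indexed by a state, an admissible control and a target state. The following sketch only illustrates how such a specification might be recorded; the states, controls and rates are invented and are not taken from the article.

```python
# Hypothetical finite controlled jump Markov model.
# RATES[(x, a)][y] is the intensity of jumping from state x to state y != x
# when the admissible control a is applied in state x.
STATES = [0, 1, 2]
ADMISSIBLE = {0: ["slow", "fast"], 1: ["slow"], 2: ["slow", "fast"]}
RATES = {
    (0, "slow"): {1: 0.5},
    (0, "fast"): {1: 2.0, 2: 0.5},
    (1, "slow"): {0: 1.0, 2: 1.0},
    (2, "slow"): {0: 0.2},
    (2, "fast"): {0: 1.5},
}

def total_rate(x, a):
    """Total intensity of leaving state x under control a."""
    return sum(RATES[(x, a)].values())

# Boundedness of the jump measure, needed below to construct the process.
print("sup of total jump rates:", max(total_rate(x, a) for x in STATES for a in ADMISSIBLE[x]))
```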
Let the sample space be the space of all piecewise-constant right-continuous functions of time with values in the set of states, and for every time t let the minimal σ-algebra with respect to which the trajectory up to time t is measurable be given (together with the σ-algebra generated by the whole trajectory). Any function of the time and the trajectory, with values in the set of controls, that is progressively measurable relative to this family of σ-algebras and takes values admissible at the current state is called a (natural) strategy (or control). From the definition it follows that the control applied at time t depends on the trajectory only through its values up to time t. If the control applied at time t is given by a Borel function of t and of the current state only, taking admissible values, then the strategy is said to be a Markov strategy or Markov control, and if this function does not depend on t, it is a stationary strategy or stationary control. The classes of natural, Markov and stationary strategies are denoted correspondingly. In view of the possibility of making a measurable selection of admissible controls, the class of stationary strategies (and hence the classes of Markov and natural strategies) is not empty. If the jump measure is bounded, then for any natural strategy, initial time and initial state one can construct a unique probability measure on the trajectory space under which the process starts from the given state and relations (1a)–(1b) hold; these relations express, in terms of the jump measure and the strategy, the conditional distribution of the first jump time after the current time and of the position reached at that jump, given the σ-algebra generated by the trajectory up to the current time together with that jump time. The stochastic process so obtained is a controlled jump Markov process. The control Markov property of a controlled jump Markov process means that, from a known "present", the "past" enters the right-hand side of (1a)–(1b) only through the strategy. For an arbitrary strategy the process is, in general, not Markovian, but for a Markov strategy one obtains a Markov process, while for a stationary strategy and a jump measure not depending on time one obtains a homogeneous Markov process whose jump measure is obtained by substituting the stationary control into the given jump measure. Control of the process consists of the selection of a strategy from the class of strategies.
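To make the construction concrete, the sketch below simulates trajectories of the hypothetical finite model introduced above under a stationary strategy: between jumps the state is constant, the holding time is exponential with the current total rate, and the post-jump state is drawn proportionally to the individual jump rates. It is a minimal illustration (reusing RATES and total_rate from the previous sketch), not the measure-theoretic construction of the article.

```python
import random

def simulate(strategy, x0, horizon, seed=0):
    """One piecewise-constant trajectory on [0, horizon] under a stationary strategy.

    strategy: dict mapping each state to an admissible control.
    Returns the list of (jump time, new state) pairs, starting with (0, x0).
    """
    rng = random.Random(seed)
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        a = strategy[x]
        rate = total_rate(x, a)
        if rate == 0.0:                      # state absorbing under this control
            break
        t += rng.expovariate(rate)           # exponential holding time
        if t >= horizon:
            break
        r, acc = rng.uniform(0.0, rate), 0.0
        for y, q in RATES[(x, a)].items():   # post-jump state, proportional to the rates
            acc += q
            if r <= acc:
                x = y
                break
        path.append((t, x))
    return path

print(simulate({0: "fast", 1: "slow", 2: "slow"}, x0=0, horizon=10.0))
```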
A typical control problem is the maximization of a functional of the form (2): the expectation, under the measure generated by the strategy, of a running reward integrated along the trajectory up to a fixed horizon plus a terminal reward at the final state, the running and terminal rewards being bounded Borel functions. By defining suitable running and terminal rewards and by introducing fictitious states, a wide class of functionals, containing terms attached to the moments of jump and allowing termination of the process, can be reduced to the form (2). The value function (3) is the supremum of the functional (2) over all strategies, regarded as a function of the initial time and state. A strategy is called ε-optimal if its functional falls short of the value function by at most ε for every initial time and state, and a.e. ε-optimal if this is true for almost-all initial states relative to a given measure on the set of states. A 0-optimal strategy is called optimal. In what follows the model described above is considered on the interval from the initial time to the horizon, and the previous symbols are used in the corresponding sense. Considering the jumps of the process as the succession of steps of a controlled discrete-time Markov chain, one can establish the existence of an a.e. ε-optimal strategy and obtain the measurability of the value function in the form: the sets on which it exceeds a given level are analytic sets. This allows one to apply the ideas of dynamic programming and to derive relation (4), a variant of Bellman's principle, which expresses the value function at an initial moment through the reward expected up to a later moment plus the value function evaluated at that later moment and at the state then occupied. Letting the later moment approach the initial one, one obtains from (4) and (1a)–(1b) the Bellman equation (5), in which the derivative of the value function with respect to time is balanced by the supremum, over the controls admissible in the current state, of the running reward plus the integral of the value function against the jump measure.
The value function is the only bounded function that is absolutely continuous in the time variable and satisfies (5) together with the terminal condition given by the terminal reward. Equation (5) may be solved by the method of successive approximations. It follows from the Kolmogorov equation for the resulting Markov process that if the supremum in (5) is attained by a measurable function of the time and the state, then the corresponding Markov strategy is optimal. In this way the existence of optimal Markov strategies in semi-continuous models (in which the sets of states and of controls, the jump measure and the rewards satisfy the compactness and continuity conditions of the definition) is established, in particular for finite models (with finitely many states and controls). In arbitrary Borel models one can conclude the existence of a.e. ε-optimal Markov strategies for any ε > 0 by using a measurable selection theorem (cf. Selection theorems). In countable models one obtains ε-optimal Markov strategies. The results can partly be extended to the case when the jump measure and the reward functions are unbounded, but in general the sufficiency of Markov strategies, i.e. the optimality of Markov strategies within the class of all natural strategies, has not been proved.
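For a finite model, equation (5) is a system of ordinary differential equations in the time variable, one equation per state, with the terminal reward as terminal condition. The sketch below integrates such a system backwards by an explicit Euler step and records a maximizing control at each state; all data are invented for illustration and the scheme is only one of many possible discretizations.

```python
# Hypothetical finite model for the finite-horizon Bellman equation (5).
STATES = [0, 1]
CONTROLS = {0: ["a", "b"], 1: ["a"]}
RATE = {(0, "a"): {1: 1.0}, (0, "b"): {1: 3.0}, (1, "a"): {0: 2.0}}   # jump rates
REWARD = {(0, "a"): 1.0, (0, "b"): 0.2, (1, "a"): 0.0}                # running reward
TERMINAL = {0: 0.0, 1: 2.0}                                           # terminal reward

T, DT = 1.0, 0.001
v = dict(TERMINAL)                      # v at the horizon equals the terminal reward
policy = {}
for _ in range(int(T / DT)):            # integrate (5) backwards in time
    new_v = {}
    for x in STATES:
        best, best_a = None, None
        for a in CONTROLS[x]:
            rhs = REWARD[(x, a)] + sum(q * (v[y] - v[x]) for y, q in RATE[(x, a)].items())
            if best is None or rhs > best:
                best, best_a = rhs, a
        new_v[x], policy[x] = v[x] + DT * best, best_a
    v = new_v

print("approximate value at the initial time:", v, "maximizing controls near t = 0:", policy)
```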
For homogeneous models, in which the jump measure and the running reward do not depend on the time, one considers along with (2) the functionals (6) and (7), the expected total discounted reward over an infinite horizon and the long-run average reward per unit time, and one moreover poses the question of the sufficiency of the class of stationary strategies. If the discount rate is positive and the running reward is a bounded Borel function, then equation (5) for the functional (6) becomes the stationary Bellman equation (8), in which the unknown function no longer depends on the time. This equation coincides with the Bellman equation for the analogous problem with discrete time and it has a unique bounded solution. If the supremum in (8) is attained by a measurable function of the state, then the corresponding stationary strategy is optimal. Results on the existence of ε-optimal strategies in the class of stationary strategies, analogous to those mentioned above, can also be obtained. For the criterion (7), complete results have been obtained only for finite and special forms of ergodic controlled jump Markov processes and for similar cases in discrete time: one can choose a constant and a function of the state such that the corresponding stationary strategy is optimal for the criterion (2) at once for all horizons, and hence optimal for the criterion (7).
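In a finite homogeneous model, (8) can be solved by successive approximations: dividing through by the discount rate plus the total jump rate turns it into a fixed-point equation for a contraction, exactly the discrete-time equivalence mentioned above. The data in the sketch below are invented.

```python
# Hypothetical finite homogeneous model with discount rate LAM for the criterion (6).
STATES = [0, 1]
CONTROLS = {0: ["a", "b"], 1: ["a", "b"]}
RATE = {(0, "a"): {1: 1.0}, (0, "b"): {1: 3.0}, (1, "a"): {0: 2.0}, (1, "b"): {0: 0.5}}
REWARD = {(0, "a"): 1.0, (0, "b"): 0.2, (1, "a"): 0.0, (1, "b"): 0.6}
LAM = 0.5

def bellman_step(v):
    """One successive approximation for the stationary equation (8)."""
    new_v, pol = {}, {}
    for x in STATES:
        best, best_a = None, None
        for a in CONTROLS[x]:
            r = sum(RATE[(x, a)].values())              # total jump rate in state x
            val = (REWARD[(x, a)] + sum(q * v[y] for y, q in RATE[(x, a)].items())) / (LAM + r)
            if best is None or val > best:
                best, best_a = val, a
        new_v[x], pol[x] = best, best_a
    return new_v, pol

v = {x: 0.0 for x in STATES}
for _ in range(300):
    v, policy = bellman_step(v)
print("discounted value:", v, "stationary strategy:", policy)
```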
A controlled diffusion process is a continuous controlled random process in a finite-dimensional Euclidean space, admitting a stochastic differential with respect to a certain Wiener process which enters exogenously. The theory of controlled diffusion processes arose as a generalization of the theory of controlled deterministic systems described by ordinary differential equations whose right-hand side depends on the state of the system and on a control parameter.
For a formal description of controlled diffusion processes one uses the language of Itô stochastic differential equations. Let a complete probability space be given, together with an increasing family of complete σ-algebras contained in the basic one, and let a finite-dimensional Wiener process relative to this family be defined on the space (i.e. a process whose increments after any moment of time form a continuous standard Wiener process, whose value at each moment is measurable with respect to the σ-algebra of that moment, and whose increments after a moment are independent of the σ-algebra of that moment). Let the set of controls be a separable metric space. For every time, state and control, two functions are assumed to be given: a matrix-valued diffusion coefficient and a vector-valued drift coefficient. These are assumed to be Borel functions satisfying a Lipschitz condition in the state variable with constants not depending on the time and the control, and to be bounded at some fixed point of the state space. An arbitrary process that is progressively measurable relative to the given family of σ-algebras and takes values in the set of controls is called a strategy (or control), and the set of all strategies is denoted accordingly. For every strategy, initial time and initial state there exists a unique solution of the Itô stochastic differential equation (9), in which the increment of the state is the diffusion coefficient, evaluated at the current time, state and control, applied to the increment of the Wiener process, plus the drift coefficient multiplied by the increment of time (Itô's theorem). This solution is called a controlled diffusion process (controlled process of diffusion type); it is controlled by selection of the strategy. Besides these strategies one can consider other classes. Let the space of continuous functions of time with values in the state space be given, the semi-axis being interpreted as the set of values of the time, and let, for every time, the smallest σ-algebra of subsets of this space relative to which the coordinate functions up to that time are measurable be given. A function of the time and the continuous trajectory, with values in the set of controls, is called a natural strategy, or natural control, admissible at a given initial point if it is progressively measurable relative to this family of σ-algebras and if, for the given initial data, there exists at least one solution of equation (9) that is progressively measurable relative to the same family. The set of all natural strategies admissible at a given point is denoted accordingly, and its subset consisting of all natural strategies depending only on the current time and the current value of the trajectory is denoted correspondingly and is called the set of Markov strategies, or Markov controls, admissible at that point. One can say that a natural strategy chooses a control action at each moment of time on the basis of the observations of the process over the time elapsed so far, while a Markov strategy chooses an action on the basis of the observation of the process only at the current moment of time. For a natural strategy (even for a Markov strategy) the solution of (9) need not be unique. Therefore, for every natural strategy admissible at a given point one arbitrarily fixes some solution of (9), and, by composing the strategy with the trajectory of this solution, one defines an imbedding of the natural strategies admissible at that point into the class of all strategies under which the corresponding solutions coincide (a.e.).
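Concretely, once a Markov strategy is fixed, trajectories of the controlled process (9) can be approximated by the Euler–Maruyama scheme: at each step the drift and diffusion coefficients are evaluated at the current time, state and the control value prescribed by the strategy. The one-dimensional sketch below uses made-up coefficients and a made-up feedback rule; nothing in it is taken from the article.

```python
import math
import random

def b(t, x, a):            # drift coefficient (illustrative)
    return a - x

def sigma(t, x, a):        # diffusion coefficient (illustrative)
    return 0.5 + 0.1 * abs(a)

def markov_control(t, x):  # a made-up Markov strategy: push the state towards 1
    return 1.0 if x < 1.0 else -1.0

def euler_maruyama(x0, t0, T, n_steps, seed=0):
    """Approximate one trajectory of dx = sigma dw + b dt under the Markov control."""
    rng = random.Random(seed)
    dt = (T - t0) / n_steps
    t, x, path = t0, x0, [(t0, x0)]
    for _ in range(n_steps):
        a = markov_control(t, x)
        dw = rng.gauss(0.0, math.sqrt(dt))          # Wiener increment
        x += sigma(t, x, a) * dw + b(t, x, a) * dt  # Euler–Maruyama step for (9)
        t += dt
        path.append((t, x))
    return path

print(euler_maruyama(x0=0.0, t0=0.0, T=1.0, n_steps=1000)[-1])
```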
The aim of the control is normally to maximize or minimize the expectation of some functional of the trajectory. A general formulation is as follows. On a time–space domain, Borel functions specifying a running reward and a discounting rate are defined, and on its boundary a Borel function specifying a terminal reward is defined. For every strategy, initial time and initial state one considers the first exit time of the controlled process from the domain and defines the pay-off (10) as the expectation of the discounted running reward integrated up to the exit time plus the discounted terminal reward at the exit position, the indices attached to the expectation sign indicating the strategy and the initial data of the process appearing under the expectation sign. There arises then the problem of determining a strategy maximizing the pay-off (10), and of determining the value function (11), the least upper bound of the pay-off over all strategies.
A strategy whose pay-off at a given point falls short of the value function by at most ε is called ε-optimal for that point; optimal means 0-optimal. If in (11) the class of all strategies is replaced by the class of natural strategies (respectively, by the class of Markov strategies) admissible at the point, then the corresponding least upper bound is denoted correspondingly. Since the Markov strategies are contained in the natural strategies, which in turn are imbedded in the class of all strategies, one has the corresponding inequalities between the three least upper bounds. Under reasonably wide assumptions (cf. [3]) it is known that the value over all strategies coincides with the value over natural strategies (this is so if, e.g., the coefficients and the rewards are continuous in the control, continuous in the state uniformly with respect to the control, and bounded in absolute value by constants not depending on their arguments). The question of the corresponding equality for Markov strategies in general situations is still open. A formal application of the ideas of dynamic programming leads to the so-called Bellman principle (12), which expresses the value function at an initial point through the reward expected up to a stopping time plus the value function evaluated at the time and position of the process at that stopping time, where the stopping times involved are arbitrary (cf. Markov moment) and do not exceed the first exit time. If in (12) one replaces the stopping time by a small deterministic time and applies Itô's formula to the value function, then after some non-rigorous arguments one arrives at the Bellman equation (13): inside the domain the supremum, over the controls admissible there, of the result of applying the differential operator (14) to the unknown function and adding the running reward must vanish. Here (14) is a second-order operator whose first-order coefficients are the components of the drift vector and whose second-order coefficients form the matrix equal to one half of the product of the diffusion matrix and its transpose, repeated coordinate indices being summed over the space dimension.
Bellman's equation plays a central role in the theory of controlled diffusion processes, since it often turns out that a sufficiently "good" solution of it, equal to the terminal reward on the boundary of the domain, is the value function, while if a function of the time and the state realizes, at every point, the least upper bound in (13) and defines a Markov strategy admissible at a given point, then this strategy is optimal at that point. Thus one can sometimes show that the value over Markov strategies coincides with the value over all strategies.
A rigorous proof of such results meets with serious difficulties, connected with the non-linear character of equation (13), which in general is a non-linear degenerate parabolic equation. The simplest case is that in which (13) is a non-degenerate quasi-linear equation (the diffusion matrix does not depend on the control and is uniformly non-degenerate in the domain). Here, under certain additional restrictions on the coefficients and the rewards, one can make use of results from the theory of quasi-linear parabolic equations to prove the solvability of (13) in Hölder classes of functions and to give a method, based on a solution of (13), for constructing ε-optimal strategies. An analogous approach can be used (cf. [3]) in the one-dimensional case when the coefficients and the rewards are bounded and do not depend on the time, and the diffusion coefficient is uniformly bounded away from zero. In this case (13) reduces to a second-order quasi-linear equation that can be solved for its highest derivative. Methods of the theory of differential equations also help in the study of (13) when the state space is a two-dimensional domain and the coefficients and the rewards do not depend on the time (cf. [3]). Here, as in the preceding cases, dependence on the control is permitted. It is also relevant to mention the case of the Hamilton–Jacobi equation, which may be studied by methods of the theory of differential equations (cf. [5]).
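Away from these special cases the value function is often computed numerically. The sketch below applies a simple explicit finite-difference scheme to a one-dimensional problem of the type (13), maximizing over a finite set of control values at each backward time step; the coefficients, rewards and boundary treatment are invented for illustration and are not taken from the article.

```python
import numpy as np

# Illustrative data for a one-dimensional problem on [0, 2] with horizon T = 1.
CONTROLS = [-1.0, 0.0, 1.0]
def b(x, a):     return a                      # drift
def sig(x, a):   return 0.4                    # diffusion coefficient
def f(x, a):     return -0.1 * a * a           # running reward (penalizes effort)
def g(x):        return -(x - 1.0) ** 2        # terminal reward

NX, NT, T = 81, 4000, 1.0
xs = np.linspace(0.0, 2.0, NX)
h, dt = xs[1] - xs[0], T / NT                  # dt small enough for the explicit scheme

v = g(xs)                                      # terminal condition
for _ in range(NT):
    best = np.full(NX, -np.inf)
    for a in CONTROLS:
        vxx = np.zeros(NX); vx = np.zeros(NX)
        vxx[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / h ** 2
        vx[1:-1] = (v[2:] - v[:-2]) / (2 * h)
        cand = 0.5 * sig(xs, a) ** 2 * vxx + b(xs, a) * vx + f(xs, a)
        best = np.maximum(best, cand)
    v_new = v + dt * best                      # explicit backward step for the Bellman equation
    v_new[0], v_new[-1] = g(xs[0]), g(xs[-1])  # crude Dirichlet boundary condition
    v = v_new

print("approximate value at t = 0, x = 1:", v[NX // 2])
```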
By methods of the theory of stochastic processes one can show that the value function satisfies equation (13) in more general cases, under certain types of smoothness assumptions on the coefficients and the rewards (cf. [3]).
Along with problems of controlled motion one can also consider optimal stopping of the controlled process, for one or two persons, e.g. the maximization, over the strategy and an arbitrary stopping time, of a functional consisting of the running reward accumulated up to the stopping time and a reward received at the moment of stopping.
Related to the theory of controlled diffusion processes are controlled partially-observable processes and problems of control of stochastic processes in which the control is realized by the selection of a measure, from a given class of measures corresponding to processes of diffusion type, on the space of trajectories (cf. [3], [4], [6], [7], [8]).
[1] I.I. Gikhman, A.V. Skorokhod, "Controlled stochastic processes", Springer (1977) (Translated from Russian)
[2] A.A. Yushkevich, "Controlled jump Markov models", Theory Probab. Appl. 25 (1980) pp. 244–266; Teor. Veroyatnost. i Primenen. 25 (1980) pp. 247–270
[3] N.V. Krylov, "Controlled diffusion processes", Springer (1980) (Translated from Russian)
[4] W.H. Fleming, R.W. Rishel, "Deterministic and stochastic optimal control", Springer (1975)
[5] S.N. Kruzhkov, "Generalized solutions of the Hamilton–Jacobi equations of eikonal type. I. Statement of the problem, existence, uniqueness and stability theorems, some properties of the solutions", Mat. Sb. 98 : 3 (1975) pp. 450–493 (In Russian)
[6] R.S. Liptser, A.N. Shiryaev, "Statistics of random processes", 1–2, Springer (1977–1978) (Translated from Russian)
[7] W.M. Wonham, "On the separation theorem of stochastic control", SIAM J. Control 6 (1968) pp. 312–326
[8] M.H.A. Davis, "The separation principle in stochastic control via Girsanov solutions", SIAM J. Control and Optimization 14 (1976) pp. 176–188
The Bellman equations mentioned above (equations (5), (13)) are in this form sometimes called Bellman–Hamilton–Jacobi equations.
A controlled diffusion process is also defined as a controlled random process, in some Euclidean space, whose measure admits a Radon–Nikodým derivative with respect to the measure of a certain Wiener process which is independent of the control.
There are many important topics in the theory of controlled stochastic processes other than those mentioned above. The following comments are intended to put the subject in a wider perspective, as well as to point out some recent technical innovations.
Discrete-time controlled stochastic systems are normally specified by a state transition equation (a1), in which the state at the next stage is a given function of the current state, the current control and a random disturbance. The disturbances form a given sequence of independent, identically distributed random variables with a common distribution function. If the initial state is independent of the disturbances and the control is Markovian, i.e. a function of the stage number and the current state only, then the state sequence defined by (a1) is Markovian. The control objective is normally to minimize a cost function such as the expectation of the sum of discounted one-stage costs; the number of stages may be finite or infinite, and future costs are multiplied by powers of a discount factor. In the finite-stage case a terminal cost, evaluated at the final state, is added. Defining the optimal cost-to-go from a given stage and state, the principle of dynamic programming indicates that it satisfies a backward recursion: it equals the minimum, over the admissible controls, of the expected one-stage cost plus the discounted optimal cost-to-go from the next stage evaluated at the next state given by (a1), and an optimal control at that stage and state is a value of the control achieving this minimum. In the infinite-horizon case one expects that the finite-horizon optimal costs converge and that the limit satisfies Bellman's functional equation, the stationary analogue of the backward recursion. The general theory of discrete-time control concerns conditions under which results of the type above can be rigorously substantiated. Generally, contraction (a discount factor strictly less than one together with bounded one-stage costs) or monotonicity (one-stage costs of constant sign) conditions are required. The value function is not necessarily measurable if the system function and the cost functions are merely Borel functions. However, if these functions are lower semi-analytic, then the value function is lower semi-analytic and the existence of ε-optimal universally measurable policies can be proved. [a1], [a2] are excellent references for this theory.
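For finite state, control and disturbance spaces the backward recursion above is immediate to implement. The following sketch uses an invented inventory-style model; the transition function, costs, discount factor and horizon are assumptions made purely for illustration.

```python
# Hypothetical discrete-time model x_{k+1} = f(x_k, u_k, w_k) with finite spaces.
STATES = range(5)                        # stock level 0..4
CONTROLS = range(3)                      # order 0..2 units
NOISE = [(0, 0.5), (1, 0.3), (2, 0.2)]   # demand values with their probabilities
BETA, N = 0.9, 10                        # discount factor and horizon

def f(x, u, w):                          # next state: clipped stock balance
    return max(0, min(4, x + u - w))

def c(x, u, w):                          # one-stage cost: ordering plus shortage penalty
    return 1.0 * u + 4.0 * max(0, w - (x + u))

def Phi(x):                              # terminal cost
    return 0.5 * x

V = {x: Phi(x) for x in STATES}          # cost-to-go at the horizon
for _ in range(N):                       # backward dynamic programming recursion
    V = {x: min(sum(p * (c(x, u, w) + BETA * V[f(x, u, w)]) for w, p in NOISE)
                for u in CONTROLS)
         for x in STATES}

print("optimal expected cost from each initial state:", V)
```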
Return to the controlled diffusion problem (9), (10) and write the Bellman equation (13) in the form (a2), in which the expression on the left-hand side coincides with the left-hand side of (13) and the equation is considered on the relevant time–space domain. As pointed out in the main article, it is a difficult matter to decide in which sense, if any, the value function defined by (11) satisfies (a2). The concept of viscosity solutions of the Bellman equation, introduced for first-order equations in [a3], provides an answer to this question. A function is a viscosity solution of (a2) if, for every smooth test function, the appropriate one-sided version of the equation holds at every point where the difference between the function and the test function attains a local maximum or a local minimum. Note that any classical solution of (a2) is a viscosity solution, and that if a viscosity solution is twice continuously differentiable at some point, then (a2) is satisfied at that point in the classical sense. It is possible to show in great generality that if the value function of (11) is continuous, then it is a viscosity solution of (a2). [a4a], [a4b] can be consulted for a proof of this result, for conditions under which the value function is continuous, and for other results on uniqueness and regularity of viscosity solutions.
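For orientation, the definition from the first-order case of [a3] can be written out as follows; this is one common formulation (sign conventions differ between authors), and the second-order case is handled analogously, with second derivatives of the test function entering the tested equation.

```latex
% Viscosity solutions for the model first-order equation
%   u_t + H(t,x,Du) = 0   on (0,T) \times \mathbb{R}^n .
A continuous function $u$ is a \emph{viscosity subsolution} if, for every
$\varphi \in C^1$ and every local maximum point $(t_0,x_0)$ of $u-\varphi$,
\[
  \varphi_t(t_0,x_0) + H\bigl(t_0, x_0, D\varphi(t_0,x_0)\bigr) \le 0 ,
\]
a \emph{viscosity supersolution} if, at every local minimum point of $u-\varphi$,
\[
  \varphi_t(t_0,x_0) + H\bigl(t_0, x_0, D\varphi(t_0,x_0)\bigr) \ge 0 ,
\]
and a \emph{viscosity solution} if it is both.
```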
The theory of controlled diffusion processes is intimately connected with partial differential equations. However, the most general results on the existence of optimal controls and on stochastic maximum principles (see below) can be obtained by purely probabilistic methods. This is described below for the diffusion model (9) in the case where the diffusion coefficient does not depend on the control and is uniformly positive definite. In this case a weak solution of (9) can be defined for any feedback control, and expectations are taken with respect to the sample space measure corresponding to the chosen feedback control. Suppose that the pay-off (see also Gain function) to be maximized is the expectation of a running reward integrated up to a fixed terminal time plus a terminal reward. For each feedback control define a scalar conditional value process: the maximal expected total pay-off given the control chosen and the evolution of the process up to the current time. It is possible to show that this process is always a supermartingale (cf. Martingale) and that it is a martingale if and only if the control is optimal. It has a Doob–Meyer decomposition into a martingale minus a continuous increasing process, so a control is optimal if and only if the corresponding increasing process vanishes. By the martingale representation theorem (cf. Martingale), the martingale part can always be written as a stochastic integral with respect to the Wiener process appearing in the weak solution of (9) under the chosen control. It is easily shown that the integrand in this representation does not depend on the control, and that the increasing processes associated with two controls differ by the time integral of the difference of an expression, formed from the drift, the running reward and this integrand, evaluated at the two control values. This immediately gives a maximum principle: if a control is optimal, then its increasing process vanishes; but the increasing process is non-decreasing, so it must be the case that, at almost every time, the optimal control value maximizes this expression over the admissible controls, which is the content of relation (a3). One also gets an existence theorem: since the integrand is the same for all controls, one can construct an optimal control by choosing, at each time and trajectory, a control value maximizing this expression.
Similar techniques can be applied to very general classes of controlled stochastic differential systems (not just controlled diffusion) and to optimal stopping and impulse control problems (see below). General references are [a5], [a6]. Some of this theory has also been developed using methods of non-standard analysis [a7].
The necessary condition (a3) is not as it stands a true maximum principle, because the "adjoint variable" is only implicitly characterized. It is shown in [a8] and elsewhere that under wide conditions the adjoint variable can be expressed explicitly in terms of an optimal control and the fundamental solution of the linearized, or derivative, system corresponding to (9) under that control. This gives the stochastic maximum principle in a form which is directly analogous to the Pontryagin maximum principle of deterministic optimal control theory.
In many important applications, control is not exercised continuously; rather, a sequence of "interventions" is made at isolated instants of time. The theory of impulse control is a mathematical formulation of this kind of problem. Let a homogeneous Markov process be given on a state space, with trajectories in the set of right-continuous functions having limits from the left, and let the corresponding semi-group be given. Informally, a controlled process is defined as follows. A strategy is a sequence of random intervention times, strictly increasing, and of states to which the process is shifted at these times. The controlled process starts at some fixed point and follows a realization of the given Markov process up to the first intervention time; at that time its position is moved to the first prescribed state, after which it follows a realization of the Markov process starting from that state up to the second intervention time; etc. A filtered probability space carrying the controlled process is constructed in such a way that the process is adapted to the filtration (cf. Optional random process) and, for each intervention, the intervention time is a stopping time and the prescribed state is measurable with respect to the σ-algebra at that time. It is convenient to formulate the optimization problem in terms of minimizing a cost function, which generally takes the form of the expectation of a discounted running cost integrated over time plus discounted costs incurred at each intervention. Suppose that the running cost is non-negative and the intervention costs are bounded below by a positive constant; this rules out strategies having more than a finite number of interventions in bounded time intervals. The value function is the infimum of the cost over all strategies, and one defines the intervention operator, which assigns to a function the minimal cost of an immediate intervention: the infimum, over the possible new states, of the intervention cost plus the value of the function at the new state. When the state space is compact and the underlying process is a Feller process it is possible to show [a9] that the value function is continuous and that it is the largest continuous function dominated by its image under the intervention operator (a4) and satisfying the dynamic programming inequality (a5) relating it to the semi-group and the running cost. The optimal strategy is: intervene exactly when the value function coincides with its image under the intervention operator, moving the process to a state realizing the corresponding infimum. Thus the state space divides into a continuation set, where the value function is strictly smaller than its image under the intervention operator, and an intervention set, where the two coincide. Further, the value function is the unique solution of an optimal stopping problem (a6), where the infimum is taken over the set of stopping times. This shows the close connection between impulse control and optimal stopping: (a6) is an optimal stopping problem for the underlying process with an implicit obstacle obtained by applying the intervention operator to the value function itself. Similar results are obtained for right processes in [a6], [a10]; the measurability properties here are more delicate. There is also a well-developed analytic theory of impulse control. Assuming sufficient smoothness, with the differential generator of the underlying process, one obtains from (a5) the differential inequality (a7); further, equality holds in at least one of (a4), (a7) at each point, which is expressed by (a8). Equations (a4), (a7), (a8) characterize the value function and have been extensively studied for diffusion processes (i.e. when the generator is a second-order differential operator) using the method of quasi-variational inequalities [a11]. Existence and regularity properties are obtained.
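A discrete-time, finite-state analogue of this characterization can be computed directly: each outer iteration solves an optimal stopping problem in which the obstacle is the intervention operator applied to the previous iterate, in the spirit of (a6). The data below (transition matrix, costs, discount factor) are invented for illustration.

```python
import numpy as np

# Invented finite-state, discrete-time, discounted impulse control model.
n, BETA, K = 6, 0.95, 1.0                      # states 0..5, discount factor, fixed intervention cost
P = np.full((n, n), 1.0 / n)                   # uncontrolled transition matrix (uniform, illustrative)
f = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # running cost, grows with the state

def move_cost(x, y):
    """Extra cost of shifting the state from x to y."""
    return 0.2 * abs(x - y)

def intervention_operator(V):
    """M V(x) = min over target states y of  K + move_cost(x, y) + V(y)."""
    return np.array([min(K + move_cost(x, y) + V[y] for y in range(n)) for x in range(n)])

def optimal_stopping(obstacle, V0):
    """Solve W = min(obstacle, f + BETA * P W) by value iteration (a contraction)."""
    W = V0.copy()
    for _ in range(500):
        W = np.minimum(obstacle, f + BETA * P @ W)
    return W

V = np.linalg.solve(np.eye(n) - BETA * P, f)   # value of never intervening
for _ in range(50):                            # outer iteration on the implicit obstacle M V
    V = optimal_stopping(intervention_operator(V), V)

intervene = intervention_operator(V) <= V + 1e-9
print("value:", np.round(V, 2), "intervention set:", np.where(intervene)[0])
```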
Many applied problems in operations research — for example in queueing systems or inventory control — involve optimization of non-diffusion stochastic models. These are generalizations of the jump process described in the main article allowing for non-constant trajectories between jumps and for various sorts of boundary behaviour. There have been various attempts to create a unified theory for such problems: piecewise-deterministic Markov processes [a12], Markov decision drift processes [a13], [a14]. Both continuous and impulse control are studied, as well as discretization methods and computational techniques.
The control of partially-observable processes is still far from completely understood, despite important recent advances. It is closely related to the theory of non-linear filtering. Consider a controlled diffusion as in (9), where the control must be based on observations of a scalar process given by an observation equation (a9), in which the observation is driven by a function of the unobserved state (say, bounded) plus another, independent Wiener process; a pay-off functional of the form of an expected integrated running reward is to be maximized. This problem can be formulated in the following way. Take independent Wiener processes on some probability space, let the observation filtration be the natural filtration of the observation process, and let the admissible controls be all control-valued processes adapted to this filtration. Under standard conditions (9) has a unique strong solution for each admissible control. Now define a new measure on the underlying space by a Girsanov-type density. By Girsanov's theorem this is a probability measure, under it the observation process is a Wiener process, and the state and the observation satisfy (9), (a9) on the new space. For any function of the state one considers its conditional expectation given the observations; according to the Kallianpur–Striebel formula, this conditional expectation is the ratio of an unnormalized conditional expectation of the function to the unnormalized conditional expectation of the function identically equal to one. The corresponding measure-valued process can be thought of as a non-normalized conditional distribution of the state given the observations; it satisfies the Zakai equation (a10), a stochastic evolution equation driven by the observations in which the operator (14), with the coefficients of (9), appears. It follows from the properties of conditional mathematical expectation that the pay-off can be expressed as a functional of this non-normalized conditional distribution and the control alone. This shows that the partially-observed problem (9), (a9) is equivalent to a problem (a10) with complete observations, on the probability space where the controlled process is the measure-valued diffusion given by the non-normalized conditional distribution. The question of the existence of optimal controls has been extensively studied. It seems that optimal controls do exist, but only if some form of randomization is introduced; see [a7], [a15], [a16]. In addition, maximum principles have been obtained [a17], [a18] and a preliminary study of the Bellman equation has been undertaken [a19].
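The reduction to a completely-observed problem for the unnormalized conditional distribution has a transparent discrete-time analogue: for a finite-state hidden Markov chain the unnormalized filter recursion plays the role of the Zakai equation, and the control is chosen as a function of the filter. The sketch below is such an analogue with invented transition, observation and reward data; it is not a discretization of the equations in the article.

```python
import numpy as np
rng = np.random.default_rng(0)

# Invented controlled hidden Markov chain: 2 states, 2 controls, 2 observation symbols.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),    # transition matrices P[u][x, x']
     1: np.array([[0.6, 0.4], [0.4, 0.6]])}
OBS = np.array([[0.8, 0.2],                    # OBS[x, y] = probability of observing y in state x
                [0.3, 0.7]])
REWARD = np.array([[1.0, 0.0],                 # REWARD[x, u], running reward
                   [0.0, 1.0]])

def control_from_filter(pi):
    """A simple separated (filter-based) rule: pick the control matching the likelier state."""
    return int(pi[1] > pi[0])

x = 0                                          # hidden state
q = np.array([0.5, 0.5])                       # unnormalized conditional distribution
total = 0.0
for t in range(100):
    pi = q / q.sum()                           # normalized filter (Kallianpur–Striebel ratio)
    u = control_from_filter(pi)
    total += pi @ REWARD[:, u]                 # expected running reward given the observations
    x = rng.choice(2, p=P[u][x])               # hidden state transition under control u
    y = rng.choice(2, p=OBS[x])                # observation
    q = OBS[:, y] * (P[u].T @ q)               # unnormalized filter update (Zakai-type recursion)

print("accumulated expected reward:", round(total, 3))
```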
[a1] D.P. Bertsekas, S.E. Shreve, "Stochastic optimal control: the discrete-time case", Acad. Press (1978)
[a2] E.B. Dynkin, A.A. Yushkevich, "Controlled Markov processes", Springer (1979)
[a3] M.G. Crandall, P.L. Lions, "Viscosity solutions of Hamilton–Jacobi equations", Trans. Amer. Math. Soc. 277 (1983) pp. 1–42
[a4a] P.L. Lions, "Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations. Part I", Comm. Partial Differential Eq. 8 (1983) pp. 1101–1134
[a4b] P.L. Lions, "Optimal control of diffusion processes and Hamilton–Jacobi–Bellman equations. Part II", Comm. Partial Differential Eq. 8 (1983) pp. 1229–1276
[a5] R.J. Elliott, "Stochastic calculus and applications", Springer (1982)
[a6] N. El Karoui, "Les aspects probabilistes du contrôle stochastique", Lect. Notes in Math. 876, Springer (1980)
[a7] S. Albeverio, J.E. Fenstad, R. Høegh-Krohn, T. Lindstrøm, "Nonstandard methods in stochastic analysis and mathematical physics", Acad. Press (1986)
[a8] U.G. Haussmann, "A stochastic maximum principle for optimal control of diffusions", Pitman (1986)
[a9] M. Robin, "Contrôle impulsionnel des processus de Markov", Univ. Paris IX (1978) (Thèse d'Etat)
[a10] J.P. Lepeltier, B. Marchal, "Théorie générale du contrôle impulsionnel Markovien", SIAM J. Control and Optimization 22 (1984) pp. 645–665
[a11] A. Bensoussan, J.L. Lions, "Impulse control and quasi-variational inequalities", Gauthier-Villars (1984)
[a12] M.H.A. Davis, "Piecewise-deterministic Markov processes: a general class of non-diffusion stochastic models", J. Royal Statist. Soc. (B) 46 (1984) pp. 353–388
[a13] F.A. van der Duyn Schouten, "Markov decision drift processes", CWI, Amsterdam (1983)
[a14] A.A. Yushkevich, "Continuous-time Markov decision processes with intervention", Stochastics 9 (1983) pp. 235–274
[a15] W.H. Fleming, E. Pardoux, "Optimal control for partially-observed diffusions", SIAM J. Control and Optimization 20 (1982) pp. 261–285
[a16] V.S. Borkar, "Existence of optimal controls for partially-observed diffusions", Stochastics 11 (1983) pp. 103–141
[a17] A. Bensoussan, "Maximum principle and dynamic programming approaches of the optimal control of partially-observed diffusions", Stochastics 9 (1983) pp. 169–222
[a18] U.G. Haussmann, "The maximum principle for optimal control of diffusions with partial information", SIAM J. Control and Optimization 25 (1987) pp. 341–361
[a19] V.E. Beneš, I. Karatzas, "Filtering of diffusions controlled through their conditional measures", Stochastics 13 (1984) pp. 1–23
[a20] D.P. Bertsekas, "Dynamic programming and stochastic control", Acad. Press (1976)
[a21] H.J. Kushner, "Stochastic stability and control", Acad. Press (1967)
[a22] C. Striebel, "Optimal control of discrete time stochastic systems", Lect. Notes in Econom. and Math. Systems 110, Springer (1975)
[a23] P.L. Lions, "On the Hamilton–Jacobi–Bellman equations", Acta Appl. Math. 1 (1983) pp. 17–41
[a24] M. Robin, "Long-term average cost control problems for continuous time Markov processes. A survey", Acta Appl. Math. 1 (1983) pp. 281–299