
orbital.algorithm.template
Class MarkovDecisionProcess.DynamicProgramming

java.lang.Object
  extended by orbital.algorithm.template.MarkovDecisionProcess
      extended by orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
All Implemented Interfaces:
java.io.Serializable, AlgorithmicTemplate, EvaluativeAlgorithm, HeuristicAlgorithm
Direct Known Subclasses:
GaussSeidelDynamicProgramming, RealTimeDynamicProgramming
Enclosing class:
MarkovDecisionProcess

public abstract static class MarkovDecisionProcess.DynamicProgramming
extends MarkovDecisionProcess
implements HeuristicAlgorithm

Abstract base class for Markov decision processes solved by dynamic programming.

Author:
André Platzer
See Also:
DynamicProgramming, "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.", Serialized Form
Invariants:
getDiscount()∈[0,1]

Nested Class Summary
 
Nested classes/interfaces inherited from class orbital.algorithm.template.MarkovDecisionProcess
MarkovDecisionProcess.DynamicProgramming
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.HeuristicAlgorithm
HeuristicAlgorithm.Configuration, HeuristicAlgorithm.PatternDatabaseHeuristic
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.EvaluativeAlgorithm
EvaluativeAlgorithm.EvaluationComparator
 
Constructor Summary
MarkovDecisionProcess.DynamicProgramming(Function heuristic)
           
MarkovDecisionProcess.DynamicProgramming(Function heuristic, double gamma)
          Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).
MarkovDecisionProcess.DynamicProgramming(Function heuristic, Real gamma)
           
 
Method Summary
protected  MutableFunction createMap()
          Create a mapping.
protected  BinaryFunction getActionValue(Function U)
          Get the action-value cost function of an action and state.
 Real getDiscount()
          Get the discount factor γ.
 Function getEvaluation()
          f(s) = h(s).
protected  Function getGreedyPolicy(BinaryFunction Q)
          Get a greedy policy with respect to an action-value cost function Q.
 Function getHeuristic()
          Get the heuristic function used.
protected  Pair maximumExpectedUtility(BinaryFunction Q, java.lang.Object state)
          Calculate the maximum expected utility (MEU) action.
 void setDiscount(Real gamma)
          Set the discount factor γ.
 void setHeuristic(Function heuristic)
          Set the heuristic function to use.
 
Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess
getProblem, plan, solve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
complexity, solve, spaceComplexity
 

Constructor Detail

MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic,
                                                Real gamma)
Parameters:
heuristic - the heuristic function to use.
gamma - The discount factor γ describes how much immediate results are preferred over future results.
See Also:
setHeuristic(Function), setDiscount(Real)
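
A hypothetical usage sketch (it assumes the concrete subclass RealTimeDynamicProgramming mirrors this (Function, Real) constructor and that Values.getDefault() supplies the ValueFactory mentioned in the deprecation note below; neither is verified here):

    // Construct a planner with the trivial heuristic h≡0 and discount γ=0.9.
    Function h = new Function() {
        public Object apply(Object state) {
            return Values.getDefault().valueOf(0);  // heuristic estimate h(s)
        }
    };
    MarkovDecisionProcess.DynamicProgramming planner =
        new RealTimeDynamicProgramming(h, Values.getDefault().valueOf(0.9));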

MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic,
                                                double gamma)
Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).


MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic)
Method Detail

setDiscount

public void setDiscount(Real gamma)
Set the discount factor γ.

Parameters:
gamma - The discount factor γ describes how much immediate results are preferred over future results. The higher the factor, the more balanced the preference; the lower the factor, the stronger the preference for immediate results. For γ=0, only immediate costs are considered. For γ=1, the undiscounted case, additional assumptions are required to produce a well-defined decision problem and ensure convergence.
Preconditions:
gamma∈[0,1]
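
For illustration of these semantics (a self-contained Java sketch, independent of the Orbital API): the discounted cost sum of a cost sequence c_0, c_1, c_2, … is ∑_t γ^t·c_t, so γ=0 keeps only the immediate cost while γ=1 weights all future costs equally.

    // Plain-Java illustration of the discount factor semantics.
    public class DiscountDemo {
        /** Discounted cost sum: ∑_t gamma^t * costs[t]. */
        static double discountedSum(double gamma, double[] costs) {
            double sum = 0, weight = 1;
            for (double c : costs) {
                sum += weight * c;
                weight *= gamma;   // future costs are weighted by gamma^t
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] costs = {1, 1, 1, 1};
            System.out.println(discountedSum(0.0, costs)); // 1.0: immediate cost only
            System.out.println(discountedSum(0.5, costs)); // 1.875: immediate costs dominate
            System.out.println(discountedSum(1.0, costs)); // 4.0: undiscounted sum
        }
    }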

getDiscount

public Real getDiscount()
Get the discount factor γ.

Postconditions:
RES∈[0,1]

getHeuristic

public Function getHeuristic()
Description copied from interface: HeuristicAlgorithm
Get the heuristic function used.

Specified by:
getHeuristic in interface HeuristicAlgorithm
Returns:
the heuristic cost function h:S→R estimating h*.

setHeuristic

public void setHeuristic(Function heuristic)
Set the heuristic function to use.

Note that the new heuristic function will only apply to unknown future states used for bootstrapping. States whose values have already been estimated with the old heuristic function will not be updated. Nevertheless, it is always safe to set the heuristic function immediately before a call to MarkovDecisionProcess.plan().

Specified by:
setHeuristic in interface HeuristicAlgorithm
Parameters:
heuristic - the heuristic cost function h:S→R estimating h*. h will be embedded in the evaluation function f.
See Also:
"Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, Massachusetts. 1984."

createMap

protected MutableFunction createMap()
Create a mapping.

Override to implement a lookup table other than hash maps, e.g. neural networks. However, beware of implicit function approximation and generalization techniques for U, which might disturb the convergence of RTDP.

Returns:
a table-like implementation ready to store values for arguments.
See Also:
Factory Method
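
A hedged sketch of such an override, here simply re-creating hash-map-backed storage that falls back to the heuristic for states not yet backed up. It assumes MutableFunction declares apply(Object) and set(Object, Object); verify against the actual Orbital interface before use.

    // Hedged sketch, not verified against the Orbital sources.
    protected MutableFunction createMap() {
        return new MutableFunction() {
            private final java.util.Map table = new java.util.HashMap();
            // value of a known state, or the heuristic estimate for
            // unknown states (bootstrapping)
            public Object apply(Object state) {
                Object u = table.get(state);
                return u != null ? u : getHeuristic().apply(state);
            }
            // assumed mutator: store the new value, return the old one
            public Object set(Object state, Object value) {
                return table.put(state, value);
            }
        };
    }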

getEvaluation

public Function getEvaluation()
f(s) = h(s).

Specified by:
getEvaluation in interface EvaluativeAlgorithm
Specified by:
getEvaluation in interface HeuristicAlgorithm
Returns:
the evaluation function f:S→R used to evaluate the (utility or cost) value of states.

maximumExpectedUtility

protected Pair maximumExpectedUtility(BinaryFunction Q,
                                      java.lang.Object state)
Calculate the maximum expected utility (MEU) action.

Parameters:
Q - the action-value cost function Q:S×A(s)→R evaluating the expected utility of actions in states.
state - the state s∈S in which to take an action.
Returns:
the Pair (a, Q)∈A(s)×R with maximum expected utility, which here means minimum expected cost sum.
Postconditions:
RES = (a,Q) ∧ a = argmin_{a'∈A(s)} Q(s,a') ∧ Q = min_{a'∈A(s)} Q(s,a').
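
Restated as a self-contained Java sketch (types simplified to java.util.function interfaces; the actual method works on Orbital's BinaryFunction and Pair), the selection is a plain argmin over the applicable actions:

    // Illustration only: returns the pair (a, Q(s,a)) with minimum
    // expected cost sum; ties are broken arbitrarily here.
    static Object[] maximumExpectedUtility(
            java.util.function.BiFunction<Object, Object, Double> q,
            Object state, Iterable<?> actions) {
        Object best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Object a : actions) {
            double cost = q.apply(state, a);
            if (cost < bestCost) {  // minimum cost = maximum utility
                bestCost = cost;
                best = a;
            }
        }
        return new Object[] {best, bestCost};
    }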

getActionValue

protected BinaryFunction getActionValue(Function U)
Get the action-value cost function of an action and state.

Parameters:
U - The evaluation function U:S→R mapping states to expected cost sum.
Returns:
Q_U:S×A(s)→R; (s,a) ↦ Q_U(s,a) = c(s,a) + γ·∑_{s'∈S} P(s'|s,a)·U(s'). Q_U(s,a) is the action-value cost of the action a∈A(s) in state s∈S as evaluated by U.
See Also:
"C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England, 1989."

getGreedyPolicy

protected Function getGreedyPolicy(BinaryFunction Q)
Get a greedy policy with respect to an action-value cost function Q.

Parameters:
Q - an action-value cost function Q:S×A(s)→R.
Returns:
π_Q = λs. argmin_{a∈A(s)} Q(s,a).
See Also:
Greedy, getActionValue(Function)
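
In the same simplified terms as the maximumExpectedUtility sketch above, the greedy policy is just a function closing over Q that returns the argmin action:

    // π_Q = λs. argmin_{a∈A(s)} Q(s,a); reuses the MEU sketch above.
    static java.util.function.Function<Object, Object> greedyPolicy(
            java.util.function.BiFunction<Object, Object, Double> q,
            java.util.function.Function<Object, Iterable<?>> actionsOf) {
        return s -> maximumExpectedUtility(q, s, actionsOf.apply(s))[0];
    }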

Orbital library
1.3.0: 11 Apr 2009

Copyright © 1996-2009 André Platzer
All Rights Reserved.