
orbital.algorithm.template
Class MarkovDecisionProcess.DynamicProgramming

java.lang.Object
  extended by orbital.algorithm.template.MarkovDecisionProcess
      extended by orbital.algorithm.template.MarkovDecisionProcess.DynamicProgramming
All Implemented Interfaces:
java.io.Serializable, AlgorithmicTemplate, EvaluativeAlgorithm, HeuristicAlgorithm
Direct Known Subclasses:
GaussSeidelDynamicProgramming, RealTimeDynamicProgramming
Enclosing class:
MarkovDecisionProcess

public abstract static class MarkovDecisionProcess.DynamicProgramming
extends MarkovDecisionProcess
implements HeuristicAlgorithm

Abstract base class for Markov decision processes solved by dynamic programming.

Author:
André Platzer
See Also:
DynamicProgramming, "A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.", "Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.", Serialized Form
Invariants:
getDiscount()∈[0,1]

Nested Class Summary
 
Nested classes/interfaces inherited from class orbital.algorithm.template.MarkovDecisionProcess
MarkovDecisionProcess.DynamicProgramming
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.HeuristicAlgorithm
HeuristicAlgorithm.Configuration, HeuristicAlgorithm.PatternDatabaseHeuristic
 
Nested classes/interfaces inherited from interface orbital.algorithm.template.EvaluativeAlgorithm
EvaluativeAlgorithm.EvaluationComparator
 
Constructor Summary
MarkovDecisionProcess.DynamicProgramming(Function heuristic)
           
MarkovDecisionProcess.DynamicProgramming(Function heuristic, double gamma)
          Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).
MarkovDecisionProcess.DynamicProgramming(Function heuristic, Real gamma)
           
 
Method Summary
protected  MutableFunction createMap()
          Create a mapping.
protected  BinaryFunction getActionValue(Function U)
          Get the action-value cost function of an action and state.
 Real getDiscount()
          Get the discount factor γ.
 Function getEvaluation()
          f(s) = h(s).
protected  Function getGreedyPolicy(BinaryFunction Q)
          Get a greedy policy with respect to an action-value cost function Q.
 Function getHeuristic()
          Get the heuristic function used.
protected  Pair maximumExpectedUtility(BinaryFunction Q, java.lang.Object state)
          Calculate the maximum expected utility (MEU) action.
 void setDiscount(Real gamma)
          Set the discount factor γ.
 void setHeuristic(Function heuristic)
          Set the heuristic function to use.
 
Methods inherited from class orbital.algorithm.template.MarkovDecisionProcess
getProblem, plan, solve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
complexity, solve, spaceComplexity
 

Constructor Detail

MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic,
                                                Real gamma)
Parameters:
heuristic - the heuristic function to use.
gamma - The discount factor γ describes how much immediate results are preferred over future results.
See Also:
setHeuristic(Function), setDiscount(Real)
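
A hypothetical usage sketch (it assumes the concrete subclass RealTimeDynamicProgramming mirrors this (Function, Real) constructor and that Values.getDefault() supplies the ValueFactory mentioned in the deprecation note below; neither is verified here):

    // Construct a planner with the trivial heuristic h≡0 and discount γ=0.9.
    Function h = new Function() {
        public Object apply(Object state) {
            return Values.getDefault().valueOf(0);  // heuristic estimate h(s)
        }
    };
    MarkovDecisionProcess.DynamicProgramming planner =
        new RealTimeDynamicProgramming(h, Values.getDefault().valueOf(0.9));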

MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic,
                                                double gamma)
Deprecated. Convenience constructor; prefer ValueFactory.valueOf(double).


MarkovDecisionProcess.DynamicProgramming

public MarkovDecisionProcess.DynamicProgramming(Function heuristic)
Method Detail

setDiscount

public void setDiscount(Real gamma)
Set the discount factor γ.

Parameters:
gamma - The discount factor γ describes how much immediate results are preferred over future results. The higher the factor, the more balanced the preference; the lower the factor, the stronger the preference for immediate results. For γ=0, only immediate costs are considered. For γ=1, the undiscounted case, additional assumptions are required to produce a well-defined decision problem and ensure convergence.
Preconditions:
gamma∈[0,1]
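
For illustration of these semantics (a self-contained Java sketch, independent of the Orbital API): the discounted cost sum of a cost sequence c_0, c_1, c_2, … is ∑_t γ^t·c_t, so γ=0 keeps only the immediate cost while γ=1 weights all future costs equally.

    // Plain-Java illustration of the discount factor semantics.
    public class DiscountDemo {
        /** Discounted cost sum: ∑_t gamma^t * costs[t]. */
        static double discountedSum(double gamma, double[] costs) {
            double sum = 0, weight = 1;
            for (double c : costs) {
                sum += weight * c;
                weight *= gamma;   // future costs are weighted by gamma^t
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] costs = {1, 1, 1, 1};
            System.out.println(discountedSum(0.0, costs)); // 1.0: immediate cost only
            System.out.println(discountedSum(0.5, costs)); // 1.875: immediate costs dominate
            System.out.println(discountedSum(1.0, costs)); // 4.0: undiscounted sum
        }
    }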

getDiscount

public Real getDiscount()
Get the discount factor γ.

Postconditions:
RES∈[0,1]

getHeuristic

public Function getHeuristic()
Description copied from interface: HeuristicAlgorithm
Get the heuristic function used.

Specified by:
getHeuristic in interface HeuristicAlgorithm
Returns:
the heuristic cost function h:S→R estimating h*.

setHeuristic

public void setHeuristic(Function heuristic)
Set the heuristic function to use.

Note that the new heuristic function will only apply to unknown future states used for bootstrapping. States whose values have already been estimated with the old heuristic function will not be updated. Nevertheless, it is always safe to set the heuristic function immediately before a call to MarkovDecisionProcess.plan().

Specified by:
setHeuristic in interface HeuristicAlgorithm
Parameters:
heuristic - the heuristic cost function h:S→R estimating h*. h will be embedded in the evaluation function f.
See Also:
"Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, Massachusetts. 1984."

createMap

protected MutableFunction createMap()
Create a mapping.

Override to implement a lookup table other than hash maps, e.g. neural networks. However, beware of implicit function approximation and generalization techniques for U, which might disturb the convergence of RTDP.

Returns:
a table-like implementation ready to store values for arguments.
See Also:
Factory Method
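
A hedged sketch of such an override, here simply re-creating hash-map-backed storage that falls back to the heuristic for states not yet backed up. It assumes MutableFunction declares apply(Object) and set(Object, Object); verify against the actual Orbital interface before use.

    // Hedged sketch, not verified against the Orbital sources.
    protected MutableFunction createMap() {
        return new MutableFunction() {
            private final java.util.Map table = new java.util.HashMap();
            // value of a known state, or the heuristic estimate for
            // unknown states (bootstrapping)
            public Object apply(Object state) {
                Object u = table.get(state);
                return u != null ? u : getHeuristic().apply(state);
            }
            // assumed mutator: store the new value, return the old one
            public Object set(Object state, Object value) {
                return table.put(state, value);
            }
        };
    }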

getEvaluation

public Function getEvaluation()
f(s) = h(s).

Specified by:
getEvaluation in interface EvaluativeAlgorithm
Specified by:
getEvaluation in interface HeuristicAlgorithm
Returns:
the evaluation function f:S→R used to evaluate the (utility or cost) value of states.

maximumExpectedUtility

protected Pair maximumExpectedUtility(BinaryFunction Q,
                                      java.lang.Object state)
Calculate the maximum expected utility (MEU) action.

Parameters:
Q - the action-value cost function Q:S×A(s)→R evaluating the expected utility of actions in states.
state - the state s∈S in which to take an action.
Returns:
the Pair (a, Q)∈A(s)×R with maximum expected utility, which here means minimum expected cost sum.
Postconditions:
RES = (a,Q) ∧ a = argmin_{a'∈A(s)} Q(s,a') ∧ Q = min_{a'∈A(s)} Q(s,a').
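
Restated as a self-contained Java sketch (types simplified to java.util.function interfaces; the actual method works on Orbital's BinaryFunction and Pair), the selection is a plain argmin over the applicable actions:

    // Illustration only: returns the pair (a, Q(s,a)) with minimum
    // expected cost sum; ties are broken arbitrarily here.
    static Object[] maximumExpectedUtility(
            java.util.function.BiFunction<Object, Object, Double> q,
            Object state, Iterable<?> actions) {
        Object best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Object a : actions) {
            double cost = q.apply(state, a);
            if (cost < bestCost) {  // minimum cost = maximum utility
                bestCost = cost;
                best = a;
            }
        }
        return new Object[] {best, bestCost};
    }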

getActionValue

protected BinaryFunction getActionValue(Function U)
Get the action-value cost function of an action and state.

Parameters:
U - The evaluation function U:S→R mapping states to expected cost sum.
Returns:
Q_U:S×A(s)→R; (s,a) ↦ Q_U(s,a) = c(s,a) + γ·∑_{s'∈S} P(s'|s,a)·U(s'). Q_U(s,a) is the action-value cost of the action a∈A(s) in state s∈S as evaluated by U.
See Also:
"C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England, 1989."

getGreedyPolicy

protected Function getGreedyPolicy(BinaryFunction Q)
Get a greedy policy with respect to an action-value cost function Q.

Parameters:
Q - an action-value cost function Q:S×A(s)→R.
Returns:
π_Q = λs. argmin_{a∈A(s)} Q(s,a).
See Also:
Greedy, getActionValue(Function)
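
In the same simplified terms as the maximumExpectedUtility sketch above, the greedy policy is just a function closing over Q that returns the argmin action:

    // π_Q = λs. argmin_{a∈A(s)} Q(s,a); reuses the MEU sketch above.
    static java.util.function.Function<Object, Object> greedyPolicy(
            java.util.function.BiFunction<Object, Object, Double> q,
            java.util.function.Function<Object, Iterable<?>> actionsOf) {
        return s -> maximumExpectedUtility(q, s, actionsOf.apply(s))[0];
    }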

Orbital library
1.3.0: 11 Apr 2009

Copyright © 1996-2009 André Platzer
All Rights Reserved.