Long-Term Values in Markov Decision Processes, (Co)Algebraically

Frank Feys; Helle Hvid Hansen; Lawrence S. Moss

doi:10.1007/978-3-030-00389-0_6

Conference paper, 2018

Frank Feys

Department of Engineering Systems and Services [Delft]

Helle Hvid Hansen

Department of Engineering Systems and Services [Delft]

Lawrence S. Moss

Department of Mathematics [Bloomington]

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
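
As a rough illustration of contribution (ii), the long-term value of a fixed policy under discounting can be approximated by iterating the map v ↦ r + γ·P·v. For 0 ≤ γ < 1 this map is a contraction in the sup norm, so Banach's Fixpoint Theorem gives a unique fixpoint and the iteration converges to it. The sketch below is not taken from the paper: it assumes a finite state space, rewards attached to states, and a policy already folded into a single transition matrix. The names `long_term_value`, `reward`, `transition`, and `gamma` are hypothetical.

```python
from typing import List


def long_term_value(
    reward: List[float],
    transition: List[List[float]],
    gamma: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Approximate the fixpoint of v = reward + gamma * transition * v.

    Assumptions (not from the source): finite state space, state-based
    rewards, and transition[s][t] = probability of moving from s to t
    under the fixed policy.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1) for the map to be a contraction")
    n = len(reward)
    v = [0.0] * n  # any bounded starting point works for a contraction
    for _ in range(max_iter):
        new_v = [
            reward[s] + gamma * sum(transition[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive iterates are within tol in the sup norm.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy example: state 0 earns reward 1 and stays put; state 1 earns 0 and moves to state 0.
values = long_term_value(
    reward=[1.0, 0.0],
    transition=[[1.0, 0.0], [1.0, 0.0]],
    gamma=0.5,
)
print(values)  # approximately [2.0, 1.0]
```

In the toy example the exact fixpoint can be checked by hand: v(0) = 1 + 0.5·v(0) gives v(0) = 2, and v(1) = 0 + 0.5·v(0) gives v(1) = 1.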

Keywords

Markov decision process; Long-term value; Discounted sum; Coalgebra; Algebra; Corecursive algebra; Fixpoint; Metric space

Domains

Computer Science [cs]

Main file

473364_1_En_6_Chapter.pdf (475.04 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02044650

Submitted on: Thursday, February 21, 2019, 3:41:27 PM

Last modification on: Tuesday, March 26, 2024, 5:44:13 PM

Long-term archiving on: Wednesday, May 22, 2019, 4:14:16 PM

Dates and versions

hal-02044650, version 1 (21-02-2019)

Licence

Attribution

Identifiers

HAL Id: hal-02044650, version 1
DOI: 10.1007/978-3-030-00389-0_6

Cite

Frank Feys, Helle Hvid Hansen, Lawrence S. Moss. Long-Term Values in Markov Decision Processes, (Co)Algebraically. 14th International Workshop on Coalgebraic Methods in Computer Science (CMCS), Apr 2018, Thessaloniki, Greece. pp.78-99, ⟨10.1007/978-3-030-00389-0_6⟩. ⟨hal-02044650⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC1 IFIP-WG1-3 IFIP-CMCS IFIP-LNCS-11202
