## Authors: Mohammadhosein Hasanbeig, Alessandro Abate and Daniel Kroening

## Paper Information

Title: | Logically-Constrained Reinforcement Learning |

Authors: | Mohammadhosein Hasanbeig, Alessandro Abate and Daniel Kroening |

Proceedings: | MoRe Papers |

Editors: | Mickael Randour and Jeremy Sproston |

Keywords: | Markov Decision Process, Reinforcement Learning, Policy Synthesis, Quantitative Model Checking, Buchi Automaton |

Abstract: | ABSTRACT. We propose a Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP), such that a linear time property is satisfied. We convert the property into a Limit Deterministic Buchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product automaton, according to accepting conditions of the LDBA. With this reward function, our algorithm synthesizes a policy that satisfies the linear time property: as such, the policy synthesis procedure is "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property, at any given state of the MDP - a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches. |

Pages: | 2 |

Talk: | Jul 13 17:00 (Session 88G: Markov decision processes 2) |

Paper: |