Different approach

RL differs from other ML methods in that:

  1. it starts with a fully interactive, goal-driven agent
  2. the agent makes decisions
  3. the agent pursues a goal

Elements of RL

An RL system has four main elements: a policy, a reward signal, a value function, and, optionally, a model of the environment.

  1. Policy

    A map from {perceived states of the environment} to {actions to be taken}.

    In psychology, this corresponds to stimulus-response rules.

  2. Reward signal

    Defines the goal of the RL problem: the agent tries to maximize the total reward it receives over time.

  3. Value function

    Reward indicates what is good immediately, whereas value is the long-term desirability of a state.

    E.g. a state might yield a low immediate reward but still have a high value, because it is regularly followed by states that yield high rewards.

  4. Model

    Mimics the behaviour of the environment.
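To make the four elements concrete, here is a minimal sketch on a made-up three-state chain (start → middle → goal). Everything in it is an assumption for illustration only: the state names, the reward of 10 for reaching the goal, the discount factor `GAMMA`, and the helper names `model` and `value`. It shows a state with zero immediate reward that still has a high value.

```python
# Toy chain: start -> middle -> goal. All names and numbers are illustrative.
GAMMA = 0.9  # discount factor (assumed for this sketch)

# Policy: a map from perceived state to the action to take.
policy = {"start": "right", "middle": "right"}


def model(state, action):
    """Model: mimics the environment by predicting (next_state, reward)."""
    if state == "start" and action == "right":
        return "middle", 0.0   # reward signal: nothing yet
    if state == "middle" and action == "right":
        return "goal", 10.0    # reward signal: goal reached
    return state, 0.0          # any other move leaves the state unchanged


def value(state, max_steps=100):
    """Value function: discounted sum of rewards when following the policy."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == "goal":    # terminal state, no further reward
            break
        state, reward = model(state, policy[state])
        total += discount * reward
        discount *= GAMMA
    return total


if __name__ == "__main__":
    _, immediate = model("start", policy["start"])
    print("immediate reward at start:", immediate)
    print("value of start:", value("start"))
    print("value of middle:", value("middle"))
```

Here the immediate reward at `start` is zero, yet its value is positive because the policy leads on to the rewarding `goal` state. In a real RL problem the value function usually has to be estimated from experience rather than computed from a known model as done here.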

Multi-armed Bandits

Literally, a slot machine with multiple levers.

A slot machine is sometimes called a “one-armed bandit” because traditional slot machines have a single lever (arm) that is pulled to spin the reels.

“Bandit” → Slot machines take people’s money like a robber, so they are metaphorically called “bandits.”