Different approach
RL is different from other ML methods, as RL:
- start with a fully interactive, goal-driven agent
- it will make decisions
- it will pursue a goal
Elements of RL
A policy, a reward signal , a value function, and optionally, a model of the environment.
Policy
A map from {perceived states of environment} to {actions to be taken}
It’s called stimulus-response rules in psychology.
Reward signal
The goal of RL problem.
Value function
Reward is for immediate, but value is the long-term desirability of states.
eg. a state might yeild a low immediate reward, but still have hight value.
Model
mimics thebehaviour of environment.
Multi-armed Bandits
多杆老虎机。
Slot Machine(老虎机), are sometimes called One-Armed Bandit. Because traditional slot machines have a single lever (arm) to pull and spin the reels.
“Bandit” → Slot machines take people’s money like robery, so they are metaphorically called “bandits.”