Evaluating Saliency Maps for Deep Reinforcement Learning Agents in Super Mario Bros


This article is essentially a summary of my dissertation in eXplainable Reinforcement Learning (XRL).

I’m aiming this article at people who have no knowledge of Artificial Intelligence, Reinforcement Learning or anything like that. However, I am also trying not to simplify this too much. If a random acronym with a link turns up and is never explained, I promise you don’t actually need to understand what it means to follow what I’m talking about.

If you do have any questions, please feel free to email me!

If you want the brief yet very densely packed sentence for those that understand this stuff as to what I did for my dissertation, here it is: I trained a Deep RL agent to play Super Mario Bros using Proximal Policy Optimization (PPO), applied both SHapley Additive exPlanations (SHAP), a very widely used eXplainable Artificial Intelligence (XAI) method, and a state-of-the-art research XRL method created by Greydanus et al. to this agent (both of which are perturbation-based methods that generate saliency maps as their explanations), and then used some computational metrics from the research literature (Sanity Checks & Insertion and Deletion Metrics) to evaluate the validity and utility of applying XAI methods to Reinforcement Learning (RL).

Well that was easy, right? We all understand this nonsense?

So what actually is Reinforcement Learning?

Reinforcement Learning (RL) is a very big, difficult and maths-y topic. If you want an in-depth look into the nuances and finer details, I’d highly recommend what is considered the ‘Bible’ of RL: Reinforcement Learning: An Introduction by Sutton and Barto (it’s available for free online). I’ll try to summarise what’s important for this context here.

RL is about learning what action to take in a particular situation to give the “best” possible outcome (this is achieved by maximising a numerical value called the “reward”). In order to explain RL, I’m going to explain it generically, then relate it back to Super Mario Bros.

This may be clearer in code, so I’ll give an explanation in semi-legal Python using Gymnasium (a really popular library for Reinforcement Learning) to help illustrate this:

import gymnasium as gym

agent = YourAgent()
environment = gym.make("YourEnvironment")
# reset does what it says on the tin, and will return the first state to use
# (in Gymnasium it also hands back an "info" dictionary alongside the state).
state, info = environment.reset()

while not_finished_training:
    # The first step is to get the action from our agent based on the current state.
    action = agent.get_action(state)
    # The next step is to execute that action in our environment, at which point the next state
    # and the reward for taking that action will be given back to us (along with flags telling us
    # whether the episode has ended, and another "info" dictionary).
    next_state, reward, terminated, truncated, info = environment.step(action)
    # The next step is to adjust our agent's policy (what actions to choose given a state) based on a host of different stuff.
    # What information is needed, and in what format, is decided by the algorithm we choose; all of them will use the reward and some kind of previous state,
    # and almost all of them will use some variation on a list of previous states, rewards, actions, etc.
    agent.train(<stuff>)
    # Then we repeat this until we feel like stopping!
    state = next_state
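
To make that loop concrete, here is a minimal runnable version of the same pattern. This is just a sketch rather than anything from the dissertation: the “agent” picks random actions as a stand-in for agent.get_action(state), and I’m using CartPole-v1 (a tiny environment that ships with Gymnasium) instead of Super Mario Bros, which needs extra packages installed. Everything else mirrors the pseudocode above.

import gymnasium as gym

environment = gym.make("CartPole-v1")
# reset gives us the first state (plus an info dictionary we ignore here).
state, info = environment.reset(seed=0)

for _ in range(1_000):
    # A real agent would choose an action based on the state;
    # sampling randomly stands in for agent.get_action(state).
    action = environment.action_space.sample()
    # Execute the action and get back the next state, the reward,
    # and flags saying whether the episode has finished.
    next_state, reward, terminated, truncated, info = environment.step(action)
    # A real agent would do its training/update step here, using the reward.
    if terminated or truncated:
        # The episode ended (the pole fell over or we hit the time limit), so start a new one.
        state, info = environment.reset()
    else:
        state = next_state

environment.close()

Nothing is actually learning here, but this is exactly the shape of the loop that training the Mario agent follows: get an action, step the environment, collect the reward, repeat.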
    

Why should we care about XRL?