Shoot, why’d I just do that?
June 20, 2019
We’ve all had the experience of botching an easy decision. Laboratory subjects, both human and animal, also sometimes make the wrong choice when categorizing stimuli that should be really easy to judge. We recently wrote a paper about this, which is on bioRxiv. We argued that these lapses are not goof-ups but instead reflect the need for subjects to explore an environment to better understand its rules and rewards. We also made a cake about this finding, which was delicious.
We were happy to hear that Jonathan Pillow’s lab picked our paper to discuss in their lab meeting. Pillow’s team have, like us, been enthusiastic about new ways to characterize lapses and in fact have a rather interesting (and complementary) account, which you can read if interested. We really enjoyed reading this thoughtful blog post by Zoe Ashwood about their lab meeting discussion.
They raised a few concerns which we address below:

But we agree about the different strategies: *after* the animal attends to the stimuli & estimates their rates, it could potentially use this information to infer that a trial is neutral, and so discard the irrelevant visual information (a “causal inference” strategy akin to Kording et al.) rather than integrating it (a “forced fusion” strategy). However, this retrospective discarding differs from inattention because it requires knowledge of the rates and doesn’t produce lapses, instead affecting σ: causal inference predicts comparable neutral and auditory sigmas, while forced fusion predicts neutral values of σ that are higher than auditory, due to inappropriately integrated noise. Indeed we see comparable neutral and auditory values of σ
(and values of
too), suggesting causal inference.
Our response: In the exploration model, in addition to β, the lapse rates on either side are determined by the *subjective* values of left and right actions (rL & rR), which must be learnt from experience and hence could be different even when the true rewards are equal, permitting asymmetric lapse rates. When one of the rewards is manipulated, we only allow the corresponding subjective value to change. Since there is an arbitrary scale factor on rR & rL and we only ever manipulate one of the rewards, we can set the un-manipulated reward (say rL) to unity & fit 2 parameters to capture lapses: β & rR in units of rL.
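To make the role of the two lapse parameters concrete, here is a minimal sketch. It assumes a softmax (logistic) choice rule over expected action values, with β acting as an inverse temperature and rL fixed to 1. The function names and the parameter values are purely illustrative and are not taken from the paper.

```python
import math

def p_choose_right(p_right, beta, r_right, r_left=1.0):
    """Softmax probability of a rightward choice (assumed choice rule).

    p_right is the animal's belief that 'right' is the correct category.
    """
    q_right = p_right * r_right          # expected value of going right
    q_left = (1.0 - p_right) * r_left    # expected value of going left
    return 1.0 / (1.0 + math.exp(-beta * (q_right - q_left)))

def lapse_rates(beta, r_right, r_left=1.0):
    """Error rates on the easiest trials, where the belief is 0 or 1."""
    lapse_when_right_correct = 1.0 - p_choose_right(1.0, beta, r_right, r_left)
    lapse_when_left_correct = p_choose_right(0.0, beta, r_right, r_left)
    return lapse_when_left_correct, lapse_when_right_correct

# Illustrative values only: equal subjective values vs. a larger rR.
equal = lapse_rates(beta=3.0, r_right=1.0)
bigger_right = lapse_rates(beta=3.0, r_right=1.5)
print(equal, bigger_right)
```

Under these assumptions, equal subjective values give symmetric lapses, while raising rR lowers only the lapse rate on the right side and leaves the left side untouched.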
Our response: From the rat’s perspective, β can arise naturally as a consequence of Thompson sampling from action value beliefs (Supplementary Fig. 2, also see Gershman 2018), yielding a beta inversely proportional to the square root of the summed variances of the action value beliefs. This should also naturally depend on the history of feedback: if the animal receives unambiguous feedback (like sure-bet trials), then these beliefs should be well separated, yielding a higher beta. Supplementary Fig. 2 simulates this for 3 levels of sensory noise for a particular sequence of stimuli & a Thompson sampling policy.
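The link between Thompson sampling and an effective beta can be illustrated with a small simulation. This is a sketch under the assumption that beliefs about each action's value are Gaussian; in that case, sampling one value per action and picking the larger gives a probit choice rule whose slope is one over the square root of the summed variances. The function names, means and standard deviations below are hypothetical and are not those used in Supplementary Fig. 2.

```python
import math
import random

def thompson_p_right(mu_r, mu_l, sd_r, sd_l, n=100_000, seed=0):
    """Fraction of Thompson-sampled choices that go right (Gaussian beliefs)."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu_r, sd_r) > rng.gauss(mu_l, sd_l) for _ in range(n))
    return wins / n

def probit_p_right(mu_r, mu_l, sd_r, sd_l):
    """Closed form: normal CDF of the mean difference times the effective slope."""
    slope = 1.0 / math.sqrt(sd_r ** 2 + sd_l ** 2)  # plays the role of beta
    z = slope * (mu_r - mu_l)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative beliefs: same means, broad vs. narrow uncertainty.
for sd in (1.0, 0.3):
    sim = thompson_p_right(1.0, 0.5, sd, sd)
    exact = probit_p_right(1.0, 0.5, sd, sd)
    print(f"sd={sd}: simulated={sim:.3f}, closed form={exact:.3f}")
```

Narrower beliefs give a steeper slope and so fewer exploratory choices of the lower-valued action, which is the sense in which unambiguous feedback would yield a higher beta.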