A Very Concrete Model of Learning From Regrets

https://www.lesswrong.com/posts/vw6K8rDWeZA2hLc3G/a-very-concrete-model-of-learning-from-regrets

**Warning 1:** This post is written in the form of Java-like pseudocode.

If you have no programming knowledge, you might have trouble understanding it.

(If you do, that still doesn't guarantee you'll understand, but your chances are better.)

**Warning 2:** I have more than moderate, but less than high, confidence that this model is approximately correct.

This doesn't mean that my brain, or anyone's, works exactly in the way shown in the code, but rather that the flow of data in the brain is approximately as if it were running such an algorithm.

The word "approximately" includes stuff I don’t (yet) know about, but also stuff I didn’t include below to keep it simple.

I wrote this specifically for regrets, but processing of positive memories seems to have similar mechanics (with different constants).

**Warning 3:** There is little chance of finding any existing studies/data etc. that could directly validate or invalidate this model. (However, if you know of any, I'm all ears.)

There might be some stuff that is correlated, so if you know of something, mention it too.

class Brain
{
    …

// This represents a memory about a single event

class Memory
{
    …

    float associatedEmotions; // positive or negative
}

// Your brain keeps track of this

private Map<Memory, Float> memoriesRequireProcessing = new HashMap<>();

// Add new stuff to the queue

private void somethingHappened(Memory newMemory)
{
    float affect = getAffectOfSituation(newMemory);

    newMemory.associatedEmotions = affect * 0.5f;

    if (Math.abs(affect) > 0.1f)
        memoriesRequireProcessing.put(newMemory, Math.abs(affect));
}

// You have no control over how this works,
// but you can influence the confidence parameter
// (mostly indirectly, a little bit directly)

protected void learnedMyLesson(Memory m, float confidence)
{
    float previousValue = memoriesRequireProcessing.get(m);

    float nextValue = previousValue * (1.0f - confidence);

    if (nextValue > 0.1f)
        memoriesRequireProcessing.put(m, nextValue);
    else
        memoriesRequireProcessing.remove(m);
}
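To get a feel for the numbers: each call multiplies the queue value by (1 − confidence), so the confidence parameter directly controls how long a memory stays in the queue. A runnable sketch of just this decay, in isolation (the starting salience of 1.0 and the two comparison confidences are my own illustrative choices, not from the post):

```java
public class LessonDecay {
    // How many passes until a memory with the given initial queue value
    // drops below the 0.1 removal threshold, applying the same confidence
    // on every pass.
    public static int ruminationsUntilRemoved(float value, float confidence) {
        int count = 0;
        while (value > 0.1f) {
            value *= (1.0f - confidence);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Default low confidence from ruminateOnMemory: slow decay
        System.out.println(ruminationsUntilRemoved(1.0f, 0.1f)); // → 22
        // A genuinely learned lesson ends rumination much sooner
        System.out.println(ruminationsUntilRemoved(1.0f, 0.5f)); // → 4
    }
}
```

So under this model, a regret processed only with the default confidence lingers for roughly five times as many rumination cycles as one where the lesson actually lands.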

// You can consciously override this and do something else
//
// @return: judgement of success or failure

protected float ruminateOnMemory(Memory m)
{
    // Depends on the situation, but the default is
    // relatively low confidence

    learnedMyLesson(m, 0.1f);

    // Substitute affect for judgement of success

    return getAffectOfSituation(m);
}

// This prompts some thoughts about a memory

private void rememberAbout(Memory m)
{
    feelEmotion(m.associatedEmotions);

    float judgement = ruminateOnMemory(m);

    m.associatedEmotions =
        0.9f * m.associatedEmotions
        + 0.2f * judgement;
}
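A side observation on this update rule: if rumination keeps returning the same judgement j, the recurrence e ← 0.9·e + 0.2·j has fixed point e = 2j (solve e = 0.9e + 0.2j), so the stored emotion converges geometrically toward twice the judgement. A small sketch under that constant-judgement assumption (the starting values below are illustrative, not from the post):

```java
public class EmotionUpdate {
    // Apply the rememberAbout update rule repeatedly with a fixed judgement.
    public static float converge(float emotion, float judgement, int steps) {
        for (int i = 0; i < steps; i++)
            emotion = 0.9f * emotion + 0.2f * judgement;
        return emotion;
    }

    public static void main(String[] args) {
        // An event with affect -0.5 starts with associatedEmotions = -0.25
        // (affect * 0.5 from somethingHappened); repeated rumination drives
        // it toward 2 * judgement = -1.0.
        System.out.println(converge(-0.25f, -0.5f, 100));
    }
}
```

If this model is right, it suggests unprocessed rumination can leave a memory feeling up to twice as bad as the event itself was judged to be.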

// Your brain does this all the time

private void onIdle()
{
    while (memoriesRequireProcessing.thereIsALotOfShit())
    {
        // Choose some memory paired with a high value

        Memory next = memoriesRequireProcessing.choose();

        rememberAbout(next);
    }

    …
}

…

}
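Putting the pieces together, here is a minimal runnable approximation of the whole loop for a single memory, with the abstract parts (`choose`, `thereIsALotOfShit`, `getAffectOfSituation`) collapsed to the trivial one-memory case. The affect value −0.8 is an arbitrary example, and real rumination would of course not return an identical judgement every pass:

```java
import java.util.HashMap;
import java.util.Map;

public class RegretLoop {
    // Simulate one memory from somethingHappened until the onIdle loop
    // removes it from the processing queue; returns the rumination count.
    public static int simulate(float affect) {
        Map<String, Float> queue = new HashMap<>();
        float emotions = affect * 0.5f;                 // somethingHappened
        if (Math.abs(affect) > 0.1f)
            queue.put("memory", Math.abs(affect));

        int ruminations = 0;
        while (!queue.isEmpty()) {                      // onIdle
            float judgement = affect;                   // ruminateOnMemory
            emotions = 0.9f * emotions + 0.2f * judgement; // rememberAbout
            float next = queue.get("memory") * (1.0f - 0.1f); // learnedMyLesson
            if (next > 0.1f)
                queue.put("memory", next);
            else
                queue.remove("memory");
            ruminations++;
        }
        return ruminations;
    }

    public static void main(String[] args) {
        System.out.println(simulate(-0.8f)); // ruminations for a -0.8 regret
    }
}
```

This is a sketch of my reading of the model, not a claim about the actual implementation in the brain; its main use is making the queue dynamics concrete enough to play with.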

Comment

https://www.lesswrong.com/posts/vw6K8rDWeZA2hLc3G/a-very-concrete-model-of-learning-from-regrets?commentId=yZve6omveNLxq6qAm

Maybe you want to look into cognitive architectures e.g. LIDA.

Comment

https://www.lesswrong.com/posts/vw6K8rDWeZA2hLc3G/a-very-concrete-model-of-learning-from-regrets?commentId=Bi7QXgY6nQGqbr6xc

Thanks, this is interesting.

Comment

https://www.lesswrong.com/posts/vw6K8rDWeZA2hLc3G/a-very-concrete-model-of-learning-from-regrets?commentId=veQjyWQ6A74ydyoio

Sounds like actor-critic with experience replay RL.