
Meditations on Iteration

In (let us say, classical) reinforcement learning, there is an idea called policy iteration.

It is an algorithm (or perhaps more accurately an algorithm template) with two phases, called policy evaluation and policy improvement. Policy evaluation means judging how good each state (or state/action pair) is for you under your current policy - basically, how well are you doing in various circumstances? Policy improvement then modifies your policy to act greedily with respect to this valuation - how can you make better decisions for yourself right now, assuming you'll act about the same tomorrow as you did yesterday?

Under nice circumstances, this eventually converges to the best possible policy. By making local, greedy improvements, you can go from a terrible agent to a very "agenty" agent. For your circumstances anyway. Assuming they were really simple and easy (say a finite MDP with some mixing conditions).
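For concreteness, here is a minimal sketch of the two phases on a tiny, made-up MDP. Everything in it is an illustrative assumption rather than a canonical implementation: the two-state transition table `P`, the discount factor `GAMMA`, and the helper names `q_value`, `evaluate_policy`, `improve_policy` and `policy_iteration` are all hypothetical, and the evaluation step uses simple iterative sweeps instead of solving the linear system exactly.

```python
# Hypothetical toy MDP: P[state][action] -> list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Policy evaluation: sweep the states until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Policy improvement: act greedily with respect to the current values."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: min(P[s]) for s in P}  # arbitrary (deliberately bad) start
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P, GAMMA)
print(policy, V)
```

In this toy example the starting policy earns nothing, and a single greedy improvement step already switches both states to action 1, which the next round of evaluation confirms as stable. Real problems need far more rounds, and the convergence guarantee depends on the "nice circumstances" mentioned above.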

I would like to do at least as well as this simple idea in my real life. I would like to notice when the things I am doing are consistently not working, and then adapt. I'd also like to explore in recoverable situations (and satisfy those mixing conditions). At least when things happen to work out to a finite MDP, I would like to eventually solve my problems.

Unfortunately this is tricky. I do not know how to measure my performance, and whenever I think about that I tend to think about Goodhart's law and become paralyzed. Usually my thoughts about life hacking/optimization cycle at that point and go no further. So let's say I am stuck on the policy evaluation part. I guess the policy improvement part would not be THAT hard if I could pull off policy evaluation, and then pay attention and build good habits on a daily basis.

What particularly annoys me is that I aim SO MUCH HIGHER than policy improvement. I study sequential decision theory, machine learning, and artificial intelligence not just because they're powerful and cool, but because I like to imagine that the algorithms and principles I learn can be applied to my own cognitive practices in a positive feedback loop. But it feels like... I don't do this very well. I do sometimes speculate about how theorem X could inform my decision making. I just don't really put it into practice.

So, instead of staying paralyzed, I've decided to start with the simplest, basically first idea that occurs to me (a policy improvement? Perhaps in a way). Every day, at least once a day, I will ask myself three questions:

1: What have I learned today, particularly about intelligence?

2: How can this make me a better learner and decision maker?

3: How have I been doing, on a concrete day-to-day level, at implementing the strategies I have imagined?

I do not think this is perfect, but maybe it is a good start.