Measure-Theoretic View of Policy Gradients
Introduction

Why a Measure-Theoretic View of Policy Gradients?

Reinforcement learning (RL) has long relied on probability densities and likelihood ratios to compute policy gradients. The standard derivation arrives at the score-function (likelihood-ratio) form: $$ \nabla_\theta J(\pi_\theta) = \mathbb{E} \left[ R \nabla_\theta \log \pi_\theta(a | s) \right] $$ where $J(\pi_\theta)$ is the objective (e.g. expected reward), $\pi_\theta$ is the policy, $R$ is the reward, and $\nabla_\theta \log \pi_\theta(a | s)$ is the gradient of the log policy. This is essentially what we covered previously. ...
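To make the likelihood-ratio form above concrete, here is a minimal sketch of a single-sample estimate of $R \nabla_\theta \log \pi_\theta(a | s)$, assuming a linear-softmax policy over discrete actions in a one-step setting; all names, shapes, and the placeholder reward are illustrative, not from the original.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, state_dim = 3, 4
theta = rng.normal(size=(n_actions, state_dim))  # policy parameters

def policy(theta, s):
    """Action probabilities pi_theta(. | s) for a linear-softmax policy."""
    logits = theta @ s
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def grad_log_pi(theta, s, a):
    """grad_theta log pi_theta(a | s) for the linear-softmax policy."""
    p = policy(theta, s)
    # d log softmax / d logits = one_hot(a) - p; chain rule through logits = theta @ s
    d_logits = -p
    d_logits[a] += 1.0
    return np.outer(d_logits, s)              # same shape as theta

# One Monte Carlo sample of the policy gradient: R * grad_theta log pi_theta(a | s)
s = rng.normal(size=state_dim)                # observed state
a = rng.choice(n_actions, p=policy(theta, s)) # action sampled from the policy
R = 1.0                                       # placeholder reward for the sampled action

grad_estimate = R * grad_log_pi(theta, s, a)
print(grad_estimate.shape)                    # (n_actions, state_dim), matches theta
```

Averaging many such samples approximates the expectation in the formula above; that density-based estimator is exactly the object the measure-theoretic view will re-derive without assuming densities exist.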