Author: Klepach Albina

Part 1.

Part 2. Review of papers

DPDP for VRP (paper, code)


The Primacy Bias in Deep Reinforcement Learning (paper, code)

Problem: primacy bias in deep RL, i.e. a tendency to overfit to early experiences, which damages the rest of the learning process. Solution: periodically reinitialize the parameters of the last few layers of the agent's neural network while preserving the replay buffer.
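The reset idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the layer sizes, the reset period `RESET_EVERY`, the number of reset layers `k`, and the helpers `init_layer` and `reset_last_layers` are all hypothetical, and the gradient update is left as a comment.

```python
import random

def init_layer(n_in, n_out, rng):
    """Fresh random weights for one dense layer (hypothetical init scheme)."""
    bound = 1.0 / n_in ** 0.5
    return [[rng.uniform(-bound, bound) for _ in range(n_out)] for _ in range(n_in)]

def reset_last_layers(layers, sizes, k, rng):
    """Reinitialize the last k layers in place; earlier layers keep their weights."""
    for i in range(len(layers) - k, len(layers)):
        layers[i] = init_layer(sizes[i], sizes[i + 1], rng)

rng = random.Random(0)
sizes = [4, 8, 8, 2]      # toy layer widths (assumption)
layers = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]
replay_buffer = []        # collected transitions; never cleared by a reset
RESET_EVERY = 100         # hypothetical reset period, in environment steps
reset_steps = []          # bookkeeping for this demo only

for step in range(1, 301):
    # Placeholder transition (state, action, reward, next_state).
    replay_buffer.append(("state", "action", 0.0, "next_state"))
    # ... a gradient update on a batch sampled from replay_buffer would go here ...
    if step % RESET_EVERY == 0:
        # Only the network is partially reset; the buffer is left untouched.
        reset_last_layers(layers, sizes, k=2, rng=rng)
        reset_steps.append(step)
```

The point of the sketch is the asymmetry: the weights of the last layers are thrown away on a schedule, while all collected experience stays available for relearning.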


Toolformer: Language Models Can Teach Themselves to Use Tools (paper)

This is what an API call should look like in the generated text:

image.png
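As a rough text illustration of the inline call format, here is a minimal sketch assuming the bracketed notation used in the Toolformer paper, `[Tool(input) → result]`. The helper `format_api_call` and the regular expression `API_CALL` are hypothetical names introduced for this example; they are not part of any released Toolformer code.

```python
import re

def format_api_call(name, query, result=None):
    """Render an API call inline, with or without its result."""
    if result is None:
        return f"[{name}({query})]"
    return f"[{name}({query}) → {result}]"

# Matches "[Tool(input)]" and "[Tool(input) → result]" (assumed notation).
API_CALL = re.compile(r"\[(\w+)\((.*?)\)(?: → (.*?))?\]")

text = (
    "Out of 1400 participants, 400 (or "
    + format_api_call("Calculator", "400 / 1400", "0.29")
    + " 29%) passed the test."
)

# Each match is a (tool name, input, result) tuple.
calls = API_CALL.findall(text)
```

The call sits inside ordinary text, so the sentence remains readable once the bracketed span is removed.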