Posts, articles, and discussions

Finetune Stable Diffusion Models with DDPO via TRL
By September 29, 2023 (guest post)

Fine-tune Llama 2 with DPO
By August 8, 2023

StackLLaMA: A hands-on guide to train LLaMA with RLHF
By April 5, 2023

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
By March 9, 2023

Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system
By February 7, 2023

Illustrating Reinforcement Learning from Human Feedback (RLHF)
By December 9, 2022

Train your first Decision Transformer
By September 8, 2022

Proximal Policy Optimization (PPO)
By August 5, 2022

Advantage Actor Critic (A2C)
By July 22, 2022

Policy Gradient with PyTorch
By June 30, 2022

Deep Q-Learning with Atari
By June 7, 2022

An Introduction to Q-Learning Part 2
By May 20, 2022

An Introduction to Q-Learning Part 1
By May 18, 2022

An Introduction to Deep Reinforcement Learning
By May 4, 2022