Training a helpful and harmless assistant with reinforcement learning from human feedback
Out-of-distribution detection and selective generation for conditional language models
Machine learning algorithms typically assume independent and identically distributed
samples in training and at test time (IID). Much work has shown that high-performing ML …
samples in training and at test time (IID). Much work has shown that high-performing ML …