People are having the same problems with general conversational ai products like chatgpt, that one might have when discussing a topic with a random human.
This illustrates how RLHF is about preferences, not necessarily about general alignment or universal quality.
In contrast, specialized task and instruction tuned models are more apt to stay on the rails like a specialized contractor.