“They’re designed to produce statistically probable sentences.”
Exactly. Sometimes what is statistically probable is also correct: ask an LLM “What is the capital of France?” and it will most likely answer “Paris”, because “Paris” has a strong statistical association with “capital of France”.
But this methodology is inherently unreliable for stating facts. An LLM might confidently assert that George Washington cut down a cherry tree, simply because there’s a common association between Washington and that story, even though historians widely regard it as a myth. Elon Musk is associated with Teslas, Teslas are associated with car crashes, so “Elon Musk” plus “fatal Tesla crash” can lead an LLM to erroneously assert that Elon Musk died in a car crash. Sure, it’s a statistically probable sentence, but it’s just not true. The LLM doesn’t know whether something is true, and it doesn’t care.
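To make that concrete, here’s a minimal sketch — not how any real LLM works, just a toy bigram counter over a made-up corpus — showing that “pick the most probable next word” has no notion of truth built into it. Whatever continuation appears most often in the data wins, accurate or not.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus.
# Real LLMs use neural networks trained on enormous corpora, but the
# objective is the same flavor: predict the most probable continuation.
corpus = (
    "the capital of france is paris . "
    "paris is the capital of france . "
    "the capital of france is paris . "
    "the capital of france is lyon . "   # a false sentence in the data
).split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def most_probable_next(word):
    """Return the statistically most likely next word -- true or not."""
    counts = bigram_counts[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

word, p = most_probable_next("is")
print(f"after 'is', the model predicts '{word}' (p = {p:.2f})")
# Prints 'paris', because it is the most frequent continuation. The model
# never checks whether the sentence it produces is actually true.
```

If the made-up corpus repeated the wrong answer more often, the same code would print “lyon” just as confidently. Frequency of association is the only signal it has, which is exactly the problem with treating statistically probable sentences as facts.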