A paradox? We want the LLM to have enough data to represent many things, but at the same time we don't want it to have a normalising effect...