Luca comment 2

The resulting models may be narrow, entropic or homogeneous; biases may become progressively amplified; or the outcome may be something altogether harder to anticipate.

What to do? Is it possible to simply tag synthetic outputs so that they can be excluded from future model training, or at least differentiated?

Might it become necessary, conversely, to tag human-produced language as a special case, in the same spirit that cryptographic watermarking has been proposed for proving that genuine photos and videos are not deepfakes? Will it remain possible to cleanly differentiate synthetic from human-generated media at all, given their likely hybridity in the future?