If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s). We have recently clarified our penalties for this. If a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper. The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM (“here is a 200 word summary; would you like me to make any changes?”; “the data in this table is illustrative, fill it in with the real numbers from your experiments”)
Just like GitHub, arXiv is where anyone can upload their scientific outputs. There’s minimal verification to prevent spam, but arXiv was never about gatekeeping content. Until now.
As I mentioned before, reputation is more important than ever in an age where text (and voice) is being produced cheaply, financed at a loss by monopoly-inducing LLM factories.
If the goal of arXiv is to provide an open alternative to the gatekeeping of journals, what is the open alternative to the gatekeeping of arXiv?