AI Megathread
-
@Rinel said in AI Megathread:
I very strongly support the implementation of strict regulations on how the models are trained, with requirements that all training data be listed and freely discoverable by the public.
Proprietary models like Midjourney and OpenAI’s don’t release any of this stuff, alas. But if you stick to OSS models, like Stable Diffusion, you can freely search their training data here (they also use other public sources). There are tens of thousands of models for various purposes, and active research, on Hugging Face alone; they tend to be based on publicly available training data sets.
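(If you want to poke at that yourself, here’s roughly what it looks like in code: a minimal sketch assuming the Python huggingface_hub client, with “laion” as an example search term.)
```python
# A minimal sketch: searching the publicly listed datasets on the
# Hugging Face Hub. Assumes `pip install huggingface_hub`; the search
# term "laion" is only an example.
from huggingface_hub import list_datasets

for ds in list_datasets(search="laion", limit=5):
    print(ds.id)  # dataset repos you can then browse in full on the site
```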
-
-
@Griatch said in AI Megathread:
You talk as if it’s a clear-cut thing that these models are based on “theft”. Legally speaking, I don’t think this is really established yet - it’s a new type of technology and copyright law has not caught up.
I do, yes. Obviously the courts have not weighed in yet on the specific lawsuits at play, but that doesn’t prevent people from drawing their conclusions based on available evidence and knowledge of the laws.
I have seen with my own eyes these tools generate images and text that are very clearly copyright-infringing.
Arguing that they are somehow absolved of all responsibility because of how the users use the tools is like arguing that a pirate website or Napster bears no responsibility for being a repository of pirated material because it’s the users who are uploading and downloading the actual files. That has historically not worked out too well for the app makers. It’s the reason YouTube errs on the side of copyright claims - they don’t want to get drawn into that battle.
I also don’t personally find any weight to the argument that AI is ‘just learning like humans learn’. That’s like arguing that NFL teams should be allowed to use Mark Rober’s kicking robot in the Super Bowl because “it kicks just like a human does”.
-
Just came across this latest insanity and felt obliged to share.
As of today, there are about half a dozen books being sold on Amazon, with my name on them, that I did not write or publish. Some huckster generated them using AI. This promises to be a serious problem for the book publishing world.
A brief update: After going back a few times with Amazon on this issue, I was notified the books would not be removed based on the information I provided. Since I do not own copyright in these AI works and since my name is not trademarked, I’m not sure what can be done.
It did eventually get sorted out, but only because this particular author had lawyers to advocate for them with Amazon.
-
@Faraday said in AI Megathread:
I also don’t personally find any weight to the argument that AI is ‘just learning like humans learn’.
It’s demonstrably false, as I put forward in the mermaid argument earlier. You can show a human a mermaid and tell them to make one who is half octopus instead of half fish. You can’t do that with LLMs. You have to phrase the input differently when trying to generate novel ideas, because LLMs /cannot learn/. They aren’t sapient. They aren’t even sentient. The fact that you can use certain tools to end up with an approximate result with an LLM doesn’t mean the AI is learning.
-
I disagree that the law hasn’t caught up. The law of transformative versus derivative work is directly applicable to the theory behind the training data and its use. What hasn’t caught up is legislation. It’s already illegal under existing common-law standards; it’s just difficult to enforce because it’s decided case by case, and a lot of the actual practice stupidly comes down to who can afford a fancy IP lawyer and who will believe a shifty agreement is lawful just because it was signed.
I got into an argument about this just the other day on wyrdhold but the innocent bystanders were screaming and crying about the crossfire so I had to stop.
The element of human creativity to create a new thing is already the basis of the legal distinction between transformative (new art) and derivative (copied art) work.
-
@sao said in AI Megathread:
The element of human creativity to create a new thing is already the basis of the legal distinction between transformative (new art) and derivative (copied art) work.
Very true. It also staggers me just how many folks cry “but it’s transformative!” like that’s a defense. Transformative art is by default copyright infringement. Fair use is an exception that requires specific criteria. Transformation alone is not enough.
That’s why people still need permission to make a movie from a book, or a video game from a movie, or to record a cover song, even though all of these things are “transformative”. (YT’s rules for covers using ContentID make things murky, but they still give the rights holder the control to block it, because it’s copyright infringement.)
In other AI news - grocery store app generates deadly “recipes”.
https://www.theguardian.com/world/2023/aug/10/pak-n-save-savey-meal-bot-ai-app-malfunction-recipes
Other instances have involved everything from the dangerous (undercooked meat) to the nonsensical.
Hopefully people will eventually learn that LLMs cannot be trusted for accurate information.
-
@Faraday said in AI Megathread:
Fair use is an exception that requires specific criteria. Transformation alone is not enough.
And determining what is fair use is an absolute fucking mess. I’ve had tons of people get mad at me when I say that fanfic and fanart are generally not fair use, because they’ve been told that if you aren’t selling it then it’s fine. It’s not fine just because you aren’t selling it!
Don’t get me wrong, I support fanart and fanfic and even write fanfic, but I’m well aware that I’m operating in a grey area of the law. I just don’t care about the law when it comes to that sort of thing, because the law is overly restrictive.
As a total aside to this largely tangential post, one of the funnier things to emerge out of this common misconception is the extreme taboo people have on selling fanfic, while fanartists routinely sell their work.
-
I made an account just to come rant on this topic and then posted in the wrong thread so now I’m here.
I think everyone broadly agrees that plagiarism is morally wrong. Plagiarism has two aspects: 1) the theft of someone else’s work and 2) the misattribution of that work to someone who did not produce it. Both aspects are wrong individually. Why is it that with AI, people are willing to hedge around the second one just because the first one has been rendered fuzzy and unclear?
Transparency is required. If I copy-pasted the world of Popular Franchise X and did a Find-Replace for recognizable words and changed those to something else, and claimed that I had created an Original Theme, everyone would get that this was Wrong. If I did the same thing but said “yes, this is shamelessly ripped from Franchise X”, there might be opinions on whether it’s lazy and not worth engaging with, but transparency would have rendered this down from Clearly Unethical all the way to Sort of Low Effort, Isn’t It?
There is no money to be made in Mushing; we are all doing this for the pure pleasure of reading other people’s writing and having our writing be read. What we receive is entertainment and validation, and the balance of those two vs. how annoying we are OOC makes up our entire reputation in this community. I don’t buy that you can explain away the unethical nature of undue validation being rendered with “but you were entertained and isn’t that enough?” No. You’ve robbed me of a whole half of this experience. At least be honest about it and let me decide if half is enough.
-
@Trashcan said in AI Megathread:
Why is it that with AI, people are willing to hedge around the second one just because the first one has been rendered fuzzy and unclear?
The argument given by many AI defenders is that generative AI is not plagiarism because “the AI is just learning the way humans do”.
For example, if I read a whole lot of articles about D-Day, developed a coherent understanding of the lead-up, events, and effects of D-Day, and then wrote a completely original article about D-Day, all while citing my sources and taking care to quote directly when using other people’s words–that’s fine, right? That’s not plagiarism.
The problem is that people think that’s how generative AI works. It isn’t. It doesn’t have an understanding of D-Day because it doesn’t have any actual intelligence. It doesn’t really know what D-Day is. It can’t distinguish fact from fiction. It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).
At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details. And, critically, it doesn’t and can’t cite its sources because it literally doesn’t know where it’s getting its stuff from. All the data just went into a giant blender of words and concepts.
It may not be the exact same process as copy/paste human plagiarism, but the net output is the same.
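To make that “web of word connections” concrete, here’s a deliberately tiny sketch in Python: a bigram model that picks each next word purely from what followed it in the text it was fed. Real LLMs are vastly bigger, but it’s the same family of trick (statistical association, no understanding); the toy corpus is my own invention.
```python
# A toy "web of word connections": a bigram model that continues a
# sentence by sampling, for each word, one of the words observed to
# follow it in the training text. Keeping duplicates in the lists
# gives frequency weighting for free. No understanding, just counts.
import random
from collections import defaultdict

corpus = ("on d-day paratroopers landed at night . "
          "on d-day amphibious landings hit the beaches .").split()

follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

word, out = "d-day", ["d-day"]
for _ in range(6):
    word = random.choice(follows[word])  # sample an observed follower
    out.append(word)
print(" ".join(out))
```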
-
@Faraday said in AI Megathread:
then it fills out the details
Often wildly incorrectly, too. Which isn’t the point you were making, but it does help reinforce the idea that the AI (which is still a stupid name for the thing) doesn’t know anything.
-
AI is just statistics. There’s nothing intelligent about it, really. AI just looks at the statistics surrounding patterns and does some math to model those patterns.
These tools are not new, and they are not as poorly understood as a lot of fans seem to think. I have a textbook printed in 2006 that has exercises for students to write neural networks, among other things.
Many aspects of biological memories [as compared to computer memories] are not understood.
In 1943 McCulloch and Pitts recognized that a network of simple neurons was capable of universal computation. That means that such a network could, in principle, perform any calculation that could be carried out with the most general computer imaginable. [More precisely, such a network can calculate any computable function in the sense of a general purpose Turing machine.] This attracted a good deal of interest from researchers interested in modeling the brain.
Giordano, Nicholas J., and Hisao Nakanishi, 2006, Computational Physics, Second Edition, Pearson Education, Inc., Upper Saddle River, NJ.
The first edition came out in 1997, and if it’s in a textbook, it’s not cutting edge. None of this is cutting edge. It’s really just that now we have the processing power to do the calculations required and the memory in which to store it.
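For a sense of scale, the kind of exercise such a textbook assigns looks like this sketch of mine (not lifted from the book): a single artificial neuron learning logical AND. It’s weighted sums, a threshold, and a nudge when it’s wrong; nothing mysterious.
```python
# A single artificial neuron (perceptron) learning logical AND.
# Everything here is arithmetic: a weighted sum, a threshold, and a
# small correction to the weights whenever the output is wrong.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w1 = w2 = bias = 0.0
rate = 0.1

for _ in range(20):                      # a few passes over the data
    for (x1, x2), target in data:
        out = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
        err = target - out               # -1, 0, or +1
        w1 += rate * err * x1            # nudge toward the right answer
        w2 += rate * err * x2
        bias += rate * err

for (x1, x2), _ in data:
    print((x1, x2), "->", 1 if w1 * x1 + w2 * x2 + bias > 0 else 0)
```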
That said, I found it impossible to land a job in data analytics despite having master’s degrees in math and physics, because (in part) people seem to love the idea of math and statistics being far more mysterious than they actually are.
-
@Tributary said in AI Megathread:
people seem to love the idea of math and statistics being far more mysterious than they actually are
They are, to the people with MBAs who inevitably end up running things for some reason. (I’m terrible at math and stats, but I understand some of the principles enough to know it’s not entirely magic.)
-
So funny tangent about plagiarism…
Not only does ChatGPT plagiarize other authors’ work, it even plagiarizes itself. I asked it “how do I build a skill system in AresMUSH” and then asked it the same for Evennia.
For each I got a fairly bland summary of tips that apply to all skill systems everywhere (because literally that’s how it built the info - from the blender of concepts associated with everything it’s ever scanned about “building skill systems”)… but notably it was the SAME SUMMARY.
(For Evennia)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay, completing quests, or other in-game actions. You’ll need to implement a mechanism for characters to spend those points to increase their skill ratings.
(For AresMUSH)
Skill Improvement:
Decide how characters can improve their skills over time. This could involve gaining experience points through roleplay or completing quests. Create mechanisms for characters to spend these points to increase their skill ratings.
That’s just a snippet. The rest of its advice was pretty identical too.
This ties in with something that I think most folks don’t realize about AI. It’s not ACTUALLY generating something original. Two people on different computers using the same prompt with the same seed value(*) will get the EXACT SAME response - word-for-word, pixel-for-pixel. This is one reason why AI-generated works can’t themselves be copyrighted.
(*) - The seed value is normally behind the scenes and randomized to make the responses appear more random/original, but under the hood it’s there and can be controlled. Like how you can use a seed value in Minecraft to build the same world as someone else.
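If you want to see the seed with your own eyes, here’s a minimal sketch assuming the Hugging Face diffusers library; the checkpoint name is just one public example, and “same pixels” assumes the same software stack on both machines.
```python
# A minimal sketch of the seed point: same model + same prompt + same
# seed = the same image, run after run. Assumes `pip install diffusers
# torch`; the checkpoint is one public example, and exact pixel-for-
# pixel matches assume the same software stack on both machines.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

gen = torch.Generator("cpu").manual_seed(42)    # the normally hidden knob
image_a = pipe("a mermaid, oil painting", generator=gen).images[0]

gen = torch.Generator("cpu").manual_seed(42)    # reset to the same seed
image_b = pipe("a mermaid, oil painting", generator=gen).images[0]

# image_a and image_b come out identical; change the seed and they don't.
```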
-
@Faraday said in AI Megathread:
It’s just as likely to invent a quote from a non-existent historian as it is to quote one (or more likely, just use their words without the quotes).
At its core, it’s just building a web of word connections to know that “what happened on D-Day” is commonly associated with things like “paratroopers” and “amphibious landings” and then it fills out the details.
Which is why what it does is not really plagiarism. Now I’m not going to get into all the metaphysics of how humans ‘know things’ and how capable we are of creativity, and I’m not going to argue about whether the output of ChatGPT is any good.
I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster). If ChatGPT were to take significant passages from someone’s work and pass them off as its own, then sure, that would be plagiarism, but that’s not what it is really doing.
I’m also not going to say that it is ok for ChatGPT to be trained on works without payment to the creators of those works. I’m not sure that falls under the terms of ‘fair-use’. (I’m not sure it doesn’t, either. I need more time to fully consider the situation, but considering that OpenAI plans to make money from it, I’m leaning towards ‘not’).
I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.
-
@Sage said in AI Megathread:
I’m just going to point out that plagiarism is “to steal and pass off (the ideas or words of another) as one’s own” (according to Merriam-Webster).
If you do not disclose to other players that you are using AI in your “workflow”, whether or not you’re plagiarizing the people whose content was used to build the LLM, you ARE plagiarizing the LLM because you are “passing off the words of another as one’s own”.
Transparency is required.
Edited to clarify: where your “workflow” includes copy-pasting from the output of an LLM.
-
@Sage said in AI Megathread:
I’m not sure that falls under the terms of ‘fair-use’.
Unfortunately, fair use remains one of those issues that will only be truly decided in the courts.
@Trashcan said in AI Megathread:
If you do not disclose to other players that you are using AI in your “workflow”
I think that depends entirely on what you use it for. If you use it to sketch out a very rough (probably generic) idea, but then put in the work to turn the idea into something actually workable and suitable? That’d be the same, to my mind, as using a name generator.
But if you used it to write an entire character description, or an entire lore file? That’s different.
-
@Trashcan Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.
You are referring to a use case, which is not necessarily a good metric of whether a tool has value. It’s like arguing that a hammer is a terrible tool because it can be used to hit someone in the head.
-
@Sage
People are out here hitting people in the head with the hammer, so at the moment I am trying to establish common ground that we can all agree on, like “hitting people in the head with the hammer is bad”.
I recognize the debate on whether the hammer itself is bad or not is more nuanced. I think we can all agree slugging people in the head with the hammer is probably wrong, regardless of whether the hammer is made of fair trade rubber or blood diamonds.
-
@Sage said in AI Megathread:
Yes, but in this case what I am referring to is whether ChatGPT is, in and of itself, plagiarism.
If I write something in a paper and don’t cite where I found that information, that’s treated as plagiarism - even if the source is my own previous writing. I’m taking someone’s idea without giving them credit for it.
Plagiarising is defined thusly: “to steal and pass off (the ideas or words of another) as one’s own : use (another’s production) without crediting the source” (Merriam-Webster, 2023).
Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.
References
Merriam-Webster. (2023, August 9). Definition of plagiarizing. Merriam-Webster.com. https://www.merriam-webster.com/dictionary/plagiarizing
Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313. https://doi.org/10.1126/science.adg7879
Park, Y. J., Kaplan, D. M., Ren, Z., Hsu, C.-W., Li, C., Xu, H., Li, S., & Li, J. (2023). Can ChatGPT be used to generate scientific hypotheses? arXiv. https://doi.org/10.48550/arxiv.2304.12208
-
@Sage said in AI Megathread:
I’m just stating that I think the use of the word ‘plagiarism’ is probably not correct in this case.
In the case where I said that ChatGPT is “plagiarizing itself”, that was meant to be tongue-in-cheek. You can’t, by definition, plagiarize yourself.
But in the broader sense of “is what ChatGPT does plagiarism”, I disagree for the same reasons Pavel cited here:
@Pavel said in AI Megathread:
Therefore, given that ChatGPT can’t create its own ideas (Thorp, 2023) or synthesise information without many errors (Park et al., 2023), I would argue that it does plagiarise, by definition.
We can quibble about the exact lines between plagiarism, copyright infringement, trademark infringement, etc. but it’s just semantics. Fundamentally it’s all about profiting off the work of others without proper attribution, permission, and compensation. Even if a million courts said it was legal (which I highly doubt but we’ll see), you would still not convince me that it wasn’t wrong.