Reddit user content being sold to AI company in $60M/year deal
It’s being reported that a deal has been struck to allow an unnamed large AI company to use Reddit user…
They sell all your edits as well. Still, editing does make the data harder to use, which inadvertently raises the question of how much the data they sell is really worth.
Yeah, that’s the idea. Originally, I went the “random characters then delete” route, but I realized that if I used randomized book excerpts from the public domain, the AI, or even a human, would have a very hard time figuring out what was real and what was trash. Ultimately, even if I can’t modify all my comments, I can modify enough of them that it becomes easier for the buyer to just filter my username out to keep the results clean.
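For anyone curious what the excerpt approach might look like, here’s a rough Python sketch. It assumes you already have a public-domain book downloaded as a plain string, and the function names (`random_excerpt`, `replacement_texts`) and the comment IDs are made up for illustration. It doesn’t talk to Reddit at all; it just picks a random run of consecutive words for each comment you plan to overwrite.

```python
import random


def random_excerpt(text: str, n_words: int, rng: random.Random) -> str:
    """Return n_words consecutive words from text, starting at a random position."""
    words = text.split()
    if not words:
        return ""
    n = min(n_words, len(words))
    # Pick a start index so the excerpt never runs past the end of the text.
    start = rng.randrange(len(words) - n + 1)
    return " ".join(words[start:start + n])


def replacement_texts(comment_ids, source_text: str, n_words: int = 40, seed=None) -> dict:
    """Map each (hypothetical) comment ID to its own randomly chosen excerpt."""
    rng = random.Random(seed)
    return {cid: random_excerpt(source_text, n_words, rng) for cid in comment_ids}
```

Because each replacement is real, grammatical prose, it doesn’t stand out the way random characters would, which is the whole point.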
I do wonder how much backup data a site like Reddit keeps. I suspect their backups are poor, as the main focus is staying live and moving forward.
I’d imagine they have the ability to revert a few days, maybe weeks, but not much more than that? Would they see the value in keeping copies of every edit and every deleted post? Would someone building the website even bother to build that functionality?
Also, so much of Reddit’s content is built around web links, which give the discussions context and meaning. I bet there are an awful lot of dead links on Reddit, and their move to host their own pictures and videos probably came too late. Over time, big hosting sites have disappeared, deleted content, or locked their content down against AI scraping.
The more I think about it, they were lucky to get $60m/year.
Maybe not for reversion, but I could see them keeping the edits, since it doesn’t cost them much to do so, and it could be useful for spam identification or legal purposes. For example, an account might post spam and then edit the comment to hide it or skirt around moderation, or vice versa.
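To illustrate why keeping edits would be cheap and useful, here’s a minimal Python sketch of an append-only revision history. This is purely a guess at the general idea, not how Reddit actually stores anything; `Comment`, `edit`, and `spam_revisions` are hypothetical names. Because edits are appended instead of overwritten, a moderator could still find spam in an earlier revision after it was edited away, or spot spam that was edited into a previously clean comment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Comment:
    comment_id: str
    revisions: list[str] = field(default_factory=list)
    timestamps: list[datetime] = field(default_factory=list)

    def edit(self, new_body: str) -> None:
        # Append rather than overwrite, so every earlier version stays retrievable.
        self.revisions.append(new_body)
        self.timestamps.append(datetime.now(timezone.utc))

    @property
    def current(self) -> str:
        # The visible body is simply the latest revision.
        return self.revisions[-1] if self.revisions else ""


def spam_revisions(comment: Comment, spam_terms: set[str]) -> list[int]:
    """Return the indices of every revision containing a spam term (case-insensitive)."""
    return [
        i
        for i, body in enumerate(comment.revisions)
        if any(term.lower() in body.lower() for term in spam_terms)
    ]
```

The trade-off is storage: every edit adds a full copy of the comment, which also happens to make the dataset bigger.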
They would also have the benefit of the edits inflating the size of the data that they’re selling, which wouldn’t hurt.