• P03 Locke@lemmy.dbzer0.com
    1 year ago

    AI is trained off actual lyrics, which is why companies who create these models are at risk (they don’t own the data they’re feeding into the model.)

    Nobody is “at risk” of anything here. You don’t have to own data to use data, just like you’re not liable for the content of an Internet page because it was downloaded to your browser’s cache.

    Everybody who agrees with these lawsuits has a severe misunderstanding of how LLMs and other AI models work. They are large matrices of weights and numbers, not copies of the data they consume. The entire Stable Diffusion model is a 4GB file, trained on billions of images. It’s impossible to “copy” petabytes of images and somehow end up with a few gigabytes of numbers. The transformation is a lossy process, and its result does not fit the definition of a copy under copyright.
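
To put rough numbers on the size argument, here is a back-of-the-envelope sketch. The 4GB figure comes from the comment above; the image count of 2 billion is only an assumption standing in for "billions", and the function name is made up for illustration.

```python
def bytes_per_training_image(model_bytes: float, num_images: float) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / num_images


# "4GB" from the comment, read as decimal gigabytes.
MODEL_BYTES = 4 * 10**9
# Assumption: "billions of images" taken as 2 billion for illustration only.
ASSUMED_IMAGES = 2 * 10**9

budget = bytes_per_training_image(MODEL_BYTES, ASSUMED_IMAGES)
print(f"{budget:.1f} bytes of model per training image")
```

Under those assumptions the model has about 2 bytes of capacity per training image on average, far less than even a heavily compressed thumbnail needs. This is an average only; it says nothing about how the weights treat any individual image.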