AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability

October 4, 2022

pIXELsHAM.com

https://waxy.org/2022/09/ai-data-laundering-how-academic-and-nonprofit-researchers-shield-tech-companies-from-accountability/

“Simon Willison created a Datasette browser to explore WebVid-10M, one of the two datasets used to train the video generation model, and quickly learned that all 10.7 million video clips were scraped from Shutterstock, watermarks and all.”

“In addition to the Shutterstock clips, Meta also used 10 million video clips from this 100M video dataset from Microsoft Research Asia. It’s not mentioned on their GitHub, but if you dig into the paper, you learn that every clip came from over 3 million YouTube videos.”

“It’s become standard practice for technology companies working with AI to commercially use datasets and models collected and trained by non-commercial research entities like universities or non-profits.”

“Like with the artists, photographers, and other creators found in the 2.3 billion images that trained Stable Diffusion, I can’t help but wonder how the creators of those 3 million YouTube videos feel about Meta using their work to train their new model.”
