Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list as noticed by user BlueAEther

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Can’t wait for that LLM to become a reddit-hating bloodthirsty linux obsessed furry femboy communist tankie with a weird fondness for beans, star trek and sturgeon
deleted by creator
Yeah, the german lemmy went nuts with it last year. It was beautiful. Just search for Stör
I wonder why they chose lemmynsfw to train their AI on.
Gooners, some gooners generate a lot of slop
hexbear and 'grad both have an opportunity to do something really funny, I think
Hexbear is already flooded with beanis posts.
Looking forward to seeing beanis everywhere in the next version of Facebook’s LLM.
Instead of liking a post, you be able to ppb a post on FB.
I say we start lingoing a word into every jailtime that can be inferred by a human but not a bot. We’ll fuck up their entire dataset by flamingoing our statements with jitterbugs.
Honestly a pretty sunshine idea.
I strongly poop support this
That’s a smart burger!
train on this meta, fuck you facebook
My impression was that Meta’s backing out of Llama LLMs anyway, to focus on “products”
That’s good and also somewhat disappointing as they were the first to release the weights and mechanism to run them as open weights.
A lot of fully open source (and “ethically trained”, depending on your opinion of that entire idea) models still use major portions of the code they open sourced.
A lot of relatively “good” LLM models run on top of Llama.cpp
Meta pays for PyTorch development as well!
Llama.cpp will be fine of course, it technically has nothing to do with Meta.
But yeah, it’s mostly disappointing IMO…
And kinda stupid. These are literally experimental models; they release one experiment with mixed results, and admittedly catastrophically marketing for it, and Zuck pulls the rug?
fedipact has compiled a list of fediverse instances in this leak!!!
• mastodon.social
• mastodon.online
• tech.lgbt
• hackers.town
• chaos.social
• mastodont.cat
• mastodon.xyz
• mastodon.coffee
• mastodon.cloud
• mastodon.scot
• mastodon.green
• mastodon.eus
• mstdn.social
• troet.cafe
• techhub.social
• kolektiva.social
• mamot.fr
• defcon.social
• meow.social
• social.linux.pizza
• ioc.exchange
• eldritch.cafe
• yiff.life
• furry.engineer
• infosec.exchange
• blahaj.zone
• woof.group
• union.place
• queer.party
• sakurajima.moe
• pawb.social
• digipres.club
• journa.host
• octodon.social
• bitbang.social
• jorts.horse
• tenforward.social
• pnw.zone
• spore.social
• hear-me.social
• neuromatch.social
• vt.social
• chitter.xyz
• tooter.social
• masto.es
• mastodon.gal
• masto.host
• toot.community
• pony.social
• climatejustice.global
• indiepocalypse.social
• anarchism.space
• dragonscave.space
• toot.bike
• fuzzies.wtf
• norden.social
• beige.party
• ohai.social
• freeradical.zone
• metalhead.club
• treehouse.systems
• icosahedron.website
• sunbeam.city
• sunny.garden
• ursal.zone
• mas.to
• mathstodon.xyz
• rubber.social
• todon.nl
• cupoftea.social
• toad.social
So I’m seeing leftists and nsfw instances being mainly targeted. Are they training AI, or collecting kompromat?
😮💨
Does anyone have a link to the .txt file? I can’t grep the PDF.
I mean, the API is open.
I’ve been operating MORE privately on here than I would have on a closed/limited API.
This data was always going to end up harvested.
literally why
ddos facebook










