/j/ - Janitor & Moderator Discussion

File: nsfw_demo.gif (1.03 MB, 762x678)
Has there been any consideration given to adopting a machine learning model that can identify potentially rule-breaking content and auto-report it? Obviously, given the current state of ML, it would be difficult for a model to correctly assess whether or not something is a rule-breaking post, but there are definitely some rules a bot could easily understand, like the one against posting NWS images.

Some of these models have been around for quite a while, and while I have no firsthand experience with them, they claim to be pretty effective; https://github.com/infinitered/nsfwjs, for example, says it has a ~90% accuracy rate. There are quite a few models trained for this specific purpose, easily found by searching. I've even read articles about several social media companies that use "artificial intelligence" to identify things like child porn or gore and block them before they're even uploaded.

The only potential problem I can think of is that some boards would have too many false positives due to their topics (/fit/ would likely have more false positives than a board like /mu/ or /biz/), but it would probably be possible to tweak the confidence thresholds for those boards. Even if this wouldn't work for all blue boards, it could still be very useful on the ones where there isn't much reason to post images of humans, let alone scantily clad ones.
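To make it concrete, here's a very rough sketch of the shape I'm imagining, not a working tool. It assumes Node with @tensorflow/tfjs-node and nsfwjs installed; submitReport() and the per-board threshold numbers are placeholders I made up, since I have no idea what the real report backend looks like.

```typescript
import * as tf from '@tensorflow/tfjs-node';
import * as nsfwjs from 'nsfwjs';

// Placeholder for whatever the real report plumbing would be.
declare function submitReport(board: string, postNo: number, reason: string): Promise<void>;

// Hypothetical per-board confidence thresholds: stricter where worksafe skin is
// common (/fit/), looser on boards with little reason to post humans at all.
const BOARD_THRESHOLDS: Record<string, number> = { fit: 0.97, mu: 0.9, biz: 0.9 };

const modelPromise = nsfwjs.load(); // load the default model once

async function maybeReport(board: string, postNo: number, imageBytes: Buffer): Promise<void> {
  const model = await modelPromise;
  const image = tf.node.decodeImage(imageBytes, 3) as tf.Tensor3D;
  const predictions = await model.classify(image); // [{ className, probability }, ...]
  image.dispose();

  // Treat the combined "Porn" + "Hentai" probability mass as the NWS score.
  const nsfwScore = predictions
    .filter(p => p.className === 'Porn' || p.className === 'Hentai')
    .reduce((sum, p) => sum + p.probability, 0);

  const threshold = BOARD_THRESHOLDS[board] ?? 0.9;
  if (nsfwScore >= threshold) {
    await submitReport(board, postNo, `auto: probable NWS image (score ${nsfwScore.toFixed(2)})`);
  }
}
```

The bot only files a report; everything after that stays with the human volunteers.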

Please let me know your thoughts in the comments below and don't forget to hit that subscribe button!
>>
This sounds like an absolute recipe for disaster, but it would be fucking hilarious to watch run rampant.
>>
Terrible idea. Every time 4chan has been involved in any project involving machine learning, its users have turned the project into basically the worst elements of /pol/. It's cheaper, easier, and more efficient to just throw hundreds of volunteers at the problem.
>>
File: 1442392176025.png (31 KB, 400x345)
>>7254
interesting idea, but, at least imo, NWS stuff isn't even close to our biggest issue and is actually one of the things most easily handled by our current volunteers
call me when a google deepmind for finding out whether someone's intentionally shitposting exists
>>
>>7257
>>7258
>it's afraid
better watch out janny bois, your job is going to get taken by robot chads
>>
>>7260
I mean I wouldn't mind if my job was taken by a robochad. But let's be honest, anons gonna do what anons gonna do, and Tay is a perfect example of why letting a learning machine moderate 4chan is a bad idea. https://en.wikipedia.org/wiki/Tay_(bot)
>>
File: 1568127732112.jpg (57 KB, 480x451)
In principle, it's a nice idea to have ML-assisted jannying tools. I've often dreamed of it. However, there are a handful of architectural issues I could imagine it running into. Could it scale well and handle unexpected bursts of posts quickly? Would it be deferred to some third-party proprietary web API? That might be a pain to rely on; reCAPTCHA for spam prevention is already headache enough. I won't even address the costs involved since that isn't my business, but it would likely be more expensive than our human labour, which is a weird outcome for automating something with a computer.
So it would need to be thought about very carefully, in very small steps that Team 4chan can really control. Yeah. If it can't be done reliably, forget about it.
>>
Sounds good if it works properly, but it's probably cheaper and more effective to just get more jannies on board.
>>
>>7258
>Every time 4chan has been involved in any project involving machine learning, its users have turned the project into basically the worst elements of /pol/
The users of 4chan would play no role in this at all. It would be operating in the background, just submitting reports based on a dataset provided by 4chan staff, or perhaps on a model that was already trained and released by someone else.
>>
File: 1574722939056.jpg (99 KB, 1024x948)
>>7261
you are not authorized to reply to me, j*nitor. this incident will be noted on your permanent record
>>
File: 1500679080665.webm (1.43 MB, 720x720)
>>7265
>>
>>7265
>>7266
please stop spamming my thread with toxoplasmosis agents
>>
Is teaching it to pick up on shitposting patterns or reposting patterns in text something that would be possible? That's something that would be super useful to me personally.
>>
>>7268
I know a guy who trained an ML algorithm to scrape boorus for images of Ryuko from Kill La Kill, extremely accurately, so I guess it's possible.
>>
>>7265
reply to this you purple eyed meat puppet
>>
no
>>
I'm sure that if it were possible, the big tech companies would be the first to cut out human labor; the stories coming from the Facebook mod factories suggest they just aren't there yet with AI mods.
>>
I would be interested in having an autoreport for certain phrases. For example, an autoreport for the word "tight" or "tite" with an image posted on /tv/ would be useful in identifying CM. However, this would only work until the users learn that it exists, which could be merely months down the line, at which point the large amount of work developing this bot would go to immediate waste.
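Just to illustrate, a rule like that might look something like the sketch below. The Post shape, submitReport(), and everything except the "tight"/"tite" on /tv/ example are invented for this sketch.

```typescript
// Illustrative only: the Post shape and submitReport() are hypothetical.
interface Post { board: string; no: number; comment: string; hasImage: boolean; }

declare function submitReport(board: string, postNo: number, reason: string): Promise<void>;

// "tight"/"tite" with an image on /tv/, per the suggestion above; the rest is made up.
const PHRASE_RULES = [
  { board: 'tv', pattern: /\bti(?:ght|te)\b/i, requiresImage: true, reason: 'phrase associated with CM threads' },
];

async function checkPhraseRules(post: Post): Promise<void> {
  for (const rule of PHRASE_RULES) {
    const imageOk = !rule.requiresImage || post.hasImage;
    if (post.board === rule.board && imageOk && rule.pattern.test(post.comment)) {
      await submitReport(post.board, post.no, `auto: ${rule.reason}`);
    }
  }
}
```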
>>
>>7274
Have you SEEN microsoft's backend specifically for xbox live moderation? I know a guy who used to be a moderator for them. All the mods got fired, it's all automated now.
>>
>>7276
And there are probably hundreds of "accidentally deleted your account, so very sorry" cases per day, which is not something 4chan would ever stand for. A big company might not care about a few hundred customers being inconvenienced, but I'd like to think we have higher standards.
>>
>>7277
And that's my point anyway. We can't trust an automated system to uphold the principles of 4chan's rules whilst also moderating a userbase as... irresponsible... as 4chan's. Could you imagine how long it would take for boards like /pol/ or /vg/ to figure out ways to manipulate any sort of AI into moderating people they simply disagree with?
>>
Maybe it should be clarified, but I understood OP as only considering an automatic *reporting* system. There would still be human action from Moderators and Janitors in enforcing rule violations. The bot would be acting as a user.
So if whatever technical hurdles there are were cleared, what is the actual harm in a bot that reports posts and does indeed have very few false positives?
>>
>>7279
Yeah, this is more of what I was going for. I'm not sure how the reporting backend works, but the way I envisioned it was a separate board made only for auto reports, with jannies able to sign up as beta testers. I suppose mods could figure that part out, but I do think it would be a good idea for these reports to be kept separate from user reports initially.

>>7278
>>7277
As the poster above me mentioned, it would just be reporting posts and not actually taking action. Users wouldn't even be aware of it, and even if they were I don't see how they could possibly manipulate the behavior of the bot.
>>
>>7280
This makes sense. Like, teach it to automatically recognize basedjack, common ban evader names, etc?
>>
>>7280
>>7279
I'm a bit late to the discussion, but while this sounds good on paper, I think it introduces a new fundamental problem: the workload increase.
Part of the reason the Janitor team exists is to act as a filter between user reports and the actual mods. I think anons are already pretty good at reporting blatant content that obviously needs action and isn't hard to see and deal with. If anything, anons are a little *too* good at reporting content: tons of stuff that warrants little to no mod intervention gets put up, and it would be a waste of time to comb through it when there are real problems to deal with. We come in to clear those out and take care of the small things with deletions and warns. When something's bad, we elevate it to mod review and they spend their time working on that.

How a violation is identified, the urgency of dealing with it, which replies to it could also be considered actionable, what rule(s) are being broken, and more are just too chaotic and context-dependent for a more objectively minded bot to learn well enough to make good calls when reporting. There would be false positives no matter what, adding more useless fodder to the already large pile of reports that need little to no intervention or all get wiped with a slap to an OP.

This wouldn't be a problem if we weren't entirely volunteer-based and could dedicate people around the clock to watch over it, but we're not. We help out when we can and try to get on when things get rough. An accurate report bot is only going to help if you have the dedicated manpower to make use of it, which we already sometimes struggle with as it stands, doing things manually.
>>
>>7275
I think this part of it could be useful. If it was on a different system from normal human reports, a kind of "super report", then it could be used or ignored as time allowed, to catch missed items or catch problem posts quicker than normal. The bot would take no action visible to users, just file a report that janitors/mods could act on.
If we were judicious about what alerts were set on it, phrases that are actionable 99% of the time, then it could be useful. It could be like a smarter, more cost-effective filter. It doesn't have to scan every post as it's posted; it could sweep through the board every 15 minutes or so. A human skims through the bot reports when they have time and can hit spammed phrases, or at least open up the thread to see the issue in person.
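Something like this is roughly what I mean by a periodic sweep instead of a per-post hook. fetchRecentPosts() and flagForJanitors() are stand-ins for whatever actually exists, and the phrase list would be curated by mods.

```typescript
// Sketch only: fetchRecentPosts() and flagForJanitors() are hypothetical stand-ins.
interface ScannedPost { board: string; no: number; comment: string; }

declare function fetchRecentPosts(board: string): Promise<ScannedPost[]>;
declare function flagForJanitors(post: ScannedPost, reason: string): Promise<void>;

// Only phrases that are actionable ~99% of the time; deliberately left empty here.
const ALERT_PHRASES: RegExp[] = [];

async function scanBoard(board: string): Promise<void> {
  const posts = await fetchRecentPosts(board);
  for (const post of posts) {
    const hit = ALERT_PHRASES.find(rx => rx.test(post.comment));
    if (hit) {
      await flagForJanitors(post, `auto: matched ${hit.source}`);
    }
  }
}

// Sweep every 15 minutes or so, as suggested, instead of hooking every post.
setInterval(() => { scanBoard('tv').catch(console.error); }, 15 * 60 * 1000);
```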
>>
I can confirm that I thought of this more than 10 years ago, never did it, and would still be interested if it worked. We built some more manual tools instead.

Given the state of tech at the time, I was thinking of using an email spam filter (LSM, SpamAssassin, etc.). We had some stronger image filters in the past, but the false positive rate was a bit high, e.g. if we autobanned an all-black image.
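In today's terms, a guard along these lines would catch that class of false positive: skip any automated action when the image is nearly uniform, since a filter match on it means very little. Just a sketch; the threshold is a guess, and it leans on @tensorflow/tfjs-node purely for decoding and pixel stats.

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Returns true for nearly uniform images (all black, all white, flat colour),
// which an automated filter should probably never act on by itself.
// The standard-deviation threshold is a guess, not a tuned value.
function isNearlyUniform(imageBytes: Buffer, maxStdDev = 3): boolean {
  const img = tf.node.decodeImage(imageBytes, 1); // decode as single-channel grayscale
  const { mean, variance } = tf.moments(img);
  const stdDev = Math.sqrt(variance.dataSync()[0]);
  tf.dispose([img, mean, variance]);
  return stdDev <= maxStdDev;
}
```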
>>
>>7260
>do it for free
>job still gets automated
>>
I feel like if there was to be a layer of automation, it should act at the same level as regular users. All the automated tool would do is flag posts that it thinks violate some sort of detection rule, leaving janitors as a filter before anything gets to the mods. Hell, you could even make it a learning algorithm and have it train against the BRs that janitors submit.
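As a sketch of that feedback idea (every name here is hypothetical), the bot's flags could be stored alongside the janitor's eventual decision, and that labelled set becomes the training data for the next iteration.

```typescript
// Hypothetical data flow only; none of these types or functions exist anywhere.
interface FlaggedPost { board: string; no: number; comment: string; imageHash?: string; }
type JanitorDecision = 'deleted' | 'warned' | 'ban_requested' | 'dismissed';
interface LabelledExample { post: FlaggedPost; botReason: string; decision: JanitorDecision; }

const trainingSet: LabelledExample[] = [];

// Called whenever a janitor acts on (or dismisses) a bot-flagged post.
function recordOutcome(post: FlaggedPost, botReason: string, decision: JanitorDecision): void {
  trainingSet.push({ post, botReason, decision });
}

// Hand the accumulated labels to whatever retraining job might exist.
declare function retrainModel(examples: LabelledExample[]): Promise<void>;

async function retrainFromFeedback(): Promise<void> {
  if (trainingSet.length > 0) await retrainModel(trainingSet);
}
```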
>>
/g/ janny here. I'm going to say that anything that automates enforcement is a horrendous idea *especially* if it's based on ML where nobody understands how the algorithm works. See some real world examples and what people think about such systems:
>traffic enforcement cameras
>youtube's content id system
>reddit's automod
All horrendous, and you can hardly reach a human when there is a mistake, and boy, there were some mistakes. The rules were made by humans and should be enforced by humans *only*. Giving robots the b& hammer is a huge mistake. Not because robots are smart and are gonna go skynet, but precisely because robots are fucking retarded and will never understand why those rules are in place. No matter how convincing your error rates are, I would rather be wronged by a human and apologized to than wronged by a bot and get nothing in return but a confirmation of "yeah, it looks like the bot made a mistake."

No, I will NOT succumb to technocracy and let the botnet enforce the rules on me without accountability.
No, I will NOT eat the bugs.
No, I will NOT live in a pod.
No, I will NOT own nothing and be happy where everything is rented and delivered to me by a drone.

TED WAS RIGHT.
STALLMAN WAS RIGHT.

Btw an auto reporter can be helpful. An auto enforcer? Yeah, fuck no.
>>
>>7747
>>7741
>>7280
>>7279
I will also note that the problem with an auto reporter is that it's an easy slippery slope to an auto enforcer, once you get lulled by good error rates and start asking "well, what are we doing here? we'll just let the bot do half the work." Once you start allowing the bot to enforce even only the clearest rule violations, or only fire it up when most mods are asleep, or when there are 600+ reports, the cat is out of the bag.

It has to be explicitly guaranteed that it's only humans who delete posts and apply warns & bans. Again, not only because it can go the way of Tay, but also because of the way humans perceive power. When a janny* deletes my thread I can take comfort in knowing he's a fat retard who does it for free, has asthma and has zero friends. When a bot deletes my thread, there is no such comfort and you're just left being annoyed at the system. It's important and natural for people to hate and make fun of the people in power as catharsis. If real-world politicians and judges get fully replaced by robots, I'm certain society will crumble from dissatisfaction with such a system.

Btw, I think automation is also unnecessary when there is a tremendous array of improvements that could be made to the interface and coordination. Discord is very inefficient, /j/ is barely working, and there are no regular hours (although I don't know if those would be legal under minimum wage laws) and no coordination of who does what or who pays attention to what. If there was a way to cooperate in real time (ideally like in an MMO, where you can see what threads and posts other jannies have reviewed, what thread they're currently reviewing, etc.), everything could be handled in an extremely short time. 4chan is probably small enough that a mod or a janny can be omniscient of every post and thread on a board with a proper interface. It's definitely not at the scale of YouTube, where automation becomes unavoidable.

* (Ignoring the fact I'm a jan myself, but just try getting into the mind of an average user)
>>
>>7747
>>7748
>inb4 take your meds
>>
File: holy crackers.jpg (16 KB, 250x250)
>>7254
>autoreporting janny bot
I want this.
But yeah, no automoderator, just autoreporter.
>>
It's worth pointing out that the moderation in both major Japanese chans is mostly autonomous. For futaba, users click the del button on a post or thread as a kind of report, and if it collects enough del requests, it first gets hidden from the catalog/index but stays alive, and if it hits a second, higher target of del requests it gets deleted. 5ch meanwhile has no moderation at all, if something is illegal you can email the staff to take it down, or do the more common thing and use the Japanese police's internet tipline to notify them of it (which isn't weird, they've been aware of the site for like 20 years). These differences in moderation have had a massive effect on shaping both sites - people on 5ch complain that futaba is exclusive or "clannish" which makes sense because it's pure majority rule, and they even have a sort of like button on each post which reinforces the majority beliefs. 5ch meanwhile still feels like the wild west era of the internet: there's over a thousand boards, no post cooldown, no captcha, and it gets millions of posts every day. There's not a more user-driven site on the web, and I really admire how it works despite the flaws.
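To spell out futaba's mechanism as I understand it, roughly the following; the thresholds and details are guesses on my part, not their real numbers.

```typescript
// Guessed mechanics of futaba's two-threshold del system; the numbers are invented.
interface Thread { no: number; delRequests: Set<string>; hiddenFromCatalog: boolean; deleted: boolean; }

const HIDE_THRESHOLD = 10;   // hypothetical: enough dels hides the thread from the catalog/index
const DELETE_THRESHOLD = 30; // hypothetical: a higher count removes it entirely

function recordDelRequest(thread: Thread, requesterIp: string): void {
  thread.delRequests.add(requesterIp); // count each IP once, so evasion spam matters less
  const count = thread.delRequests.size;
  if (!thread.hiddenFromCatalog && count >= HIDE_THRESHOLD) {
    thread.hiddenFromCatalog = true; // thread stays alive but drops out of the catalog
  }
  if (!thread.deleted && count >= DELETE_THRESHOLD) {
    thread.deleted = true;
  }
}
```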

Now, could these work on 4chan? Well, futaba's system probably not. The site is restricted to Japanese IPs, which are mostly static. Since most 4chan users have access to dynamic IPs nowadays, ban evading to spam del requests would be really easy, and very hard to prevent. We would not only have to keep the current set of banned IPs, but also limit requests from an IP range to 1 so people can't evade and spam delete requests. As for 5ch's system, this is actually way more realistic. It's just the current rules we have for /b/, but only rule 1 is enforced. An admin would be around to check for illegals now and then, and that's it. Honestly, this might not be bad.

(1/2)
>>
What seriously challenges this is the fact that 5ch has over a thousand boards, so traffic is extremely spread out and all the shitposting gathers in a few places. That would be impossible for us due to bandwidth limitations; any more than ~100 boards is unrealistic. Also, it's a lot easier to derail a thread with images than one with pure text. Since people would be less spread out than on 5ch, bitter arguments would be more common, and add images to that and it could become a mess. But then, that's part of the fun. I'm not even sure why this system works so well on 5ch, it just does. Most threads read like 4chan but there's no rules. It helps that most bots are autobanned/blocked, but still. A part of me feels like this system (and /b/'s) is the best format for a chan, little to no rules and the users run everything. It's why I wouldn't mind a looser set of rules across the site, users over-rely on moderation for quality control anyway and it's best to teach them to ignore what they don't like. The only rule that undeniably needs to be enforced is rule 1. It's an ambitious thought to say the least, but it answers the problem of autonomous moderation, and one of the biggest sites in Japan has functioned this way for years. Not to say this is a real proposal, but it's interesting, isn't it?

(2/2)
>>
>>7762
>>7763
too long didnt read plus youre janny
>>
File: fuck.png (53 KB, 317x179)
>>7766


