Wikipedia:Village pump (idea lab)

The idea lab section of the village pump is a place where new ideas or suggestions on general Wikipedia issues can be incubated, for later submission for consensus discussion at Village pump (proposals). Try to be creative and positive when commenting on ideas.
Before creating a new section, note:

Discussions of technical issues belong at Village pump (technical).
Discussions of policy belong at Village pump (policy).
If you're ready to make a concrete proposal and determine whether it has consensus, go to the Village pump (proposals). Proposals worked out here can be brought there.

Before commenting, note:

This page is not for consensus polling. Stalwart "Oppose" and "Support" comments generally have no place here. Instead, discuss ideas and suggest variations on them.
Wondering whether someone already had this idea? Search the archives below, and look through Wikipedia:Perennial proposals.

Discussions are automatically archived after remaining inactive for two weeks.

« Archives, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54

Interest in testing a tool for Breaking News? Seeking feedback.

My team at the foundation, WME, has developed a dashboard that tries to identify new articles related to global "newsworthy" events as they are being written about across Wikipedia language editions at any given moment. You can read more about it here. I'm seeking help to improve the feature.

Here is the direct link to the dashboard. (desktop only).

I'd appreciate it if anyone who tries it out could point out any missing templates from across language projects that would help us capture more results. Using the thumbs up and down buttons in the demo to confirm or deny whether entries are accurately identified as breaking news would help me in the medium and long term in building a better, more accurate tool.

Although Enterprise's focus is not on creating editing tools or gadgets, we hope this can be of use to the community, too.

Thanks! FNavas-WMF (talk) 16:19, 8 December 2023 (UTC)

Are the thumbs up/down supposed to be if the article as a whole is about a current news event, or has been created in wake of a current news event? Because e.g. you have on the tracker Mama Diabaté who died two days ago, so that would be news and result in increased traffic and editing, but her notability would have been established over decades. On the other hand 2023 Guyana Defence Force helicopter crash was created for the purpose of covering a specific important recent news event. Are both to be considered "hits" for the tracker?

Also, "indications count" isn't documented, and I don't know what it means, and it seems odd (being a count) that you can only filter numbers equal to the count as opposed to higher or lower than. I also don't think the raw number of edits is too useful of a metric for the user to filter potential news articles, since news is rather localized by interest and region. Page-views-to-editor-ratio would seem more useful -- a niche new article or split may have a lot of edits from a dedicated editor and reviewers at first, but very few outside viewers will care to see it in the first hours. Any news event will blow it out of the water in viewer-to-editor ratio, even if news stories will have more anonymous editors. SamuelRiv (talk) 16:58, 8 December 2023 (UTC)Reply[reply]

Thanks for this feedback @SamuelRiv -- the thumbs are to say whether this is news or not. There are a lot of false positives, so we're trying to filter out what is not news. I'd consider both those examples as news. What is news and what isn't is so subjective, so it's really just up to the individual.

We don't use any pageviews right now, so all this is based on editing behavior/presence. Good call on the "viewer-to-editor" ratio idea ... My only issue is that we could only calculate that 24 hours too late (given how PV work right now). FNavas-WMF (talk) 21:10, 8 December 2023 (UTC)

A 24h delay in the ratio is fine as long as you have some smoothing average on both views and edits -- it will be better than the metrics you currently have available. (I'm sure you can figure out better metrics once you get some data.)

News isn't really subjective in these clear cases -- your first verification would just be a Google n-gram call to see if there was a major spike in searches in the past week. If the API for that is free, that'd be the best metric I can think of. There's tons of simple algorithms to verify a spike or step discontinuity in rough data. SamuelRiv (talk) 21:20, 8 December 2023 (UTC)
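As a rough illustration of the smoothing and spike check discussed above, here is a minimal sketch. It assumes hourly or daily counts of views and edits are already available as plain lists; the function names, the window size, and the three-standard-deviation threshold are all hypothetical choices for illustration, not anything the dashboard is known to use.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed


def view_edit_ratio(views, edits, window=3):
    """Smoothed views divided by smoothed edits.

    The denominator is floored at 1 to avoid division by zero.
    """
    smooth_views = moving_average(views, window)
    smooth_edits = moving_average(edits, window)
    return [v / max(e, 1) for v, e in zip(smooth_views, smooth_edits)]


def is_spike(series, threshold=3.0):
    """True if the last point sits more than `threshold` standard
    deviations above the mean of all earlier points."""
    if len(series) < 3:
        return False
    baseline, latest = series[:-1], series[-1]
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Flat baseline: any increase counts as a spike.
        return latest > mu
    return (latest - mu) / sigma > threshold
```

A step discontinuity could be checked the same way by comparing the mean of the last few points against the earlier baseline instead of a single latest point.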

@FNavas-WMF many breaking news items are related to articles that already exist - so being able to see articles that have high "within last hour" activity instead of only new articles may be useful. — xaosflux ^Talk 18:08, 8 December 2023 (UTC)

Yep, 100% agree @Xaosflux -- I'm working on getting us to the "within the last hour" method you describe as we speak! FNavas-WMF (talk) 21:11, 8 December 2023 (UTC)

I would go further and say that anyone who feels compelled to write about a news event on Wikipedia should look for existing articles to update rather than create a new one. This is an encyclopedia, not a newspaper. Phil Bridger (talk) 21:27, 8 December 2023 (UTC)

@Phil Bridger totally agree. It seems to me that folks, at least on enWiki, do try to add to an existing article, which is why this tool as it works now is only very good for NEW, totally unforeseen events. Do you think pointing editors to existing articles that are part of the news is more valuable than pointing them to new articles? FNavas-WMF (talk) 16:25, 11 December 2023 (UTC)

If memory serves, Another Believer does a lot of work with breaking news and might be interested in this. WhatamIdoing (talk) 00:33, 12 December 2023 (UTC)

Thanks for the ping. This is on my radar and I was even able to chat with Francisco a bit at WikiConference North America recently. I've subscribed to this discussion and I'm curious to see what folks say about the tool. ---Another Believer _(Talk) 00:39, 12 December 2023 (UTC)

Thanks! These comments have been very useful. I'm looking for more ways to cut down false positives and reduce the noise! The "cite news" template is extremely useful for catching breaking news. It seems to be used quite reliably in new news events.

@WhatamIdoing @Phil Bridger @Xaosflux do you all see any more templates I should be following? FNavas-WMF (talk) 20:01, 18 December 2023 (UTC)

{{cite web}} gets used a lot as well, especially when being used by newer editors who are using one of the citation insertion tools. — xaosflux ^Talk 20:46, 18 December 2023 (UTC)

That template is probably less specific, though. WhatamIdoing (talk) 21:13, 18 December 2023 (UTC)

Indeed. But especially if you are a user (new or old) that isn't aware of some of these templates and go through the basic VE workflow of (a) Type in something (b) Click the Cite button (c) Dump in your URL -- you will end up inserting a cite web. — xaosflux ^Talk 21:46, 18 December 2023 (UTC)

@FNavas-WMF, an article being tagged with {{Current}} would be a direct indicator that we consider it a current event. But it is automatically removed by bot as soon as editing activity fades, which is often still while a layperson might consider something to be breaking news. Wikipedia:Current event templates#Current events has related templates/categorization. I'm curious how your tool uses/relates to this. An article being linked from Portal:Current events would be another strong indicator. {{u|Sdkb}} ^talk 00:12, 3 January 2024 (UTC)Reply[reply]

Option to omit subordinate sections on edit

Case in point: [1] The editor meant to add the content at the end of the "Discussion (II)" section, but ended up adding it at the end of its subordinate section, "Split off into a new page". He didn't catch the error and it was fixed later by a different editor (me). He is an experienced editor, significantly above average in technical competence, and I see this happen too often.

(In this case, I ended up changing the level of "Split off into a new page" to that of "Discussion (II)" to prevent this from happening again, but that solution was sub-optimal. By all logic the "Split off into a new page" should be subordinate to the Discussion section.)

Even if one is aware of this pitfall, it can be really cumbersome to have to back up to find the section you want. Imagine if there are four or five subordinates, some of them really long.

There should be the option to edit a section without its subordinates. Equally beneficial on any page that has multi-level sections, including articles, not just talk pages. As for specifics, that's why I'm on this page.

One thing to consider is that an editor might not know the option exists, or it might not occur to them to use it. In such cases the option would do little good. I'm thinking a pop-up box if the edited section has any subordinates: "Do you want to include the subordinate section(s)?" ―Mandruss ☎ 21:58, 10 December 2023 (UTC)Reply[reply]

+1 for this sort of feature. It's been requested in various places for over a decade IIRC. I don't get caught adding content in the wrong place, so much as it's annoying to have to scroll to the correct place and to get an excessively long preview of subsections I am not planning to change. DMacks (talk) 22:19, 10 December 2023 (UTC)

Okay, only half a decade. I knew it sounded familiar though... Wikipedia:Village pump (technical)/Archive 163#Edit section without subsections. DMacks (talk) 07:52, 12 December 2023 (UTC)

So the last comment in that thread was PrimeHunter, one of our most credible editors on technical questions, saying this is not only technically possible but "straightforward". There was no reply, suggesting concession by the naysayers. That was at VPT, and it seems to me the next step would've been this page. Not sure why that didn't happen. ―Mandruss ☎ 22:17, 12 December 2023 (UTC)Reply[reply]

@PrimeHunter:... DMacks (talk) 20:16, 18 December 2023 (UTC)

I said "It seems straightforward". I'm not a MediaWiki developer and don't know how easy it would be in practice but it doesn't sound hard. I don't believe Izno's earlier comment there: I'm pretty sure "this is not technically feasible" is the answer due to the way that HTML sectioning works. That seems irrelevant. When you save a section edit, MediaWiki reparses the wikitext of the whole page in the same way as if you had edited the whole page. PrimeHunter (talk) 21:55, 18 December 2023 (UTC)Reply[reply]

-1 to the popup confirmation, but +1 to being able to edit just the "lead" of a section sans any subsections. I'm sure people will jump in with some good examples, but I'm struggling to imagine when "edit smallest applicable subsection" and "edit entire page" are both worse options than "edit intermediate size chunk". Folly Mox (talk) 02:19, 11 December 2023 (UTC)Reply[reply]

@Folly Mox: Your last sentence seems to suggest that it should never include subordinate sections, which would be another way of solving this problem; do I have that correct? If so, there are some cases where one would want to do that, such as re-ordering the subordinate sections or moving text between subordinate sections. Such things could be accomplished in other ways, including editing the entire page, but significantly less easily and more error-prone. ―Mandruss ☎ 20:33, 11 December 2023 (UTC)Reply[reply]

Yeah, never including subsections except in the "edit full page" case was my idea for avoiding a popup confirmation, but those things you mention are fine arguments for retaining the ability to edit a section including all its subsections. Another one is when there is no "section lead", and the prose starts after the first subsection. Misclicking on the wrong pencil would send users to an empty editing interface, which we'd have to cancel out of annoyingly. So maybe my idea is bad? I definitely am not liking an additional modal thing to tap between the editing pencil and the editing interface, but I'm not sure of the way round it. Folly Mox (talk) 21:45, 11 December 2023 (UTC)Reply[reply]

"Editing pencil": You must be using a different editor. I click [ edit ] next to the section heading.

Remember that the pop-up would only happen when there are subordinates, so the impact might be less than you imagine. The question would be asked only when needed. ―Mandruss ☎ 21:56, 11 December 2023 (UTC)Reply[reply]

On mobile skin, you have to go all the way to the top toolbar on a page, click the three dots, and click "edit full page" to do that. On very large pages that may well be a bigger inconvenience than the issue described here. Mach61 (talk) 19:50, 11 December 2023 (UTC)

(Actually, there's no technical reason why this feature would have to be implemented the same on m.wiki AFAIK, so carry on) Mach61 (talk) 19:52, 11 December 2023 (UTC)

There are indeed two issues here. The major one is the back-end: we need MW API support for it. The other one is the interface to activate it, for which we could have all sorts of UI/UX design ideas, gadgets, etc. But none of the latter matters without the former. DMacks (talk) 02:12, 12 December 2023 (UTC)

That's above my pay grade. If this earned a consensus at VPR, what are the realistic odds it would happen? ―Mandruss ☎ 06:47, 12 December 2023 (UTC)

Any chance the gadget that allows the editing of lead sections might help? CMD (talk) 07:43, 12 December 2023 (UTC)

No, that is quite different. Each section is numbered sequentially, so the lead is section 0 already and is not a header-delimited section at all (so the other sections are not subsections of it, in the way a === is a subsection of ==). DMacks (talk) 07:52, 12 December 2023 (UTC)Reply[reply]

All the gadget does is make a section=0 link like https://en.wikipedia.org/w/index.php?title=The_Example&action=edit&section=0&summary=/*%20top%20*/%20 to use a feature which already exists in MediaWiki. You could have made the same url manually. The proposal here would require a new MediaWiki feature. PrimeHunter (talk) 21:55, 18 December 2023 (UTC)Reply[reply]

Brainstorming a gadget that would be a clickable link in the section to call action=edit but then intercept the actual spawning of the editor. It would snip off everything starting with the first line that begins with "==" into a hidden separate field, then reattach it when the user clicks 'publish'. DMacks (talk) 10:11, 2 January 2024 (UTC)
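The snip-and-reattach step of that gadget idea could look roughly like the sketch below. It is written in Python purely to show the text handling (a real gadget would be JavaScript), and the function names are hypothetical. It assumes the section wikitext begins with the section's own heading line, so that line is skipped when searching for the first subsection, and it does not try to handle headings inside comments or nowiki blocks.

```python
import re

# A heading line: two or more "=" signs, some text, two or more "=" signs.
HEADING = re.compile(r"^==+.*==+\s*$")


def split_lead(section_wikitext):
    """Split a section into (lead, hidden_subsections).

    Line 0 is assumed to be the section's own heading, so the search
    for the first subsection heading starts at line 1.
    """
    lines = section_wikitext.split("\n")
    for i in range(1, len(lines)):
        if HEADING.match(lines[i]):
            return "\n".join(lines[:i]), "\n".join(lines[i:])
    # No subsections: everything is editable, nothing is hidden.
    return section_wikitext, ""


def rejoin(edited_lead, hidden_subsections):
    """Reattach the hidden subsections when the user publishes."""
    if not hidden_subsections:
        return edited_lead
    return edited_lead + "\n" + hidden_subsections
```

Splitting and rejoining without any edit reproduces the original text exactly, which is the property that would keep an untouched save from showing up as a spurious diff.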

Mechanism for establishing clearly notable topics

Some years ago, YouTube took a controversial decision that prevented users from seeing the number of dislikes (or likes) on their content. This was based on the supposed psychological effect this had on content creators. While there are obvious differences in the internal workings of YouTube and Wikipedia, the underlying logic has obvious similarities that I consider relatable. I have worked extensively with new editors on Wikipedia (even though not as often as before), and one thing that is really discouraging for them is to see their article (or AFC submission) being tagged as "not being significant". You can argue that this only means the article in its present state does not have notability established; however, in practice it really means "pick another topic to write about, not this one". And herein lies the need for my proposal.

I intend to propose that there be a mechanism for listing definitely notable topics. Not just anyone should be allowed to enter and validate an entry to this list; probably only editors with reviewer rights should accept entries. Articles with marginal or contentious notability will not be accepted to this list. This will not in any way address other issues regarding article suitability, such as promotional/not written in encyclopaedic form/no references/etc, but it will reduce the number of blatant deletions or rejections of articles that usually have a psychological impact on editors. From experience, if a new editor creates an article, there is a big difference between "I am rejecting your AFC because this article is promotional" and "I am rejecting your AFC because there is no claim of notability". This vetted list will also ensure that new editors who claim they just want to create content will be able to do that from a pool of community-chosen content. You can say there are many to-do lists on Wikipedia across various WikiProjects, but those are not really vetted and many contain non-notable article suggestions. I really think this can work and will be value-adding even though my hunch tells me it will be hard for it to be accepted. It can very well be refactored into something better. HandsomeBoy (talk) 11:36, 15 December 2023 (UTC)

I think the psychological effect you mention, as well as the extremely common and understandable misconception that notability has some connection to significance, would be well ameliorated if we change the word "notability" in all guidance across the project to the more accurate and descriptive "alreadypublishedaboutness". Notability is a word about sourcing, not any intrinsic property of a subject, and is a definition of the word wholly divorced from standard English usage anywhere but here.

Having said that, the idea of a community-vetted central list of approved topics seems like it would be a systemic bias black hole.

An alternative idea might be to start with topics in sister language wikipediae that have passed some form of peer review but lack a corresponding en.wp article. This could be done algorithmically, and would spread out the bias to become the bias of the experienced contributors of the entire Wikimedia editing ecosystem. I have no doubt, however, that due to en.wp's stricter sourcing requirements, there are Good / Featured article analogues on other projects that would not be acceptable for inclusion here.

The real real problem, as I see it, is one of education. People seem to think that Wikipedia has an article on everything, and that therefore their thing should belong here as well. A substantial proportion of new accounts register in order to create a new article on a topic they've already selected, most of which are not the subject of sufficient prior publishing in reliable independent sources, but they don't learn that until they've had their aspirations crushed at AfC.

I think I've suggested this somewhere before, and see people occasionally ask it at the Teahouse, but a pre-draft source review would be a really beneficial process to add. Before someone starts a draft, they submit a list of 3–5 sources. Then experienced editors can evaluate them, and come back with judgements as to whether the provided sources (an important distinction, moving the conversation away from the topic, which the article proposer may have feelings about) are sufficient to establish appropriateness of encyclopaedic inclusion (another go at eliminating the term notability). Folly Mox (talk) 13:09, 15 December 2023 (UTC)Reply[reply]

I think the dream was that draftspace and AfC would provide that kind of function, but that never materialised. It used to be something one could post at wikiprojects, but aside from Women in Red, I've not seen much evidence of editors doing this recently. The problem is that it would need to be a specialised review, by someone familiar with the appropriate subject guidelines and outcomes at AfD in that area; for example, rejections of academics at AfC tend to give the wrong boilerplate. Espresso Addict (talk) 00:51, 16 December 2023 (UTC)Reply[reply]

No doubt a specialised review as you describe (sort of a pre-AfD) would greatly improve accuracy, and might come close to HandsomeBoy's idea of "guaranteed notability" (minimal false positives). But it doesn't take a topic area notability specialist to tell a first-day editor "source 1 is self-published, source 2 is a press release, source 3 just mentions the subject in a single sentence" etc.

I agree that the AfC declines could use an additional layer of granularity. prof being underutilised seems like a reviewer culture issue, and all the SNGs seem to have their own NN declines, but as far as I'm aware, there's no automated way to append, for example, "interviews don't help establish notability" or "a gallery hosting an artist's exhibition is not considered an independent source with respect to the artist", and the reviewers have to type it in manually, so often don't. Folly Mox (talk) 04:35, 16 December 2023 (UTC)Reply[reply]

Sure; I guess I'm principally interested in cases where a good-faith new editor comes into conflict with an enthusiastic AfC reviewer, where the newbie has a better understanding of relevant policy than the AfC reviewer, which in my experience has mainly been over academics but also occasionally material falling under CREATIVE. DGG used to accept a lot of aca-bios from AfC and no-one (as far as I know) has stepped in to fill that chasm. Espresso Addict (talk) 05:22, 16 December 2023 (UTC)Reply[reply]

That's a good point, and a scenario I hadn't considered. Folly Mox (talk) 05:28, 16 December 2023 (UTC)

Possibly one way to handle this would be advising new article editors to state what notability standard they are targeting. A talk page comment saying what standard and what references prove that the article passes that standard would help any AfC reviewer. -- LCU ActivelyDisinterested «@» °∆t° 17:23, 19 December 2023 (UTC)Reply[reply]

A step in the wizard which asked the creator to check if any of the following applied (The article's subject is an academic...author...artist) and then guided them into presenting appropriate evidence to meet the relevant guideline might be useful. Espresso Addict (talk) 01:18, 20 December 2023 (UTC)Reply[reply]

I could imagine some (possibly very small) value in the question, especially if the wizard then provided more relevant advice, but I think that claiming that I think this SNG applies is kind of pointless. Just because I think the article's subject should be evaluated according to WP:SOMETHING doesn't mean that it should be, or that other editors will apply the SNG that I've picked out (even if they really ought to). WhatamIdoing (talk) 04:20, 26 December 2023 (UTC)Reply[reply]

God no. We do not need a layer on top of AfC, on top of AfD, on top of PNG. GMG ^talk 01:30, 20 December 2023 (UTC)

Watchlist adjuster

I have a somewhat odd way, usually setting my time limit at two days and looking at the edit as it approaches the bottom. Such edits are usually around 30 or 40 hours old, so the more obvious problems have been caught by quicker watchers and I can check for more subtle ones. However, the pace of those edits varies greatly, so I must keep going into Preferences to adjust the time limit by a fraction of a day so as to avoid missing them. Simpler if there were a click from the watchlist to the Preference page that can make fine adjustments to it. Jim.henderson (talk) 04:48, 22 December 2023 (UTC)Reply[reply]

A user script could probably do this. Might want to ask for a user script at WP:US/R. Please be very specific about which setting(s) you want it to be able to edit. Is it the "Days to show in watchlist" setting? –Novem Linguae (talk) 11:12, 25 December 2023 (UTC)

In the watchlist, the "Period of time to display:" dropdown has eight predefined values: 1 hour; 2 hours; 6 hours; 12 hours; 1 day; 3 days; 7 days; and 30 days. It's not necessary to use those values - just amend the URL in the browser's address bar. Using 4 days as an example, the following amendments could be used:

The days= parameter can have any value between 0.00001 (approximately one second) and 30. --Redrose64 🦌 (talk) 16:37, 25 December 2023 (UTC)
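To make the URL trick concrete, here is a small sketch that builds a watchlist link with an arbitrary days= value and enforces the range quoted above. The helper name and the base URL default are assumptions for illustration; only the days= parameter and its limits come from the comment itself.

```python
from urllib.parse import urlencode

# Limits quoted in the discussion for the days= parameter.
MIN_DAYS = 0.00001
MAX_DAYS = 30


def watchlist_url(days, base="https://en.wikipedia.org/wiki/Special:Watchlist"):
    """Return a watchlist URL showing the given number of days (may be fractional)."""
    if not MIN_DAYS <= days <= MAX_DAYS:
        raise ValueError(f"days must be between {MIN_DAYS} and {MAX_DAYS}")
    return f"{base}?{urlencode({'days': days})}"


# Example: the 4-day case, and a fractional trim to 1.75 days.
four_days = watchlist_url(4)
trimmed = watchlist_url(1.75)
```

Trimming by fractional days is then just a matter of calling the helper again with a slightly smaller number, which mirrors editing the number in the address bar by hand.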

Ah. Thank you. And it resets the Preference for the next time I look. This URL editing trick is indeed quick and not too techie for me to remember. I normally run my ENWP watchlist at two days, but when traffic is heavier or I miss most of a day, I generally add one day and then trim it down successively by fractional days. Or, when a bot has made changes to a great many of my pictures in Commons, sometimes it's better to handle it by making the day count large. Then I adjust the number shown, usually by ten at a time. I still would like if the watchlist, along with the short list of choices in the multiple choice list, would offer a link to the Preference page for finer tuning. However, it looks like I'll follow this method from now on. Thanks again. Jim.henderson (talk) 01:59, 26 December 2023 (UTC)Reply[reply]

There's a setting or gadget somewhere to show a blue dot if a diff in the watchlist is unread. I set mine to show 1000 revisions, then track things that need my attention by using the blue dot. –Novem Linguae (talk) 10:10, 27 December 2023 (UTC)

Brainstorming a COPYVIO-hunter bot

I'd like to propose the idea of a COPYVIO-hunter bot, but I'm not ready to make a specific Bot request yet, and so I'd like to expose this idea here first to brainstorm it. Sometimes, copyright violations are discovered that have been present on Wikipedia for years. (The copyright-violating content at Barnabas#Alleged writings was added on 4 August 2014 and discovered 18 December 2023.) But for an alert Teahouse questioner two days ago, who knows when, if ever, this would have been discovered. That's worrisome.

We have some good tools out there, such as Earwig's detector, and my basic idea is to leverage that by building a bot around it, which would apply it to articles, and either generate a report, or apply the {{Copyvio}} template directly. A couple of additional bot tasks could streamline the human part of the investigation by finding the insertion point (Blame) and determining copy direction (IA search). There are input, performance, scaling questions, and human factors, and likely others I haven't thought of. As far as input, ideally I'd like to see a hybrid or dual-channel input of a hopper with manual feed by editors (possibly semi-automated feed by other tools), and an automated input where the bot picks urls based on some heuristic.

For performance, I launched Earwig with all three boxes checked, and it took 62 seconds to return results for Charles de Gaulle (174,627b) and 16 seconds for (randomly chosen) Junes Barny (5,563b). I'm pretty sure there are a lot more articles closer in size to the latter than the former, so let's say Earwig takes 30 seconds per search on average; multiplying that by {{NUMBEROFARTICLES}} gives us 6.43 years to search all of Wikipedia with a dumb, single-threaded bot with no ability to prune its input stack. (Of course, Wikipedia would be bigger six years later, but that gives us an idea.) Given that the Barnabas violation went undiscovered for nine years, six years is not so bad, as I see it. But not all articles are equal, and probably some pruning method could decrease the size of the input stack, or at least prioritize it towards articles more likely to have undiscovered violations.

As far as scaling, I have no idea of server availability at WMF, but presumably there are some bot instruction pages somewhere for bot writers which address how many threads are optimal, and other factors that could scale up the processing for better throughput; maybe someone knows something about that. If we had six threads going against one input stack, that would reduce it to one year; it would be great to run it annually against the entire encyclopedia.
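The back-of-envelope estimate above can be reproduced as follows. The article count of about 6.76 million is an assumption, chosen because it is the value implied by the 6.43-year figure; the 30-second average and the six threads come from the discussion. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_scan(article_count, seconds_per_article=30, threads=1):
    """Wall-clock years to run one check per article, split evenly across threads."""
    return article_count * seconds_per_article / threads / SECONDS_PER_YEAR


# Assumed article count (not stated in the discussion; implied by the estimate).
ARTICLES = 6_760_000

single = round(years_to_scan(ARTICLES), 2)             # 6.43
six_way = round(years_to_scan(ARTICLES, threads=6), 2)  # 1.07
```

This treats threads as perfectly parallel and ignores rate limits on the search backend, so the six-thread figure is a best case rather than a prediction.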

For human factors, I'm thinking about the increased number of articles tagged with copy violations, and the additional load on admins that would inevitably result. There are currently 17 articles tagged with the {{Copyvio}} template. I wanted to provide some estimate of activity at Wikipedia:Copyright problems to gauge current throughput, but I'm not so familiar with the page, and was unable to do so. Inevitably, a bot would increase the load on admins (for WP:REVDEL) and other volunteers, and it would be helpful to gather some data about what would happen. Not sure if it's possible to project that, but maybe a stripped-down version of the bot just to wrap Earwig and spit out numbers on a test run of a week or two might give us some idea. I'm guessing in operation, it would generate a big backlog balloon initially based on the first two decades of Wikipedia, but then its output would slow to some steady state; in any case, backlogs in other areas have been generated and attacked before with success.

Maybe a bot could somewhat reduce load per investigation, by means of a handy output report that includes Earwig percent, maybe a brief excerpt of copied content, and so on. A couple of additional tasks could be defined which would work off the output report, one task running Blame on the suspect articles to add date of insertion to the report, and another to read IA snapshots and determine direction of copy (i.e., is it a mirror, or a copyvio), resulting in a report with information that ought to make the human part of the investigation considerably faster and more efficient per occurrence, which should at least somewhat offset the increased overall number of investigations.

Would love to hear any feedback on the technical aspects of this, as well as the human factors, and whether something like this should even be attempted. Thanks, Mathglot (talk) 02:00, 21 December 2023 (UTC)

Maybe a fourth task could be a disposition-triage task, and would act on the report output of previous tasks based on configurable values; something like: "if copy-direction = copyvio then if Earwig-pct > 85 then remove content from article and mark/categorize as revdel-needed; else if Earwig-pct < 20 then remove Copyvio template and mark report as handled; else leave for human assessment; else mark as mirror and handled." Mathglot (talk) 02:29, 21 December 2023 (UTC)
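The triage rule quoted above translates fairly directly into code. The sketch below only returns a label for each case and performs no edits; the function name, the label strings, and the "copyvio" direction value are hypothetical, while the 85 and 20 thresholds are the configurable values suggested in the comment.

```python
def triage(copy_direction, earwig_pct, high=85, low=20):
    """Return the proposed disposition for one report entry.

    copy_direction: "copyvio" if Wikipedia copied the source; any other
                    value is treated as a mirror of Wikipedia.
    earwig_pct:     similarity percentage reported for the article.
    """
    if copy_direction != "copyvio":
        return "mirror: mark handled"
    if earwig_pct > high:
        return "remove content; mark revdel-needed"
    if earwig_pct < low:
        return "remove Copyvio template; mark handled"
    return "leave for human assessment"


# A few illustrative report rows.
decisions = [
    triage("copyvio", 92),
    triage("copyvio", 12),
    triage("copyvio", 50),
    triage("mirror", 99),
]
```

Keeping the thresholds as parameters means the wide middle band left for human assessment can be narrowed later if a test run shows the scores are reliable.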

EranBot currently sends every new edit through CopyPatrol if I understand it correctly, which essentially runs the edits through Turnitin/iThenticate. One could reduce the bot load by making it only look at articles that were created prior to August 2016.

@MusikAnimal (WMF) and Mathglot: I understand that the WMF is currently working on a replacement/re-vamp of CopyPatrol (i.e. Plagiabot). Is there a way to integrate a sort of "historical article detection" into a similar interface while re-using some of the code from the new Plagiabot, or is this something that you think would be better kept separate? — Red-tailed hawk _(nest) 02:42, 21 December 2023 (UTC)Reply[reply]

That's terrific news, which means, if I understand correctly, that whatever the scope of the problem is, at least it's not getting worse (assuming perfect precision from Plagiabot). So we only have to deal with the pre-whatever-year issue, and slowly chip away at it. (I am subscribed; no ping needed.) Mathglot (talk) 02:56, 21 December 2023 (UTC)

@MusikAnimal (WMF) I remember putting this up on phabricator somewhere (I think?), but would it be possible to provide a stable API to integrate CopyPatrol with various other editing/CVUA tools (specifically it would be great to be able to answer the question "What is the iThenticate score/URLs for a specific edit") Sohom (talk) 06:29, 21 December 2023 (UTC)Reply[reply]

I've left MusikAnimal a comment on their WMF account talk page. It would be nice to hear from them on this. — Red-tailed hawk _(nest) 17:45, 25 December 2023 (UTC)

I acknowledge it's Christmas, and many WMF staff are taking vacation/holiday, so it's fairly possible that we might not hear back for a week or so. — Red-tailed hawk _(nest) 17:53, 25 December 2023 (UTC)

Thanks. I've added DNAU for 1 month, imagining that he may be on a nice, long winter vacation. Mathglot (talk) 21:24, 25 December 2023 (UTC)

An API for reviewing/unreviewing does exist, but it's undocumented right now. It also doesn't provide Access Control headers. I was working on an external-use API for CopyPatrol, but decided to hold off until the new version that uses Symfony was finished and deployed, since it won't be usable anyway until deployment has finished. Chlod (say hi!) 02:22, 26 December 2023 (UTC)Reply[reply]

Thanks for your patience! I was "around" on my volunteer account, but haven't been checking this one until today (my first day back at work after the break).

It sounds like you all are asking for phab:T165951, which was declined last November. It can be re-opened if there's interest in it. However, it's worth noting CopyPatrol doesn't go through every edit, only those that meet certain criteria. I'll let @JJMC89 speak to that before I say something wrong ;)

As for an API, we can certainly add an endpoint to get the score for a given revision, if it exists in our database. That's simple to implement and won't require authentication. If you could file a bug, I can have that ready for when the new CopyPatrol goes live.

API endpoints that make changes to our db, such as reviewing/unreviewing, are another matter. Right now we authenticate with OAuth, so we'd need to somehow have clients go through that before they could use the endpoint. If @Chlod is interested in building this, I'll happily review it! :) Off the top of my head, I'm not sure how to go about implementing it. Alternatively, maybe we could provide all logged-in users an API key? That would avoid clients having to log in to CopyPatrol.

I don't think we want to permit requesting new scores for any arbitrary revision, at least not until our partnership with Turnitin is finalized. That should happen very soon, and then we'll know for sure if we can send out that many API requests. Some changes to JJMC89's bot would likely also need to be made. All in all, I'd say this feature request is not much more than a "maybe".

Also, in case no one's mentioned it yet, attempting to identify old copyvios is tricky because of the all-too-common WP:BACKWARDSCOPY issue. In some cases it may not be possible to ascertain which came first -- Wikipedia or the source -- so I'd be wary of attempting to automate this. MusikAnimal (WMF) (talk) 00:57, 3 January 2024 (UTC)

The new bot looks at edits made in the article and draft namespaces (0 and 118) to submit to Turnitin and skips the following types of edits:

made by bots or users on the allow list
(revision) deleted before processing (rare unless catching up from a service outage)
rollbacks (MediaWiki native or Twinkle)
additions of < 500 characters after cleaning the wikitext.

Those that come back with more than a 50% match to a (non-allow listed) source are shown in CopyPatrol for human assessment.
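The skip rules and the 50% cut-off described above can be written as two small predicates. This is a minimal sketch, not the bot's actual code: the `Edit` fields and the function names are hypothetical, and the match percentages are assumed to come only from sources that are not allow-listed.

```python
from dataclasses import dataclass

CHECKED_NAMESPACES = {0, 118}  # article and draft namespaces
MIN_ADDED_CHARS = 500          # smaller additions are skipped
MATCH_THRESHOLD_PCT = 50.0     # matches above this go to human review


@dataclass
class Edit:
    namespace: int
    user: str
    is_bot: bool
    is_deleted: bool   # (revision) deleted before processing
    is_rollback: bool  # MediaWiki native or Twinkle rollback
    added_chars: int   # size of the addition after cleaning the wikitext


def should_submit(edit, allow_listed_users=frozenset()):
    """Return True if the edit would be sent for a similarity check."""
    if edit.namespace not in CHECKED_NAMESPACES:
        return False
    if edit.is_bot or edit.user in allow_listed_users:
        return False
    if edit.is_deleted or edit.is_rollback:
        return False
    return edit.added_chars >= MIN_ADDED_CHARS


def needs_human_review(match_percents):
    """Return True if any source match exceeds the threshold."""
    return any(pct > MATCH_THRESHOLD_PCT for pct in match_percents)


if __name__ == "__main__":
    edit = Edit(namespace=0, user="Example", is_bot=False,
                is_deleted=False, is_rollback=False, added_chars=1200)
    print(should_submit(edit), needs_human_review([50.3817]))
```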

As a quick test, I added an endpoint to dump the data from the database for a specified revision.[2]

{
  "diff_id": 7275308,
  "lang": "en",
  "page_namespace": 0,
  "page_title": "Mahāyāna_Mahāparinirvāṇa_Sūtra",
  "project": "wikipedia",
  "rev_id": 1178398456,
  "rev_parent_id": 1178304407,
  "rev_timestamp": "Tue, 03 Oct 2023 12:16:34 GMT",
  "rev_user_text": "Javierfv1212",
  "sources": [
    {
      "description": "C. V. Jones. \"The Buddhist Self\", Walter de Gruyter GmbH, 2021",
      "percent": 50.3817,
      "source_id": 820817,
      "submission_id": "3084bde6-3b8b-488c-bf33-c8c27a73ae06",
      "url": "https://doi.org/10.1515/9780824886493"
    }
  ],
  "status": 0,
  "status_timestamp": "Tue, 03 Oct 2023 12:38:16 GMT",
  "status_user_text": null,
  "submission_id": "3084bde6-3b8b-488c-bf33-c8c27a73ae06"
}
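A client could pick the strongest match out of a record shaped like the one above. The sketch below uses only field names that appear in the sample; the helper name `best_match` is hypothetical, and the sample is trimmed to the fields it reads. No request is made, because the endpoint's final URL and design are still to be worked out.

```python
import json

# Trimmed copy of the sample record above (only the fields used here).
SAMPLE = """
{
  "rev_id": 1178398456,
  "page_title": "Mahāyāna_Mahāparinirvāṇa_Sūtra",
  "sources": [
    {"percent": 50.3817, "url": "https://doi.org/10.1515/9780824886493"}
  ],
  "status": 0
}
"""


def best_match(record):
    """Return (percent, url) for the highest-scoring source, or None."""
    sources = record.get("sources") or []
    if not sources:
        return None
    top = max(sources, key=lambda source: source["percent"])
    return top["percent"], top["url"]


if __name__ == "__main__":
    record = json.loads(SAMPLE)
    print(record["rev_id"], best_match(record))
```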

Please file a task so we can workshop the best way to design the API.

— JJMC89 (T·C) 00:40, 4 January 2024 (UTC)Reply[reply]

Filed as phab:T354324. This could be done on either the frontend or the backend, but it doesn't look like the backend source is publicly available (and API endpoints are a frontend task anyway, so it should probably live on the frontend). Chlod (say hi!) 10:03, 4 January 2024 (UTC)

I'd encourage making the repos public unless there is a reason for keeping them private. It will make things easier if someone goes inactive or if someone wants to submit a patch. –Novem Linguae (talk) 11:36, 4 January 2024 (UTC)

Hi, Mathglot! Great to hear more initiative on copyright cleanup tasks; they're always a big help. Someone brought up a related idea at WT:CCI a while back, and I responded with a few points that probably apply here too. I've got a cannula lodged in my hand right now, so I'll copy over what I said in that thread to avoid straining it. There wasn't a lot of back-and-forth on that thread anyway so it's probably easier if I just repost it here.

There was an idea previously floated around about having Turnitin or Earwig run on all revisions of past cases; I'd say this is probably the general idea when talking about automation for CCI cases. When it actually comes down to making it happen, though, it's a spider web of caveats and limitations that make it hard to get off the ground. Here's a more-organized explanation of my thoughts that I randomly collected in the past few months:
First is the issue of cost. There's around 508 thousand revisions left to check (as of May this year), but we only ever have a finite amount of Earwig search engine searches or Turnitin credits. Processing all of these automatically means we have to work with the WMF to get more credits for a one-time run-through, and we're not sure if we'll get decent results for a majority of those checks.
We could work around this by completely disabling search engine checks, as the thread you linked discussed, but this can either work for or against us based on the case. We could also work around this by only selecting a few cases which rely mostly on web sources or (for Turnitin) sources that we know would probably be indexed. This significantly cuts down on the amount of revisions to check. But then there's the next issue:

A lot of the older cases, especially the ones over three years old, start getting a lot of false positives. As article text remains on the wiki for long periods of time, SEO spam sites, academic documents, slideshows, and others start copying from Wikipedia. We filter out a lot of these already (like those in this list and a bunch of others), but we still hit them every once in a while and enough that it clogs up what reports we would otherwise get from Earwig/Turnitin.
A possible solution to this would be human intervention (which is more or less a given with something like this), where editors will double-check to see if a flagged revision actually is copied from somewhere, or if it's just a false positive. Human intervention will weed out false positives, but then it won't weed out the false negatives.

At the end of the day, copyvio checking is a really hard computer science problem that humanity is still in the middle of solving. False negatives (like when a revision flies under the radar because a source it copied from has died, or when the text has been paraphrased enough to make checkers think it's completely original text) will always be one of the biggest brick walls we face. False positives waste editor time, yes, but false negatives arguably take up more time, because we then need to re-check the case. It also wouldn't be a good look for us or the WMF if it turns out that we get a lot of false positives and negatives, since that could be perceived by the community as a waste of funds. Perhaps this is still something that could benefit from research and testing.
— User:Chlod 13:02, 24 November 2023 (UTC)

This was for checking revisions on CCI pages, but the same applies for scanning every latest revision for all articles. It seems we've also been stretching Earwig to its limits recently: it has been going down almost every day in the past two weeks (CommTech's UptimeRobot). Unfortunately, the Earwig logs are project members-only, so I can't snoop in to figure out the cause by myself. But usually, we chalk this up to Earwig running out of Google API tokens. Would appreciate comments or ideas for the problems above; anything to ensure copyvios don't fly under the radar. Chlod (say hi!) 02:15, 26 December 2023 (UTC)

Chlod thanks much for this. A few questions or comments:

What are the 508,000 revisions? Is that just from CCI investigations?
In that same bullet, what cost are you talking about, processing time? And what did you mean by decent results, are you alluding to false +/- that you raised lower down?
- As far as the workarounds, this sounds like roughly what I referred to as various pruning methods to shorten or reorder the input list.
Re false + due to websites copying from Wikipedia, I don't see this as a major problem and I addressed it in the 'direction of copy' comment involving IA checks. Maybe we'd have to negotiate with IA for a certain amount of search traffic per unit time, but as a fellow non-profit and given the reasons for it, I can't imagine there wouldn't be some positive arrangement to come out of that. That would eliminate the need for human intervention in a proportion of cases; see the "if-then" pseudo-code at the end of my comment. The triage attempts to automate a lot of it, and steer only the grey-area cases toward human intervention. And it should also weed out most false negatives for the same reason, and I don't see the failure to have 0% false negatives as a problem. There is always a problem identifying edge cases, even when humans are involved; if an automated solution improves our accuracy and throughput over what it was before, then it's worthwhile. One hundred percent accuracy and coverage are a goal, but they will never be attained and that shouldn't stop us from incremental progress; even if automated processes fail to identify some sites for human intervention, we'll catch 'em, hopefully, next iteration of the processing.
"Really hard computer science problem": again, imho, we don't need to "solve" it, we just need to do a bit better than we were doing heretofore. Paraphrase will fall, imho, to better shingling turbocharged with some AI to recognize synonyms and linguistic transformations at some point in the not-nearly so distant future as I would've guessed a year ago. We needn't let the perfect be the enemy of the good, and I think we can do a lot of good now.
Earwig woes: is anyone maintaining it?

Thanks, Mathglot (talk) 00:02, 27 December 2023 (UTC)

Yep, the 508k revisions are those we have to check at CCI. That's from a dashboard by Firefly to see how much is left. It has its inaccuracies, but it's correct for most cases.
For the cost, it's actual monetary cost. From what I've heard (and what I assume from what I've heard), the WMF pays for the Google API and Turnitin credits, and that cost is pinned to how much we use Earwig and how many edits are checked by CopyPatrol, respectively. Attempting to request more credits for either needs discussion with the WMF, who then needs to discuss with Google/Turnitin. And yeah, the decent results is whether or not Earwig comes up with a false positive/negative.
- Definitely; there's a lot of one-or-two-sentence stubs that don't really need checking. This could, of course, be filtered out, possibly with a lot more criteria for skipping than just that.
I'm wary about using Internet Archive as a "source of truth" for dates. Though we do exactly that in CCI, it's probably not reliable enough to make broad judgements on whether a page is a copy or was copied from. If the pipeline goes Earwig → URL of likely match → Internet Archive, the data it would provide in a report could be a false positive in two cases: if the page changed URLs at any point in time (as I've seen happen with Sparknotes), since Internet Archive may not recognize the switch, or if it was never archived before (though this practically never happens for recently-added citations). Of course, it's best if this is tested empirically first.
- This is a step in the right direction though. The downside of not using a system like this at all is that the direction checking will be manual, which then just pushes the investigation work back to the addressing user/administrator, and that could result in anywhere from zero (by luck) to a lot of false positives. But what has to be checked first is whether this will end up increasing processing time/workload for checking users.
Earwig's Copyvio Tool is actively maintained by The Earwig. The recent downtimes were briefly discussed in User talk:The Earwig § Copyvio tool is down; I only saw this now. Seems to have been from increased usage.

I agree; something is better than nothing. I'm mostly just worried about stretching the few editors working on copyvio even thinner by adding more work to do. We could balance this by encouraging more editors to help out at WP:CCP, but copyright cleanup really just has historically low participation rates. Chlod (say hi!) 05:14, 27 December 2023 (UTC)Reply[reply]

Hey Chlod, thanks for pinging me here.

With Google's API, there's a hard limit of 10,000 queries per day, which costs US$50. The copyvio detector will make up to 8 queries per page (each query corresponds to a sentence or so of text, so that is chosen to strike a balance between performance and detection accuracy – longer articles would really benefit from more than 8 queries in many cases). So that works out to somewhere between 1,250 and 10,000 articles per day; let's say 2,000 on average (about 5 queries per page). To be very clear, that's a limit built into Google's API terms. We can't get around it without a special agreement with Google, and everything I've heard from the WMF indicates we have no special agreement: we're paying the regular rate. Over ten years of running the copyvio detector, and despite multiple people asking, I've never managed to make the right connections with the right people at Google to get a special agreement (or the WMF hasn't, and IMO it's really them who should be doing that instead of me).
Just bashing the numbers out, checking 500,000 pages without a special agreement with Google would cost $12,500 and take at least 8 months (again assuming 5 queries/page).
The search engine is really the limiting factor here, hence my emphasizing it. Compute cost is much cheaper and we could use WMCloud to parallelize this more effectively if the daily limits weren't so severe.
Recent issues aren't related to using up all of our Google API credits but mostly due to my own poor software engineering decisions ten years ago. Sometimes it's due to unauthorized bot traffic that needs to be identified and blocked, but in this case I haven't noticed any. There's an ongoing project to improve performance, but no timeline for when it will be ready, unfortunately.
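The numbers above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the figures quoted in this comment (10,000 queries per day for US$50, up to 8 queries per page, an assumed average of 5); the function names are hypothetical, and billing is assumed to be a flat US$50 for each day of quota used.

```python
import math

DAILY_QUERY_LIMIT = 10_000  # hard limit in Google's API terms
DAILY_COST_USD = 50         # cost of one full day of quota
MAX_QUERIES_PER_PAGE = 8    # upper bound used by the copyvio detector


def pages_per_day(queries_per_page):
    """How many pages fit into one day's quota."""
    return DAILY_QUERY_LIMIT // queries_per_page


def full_run(pages, queries_per_page):
    """Return (days, cost in USD) to check the given number of pages."""
    total_queries = pages * queries_per_page
    days = math.ceil(total_queries / DAILY_QUERY_LIMIT)
    return days, days * DAILY_COST_USD


if __name__ == "__main__":
    print(pages_per_day(MAX_QUERIES_PER_PAGE))  # 1250
    print(pages_per_day(1))                     # 10000
    print(pages_per_day(5))                     # 2000
    print(full_run(500_000, 5))                 # (250, 12500)
```

At 5 queries per page, 500,000 pages take 250 days of quota, a little over eight months, which matches the US$12,500 estimate.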

— The Earwig (talk) 14:53, 27 December 2023 (UTC)Reply[reply]

Thanks for these detailed explanations. Just noting that I've started User:Novem Linguae/Essays/Copyvio detectors to try to document all these copyright tools and their nuances. Seems like every couple months this comes up and I've forgotten all the details since the last discussion, so maybe an essay will help me remember it :) –Novem Linguae (talk) 12:13, 31 December 2023 (UTC)Reply[reply]

@The Earwig: Anywhere I could possibly help with the copyvio detector's uptime? It's also affecting the NPP workflow at times, as the copyvio detector is part of checks to be done when patrolling. Chlod (say hi!) 13:56, 4 January 2024 (UTC)Reply[reply]

@Chlod: Thanks for offering to help! I've given you maintainer access to the tool, and you have permission to restart it when needed. This is the case if the request backlog gets full (a log message "uWSGI listen queue of socket" is printed to uwsgi.log over several minutes) but occasional slowness doesn't necessarily mean the queue is full and needs to be cleared. It's good for us to have maintainers across different timezones. But beyond the occasional restarts, addressing the underlying issue is complicated and not something I expect help with. As hinted above, a backend rewrite is in progress to improve performance. — The Earwig (talk) 16:41, 4 January 2024 (UTC)Reply[reply]

As I understand it, the issues with applying Earwig's copyvio thing to more pages (and the reason it always takes a million years to run) have nothing to do with computational power or programming skill on our part, but rather with the fact that Google search, which is a quite critical part of this software working, has deliberately decided to fuck us sideways on search queries.

Well, it's not clear: it could be that or it could be that nobody from Wikipedia or from the WMF has succeeded in figuring out how to ask them for a special dispensation.

At any rate, we have a rather low quota, and it would cost tens of thousands of dollars to make it higher, and we do not get any special dispensation although I guess they are perfectly fine to make millions of dollars from reusing our content in their own knowledge panels lol. jp×g 🗯️ 11:25, 28 December 2023 (UTC)Reply[reply]

Maybe @NPerry (WMF) might give more insight as to why the Wikimedia Foundation has not been able to get resources for copyright detection with Google search? AFAIR, last year, they were involved with managing Wikimedia's partnership with Google. Sohom (talk) 11:54, 28 December 2023 (UTC)

I'm not active in copyvio detection work, so take what I say as an outsider's perspective. Overall, copyvio detection on Wikipedia seems like an area that's struggling despite the heroic efforts of those working on it — multi-year backlogs at places like CCI are indicative of a system that's just not working. Bot assistance is our best hope of changing that dynamic on a systemic level, so I think it's a fruitful avenue to pursue. It'd be complex on a level greater even than ClueBotNG, but if successful it'd be similarly impactful.
One thing to perhaps think about is the difference between old copyvios and newly added ones. My vague understanding is that a lot of the difficulty/pain comes from years-old insertions, which have since been built upon, necessitating removal of large chunks of an article. If it'd be simpler to build a bot that only checks/fixes new contributions, then perhaps that'd be a good place to start. If it could sufficiently stem the tide, perhaps it'd lead to a situation similar to what we have with non-notable articles/deficient FAs today, where there's a bunch of stuff in the past to clean up, but ultimately it's a finite backlog with few new entries being added, creating hope we'll someday get through it (cf. WP:SWEEP).
Hope that's helpful, and good luck with this work! {{u|Sdkb}} ^talk 00:03, 3 January 2024 (UTC)Reply[reply]
(Possible overlap with part of above) - we have a copyright flagging system already (see log) - and allowing more bots to flag is fairly easy to do. Like many have said, building a reliable algorithm for doing the actual checking is a "hard" problem. One problem that came up during prior third-party solutions like Turnitin is that these companies wanted to reuse Wikipedia content without honoring the licensing requirements (e.g. we send them some text, they store it, then they re-serve that to other people without attribution). — xaosflux ^Talk 17:00, 4 January 2024 (UTC)

Grammar checker

My idea is that when you edit an article, there should be a built in grammar checker that doesn't detect spelling errors in links but instead in general text. Oo-rah! the marines are here (talk) 04:37, 21 December 2023 (UTC)Reply[reply]

@Oo-rah! the marines are here: Please see WP:PEREN#Enforce American or British spelling. --Redrose64 🦌 (talk) 16:51, 25 December 2023 (UTC)

@Redrose64: Is there a regional variety of English in which "comittee" or "paralell" are considered correct? jp×g 🗯️ 11:34, 28 December 2023 (UTC)

Not as far as I know. --Redrose64 🦌 (talk) 20:28, 28 December 2023 (UTC)

To me, the idea of a spellcheck feature built into the editing interface seems like it would have negative utility. In addition to the ENGVAR issues, spell checkers in general have a tendency to propose changing proper names they don't recognise into common terms they do, and the same goes for words in other languages, which appear with some frequency in certain topic areas. Additionally, high typo / misspelling proportions in an article can be a hint there are deeper issues, since they are often the earliest problems corrected. Folly Mox (talk) 19:50, 30 December 2023 (UTC)Reply[reply]

Disagree. In addition to what others have stated, spellcheckers can be clumsy when applied to technical topics. A spellcheck function would be very obtuse when, say, talking about genes that often have strange names like GLUT. Even if this could be disabled, I can foresee countless edits by laypeople to "correct" something the program highlights, but is accurate. Just-a-can-of-beans (talk) 00:03, 1 January 2024 (UTC)Reply[reply]

How to message a range

So, there was one day where an IPv6 made an edit/addition to an article which was later modified by a user. A little later, a different IPv6 commented on this edit on the user's talk page. The IP Range calculator says that they have a common range 2a01:cb1d:3cf:ca00::/64, and according to mw:Help:Range blocks/IPv6 this is a range with an "end-user allocation size" which I guess means "probably just one computer". The list of enwiki contributions from said range and list of global contributors from said range strongly imply that all the edits come from the same person. However, the only action that can be performed range-wide is a block. Has anyone ever thought of creating a message/ping functionality for an entire range? Jo-Jo Eumerus (talk) 10:17, 23 December 2023 (UTC)
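Whether two IPv6 addresses share a /64 can be checked with Python's standard `ipaddress` module. This is a small sketch: the helper name `same_slash64` is hypothetical, and the two addresses are made-up examples inside the range mentioned above, not the ones from the actual edits.

```python
import ipaddress

# The common range reported by the IP range calculator.
COMMON_RANGE = ipaddress.ip_network("2a01:cb1d:3cf:ca00::/64")


def same_slash64(first, second):
    """Return True if both IPv6 addresses fall in the same /64."""
    # strict=False masks off the host bits of the first address.
    network = ipaddress.ip_network(f"{first}/64", strict=False)
    return ipaddress.ip_address(second) in network


if __name__ == "__main__":
    # Hypothetical addresses, chosen only to illustrate the check.
    editor = "2a01:cb1d:3cf:ca00:1111:2222:3333:4444"
    commenter = "2a01:cb1d:3cf:ca00:aaaa:bbbb:cccc:dddd"
    print(same_slash64(editor, commenter))
    print(ipaddress.ip_address(editor) in COMMON_RANGE)
```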

Oh yes, messaging /64s is a perennial suggestion, which I think is currently confounded by the IP masking project. There's going to be a load of wishlist or bug requests. See for example m:Community Wishlist Survey 2021/Archive/Mirror contents of previous IPv6 talk page when a new address is assigned and phab:T112325, which lead to a number of other links. I usually recommend WP:IPHOP at times like this. In my experience anyone who is regularly editing will know how to find both your talk page and the article's talk page. -- zzuuzz ^(talk) 12:51, 23 December 2023 (UTC)Reply[reply]

This problem should largely go away when m:Temporary accounts are rolled out (probably in 2024). WhatamIdoing (talk) 04:26, 26 December 2023 (UTC)

Or make it worse, depending on how it works. I don't see much explanation of how it applies vis-a-vis a range. Jo-Jo Eumerus (talk) 08:35, 26 December 2023 (UTC)

I've asked questions on that matter but not received answers, probably because no decisions have been made. If a vandal can just hop from 2000:9:8:7:0:0:0:1 to 2000:9:8:7:0:0:0:2 and appear like someone on the other side of the world, we'll have to reconsider whether to abandon our founding principles and disallow unregistered editing. I'm hoping that the lack of response means that the backward step of IP masking is stuck in development hell and won't be imposed in the foreseeable future. Certes (talk) 11:40, 26 December 2023 (UTC)Reply[reply]

The docs they've posted seem to have the answers to me. It looks like IP hopping won't change the temporary account name, but clearing cookies or opening a new incognito tab will. OTOH, if you jump through the right hoops you can still see the IPs. It also looks like range blocks and autoblocks will still work as they do now, the difference is that someone who jumped through those hoops will have to look up the IP first where an IP is needed. There's a vague mention that they're going to look into ways to handle someone clearing cookies a lot. Anomie ⚔ 13:47, 26 December 2023 (UTC)Reply[reply]

Thanks. Of course, clearing cookies isn't always malicious, and tools such as uMatrix withhold cookies from sites which request but don't need them. I have it turned off for Wikipedia, because this site uses cookies in ways that help readers rather than advertisers, but casual visitors may not. Certes (talk) 14:11, 26 December 2023 (UTC)Reply[reply]

Maybe try to make the mobile version of the website more comfortable for readers

For me, the display of the mobile version is not very comfortable, and I have trouble understanding what I'm reading. Can we make it easier to read and more comfortable? Acman1o (talk) 01:20, 24 December 2023 (UTC)

Acman1o, what facets of the mobile reading experience are uncomfortable for you or lend themselves to misunderstanding? Special:MobileOptions allows you to adjust the default font size if that would help. The Wikimedia Foundation actually did a lot of research and testing to try to make the website as readable as possible, most recently by increasing the default spacing between lines, I think. More work has gone into the desktop skins, so depending on your device and your eyes, you might find the desktop version more comfortable, but you might be redirected to the mobile frontend unless you have "Desktop site" (or similar) enabled in your browser. The dark mode gadget (enabled in Special:Preferences) has made my experience here a lot more comfortable. Folly Mox (talk) 02:23, 24 December 2023 (UTC)Reply[reply]

According to this announcement, WMF are planning on launching a new beta feature soon that should allow you to fine tune the typography of the site for your account. The main project page is here. I'm sure they'd welcome your feedback. Folly Mox (talk) 02:29, 24 December 2023 (UTC)Reply[reply]

Deletion of account is needed

There should be an account deletion system. Edits made by a deleted account should be left with the name of the account without a link. 160.238.0.118 (talk) 19:34, 26 December 2023 (UTC)

For legal reasons related to attribution of material, it is not possible to delete accounts. They can however be renamed in some circumstances: see Wikipedia:Courtesy vanishing. AndyTheGrump (talk) 19:45, 26 December 2023 (UTC)Reply[reply]

Given I can just search for all other edits made by that "name of the account", there is no difference whether or not they have a "link". Sounds like a distinction without a difference. What is your understanding of what an 'account' actually is? DMacks (talk) 10:03, 2 January 2024 (UTC)

Curlie

Curlie is the successor to DMOZ. We have a long tradition of including links to DMOZ in lots of articles and the current template, {{Curlie}}, has 6749 transclusions. I saw it recently and thought "who uses these anymore?" Web directories used to be essential tools, but in 2024 ... I can't remember the last time I used one. Most seem to be a combination of links already easily found in relevant Wikipedia articles, the most obvious links that would come up with any search, and spam. Over time, we've come to see the external links space as something to be used sparingly, but these remain.

So, real question: does anyone use these Curlie links?

This is not a proposal to remove them, to be clear. Even if they aren't used very often, they may be sufficiently aligned with our ideals/purpose to retain them, but it seems worth asking nonetheless. — Rhododendrites ^talk \\ 18:29, 28 December 2023 (UTC)Reply[reply]

Maybe it's time for another TfD discussion. Some of the "keep" opinions last time seemed to want to keep these links only as somewhere to send people who want to add unwanted links to Wikipedia articles. That is a very poor reason for keeping. We should be honest and tell people that their links are unwanted if that is the case. Phil Bridger (talk) 18:37, 30 December 2023 (UTC)Reply[reply]

Allow soft deletion of unopposed nominations

Hi. I am wondering what people would think about repealing the "a page is only eligible for soft deletion if it has been PROD'd/deleted in the past" rule. I am not the most active person at AfD, but I invite anyone to go to a random page in the (recent-ish) AfD archives and ctrl+f for the word "ineligible". Uncontroversial nominations (or nominations in which the nominator leaves nothing for further participants to add) get relisted all the time because someone objected to a PROD, or it was previously deleted.

I went through the closed discussions in December, and I found 36 discussions which were relisted as ineligible for soft deletion but were subsequently deleted (usually after a few delete per nom or delete NN !votes, and perhaps some additional relists): 1, 2, 3, 4, 5, 6, 7, 9, 10, 11^[a], 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32^[b], 33, 34, 35, and 36.

To be fair, I found four bluelinks that were saved because they were "ineligible for soft deletion": 1^[c], 2, 3^[d], 4. But I don't think that a redirect, a stub, a non-neutral REFBOMB mess, and No Pants Day^[e] justify the volunteer hours spent rubber-stamping uncontroversial nominations. Therefore, my idea: let these things be soft-deleted. Even if they were controversial^[f] at one point in time, they are not anymore. They would be eligible for WP:REFUNDs, and a single objection in the current AfD debate would still prevent soft deletion. I think it is time to get rid of this WP:CREEPy rule. House Blaster^talk 05:51, 2 January 2024 (UTC)Reply[reply]

It generally takes a lot more effort to create content than to delete it, so I'd apply strict scrutiny to any proposal to relax the criteria for PRODding. {{u|Sdkb}} ^talk 22:35, 2 January 2024 (UTC)Reply[reply]

To be clear, this would not change eligibility for PROD. It would only change eligibility for WP:SOFTDELETEing articles listed at AfD for a week (with all the associated notifications).

On a different note, I would also consider that the status quo is those 36+ articles (accounting for batch nominations) are "hard"-deleted. If someone subsequently finds sources, you either have to make a very convincing case to the deleting admin or spend a week's worth of editor-time at DRV. This proposal would make them all eligible for a REFUND. House Blaster^talk 23:28, 2 January 2024 (UTC)Reply[reply]

Ah, thank you for clarifying, and best of luck formulating the proposal! {{u|Sdkb}} ^talk 23:31, 2 January 2024 (UTC)

Notes

^ with the relist comment Not eligible for soft-deletion (due to contested prod back in 2006 (!) ...
^ a batch nomination of seven was relisted because one had been dePROD'd
^ kudos to User:FormalDude for finding sources
^ closed as redirect after the closer found an appropriate target
^ Okay, No Pants Day is awesome. I would say it is the exception that proves the rule.
^ by "controversial", I mean someone at some point in time expressed the idea that the article should exist

Notability reform

I have a new guideline/policy draft at Wikipedia:Article inclusion criteria, and would love to have some feedback on it. Thanks in advance! Ca ^{talk to me!} 09:06, 2 January 2024 (UTC)Reply[reply]

What is the problem that this proposal is meant to fix? 331dot (talk) 09:23, 2 January 2024 (UTC)

I answered it in a response below. Ca ^{talk to me!} 12:06, 2 January 2024 (UTC)

Any proposal that says Downranking all SNGs to essays will not achieve consensus. Curbon7 (talk) 09:55, 2 January 2024 (UTC)

Agreed - it has been removed. Ca ^{talk to me!} 11:27, 2 January 2024 (UTC)

I respect the ambition, but realistically any massive change to WP:N would have to have been prompted by some unprecedented event or shift in community sentiment. I don't think people are actually dissatisfied with how notability as a whole works, even if some individual SNGs remain controversial. Mach61 (talk) 10:12, 2 January 2024 (UTC)

My feedback is that it looks like you are throwing out all of WP:N and trying to start again from first principles. But why? Barnards.tar.gz (talk) 11:10, 2 January 2024 (UTC)

My first problem is with length: all the notability guidelines (SNGs and GNG) together make for a reading experience approaching that of a typical novella; WP:N alone contributes 4000 words of reading material. However, a wrench is thrown into all those tens of thousands of words of guidance by WP:PAGEDECIDE. How our numerous SNGs interact with the GNG is poorly defined, and newcomers are just meant to figure it out themselves. Any attempt to formally define it will inevitably be met with a series of no-consensuses. I believe that hints that the way we are defining notability right now is fundamentally flawed. My goal with the proposal is, instead of trying to use importance as a criterion for inclusion (an insurmountably subjective and unfeasible task), just to use the pre-existing policies as guidance. Ca ^{talk to me!} 11:19, 2 January 2024 (UTC)

Is "trying to use importance as criteria for inclusion" actually the current standard? WP:N goes to pains to distinguish notability from simple importance (except as reliable sources decide to cover it, which is the current N standard). DMacks (talk) 11:27, 2 January 2024 (UTC)Reply[reply]

That's what it says on the tin, but reading SNGs like WP:BIO and WP:NPROF clearly shows it's more of an importance criterion than anything. Even GNG proves to be a publicity indicator since it does not actually deal with article content. I don't know why we have all these guidelines when they could be replaced with "What is the best possible article that can be made?" Ca ^{talk to me!} 11:40, 2 January 2024 (UTC)

If you feel that the criteria are not being properly applied, have you tried fixing that first before deciding that everything should be thrown out? 331dot (talk) 11:49, 2 January 2024 (UTC)

I recognize I am in a minority position with this belief, but I believe notability as a system is fundamentally flawed for the reasons described above.

I made an attempt to standardize SNG and GNG in the past (see Wikipedia:Village_pump_(idea_lab)/Archive_53#Rewriting_WP:N_to_reflect_community_consensus), but it was clear that any wording put forward would fail to gain enough consensus. Ca ^{talk to me!} 12:04, 2 January 2024 (UTC)

So I'm just wondering what makes you think a broader proposal covering more ground will gain consensus when a narrower proposal didn't. 331dot (talk) 15:04, 3 January 2024 (UTC)

I would be a fool to think such a radical change like this would gain consensus. I'm poking around with different proposals to gauge community sentiment with notability. Ca ^{talk to me!} 15:43, 3 January 2024 (UTC)

The rationale behind notability is positively defined at WP:WHYN Mach61 (talk) 16:31, 2 January 2024 (UTC)

I am not sure what point is being made here. WHYN only explains the reasoning behind GNG. Ca ^{talk to me!} 16:36, 2 January 2024 (UTC)

So, practicality concerns aside, I want to engage with the philosophy of this, since that's really what's interesting and it's what you're looking for.

If I'm reading correctly, you see "notability", the term of art we have built up onsite, as fallacious: we claim "notability" to be something robust and objective, independent from "importance" (which is ultimately a subjective notion), but in the end "notability" boils down to "importance" in many cases anyway. I agree with this.

However, I'm not really convinced it's a problem that can be solved, and I think your attempt goes a ways to explain why: you've just moved the problem back a step, offloading the subjectivity at the heart of an encyclopedia onto other terms of art: how do we know when we can establish WP:NPOV and WP:V—when we feel like the framing is neutral enough; when we feel like the claim is verifiable enough? Surely these can't be solved by statistical analysis, or whatever—at least I don't think so.

Since we're also axiomizing "what Wikipedia is not"—when does an article stop being an indiscriminate collection of information, or a dictionary? Those are informed by our present policy, and now they have no practical criteria whatsoever.

The consensus mechanism we have to fill in the gaps left by the flexibility of WP:N is ultimately powered by subjectivity, but I think someone here may need to win a Berggruen Prize before we can really tackle that problem. Remsense留 09:01, 3 January 2024 (UTC)Reply[reply]

I've always been first in line to say that SNGs are a mess and should generally be ignored in favor of GNG, just like discretionary sanctions should be ignored in favor of simply not being a jerk. But the community is attached to their pet SNGs and there's almost zero chance of doing away with them. We got NPORN removed, and even on such an immensely niche topic, it was like pulling teeth. GMG ^talk 12:47, 2 January 2024 (UTC)Reply[reply]
If I'm understanding correctly, the problem you're trying to address is that there are some articles that meet the notability threshold but nevertheless should not have an article because of one of the reasons at WP:PAGEDECIDE. I agree that that's a problem. My explanation for it would be that, because 95% of AfDs deal with notability rather than PAGEDECIDE, many inexperienced editors use the heuristic "notable → keep" and ignore PAGEDECIDE.
That said, I don't think a new policy or guideline is the solution. We already have too many of those, and the impulse to replace them with a simplified version isn't going to succeed. PAGEDECIDE already has guideline status as part of the WP:Notability guideline page, so I'd instead encourage you to suggest changes to make it clearer, more easily invoked, and the notability guideline page as a whole simplified.
Something our policy/guideline pages badly need overall is for someone good at plain English writing to go through them with no agenda except shortening/simplifying/clarifying them. It won't be an enviable task, as everything on every PAG page was added for some reason or another, so there will be a lot of discussion/pushback about how to simplify without losing meaning. But it'd really be a valuable service. {{u|Sdkb}} ^talk 23:46, 2 January 2024 (UTC)Reply[reply]
The main thing is, I do not understand why we have the whole vaguely defined concept of notability when PAGEDECIDE supposedly trumps everything. Ca ^{talk to me!} 02:15, 3 January 2024 (UTC)Reply[reply]
PAGEDECIDE doesn't trump WP:N. It says that there are times when, while a topic may merit a stand-alone article in that it can be shown to meet the GNG or an SNG, there are reasons not to have a stand-alone article (such as when the topic is better covered in concert with a larger topic or similar topics).

Notability is a guideline and purposely vague because it is meant to encourage articles to start from a point that shows potential for growth so that the wiki-way can be used to expand. But as others have pointed out, this is not a one-size-fits-all approach, due to systematic bias from sources. Masem (t) 05:09, 3 January 2024 (UTC)Reply[reply]
If you were seriously proposing to dump GNG and its emphasis on hype and publicity and one-size-fits-all rules over importance, and try to push more subject-specific importance-based guidelines, I might be on board. This goes in exactly the wrong direction. We cannot possibly include articles on all topics about which reliable but local or routine sources have provided enough information to write start-class articles, which is what GNG pretends to do (but in practice doesn't). Instead, we need to use notability to filter out the truly unimportant topics. But because GNG does that based on publicity, it is inaccurate and easy to game. Cutting out all the nuance and making it be one size fits all can only worsen those problems, without solving any actual problem with current practice. —David Eppstein (talk) 04:43, 3 January 2024 (UTC)Reply[reply]
I'll echo the above comment: this seems to be a step in the opposite direction from what I could conceivably support. Moreover, it doesn't actually reduce the "subjective" aspect, just pushes it off to a different place. Who decides whether a biography is "negative", or whether all the sources are "marginally reliable", or what counts as "undue weight", or when an article is "unwieldy", or when related topics are "better appreciated" as separate pages, or when a topic is "controversial" instead of "mundane"? If the goal is to reduce the number of different policy/guideline pages, I say we go all out and synthesize WP:V, WP:NPOV, WP:NOR, WP:NOT, and WP:BLP into a single Wikipedia Rulebook. They're only separate pages due to historical accidents; if one were starting a wiki-based encyclopedia project now, with the benefit of Wikipedia's accumulated experience, one could cover the whole ethos in a single document instead of multiple pages that all talk about each other. XOR'easter (talk) 05:35, 3 January 2024 (UTC)
A merger of just two of those policies, V and NOR, failed to get support. Since then, we've had nearly seventeen years of inertia for those policies... Mach61 (talk) 05:41, 3 January 2024 (UTC)Reply[reply]
Tangential note: A few people have mentioned that it's hard to understand notability guidelines due to their length and detail. A couple weeks ago, I began drafting User:Wracking/Notability with the goal of creating bullet-point summaries of each SNG, mainly for my own reference. If this is something anyone wants to collaborate on, please reach out. Wracking ^talk! 05:47, 3 January 2024 (UTC)
I tend to agree with David and XOR'easter. If I got to rewrite Wikipedia's inclusion guidelines from scratch, I'd go for specific guidelines on specific subjects, based on the consensus of editors knowledgeable about those subjects, and drop the futile quest for a Grand Unified Theory of Notability. The idea that we can use a single standard to classify literally all of human knowledge into the boxes "notable" or "not notable" would sound like complete madness to the average non-Wikipedian. And then try telling them that we think we can do so with just five bulletpoints... – Joe (talk) 09:29, 3 January 2024 (UTC)Reply[reply]
Yeah, what he said, and also what they said. jp×g 🗯️ 11:06, 3 January 2024 (UTC)Reply[reply]

@Joe, in fairness, the idea that we're going to have an encyclopedia that anyone can write without even needing so much as a user account or email, that also sounds like complete madness. So not sure that "sounds like complete madness" is a strong argument against anything on Wikipedia :-) Levivich (talk) 20:57, 7 January 2024 (UTC)Reply[reply]
I actually think that WP:GNG is a good Grand Unified Theory of Notability, since it ties back into core policies with the theory "So, can this topic ever make a core policy-compliant Wikipedia article?". Most alternatives tend to lean into "Someone finds this important" which is a lot more subjective and tends to invite both mass stubs and snobbery. Jo-Jo Eumerus (talk) 13:37, 3 January 2024 (UTC)Reply[reply]
Counterproposal: rename "Notability". If we're really going to rework the inclusion criteria for an encyclopaedia article here, let's do away with the confusing term "notability" and call it what it is. Right now it's something close to alreadypublishedaboutness (catchy, I know), but if we're going to redo the inclusion criteria, we could either rename the larger body of policy after what consensus agrees on the fundamental criterion is, or just call it inclusion criteria. Folly Mox (talk) 14:16, 3 January 2024 (UTC)Reply[reply]
That is indeed the title of Ca's original idea here. – Joe (talk) 15:10, 3 January 2024 (UTC)Reply[reply]

Support this idea much more than dismantling GNG. 🌺 Cremastra (talk) 00:59, 7 January 2024 (UTC)Reply[reply]

While I'm not convinced this is necessary per se, I'm just going to vomit a few potential terms: the crime of "notability" is arguably the "-ability". However, the term should have as little lexical overlap with "verifiability" as possible.

How about substantiation, attestation, recognition, corroboration, representation, precedence?

No, I don't think any of these work: I think "notability" might be the closest, best English word to use for this concept, so that the greatest number of people understand its usage as easily as possible.

I still think it's likely we just have to live with the subjectivity at the heart of "encyclopedias" as a concept. It's not like anyone else has figured this out! Remsense留 01:33, 7 January 2024 (UTC)Reply[reply]

"Inclusion criteria" is the way to go. One fundamental flaw in "notability" is it suggests a property of the subject that we as editors are discovering (something is notable or not notable and it's up to us to figure out which). In fact, "notability" isn't a property inherent in any subject, it's a decision editors make (we don't discover or learn if a subject is notable, we decide whether subjects are notable or not). "Inclusion criteria" has the advantage of being clear that it's a set of rules made up by editors for the purpose of deciding what topics should be covered--and not some inherent property, or something having to do with the inherent value of topics. Levivich (talk) 21:01, 7 January 2024 (UTC)Reply[reply]

Oh and the inclusion criteria should be "enough reliable independent secondary sourcing to write an accurate and complete tertiary encyclopedia article" which is what GNG already tries to get at. Levivich (talk) 21:04, 7 January 2024 (UTC)Reply[reply]
Agree with this. This is why some SNGs, and de facto notability, are bad, because sometimes there isn't enough information out there to write an encyclopedic article. 🌺 Cremastra (talk) 21:14, 7 January 2024 (UTC)Reply[reply]
WP:NOPAGE literally exists, so this comment makes no sense. Curbon7 (talk) 08:48, 8 January 2024 (UTC)Reply[reply]

"Standards of inclusion" is the term I suggested once upon a time, in one of the longest discussions on renaming the notability page. One of the reasons was indeed that it emphasizes that it's a Wikipedia standard, not an inherent characteristic or externally defined property. These days I usually use the more clunkier "standards for having an article", due to Wikipedia:What Wikipedia is not, as it is another standard of inclusion based on scope that is typically evaluated independently of the guidance on the notability page. isaacl (talk) 22:34, 7 January 2024 (UTC)Reply[reply]

I have to agree with Remsense on most points. While some kind of notability reform is needed, this is not the best way to go about it. Currently, apart from some less stringent SNGs, all articles have to meet GNG. This keeps out one-reference sub-stubs that are better suited to wiktionary or wikidata. If we remove GNG, then our only rationale for deleting unhelpful articles that wouldn't be notable under GNG is likely a combination of WP:NOTDIC, WP:NOTDB, WP:INDISCRIMINATE, and WP:5P1.

The problem is that all of those are mostly or entirely subjective. Who decides what qualifies as an "indiscriminate collection of information", or what "encyclopedic" really means, when you get right down to it? We already have these disputes (case in point: the Barbenheimer RfC), but if they became commonplace at Articles for Deletion, matters would get worse.

We shouldn't rely on subjective measures. We already do, to a degree (how much coverage is "significant" coverage? How reliable is this source, really?) but implementing such a proposal, and accordingly dismantling the GNG, would intensify existing disputes.

The GNG is not great. But it works, and it's quantifiable, at least more so than allusions to WP:NOT. You need multiple sources, three is recommended, and they have to be reliable and secondary.

I think we should have more, specific, SNGs, that are objective and easily quantifiable. Notability isn't a yes-no question, but if we have more subject-specific notability guidelines, then we can be more accurate. Vaguer standards aren't helpful, because vagueness invariably leads to disputes. 🌺 Cremastra (talk) 18:26, 7 January 2024 (UTC)Reply[reply]

WP:Notability is a big vague confusing mess but it mostly works. IMO the way that it really works is that it combines 3 attributes:

1. Sourcing criteria which ostensibly is the only criteria. This is also used as a measuring stick for #2
2. Real world importance/notability
3. Degree of encyclopedicness ... degree of compliance with Wp:not, above the floor of outright rejection under Wp:not

If we ever want to tidy up wp:notability, we're going to need to acknowledge this as a starting point. North8000 (talk) 14:29, 3 January 2024 (UTC)Reply[reply]

I agree with this. There is a balance between coverage (how often and how in-depth a subject is featured in independent sources) and the importance of the subject. There ought to be or there is more flexibility of sourcing needed for individuals who are at the top of their profession (whether in academics, sport, politics, business) compared with individuals who are active locally, in minor or secondary leagues, or non-executive positions. This is why the SNGs are useful - to help make determinations of real world importance. - Enos733 (talk) 18:23, 3 January 2024 (UTC)Reply[reply]

For example, if two published high school newspapers did lengthy in depth coverage of guitar player John Smith, that fully satisfies GNG but the system might not let that one pass. If the same two writeups were in Rolling Stone magazine, the system would certainly pass him. So the prominence of the sources (combined with the space they dedicated) matters for assessing #2, and #2 matters.

Another example: A town of 1,000 people with no sources other than a couple which (merely) reliably establish its existence. The system is going to let that one be an article. Some will say it's because "GNG sources are likely to exist" but in reality it's because it's an ultra-encyclopedic topic. Because it passes wp:not by a mile, and is also mentioned in 5P. North8000 (talk) 21:53, 3 January 2024 (UTC)

Agreed. When we let the sports SNGs die, we inadvertently opened the door to a lot of minor league baseball players, because minor league baseball necessarily receives a bunch of local/routine coverage which looks like or could be GNG even if the player never comes close to making the major leagues, which was functionally necessary to enter a print baseball encyclopaedia. I'm not generally a fan of SNGs, but the ones that exclude rather than include can be very helpful. SportingFlyer T·C 07:01, 4 January 2024 (UTC)Reply[reply]

I knew at the time that that fix was only going to be 1/2 of a fix. In the "grand wp:notability unification" that I have in my head, it

Acknowledges that real world notability/importance is a factor and the coverage is a measuring stick for that as well
Calibrates for the ratio of coverage to real world notability in that field. Since in sports coverage is an end/product of itself and so less of / a weaker indicator, coverage in this area is less meaningful and it adjusts for that
Calibrates for degree of encyclopedicness. A typical sports article is a bit lower here than a typical encyclopedia article and it adjusts for that

The net result would be that the standard would be a bit tougher for sports than it currently is. North8000 (talk) 18:14, 4 January 2024 (UTC)Reply[reply]

Adding searching to the nearby page

Hello, Not sure if this is the correct place to put this, but is it possible to add coordinate or location searching to the nearby page, so that location permissions do not have to be granted? Thanks, Geardona (talk) 12:49, 4 January 2024 (UTC)

Geardona, this can be done manually with the search keywords neartitle and nearcoord. See mw:Help:CirrusSearch § Geo Search for documentation. Folly Mox (talk) 12:58, 4 January 2024 (UTC)
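For anyone who wants to try the nearcoord keyword without granting location permissions, here is a minimal sketch of building such a search link. The helper names (nearcoord_query, search_url) and the example coordinates are made up for illustration; the keyword syntax, with an optional radius prefix such as 2km, follows the CirrusSearch help page mentioned above, and the link simply points at the ordinary search page.

```python
from urllib.parse import urlencode

def nearcoord_query(lat, lon, radius_km=None):
    """Build a CirrusSearch 'nearcoord:' keyword (hypothetical helper)."""
    if radius_km is None:
        return f"nearcoord:{lat},{lon}"
    # An optional radius prefix, e.g. 'nearcoord:2km,37.77,-122.39'
    return f"nearcoord:{radius_km:g}km,{lat},{lon}"

def search_url(query, base="https://en.wikipedia.org/w/index.php"):
    """Wrap a query in an ordinary search link (hypothetical helper)."""
    return f"{base}?{urlencode({'search': query})}"

if __name__ == "__main__":
    # Example coordinates are arbitrary; no browser geolocation is involved.
    print(search_url(nearcoord_query(37.77, -122.39, radius_km=2)))
```

A search bar on the nearby page itself would presumably do something similar behind the scenes, but that is a guess rather than a description of how Special:Nearby works.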

Geardona you might want to see Wikipedia:Village_pump_(technical)/Archive_113#Passing_a_location_to_Special:Nearby? Sungodtemple (talk • contribs) 13:01, 4 January 2024 (UTC)

Ok, did not realise that was a feature, maybe add a search bar to the page itself for more user-friendliness? Geardona (talk) 13:03, 4 January 2024 (UTC)

Auto-confirmed

Hi. I’ve realized that it’s insanely easy to get auto-confirmed status… and I thought I had to use articles for creation forever. Would it be a good idea to make it more difficult? Say 50 edits, like on es.wp, or more time editing; one month, maybe? Encyclopédisme (talk) 14:32, 4 January 2024 (UTC)Reply[reply]

@Encyclopédisme

What do you mean that it's easy to get auto-confirmed status ? I've been writing for years now and I still have not had my username confirmed.

Боки ^{Write to me!} 14:38, 4 January 2024 (UTC)Reply[reply]

I mean being able to, say, move pages, create pages etc. You need 10 edits and a 4 day old account. That is auto-confirmed. Encyclopédisme (talk) 14:39, 4 January 2024 (UTC)

@Encyclopédisme Sorry, I mixed it up with auto-patrolled :) My bad !

Боки ^{Write to me!} 17:29, 4 January 2024 (UTC)Reply[reply]

Despite the name, the autopatrolled flag is only handed out manually. Some accounts are marked as autopatrolled fairly quickly; others can be active for many years and create thousands of pages without it. Certes (talk) 20:30, 4 January 2024 (UTC)Reply[reply]

@Certes I am one of those in the second group :) The funniest part is that I am an interface administrator of Serbian Wikipedia, wrote over 800 articles there, yet somehow English Wikipedia needs me to show more values.

Боки ^{Write to me!} 22:25, 4 January 2024 (UTC)Reply[reply]

Please don't take it personally. I've created thousands of pages over the last 16 years and am not autopatrolled. The flag is simply a convenience for patrollers, and doesn't allow the account to do anything it couldn't do anyway. Certes (talk) 22:44, 4 January 2024 (UTC)Reply[reply]

@Боки: You have been autoconfirmed since 03:20, 14 July 2020 (UTC). --Redrose64 🦌 (talk) 22:17, 4 January 2024 (UTC)Reply[reply]

@Redrose64 Yeah, I just realized that I have misread the auto confirmed vs auto patrolled :)

Боки ^{Write to me!} 22:26, 4 January 2024 (UTC)Reply[reply]

The section refers to autoconfirmed status, which is handed out automatically on the account's tenth edit or four days after registering (whichever is later). That link should show a box top left saying "Your account is autoconfirmed" if you are logged in to an account that is not very new. Certes (talk) 14:46, 4 January 2024 (UTC)
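To make the "whichever is later" rule concrete, here is a rough sketch of the check as described in this thread. It is only an illustration, not MediaWiki's actual code: the function name is_autoconfirmed and its parameters are hypothetical, the defaults (10 edits, 4 days) are the current thresholds quoted above, and the stricter values (50 edits, one month, taken here as 30 days) are the ones floated at the top of this section.

```python
from datetime import datetime, timedelta, timezone

def is_autoconfirmed(edit_count, registered, now,
                     min_edits=10, min_age=timedelta(days=4)):
    """Hypothetical check: both thresholds must be met, so the status
    arrives at whichever of the two milestones comes later."""
    return edit_count >= min_edits and (now - registered) >= min_age

if __name__ == "__main__":
    registered = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 6, tzinfo=timezone.utc)
    # Current thresholds as described in the thread.
    print(is_autoconfirmed(12, registered, now))
    # The stricter variant floated above (assumed: 50 edits, 30 days).
    print(is_autoconfirmed(12, registered, now,
                           min_edits=50, min_age=timedelta(days=30)))
```

Under the stricter variant, the same five-day-old account with 12 edits would still be routed through AfC, which is the practical effect being proposed.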

Any thoughts? Would it be possible? Encyclopédisme (talk) 15:43, 4 January 2024 (UTC)

I personally would keep the WP:AFC route, until an AFC reviewer recommends the article author directly publish articles. Having multiple eyes is an asset, not a detriment. I wish sometimes as a niche publisher that more people would review my articles. I say that as someone who is WP:AUTOPATROLLED. But making space for newer article contributors is in the interest of the wider encyclopedia. ~ 🦝 Shushugah (he/him • talk) 17:50, 4 January 2024 (UTC)

I started creating 2 articles already. One of them was reviewed, the other already edited by other editors. The problem is indeed that niche subjects are widely overlooked, and due to the small audience, often state outright false info (Specifically I created articles about the Inca, also widely touched by this), based on old sources, or works of vulgarisation which don’t correspond exactly with the academic consensus. Encyclopédisme (talk) 17:56, 4 January 2024 (UTC)Reply[reply]

Mass patrolling

Hi everyone,

I was just curious if there was any discussion earlier, as I was not able to find it in the archives. If not, is it possible to have mass patrolling done? This could be helpful when dealing with multiple edits, where a user has made minor changes such as adding a specific number or other minor details. Instead of going into each and every single one of the edits, is there any way that mass patrol can be implemented, allowing us to check and approve certain unpatrolled edits more efficiently?

Thanks!

Боки ^{Write to me!} 14:37, 4 January 2024 (UTC)Reply[reply]

@Боки: We don't have edit patrolling enabled on English Wikipedia. Only new pages are patrolled, not individual edits. 🌺 Cremastra (talk) 01:01, 7 January 2024 (UTC)Reply[reply]

@Cremastra If I may ask, why not ? How do you manage the information that gets posted on the Wikipedia pages then ? People can just post anything and everything. There has to be a way that this gets managed. Боки ^{Write to me!} 15:02, 8 January 2024 (UTC)Reply[reply]

@Боки: That's a good question, but I don't really know the answer. Many users informally patrol RecentChanges to watch for vandalism, myself included. We check our watchlists, and keep an eye on worrisome editors. Things seem to generally tick along fine. 🌺 Cremastra (talk) 20:46, 8 January 2024 (UTC)Reply[reply]

@Cremastra What about the fact if someone makes a mistake or puts some incorrect information ? How do users here correct it ? They redo it or do they just revert the edit ? Боки ^{Write to me!} 20:59, 8 January 2024 (UTC)Reply[reply]

Yeah, someone would generally fix the problem or just revert the edit. There has been discussion of enabling edit reviewing lately, but I believe the idea was shot down. I think, in practice, edits are generally reviewed at some time or another, there's just not a special person clicking a "review" button. The process is unofficial and informal. It seems to (mostly) work. 🌺 Cremastra (talk) 21:01, 8 January 2024 (UTC)Reply[reply]

@Cremastra The reason why I am asking is because at Serbian Wikipedia (with far fewer edits, mind you) we have a bunch of reviewers (including myself) who review edits of non-auto-patrolled users, which brings me to the next point: how does a person here on English Wikipedia become auto-patrolled ? Боки ^{Write to me!} 21:04, 8 January 2024 (UTC)

The auto-patrolled right (where your articles are patrolled automatically) is granted by an admin through a formal request process. See WP:PERM/AP. Cheers, 🌺 Cremastra (talk) 22:12, 8 January 2024 (UTC)Reply[reply]

@Cremastra I will definitely work towards that considering I am an interface admin of Serbian Wiki. My only concern is with this amount of edits, does it not "ruin" the reputation of the article if someone can easily add something to the article without anyone noticing it for a while ? Боки ^{Write to me!} 00:26, 9 January 2024 (UTC)

does it not "ruin" the reputation of article if someone can easily add something to the article without anyone noticing it for a while Someone easily adding something to an article is how Wikipedia works. 🌺 Cremastra (talk) 00:31, 9 January 2024 (UTC)Reply[reply]

@Cremastra I know but on this occasion, I am referring to a person adding, for example, "... and this woman has been involved with my dad" (literally) as part of the article. If this does not get patrolled or checked, then this goes on the article that someone will read and say what is going on. Боки ^{Write to me!} 08:45, 9 January 2024 (UTC)

@Боки, I think you might be unclear on the purpose of auto-patrolled. Most new articles are reviewed by a team of editors, the new pages patrol. When an editor has created a lot of acceptable articles, they can be assigned "auto-patrolled" so the reviewing editors have more time to concentrate on other articles. It makes no difference in editing abilities or rights for the editor who has auto-patrolled. Schazjmd (talk) 00:32, 9 January 2024 (UTC)Reply[reply]

@Schazjmd on Serbian Wikipedia, auto-patrolled means we, as patrollers, do not have to check your edits (whether it's new page or just a simple edit on something) any more and you have gained trust that you will not make meaningless edits and that you know what you are doing on Wikipedia. That's what my definition of auto-patrolled is and that's what I am referring to. Боки ^{Write to me!} 08:47, 9 January 2024 (UTC)Reply[reply]

The scale of editors on enwp, as well as automated anti-vandalism tools, leaves mostly good-faith but non-constructive edits to deal with. And it generally works on enwp. Incorrect source verification is probably the hardest challenge we have ~ 🦝 Shushugah (he/him • talk) 09:02, 9 January 2024 (UTC)

A Wikipedia journal

There's a lot of research about Wikipedia, but it tends to be from an 'outsider' perspective: computer scientists and computational social scientists that are interested in Wikipedia because it's a huge, open dataset; critiques of our content, or lack thereof, in specific fields; or, increasingly, experiments in replacing some or all of our work with algorithms. All very interesting and valuable, but what I'd really like to read is more studies of Wikipedia in its own terms. Things like the histories of specific policies, analyses of how processes work, biographies of prominent editors. Research like that does exist (e.g. the WMF's Wikipedia @ 20 edited volume springs to mind), but it's scattered around and hard to find.

If the Wikipedia community was a conventional collective organisation, a scholarly society or a trade union or something, it'd probably already have its own little periodical for that kind of thing. Something like The Signpost, but with bibliographic references, peer review, etc. Written and read primarily by people who are involved in, or at least have a deep knowledge of, the community. It could be hosted on-wiki like the Signpost or, perhaps better for discoverability, somewhere else, as long as it has that rooting in the community. Would anybody else be interested in something like that? – Joe (talk) 08:56, 5 January 2024 (UTC)

Well, I once wrote something that might fit there. XOR'easter (talk) 18:07, 6 January 2024 (UTC)Reply[reply]

Well as you know I don't really agree with what you wrote there. But certainly that could be one role of a journal like this: counter-critiques to academic critiques of Wikipedia are unfortunately not going to be taken as seriously when they're published on Wikipedia itself. – Joe (talk) 13:59, 8 January 2024 (UTC)Reply[reply]

A nice idea! A major issue is the unpaid aspect of it. On the other hand, if an academic is being paid, they can push for open access, open data, etc., which is what a lot of meta:Wikiresearch is. I also think about wikinews:Main and the success/challenges it faces ~ 🦝 Shushugah (he/him • talk) 01:11, 7 January 2024 (UTC)

@Shushugah: Thanks. I'm not sure I follow though, who is(n't) being paid? – Joe (talk) 13:53, 8 January 2024 (UTC)Reply[reply]

Not quite what you're describing, but there are the WikiJournals. The idea there is more about getting wiki contributions to "count for something" by sending them through peer review and formatting them as journal pieces. There was the Wiki Studies Journal which involved several Wikipedians, but it doesn't appear to still be going. Heather Ford kicked off Wikihistories fairly recently -- not sure where that's headed.

Back to your thought, though, it would certainly be interesting. I'd be curious how much enthusiasm there is. I've seen a lot of valuable research projects undertaken by volunteers that would benefit from being cleaned up and formally "published". It may also be useful to provide a forum to publish literature reviews or to critique existing research. — Rhododendrites ^talk \\ 14:26, 7 January 2024 (UTC)Reply[reply]

My thinking exactly. This is the kind of thing people do already, and especially for users that are also in academia, or plan to be, it would be nice to be able to collect formal citations and credit for it.

Level of interest is the key. If the Wiki Studies Journal was a similar effort and failed, then it'd be good to know what went wrong. Otherwise, I was thinking of trying to put together an initial issue of invited contributions. If we couldn't find enough contributors, then we have our answer. – Joe (talk) 13:57, 8 January 2024 (UTC)

Don't think that's a bad idea. A similar organization, the Organization for Transformative Works (which operates the fandom web archive Archive of Our Own), operates its own peer-reviewed academic journal like this. ~ F4U (talk • they/it) 19:16, 7 January 2024 (UTC)

Bibliography articles

We have a number of articles titled 'Bibliography of X'/'X bibliography'. Sometimes these are lists of works by a subject, eg Virginia Woolf bibliography. Sometimes they are lists of works about a subject, eg Bibliography of Andrew Jackson. Sometimes they're both, eg Harold Pinter bibliography. Is "both" a desired approach? For example, if I wanted to split out some of the massive bibliography at Virginia Woolf, would I add it to the existing Virginia Woolf bibliography or would I create a new article? And if the latter, what would that be called to distinguish it from the existing article? Nikkimaria (talk) 21:06, 7 January 2024 (UTC)

That massive bibliography at the Virginia Woolf article isn't just a bibliography, it is part of the references. The article uses shortened footnotes, so each of those sources is the target of a hyperlink from the short footnotes in the references section. So they can't be moved to another article. Since the term "Bibliography" is ambiguous I would rather articles used the terms Citations / References for the two sections rather than References / Bibliography.

This doesn't answer your question, however. StarryGrandma (talk) 10:19, 8 January 2024 (UTC)

Many of the works listed at Virginia Woolf#Bibliography are in fact not referred to by any of the shortened footnotes: more than eighty of them, at a quick count. A script like User:Trappist the monk/HarvErrors marks these.

To answer Nikkimaria's question, the only comparative example I can immediately find is Winston Churchill, which has Bibliography of Winston Churchill for works about Churchill, and Winston Churchill as writer for works by him. Caeciliusinhorto (talk) 20:55, 8 January 2024 (UTC)

Yep, wouldn't be looking at removing any of the sources actually cited, just some of the ones that aren't. Thanks for the example, that's helpful - anyone have thoughts on what the best titling approach would be for these different types of bibliographies? Nikkimaria (talk) 00:04, 9 January 2024 (UTC)

Rethinking WP:Spoken Wikipedia

I think spoken versions of articles have some of the most potential for improvement of any area of the site. Of course, the existing paradigm has an obvious central issue: recordings become out of date almost immediately, which dissuades both potential narrators and listeners. I've thought a bit about this, and I have a preliminary idea for a format that could at least exist alongside the existing spoken articles: abridged spoken sections. Especially on good or featured articles, it seems like sections could be excerpted, possibly adapted to be better read aloud (adapted to a "podcast" form, if you like), and then those could be recorded. Because they are their own text (which would also exist as a readable transcript, of course), they wouldn't immediately go out of date, while reflecting both the work put into the accompanying article and the needs of listeners. Remsense留 04:47, 8 January 2024 (UTC)

Why wouldn't these excerpts also be prone to going out of date?

I'd be interested to know how many people prefer listening to an out of date version of an article, versus having a screen reader read the up-to-date version.

I also wonder if effort could be focused on marking up difficult passages to assist screen readers in some way. Barnards.tar.gz (talk) 08:58, 8 January 2024 (UTC)

Because they would have their own transcript that may be edited to be particularly suited to being read aloud: they would only meaningfully become out of date if the substance of the part of the article that was abridged changes, not just through minor changes in wording or sentence reshuffling.

I think screen readers are the other major reason articles aren't read anymore, but I think (albeit as someone who uses screen readers but does not require them to read) that they're just not as nice a lot of the time? Sure, people can set screen readers to a blistering pace they're still comfortable with, but they still produce errors and best-fit algorithmic awkwardness. There's plenty to explore in a "podcast" presentation to achieve what screen readers cannot. Perhaps the format can diverge even further: during a discussion I was having a few days ago, the possibility of writing for/recording a dialogue format came up, and I think that has potential. Remsense留 20:57, 8 January 2024 (UTC)

I think the best way to solve the outdating issue would be to create a clickable tool or function, built into Wikipedia, that would use something like AI or computer speech to read the text of any article exactly as it currently stands. Helper201 (talk) 21:50, 8 January 2024 (UTC)

As already mentioned, many people already use screen readers that are highly customizable by each individual user: we are discussing a potential form of spoken article that would also be less redundant in the age of screen readers. Remsense留 21:53, 8 January 2024 (UTC)

Screen readers have existed longer than Wikipedia has. They've probably become a bit more mainstream though, with VoiceOver and Google TalkBack being pre-installed on smartphones. As a screen reader user, I'm very text-oriented so I almost never use Spoken Wikipedia and would almost never use spoken excerpts either. I don't think many proficient screen reader users would. Graham87 (talk) 06:08, 9 January 2024 (UTC)

Thank you very much for your insight. This may seem like an off-topic question, but what about podcasts? Are they perceived as too slow or inferior to (hypothetically) equivalent passages from books using a screen reader as well? If not, what advantages do they have? Are there any advantages for you personally to have something narrated by a person as opposed to a screen reader, or are the disadvantages simply too great? Remsense留 06:15, 9 January 2024 (UTC)
