
Making Wiki Gardening Tasks Easier Using Big Data and NLP

Formal Metadata

Title
Making Wiki Gardening Tasks Easier Using Big Data and NLP
Series title
Number of parts
611
Author
License
CC Attribution 2.0 Belgium:
You may use, modify, and reproduce the work or its content in unmodified or modified form for any legal purpose, and distribute it and make it publicly available, provided that you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Year of publication
Language
Production year: 2017

Content Metadata

Subject area
Genre
Abstract
I have been involved with the Fedora Community Operations Team, where I mostly contribute to community-metrics related tasks. This talk is about an NLP-based tool I have built for the Fedora wiki to make wiki-gardening tasks easier for contributors, the methods I used to build it, and how it can be scaled to any other wiki.

The Fedora wiki is a community documentation space for the different projects, initiatives, and contributors in the Fedora Project. Because of the project's size, the wiki has kept growing and has become difficult to manage. As a result, wiki gardening tasks were born. These tasks generally involve not only updating current information on wiki pages, but also identifying pages with redundant information so they can be merged, marking pages with old and outdated information accordingly, and so on. Most of these wiki gardening tasks are part of one-day FADs, hackathons, or sprints, and are suitable for new contributors looking to get involved in the project because they have a low entry barrier. While there has traditionally been a category for marking wiki pages that need to be worked on, this approach is mostly manual (some contributor has to do it before the hackathon or event) and does not cover the whole wiki, which is hard to do by hand given its large size.

My tool uses Natural Language Processing-based techniques to:
1. Identify pages with redundant information so that contributors can merge or delete them.
2. Identify pages in a specific category or topic which need to be worked on.
3. Identify old pages which might need to be updated or categorized as out-of-date.

To make the tool faster, I have used parallel processing techniques. In this talk, I would like to describe the tool and its functionality, the methods I used to build it, and how it can be scaled to any other wiki. I would also like to hear feedback on other wiki gardening tasks this tool could possibly be used for, and how to scale it to automatically merge pages with redundant information.
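The abstract does not spell out which NLP technique the tool uses for the first task (spotting redundant pages); a minimal sketch of one common approach, comparing TF-IDF vectors of page text with cosine similarity, is shown below. The example page texts, the 0.8 similarity threshold, and the use of scikit-learn are assumptions for illustration only, not the tool's actual implementation.

```python
# Sketch only (not the actual tool): flag wiki-page pairs whose text is very
# similar, as candidates for merging or deletion. Page texts and the 0.8
# threshold are illustrative assumptions.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# pages: mapping of wiki page title -> plain-text page content (assumed input,
# e.g. fetched through the MediaWiki API)
pages = {
    "Community Operations": "Fedora CommOps coordinates community metrics ...",
    "CommOps": "The Community Operations team works on community metrics ...",
    "EOL Releases": "List of end-of-life Fedora releases ...",
}

titles = list(pages)
# Represent each page as a TF-IDF vector over its words
vectors = TfidfVectorizer(stop_words="english").fit_transform(pages.values())
similarity = cosine_similarity(vectors)

# Report page pairs above the threshold as merge/delete candidates
THRESHOLD = 0.8
for i, j in combinations(range(len(titles)), 2):
    if similarity[i, j] >= THRESHOLD:
        print(f"Possible redundancy: '{titles[i]}' <-> '{titles[j]}' "
              f"(similarity {similarity[i, j]:.2f})")
```

Pairs flagged this way would still need a contributor to review and merge them by hand, which matches the open question in the abstract about scaling the tool toward automatic merging.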