Making Wiki Gardening Tasks Easier Using Big Data and NLP

FOSDEM VZW

Padalkar, Bee

Formal Metadata

Title

Making Wiki Gardening Tasks Easier Using Big Data and NLP

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Padalkar, Bee

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42187 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

I have been involved with the Fedora Community Operations Team, where I mostly contribute to community-metrics tasks. This talk is about an NLP-based tool I have built for the Fedora wiki to make wiki gardening tasks easier for contributors, the methods I used to build it, and how it can be scaled to any other wiki.

The Fedora wiki is a community documentation space about the different projects, initiatives, and contributors in the Fedora Project. However, because the project is so large, the wiki has kept growing and has become difficult to manage. As a result, wiki gardening tasks were born. These tasks generally involve not only updating current information on wiki pages but also identifying pages with redundant information so they can be merged, marking pages with old and outdated information accordingly, and so on. Most of these wiki gardening tasks are part of one-day FADs, hackathons, or sprints, and they suit new contributors looking to contribute to the project because they have a low entry barrier. While there has traditionally been a category to mark wiki pages that need to be worked on, this marking is mostly manual (some contributor needs to do it before the hackathon or event) and does not cover the whole wiki, since the wiki's large size makes that difficult to do by hand.

My tool uses Natural Language Processing-based techniques to:

1. Identify pages with redundant information so that contributors can merge or delete them.
2. Identify pages in a specific category or topic that need to be worked on.
3. Identify old pages that might need to be updated or categorized as out of date.

To make the tool faster, I have used parallel processing techniques. In this talk, I would like to describe the tool and its functionality, the methods I used to build it, and how it can be scaled to any other wiki. I would also like to hear feedback on other wiki gardening tasks this tool could be used for, and on how to scale it to automatically merge pages with redundant information.
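The abstract does not say which NLP technique the tool uses to find redundant pages. As an illustration only, one common approach is to compare pages by the cosine similarity of their TF-IDF vectors and flag pairs above a threshold as merge candidates. The sketch below assumes that approach; the function names, the sample page titles and text, and the 0.8 threshold are all hypothetical and are not taken from the talk.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(pages):
    """Build a sparse TF-IDF vector (term -> weight) for each page.

    `pages` maps a page title to its plain-text body (hypothetical input).
    """
    counts_by_title = {t: Counter(tokenize(body)) for t, body in pages.items()}
    n_pages = len(pages)
    # Document frequency: number of pages in which each term occurs.
    doc_freq = Counter()
    for counts in counts_by_title.values():
        doc_freq.update(counts.keys())
    vectors = {}
    for title, counts in counts_by_title.items():
        # Smoothed inverse document frequency, so no weight is ever zero.
        vectors[title] = {
            term: tf * (math.log((1 + n_pages) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def redundant_pairs(pages, threshold=0.8):
    """Return (title_a, title_b, score) for page pairs scoring >= threshold.

    The threshold is an assumed tuning knob, not a value from the talk.
    """
    vectors = tfidf_vectors(pages)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, score))
    # Most similar pairs first, so reviewers see the likeliest merges on top.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical sample data standing in for real wiki page text.
sample_pages = {
    "Install_guide": "How to install Fedora on a laptop using a live USB image",
    "Installation": "How to install Fedora on a laptop using a live USB image and boot it",
    "Meeting_notes": "Minutes of the community operations meeting held on IRC",
}

for a, b, score in redundant_pairs(sample_pages):
    print(f"{a} <-> {b}: {score:.2f}")
```

The pairwise comparison grows quadratically with the number of pages, which is one place where the parallel processing mentioned in the abstract could plausibly help: the pairs are independent, so they could be split across worker processes. Whether the tool parallelizes this step or another one is not stated.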