I have been involved with the Fedora Community Operations Team, where I mostly contribute to community-metrics tasks. This talk is about an NLP-based tool I have built for the Fedora wiki to make wiki-gardening tasks easier for contributors, the methods I used to build it, and how it can be scaled to any other wiki.
The Fedora wiki is a community documentation space covering the projects, initiatives, and contributors in the Fedora Project. Because the project is so large, the wiki has kept growing and has become difficult to manage. As a result, wiki-gardening tasks were born. These tasks generally involve updating current information on wiki pages, identifying pages with redundant information so they can be merged, and marking pages with old or outdated information accordingly.
Most wiki-gardening tasks are part of one-day FADs, hackathons, or sprints, and they suit new contributors because they have a low entry barrier. There has traditionally been a category for marking wiki pages that need work, but the marking is mostly manual (some contributor has to do it before the hackathon or event), and it does not cover the whole wiki because the wiki is too large to review by hand.
My tool uses Natural Language Processing-based techniques to:
1. Identify pages with redundant information so that contributors can merge or delete them.
2. Identify pages in a specific category or topic that need to be worked on.
3. Identify old pages that might need to be updated or categorized as out-of-date.
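As a rough illustration of the first task only: the abstract does not say which NLP technique the tool uses, so the sketch below assumes a simple bag-of-words cosine similarity between page texts. The function names, the sample page titles and texts, and the 0.8 threshold are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is not similar to anything
    return dot / (norm_a * norm_b)


def find_redundant_pairs(pages, threshold=0.8):
    """Return (title_a, title_b, score) for page pairs at or above threshold.

    `pages` maps a page title to its plain text. The threshold is an
    assumed value, not one taken from the tool described in the talk.
    """
    vectors = {title: Counter(tokenize(text)) for title, text in pages.items()}
    pairs = []
    for title_a, title_b in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[title_a], vectors[title_b])
        if score >= threshold:
            pairs.append((title_a, title_b, score))
    # Most similar pairs first, so contributors see the likeliest duplicates.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical sample pages, not real Fedora wiki content.
sample_pages = {
    "CommOps/Join": "How to join the Community Operations team and start contributing",
    "CommOps/Joining": "How to join the Community Operations team and start contributing today",
    "Packaging/Guidelines": "Rules for building and reviewing RPM packages",
}

for title_a, title_b, score in find_redundant_pairs(sample_pages):
    print(f"{title_a} <-> {title_b}: {score:.2f}")
```

Comparing every pair of pages grows quadratically with the number of pages, which is one reason a tool like this would benefit from being parallelized on a large wiki.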
To make the tool faster, I have used parallel processing techniques. In this talk, I would like to describe the tool and its functionality, the methods I used to build it, and how it can be scaled to any other wiki. I would also like to hear feedback on other wiki-gardening tasks this tool could be used for, and on how to extend it to automatically merge pages with redundant information.