Text categorization with Apache Lucene

Formal metadata

Title
Text categorization with Apache Lucene
Series title
Number of parts
69
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided that you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Abstract
Like many text mining applications, automatic text categorization is usually implemented with some flavor of machine learning algorithm, which is trained on an appropriate training set to build a model. This model contains statistical data derived from the training set texts, which later allows the system to match an input document with the corresponding category. But wait… statistical data on text documents? Doesn't that remind you of our dear, improved inverted index at the core of Apache Lucene? Maybe we could treat the index containing the training set documents as our trained model? And maybe a simple query against this index could give us the category (or categories) most likely to apply to a given document? In this talk, we will demonstrate this approach and show that it can perform well, and possibly "for free", as we surely all have Lucene-based tools in our application portfolios!
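
The idea of using an index as the trained model can be sketched in a few lines. The example below is not Lucene code and not necessarily the speaker's implementation: it is a minimal plain-Python stand-in, assuming a toy inverted index with a simple TF-IDF-style score and a score-weighted vote over the top `k` hits (a k-nearest-neighbour reading of "query the index to get the category"). The names `TinyIndex`, `classify` and `model`, the scoring formula, and the four training sentences are all hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index (hypothetical): term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.categories = []  # position = doc_id, value = category label

    def add(self, text, category):
        """'Training' is just indexing a labelled document."""
        doc_id = len(self.categories)
        self.categories.append(category)
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, text, k):
        """Return the k best (doc_id, score) pairs for the query text."""
        n_docs = len(self.categories)
        scores = defaultdict(float)
        for term in set(tokenize(text)):
            hits = self.postings.get(term, {})
            if not hits:
                continue
            # Assumed scoring: sqrt(tf) * log(1 + N / df), summed over terms.
            idf = math.log(1 + n_docs / len(hits))
            for doc_id, tf in hits.items():
                scores[doc_id] += math.sqrt(tf) * idf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def classify(index, text, k=3):
    """Use the input document as the query; let the top-k hits vote by score."""
    votes = defaultdict(float)
    for doc_id, score in index.search(text, k):
        votes[index.categories[doc_id]] += score
    return max(votes, key=votes.get) if votes else None


# The "trained model" is simply the index of labelled training documents.
model = TinyIndex()
model.add("the striker scored a late goal in the football match", "sports")
model.add("the team won the championship after a tense match", "sports")
model.add("the central bank raised interest rates to curb inflation", "finance")
model.add("stock markets fell as investors feared rising inflation", "finance")

print(classify(model, "a goal in the final match"))
print(classify(model, "investors worry about inflation and interest rates"))
```

With this toy data the two calls print `sports` and `finance`. Adding a new labelled document updates the "model" immediately, with no separate retraining step, which is the sense in which the classifier comes nearly "for free" once an index already exists.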