We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Full-Text Search in Django with PostgreSQL

00:00

Formal Metadata

Title
Full-Text Search in Django with PostgreSQL
Title of Series
Number of Parts
160
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Full-Text Search in Django with PostgreSQL [EuroPython 2017 - Talk - 2017-07-12 - PythonAnywhere Room] [Rimini, Italy] After some experiences in the implementation of full-text search functionality with different system, we have decided to use PostgreSQL to implement full-text search functionality in our next project, a website to search for shows, venues, bands and festivals. In the past, I have worked in two different projects, a mobile platform to sell and buy used items and a sport videos sharing platform, where I used two of the most currently famous full-text search software (Elasticsearch or Solr) but I had some synchronization and management problems. After that, in my company, we searched for new Django support of full-text search PostgreSQL implementation and we decided to use it to avoid any problems that I had in the past. I’m going to start speaking about the full-text search in a general context and I want to show the problems I encountered implementing it in the past. Afterwards, I’m going to talk about the PostgreSQL functionality to implement the full-text search functionality and also present the django.contrib.potgres.search module, with step-by-step demonstrations of its functions with real world data. Finally, I’m going to show the way we use and test this functionality in our project and which functionality lacks us to have a complete implementation of full-text search in our project. At the end, I want to present my conclusions about our solution and I want to explore some new features that will be present in the next versions of Django and PostgreSQL
E-textSoftwareIntelComputerSoftware developerImplementationDatabaseExecution unitRankingSubject indexingMobile WebDigital rights managementElasticity (physics)VideoconferencingSynchronizationSolrComputer clusterDevice driverFocus (optics)Physical systemEmailMessage passingModule (mathematics)Revision controlBinary fileComputer programmingFront and back endsElasticity (physics)Java appletSystem administratorPhysical systemPresentation of a groupDevice driverProjective planeVector spaceTable (information)SynchronizationPoint (geometry)Query languageWordSubject indexingBounded variationDatabase1 (number)Different (Kate Ryan album)Module (mathematics)Revision controlPatch (Unix)Cartesian coordinate systemInsertion lossAxiom of choiceSoftware developerCASE <Informatik>Computer scienceSoftwareImplementationINTEGRALCodeField (computer science)EmailElectronic mailing listReal numberMultiplicationScheduling (computing)VideoconferencingWeb pageDigital rights managementProcess (computing)CodebuchOrder (biology)Asynchronous Transfer ModeForm (programming)PurchasingScalar fieldLie groupGenderFormal languageSingle-precision floating-point formatBasis <Mathematik>SupremumSoftware engineeringAreaWeb 2.0Generic programmingGreatest elementMedical imagingSearch engine (computing)QuicksortMobile app
Module (mathematics)Revision controlBinary fileSubject indexingFocus (optics)Computer programmingBlogData modelEndliche ModelltheorieStandard deviationQuery languageObject (grammar)ResultantSocial classQuicksortSubject indexingFunctional (mathematics)CASE <Informatik>Medical imagingOrder (biology)BlogModule (mathematics)Endliche Modelltheorie
Operations researchObject (grammar)Query languageSimilarity (geometry)Vector spaceDigital filterSoftware engineeringRankingEndliche ModelltheorieConfiguration spaceNumberRadarMusical ensembleRevision controlObject (grammar)Query languageImpulse responseRankingWorkstation <Musikinstrument>Vector spaceVarianceNetwork topologyField (computer science)Different (Kate Ryan album)Musical ensembleString (computer science)Endliche ModelltheorieWebsiteSimilarity (geometry)NumberMoment (mathematics)Web 2.0Server (computing)Projective planeRevision controlCASE <Informatik>Operator (mathematics)Message passingLogicMiniDiscTheory of relativityException handlingWeightFormal languageStability theoryVector fieldVector potentialExtension (kinesiology)Table (information)KlassenkörpertheoriePort scannerWordModule (mathematics)Real numberXML
Revision controlMusical ensembleVector spaceConfiguration spaceQuery languageBit rateDigital filterObject (grammar)Software testingRevision controlForm (programming)Social classMechanism designLogicSearch engine (computing)Digital rights managementMusical ensembleSoftware testingXMLUMLProgram flowchart
Object (grammar)Musical ensembleSoftware testingElectronic mailing listBit rateFormal languageConfiguration spaceE-textSubject indexingCondition numberDigital rights managementSynchronizationStack (abstract data type)Integrated development environmentOpen sourceCodierung <Programmierung>GoogolIntelSoftwareWordCore dumpCondition numberField (computer science)Musical ensembleComponent-based software engineeringOrder (biology)Event horizonIntegrated development environmentFunctional (mathematics)Electronic mailing listDifferent (Kate Ryan album)QuantumTable (information)Bit rateMultiplication signRadiusPresentation of a groupNumberRight angleVideoconferencingWebsiteSoftware developerRun time (program lifecycle phase)ResultantOcean currentSoftware testingSlide ruleSoftwareComputer animation
E-textObject (grammar)Musical ensembleSoftware testingBit rateRevision controlQuery languageRankingMobile WebDigital rights managementVector spaceDigital filterBit rateWeightSlide ruleWeb pageInformationRankingNumberGame controllerWordFreezingMathematicsLogicInsertion lossFilm editingType theoryFunctional (mathematics)Different (Kate Ryan album)Shared memoryState of matterResultantRight angleFrequencyMathematical analysisSet (mathematics)DemosceneSocial classClassical physicsProjective planeSearch engine (computing)Vector spaceElasticity (physics)DatabaseCountingProper mapField (computer science)Meeting/Interview
Product (business)FlagMultiplication signVirtual machineField (computer science)Projective planeProcess (computing)Presentation of a groupVolume (thermodynamics)TheoryDatabaseElasticity (physics)Game controllerSet (mathematics)Meeting/Interview
Transcript: English(auto-generated)
Hello everyone, this is my first English talk and they have put it in the schedule just few days ago So excuse me for reading some notes Anyway, I'm happy to be here with you We are here to find out more about full text search in Django with Postgres
My name is Paolo Melchior and I'm Italian. I'm a computer science engineer I am a backend Python developer for more than 10 years and Django developer for about five years. At the present, I'm working remotely at 20TAB
As a senior software engineer, I'm not a database administrator But I'm a loyal user of Postgres in all my projects With this talk I want to show you how we have used Django full text search and Postgres in a real project The main reason why we used Django and Postgres-based full text search
Was because we prefer to implement full text search without any external tools These are the main topics of this presentation Full text search in general existing solution for full text search
Full text search support for in Postgres, Django support for full text search The project concertiaroma.com, the next innovation in full text search some personal conclusion and any question after the talk
Full text search derives from the need to do some document search For example to file documents that contain specific word and its variation If a document contain house or houses, it will be the same for the search Some example of everyday use of full text search are search engine, document search, email searching and so on
This is a list of some feature that we can find in an enough advanced full text search solution That can we use in a real project Stemming, ranking, stopword removal, multiple language, accent support, indexing and phrase search
Elexis search and Solr are two solution for the full text search that today are popular and used There are others, but these are the only ones that I have used in my professional projects
They are Lucene based and written in Java Snapmarket was a startup where I worked in the past and that produced a mobile phone application for
Sell and buy used items. In this project I used Elastic that had already been set up on the system But we have some difficulty managing and synchronizing it We had to apply some patches to Java plugin that we used for the compound of words in German and
I didn't particularly enjoyed it Another project was goldscout.com is a website dedicated to show sports videos uploaded by public user and it has
About 25,000 videos The use of Solr for full text search in this project was a customer choice we always had some problems synchronizing the data and At the end we prefer doing or writing on Postgres and or reading on Apache Solr
This solution that I've spoken about our full future and advanced. There are many online resources regarding it documentation article frequently as well question But I found some problem in synchronization and I've always had to use a driver
To connect with it. It's a bottleneck between the Django and insertion giant In some case I have to fix the code Personally I am a more adept than an ops so I don't like to be forced to integrate value system
I prefer developing a solving problem writing Python code Postgres has been supporting full text search since 2008 Internally it used TS vector and TS query to process the data into search It has some indexes that can be used to speed up the search gene and gist
Postgres has added support for fresh searching in 2016 Okay before we define the full text search using the concept of document The document is the general concept
Used in the full full text search and where the search is done in a database a document can be a field on a table The aggregation of four more field on a table or in different table The module Django point country point Postgres contain the support for the full text search
The support for full text search has been present in this module since the 1.10 version Instead brain engine indexes have been added in the 1.11 version
The gene index is very useful to speed up full text search The use of Postgres full text search in Django is more developer friendly for me Okay, let's look at the function of full text search in Django Starting from the models present in the search documentation the official documentation
We have a blog and an outer classes connected from an entry classes These are the basic search that we can use on Model in Django using filter with contain in the first case
In the second case using case incentive contains In order to get more results We can activating the an accent Postgres module So we can use the an accent access extension to search without worrying about
Accented the characters is most useful in various languages also in Italian This involved a full table scan Execution and its potential is low Another extension is trigram
Activating activating the trigram Postgres module we can use the trigram extension a Trigram is a group of three consecutive characters taken from a string We can evaluate the similarity of two string by the number of the trigram they share It's more. It's a lot useful, but
Sometime is not enough. This is the base search lookup of Django and With these we can execute a real full text search on a field, but is a real simple example
We can use search vector to search on more field on the same object object or connected object like in this case When we pass text to full text search by a search query we can apply operation of stemming and stop word removing
Even on the user text and on this we can apply basic logical operation we can use also the Postgres rank to calculate the score of a document relation to a search text and
We can use it to filter and to sort it we can set up the search vector to execute the stemming and stop word removing for a Language and we can get these languages also from a class field
it's possible to set up the search to give a different weights on various fields and the use and We can use these values in the search for filtering or ordering
We can decide to add to the model the search vector field to speed up the search It's very fast But we have to update this field manually for example using Django signal or with Postgres trigger Okay, this is the project
We are working on concertaroma.com Is a website to insert and to search for show festival bands venue all in the city of Rome At the moment the website has the following numbers bar are growing up and
The website is online since 2014 This is the version number two is the old version of the website was developed some years ago with Django 1.7 and it runs on Python 2.7 the data was managed by Postgres version 9.1 and the search is performed only using
SQL like syntax with filter and so on This is the version 3 is the new version Recently released Was developed with Django version 1.11 and it runs on Python 3.6
The data is managed by Postgres version 9.6 and the search user its full text version search engine Let's have a look at an example of manager defined for the band class
It define a search method that contain all the full text search logic It's more complex than the before example To better understand the mechanism we can take into consideration example of a simplified test internally defined
In this test setup we define the example data that we will use afterwards to test our search Two bands and two musical genres that we assign to the two bands
In the search test on the bands We simply invoked the search method giving a search text and we got back the list of values for the fields nickname and rate Nickname is stored on the band table while rate is calculated by our search method at runtime
In this example we compared Our search result with a list of list where we define the pair composed of the bands nickname at the numerical value That is the search rate or in other words the numerical value that defines the importance of
that We have seen a simplified the use of current feature of Django and the Postgres full text search module itself Both of these software are getting better in these fields and
These are some of the feature that can be available in the next feature Okay, in conclusion the following under condition we evaluated to implement this solution Not having any extra dependencies not doing too complex search
managing easily or the components Synchronizing data is not requested Postgres is already available in our stack and Is present a Python only environments These are the resources that I used to prepare this talk and to develop the search function
I showed you and we used on the website. I would like to thank 20 tab the company I work for To have given me this opportunity and Mark Tamlin the original Django country Postgres developer for sharing with everyone
His work. Thanks everyone for time that you have spent listening to me This presentation will release the with creative commons and you can download it from my speaker deck account after some hours
If anyone has any question, please wait after the presentation and if you want you can contact me here Thank you Thanks bono for for your presentation
Hello, really interesting talk. I was wondering because in one of the slides if you've got get a few slides back
You're showing the rating I'm not going to ask how exactly do we calculate the rating because it's I can imagine that's propriety information We're doing something similar, but I'm wondering can you speak? Oh, sorry
I'm wondering how much control do you actually have on the fact how the rating is? Generate can you for example say I want to rate higher Words that are longer than five characters so that if they appear early in the sentence Can I put a higher value of this one? Sorry?
For me, it's difficult to understand clearly your question. Can you come here and This is the search rank Okay, let's say you have an example
a Miley Cyrus is playing a throne on in July, right? Mm-hmm, and someone is searching for Cyrus will that be ranked high if you want to say Oh, this is the Cyrus appearing on the second as a second word in the sentence
So if I find another sentence that it's appearing on the as a first word, I want to run that one higher Do we have that kind of controlling postcard? You can control with the these other these are functionality Sorry
To get back the functionality is the weight set weight, but I'm searching
Okay You can specify the weights for something and In Postgres you can specify For type of weights from hey to to see 2d sorry and you can change the number related to this weight and
Configuring a lot of this type of number you can have different type of weight of final weight There is also two different type to calculate the ranking Postgres TS rank and TS rank CD and that have different logic
The first one Count the frequency of the word in your documents and the second one of a very complex Logic is more flexible and we used to Construct a search vector with 10 or 11 of these you I showed you in the example so
We worked a lot in this fields to calibrate and to settings in very low details the results in the search instead of Having some trouble and searching to fix the synchronizing like in other solution, but I think is
it's advanced enough to to setting a lot Can you give us some more information about
The difficulties you encountered in the synchronizing the document database with the classic search engine like
Solar or elastic search. Thank you. Okay In both the project I speak it before And there is there was a guy that managed to synchronize this the Postgres
database and elastic search of solar engine and in control the lot of problem of of timing sometimes the the user want to Flag for example in our set flag a concert to to like and
You have to wait The data came from the database to The other engine elastic search and every time is a slow process Especially when the volume of your data grow up
You Can you can have a very good machine to? Put on a sister for sure for example, but in this project This was a very little project. So we don't have a lot of big machine to do that and with Postgres we
Don't don't have this problem at all because the data and the full text search is in the same Same PC It's good the last question to Paul
No So, let's thanks again Paul for his interesting presentation Thanks, Paul. Thank you