Full-Text Search in Django with PostgreSQL
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 160 | |
Author | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/33778 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
00:00
E-textSoftwareIntelComputerSoftware developerImplementationDatabaseExecution unitRankingSubject indexingMobile WebDigital rights managementElasticity (physics)VideoconferencingSynchronizationSolrComputer clusterDevice driverFocus (optics)Physical systemEmailMessage passingModule (mathematics)Revision controlBinary fileComputer programmingFront and back endsElasticity (physics)Java appletSystem administratorPhysical systemPresentation of a groupDevice driverProjective planeVector spaceTable (information)SynchronizationPoint (geometry)Query languageWordSubject indexingBounded variationDatabase1 (number)Different (Kate Ryan album)Module (mathematics)Revision controlPatch (Unix)Cartesian coordinate systemInsertion lossAxiom of choiceSoftware developerCASE <Informatik>Computer scienceSoftwareImplementationINTEGRALCodeField (computer science)EmailElectronic mailing listReal numberMultiplicationScheduling (computing)VideoconferencingWeb pageDigital rights managementProcess (computing)CodebuchOrder (biology)Asynchronous Transfer ModeForm (programming)PurchasingScalar fieldLie groupGenderFormal languageSingle-precision floating-point formatBasis <Mathematik>SupremumSoftware engineeringAreaWeb 2.0Generic programmingGreatest elementMedical imagingSearch engine (computing)QuicksortMobile app
05:52
Module (mathematics)Revision controlBinary fileSubject indexingFocus (optics)Computer programmingBlogData modelEndliche ModelltheorieStandard deviationQuery languageObject (grammar)ResultantSocial classQuicksortSubject indexingFunctional (mathematics)CASE <Informatik>Medical imagingOrder (biology)BlogModule (mathematics)Endliche Modelltheorie
06:51
Operations researchObject (grammar)Query languageSimilarity (geometry)Vector spaceDigital filterSoftware engineeringRankingEndliche ModelltheorieConfiguration spaceNumberRadarMusical ensembleRevision controlObject (grammar)Query languageImpulse responseRankingWorkstation <Musikinstrument>Vector spaceVarianceNetwork topologyField (computer science)Different (Kate Ryan album)Musical ensembleString (computer science)Endliche ModelltheorieWebsiteSimilarity (geometry)NumberMoment (mathematics)Web 2.0Server (computing)Projective planeRevision controlCASE <Informatik>Operator (mathematics)Message passingLogicMiniDiscTheory of relativityException handlingWeightFormal languageStability theoryVector fieldVector potentialExtension (kinesiology)Table (information)KlassenkörpertheoriePort scannerWordModule (mathematics)Real numberXML
10:21
Revision controlMusical ensembleVector spaceConfiguration spaceQuery languageBit rateDigital filterObject (grammar)Software testingRevision controlForm (programming)Social classMechanism designLogicSearch engine (computing)Digital rights managementMusical ensembleSoftware testingXMLUMLProgram flowchart
11:25
Object (grammar)Musical ensembleSoftware testingElectronic mailing listBit rateFormal languageConfiguration spaceE-textSubject indexingCondition numberDigital rights managementSynchronizationStack (abstract data type)Integrated development environmentOpen sourceCodierung <Programmierung>GoogolIntelSoftwareWordCore dumpCondition numberField (computer science)Musical ensembleComponent-based software engineeringOrder (biology)Event horizonIntegrated development environmentFunctional (mathematics)Electronic mailing listDifferent (Kate Ryan album)QuantumTable (information)Bit rateMultiplication signRadiusPresentation of a groupNumberRight angleVideoconferencingWebsiteSoftware developerRun time (program lifecycle phase)ResultantOcean currentSoftware testingSlide ruleSoftwareComputer animation
14:38
E-textObject (grammar)Musical ensembleSoftware testingBit rateRevision controlQuery languageRankingMobile WebDigital rights managementVector spaceDigital filterBit rateWeightSlide ruleWeb pageInformationRankingNumberGame controllerWordFreezingMathematicsLogicInsertion lossFilm editingType theoryFunctional (mathematics)Different (Kate Ryan album)Shared memoryState of matterResultantRight angleFrequencyMathematical analysisSet (mathematics)DemosceneSocial classClassical physicsProjective planeSearch engine (computing)Vector spaceElasticity (physics)DatabaseCountingProper mapField (computer science)Meeting/Interview
18:48
Product (business)FlagMultiplication signVirtual machineField (computer science)Projective planeProcess (computing)Presentation of a groupVolume (thermodynamics)TheoryDatabaseElasticity (physics)Game controllerSet (mathematics)Meeting/Interview
Transcript: English(auto-generated)
00:05
Hello everyone, this is my first English talk and they have put it in the schedule just few days ago So excuse me for reading some notes Anyway, I'm happy to be here with you We are here to find out more about full text search in Django with Postgres
00:24
My name is Paolo Melchior and I'm Italian. I'm a computer science engineer I am a backend Python developer for more than 10 years and Django developer for about five years. At the present, I'm working remotely at 20TAB
00:41
As a senior software engineer, I'm not a database administrator But I'm a loyal user of Postgres in all my projects With this talk I want to show you how we have used Django full text search and Postgres in a real project The main reason why we used Django and Postgres-based full text search
01:04
Was because we prefer to implement full text search without any external tools These are the main topics of this presentation Full text search in general existing solution for full text search
01:21
Full text search support for in Postgres, Django support for full text search The project concertiaroma.com, the next innovation in full text search some personal conclusion and any question after the talk
01:40
Full text search derives from the need to do some document search For example to file documents that contain specific word and its variation If a document contain house or houses, it will be the same for the search Some example of everyday use of full text search are search engine, document search, email searching and so on
02:07
This is a list of some feature that we can find in an enough advanced full text search solution That can we use in a real project Stemming, ranking, stopword removal, multiple language, accent support, indexing and phrase search
02:32
Elexis search and Solr are two solution for the full text search that today are popular and used There are others, but these are the only ones that I have used in my professional projects
02:45
They are Lucene based and written in Java Snapmarket was a startup where I worked in the past and that produced a mobile phone application for
03:01
Sell and buy used items. In this project I used Elastic that had already been set up on the system But we have some difficulty managing and synchronizing it We had to apply some patches to Java plugin that we used for the compound of words in German and
03:24
I didn't particularly enjoyed it Another project was goldscout.com is a website dedicated to show sports videos uploaded by public user and it has
03:41
About 25,000 videos The use of Solr for full text search in this project was a customer choice we always had some problems synchronizing the data and At the end we prefer doing or writing on Postgres and or reading on Apache Solr
04:06
This solution that I've spoken about our full future and advanced. There are many online resources regarding it documentation article frequently as well question But I found some problem in synchronization and I've always had to use a driver
04:24
To connect with it. It's a bottleneck between the Django and insertion giant In some case I have to fix the code Personally I am a more adept than an ops so I don't like to be forced to integrate value system
04:42
I prefer developing a solving problem writing Python code Postgres has been supporting full text search since 2008 Internally it used TS vector and TS query to process the data into search It has some indexes that can be used to speed up the search gene and gist
05:06
Postgres has added support for fresh searching in 2016 Okay before we define the full text search using the concept of document The document is the general concept
05:22
Used in the full full text search and where the search is done in a database a document can be a field on a table The aggregation of four more field on a table or in different table The module Django point country point Postgres contain the support for the full text search
05:45
The support for full text search has been present in this module since the 1.10 version Instead brain engine indexes have been added in the 1.11 version
06:00
The gene index is very useful to speed up full text search The use of Postgres full text search in Django is more developer friendly for me Okay, let's look at the function of full text search in Django Starting from the models present in the search documentation the official documentation
06:25
We have a blog and an outer classes connected from an entry classes These are the basic search that we can use on Model in Django using filter with contain in the first case
06:43
In the second case using case incentive contains In order to get more results We can activating the an accent Postgres module So we can use the an accent access extension to search without worrying about
07:03
Accented the characters is most useful in various languages also in Italian This involved a full table scan Execution and its potential is low Another extension is trigram
07:22
Activating activating the trigram Postgres module we can use the trigram extension a Trigram is a group of three consecutive characters taken from a string We can evaluate the similarity of two string by the number of the trigram they share It's more. It's a lot useful, but
07:43
Sometime is not enough. This is the base search lookup of Django and With these we can execute a real full text search on a field, but is a real simple example
08:02
We can use search vector to search on more field on the same object object or connected object like in this case When we pass text to full text search by a search query we can apply operation of stemming and stop word removing
08:25
Even on the user text and on this we can apply basic logical operation we can use also the Postgres rank to calculate the score of a document relation to a search text and
08:41
We can use it to filter and to sort it we can set up the search vector to execute the stemming and stop word removing for a Language and we can get these languages also from a class field
09:07
it's possible to set up the search to give a different weights on various fields and the use and We can use these values in the search for filtering or ordering
09:21
We can decide to add to the model the search vector field to speed up the search It's very fast But we have to update this field manually for example using Django signal or with Postgres trigger Okay, this is the project
09:41
We are working on concertaroma.com Is a website to insert and to search for show festival bands venue all in the city of Rome At the moment the website has the following numbers bar are growing up and
10:01
The website is online since 2014 This is the version number two is the old version of the website was developed some years ago with Django 1.7 and it runs on Python 2.7 the data was managed by Postgres version 9.1 and the search is performed only using
10:27
SQL like syntax with filter and so on This is the version 3 is the new version Recently released Was developed with Django version 1.11 and it runs on Python 3.6
10:44
The data is managed by Postgres version 9.6 and the search user its full text version search engine Let's have a look at an example of manager defined for the band class
11:02
It define a search method that contain all the full text search logic It's more complex than the before example To better understand the mechanism we can take into consideration example of a simplified test internally defined
11:25
In this test setup we define the example data that we will use afterwards to test our search Two bands and two musical genres that we assign to the two bands
11:41
In the search test on the bands We simply invoked the search method giving a search text and we got back the list of values for the fields nickname and rate Nickname is stored on the band table while rate is calculated by our search method at runtime
12:03
In this example we compared Our search result with a list of list where we define the pair composed of the bands nickname at the numerical value That is the search rate or in other words the numerical value that defines the importance of
12:22
that We have seen a simplified the use of current feature of Django and the Postgres full text search module itself Both of these software are getting better in these fields and
12:43
These are some of the feature that can be available in the next feature Okay, in conclusion the following under condition we evaluated to implement this solution Not having any extra dependencies not doing too complex search
13:05
managing easily or the components Synchronizing data is not requested Postgres is already available in our stack and Is present a Python only environments These are the resources that I used to prepare this talk and to develop the search function
13:26
I showed you and we used on the website. I would like to thank 20 tab the company I work for To have given me this opportunity and Mark Tamlin the original Django country Postgres developer for sharing with everyone
13:45
His work. Thanks everyone for time that you have spent listening to me This presentation will release the with creative commons and you can download it from my speaker deck account after some hours
14:03
If anyone has any question, please wait after the presentation and if you want you can contact me here Thank you Thanks bono for for your presentation
14:29
Hello, really interesting talk. I was wondering because in one of the slides if you've got get a few slides back
14:47
You're showing the rating I'm not going to ask how exactly do we calculate the rating because it's I can imagine that's propriety information We're doing something similar, but I'm wondering can you speak? Oh, sorry
15:01
I'm wondering how much control do you actually have on the fact how the rating is? Generate can you for example say I want to rate higher Words that are longer than five characters so that if they appear early in the sentence Can I put a higher value of this one? Sorry?
15:23
For me, it's difficult to understand clearly your question. Can you come here and This is the search rank Okay, let's say you have an example
15:42
a Miley Cyrus is playing a throne on in July, right? Mm-hmm, and someone is searching for Cyrus will that be ranked high if you want to say Oh, this is the Cyrus appearing on the second as a second word in the sentence
16:01
So if I find another sentence that it's appearing on the as a first word, I want to run that one higher Do we have that kind of controlling postcard? You can control with the these other these are functionality Sorry
16:21
To get back the functionality is the weight set weight, but I'm searching
16:41
Okay You can specify the weights for something and In Postgres you can specify For type of weights from hey to to see 2d sorry and you can change the number related to this weight and
17:03
Configuring a lot of this type of number you can have different type of weight of final weight There is also two different type to calculate the ranking Postgres TS rank and TS rank CD and that have different logic
17:23
The first one Count the frequency of the word in your documents and the second one of a very complex Logic is more flexible and we used to Construct a search vector with 10 or 11 of these you I showed you in the example so
17:46
We worked a lot in this fields to calibrate and to settings in very low details the results in the search instead of Having some trouble and searching to fix the synchronizing like in other solution, but I think is
18:08
it's advanced enough to to setting a lot Can you give us some more information about
18:27
The difficulties you encountered in the synchronizing the document database with the classic search engine like
18:42
Solar or elastic search. Thank you. Okay In both the project I speak it before And there is there was a guy that managed to synchronize this the Postgres
19:01
database and elastic search of solar engine and in control the lot of problem of of timing sometimes the the user want to Flag for example in our set flag a concert to to like and
19:21
You have to wait The data came from the database to The other engine elastic search and every time is a slow process Especially when the volume of your data grow up
19:44
You Can you can have a very good machine to? Put on a sister for sure for example, but in this project This was a very little project. So we don't have a lot of big machine to do that and with Postgres we
20:03
Don't don't have this problem at all because the data and the full text search is in the same Same PC It's good the last question to Paul
20:22
No So, let's thanks again Paul for his interesting presentation Thanks, Paul. Thank you