Merken

Multi-table Full Text Search with Postgres

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
and that and that and you they don't do that without so they're all of the word popular somewhat non-standard approach to search in your application and often searches done in some sort of an external service this article that it better it better all just stand like this was starting over that extra time there were talk about somewhat non-standard approach for certain applications often searches can be done with some sort of an external service so rather than doing that we're gonna use the tool that's already responsible for storing and finding data welcome the multi-table full-text search and those graphs the I'm killed Thompson we're not
doing a wednesday held you can find in these places on the internet feel free to tweet to me during the talk of that ship that but if you're not treating please do close your laptop there's been a lot of code here and it's going be hard to get anything out of the stock duality now that I
told you that I you should close your laptops and I don't like to jump in the credentials meaning the top URI here and that sort of what the point is that the object with the bio still hear a little bit more about me at the end but you have 1 claim to fame and that's that my birthday is in some don't interact the non-expert I'm going to talk to you about a real
feature we're gonna iterate explore other options and optimize just like we would when we're developing a feature in applications
retired of full-text search what it is how it can help us hopefully you could guess that
the the title where to talk about the use and no not
the HTML views were in fact that databases naturally we want to explore the performance implications of whatever we're going so we're gonna look at some of the performance implications and how we can mitigate and we're to talk about
materialize these costs as 1 way to do that
will look at a couple of gems they can help out we're doing this fun stuff and of course for look
at all the other options for some of the other options for what right let's look at sort of the the classic
example of search for articles in our applications the exact the the simplest thing we
could possibly do that works is to search for a substring so here we've got and articles where the body includes has that look and terrible the and to make a point of keep going time it was a matter of fact it related articles where the body include some substring Pasternak where you've probably seen this in applications and this works but it works if you're users know exactly what they're searching for so if they want to find an article with say the word book at and and they know that is going to be lowercase in the middle of a sentence someone and they can search like this and you can pass answer trade and I like a so that's doing exact substring the the most useful thing and so tiny
little step forward we could take years to do the case insensitive like with like I like in I don't like this but it is slightly better right well let's leave that and it sort of does what we need for now I we don't even know where in the where in the sentence the word that you're searching for is that's cool and but it will you know naturally features expand we need a search based on the titles of articles that we Polish in coming when we were searching we can handle this too will just sort of extend what we had already done and we're passing that query 2 times instead of wants and I will still do that case-insensitive like and the the person signs in here are so the social of anywhere in the way anywhere in a word universe of any sort of substring the so now we wanna served by the author's
name is getting a little more complicated but you know again probably something which is seen coming will go ahead and
join onto our user model I apologize reuse the user and authors tables interchangeably in this talk of so I uses stable so this is basically the same query but we're doing relating rails a handle the join and then we're doing the you know pulling out the user's name and the same to feel that we are already going out that query Cory Cory starting stutter a which something that we don't really want America and so 1 way we could refactor something like this is to do a quarry object and this is less performance but arguably easier to understand when you're looking at but when it comes out we still got really poor results were only searching for this case insensitive substrings and you know that's not great what of words can be plural and we have a singular aquaria what for searching for singular things Google most of these things and that's what our users are going to expect from us when we're building the search enter
full-text search the full text search allows us to query for rows based on natural language take a look at what is natural
language searching I'm good dressed natural-language
searching allows us to remove stop words from the query so these are words that appear in all sorts of sentences they have now a lot of semantic meaning to us and we want to not really include them in our search result we don't want every search that includes the word and or every article it returns the word in the structure of miniature all again just like with the like forces I like would eliminate casing really straightforward I we want synonyms to show up us user has sort of a concept in mind that they're searching for but doesn't remember exactly what it is that you know these both of these should return the same results and we're going to incorporate stemming is another feature of natural image searching which means that related words like tried trying and tries these are all different versions of the same root words so we report them under the same concept and when we're doing our searching under the hood were actually searching for that that we were instead of the specific words that were passed so his example of making that same query and we got and to resume around and this got a little bit more highlight the more important pieces and so here were looking at the text we wanna join were saying of title concatenate that with an empty space and then the body will call that the text is because we need to give it a name proposed to be happy and so that's what the the 2 pipe operators are the concatenation and we're also going to pull and the other's name as the text and actually we want the ID when we're going out of the article and the articles ID offers we 1 unique results because we don't want the same article short 1 2 times in a query if it shows up you know multiple times in the body features of both in the proper stable and and the title where the body by so
that a lot of sequel where do we put all this we
could further back and work worry object we looked at last time we had some code basically just an inline sequel and etc. and uh and execute aware and same thing with the scope and in there just pass and that's the queries with its interpolated but to be honest signal doesn't belong and files we've got an extension for that and suppose this actually has the answer
in the form of views the view is a
partial query stored in the database that can be selected from and return a
set of columns that research on later on the nature of users that because there are
basically just a query they can have multiple source tables and so we've got right now we've got the user's slash rather stable and we've got the article stable so this view will allow us to sort of abstract away
and just say this is the text we care about the and then
we can perform a aware later on we can do whatever we need to do to complete that query so that it's meaningful to our users when actually forming a search later on down the road so
if we were going to build here just sort of an example you and we've got this great you syntax just given a name does like you would a table a week and select distinct user ideas or a now obviously were pulling users with recent activity so l look at a couple different tables and we want to see all of the information about the user and also the last time they were active so we only want 1 instance of each user like I said we want all of those rows from the user table and we're going to create this concept of a user's last active time and by pulling and the activities created a call and we're just going to limit that to activities or users who were active in the last 7 days so when we're looking through this view looks pretty similar to what searching for a table would look like you select whatever select everything from users with recent activity where the order what every needed it and in fact it's so somewhere that Active Record
can use of you as a back end for model so
what we can do is create a fairly null model
looks very familiar and we can interact with that as if it were a table in a database so we've got this use of respectively is model is in the record subclass we're in a given a table name just because our naming of that table didn't match was rails would have expected when given the class name and tell that it's read online and this isn't strictly speaking true but it's easiest to just assume that view is going to be read only but if you need it not to be read only than there some special rules for that that's not that's an exercise the audience and but what this says is that tells rails that they nothing can be deleted and nothing can be written into this table with read-only economic or against was work full-text search
luckily I guess there's we're going talk now about our 1st Genentech secular politics
secular was originally written by Aaron Patterson and their lives here on get the take secular takes care of the full text search
portions of the queries and and it assumes that
you want search over every text field because it's called full-text search and since full text full-text search I guess the on on a record on the table and it gives you some variance search
options like basic search and fuzzy search and advanced search for our purposes all we really care about is
this basic search and as can be what most generally usable when you're building sort of a single field and that users used to get results back so that looks like this if you're searching for a game any sort of game that included Sonic and Sonic the Hedgehog supersonic whatever finding a little more complicated even with the basic search and say the title needs include Mario and the system is to include Nintendo but want any my Maria title on
any Nintendo problems so this is sort of the next simplest useful thing that you can deal with next I
silicic will go back and take a look at that 1st we wrote
this is that same sequel from before that to get out articles based on either the articles name body or the author's name sir search result is
really simple on the rail signed we're going to create
this 3 1 class and Morgan include take secular and until it that it belongs to an article because we need that field article and when 1 actually use it if we wanna say find an article written by Sandy where that mention Sandy then we just do this basic search for standing and map that onto the articles if you want to get a little too crazy you could do you could include innumerable into your record and innumerable is a super important and very useful feature of revealing built into the standard library and if you don't know about it feel free to come up and ask me afterwards but of basically give you all the school teaching map and everything else so you be able to use this class with a search result that new the query and then thought each time so basically you can treat it as if it were any sort of other collection an array-like like collection so creating this view
I I'm convinced you know that user radius anyone use them when an idea is so creating that is fairly
straightforward you got ActiveRecordBase connection execu so this is a migration eat actually destruct this teacher in this to execute and an organ is that previous equal that we just had on the screen and then to drop that we just say dropped you and then the name of system
changes this I will out what some feature creep as
we always have an hour features the fundamental comes back and says articles his comments
massacre should also show up in results if somebody is mentioned sending a comment about an article we want
that to show up in on that article to show up in our search results the salary cap were now searching
on an article's title and body an author's name in the comments body and any article that any of these things are related to should show up in our search results so the updated sequel looks like this the updated query and the new part is this new union with a fairly straightforward select and join them where where going in the body and the article ID from income so it was a
good that up 3 of the migration well we can
do it is I throw the new sequel into that up record just like we had done before and unfortunately what we need to do for the Dow migration is put the old text of this migrate view into that down migration so that when we're rolling back we know rails knows what state to put the database and that sort of thing so we can handle that that's not stressed out about mn and I unfortunately can't
always update you actually can't remove any columns from the view which we happen to be doing when we're rolling back this migration out we no longer have way but if you if you had added a new column into your views the you have more information say that actor that column from before the so we don't really need that after that comedy was remove that from the use of recent activity then you couldn't just do that with an update so we have to do
is just 1st drop that view and then created you again and then again we need a drum dump that whole sequel into the migration that sucks now you also can't
dumped a view into DB schema are being and so the solution that is to tell
rails that the database format is going the structure and then you're dump the structure that's equal and dump out an actual pose sequel version of your entire database yeah sorry luckily we've got our
2nd scenic which adds new methods to migrations and allows the user to be dumped into DB schema that are be which would you expect and is generally makes use a little bit easier to work with of course I had a little and
this and I am 1 of 2 main gainers the 1 in this room of of this Jim
and so creating this scenic migration and it's pretty straightforward really goes over it but you're not right into a simple file basically just the query portion of the fields you need worry about the previous impacts the drop the syntactical for you and and because you're writing it into you actually write it into a dot SQL file and so your getting whatever sort of editor support your you have been murdered team up for sublime or whatever people using these days Adam so you get whatever benefits that gives you gives me indentations my syntax highlighting and in sequel and then for the syntax and the migration actually does have this preview which is reversible just like create table and you can go back to using that change that that the need to change the
deal I you can actually just do this update view you tell what version numbers and the nose based on the naming convention how to find the new and old versions of the sequel of the sequel for the even that's a little bit
tougher member so we did create some generators we actually ever model generator that gives you that read me and give or read-only and infers the name of the model based on the name of the view so that that that meaning will match up c and overflow three-line method man or a woman method in like a total of 5 lines near file and and then when you're writing into that simple file which is like this and so the 1st version of the searches we right and this is that same sequel from before pretty straightforward methods lives in a simple whilst that it's a lot easier to use really look at 1 year the new narrator I will set the generator for when you need to update that views so you can just do it rather than scenic model In this scene again you have the notes and that will give you the next version of the next numeric version of the view and then dumps and the old version of the sequel which then you can update with whatever you needed that now there debt was some performance
implications with this approach as I mentioned
this queries really slope Aceh search
across 3 different tables and then a couple of columns to get all the results so we need so what it
comes down to it is actually an order of magnitude slower to get these results and unfortunately
views can be indexed luckily underlying
tables can also the recommendation here is
that in indices how are several types of and that's the
1 that you're most familiar with is the tree of the tree is great for exact matches on a call so text poor so even like the substring matches are OK with a B-tree might get an index at that and definitely for primary keys readers looking up 90 or a UUID those are great for full-text search the ones
that were interested in origin and just James stands for generalized inverted index and just stands for generalized inverted search tree there's an informational never use again Jim lookups are generally about 3 times faster than just they also take about 3 times longer to build Jan indexes are moderately slaughter update than just indexes and there are 2 to 3 times larger on this so that means the object's and this is what means you should use gender if you don't have hundreds of thousands of rows in your database you're not concerned about longer rates walking to the DB hi you're adding the index late in the game and you don't care about disk space perhaps because it's 2016 and you want very fast lookup so you you were optimizing for read at the if we're building a log manager or something like that and we want optimize for right heavy and maybe June isn't the right solution so on the other hand you should use just if you have very large tables millions billions of records on there's an order of magnitude and there few between those 2 it's up to you to figure out I work at a consultancy you compare you should use it if you have performance concerns right now and when I say that I mean the you currently have performance issues not they were concerned that in the future you will have performance concerns what use use it if for some reason disk space is important it's 1994 you should use it at your table is very right at the like I said log aggregators are a great example of this so adding those indexes as Jan is pretty straightforward and these are the 4 fields we've been using and you just say using gender reals inside and left materialized views are
of another way that we can improve this
performance are a tool to pre-populate the
result of users can run that entire search query that we had and it's going to store all those result into a temporary table so we still pay say the 400 ms whenever were creating that table but then we can score against the the temporary table which already has resulted in that's much much faster and so we
against the result sets rather than performed for query
and it's another order of magnitude faster even than the i like was this is without the indexes but and the way the downside of materialized views that it's not always updating because it is storing and that temporary table yet to tell it when you want to pay that 400 ms to get your update happening or however long it protects and you can do that as often as you
like you can do that on every right with with a trigger and sequel or with the enactment what's like this where you can do it may be on a timer if your search results don't always need to be up-to-date you could have you know of his controlled do it every 10 minutes or a hour a day and seeking the that either was opposed best trigger exercise to the reader or with the rails after that looks like this right
well what about some pre-built solution there's a lot of options out there and I did say that we would look at them
and we've got elastic search without the tire or a Lassiter trails Chile including who knows how much in your models I know it's a lot and EDU solar real some spot but whole shit things with thinking things actually does use a separate file but still like and what any this means what I had to figure this out 30 know things in the sequel Rice what this services are greater
is faceted search and if you're search doesn't look like a single box it looks like Amazon sidebar then full-text search is going to be a little more difficult to work with highly for POS presses full-text search these other tools do full-text retrieval
all of these things have to running a development machine that run your production machine which means that they have to be running a slowing down on your machine yet to deal with all these dependencies you also have to deal with them every time you're doing an update to your system if you ever change version the need make sure that development production of version of the original back make that handle big pain in the ass the
only be faked and test because you don't want to be actually using these things like interest in fact had a couple co-workers who were doing uh of were using solar I believe and I of great feature of solar is that it synchronises its index across the same subnet because they were both sitting on the work on there were computers and have the same IP addresses that extra like addresses they're tested the test indexes were being sink between each other in a lot of fun for a week
all of these have a lot of craft models I said some but at all and the removing data concern for your data base database you have they have this
arcane syntax and ultimately they make me make
this so by combining materialize you full-text
search and some rails magic with a pretty cool search for the
teacher the dozen require any new dependencies
and makes this 1 thank you
thank
you that was I think that if you if you have 1
Volltext
Dienst <Informatik>
Multiplikation
Wort <Informatik>
Kartesische Koordinaten
Ungerichteter Graph
Quick-Sort
Computeranimation
Bit
Punkt
Interaktives Fernsehen
Objektklasse
Code
Quick-Sort
Systemaufruf
Konstruktor <Informatik>
Computeranimation
Objekt <Kategorie>
Notebook-Computer
Dualitätstheorie
Implementierung
Volltext
Iteration
Kartesische Koordinaten
Computeranimation
Konfiguration <Informatik>
Sichtenkonzept
Sichtenkonzept
Computeranimation
Klassische Physik
Wort <Informatik>
Kartesische Koordinaten
Message-Passing
Quick-Sort
Computeranimation
Konfiguration <Informatik>
Autorisierung
Vorzeichen <Mathematik>
Abfrage
Ikosaeder
Wort <Informatik>
Grundraum
Quick-Sort
Computeranimation
Resultante
Autorisierung
Retrievalsprache
Volltext
Abfrage
Natürliche Sprache
Computeranimation
Objekt <Kategorie>
Singularität <Mathematik>
Informationsmodellierung
Datensatz
Wort <Informatik>
Tabelle <Informatik>
Resultante
Nichtlinearer Operator
Bit
Natürliche Zahl
Formale Sprache
Natürliche Zahl
Versionsverwaltung
Indexberechnung
Abfrage
Raum-Zeit
Quick-Sort
Computeranimation
Formale Semantik
Arithmetisches Mittel
Forcing
Formale Sprache
Wort <Informatik>
Wurzel <Mathematik>
Datenstruktur
Bildgebendes Verfahren
Objekt <Kategorie>
Retrievalsprache
Abfrage
Fortsetzung <Mathematik>
Maßerweiterung
Elektronische Publikation
Code
Computeranimation
Sichtenkonzept
Datenhaltung
Natürliche Zahl
Speicher <Informatik>
Abfrage
Partielle Differentiation
Quellcode
Sichtenkonzept
Computeranimation
Reihe
Bildschirmmaske
Multiplikation
Menge
Tabelle <Informatik>
Retrievalsprache
Subtraktion
Sichtenkonzept
Multiplikation
Abfrage
Sichtenkonzept
Quick-Sort
Computeranimation
Open Source
Datensatz
Trennschärfe <Statistik>
Information
Mini-Disc
Vollständigkeit
Ordnung <Mathematik>
Tabelle <Informatik>
Instantiierung
Datensatz
Informationsmodellierung
Sichtenkonzept
Matching <Graphentheorie>
Datenhaltung
Front-End <Software>
Klasse <Mathematik>
Datenmodell
Schlussregel
Computeranimation
Tabelle <Informatik>
Volltext
Abfrage
Computeranimation
Resultante
Konfiguration <Informatik>
Überschall
sinc-Funktion
Gebäude <Mathematik>
Einfache Genauigkeit
Physikalisches System
Quick-Sort
Computeranimation
Konfiguration <Informatik>
Physikalisches System
Datensatz
Datenfeld
Fuzzy-Logik
Nintendo Co. Ltd.
Spieltheorie
Varianz
Tabelle <Informatik>
Autorisierung
Resultante
Physikalisches System
Nintendo Co. Ltd.
Quick-Sort
Computeranimation
Resultante
Mapping <Computergraphik>
Retrievalsprache
Datenfeld
Sichtenkonzept
Klasse <Mathematik>
Abfrage
Quick-Sort
Computeranimation
Einfach zusammenhängender Raum
Radius
Selbst organisierendes System
Migration <Informatik>
Physikalisches System
Migration <Informatik>
Computeranimation
Touchscreen
Resultante
Retrievalsprache
Tropfen
Mathematisierung
Sichtenkonzept
Computeranimation
Kugelkappe
Autorisierung
Resultante
Migration <Informatik>
Mereologie
Abfrage
Fortsetzung <Mathematik>
Computeranimation
Virtuelle Adresse
Spitze <Mathematik>
Sichtenkonzept
VLIW-Architektur
Datenhaltung
Fortsetzung <Mathematik>
Migration <Informatik>
Sichtenkonzept
Quick-Sort
Computeranimation
Datensatz
Migration <Informatik>
Information
Aggregatzustand
Tropfen
Sichtenkonzept
Magnettrommelspeicher
Migration <Informatik>
Speicherabzug
Fortsetzung <Mathematik>
Tropfen
Computeranimation
Bit
Datenstruktur
Migration <Informatik>
Datenhaltung
Versionsverwaltung
Dateiformat
Fortsetzung <Mathematik>
Datenstruktur
Ganze Funktion
Computeranimation
Abfrage
Fortsetzung <Mathematik>
Elektronische Publikation
Migration <Informatik>
Sichtenkonzept
Quick-Sort
Computeranimation
Texteditor
Datenfeld
Migration <Informatik>
Verbandstheorie
Tropfen
Term
Tabelle <Informatik>
Retrievalsprache
Bit
Konfiguration <Informatik>
Total <Mathematik>
Sichtenkonzept
Versionsverwaltung
Zahlenbereich
Fortsetzung <Mathematik>
Migration <Informatik>
Elektronische Publikation
Fastring
Sichtenkonzept
Computeranimation
Demoszene <Programmierung>
Arithmetisches Mittel
Informationsmodellierung
Generator <Informatik>
Pufferüberlauf
Verbandstheorie
Speicherabzug
Gerade
Metropolitan area network
Tabelle <Informatik>
Resultante
Automatische Indexierung
Volltext
Sichtenkonzept
Größenordnung
Sichtenkonzept
Computeranimation
Tabelle <Informatik>
Tabelle <Informatik>
Netzwerktopologie
Matching <Graphentheorie>
Automatische Indexierung
Systemaufruf
Indexberechnung
Schlüsselverwaltung
Computeranimation
Eins
Tabelle <Informatik>
EDV-Beratung
Login
Raum-Zeit
Computeranimation
Datensatz
Datenmanagement
Reelle Zahl
Spieltheorie
Mini-Disc
Tabelle <Informatik>
Suchbaum
Automatische Indexierung
Sichtenkonzept
Datenhaltung
Raum-Zeit
Gebäude <Mathematik>
Spieltheorie
Indexberechnung
Bitrate
Sichtenkonzept
Objekt <Kategorie>
Datenfeld
Automatische Indexierung
Geschlecht <Mathematik>
Größenordnung
Mini-Disc
Inverter <Schaltung>
Tabelle <Informatik>
Resultante
Abfrage
Sichtenkonzept
Ganze Funktion
Computeranimation
Tabelle <Informatik>
Resultante
Sichtenkonzept
Abfrage
Abfrage
Größenordnung
Sichtenkonzept
Computeranimation
Tabelle <Informatik>
Resultante
Rechter Winkel
Fortsetzung <Mathematik>
Sichtenkonzept
Computeranimation
Konfiguration <Informatik>
Information Retrieval
Dienst <Informatik>
Informationsmodellierung
Quader
Fortsetzung <Mathematik>
Elektronische Publikation
Lie-Gruppe
Computeranimation
Softwaretest
Virtuelle Maschine
Softwaretest
Automatische Indexierung
Adressraum
Versionsverwaltung
Computerunterstütztes Verfahren
Physikalisches System
Biprodukt
Softwareentwickler
Gerichteter Graph
Netzadresse
Computeranimation
Informationsmodellierung
Datenhaltung
Sichtenkonzept
Computeranimation
Videokonferenz
Computeranimation

Metadaten

Formale Metadaten

Titel Multi-table Full Text Search with Postgres
Serientitel RailsConf 2016
Teil 27
Anzahl der Teile 89
Autor Thompson, Caleb
Lizenz CC-Namensnennung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben.
DOI 10.5446/31536
Herausgeber Confreaks, LLC
Erscheinungsjahr 2016
Sprache Englisch

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract Searching content across multiple database tables and columns doesn't have to suck. Thanks to Postgres, rolling your own search isn't difficult. Following an actual feature evolution I worked on for a client, we will start with a search feature that queries a single column with LIKE and build up to a SQL-heavy solution for finding results across multiple columns and tables using database views. We will look at optimizing the query time and why this could be a better solution over introducing extra dependencies which clutter your code and need to be stubbed in tests.

Ähnliche Filme

Loading...