Boosting Books in Search Results at the University of Minnesota Press
Formal Metadata
Number of parts | 38
License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/55556 (DOI)
World Plone Day 2021, part 17 / 38
Transcript: English (automatically generated)
00:07
Hello, and welcome to our World Plone Day session. I'm Sally Kleinfeld. I'm with Jazkarta, a US-based company specializing in open source web technologies, including Plone, of course.
00:20
And with me is Emily Hamilton, who is the assistant director for book publishing at the University of Minnesota Press. And Alec Mitchell, one of our developers here at Jazkarta, also a former Plone release manager. [unclear] And the three of us have been working together on the University of Minnesota Press's Plone website
00:41
since 2011, and we're going to share a little bit of that history with you with a particular focus on site search. So Emily, take it away. Tell us a little bit about the press. Sure. I'm happy to. So the University of Minnesota Press is a scholarly publisher.
01:02
It was established in 1925, so we've been publishing books for a very long time. Primarily, we do books in the humanities and social sciences, also art and architecture. Currently, we publish about 130 books a year,
01:20
but we have more than 3,000 titles in print. So we use our Plone site to market all of those books and sell them as well, and many other things that I'm sure you'll see soon. The University of Minnesota Press is a part of the university, but it
01:42
is a standalone publisher. So we publish topics ranging from very, very specialized sociology to children's books that have to do with the natural world in Minnesota where we're located. So it's a wide range of content, and I
02:04
think that about covers it. Great, great. And can you say just a little bit about the importance of search to your website, sort of what you want people to find when they search for things on your website? And while you're talking about that, I'm going to go to sort of a demo of search on the site.
02:23
So there's the site, and you can see the URL up at the top here, everyone. Go ahead. That's our home page. And of course, when people come to any part of our website, we really want them to be able to search and browse the things
02:44
that they're interested in. And so we've got the Search Site button, the global search, right there, and we want, for example, if someone hears about an author that we publish, we want them to see all of the books
03:02
that they've published with us right away. Perhaps it's obvious, but we want them to be able to get to know those books and buy them, hopefully. So I'm just searching on Casanova, whom Emily has suggested to me as, I guess, a popular author on your site. And although Casanova is not shown
03:22
in the titles of any of these, these are all books by this author shown in the live search here. And I'm going to hit Return and go to the search results page for that search so you can see more about that. One other thing about the search results here is that if someone's a fan of Mary Casanova, the books are organized with the most recent coming up first.
03:45
So that might be the one that they've heard about now, because that's the one that's just published. But they can then scroll down and see all of the books that she's published with us. Right, and if they're interested in other things,
04:01
the search results page, we customized this so they're also able to see, for example, events that she might be taking part in, news articles. Yes, so we use all of the functionality on the site to include media hits to recruit for events.
04:23
Of course, right now they're virtual primarily, but we use it the same way. And here you can see the e-commerce link and a lot of overview and a lot of details, author biography, any reviews. So there's a lot of information
04:40
that is published about this book as well as related books. Yeah, the related publications is one of the great things about the search, too. We can connect a lot of things that people might not find based on one search. So if they're fans of Mary Casanova, we have a whole lot of other books
05:01
that they might be interested in as well. Yeah, great. Why don't I turn the conversation over to you, Alec, and maybe you can say a little bit about how this works, how it happens that the books come up first when you do a search for something like an author's name.
05:24
Sure. So typically in a Plone text search, by default, it would use something called the ZCTextIndex, which is the standard ZCatalog text index, which does some basic kind of relevancy-based sorting,
05:43
and that's effectively your only option. In newer versions of Plone, you can have some kind of secondary sorting if there are relevancy matches that are similar, but you're pretty limited in terms of how you can sort things. So what we do with the University of Minnesota Press
06:01
is we use a Solr search backend (Solr is a Lucene-based search engine), using an add-on called alm.solrindex to power the search index for the site, and we use that for searchable text and also for title. And then secondarily, we index the publication dates of the books
06:24
and also the type, the portal types. And using Solr, we can modify the queries that come in to tell it to boost certain aspects of the search results so that we can ensure not only
06:40
that we're doing a relevancy-based search on the searchable text, but also on the title and the author, with each getting some particular boost. So we make authors more important than general descriptions, and titles more important than general descriptions, and things like that, so that we have some kind of boosting of the importance of each of the different fields,
07:02
and then more importantly, of the recency of the date of publication of these things. So we, for example, will push sort of all books above all events or give a boost to books that tends to push them above events, and then give a boost to author names
07:21
to push them above something where the name Casanova was in the blurb about the book or something. And so that, you're right. And so that gives us a sort of wide range of ability to tune results. And over the years, we've done a few different kinds of changes
07:41
to the tuning to get where we are now, where we have a set of search results that really promotes books, orders them by date, but makes sure that the results are really relevant in terms of what people are doing. And it's a Lucene query, what's called an eDisMax query, for the Lucene heads out there, that we use.
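The boosting described here (fields weighted differently, books pushed above events, recent publication dates favored) could be expressed with standard eDisMax parameters roughly as follows. This is a minimal sketch, not the site's actual configuration: the field names (`authors`, `publication_date`), the weights, and the helper names are hypothetical. Only `defType`, `qf`, `bq`, `boost`, and the `recip`/`ms` functions are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical per-field weights: the talk says authors and titles count for
# more than general descriptions, but does not give the real field names or numbers.
FIELD_BOOSTS = {
    "authors": 8.0,
    "Title": 5.0,
    "Description": 1.5,
    "SearchableText": 1.0,
}


def recency_boost(date_field="publication_date"):
    """Return a Solr function query that favors recent documents.

    recip(x, m, a, b) computes a / (m*x + b). With x as the document's age in
    milliseconds and m = 3.16e-11 (about 1 / one year in ms), a brand-new
    document scores about 1.0 and a one-year-old document about 0.5.
    """
    return f"recip(ms(NOW/DAY,{date_field}),3.16e-11,1,1)"


def build_edismax_params(user_query, rows=10):
    """Build request parameters for an eDisMax query with boosts (assumed values)."""
    return {
        "q": user_query,
        "defType": "edismax",
        # qf: which fields to search, and how much each one counts.
        "qf": " ".join(f"{name}^{weight}" for name, weight in FIELD_BOOSTS.items()),
        # bq: additive boost query that tends to push books above other types.
        "bq": "portal_type:Book^10",
        # boost: multiplicative function boost for recency.
        "boost": recency_boost(),
        "rows": rows,
    }


if __name__ == "__main__":
    # Print the query string that would be sent to a Solr /select handler.
    print(urlencode(build_edismax_params("Casanova")))
```

A fixed parameter set like this can be shared by live search, regular search, and faceted search, since only `q` changes per request. In a Plone site the add-on would send the query on the catalog's behalf, so this only illustrates the shape of the request.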
08:03
And we have a fixed one that sort of boosts all of these particular things and gets them working in the way you need them to, which can be pretty difficult for internal searches, but it's worked really well. And it works with the faceted searches and with the live search and the regular search. So we're very happy with that. Thanks for mentioning the faceted searches.
08:22
We'll look there. And just real quickly, some of you may not have heard of Solr, but you might've heard of Elasticsearch, which is apparently, I guess Elasticsearch is like the most used search engine on the web these days. And both Solr and Elasticsearch rely on the underlying Lucene search library.
08:41
Solr is a little older. They basically are useful for different kinds of use cases. We've been using Solr for quite a while and quite successfully here, and there are integrations for both options, Solr and Elasticsearch, available in Plone. Anyway, I'm gonna go back to the website and look a bit at this faceted search,
09:00
which you can find by clicking on this explore page. When we were developing the site, I found myself just spending untold minutes playing with this explore functionality. So it sort of pulls you in to the amazing collections of books
09:22
that University of Minnesota Press offers. You can just go there and you'll actually, maybe Alec, you can explain how these things are being boosted now without any kind of search at all and what this technology is using.
09:41
Question for Alec, that was. Sorry, sorry, I was muted. I was muted there. Oh, yes, it was the old muted problem. So yeah, by default, because these are all books, the faceted search on the Explore page is limited to just books. There are some other faceted searches for other content on the site,
10:00
but this one is just books. And by default, when there's no query, we sort by publication date. And then once you put a query in, we rely a lot more on that same Solr relevancy search when you're doing a text query. I believe when you do a normal facet query
10:22
using one of these sort of checkbox facets or the tag facet, then it's still sorting by default by date. But as soon as you enter a searchable text query, you'll end up with the kind of Solr-based relevancy search that also does some date boosting to ensure that recent content is really being promoted.
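The switch between date sorting and relevancy sorting on the Explore page can be sketched as a small function. This is an illustration under assumptions, not the site's code: the function name, the `Subject` and `publication_date` field names, and the boost expression are hypothetical, while `fq`, `sort`, `defType`, and `boost` are standard Solr parameters.

```python
def faceted_search_params(text="", subjects=()):
    """Build Solr parameters for a books-only faceted search (assumed field names)."""
    # Facet selections become filter queries; they narrow results without
    # affecting the relevancy score. The page is restricted to books.
    params = {
        "fq": ["portal_type:Book"] + [f'Subject:"{s}"' for s in subjects],
    }
    if text.strip():
        # A text query switches to relevancy ranking, with a recency boost so
        # that recent books are still promoted.
        params["q"] = text.strip()
        params["defType"] = "edismax"
        params["boost"] = "recip(ms(NOW/DAY,publication_date),3.16e-11,1,1)"
    else:
        # No text query: match everything the filters allow, newest first.
        params["q"] = "*:*"
        params["sort"] = "publication_date desc"
    return params


if __name__ == "__main__":
    print(faceted_search_params(subjects=["Architecture"]))
    print(faceted_search_params(text="Casanova", subjects=["Children's Books"]))
```

Keeping the facets in `fq` means that checking or unchecking a box never changes how the remaining results are ordered; only the presence of a text query does.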
10:45
As you can see, you can combine a subject with another primary subject to narrow things down. And then there are also finer-grained topics over here that you can select also. And there's one that's available there.
11:01
And there's a tag cloud there to show what's happening with these books. Yeah, Emily, any thoughts on this Explore search? I know it's not been perhaps as important to you guys on site as some of your kind of direct search
11:22
for various reasons. I don't know. Any thoughts on the faceted search? Yeah, the ideas behind the faceted search are really to sort of help people do a, really sort of mix and match kind of search
11:41
based on what they're interested in. And because we publish in so many different areas and in so many different subjects, people can combine a lot of different interdisciplinary interest areas in order to find books that are perhaps not all the same major discipline,
12:02
but they share some characteristics that they might be interested in. So it's kind of a more browsable, I would say, feature, kind of like what you were saying, Sally, where you can just sort of click around and find things that you're interested in
12:21
rather than sort of knowing what you're looking for. But we envision it as something where people can kind of enter into what we do and see books that cover a lot of different areas, which is really inherent in what we do
12:41
because we do so much interdisciplinary and cross-disciplinary publishing. Yeah, yeah, yeah. Your selection of books is fascinating. All right, great. Well, we hope we've given you a little taste of some of the kinds of things you can do integrating a real feature-rich search engine
13:03
like Solr or Elasticsearch with Plone to be able to make it easier for people to find possibly more specialized content on your website. And thank you very much to Emily and Alec for joining us and telling us all about it.
13:21
All right, thanks, Sally. Thank you.