Quick and Dirty Usability: Leveraging Google Suggest to Instantly Know Your Users

Video in TIB AV-Portal: Quick and Dirty Usability: Leveraging Google Suggest to Instantly Know Your Users

Formal Metadata

Quick and Dirty Usability: Leveraging Google Suggest to Instantly Know Your Users
Title of Series
Part Number
Number of Parts
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Place

Content Metadata

Subject Area
Every second of every day, people use Google to troubleshoot problems and to learn how to accomplish their goals. While Google doesn’t make its search query logs publicly available, Google Suggest can be used to learn the most popular queries for any software. We systematically mined all of the query suggestions for GIMP, Inkscape, Blender, and Scribus to learn about the primary needs and problems encountered by users of these software applications. As examples, our technique collected ~15,000 common queries for GIMP and ~2500 queries for Inkscape. In this talk, we will present samples of the most common search queries for these applications, and what they suggest about the software user bases and their needs.
Keywords Libre Graphics Meeting (LGM) Libre and Open Source graphics software
So [speaker name unclear] is going to present the work that he has been doing as a PhD student [unclear], and he is going to show you how you can use Google Suggest to quickly figure out what users are doing with your software. It's really cool technology. Later in the day we're going to show a new version of GIMP [unclear] as well. In both of these talks I encourage you to interrupt during the talk and ask questions, because to get the most out of a
talk you have to stop and say, "Wait, what about this? What about that?" Right? [unclear] So again, please interrupt.
Right. So, as was mentioned, this is about understanding how people are using software, based on looking at what search queries they perform. It is based on the observation that when people run into trouble with software, with interactive devices, their first line of defense is
often Google. I'll give a very quick motivating example. [unclear]
So here's the quick example. Back in September I went to Google and typed in "Firefox how", and immediately, of course, Google returned a list of 10 suggestions for how to complete the query. The important thing is that these queries are actual queries that other people have performed in the past, and they are sorted approximately according to their popularity. Just looking at this list, we get a pretty good idea right away of some of the common activities that Firefox users engage in. There's an interest in privacy: clearing the cache, deleting cookies, and so on. But if we go further down the list, I've highlighted this one, which is a little bit curious: it is about how to get the menu bar back. So we can actually inspect the interface of Firefox to try to figure out why this is so popular. All right, so this
is Firefox 3.6. Again, I did this in September, so Firefox 4 was still in beta, and this is Firefox 3.6 on Windows XP. I'm happy to report that this is a problem that occurs only on Windows. If you go to the toolbars menu and you uncheck the menu bar, well, the menu bar disappears. The problem
with this is that the menu bar is now gone, and the menu bar is how you accessed those options to begin with. So the first time I did this, I had no idea how to undo that action; I had no idea how to recover from this situation. And it turns out that I'm not alone, as you might have already suspected.
If you exhaustively look through Google Suggest, there are a whole lot of different search suggestions, all related to this missing menu bar. Using the techniques I'll talk about later, we've identified over 150 different suggestions in Google Suggest about this issue, and we estimate that a search from this set is performed about once every 32 minutes on average. So what I think this example demonstrates, and it should really appeal to your intuition, is that search query logs
are central repositories that catalog the day-to-day needs of the user community. Stepping back a little bit and looking at this in a broader context, there has actually been some research done at Microsoft Research looking at how to use query logs to do things like medical research and social science research. A researcher named Matthew Richardson has this quote, which I really like: query logs act as if a survey were sent to millions of people, asking them every day to write down what they're interested in, thinking about, planning, and doing. So this is very rich data, and it's highly ecologically valid. So, to demonstrate that,
some of you may have come across this in the past: Google produced an application called Google Flu Trends, where they look at health-seeking behavior, that is, searches related to flu symptoms [unclear], to try to predict when somebody would go to the doctor with the flu. The important thing is that their model is very close to the data that was released by the CDC, but they are able to produce these numbers in 24 hours, whereas the CDC had a seven-day lag, because it had to wait for the doctors to report all these cases. So it's a very, very powerful technique. Now, taking this a bit closer to what we do, I looked at Google Insights for Ubuntu, and I think we can see a six-month release cycle here. So it's actually a pretty neat data source. Right, so the claim here is that query
logs can reveal the tasks and the issues for any publicly available interactive system. I stress here that it has to be publicly available, because people have to be performing searches, and you get better results when you have a larger community, a larger user base. But the problem we have is that we don't have access to Google's query dataset, and so the question is how we approximate this data. I think you already know the answer, but just to reiterate,
here's the example from the beginning again. If I add one letter to it, I now get another 10 suggestions. I can just do a breadth-first search here and get more suggestions, and in total there are about
74 or 75 thousand distinct suggestions if we do this. Actually, let me just do this live, to give you a sense of how easy it is to do.
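The breadth-first enumeration described here can be sketched roughly as follows. This is a minimal illustration, not the script used in the talk: `fetch_suggestions` is a hypothetical stand-in that completes prefixes from a small made-up list (so the sketch runs offline), and the page size `limit` and the alphabet are assumptions.

```python
from collections import deque
import string

# Hypothetical stand-in for a suggestion service: a tiny list of made-up
# queries so the sketch runs offline. This is NOT real Google Suggest data.
FAKE_QUERY_LOG = [
    "firefox how to clear cache",
    "firefox how to delete cookies",
    "firefox how to get menu bar back",
    "firefox how to bookmark a page",
    "firefox how to block popups",
]

def fetch_suggestions(prefix, limit=2):
    """Return at most `limit` completions of `prefix` (hypothetical helper)."""
    matches = [q for q in FAKE_QUERY_LOG if q.startswith(prefix)]
    return matches[:limit]

def enumerate_suggestions(seed, alphabet=string.ascii_lowercase + " ",
                          limit=2, max_depth=30):
    """Breadth-first expansion of prefixes, one appended character at a time."""
    seen = set()
    queue = deque([(seed, 0)])
    while queue:
        prefix, depth = queue.popleft()
        results = fetch_suggestions(prefix, limit)
        seen.update(results)
        # A full page of results hints that more suggestions may be hidden
        # behind this prefix, so extend it by every character in the alphabet.
        if len(results) == limit and depth < max_depth:
            for ch in alphabet:
                queue.append((prefix + ch, depth + 1))
    return seen

if __name__ == "__main__":
    for suggestion in sorted(enumerate_suggestions("firefox how to ")):
        print(suggestion)
```

Expanding only prefixes that return a full page keeps the number of lookups small: a prefix that already returns fewer than `limit` suggestions has nothing more to reveal.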
[unclear] And you get the idea: you can see the common questions very quickly. Some people don't know this, I think, but if you move the cursor back in the query, it will actually fill in suggestions at other places, so you can really quickly go through and enumerate these things.
[unclear] So I actually did that for a bunch of
projects. Some of them are listed here, so you can see that in many cases thousands, or in some cases hundreds of thousands, of query suggestions are returned. I have an asterisk for Blender here; the asterisk just indicates that when a project has a common name, you end up having to disambiguate between Blender the project and blenders for ice and things like that. But you can actually filter those out pretty well just by performing the searches and seeing what pages come back; then you can determine from the language of the pages whether or not a suggestion is relevant to a particular topic.
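One way to picture the filtering step just described is a simple vocabulary vote over the pages retrieved for a suggestion. The talk does not say how the language of the pages was judged, so the word lists, the voting rule, and the function names below are all assumptions made for illustration.

```python
import re

# Hypothetical vocabularies; in practice they would be chosen per project.
ON_TOPIC = {"mesh", "render", "animation", "texture", "3d", "modifier"}
OFF_TOPIC = {"smoothie", "ice", "kitchen", "recipe", "juice", "watts"}

def tokens(text):
    """Lowercase word tokens of a page's text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def is_relevant(page_texts):
    """Majority vote: a page votes on-topic if it contains more on-topic
    than off-topic words, and off-topic in the opposite case."""
    votes = 0
    for text in page_texts:
        words = tokens(text)
        on = sum(w in ON_TOPIC for w in words)
        off = sum(w in OFF_TOPIC for w in words)
        if on > off:
            votes += 1
        elif off > on:
            votes -= 1
    return votes > 0

if __name__ == "__main__":
    software_pages = ["Add a modifier to the mesh, then render the animation."]
    kitchen_pages = ["This kitchen blender crushes ice for a smoothie recipe."]
    print(is_relevant(software_pages), is_relevant(kitchen_pages))
```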
To take GIMP as an example, it has about 15 thousand suggestions. Those are the suggestions, but they actually represent about 2.8 million search queries. There are actually more queries than that, but the thing is that Google, to preserve people's privacy, does a sort of k-anonymization: they cut off the long tail, so only searches that are performed by many users are recorded in this dataset. So we've got data, very quickly, representing about 2.8 million searches, and that's kind of typical for this type of thing. And of course the popularity of searches falls off exponentially, so the popular stuff is really popular, and then it has a long-tail distribution.
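The effect of cutting off the long tail can be illustrated with a generic threshold: keep a query only if at least `k` distinct users issued it. Google's actual mechanism is not described in the talk; the log format, the value of `k`, and the function name here are assumptions.

```python
from collections import defaultdict

def popular_queries(log, k):
    """Keep only queries issued by at least k distinct users.

    `log` is an iterable of (user_id, query) pairs (a hypothetical format).
    Returns a dict mapping each surviving query to its distinct-user count.
    """
    users_per_query = defaultdict(set)
    for user, query in log:
        users_per_query[query].add(user)
    return {q: len(users) for q, users in users_per_query.items()
            if len(users) >= k}

if __name__ == "__main__":
    # Made-up log: one popular query and two rare ones from the long tail.
    sample_log = [
        ("u1", "gimp draw circle"), ("u2", "gimp draw circle"),
        ("u3", "gimp draw circle"), ("u1", "gimp draw circle"),
        ("u2", "gimp my secret project file"),
        ("u3", "gimp crop to selection"),
    ]
    print(popular_queries(sample_log, k=3))
```

Counting distinct users rather than raw query counts means a single person repeating a query does not push it over the threshold.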
All right, I'll just show you a couple of examples very briefly. This one here I like quite a bit; I like to refer to it as "speaking the user's language". Again, I adapted this presentation from a presentation I did earlier in the week. This is probably pretty obvious, but within the GIMP dataset we actually see a lot of searches from people asking how to convert an image to black and white. Here they're not talking about a binary image; in most cases they want something that looks like an image captured on black-and-white film. There are many ways of doing this in GIMP: you can use Grayscale, Desaturate, the Channel Mixer command, that kind of thing. But the problem is that none of these commands use the words "black" or "white", and that's what people are searching for. So at the least, this research suggests that some percentage of the audience may not be able to recognize that these commands are relevant. So we actually identified over 90 different
distinct phrasings for the question of how to convert to black and white, and searches from this set are performed about once every 74 [unit unclear]. Yes, there's a question.
Audience member: [unclear] It's just a random idea: have you thought about, instead of just having a set of static menus, having a searchable set of menus?
Speaker: Yes, in fact this factors quite prominently into our research. It is something that I'm working on currently, and I can show a little bit about that at the end of the presentation. But actually my colleague [name unclear] is going to show you [unclear] this afternoon, and search plays a big role there as well. And I think that's a good point: you can't find one vocabulary that fits everybody. What you want to do is have some of these aliases, where, depending on your background, you can come in and still find what's relevant to perform your task.
Another example here, this time from Inkscape. A lot of people are asking how to crop. They're not thinking about fitting the page to the selection or these types of things. And here we can see that these types of searches are performed about once every 3 hours, so it is searched slightly less. And we can look at
another example here. In GIMP, drawing primitive shapes is typically a multi-step process. To draw a circle, for instance, you make an ellipse selection and then you stroke the selection. As a result we see many, many searches from people asking how to draw a circle; this is searched about once per hour. And that's not the only primitive shape people are looking
for. We see about 130 different ways of asking how to draw various types of lines, in particular straight lines; 40 different suggestions for rectangles; 24 for squares; 14 for [unclear]. So this data suggests that maybe a multi-step process is eluding some portion of the audience for the software, the people who are actually using it from day to day.
Right, so those are a couple of examples of the sorts of things you can get from this at a fairly high level. But again, we do get a fair number, tens to hundreds of thousands of queries, so we do get a good chunk of that long tail, and you can get pretty specific. I just want to go into how we can start to identify the interesting parts of the suggestions, the ones to focus on. The problem is that when you get 140 thousand suggestions, not all of them are useful for understanding how people use the software. Maybe people are just trying to download the software; maybe they want to read reviews, or something like that. So you want to be able to pick the ones that are useful. I have already given you a taste of that, but I'm just going to go
into it in a bit more detail. In our research we've identified about six different types of queries in this space, and the ones we look at for understanding people's use are operating-instruction queries, where people ask how to perform a particular task, and troubleshooting queries. And
what we did is we actually looked at how these queries were phrased. It turned out that if a query is phrased as a question ("how to", "how can", and things like that), it was typically people looking for operating instructions. I have some templates up here. You can also see imperative statements: any time you say something like "draw a circle", with the verb in the present tense, that also tended to indicate people looking for operating instructions. For troubleshooting, on the other hand, queries tended to be statements of fact: this is the situation. The other thing that's often useful is just to look at queries that have certain keywords in them. These are the obvious ones here, but this is just to give you an idea of how to filter the data, again in a quick and dirty way, to get at what you're interested in. And of course, once you've done that, there is a variety of different visualizations and tools you can use to try to navigate all this data.
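A quick and dirty filter along the lines just described might look like the sketch below. The actual templates shown on the slide are not in the transcript, so the question patterns, verb list, and troubleshooting keywords here are illustrative assumptions, as is the `classify` function itself.

```python
import re

# Assumed templates; the slide's real templates are not in the transcript.
QUESTION = re.compile(r"\b(how to|how do i|how can i)\b")
IMPERATIVE_VERBS = {"draw", "convert", "crop", "change", "make", "resize"}
TROUBLE_KEYWORDS = ("error", "crash", "not working", "missing", "won't")

def classify(query, app="gimp"):
    """Label a query as 'instructions', 'troubleshooting', or 'other'."""
    q = query.lower().strip()
    # Drop a leading application name such as "gimp ".
    if q.startswith(app + " "):
        q = q[len(app) + 1:]
    # Questions tend to be requests for operating instructions.
    if QUESTION.search(q):
        return "instructions"
    # Statements of fact about a problem tend to be troubleshooting.
    if any(keyword in q for keyword in TROUBLE_KEYWORDS):
        return "troubleshooting"
    # Imperative phrasing: the query starts with a present-tense verb.
    words = q.split()
    if words and words[0] in IMPERATIVE_VERBS:
        return "instructions"
    return "other"

if __name__ == "__main__":
    for example in ["gimp how to draw a circle", "gimp draw circle",
                    "gimp crashes on startup", "gimp download"]:
        print(example, "->", classify(example))
```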
Here is a simple tag cloud, this one for Inkscape. I'll actually show, maybe live if there's time, some of the data that we have for various projects. So that is a simple type of visualization, but the other one I
like here is the term co-occurrence visualization. The way this works is that the column down here represents the most common words, and the words that go across horizontally are the words that co-occur in the same query as these words. So you can see right away what people want to draw: they want to draw lines, rectangles, curves, circles, et cetera. Likewise you can see what they want to color or change the color of. So it's just a way of summarizing the data and getting a sense and feel of what people are interested in doing.
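The table behind such a co-occurrence view can be computed with a couple of counters. This is a minimal sketch under assumptions: the stop-word list, the sample queries, and the function name are made up, and the talk's own tool may count things differently.

```python
from collections import Counter, defaultdict

# Assumed stop words; a real list would be longer.
STOPWORDS = {"how", "to", "a", "in", "the", "i", "do"}

def cooccurrence(queries, top_n=3, per_word=3):
    """Map each of the top_n most common words to the words that most
    often appear in the same query."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for query in queries:
        # Use a set so a word repeated within one query counts once.
        words = {w for w in query.lower().split() if w not in STOPWORDS}
        word_counts.update(words)
        for word in words:
            for other in words:
                if other != word:
                    pair_counts[word][other] += 1
    return {word: [o for o, _ in pair_counts[word].most_common(per_word)]
            for word, _ in word_counts.most_common(top_n)}

if __name__ == "__main__":
    sample = ["draw line", "draw circle", "draw a circle",
              "draw rectangle", "change color", "fill color"]
    for word, partners in cooccurrence(sample, top_n=2).items():
        print(word, "->", partners)
```

Each row of the visualization then corresponds to one key of the returned dictionary, with its co-occurring words laid out horizontally.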
Actually, before going into current work, I'll just show some of this. We have these
interactive [unclear] for this. We've got GIMP, Inkscape, [unclear], and Blender. For GIMP, unfortunately the resolution is low here, but you can sort of see what I'm talking about. [unclear] I promise it was working before. And, you know,
you can get a sense, again with the tag cloud, of what people are trying to draw. This is filtered on the word "draw", and you can see on the right here all the queries that have to do with drawing in GIMP. We can talk about this dataset afterwards. So I think I am
going to conclude with an idea of other ways we're looking at
using this data currently, and this gets back to your question.
So up to now I've been advocating that we use this data to understand how people are using the software in practice, to get a real quick sense of a large user community. But we can also make more active use of the data. So this is some of the work that I've been
doing right now. Basically, when somebody types a search query into Google, it retrieves documents, tutorials, forum postings, et cetera, and embedded in those pages are many references to commands in the
software. We can produce a list of commands by instrumenting the software, or just by looking at the localization database, the string database. So all we do is perform the searches, retrieve the documents, and identify the commands that are in those documents. And when we take a giant dataset from Google Suggest, we can
actually create these fairly large graphs. [unclear] For GIMP, because it was instrumented to record all command invocations, we have a list of all commands. But for other applications you don't want to have to instrument the application just to get the names of the commands, so I did it in a kind of hackish but very quick way: I just looked at the translation files, the string translation databases, and now I have all the error messages and all the menu names, and it's just very quick. In any case, when you do that and take this very large dataset, you can create these associations, these graphs, with queries on one side and commands on the other side. And what you can do with that is build smarter command search.
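Pulling candidate command names out of a translation file could look roughly like the sketch below. It assumes gettext PO syntax with single-line `msgid` entries and underscore mnemonics; the sample entries are made up for illustration and are not taken from any project's actual translation files, and `command_names` is a hypothetical helper.

```python
import re

# Hypothetical excerpt in gettext PO syntax; the strings are illustrative.
SAMPLE_PO = '''
msgid ""
msgstr ""

msgid "_Desaturate..."
msgstr ""

msgid "Channel Mi_xer..."
msgstr ""

msgid "Stretch _Contrast"
msgstr ""
'''

# Matches single-line, non-empty msgid entries only (a simplification).
MSGID = re.compile(r'^msgid "(.+)"$', re.MULTILINE)

def command_names(po_text):
    """Return lowercase UI strings with mnemonics and ellipses removed."""
    names = set()
    for raw in MSGID.findall(po_text):
        # "_" marks a keyboard mnemonic; a trailing "..." marks a dialog.
        name = raw.replace("_", "").rstrip(".").strip()
        if name:
            names.add(name.lower())
    return names

if __name__ == "__main__":
    print(sorted(command_names(SAMPLE_PO)))
```

A real translation file also contains error messages and multi-line entries, so the result is a superset of the actual commands and would need some pruning.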
So if you were to integrate search into the interface and I say "convert to black and white", it can come back, based on just looking at the pages that Google returns, with Channel Mixer, Grayscale, and Desaturate. It's a search engine, so the rankings here may not always be the best; maybe Desaturate should be above Channel Mixer, but we've got a pretty good first result there. Similarly, you can do command recommendations, if you will. If I ask what commands are used in a similar context to Stretch Contrast, you see that White Balance, Levels, and [unclear] are also used in that context. And then there is the other thing we can do.
That was going in one direction, from queries to commands. But because we have this graph built from all of these queries, we can go in the opposite direction: from a command, what queries are associated with it? So here's something that I'm working on right now.
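The two directions of lookup can be sketched with a small bipartite structure. Everything below is an assumption for illustration: the command list, the made-up page texts standing in for retrieved documents, the plain substring matching, and the page-count weighting. The talk's own system uses question-answering techniques that are not detailed in the transcript.

```python
from collections import Counter, defaultdict

# Hypothetical command vocabulary (for example, from translation files).
COMMANDS = ["desaturate", "channel mixer", "grayscale",
            "ellipse select", "stroke selection"]

# Hypothetical mapping from query to the text of retrieved pages.
RETRIEVED = {
    "convert to black and white": [
        "Open the Colors menu and choose Desaturate.",
        "Use the Channel Mixer, or Desaturate, or switch to Grayscale mode.",
    ],
    "draw a circle": [
        "Make a selection with Ellipse Select, then use Stroke Selection.",
    ],
    "text on circle": [
        "Start with Ellipse Select and convert the selection to a path.",
    ],
}

def build_graph(retrieved, commands):
    """Link each query to the commands mentioned in its retrieved pages.
    Edge weight = number of pages for that query mentioning the command."""
    query_to_cmds = defaultdict(Counter)
    cmd_to_queries = defaultdict(Counter)
    for query, pages in retrieved.items():
        for page in pages:
            text = page.lower()
            for cmd in commands:
                if cmd in text:
                    query_to_cmds[query][cmd] += 1
                    cmd_to_queries[cmd][query] += 1
    return query_to_cmds, cmd_to_queries

def search_commands(query_to_cmds, query):
    """Query -> commands, most frequently mentioned first."""
    return [cmd for cmd, _ in query_to_cmds[query].most_common()]

def related_queries(cmd_to_queries, command):
    """Command -> queries whose retrieved pages mention it."""
    return sorted(cmd_to_queries[command])

if __name__ == "__main__":
    q2c, c2q = build_graph(RETRIEVED, COMMANDS)
    print(search_commands(q2c, "convert to black and white"))
    print(related_queries(c2q, "ellipse select"))
```

Note that the query "convert to black and white" shares no words with "desaturate"; the link exists only because the same pages contain both vocabularies.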
This is just a mock-up, but suppose that this is a tooltip for the ellipse select tool in GIMP. Based on the data that we have, we can say that the ellipse select tool is related to the following searches: "draw a circle", [unclear], "text on circle", [unclear]. And you can see what other commands are associated with it. Now, this is generated by the user community, on the fly, so it evolves as people's use of the software evolves. And the nice thing about searches is that, if you think about what you're doing when you type in a query, you're trying to come up with a very concise, information-rich phrase to describe what it is you're doing or looking for, your information need. So I think these queries are compact, but they give a pretty good indication of what people are after when they type in the search. So this is one way that we're looking at integrating this data into the application to make it actionable. I think that really concludes everything I wanted to say today, but I'm more than happy to
take questions. Yes?
Audience member: Aside from all of this looking very, very interesting, one factor I wasn't sure you had looked at is this: when you're rephrasing things, and maybe
thinking, as a developer, about using these results to change the phrasing of what you have in your application so that people can understand it better, have you looked at all at whether the reason you don't get searches on the current phrasing is that it's good phrasing to begin with? You may be making most people happy, with only an unhappy minority going out to search for things, so you might make things worse rather than better by switching.
Speaker: I think one thing I probably should have mentioned going into this is that I see this as a way to do, I would say, a pilot study. Many talks here are about usability studies, about just observing people use your software, right? But what you want to do is put people into situations where a lot of people want to perform these tasks, where you really get some sort of value from the user evaluation. So when you run this and you get the queries back, it's a suggestion to watch somebody perform that task. If they're having trouble, then you can try to get some insights from these follow-up observations. But it might turn out that it's not a usability problem, just a very popular task that people want to perform.
Another speaker: Just to build on that, that's a good question. You can look at some of the numbers we have on how often people are querying some of these things, like an average of once every 30 minutes. [unclear] The things that are really problems are the things that a fair number of people are searching for. And yes, we are not going to get the stuff where people know how to do it and therefore don't need to search for it, right? But as we mentioned earlier, there isn't necessarily a one-size-fits-all interface, and so having the ability to search like he showed [unclear], where you have the ability to type in "this is what I want to do" and then it comes back
with the actual command in the interface, might be a way to bridge the gap between people who use the vocabulary of your application and those who are unfamiliar with it.
Audience member: I'm just curious about the scraper you're using, and whether that's publicly available. For instance, the Digital Methods Initiative [unclear].
Speaker: It's actually a good question. What I'm using to grab these thousands of query suggestions is a Perl script that I run, and I don't make it publicly available because, quite frankly, it's in a gray area. We're using the same interface that Firefox and other web browsers use to do query suggestions in the search box in the corner, but we are sending automated queries, so I don't want to release it publicly. What I'm suggesting to you is that you can get a lot of bang for your buck just by typing in queries manually, which is perfectly fine. The way I showed you, you can get a fairly high-level view of how people are using the software just by typing in these types of things. This is not to say you couldn't go and write the same script yourself. [unclear]
Audience member: I have a question about this graph between queries and commands. Do you have a tool to make this relation, or does it have to be done manually?
Speaker: I have a tool that does it. I don't want to get into the details too much, but basically this graph uses some techniques from the information retrieval literature on question answering. Basically what you do is build a question answering engine: instead of a search engine, you want to type in a question and get back just the name of the command, or whatever the answer is. So I built that engine, performed those queries, and stored the results in the database, like
[unclear]. And that's it; it's just a bipartite graph.
Audience member: [unclear] Because I remember that "desaturate" and "black and white" don't have any common words, so how do you make the link?
Speaker: Basically the argument is that these pages act like a Rosetta Stone. A tutorial is telling people how to perform a task in this software, so it has to mention the menu items in the document, but in order to be retrieved for that query it also has to have some words in common with the query. So by performing the search, retrieving the relevant pages, and then extracting the commands from those pages, you link the two vocabularies.
Any more questions? [unclear]