
Fuzzy Search on Plone and Search for East Asian Language

Formal Metadata

Title
Fuzzy Search on Plone and Search for East Asian Language
Series Title
Number of Parts
39
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Plone Conference 2013 and Palestras da 9ª Conferência Brasileira de Python (PythonBrasil[9]) - Brasília / Brasil
Transcript: English (automatically generated)
I will talk about the Japanese language and other languages. Second, the main topic of the session: fuzzy search in Plone. Then questions.
How many languages can you read? Can you read all of these languages? Over five?
Over five? Oh, OK. Four, three, three languages, oh, seven, six, OK. I have only two, Japanese and English. Those are the only ones I can read.
First question: which of these do you think are double-byte languages? Which ones? These six are double-byte.
Second question: do you know which is an RTL language? Do you know? You don't know? Right to left, you know?
No? Just one in this list. Japanese is LTR. But old Japanese, about 70 years ago, was RTL.
Then it changed to LTR. OK, last question. Do you know which are not split using white space? No white space between words?
Do you know? Yes, yes, and Korean. So three languages. Japanese and East Asian languages do not use white space for splitting words.
So can you read this? OK, and the translation is this. This is: Watashi wa Terada Manabu desu. Nihon no Tokyo kara kimashita. Burajiru ni kuru no wa hajimete desu. It means: I am Manabu Terada.
I came from Tokyo, Japan. I have come to Brazil for the first time. Normal Japanese users can read this, with no white space.
Watashi is "I". Then wa. Nihon is "Japan". Then no. Tokyo; kara is "from"; kimashita is "came". So we can read it with no spaces.
So Japanese doesn't have white space for splitting words. And Japanese has three different kinds of characters: hiragana, katakana, and kanji. Hiragana is the Japanese phonetic script. Katakana is the phonetic script for Western words.
And kanji is originally Chinese. Hiragana and katakana have about 50 characters each. And kanji has over 7, sorry, 2,000 characters. So Japanese has the same reading written with different characters,
and different readings for the same character. Can you read these? They have the same meaning: Kyoto, a beautiful city in Japan.
This is kanji, Kyoto. Hiragana, Kyoto. Katakana, Kyoto. Same word, same reading. So we have four types of characters. Of course, we can read them. And can you read these?
Hashi, hashi, hashi: different characters. Same reading, same phonetics. But they have different meanings. The first hashi is a bridge.
The second hashi is, sorry, edge. And the last one is chopsticks, hashi. We can tell them apart by context.
So: watashi wa hashi wo motteimasu. It can be "I have a bridge" or "I have chopsticks". So we have a lot of languages, and we have a lot of rules,
and we have a lot of issues. So I don't have a solution for all of them. So let's start. The main topic. I made a product. It's called c2.search.fuzzy.
It is an alpha release. From the search box, a misspelled word: no content, zero items.
So: did you mean "Plone"? That is what this product does. We want to get suggestions the same way Google gives them. But on an intranet, we cannot use Google,
because we don't want to pass information to Google. And this product does not use Solr. I know Solr works well, but it is difficult to install, configure,
and implement. And I wanted to build an original system. So, basic technologies. This system is not difficult. There are only three keywords.
Levenshtein distance, sorted list, and Levenshtein automaton. First one, Levenshtein distance. Levenshtein distance is a string metric, according to Wikipedia. That sounds difficult, but an example is easy.
The base word is "Plone". Distance zero is "Plone" itself, in uppercase or camel case. Distance one is "phone", for example.
Distance two is "one", or "PLO". That is Levenshtein distance. Second keyword, sorted list: an ordered container.
You can get the words in order. The sort order comes from Unicode, so it is alphabetical. This example is the G20 countries.
Argentina, Australia, Brazil, the United States. Sorted alphabetically. Last keyword, Levenshtein automaton.
It looks like a math equation. Automata are difficult. However, I found a good program recently. It uses only Python.
So I'm using that code. That completes the introduction of the keywords. Next, I made an index and search system.
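The first two keywords can be sketched in plain Python. This is only an illustration, not the product's code: `levenshtein` is a textbook dynamic-programming implementation, the sorted list is kept with the standard-library `bisect` module, and the function and variable names are assumptions made for this example.

```python
import bisect


def levenshtein(a, b):
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


# The examples from the talk, compared in lowercase against "plone".
for word in ["PLONE", "phone", "one", "PLO"]:
    print(word, levenshtein("plone", word.lower()))

# A sorted list: insort keeps the container ordered on every insert.
countries = []
for name in ["Brazil", "United States", "Argentina", "Australia"]:
    bisect.insort(countries, name)
print(countries)
```

Lowercasing before comparing is what makes the uppercase and camel-case spellings count as distance zero.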
A search engine needs several parts. The index and the search system are the important ones. Index: it creates an original index, a sorted list, when Plone content is created or modified.
Search: it searches the original index with what we type into the search box. Correct spellings within the given distance are shown from the original index, so every suggestion is a word that appears inside Plone content.
For example, we show words within distance one; that is the default. With the G20 countries, if you type a misspelling into the search box,
it suggests Brazil, or Japan. It uses the automaton to increase speed. Next, dependencies.
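A minimal sketch of the suggestion step, assuming a small sample of country names as the index. It swaps in a brute-force scan with a plain edit-distance function where the talk uses a Levenshtein automaton; the automaton would make the same decision faster on a large index. The names `suggest`, `COUNTRIES`, and `max_distance` are hypothetical.

```python
def levenshtein(a, b):
    """Textbook edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


# Hypothetical index content: a few of the G20 countries, kept sorted.
COUNTRIES = sorted(["Argentina", "Australia", "Brazil", "Japan",
                    "United States"])


def suggest(query, words, max_distance=1):
    """Return the indexed words within max_distance edits of the query.

    Brute force: every word is compared. The default of 1 mirrors the
    default distance mentioned in the talk.
    """
    q = query.lower()
    return [w for w in words if levenshtein(q, w.lower()) <= max_distance]


print(suggest("Brasil", COUNTRIES))
print(suggest("Japon", COUNTRIES))
```

Because suggestions are drawn only from the index, a query that is far from every indexed word simply returns an empty list.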
We need only Python. However, Japanese and some East Asian languages don't have white space for splitting words. So we use MeCab.
MeCab is a morphological analysis engine for Japanese. It converts kanji to phonetic readings and normalizes them, because we have a lot of homonyms.
Supported languages: English and other European languages. Portuguese should also be supported, but I have not been able to try it.
And maybe Arabic is supported. It may be. But Chinese and Korean need a word-splitting system, and I don't know one. OK.
So, the demo is on YouTube.
I install and activate the add-on.
This shows the product's add-on configuration. The first time, you need to repeat the configuration.
OK. This is the Japanese default front page.
Nothing is shown. "Plone", sorry, it can't. Maybe it is not the Japanese; maybe it is the JavaScript.
It doesn't work. So, the Japanese demo. Muzukashii no.
So OK. OK. I will just create an event.
Then it does work.
When content is added, it creates the original index. Only the time is shown.
OK, the structure of the product. The index data is stored in the ZODB.
It's a list object. When content is created or modified, we update the list and keep it sorted. The list goes with a dict. The dict key is the phonetic reading, or the lowercase form in English. The value is the original words.
For example, the key "argentina" holds Argentina in its different capitalizations. And the key for Kyoto holds Kyoto in kanji, hiragana, and katakana. This object goes into the ZODB as the index. Then, search.
It checks the list against the input word for a small distance, using the automaton. It shows the original words from the dict values in the search box, using JavaScript.
For Japanese, I'm using MeCab for splitting and getting the phonetic reading. It stores the phonetic reading and the original word, because Japanese has the same reading written with different characters.
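The index structure described here, a sorted list of normalized keys plus a dict from each key to the original words, could look roughly like the sketch below. It is an assumption-laden illustration: the class `FuzzyIndex`, the function `normalize`, and the `READINGS` table are hypothetical, the hand-written table stands in for a morphological analyzer such as MeCab, and nothing here is persisted to the ZODB.

```python
import bisect

# Hypothetical stand-in for MeCab: a tiny hand-written table that maps
# a surface form to its phonetic reading in hiragana.
READINGS = {"京都": "きょうと", "キョウト": "きょうと", "きょうと": "きょうと"}


def normalize(word):
    """Index key: the phonetic reading if known, otherwise lowercase."""
    return READINGS.get(word, word.lower())


class FuzzyIndex:
    def __init__(self):
        self.keys = []       # sorted list of normalized keys
        self.originals = {}  # key -> original words, in insertion order

    def add(self, word):
        key = normalize(word)
        if key not in self.originals:
            bisect.insort(self.keys, key)  # keep the key list sorted
            self.originals[key] = []
        if word not in self.originals[key]:
            self.originals[key].append(word)


index = FuzzyIndex()
for w in ["Brazil", "Argentina", "ARGENTINA", "argentina",
          "京都", "きょうと", "キョウト"]:
    index.add(w)

print(index.keys)
print(index.originals["argentina"])
print(index.originals["きょうと"])
```

Matching is done against the normalized keys, while the suggestion shown to the user comes from the original words stored under the matching key.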
OK, the plan for the future. Right now I'm using the ZODB for storing the index. But I want to have an option for storing it in an RDB. So I'm trying to develop that.
And I want to support more languages. So please help me support more languages. So, a summary of this session.
For Japanese and East Asian languages, we still have some problems in Plone. But I think Plone works well in many languages. I hope Plone will continue to work well. And to all developers: never forget other languages.
And I have a Japanese sprint tomorrow. I want you to join me, maybe. OK. And first of all, I want to get bug reports.
So please try the product, OK. Special thanks: supported by IKE and Hiratara and the referral website. So thank you very much. That's all.
So, questions. Do you have any questions? I would like easy English, OK.
Anything else? OK. OK. Thank you very much.