
A lightning intro to re-Isearch


Formal Metadata

Title
A lightning intro to re-Isearch
Subtitle
re-Isearch, the 27 year old new kid on the search block
Series Title
Number of Parts
287
Author
Contributors
License
CC Attribution 2.0 Belgium:
You may use, modify and reproduce the work or its contents for any legal purpose, distribute it and make it publicly available in modified or unmodified form, provided you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Project re-Isearch is a novel multimodal search and retrieval engine using mathematical models and algorithms different from the all-too-common inverted index. The design allows it to have, in practice, effectively no limits on the frequency of words, term length, number of fields or complexity of structured data, and it even supports overlap, where fields or structures cross each other's boundaries (common examples are quotes, lines/sentences, biblical verses, annotations). Its model enables a completely flexible unit of retrieval and modes of search. Developed using a highly portable C++ subset to be RAM efficient, the engine also provides bindings to a number of other languages such as Python, Tcl, Java, etc.
Transcript: English (automatically generated)
Welcome everybody to FOSDEM 22 online. Great to be here. This is a lightning talk on re-Isearch, the 27 year old new kid on the search block.
And as you can see, there was some support from the NLnet Foundation and NGI0. So, a little bit of quick history. Basically the engine goes back to '94, when it started off, and development was then split between North Carolina and Germany.
It was deployed in lots and lots of sites, it became a proprietary fork, and at one point, basically in 2011, it kind of stopped. And despite the lack of support, a lot of servers continued to run, and I guess they don't want to break a working thing. And as I said, lots and lots of sites: patent offices, NASA; you see this long list on the slide.
We'll upload the slides somewhere else so you can have a more detailed look, because I'm going to be going really quick; this is a really quick talk. So anyway, development terminated. Lots of people asked if I could open-source IB, which was the proprietary version of iSearch, and we were really surprised at ApacheCon at how primitive the offerings were.
So with the kind support of NLnet and NGI Zero, in the middle of the COVID pandemic, we decided to basically bring this thing out and push the envelope. So anyway, what's the difference? Mainstream search engines are about finding information: a list of documents, a list of offerings.
So basically a search engine gives you a long list instead of the content, and sometimes you can't find things, and then the sorting sort of screws things up. So basically a lot of stuff is actually not really findable, or not really searchable; I mean, it's there, so it's searchable, but you just can't find it.
What this engine is about is looking at the structure, let's say XML or other kinds of markup, and also structure such as paragraphs, because it has some logic that tries to identify, for certain kinds of doctypes, some basically visual context, and to find this sort of stuff.
So in contrast to normal search engines, what we have here is the possibility of a different kind of level, a different unit of retrieval. The standard unit of retrieval, if you look at something like your standard internet search engine, is basically the object that was itself indexed.
So that's the page, the PDF, the Word document or whatever. Here we have the possibility of a different kind of granularity, and we also have the structure. We have a document which is part of a collection, which is part of a collection. So we can search and then say: OK, actually, this is not the document that's relevant,
this is the collection, or this is a section of a document. So we can walk around the structure and try to figure out what actually is relevant. And I think that's pretty cool. And as I said, it's virtual, so basically we can create these collections to transcend these sort of bubbles so we
can dig deeper to find new insights, and not something that was cut ahead of time. So basically the users can redefine things at search time, and do multimodal research. We actually applied this back in 2008 as part of an art project by Isaiah, and together with the Dutch design group Metahaven presented this in the context of Internet search,
using this sort of multimodal recursive search to find layers of things of the same kinds of items. So the design of this thing is basically: we've got a core engine, we've got the documents, and we've got doctypes, and doctypes are the things, basically the interface, that understand the various document formats.
The doctypes help with the indexing. They also help on the retrieval side, basically doing some of the conversion. And here's sort of an outline of the algorithm. I will skip this because we don't really have time in this sort of quick talk.
But basically what I do is I index every word, I understand all the structures, I understand what word belongs to what structure, and I can do this really quickly. So basically we've got all the specific paths to various objects, and the little objects can get their own index structures.
And so we've got, like, a polymorphism. So we can also, for example, like in this case, have a date field, 6 February 2022, which can also be encoded as 02 06 2022, or even as luty 6, which is, I guess, Polish.
I don't know Polish, but luty is February. But we can also look at it from the point of view of text, where if we were looking for Feb as text, it won't match Luty. So we have different ways of looking at the same fields and objects. And we have some tricks for how we actually get really good IO, because this thing is IO limited.
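The date-field polymorphism described above can be sketched in a few lines of Python. To be clear, this is not the re-Isearch API; it is a toy reconstruction of the idea that one field is indexed under several encodings at once, so a date-typed query matches any of them while a plain-text query only matches its own surface form.

```python
from datetime import date

# Hypothetical sketch (NOT the re-Isearch API): index a single date field
# under several surface encodings at once.
MONTHS_EN = ["jan", "feb", "mar", "apr", "may", "jun",
             "jul", "aug", "sep", "oct", "nov", "dec"]
MONTHS_PL = ["styczen", "luty", "marzec", "kwiecien", "maj", "czerwiec",
             "lipiec", "sierpien", "wrzesien", "pazdziernik", "listopad",
             "grudzien"]

def date_keys(d: date) -> set[str]:
    """All surface forms a single date field is indexed under."""
    return {
        d.isoformat(),                                  # "2022-02-06"
        d.strftime("%m %d %Y"),                         # "02 06 2022"
        f"{MONTHS_EN[d.month - 1]} {d.day} {d.year}",   # "feb 6 2022"
        f"{MONTHS_PL[d.month - 1]} {d.day} {d.year}",   # "luty 6 2022"
    }

keys = date_keys(date(2022, 2, 6))
# A date-typed query normalises first, so it hits regardless of encoding:
assert "luty 6 2022" in keys and "feb 6 2022" in keys
# A plain-text search for "feb" only matches the English surface form:
assert not any(k.startswith("feb") for k in keys - {"feb 6 2022"})
```

The point is that the same stored value answers both a typed date query (match on any encoding) and a literal text query (match only on the exact surface form, so "Feb" does not hit "luty").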
We use mmap heavily, and a lot of little tricks for basically getting things in. And this goes down to low-level details of page translation, but what it basically means is that when we have multiple processes running on the index, we don't have to do the IO all the time:
the page is probably already in memory, so our little index reads are pretty fast. So, as I mentioned before, for every word we have an address; we can open the file and read it. And we also use the doctype to see how it's encoded.
So we understand all the paths, we understand all the words. Now, within our algorithm, the first X characters or octets of the string we sort of cache. That's a sort of look-ahead, so that we can understand what it is without necessarily having to open the file.
But it also means that we do not have limits on the length of any word: if it goes beyond that limitation, we open up the file and continue there. So we can look at the original file, and we can do unlimited-length literals, using any wildcard or regular expression of our dreams.
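That look-ahead cache can be sketched as follows. This is my reconstruction of the idea, not the engine's actual code: the dictionary keeps the first K octets of each indexed word inline next to its file offset, so short terms are answered from the cache alone, and the original file is only consulted when a query term outgrows K octets.

```python
# Illustrative sketch of the prefix look-ahead cache (not engine code).
K = 8  # cached prefix width, in octets (illustrative value)

corpus = "supercalifragilisticexpialidocious is a long word"
offsets = [0, 35, 38, 40, 45]                  # word start positions
entries = [(corpus[o:o + K], o) for o in offsets]  # (cached prefix, offset)

def lookup(term: str) -> list[int]:
    hits = []
    for prefix, off in entries:
        if len(term) <= K:
            # The cache is enough: prefix match without touching the "file".
            if prefix.startswith(term):
                hits.append(off)
        elif term.startswith(prefix):
            # Term longer than the cache: fall back to reading the file,
            # so word length is effectively unbounded.
            if corpus[off:off + len(term)] == term:
                hits.append(off)
    return hits

assert lookup("word") == [45]                  # answered from the cache
assert lookup("supercalifragilistic") == [0]   # needed the file read
```

The short query never opens the file; the long one does, which is exactly why the scheme imposes no hard limit on word length.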
And we can even go the other way: given a path, we can ask what's inside it. We can also reconstitute it any other possible way. And, as I said, we have these things called virtual indexes, which allow us, with a little file, to create other indexes, to lump indexes together, whatever.
And in fact, if we want, there's also the possibility within a real index to actually import another index, which is not the same as just having two indexes, because if I'm making a virtual index, obviously I have the cost of searching the one index plus the cost of searching the other index.
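The cost difference between a virtual index and an imported one can be sketched like this. It is an illustrative toy, not engine code: a virtual index pays one binary search per member index, while importing merges the terms so that a single binary search covers everything.

```python
import bisect

# Illustrative sketch: virtual index vs. imported (merged) index.
idx_a = ["apple", "mango", "pear"]        # each member index is sorted
idx_b = ["banana", "cherry", "mango"]

def contains(idx, term):
    i = bisect.bisect_left(idx, term)     # one binary search
    return i < len(idx) and idx[i] == term

def search_virtual(term):
    # Cost: log|a| + log|b| -- one binary search per member index.
    return contains(idx_a, term) or contains(idx_b, term)

merged = sorted(set(idx_a) | set(idx_b))  # the "import" step

def search_imported(term):
    # Cost: a single log|a|+|b| binary search over the merged index.
    return contains(merged, term)

assert search_virtual("mango") and search_imported("mango")
assert not search_virtual("kiwi") and not search_imported("kiwi")
```

Both answer the same queries; the imported form just collapses several searches into one, which is the speed difference the talk is pointing at.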
So basically the complexity adds up that way. But when I import it, then it's a single binary search within that one index, and importing is also pretty quick. OK. And as I said, we have this doctype registry. The indexer goes to the doctype registry, and the doctypes can call each other.
Because sometimes that's needed. So, for example, say I'm using the autodetect doctype, which is the doctype that tries to guess what kind of file I'm looking at if I don't tell it what it is. It looks at the file and goes: oh, this looks like a mail file. And it passes it off
to the mail doctype, and the mail doctype says: OK, well, it's not quite a mail file, this is a mailing list. So it will pass it on to another doctype, the one that handles lists. And the list one may say: ah, but this is not this kind of list, it's this other kind of list, and pass it on, or whatever.
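The hand-off chain just described can be sketched as plain function delegation. The doctype names and detection rules below are my invention for illustration; the real registry is C++ and far more elaborate.

```python
# Toy reconstruction of the doctype hand-off chain (names and rules are
# hypothetical, purely for illustration).
def autodetect(text: str) -> str:
    # Guesses the format, then delegates to a more specific doctype.
    if text.startswith("From:"):
        return mail_doctype(text)
    return "plaintext"

def mail_doctype(text: str) -> str:
    # "Not quite a mail file"? Hand it on to the list doctype.
    if "List-Id:" in text:
        return list_doctype(text)
    return "mail"

def list_doctype(text: str) -> str:
    # A real list doctype would discriminate further between list formats.
    return "mailing-list"

assert autodetect("From: a@example.org\nList-Id: dev") == "mailing-list"
assert autodetect("From: a@example.org\nHi!") == "mail"
assert autodetect("just some text") == "plaintext"
```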
We also have the possibility, via the various hyper-parameters of these doctypes, to create what I'll call virtual doctypes, which are the same as the doctype but with slightly different hyper-parameters. And again, it's just a file where you define this stuff, not a completely new parser. It's all in the handbook, in the documentation. And we've got lots and lots
of doctypes; you can see this list here, and some of them are also filters. So, for example, there's a filter that handles XML; we have an external file that does that. We have pandoc, we have all kinds of other existing filters here. And we handle all these different kinds of things here natively,
so we can reconstruct it. Then we also have something called a plugin architecture, which allows you to extend easily, using C++ code, to add additional high-speed doctypes to the system. When it finds them, it loads them up. As you can see here, I have an EXIF plugin, and I have an MS Office plugin, etc.
So we also have a bunch of tools: Iindex, Iutil, etc. And just to give an example: to index files, we run Iindex with the database name and the files; that's the simplest way to go. It will autodetect and then basically index them. But we have lots of options.
You can imagine: again, consult the handbook; there's lots of stuff here, and options alone just to select the files to walk in the tree. It's basically like find on steroids. Now, let's talk a little bit about performance, on the machine that I'm currently sitting in front of, which is an Intel Core i7, using 512 MB of memory.
Yes, you read that correctly. And it will actually run in as little as eight MB, or four; it'll run on your wristwatch. We get about 56,000 words a minute; we do about half a million emails in under 20 minutes. But that's because a lot of the parsing of mails is kind of complicated.
So if we're just indexing full text, basically not looking at the structure and the parsing, we get up to about 20 times that speed. We've clocked that on this notebook at 70 million words a minute, and we've put this up on servers where we're getting well beyond that. So, again, most of the time is spent analyzing document structure and parsing.
And there's a lot of room to actually improve that software. For searching, there's an Isearch tool; normally you'd write C++ code, or use a scripting language such as Python. But it's still quite useful, and it can even run as a daemon.
There are lots of features in there; we've done it as a back end, et cetera. And you can imagine it's got loads and loads of options, but I'm going to spare you that, because we have a limited amount of time, and I want to talk about the query languages we have, because that's where some of the real power is, beyond all these things I can do.
So, you know, what would a powerful engine be without fine-grained search, a number of query methods, and a rich query language? We basically have a number of query languages here. We've got CQL, which is maintained by the Library of Congress, things like SRW, Z39.50; if you don't know about it, don't worry about it.
Then we've got something I call smart queries, relevance feedback ("find me something like this"), and we also support infix notation and RPN notation. Now, smart queries are pretty cool, because they are for the non-technical user and easy to use. And they get around some of the problem of, well, should it be an OR or an AND: some engines basically take all the words and AND them,
some of them do OR, and either it gives you too much, depending upon the way it's sorted, or it misses a lot of important stuff. So what we have here is something called smart query logic, and it basically does a number of things. It looks at: maybe it's a literal phrase.
Are the terms in the same node? Maybe that would be kind of cool. Or an AND within this area here, or an OR but within this sort of context; that's sort of the function here. So I'll give you a quick example, searching through Shakespeare. I look up "rich water", and it finds no phrase like "rich water", but it finds lines where rich and water occur together.
So basically they're within the same container, which we call a peer. "Hate Jews": again, there is no line or container with that, but in The Merchant of Venice we have a lot of scenes talking about hating Jews within the same thing. And so "hate Jews" gets reduced to "I hate him for he is a Christian", found basically in the lines spoken by Shylock.
And as you see here: "I hate him for he is a Christian". Now, what we can also do when we're looking at this is, since we know our position within the structure, we can also go and walk back up, keeping it within the scene.
So we can ask, for example, who is on the stage, because we have these stage directions, you know, enter, exit, etc. And that's also pretty cool; we actually do a demo of that. But anyway: "out, out". It finds that it's actually used in a number of places.
So it finds here the confirmed phrase "out, out". And we have a general query language expression here, and it supports, as you can imagine, wildcards and Boolean operators: lots of Boolean, AND, OR and NOT, basically. Yeah, it keeps going here; long list. Consult the handbook: unary operators, NOT WITHIN.
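The smart-query fallback described a moment ago can be sketched like this. It is my reconstruction of the idea, not the actual engine logic: try the terms as a literal phrase first, then as an AND within the same container (a "peer", like a line of Shakespeare), then as a plain OR.

```python
# Toy sketch of "smart query" logic (not the actual engine code).
# Each container ("peer") here is simply a line of text.
lines = [
    "out out brief candle",
    "i hate him for he is a christian",
    "to be or not to be",
]

def smart_query(terms):
    phrase = " ".join(terms)
    # 1. Literal phrase match.
    hits = [l for l in lines if phrase in l]
    if hits:
        return "phrase", hits
    # 2. All terms somewhere within the same container.
    hits = [l for l in lines if all(t in l.split() for t in terms)]
    if hits:
        return "and-in-peer", hits
    # 3. Fall back to OR.
    return "or", [l for l in lines if any(t in l.split() for t in terms)]

assert smart_query(["out", "out"])[0] == "phrase"
assert smart_query(["hate", "christian"])[0] == "and-in-peer"
assert smart_query(["candle", "dagger"])[0] == "or"
```

This mirrors the Shakespeare examples: "out out" matches as a phrase, while "hate" and "christian" only co-occur within the same peer, so the query silently relaxes instead of returning nothing.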
Yeah, it keeps going; more things. We've also got the possibility to do sorting. And, you know, you're of course interested in search performance. Basically the performance is big O of log n plus log m,
where n is the number of unique words and m the number of instances in a field search. So basically the rule of thumb is: the smaller the result set, the faster the search. And we have a number of features to try to limit this in a practical way. We've got semantic search, we have personalized sorting, and we've got all kinds of objects which, as I mentioned, go beyond text.
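That O(log n + log m) cost can be made concrete with two binary searches over synthetic data. This is only an illustration of the stated complexity, not the engine's data layout: one search over the n unique terms, then one over that term's m instances (for example, to clip a posting list to a range).

```python
import bisect
import math

# Illustrative sketch of the stated O(log n + log m) search cost.
n, m = 1_000_000, 10_000
terms = [f"w{i:07d}" for i in range(n)]   # sorted unique words
postings = list(range(m))                 # sorted instances of one term

i = bisect.bisect_left(terms, "w0123456")  # ~log2(n) = 20 probes
j = bisect.bisect_left(postings, 5_000)    # ~log2(m) = 13-14 probes

assert terms[i] == "w0123456" and postings[j] == 5_000
# Total probes stay tiny even for a million-term vocabulary:
assert math.log2(n) + math.log2(m) < 35
```

A million unique words cost about 20 probes, and ten thousand instances about 14 more, which is why smaller result sets search faster.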
Here's some of the object list here. And we've got all kinds of relations, which is great; they don't even have to be in the object, they can actually be external; we allow that. And we also have dynamic presentation, which allows you to convert things back, so the content can come out of its place. And we've got all kinds of scoring, normalization and ranking, with which we can also do stuff.
So anyway, bullet points: ETL, a wide range of document types, data storage, dynamic search-time analytics, recommendation, customization. And it's really available. And, you know, anything you haven't dreamed of? Oh, yeah,
I could go on for hours; I'd make that a week. So anyway, visit nonmonotonic.net to learn more about re-Isearch. The software is all freely available on GitHub. So thank you very much.
Will there be a PDF or PowerPoint file of your slides? And then, what do you say about the query performance, I wonder? And as I saw, for the semantic search, you have to connect items, predicates and items together.
Do you define for the system that you are doing semantic search, or are they composed autonomously? You have different approaches. Number one: I will put up the slides on GitHub.
There's the handbook, there are the sources, there's all kinds of documentation, there's a comparison to other engines like you've seen; it's all on GitHub. Under re-Isearch you can find it on GitHub; the link that I published should actually go there.
And in terms of semantic search, yeah, there is actually a file. The file can be created by hand, because basically what you need on the semantic level has to be context dependent. So if I'm, for example, searching a collection that has to do with automobiles, obviously certain words have certain associations, which would be quite different than if I'm looking at, say, biology or whatever.
And so either it's created by hand, or you can use vectorization; there are a number of algorithms you can use to create this file, by actually doing some bag-of-words vectorization to get that.
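The kind of context-dependent association file described here can be approximated with simple co-occurrence counts. This is a crude stand-in for the bag-of-words vectorization mentioned, purely for illustration; a real pipeline would use proper weighting or embeddings.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: two "collections" mixed together, automotive and biology.
docs = [
    "engine piston turbo engine",
    "piston engine oil",
    "cell membrane protein",
    "protein cell dna",
]

# Count how often each word pair co-occurs within a document.
pair_counts = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    for a, b in combinations(words, 2):
        pair_counts[(a, b)] += 1

def top_associate(word):
    """Strongest co-occurrence partner of `word` in this corpus."""
    scored = Counter()
    for (a, b), c in pair_counts.items():
        if a == word:
            scored[b] += c
        elif b == word:
            scored[a] += c
    return scored.most_common(1)[0][0] if scored else None

# The same procedure yields different associations per domain:
assert top_associate("engine") == "piston"
assert top_associate("protein") == "cell"
```

The output depends entirely on the collection it was built from, which is the point: an automotive corpus associates "engine" with "piston", while a biology corpus would produce a completely different file.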
I think it's quite flexible. OK. And in terms of queries, you've got, as I tried to show here, the smart query, but you also have a very rich query language, and you can go back and forth; you can do even crazy stuff. So, for example, like I showed, searching in Shakespeare and getting back a line:
I can say, OK, now I want that line, I want to know in what speech that is, or in what act that is. And then I could even walk up to the stage directions and actually, at runtime, say who is on the stage when this line is being said. And I can do all of this as queries. So, more questions? Go ahead. OK: do you have system limits? Limitations of your system, indexing, indexer limitations?
And of course, query limitations. Do you have any limitations? I mean, there are limitations, yeah. Sorry; no, there are, of course, limitations right now.
The current version is, I think, limited to 32 million files and hundreds of terabytes of data. But that limitation is artificial, just because I wanted to keep the addresses within.