#bbuzz: First Steps with Apache Beam: Writing Portable Pipelines using Java, Python, Go

Formal metadata

Title
#bbuzz: First Steps with Apache Beam: Writing Portable Pipelines using Java, Python, Go
Number of parts
48
License
CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.

Content metadata

Abstract
Apache Beam is an open source unified model for defining data processing pipelines (Batch and strEAM), which allows you to write your pipeline in your language of choice and run it with minimal effort on the execution engine of your choice (e.g. Apache Spark, Apache Flink, Google Cloud Dataflow). In this practical session we will get hands-on writing Beam pipelines, as well as discuss the fundamentals of the Beam programming model and SDKs (Python, Go, Java). Prerequisites: you will need to install IntelliJ IDEA and/or PyCharm with the EduTools plugin, and with the kata(s) installed in the language of your choice, to work through exercises in the online platform.
Transcript: English (automatically generated)
Next up is a workshop about Apache Beam, specifically how to get started with Apache Beam. So this is gonna be a very interesting session
if you want to write data pipelines in Java, in Python and Go, in SQL. And the workshop is not given by me, it's presented by Austin Bennett. He is a cloud architect at Dish, and he is also a cognitive linguist and researcher
with interest in multimodal communication. Now, I don't know what that is, but he is pretty enthusiastic about Beam and yeah, looking forward to the workshop now. Thank you. Oh, hello everyone. All right, let's, oh, where am I gone? All right, cool.
So this is first steps with Apache Beam. We're going to walk through writing portable pipelines with Java, Python, and a little bit of Go. I'm more enthusiastic about Go than where it's at at the moment, but we will see that.
So, what we will cover today. I will walk you through an introduction. Here's what Beam is all about. Let's get an overview. Let's certainly get hands-on. That's a prime takeaway. I hope at the end of this block,
you guys feel comfortable using Beam. We will take breaks throughout the lecture here for exercises or what we call katas. And then we'll dig into some of the windowing and time semantics, some about triggering and streams,
and also side inputs. I've mentioned all of these things, but ultimately we're gonna go through them very briefly. These things get quite advanced and I'm hoping to expose you to the fundamental building blocks that you need
to start being proficient. Naturally, we can't do it all in this short block. And then finally, we'll wrap this up just sharing some opportunities, ways to get involved with the Beam community, as well as some great things if you know Beam and otherwise.
Cool. So, shameless plug, at least at the moment: both Max and I helped with organizing Beam Summit, a couple-day conference in August. I wanted to alert you: if you guys like what you're hearing around Beam and it makes sense, check it out.
It's free and whatnot. So, cool. Anyways, these exercises are going to be hands-on. Hopefully you guys have received a pre-email, but ultimately please download IntelliJ or PyCharm.
So, in the event, well, hopefully you can install either. The community editions are free, you can use the PyCharm Edu edition, or you can use any paid version that you have.
This is a link here and maybe we should put that in the Q&A room and pin that. Otherwise, beam.apache.org/blog/beam-kata-release. These are a community resource in general for getting hands-on with Beam.
So, we'll walk through setup, but I'm encouraging you, while this is going on, since downloads may take a few moments, to try to download some version of IntelliJ if you wanna use Go or Java specifically, or you can also use Python in that
or get PyCharm if you only wanna use Python. Cool. All right, so, before we can get hands-on, you guys should probably know what Beam is about. So, I'm gonna walk you through the programming model and the vision for the project.
Okay, so, the Apache Beam vision, it's to be an open source, unified model for defining data processing pipelines, which allows you to write the pipeline in the language of your choice and run it with minimal effort on the execution engine of choice.
So, there's a lot of things there. So, I'm gonna slowly walk through and unpack that for you guys. So, I mean, specifically, hopefully, we have a sense of what data processing pipelines are, but even starting there. What's a pipeline?
I think we don't mean these sorts of pipelines, but ultimately, for data where we have some sort of, say, input, we may do some sort of transformations, and then we get to an output. So, a simple Beam data processing pipeline
starts with a source, I can't click there, that gets read into what is referred to as a PCollection, so a parallel collection of elements. We perform PTransforms, parallel transformations, into, say, another PCollection,
maybe do another transform, which gets stored as a PCollection, and then written out to some sort of sink somewhere. So, we can get into the more specifics, but again, in Beam, what is a pipeline? It is a directed acyclic graph of data transformations
that are applied to one or more collections of data. So, again, as we said, in Beam: PTransforms, the parallel transformations; PCollections, the collections of data. These can be either unbounded, so streaming, or bounded.
The super cool thing about Beam is the sort of transformation logic often that you're doing has no basis, I mean, you don't need to worry about whether or not it's bounded or batch data or unbounded slash streaming data.
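To make the bounded/unbounded point concrete, here is a minimal plain-Python sketch. It does not use the Beam SDK; the names `to_upper` and `unbounded_source` are hypothetical and exist only for illustration. It shows the idea that element-wise transformation logic can be written once and applied to a finite collection or to an endless stream.

```python
import itertools
from typing import Iterable, Iterator


def to_upper(elements: Iterable[str]) -> Iterator[str]:
    """Element-wise transform: it never asks how many elements exist."""
    for element in elements:
        yield element.upper()


def unbounded_source() -> Iterator[str]:
    """Hypothetical stand-in for a stream: yields elements forever."""
    for i in itertools.count():
        yield f"event-{i}"


# Bounded (batch-like) input: a finite list.
bounded_result = list(to_upper(["flink", "spark", "dataflow"]))

# Unbounded (stream-like) input: same transform, we just stop after three results.
unbounded_result = list(itertools.islice(to_upper(unbounded_source()), 3))

print(bounded_result)
print(unbounded_result)
```

The transform itself is identical in both cases; only the source differs, which is the property the talk attributes to Beam's model.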
Pipelines themselves can have multiple sources as we see here, or multiple sinks, and can form any sort of graphs there. So, the graph gets optimized based on the code and gets submitted to the runner of your choice.
So, again, I'm already, I guess, talking ahead to the PCollections and PTransforms. This is probably beaten in already by now. So, ideas around, here's our, again, sources and maybe combined.
We can, you know, various forms of distillation, ways we can partition data here or combine. So, that's roughly what data processing pipelines are.
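The shape described above (several sources, transforms, a combine step, and a partition into several outputs) can be sketched in plain Python. This is an illustrative model under stated assumptions, not Beam code: the helpers `flatten`, `flat_map`, `count_per_element`, and `partition` are hypothetical names, and plain lists and dicts stand in for PCollections.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def flatten(*collections: Iterable[str]) -> List[str]:
    """Merge several input collections into one."""
    return [element for collection in collections for element in collection]


def flat_map(collection: Iterable[str], fn: Callable[[str], Iterable[str]]) -> List[str]:
    """Apply fn to each element and concatenate the results."""
    return [out for element in collection for out in fn(element)]


def count_per_element(collection: Iterable[str]) -> Dict[str, int]:
    """Combine step: count how often each element occurs."""
    return dict(Counter(collection))


def partition(
    counts: Dict[str, int], predicate: Callable[[str, int], bool]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split one collection into two outputs based on a predicate."""
    matching = {k: v for k, v in counts.items() if predicate(k, v)}
    rest = {k: v for k, v in counts.items() if not predicate(k, v)}
    return matching, rest


# Two sources feed one graph.
source_a = ["to be or not", "to be"]
source_b = ["that is the question"]

merged = flatten(source_a, source_b)                      # sources -> one collection
words = flat_map(merged, str.split)                       # lines -> words
counts = count_per_element(words)                         # combine per word
frequent, rare = partition(counts, lambda k, v: v > 1)    # one input, two outputs

print(frequent)
print(sorted(rare))
```

Each intermediate value (`merged`, `words`, `counts`) plays the role of a collection between two transforms, so the whole thing forms a small directed acyclic graph with two inputs and two outputs.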
So, what do we mean by this unified model? So, this is the cool thing where, again, we can have batch processing or stream processing through the Apache Beam model, as well as its SDKs, and we can run that on many of the runners,
any of the supported backends. So, we've probably already had a few sessions around Apache Flink here. Apache Spark is another well-known one. There is a direct runner, which you should really just use for testing. Don't try to write production jobs with it. It is not meant to,
isn't certified for scalability. And then, naturally, Beam is often conflated with Google Cloud Dataflow. To use that managed cloud service, you would write Beam jobs and submit them to the Google Cloud Dataflow backend.
So, Beam is the, you know, SDK layer there. Beam allows you to use the language of choice. So, we can, we're gonna talk about, especially Python and Java. There are some Go examples to work through.
Recently, Kotlin has a bunch of katas available, so I'll also show you those. I'm not a competent Kotlin developer, so I can't speak to writing that too well, although I look forward to playing with it,
something that maybe you might prefer over Java. There's also Beam SQL, and you may recognize this as Scala. So, there is a Scala layer open sourced by Spotify called Shio, written on top of the Java SDK.
So, a different way of looking at this thing, Beam is a single model that supports multiple modes of processing, that there are multiple SDKs in various language that you would write and submit to various runners. So, again, these are the more well-known ones,
but there are several others. The pipeline runners, we can see here. So, again, the direct runner, Flink, LinkedIn especially runs Beam on SAMSA.
Spark, as we said, AWS had given a talk at last year's Beam Summit, showing how you can submit Beam jobs either to their own AWS EMR clusters or to the Kinesis Data Analytics. Hazelcast has a runner called Jet.
Google Cloud Dataflow was mentioned. IBM Streams supports submitting Beam jobs too, and there's an Apache project called Nemo; I think it's still incubating. I think that's the range of runners at the moment, at least those that are well known and still in active development. So, another way this is talked about: the Apache Beam ecosystem is, again, a unified programming model. It is portable, given you can write the same Beam code and run it on Flink or Spark or Dataflow, for instance. I know that was my biggest reason for seeing the promise: not being tied to whatever cluster or infrastructure my ops team wanted to maintain for me.
It's also portable from the language perspective. Using the Apache Beam model, I can write in a bunch of languages, as well as cross-language, potentially with multiple languages in the same pipeline, which is work well underway. It's very extensible. As I mentioned, it supports DSLs like SQL or Scala, and it has a ton of IOs for many of the major open source data stores and APIs, as well as integrations with the major clouds. Okay, so that was a lot of seeing the same thing like four times. Hopefully that makes good enough sense.
So, now I want to talk about the intricacies of pipelines themselves. A pipeline is written to describe four things, and we're going to use this color-code convention throughout. What are we computing? That's going to be the primary focus here.
Then: where in event time, when in processing time, and how do refinements relate? So the "what" is the transformations. Then there is windowing, which is about where in event time that occurs. There are watermarks and triggers, so when does that occur? And there is accumulation: how are we accumulating results? This is a very useful link to pay attention to. I can't open it from these slides.
It's in the beam.apache.org documentation: the runners' capability matrix. It's also pretty findable from the beam.apache.org website or by searching for "Beam capability matrix". This just tells you which functions and which aspects of the Beam SDK are implemented and working well in each of the various runners. As you can see, most runners do a rather good job of implementing the full suite of semantics from the Beam SDK, or the Beam model, I mean, but naturally check here if there are questions. All right, cool. That was a bunch of stuff, so I'll give us one second to let that sink in.
And I'm also going to check the Q&A room, although I believe Max is in there answering questions and pointing us to where we want to be.
So cool. All right, let's start talking about writing a pipeline. Where are we? Pretty soon we will wind up getting hands-on. So again, writing a pipeline, we want to focus first on the transformation logic: what are we computing? We can talk about element-wise transformations, which are equivalent to the map in MapReduce style. We can talk about the aggregating functions, which are essentially the reduce. And there are composite transformations, which use combinations of the various bits. We'll walk through all of those. As a first pass, we're going to talk about element-wise transformations and then get moving hands-on with element-wise transformations that you can write yourself using Beam. The primitive behind most things in Beam is a ParDo, a parallel do function. It performs, oh, much better here.
I changed my screen, nice. So it is how you do things in parallel when you write with the Beam model, as you can see.
So for instance, here is an example of an element-wise transformation in Java. We have a pipeline that we are creating, we have an input based on our elements, and we have a ParDo of a DoFn. In this case, we are creating a key-value pair of the first letter and the word, based on the input collection of the various supported backends.
We'll have some examples of things like this hands-on in a second, so I won't unpack it too much. We also have element-wise transformations in Python; this one is just the first letter.
Take the input collection and return the first letter. It's as simple as creating a class with a process method and returning what you want. In Python, you will see we use the pipe operator to pipe, or chain, our functions together in the Unix style. And here is an example of the Go code. Again, a first-letter function: write the function, then call beam.ParDo with the context, the function, and what you're working with. We'll dig into that in a moment. A little bit more on element-wise transformations: ParDos are not only one-to-one; a ParDo can output zero, one, as we said, or many elements. So for instance, we could explode prefixes and say, for the word Samza as the runner, we could emit S, Sa, Sam, Samz, Samza, where we have a ton of outputs for each input element. Alternatively, we can filter, where we end up with fewer elements.
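As a rough sketch of the zero, one, or many idea in the Python SDK: the two helper functions below hold the per-element logic, and `run_pipeline` wraps them in DoFns. The helper names, the DoFn class names, and the sample input are made up for illustration; only `beam.DoFn`, `beam.ParDo`, `beam.Create`, and `beam.Map` are actual SDK names.

```python
def explode_prefixes(word):
    """Yield every prefix of `word`: one input element, many outputs."""
    for end in range(1, len(word) + 1):
        yield word[:end]


def only_s_words(word):
    """Yield the word if it starts with 'S', otherwise yield nothing."""
    if word.startswith("S"):
        yield word


def run_pipeline(words):
    # Imported here so the helpers above can be used without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam

    class StartsWithSFn(beam.DoFn):  # hypothetical name
        def process(self, element):
            # Zero or one output per input: this behaves like a filter.
            yield from only_s_words(element)

    class ExplodePrefixesFn(beam.DoFn):  # hypothetical name
        def process(self, element):
            # Many outputs per input.
            yield from explode_prefixes(element)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateWords" >> beam.Create(words)
            | "KeepSWords" >> beam.ParDo(StartsWithSFn())
            | "ExplodePrefixes" >> beam.ParDo(ExplodePrefixesFn())
            | "Print" >> beam.Map(print)
        )
```

Calling `run_pipeline(["Flink", "Spark", "Samza"])` on the direct runner should print the prefixes of the two S words; the order of the printed elements is not guaranteed.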
So cool. Those are the bare fundamentals that I think are important to know at this moment, before I let you guys get hands-on. I'm hoping you have had the chance to install IntelliJ, right? The Community Edition is fine. For instance, searching Google or whatnot, there is this Community Edition that you can download.
I'm clearly on a Mac here, as it shows, and that will download and then install. So I'm hoping you guys have done that. For the next step to look out for, I will pull up my version of IntelliJ.
Okay, so once you have IntelliJ downloaded, follow the instructions shown in the, let's see, Beam Katas blog post, the Beam Katas release.
In the chat room, I believe we have a link to this. This page should show you how to get set up, but additionally I will walk you through that.
So once you have, say, IntelliJ installed here, you can go to Configure, Plugins and look for the EduTools plugin.
In this case it's already installed, but you would click the Install button to get what you need. You can see I've already installed these and played with them last night, for instance. Once the EduTools plugin is installed, it brings up this Learn and Teach dialog. We can go to Browse Courses, and if you type Beam, you will see we have Beam Katas for Python and Java.
These two are very thorough. Recently, some Go katas have been added; I even wrote one for Flatten, I believe last week. And I just saw these Kotlin ones, so feel free to give those a shot.
I'm unsure how familiar Max is with Kotlin. This is not something I anticipate us covering, but know that it's there, and if that's a language you prefer, it's naturally something you can use with Beam. So for instance, I think this is going to create a copy, but that's okay.
So that is bringing these things up. For those wanting to get, oh, good, it didn't actually have it. Maybe it'll figure it out. Okay, one thing: it is possible in IntelliJ specifically, and maybe in PyCharm, that if you haven't been using Beam before, you need to install the Apache Beam package to use these.
That can be done with pip install apache-beam. As you can see here, I'm noticing an error; let's let that come up.
Oh, it did eventually find it. So I did have it on my system, but otherwise that's a common problem when you're trying to do this. I am hoping that in these various exercises we walk through, you guys essentially get set up. Let me delete these; these are the existing ones. But I'm hoping that you guys have the environment set up. Let us know if you have trouble; we're here and should be able to help debug. Otherwise, please try the intro, which is just a beam.Create, to get things up and running.
From there, please also do the initial ParDo. I still need to clean up my environment, but in the Python katas, please try to get through the intro here, Hello Beam.
There are also many sections in the Map section; try to get through just the first part of your task. So I'm going to give you guys a few minutes.
I'll set a timer here. Let's check back in, let's call it, 10 minutes, to ensure you can download and get set up.
If there are some common questions, I may come back on over the speaker, but in general I think these are the setup instructions, and you can see the timer counting down. Please work on these exercises and let us know if you have any questions.
All right, let's get this microphone back on. It seems like Johannes at least wanted to verify what we're supposed to do here, since the solution is kind of already in there. Oh, cool. If it's already passing, then you're good to go. I was just redownloading these.
Actually, I'm going to ask: what language are you all using? No reason to dig into solutions too much here. Oh, interesting. Well, real quick, what language is everybody using?
Because I don't necessarily want to walk you through all of these languages if nobody's using them. Cool, let's do both. I figured at least both Java and Python initially; I see both of you here, awesome.
It's also fantastic to see some life in response. This is kind of eerie, just broadcasting to people, so I really appreciate even just seeing some messages here. Cool, yeah, let's work through both then.
Let me pull this out. Cool, IntelliJ is coming up. I am in the Java katas; hopefully it's redownloading and indexing.
It is over here, it is loading, awesome. So here is, oh, it's hiding up here on my other monitor.
So we have this loading. Since in general Python is a little simpler, I want to run that first, although I need to let this process continue running. It's building and configuring the environment, as you guys probably experienced.
So that will be right up here. Looks like we're good with Java, and we have Python loaded.
Oh man, I wanted this to be deleted, but it was not. So I will, I guess, show you the solution here for Python, and I will delete it after the next exercise so we can work through it together, but it's already saved here, so, oh well. So for the introductory task, and I guess to start, how to use this: for any pipeline, we absolutely have to have Apache Beam installed for this.
Oh, the descriptions have fantastic details, so follow them. They'll give you a bit of an overview; that one was less useful. Let's make this a little smaller. Okay: write a simple pipeline that takes "Hello Beam". We get taken to the Create function and how it works. And here is an example for in-memory data, so we can see we have, say, beam.Create, and the important bit here is that the elements go in an array or list, right?
So that's the Create function. This has already passed; I didn't get to work through it, but we can see that this is what's needed for Hello Beam.
I probably should have just walked through that to show the use of this interface, since this is probably new to a ton of you. And again, I'll delete and clear these out. I wanted to have made sure that worked before.
So for Python here, the initial ParDo: this came with nothing here, so we needed to create a process method in our class, which is a DoFn. It suggested using a ParDo with our DoFn; down here there was a TODO item, where we call beam.ParDo and have the parallel do run our multiply-by-10 DoFn. And the Beam programming guide is a fantastic resource. I'll probably pull that up often; while you guys are getting going with this, the Apache Beam programming guide is your friend.
When I got started working with this, I made a point of at least skimming this programming guide every few days, just to let it sink in, because this really was a weird way of thinking about things for me for some reason, although ultimately it comes together.
So the solution is that simple. It doesn't require a lot of weird programming, in the sense of things you wouldn't be familiar with: take an element and multiply it by 10. But the point of this exercise is knowing to call a ParDo with your DoFn, which is, again, a subclass of beam.DoFn. So again, with the check, it will pass since this was already written. Let's see Johannes' message, though.
Oh, okay. And Paul has a nice comment here as well. So you're creating the collection and apply.
Yeah, that's a fair point. So Paul, I take your comment here that maybe we could improve the wording of the katas. I'm going to just take a screenshot of that and pass it along. I'll even digress a little bit for those who are super curious: the Apache Beam GitHub.
We have our Beam repo. The katas are in learning/katas. And for instance, in Java, this was under core transforms. No, this was under common transforms, core transforms.
This was Map, then ParDo, and maybe the task info, task.md. So Paul, I think you were critiquing the wording here. I think it is, oh no, this was the intro.
So I'll find that somewhere. I thought it would be right in here. Interesting.
I'm curious why it's not there. Oh, cool, Paul, it looks like you got it figured out. I guess the point I was trying to make is that Beam is a completely open source project, and even these exercises are.
So if there's something that looks like it needs improving: Henry is the original author of these katas, the one who wrote that blog post, but many other people have contributed exercises, wording, et cetera. So pull requests are welcome if you find things that could be improved.
Naturally, I can't say whether or not there's a method to the madness in the specific wording, but making suggestions is totally reasonable there. All right, sorry for that digression, people, but I guess it's also cool to know that even this learning platform is wholly open. So I think this walked us through the multiply-by-10 DoFn and, again, the beam.Create with the elements inside of the array. Again, I will delete my history so that I can walk through this live in the next round.
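For anyone following along without the IDE open, here is a minimal sketch of what those two exercises boil down to in the Python SDK: `beam.Create` with a list, followed by `beam.ParDo` over a `beam.DoFn` whose `process` method yields the element times 10. The class name, helper name, and input numbers are assumptions for illustration, not necessarily the kata's exact code.

```python
def multiply_by_ten(number):
    """Per-element logic, kept outside the DoFn so it is easy to check on its own."""
    return number * 10


def run_pipeline(numbers=(1, 2, 3, 4, 5)):  # sample input, assumed for illustration
    # Imported here so the helper above works without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam

    class MultiplyByTenDoFn(beam.DoFn):  # hypothetical name
        def process(self, element):
            # process() is called once per element; yield each output.
            yield multiply_by_ten(element)

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Create takes an in-memory iterable, hence the list.
            | "CreateNumbers" >> beam.Create(list(numbers))
            | "MultiplyByTen" >> beam.ParDo(MultiplyByTenDoFn())
            | "Print" >> beam.Map(print)
        )
```

The `|` chaining is the pipe operator mentioned earlier, and the `"Label" >> transform` form just gives each step a readable name.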
Java should be hopefully deleted here. Let's see, shoot. Okay.
Anyways, maybe that's because I am logged in, so my history is still here. All right, so for Java, oh, and why don't we just look at this. If you guys are working through and find yourself kind of stuck (maybe I shouldn't be giving you this little answer): say I actually tried writing some code, but it didn't work, and I got the incorrect answer. There is a little Peek Solution link, which pulls up the answer, or at least one form of the answer, right? There are potentially several ways of writing things like a split function or getting the first character from a string or whatever.
And the katas just check the result of the pipeline, which realistically, in practice, would be all that you care about, or primarily what you care about. So it's smart enough to be able to check that. A goal of sharing this is also making you aware of it as a platform, so that you can work through exercises on your own. Cool. So let's talk through Hello Beam here.
So there's plenty of text. The overview, as usual: we can use Create, which links to the Javadocs; do get used to those. And the example, as before, for Java: here's how to create from in-memory data. Later on, we will start working with TextIO, a more realistic pipeline where we would take a file, for instance, which I guess is a good point to bring up. I don't think I have slides, so: beam.apache.org, IOs, built-in I/O transforms. So that you guys are aware, Beam does not only work with local, in-pipeline input. There are file-based IOs: Avro, TextIO, TFRecord for TensorFlow, XML, Tika, Parquet, Thrift; and file systems: HDFS, GCS, S3, the local file system.
And then for messaging, things like Kinesis, Kafka, JMS, et cetera. There's a whole built-in ecosystem, and a ton of databases as well. There are also I/O transforms in progress.
If you're curious and want to use something, you can follow the JIRA issues. And it's quite extensible. If you need to write to something and there's no IO already written, but you're used to dealing with, say, the APIs of your given database, it's something that you can do yourself. For instance, there's not a great Firestore IO at the moment, but it's straightforward enough to work with that API in Beam, which is something I did recently at work.
Cool, all right. So that was another digression; back to the Java pipeline. If I recall, it was a return TODO in the example. Based on the docs, we write pipeline.apply on the pipeline that we are handed here. We're applying the Create function, Create.of the element, in this case a single string, "Hello Beam".
So the check there will pass, hopefully. Oh, congrats me. And then, oh, shoot.
ParDo: so we have our Task.java. In this same vein, we want to use the @ProcessElement annotation.
I think the TODO was right around here, and we write this whole thing. We have our input, which is where we call applyTransform. Our input is based on these numbers, which are created up here, and we want to apply a ParDo of a DoFn, which takes integers here, where we're processing the element and outputting the number times 10. So, multiplying it by ten.
Again, get to know these docs: ParDo, DoFn, what you need to be doing here. I guess just to highlight, let's see what that is, because that should be, very nice.
Okay, so that should walk us through at least the very basics of our first ParDo. There are a ton of other exercises we're going to walk through, eventually interspersed with a whole bunch more information about what Beam can do, and then eventually getting into streams. So I guess, oh yeah, that's a good question for everybody. My thought is to move on, but let's use this for a second as Q&A: is everyone good in general? Without a message in the next moment, I will assume we're fine and we'll dive into some more lecture. Cool. And Johannes, I did just see your thumbs up, but did you get your question answered earlier? You said it was passing already. Does that mean you didn't actually have to type the code? Anyways, all right, it looks like we have some thumbs up.
Oh, cool, you're responding. Okay, so all the tasks are already done for you. I guess, congrats on completing the course. Good job. In that case, I've worked through these on several occasions. Somewhere there is, oh, I think it might even be, no, not there. I think there's a login to Stepik, which is the online course platform where these are stored. So I would imagine, somehow, maybe the generic or anonymous user has already passed everything, and using a different user or logging in as yourself will kind of remove the history. I'm unsure. I know I've loaded these on various occasions and the history has disappeared, although I'm now annoyed that it hasn't. So, all right, anyways, time to move on. We're actually halfway through, so yay us. Oh, cool, Max is helping Paul there.
All right, so let's get back to the slides. The hope here is to keep peppering in some things about Beam and give you guys time. One thing I really, really want to reinforce: something like ParDo and some of these transformations may seem super basic, but they're actually fundamental. Each of these parts is very, very simple, but added together they become very, very powerful.
So, okay. Also, feel free to play with the exercises, especially if you have multiple screens. I really like seeing the beam.Map exercises, because they produce fundamentally the same output; Map accomplishes the same thing as the ParDo. I haven't tried testing whether or not it makes a difference from a performance perspective.
Cool, all right, let me grab a splash of water. Friends of ParDo. Max will answer that, Andrew, and if not, I'll point you to some more resources in a second. Yeah, I guess to hazard an answer for Andrew, who asks how to write this outside of the IDE: I think you could take the Java code, for instance, and compile it, or take the Python code and run Python on the script itself. We can pull up some examples for that. Cool, yeah, Max is answering that.
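As a hedged sketch of what "run Python on the script itself" can look like: the file below is an ordinary script that splits its command line into its own flags and the flags Beam understands. The `--greeting` flag and the function names are made up for illustration; `PipelineOptions` and the `--runner` option are part of the Python SDK.

```python
import argparse


def parse_args(argv=None):
    """Split the command line into our own flags and the remaining Beam flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--greeting", default="Hello Beam")  # hypothetical flag
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


def run(argv=None):
    # Imported here so parse_args can be used without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, pipeline_args = parse_args(argv)
    # Flags such as --runner=DirectRunner are handed to Beam untouched.
    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([known_args.greeting])
            | "Print" >> beam.Map(print)
        )
```

Saved as, say, `hello_beam.py` with a call to `run()` at the bottom, this could be started with `python hello_beam.py --runner=DirectRunner`; other runners need their own options, which are not shown here.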
All right, the friends of parallel do. The SDK has a, how do I go full screen? I guess this is good enough.
So the SDK has a ton of other element-wise transformations, which can actually get us very far. We talked about ParDos, we talked about filters briefly, and we know that we can map and flat-map elements.
Oh, yeah, here are some code examples, actually. All right, so we saw ParDos; we can filter. For instance, I want to keep only things that start with S. That was a Java example, here's a Python example, and here's the Go example. We didn't get into Go; I didn't hear anybody saying they were using Go. Another worthwhile point is that Go is not quite ready yet.
So don't think you're going to write production streaming Go pipelines right now. I was just talking with one of the primary Go SDK authors, who wanted to make sure to highlight that fact. It is pretty okay for batch pipelines, but it's still considered experimental. So I was optimistic in including Go, but it is part of the ecosystem and where we're going. All right, back to friends of ParDo. After the filter example, we can map elements. You'll see the same sort of examples in the katas here: MapElements with Java, or Map with Python. This one creates that key-value pair from the input here.
There is a FlatMap example, for instance, and also FlatMap in Python. We can get the keys, for instance, with Keys, or likewise get the values out of things.
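Since the slides are not visible in a transcript, here is a small Python sketch of these friends of ParDo, using `beam.Filter`, `beam.Map`, `beam.FlatMap`, `beam.Keys`, and `beam.Values`. The helper names and the sample input are assumptions for illustration.

```python
def first_letter_pair(word):
    """Return a (first letter, word) tuple, the Python form of a key-value pair."""
    return (word[0], word)


def starts_with_s(word):
    """Predicate for Filter: True keeps the element, False drops it."""
    return word.startswith("S")


def run_pipeline():
    # Imported here so the helpers above work without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        runners = pipeline | "Create" >> beam.Create(["Flink", "Spark", "Samza"])

        # Filter: keep only elements for which the predicate is true.
        s_runners = runners | "StartsWithS" >> beam.Filter(starts_with_s)

        # Map: exactly one output per input.
        pairs = s_runners | "ToPairs" >> beam.Map(first_letter_pair)

        # Keys / Values: pull the two halves of each pair back out.
        pairs | "Keys" >> beam.Keys() | "PrintKeys" >> beam.Map(print)
        pairs | "Values" >> beam.Values() | "PrintValues" >> beam.Map(print)

        # FlatMap: the function returns an iterable, which is flattened
        # into individual output elements (here, single characters).
        (
            runners
            | "Letters" >> beam.FlatMap(lambda word: list(word))
            | "PrintLetters" >> beam.Map(print)
        )
```

Each of these could also be written as a ParDo with a DoFn; the shorthand transforms mainly save the boilerplate of a class with a `process` method.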
So those are many of the element-wise transformations available in Beam. Next: just doing element-wise transformation is great, and there's even a good justification for just moving and transforming things one by one using Beam. I question at times whether, in this age of cloud compute and serverless technologies, running a full pipeline with persistent compute is a good idea if I'm only doing element-wise transformations. If you're using the built-in IOs, there's a good case for how it can handle checkpointing, for instance, on streaming runners.
But beyond that, a ton of the power comes in with things like aggregation and forms of stateful streaming. One of the most common grouping transformations, I believe, is GroupByKey, so I'll have you guys work through an example of GroupByKey pretty shortly. Grouping transformations, again: here we take key-value pairs, where maybe we have already split a single input as we did before, say Samza, into a key-value pair of the first letter S and the full word Samza. Then if we ran the Beam operation GroupByKey, we would get a group with the key S containing both Samza and Spark. Here are examples: in Java, the input is a PCollection and we apply GroupByKey to that PCollection; in Python, we take the input, which is a PCollection, and use beam.GroupByKey. In addition, when we run GroupByKey, we may also want to do something intelligent with the result. For instance, we may want to write a ParDo function, a DoFn, ourselves, for instance a top-in-iterable. In this case, we take the key-values we had before and apply our GroupByKey, but then based on that, we might want to find the most frequently occurring, or top, result.
In that case we could write, again, a ParDo that looks through each of the values and figures that out for us. But a GroupByKey followed by a ParDo can often be simplified and optimized by leveraging something like Combine.
So here's an example of doing the same thing from the same sort of input, but instead running a Combine per key. Oh, I hit that; all right, cool.
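To make the GroupByKey-then-ParDo versus Combine comparison concrete, here is a hedged Python sketch that counts words per first letter both ways. Counting is a simpler stand-in for the "top" logic on the slide; the helper name and the sample words are assumptions, while `beam.GroupByKey` and `beam.CombinePerKey` are SDK transforms.

```python
def letter_and_one(word):
    """Key each word by its first letter, with a count of 1 as the value."""
    return (word[0], 1)


def run_pipeline():
    # Imported here so the helper above works without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        pairs = (
            pipeline
            | "Create" >> beam.Create(["Samza", "Spark", "Flink", "Dataflow"])
            | "ToPairs" >> beam.Map(letter_and_one)
        )

        # Option 1: GroupByKey collects all values for a key,
        # then a per-group step walks through them.
        (
            pairs
            | "Group" >> beam.GroupByKey()
            | "SumGroup" >> beam.Map(lambda kv: (kv[0], sum(kv[1])))
            | "PrintGrouped" >> beam.Map(print)
        )

        # Option 2: CombinePerKey states the aggregation directly,
        # which lets a runner combine partial results before shuffling.
        (
            pairs
            | "Combine" >> beam.CombinePerKey(sum)
            | "PrintCombined" >> beam.Map(print)
        )
```

Both branches should end with the same pairs, for example `('S', 2)`; the second form is usually the one to reach for when the per-key logic is an associative, commutative aggregation.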
So we have that. Again, grouping transformations: if we're running something like count and compare, that could be a combine function that counts words and then extracts the top however many.
We're really able to write any operations of our own that we want, but make sure to know the SDK. There's no reason to write your own if the community has written something that's going to work for you, which has probably also been thoroughly tested and optimized. Among the combiners, there are things like top per key, count per key, sum of longs per key, quantiles, approximation algorithms, min, max, mean, for instance. So look into the SDK documentation and understand the pieces at your disposal. Okay, so in addition to the map and, essentially, reduce operations, think about how you can combine things together. For instance, multiple outputs are possible; we talked about that at the very first stage. A very common reason for using multiple outputs in a pipeline is a form of dead letter queue. Say the input is not good or I otherwise can't parse it properly: validate things, continue through the pipeline if things are fine, and if not, write it and log it somewhere. Some example code for that in Java is found here. I can't highlight the code, but we're using tuple tags: we try to process the element, and if the validated element is good, we apply the success tag, and if not, we give it the dead letter tag. Then, based on the output tuple, we set up the various PCollections for successes or dead letters that we want to do things with. The pattern is also doable in Python in this way, where we're using try and except and yielding what we're looking for. And the example for Go is this. What I find very interesting with Go is that you don't just write ParDo; for instance, there's ParDo2 when you have a parallel do function with two outputs, and the SDK supports up to ParDo7 for the various forms you can get into. So I think that's a good amount on aggregation and reusable patterns. I want to give you guys a little bit of time to start working through GroupByKey, for instance, and if you have already worked through that, there's a ton of other stuff there. Again, Max and I are going to be here for a while, so I'm going to set another timer for 10 minutes and give you guys some time to digest what was just shared, handle what you need to, and do some exercises.
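A rough Python sketch of the dead letter pattern described before the break, using `pvalue.TaggedOutput` and `with_outputs` from the Python SDK. Parsing JSON lines is just a stand-in for "validate the element"; the tag names, the helper, and the DoFn name are assumptions for illustration.

```python
import json


def try_parse(line):
    """Return (True, record) if `line` is valid JSON, else (False, line)."""
    try:
        return True, json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False, line


def run_pipeline(lines):
    # Imported here so try_parse works without Beam installed;
    # running the pipeline assumes `pip install apache-beam`.
    import apache_beam as beam
    from apache_beam import pvalue

    class ParseJsonFn(beam.DoFn):  # hypothetical name
        def process(self, element):
            ok, value = try_parse(element)
            if ok:
                yield value  # untagged values go to the main output
            else:
                # Anything we cannot parse is routed to the dead letter output.
                yield pvalue.TaggedOutput("dead_letter", value)

    with beam.Pipeline() as pipeline:
        results = (
            pipeline
            | "CreateLines" >> beam.Create(lines)
            | "Parse" >> beam.ParDo(ParseJsonFn()).with_outputs(
                "dead_letter", main="parsed")
        )
        # Each tag is its own PCollection and can be handled separately.
        results.parsed | "PrintParsed" >> beam.Map(
            lambda record: print("ok:", record))
        results.dead_letter | "PrintDead" >> beam.Map(
            lambda line: print("dead letter:", line))
```

In a real pipeline the dead letter branch would more likely be written to a file or table than printed, so the bad records can be inspected and replayed later.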
Okay, all right. Interesting, I see this, Johannes; we can dig into that. Yeah, let's either save that, or, Max, do you have a good way to chat about Johannes's question on Beam and NiFi? I actually don't have much experience with NiFi, so I don't feel well positioned to talk about it. Let's hope Max has something he can put in chat; I also welcome him coming on stage, either now or at the end, to discuss your question there. Cool, so we just allocated some time for GroupByKey. I am in the katas now; I tried downloading IntelliJ again. What do we have? That's possible, Nina; let me figure that out in a bit. Okay, so since we were having some concerns with the freshness of our katas, or whether they were already solved, I downloaded the Community Edition of IntelliJ to see if a totally different IntelliJ doesn't have my saved history. I'm hoping that is the case, but we'll find out
here in a second. For the next few moments, I'm going to walk through GroupByKey as another really fundamental bit, and then we'll get into a streaming semantics overview, which I think is the real reason a whole lot of you are interested in and care about Beam. We all need to do lots of batch data processing, but the fact that we can learn this single programming model as a way to accomplish both of these tasks is, to me, super powerful. Okay, so let's see; it is loading, fingers crossed. Oh yes. Okay, so who was that again? Oh, Johannes. Okay, so I have downloaded the IntelliJ Community Edition and reinstalled following the instructions, and it looks like it has cleared out my history, I guess. So let me dig into GroupByKey. That was what I was hoping you guys had worked through, in addition to whatever else. Again, this is super self-contained, so hopefully you find value in working through some of these others later. So the task here: again, I'm going to go through Python first and then
we'll go into Java. GroupByKey is a Beam transformation. I think you guys already have a decent sense of what GroupByKey is. The task pulls up things like this GroupByKey documentation in the Python docs, so I'll pull up the same. Interesting, that's kind of funny: hint number one pulls up the same link that was already there, so I guess it's a cue to say, hey, read what we share. But again, as I've said, the programming guide is super handy. Think about the primitives that you want to use, and this should pull up GroupByKey here. So here's what we're trying to do. Are there code examples here, though? DoFn lifecycle, GroupByKey; no, this is, I guess, just talking about it, and then it gets into CoGroupByKey. So that's a good note: maybe we could use a code example in the programming guide for GroupByKey. Anyways, okay, so what we want to do: we did look at
ParDos, but we didn't talk about beam.Map. I found this confusing when I first did it, because I tend to look for shortcuts: I see a single line here, so I was under the impression that I would only write a single function. Well, actually, to do GroupByKey we need key-value pairs. So first I need to write a beam.Map, where, let's say, I use a lambda on word that returns a tuple of the word's first character along with the word itself, right? Then I want to pipe that to beam.GroupByKey. The thought is that ultimately we'll wind up with a key of "a" with a list of apple and the other a-word, "b" with a list of bear and ball, and "c" with a list of car and cheetah. As for the way the output is shown, which I also had to reverse engineer, if you will: LogElements is how the katas are set up to do this. We can get into how to just write these out to text files, which makes it easier to explore exactly what is going on. So let's test this and see how we did. Fantastic, right? Again, I got hung up on that. And for you guys working with Python, you can chain these operations all together; I like the Unix-style pipe delimiters here. So that addressed GroupByKey for Python, and I will now pull up the GroupByKey for Java. Let's go, Beam. Let's
see how we doing with questions and it does look like as that is coming up is your question on beam and I five max answered in a thread a little below it
but it doesn't look like I guess either of us are that well prepared to speak to that naturally yeah I don't wanna I like beam and I don't want to talk about comparisons to things that I'm not well versed enough because I'm
obviously already somewhat biased by my position here. Cool. All right, so here are the Beam katas for Java, which are loading, so that will
come up here. Can I close this? Okay, so this is the hello, which we don't
want to close and reopen. All right, there we go: Core Transforms, GroupByKey. Fantastic, all the way deep in there. And we don't have any, this isn't
so we can work through this. Okay, so I need to write my apply
transformation, which is a GroupByKey. That pulls up the docs, there it is. The important thing is the KV class, the key-value type in the Beam SDK; we have a link to that, and to GroupByKey, which pulls up the
same sort of documentation. And again it takes us to the programming guide, which does not have actual code examples in this case, which is also good. I mean, naturally we can search Stack Overflow and whatnot.
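Since the programming guide has no example at this point, here is a minimal sketch in plain Python (not the Beam API) of what the two steps from the Python kata compute: a map step that turns each word into a (first letter, word) pair, like the beam.Map lambda, and a grouping step that collects the values for each key, like beam.GroupByKey. The helper names `to_key_value` and `group_by_key` are made up for illustration, and the word list is assumed from the groups mentioned earlier.

```python
from collections import defaultdict


def to_key_value(word):
    """Model of the map step: pair each word with its first character."""
    return (word[0], word)


def group_by_key(pairs):
    """Model of the grouping step: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Assumed input, matching the groups described in the talk.
words = ["apple", "ball", "car", "bear", "cheetah", "ant"]
pairs = [to_key_value(w) for w in words]
grouped = group_by_key(pairs)

for key in sorted(grouped):
    print(key, grouped[key])
```

In a real pipeline the same shape is written as chained transforms, a map step piped into a group-by-key step. Unlike this sketch, Beam makes no promise about the order of the values inside each group.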
So by convention we use input, and in this case we want to take input and apply; we're gonna use MapElements. Oh, I'm gonna restart this IDE just so
that it cleans itself up a little. So let's do MapElements into; so we need
KVs of strings and strings, okay. And then we want to do that by word:
KV.of word, we can do charAt or substring, so substring(0, 1), word,
so we're creating that tuple. Do I have all these things right? I think I
have, that looks right. And then I want to apply, so this is like we did in
the Python: we are mapping the elements into a KV of String and
String type, where we're taking the KV of the first character with the word, so we're creating the key-value pairs, and then we're applying the GroupByKey and creating it. So if this gets out of my way, I believe that does
what we need. So let's... Gradle didn't do what I need. Okay, well, let's see how the solution looks: input.apply, MapElements into KVs of strings
via word, KV.of, and GroupByKey.create. So I'm gonna just count this as okay; this is because I haven't configured Gradle on this new
Community Edition bit for now, since I just installed it to try to get the exercises clear. Cool. Johannes is typing something, and so is Paul. Maybe anybody has some questions that we can answer right at the moment, and
otherwise I will try to run through some of the basics on streaming semantics and how to think of stateful streaming and reason about it in Beam. But let's answer some concrete questions around this specific exercise, if any. Yeah, okay. Yeah, Johannes, let's save that for some of
the Q&A, either right at the end or, I think Nina mentioned that there's a public room for some of these concrete additional questions that we
can dig into. So let me try to get through some of the high-level stuff to give you the broad strokes of the model for now, and then let's take it from there. Looks like, nice, Paul, sorry for being unfair. You're right,
I did not cover MapElements, at least in more than me
mentioning it for a second. Also, I am kind of jumping ahead and around because I want to give you guys enough of what I think are the fundamental pieces. So awesome, and good call: as Max says, DoFns are
everything, or most of what we need for anything custom, and if not we can check into some of the built-in functionality. Cool, let me get back to these slides over here. Okay, cool.
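To make the relationship between the two concrete, here is a minimal plain-Python sketch (not the Beam API) of the idea that a DoFn-style function can emit zero or more outputs per input, and that a one-to-one map is just a special case of that. The names `par_do`, `split_words`, and `map_elements` are hypothetical helpers for illustration only.

```python
def par_do(process, elements):
    """Model of a ParDo: call process on each element and flatten what it yields."""
    return [out for element in elements for out in process(element)]


def split_words(line):
    """DoFn-style function: yields zero or more outputs per input."""
    for word in line.split():
        yield word


def map_elements(fn, elements):
    """Model of a map: exactly one output per input, expressed through par_do."""
    return par_do(lambda element: [fn(element)], elements)


lines = ["apple ball", "", "car"]
words = par_do(split_words, lines)   # the empty line contributes nothing
lengths = map_elements(len, words)   # one length per word
print(words, lengths)
```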
So I'm gonna take you through a whirlwind of windowing, time, triggers, and that sort of thing. Here are just ways to think about it; we're not gonna get hands-on in the next, what do we got, like 25 minutes or something. So
and then we can dig into some of, I anticipate, it sounds like an audience especially interested in getting these things up and running in a real sense, so we'll make sure to find time to discover that. So given also
what it sounds like the audience is, I'll try to breeze through these things since this may be familiar, and I should do some sort of polling in the future to gauge the audience. All right, so "where in event time" is what we're gonna concern ourselves with when we're talking about windowing in time. And why does that matter? So we may have data
that's big; it may be super big, over lots of days; it might be infinitely big, just always coming, so the unbounded bit. Especially when it is unbounded or streaming, there may be unknown delays, so
an event from 8:00 may come right at 8:00, it may come at 8:15, or it may not arrive at our system until after 2 p.m. So we walked through the Map and
ParDo functions in the sense of how we can do element-wise transformations; that lets us do things to the elements, but time is not particularly relevant when doing element-wise transformations. And the
most common way of dealing with time is just dealing with processing-time chunks. Historically that was nightly or daily batch jobs; more often now it is, say, hourly chunks, and it's really
only concerned with processing-time windows, so when the events are in our message queue, in our system, not when they occurred. So in the 9 to 10
o'clock block, as well as in the 1 to 2 p.m. block, we may have events from 8 a.m. So there's a different way of thinking about this, and this hopefully is quite intuitive to you, although it sometimes takes a
while to reverse thinking from just taking events as they come, versus: I want to recreate the event time, or at least make sense of what happened as it occurred in event
time, rather than letting things like network delays, or a device shutting off and not sending the event until it was turned back on, shape my sense of what was going on when it arrives at my system. So we can also window
events. Windowing is a way to divide data into event-time-based chunks, so we can do fixed windows, we can do sliding windows, as different ways to capture things around event time. In the world that I
deal with, which is television, so naturally streaming or real-time applications, we're trying to recreate things like user sessions, for instance.
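As a rough illustration of event-time windowing, here is a minimal sketch in plain Python rather than the Beam API. Each event carries its own event time, so an 8 a.m. event lands in the 8 to 9 o'clock window even if it only arrives after 2 p.m. The function names, the 60-minute window, the 30-minute session gap, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict

WINDOW_MINUTES = 60  # assumed fixed window size
SESSION_GAP = 30     # assumed inactivity gap that ends a session


def fixed_window(event_minute, size=WINDOW_MINUTES):
    """Return the [start, end) fixed window that contains an event time."""
    start = (event_minute // size) * size
    return (start, start + size)


def sessions(event_minutes, gap=SESSION_GAP):
    """Split one user's event times into sessions separated by at least `gap` minutes."""
    result = []
    for t in sorted(event_minutes):
        if result and t - result[-1][-1] < gap:
            result[-1].append(t)   # still inside the current session
        else:
            result.append([t])     # inactivity gap reached: start a new session
    return result


# (event time, arrival time) in minutes since midnight; the third event is very late.
events = [(480, 480), (485, 495), (530, 850), (550, 551)]

by_window = defaultdict(list)
for event_time, _arrival_time in events:
    # Grouping uses the event time only; the arrival time is ignored.
    by_window[fixed_window(event_time)].append(event_time)

user_sessions = sessions([event_time for event_time, _ in events])
print(dict(by_window), user_sessions)
```

Grouping by arrival time instead would have put the third event in the 2 p.m. block, which is exactly the processing-time behaviour described above.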
We can't know when a session is going to end, but we have to deal with all sorts of various bits, like I said, clients dying, and recreating how to then figure that out. So be aware that we can have various windowing strategies; we'll show some code on how Beam handles those
semantics. There are also triggers as it relates to streams. I can't help but give a shout-out to the Streaming Systems book that's over here; I don't need to grab it. I really wish they had gone with this amazing dinosaur
cover, which is in the book, but anyway, the fish is also quite apropos. A lot of this stuff is covered in a lot more depth if you
want to dig into the way to reason about this; again, just aiming for high level here. So if we have our unbounded PCollections, not only do we want to talk about where things are occurring, so where in event time, but then we want
to talk about triggers, for when that happens relative to processing time. So for instance we may want to trigger at the watermark, where we don't specify anything too specific; there's just the heuristic sort of watermark that Beam
manages, where we want to use fixed windowing, as we talked about with windows, and where we want to just sum integers. So here's how that starts to happen: triggers control when we start to output the
aggregation. On the right would be the heuristic watermark, which is the default, so essentially when the system estimates that the window is complete.
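A minimal plain-Python sketch (not the Beam API) of that default behaviour: sum integers per fixed event-time window, and emit a window's sum once a heuristic watermark passes the end of the window. The watermark here is a made-up heuristic, the largest event time seen so far minus a fixed lag; real runners estimate it differently. In this sketch, data that arrives for a window that has already fired is simply dropped.

```python
from collections import defaultdict

WINDOW = 60          # assumed fixed window size, in minutes
WATERMARK_LAG = 10   # assumed heuristic: watermark trails the max event time seen


def run(arrivals):
    """Process (event_minute, value) pairs in arrival order.

    Returns (fired, dropped): window sums in firing order, and late elements.
    """
    sums = defaultdict(int)
    closed = set()
    fired, dropped = [], []
    max_event_time = None
    for event_minute, value in arrivals:
        start = (event_minute // WINDOW) * WINDOW
        if start in closed:
            dropped.append((event_minute, value))  # window already fired: late data
            continue
        sums[start] += value
        if max_event_time is None or event_minute > max_event_time:
            max_event_time = event_minute
        watermark = max_event_time - WATERMARK_LAG
        for window_start in sorted(sums):
            # Fire once the estimated watermark reaches the end of the window.
            if window_start not in closed and window_start + WINDOW <= watermark:
                closed.add(window_start)
                fired.append((window_start, sums[window_start]))
    return fired, dropped


# The element with event time 490 arrives after its window has fired.
arrivals = [(485, 5), (500, 7), (530, 3), (545, 9), (551, 1), (490, 8), (610, 4)]
fired, dropped = run(arrivals)
print(fired, dropped)
```

The late element is lost here. Allowing late firings would instead emit an updated sum for that window, at the cost of downstream consumers seeing more than one result per window.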
It's hard to know exactly when, or what to do with the sort of late-arriving data, so we do have ways to control that also. And again, on the left here is the notion of the perfect watermark, meaning if it captured all data, I guess,
essentially as soon as it became available. So in addition to the sort of heuristic watermark, and again you can see the color coding, right, we're using blue for windowing and green for the sort of trigger code, as well as, I
think that was supposed to be red but it's yellow, for the summation. Anyway, oh no, that's right. All right, so we can define when things are happening if we want to, say, use early and late firings, which then will help
us have additional, more complete results where we don't necessarily discard: for instance, allowing late firings at a count if we want to update our sums based on that, as well as, for instance, with early firings, starting to accumulate those
results as soon as we have them, without waiting. These are all sort of trade-offs that you'll potentially need to make, depending on materializing results, say,
too soon that are incorrect, or what to do at what point with how late of data. So this, then: the perfect watermark again is on the left, with the heuristic watermark and with that sort of early and late firing code
configured. The boxes there, once they turn blue, are when the trigger winds up firing, as well as then the resultant score or summation occurring in each window. So cool, let's make sure we leave
time. Other kinds of triggers: there's a ton. We can trigger on element counts in a window, we can trigger on processing time, we can trigger on various combinations. You can get super nuanced with triggers. Figure out
how quick and how important your aggregations or combinations are, and how you're dealing with wherever you're viewing those
results. Okay, so also be aware that side inputs are supported. I view that as a form of join: you can take extra inputs on the side to, I mean, essentially join. So either here in a
second we'll look at some example code for even looking through all elements as part of a collection, or here we can just have the code where we're grabbing the max word length based
on all elements in the collection and then using that to pad all of the words for output, for instance. But side inputs are also common for when I need
to do some sort of streaming lookup, so be aware of that as a very common pattern and something supported. I guess it's worth mentioning the Scala API, Scio. Here's, you know, no presentation would be complete without a
form of word count. Wanting to say thanks: this is an open source community, so thanks to the people building Beam, and that I took slides from, that we share. We're pretty short on time, so I'm gonna avoid just
sitting here in the last 10 minutes letting you work independently. I'll walk through the next little bits and save some time for questions. You guys all have IntelliJ downloaded now, so work through those. Let's see, other important bits. No Katacoda time. I mentioned opportunities: hey, I work for
Dish, they're hiring, check it out. Shopify is some guys I met at the last Beam Summit; they're hiring, specifically a group at this link here, for senior analytics developer, merchant data. Otherwise search
specifically for those that think about Flink and/or Beam, and specifically those that are gonna write Beam pipelines. I see more and more things happening in that direction, of companies recognizing and actively
seeking those that can write Beam pipelines. So that's my natural sales pitch on why it makes sense to learn. I want to suggest: get involved. I should make a better slide here, but check out the beam.apache.org website, which has a ton of good stuff, and we went there a ton of times. Here
it is. Check out lots of, oh good, we have quickstarts for all. Cool. Be aware,
I think, of the community: check out Contact Us. There is a user and a dev list, which are especially relevant. If you're trying to use Beam and run into trouble, hit up the user mailing list; there's a bunch of users there as well
as developers that may help you out. Be aware there's a Slack channel; you can get into the ASF Slack and people may help. But ultimately one of your best bets is Stack Overflow, not only because it's great, but some of the people actually building Beam use that as a metric for keeping track
of open issues, answered questions, and whatnot there, so that's probably one of your better places to also get questions answered. So yeah, check out Slack, join the community, check out the dev list also. So if you
want to learn the directions that things are going, or get involved, check that out. Also, I would love to hear what I can do better. I've offered this sort of workshop a couple times, but if there's some sort of Buzzwords feedback,
what sort of content would be better to focus on, probably working out some of the kinks with install and whatnot, let me know how I can do better. And closing on Beam Summit: again, come check it out, that's beamsummit.org, although
it doesn't... yeah, check beamsummit.org out and come join. I'm gonna check out this chat room and start thinking about what... get back to
Slack. Slack. Cool, yeah, that's a reasonable bit. We have a couple minutes.
Max, you're asking if we should start the breakout now? Yeah, we should have some conversations and answer some questions. Does that mean going into our own room somewhere? And Max, if not, come on into the conversation here also. Hi,
here I am. Great presentation. Yeah, I think there were a lot of interesting questions in the Slack channel, and it's probably easier if we could
take questions here if there are more questions in written form; otherwise it's probably easier to just video conference and ask questions directly. Seems sensible. So yeah, I guess can we just say: anybody, type out a
question if you really want one answered on the broadcast, and if not we'll share the breakout room link. Or do we already have that? We do have the link, yes; I'll post it again. Okay, yeah, I also posted it in
the Slack channel. Cool, I guess we'll end it here then and join the breakout session. Yeah, and so for those of you with some questions, come
join us there so we can just talk through things. Also, since I don't know whether this was getting recorded or not, I suspect the breakout room isn't, so I can share more naive thoughts and whatnot there. So
cool, see you guys over there, and ask Max stuff, especially since he really knows these things and is building