Private Data Anonymization with Python, Fundamentals

Formal Metadata

Title
Private Data Anonymization with Python, Fundamentals
Title of Series
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
How can large legal document repositories be brought into the public domain without releasing private data? The fundamental concepts behind document anonymization are entity recognition, masking type, and pseudonymization. Using Python and a collection of libraries such as spaCy, PyTorch, and others, we can achieve good anonymization scores. How is this applied within a flow containing AI models for NER? Once documents are anonymized, how can the result be improved through further text mining with Python-based apps and a human in the loop? Although it was approved in 2016, applying the GDPR at the European level remains a challenge in banking, legal, and other contexts. This talk covers the process of transforming PDF and DOCX documents into XML, processing them using regular expressions and spaCy/Torch models, and parsing the results using AntConc and Textacy. All the ideas will be supported by real experience from the MAPA project, a European anonymization project finished in 2022.
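To make the difference between masking and pseudonymization concrete, here is a minimal sketch of the regular-expression step only. It is not the MAPA implementation: the patterns, the labels (EMAIL, PHONE), and the function names are assumptions chosen for illustration, and a real pipeline would combine such rules with spaCy/Torch NER models as the abstract describes.

```python
import re

# Hypothetical patterns for illustration; real projects need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_entities(text):
    """Return non-overlapping (start, end, label) spans, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()
    result, last_end = [], 0
    for start, end, label in spans:
        if start >= last_end:  # drop spans that overlap an earlier one
            result.append((start, end, label))
            last_end = end
    return result


def mask(text):
    """Masking: replace every detected entity with its type label."""
    out, pos = [], 0
    for start, end, label in find_entities(text):
        out.append(text[pos:start])
        out.append(f"[{label}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def pseudonymize(text):
    """Pseudonymization: the same surface string always gets the same placeholder."""
    mapping, counters = {}, {}
    out, pos = [], 0
    for start, end, label in find_entities(text):
        value = text[start:end]
        if value not in mapping:
            counters[label] = counters.get(label, 0) + 1
            mapping[value] = f"{label}_{counters[label]}"
        out.append(text[pos:start])
        out.append(mapping[value])
        pos = end
    out.append(text[pos:])
    return "".join(out), mapping


sample = "Contact ana@example.org or +34 600 123 456. Reply to ana@example.org."
print(mask(sample))
print(pseudonymize(sample)[0])
```

Masking discards the identity of each entity, while the pseudonymized text keeps repeated mentions linked through a shared placeholder. The mapping returned by `pseudonymize` would have to be stored securely or discarded, depending on whether re-identification should remain possible.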
Transcript: English (auto-generated)