
Self-explaining APIs


Formal Metadata

Title
Self-explaining APIs
Title of Series
Number of Parts
112
Author
Contributors
Et al.
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content is shared, also in adapted form, only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This talk will present strategies and Python tools to create semantically interoperable REST APIs. After the problem statement, various solutions will be presented, including: * contract-first API development with OpenAPI 3, ontologies, and controlled vocabularies like the ones published by the European Union (https://op.europa.eu/en/web/eu-vocabularies/authority-tables); * the rdflib and PyLD Python libraries for processing JSON-LD and RDF files (https://en.wikipedia.org/wiki/Resource_Description_Framework); * the use of centralized catalogs such as schema.org. Prerequisites: * no prior knowledge of semantics and ontologies; * practical experience with OpenAPI, JSON Schema, data modeling, and API design in general.
Transcript: English (auto-generated)
Thank you, everybody. I'm very happy to be here, because this is my tenth EuroPython, and it's really stunning to be again with all of you, and with all the new friends, after two tough years. Today we will speak about self-explaining APIs. I work for the Italian digital transformation department, and today I will present how to design schemas that simplify API mashups and interoperability. First I will explain the concept of controlled vocabularies, and then how to use them for creating interoperable REST APIs based on contract-first schema design. At the end, I will show how a central data catalog for semantic interoperability, a lot of words, will support this approach. But don't worry, this is not a talk about semantic web theory, and to the semantic web folks: please forgive me, I will try to make things understandable.
We want to simplify API mashups, but it's not easy, because we are a lot of people: we have a lot of agencies, and every agency publishes its own datasets or services through APIs. The hard part is that all of this should have some common meaning, some common ground, and this is not easy. Let's see a simple example.

What is semantics? Semantics is the study of meaning, and it is important to be sure that the message is understood. Here we can see two different API messages, but they are not very clear. In the first case, we don't know whether it is a full name or just a first name, and if it is a full name, which part is the first name and which is the family name. In the second case, we know it is 4 million of something, but we don't know what that something is, so if we have to exchange this message with another country that has a different currency, this message can be problematic to integrate or to mash up.

The solution is controlled vocabularies. Controlled vocabularies are a computer science tool that uses URIs to disambiguate terms. It is very simple: the first part of the URI is the name of the vocabulary, for example DBpedia; then there is a term, in this case dog; and then there is a definition. You see the rdfs:comment field: it is the common field name in vocabularies for a definition written in a human-readable language. So vocabularies contain a collection of terms and define concepts and relationships in a specific domain, for example healthcare or finance. They are validated by a designated authority, which is not necessarily a public authority. For example, your own company could have a vocabulary defining the different job titles, so that when hiring managers have to hire people, they can use well-specified job positions instead of inventing them. And vocabularies are formally described, we have languages for that, using the text/turtle media type or its JSON counterpart, JSON-LD, which is a W3C specification. Actually, all those languages are completely isomorphic, so you can switch from Turtle to JSON-LD and keep exactly the same information. Complex vocabularies are called ontologies, but that is not the focus of this talk. Codelists are the simplest form of vocabulary: they are simple lists of terms, for example the job title one I mentioned.
Let's see how to create a very simple vocabulary. Here we have a vocabulary made up of four terms described in Turtle. At first, I declare the URL namespaces so I can write things in a more concise way: instead of writing w3.org/2000/... and so on, I just write rdfs. Then I define the terms using one or more sentences. A sentence is a triple, made of a subject, a predicate, and an object. So I say that a Person is a natural person. It is described, you see the rdfs:comment predicate, which means this is for humans: it is human-readable, not machine-readable. The Person has a given name, and the given name is the given name of a person.

The same goes for registeredFamily. As you can see, family is a complex term: for different countries or different communities, family can mean something different. Even in the same country, for different agencies, the term family could have different meanings. In this vocabulary, a registered family is a group of people tied together according to a very specific Italian law. For a service produced by another agency, the term family could have a different meaning, and in that case it would not be it:RegisteredFamily. You can see that the it prefix expands to w3id.org/italia/onto/CPV. This means that, worldwide, I can classify a registered family with a unique URI. If Italy is going to integrate with another country and we use this term, they can see the meaning of that term, and another country or another agency can use a different URI to define a family.

So, three terms so far. Now I define another term, isChildOf. isChildOf is the child-parent relation, and you can see I have another sentence defining isChildOf, stating that it applies to Person. In this way I have a very clear definition of what a person is, what a given name is, what a registered family is, and what isChildOf means, all in a very small vocabulary. Every term is well defined in this file.

So I can use Python to process these kinds of files. The library is rdflib, and vocabularies are interpreted as graphs, because I have entities that are related together: subjects and objects are related by predicates. So I parse those files into a graph, and then I can translate the information from the Turtle format to the JSON-LD format, which is completely isomorphic. There are other ways of serializing this information, for example XML, but we are not interested in XML here. In this case, you can see that I have a context: the context says that the it string means that long URI. Then I have a graph that is made of a list of triples. There is isChildOf, which has a comment and a domain, and you can see that the domain has an @id. This means that there is another entry in the graph whose @id is it:Person.
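Here is a minimal sketch of that workflow, assuming rdflib 6+ (where the JSON-LD serializer is built in); the it: namespace and the term names are illustrative stand-ins for the real ontology:

    import rdflib

    # A tiny illustrative vocabulary; the URIs are placeholders,
    # not the real Italian ontology.
    TURTLE = """
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix it:   <https://w3id.org/italia/onto/CPV/> .

    it:Person     rdfs:comment "A natural person."@en .
    it:givenName  rdfs:comment "The given name of a person."@en .
    it:isChildOf  rdfs:comment "The child-parent relation."@en ;
                  rdfs:domain  it:Person .
    """

    g = rdflib.Graph()
    g.parse(data=TURTLE, format="turtle")

    # The same triples, serialized in the isomorphic JSON-LD format.
    print(g.serialize(format="json-ld", indent=4))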
Let's make another example; this is very interesting and very useful. I can use and define vocabularies not only for concepts, like the person concept, but even to define information, datasets. This allows me to provide a lot of information in a dataset, and this information doesn't need to be linear: it can be a graph. In this case, this is a vocabulary based on the SKOS and Dublin Core standards. Those standards provide keywords and predicates to create more and more vocabularies.

The syntax supports internationalization using language tags. For example, I can see the country ITA: this is the subject, the identifier is the ITA string, and I have two labels, but I could have more, one in Italian and one in French. The same concept can be expressed in a more concise way: sentences sharing the same subject or predicate can be shortened using semicolons and commas. For example, I can just write France with an identifier and two preferred labels. And I can even relate terms: for the Czech Republic, I can say that it replaces another entry that is in the vocabulary, Czechoslovakia, and the same for Slovakia. Conversely, I can say that Czechoslovakia has been replaced by the Czech Republic and Slovakia.

If you look at all of that, you can understand that vocabularies improve quality. In my service, if I say that I use a three-letter ISO code to identify a country, and that I am using this EU vocabulary, then with those three letters I have not only the information of which country it is, but also the localization of the country name in all the languages of the European Union, and even a lot more information: for example, whether the country uses the euro as its currency, or whether it has been replaced by another one. This is very important for registry information, because if you are a citizen of the Czech Republic and you were born, for example, in 1980, you were not born in the Czech Republic: you were born in Czechoslovakia. So you can use this table, this vocabulary, to map back all the information about countries, and all you need to store in your dataset is the three-letter code. We can see that this is very helpful.

Vocabularies are stored in graph databases, for example Virtuoso or Amazon Neptune, and you can query those databases with all this information using the SPARQL protocol.
In this case, I have the vocabulary we have seen before. I make a query where I specify a list of predicates that should match; in this case, I say the URI should be in the scheme of the EuroVoc country vocabulary. EuroVoc and SKOS are resolved using the namespaces I introduced before. The entries should have a concept label, via skos:prefLabel, and an identifier, and I am interested in the concept localized in English and not in the others. In this case, the query just extracts a very simple table with the URI, the concept, so Italy, France, and the identifier. So I can load this information into a graph database and extract very simple views of this complex information.
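As a hedged sketch, here is such a query run with rdflib's built-in SPARQL engine; the file name is a hypothetical local copy of the country vocabulary, and the exact predicates in the real EU authority table may differ:

    from rdflib import Graph

    g = Graph()
    # Hypothetical local copy of the EU country vocabulary.
    g.parse("countries.ttl", format="turtle")

    # Select each concept's URI, English preferred label, and identifier.
    QUERY = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX dc:   <http://purl.org/dc/elements/1.1/>

    SELECT ?uri ?label ?identifier WHERE {
        ?uri skos:prefLabel ?label ;
             dc:identifier  ?identifier .
        FILTER (lang(?label) = "en")
    }
    """

    for uri, label, identifier in g.query(QUERY):
        print(uri, label, identifier)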
So, I have explained what vocabularies are; now we will see how to use them in a very simple way. I can use vocabularies to describe data. This is a very simple example: it's not a country, it's me, identified by an email URI. You can see this is my email address, and it identifies me worldwide. This is defined by four sentences, well, actually five, I updated my slides. The first sentence says that I am a Person according to the Italian vocabulary for persons. Other predicates state my given name according to the Italian vocabulary and my family name according to the Italian vocabulary. It seems simple because it's a family name, but our friends from Iceland, for example, have a patronymic or a matronymic, so the concept of a family name in Iceland is different from the one we have in other European countries. Or, for example, this is my given name, but is it the same name I had when I was born? Is it the same name I had on my birth certificate? Maybe I changed my name over time. As you can see, when you design services for millions of people, there are a lot of corner cases that may happen and that you may have to take into account. So in this case, I am stating that this is the given name I have now, not the one I had at birth; this is my family name, not my patronymic or matronymic; this is my birthplace, according to the EU vocabulary; and I have an identifier, okay, I picked my email, but it could even be different from the one I used as the subject of my sentences.
Applications can use all this information, and all the linked information behind it, to automate interoperability checks and other logic. For example, they may check whether the country where I was born still exists, or whether it has changed, for example because it has been superseded by other countries. Those are all checks that you can do if you use data that is linked through vocabularies.
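As an illustration of such a check, here is a sketch that assumes the vocabulary models succession with the Dublin Core dct:isReplacedBy predicate (the real predicate and URIs may differ):

    from rdflib import Graph, Namespace, URIRef

    DCT = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    # Hypothetical local copy of the EU country vocabulary.
    g.parse("countries.ttl", format="turtle")

    # Hypothetical URI for Czechoslovakia in the country authority table.
    birthplace = URIRef("http://publications.europa.eu/resource/authority/country/CSK")

    # If the birthplace country no longer exists, the vocabulary links it
    # to its successors, so the application can handle the record correctly.
    for successor in g.objects(birthplace, DCT.isReplacedBy):
        print(f"{birthplace} has been replaced by {successor}")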
Now, the nice thing about linked data is that it has many dimensions: it is a graph, and there are people who spend their lives populating this information. But you can actually project it onto lower dimensions, so that people who are not aware of all this complexity can still use it, because maybe people are just interested in a list of country names and their localized names, so that when a popup window appears on a web application, you see "Italia" instead of "Italy", for example. There are specifications that allow you to project data onto those dimensions. There is JSON-LD framing, to project this kind of data onto very simple JSON objects; there is SPARQL, which you can use to make queries and produce, for example, CSV; and there is CSV on the Web, another specification that allows us to interpret CSV information as linked data. The important thing is to build things using specifications.
Let's see JSON-LD framing. This uses the PyLD library, and the code is quite simple. It loads the European country vocabulary from the URL published by the European Commission, loads it into a JSON object, and then it makes a projection; making a projection is called framing in this specification. It selects all the subjects that have a given type, for example all the subjects of type skos:Concept, this is technical, but that's okay, and it exposes the fields I am listing there: country code, version info, and the label in English. Those fields do not exist yet: this is the shape I want the JSON object to have, and we will see it in the next slide.

Then I have a context. The context takes the RDF information on the right, so the @id, the identifier, the version info, the preferred label localized in English, and maps it to the specified fields. And the nice thing is that the same context object can be used to convert the simplified JSON object back to the original semantic form.

Let's see it, because it is very simple. On the left, I have the vocabulary; on the right, I have the JSON. In the context, I say url is an @id, so it takes the @id from the subject of the vocabulary and puts it into the url field. Then it takes the skos:prefLabel predicate, takes just the IT localization, and puts it in label_it. It takes the version info and puts it into the version_info field. Since I am not specifying anything about the euro currency adoption date, it simply skips that information. In this way, I have a very simple projection of this very complex information that can be provided to web developers who have no knowledge of all the complexity of the vocabularies we are explaining now, but who can use it, for example, to populate web forms or APIs.
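A minimal sketch of such a framing call with PyLD; the file name is hypothetical, and the frame fields are illustrative rather than the exact ones on the slide:

    import json
    from pyld import jsonld

    # Hypothetical local copy of the country vocabulary, in JSON-LD.
    with open("countries.jsonld") as f:
        doc = json.load(f)

    # The frame selects every skos:Concept and, via the context, renames
    # the RDF predicates to flat, developer-friendly JSON fields.
    frame = {
        "@context": {
            "skos": "http://www.w3.org/2004/02/skos/core#",
            "owl": "http://www.w3.org/2002/07/owl#",
            "url": "@id",
            "label_en": {"@id": "skos:prefLabel", "@language": "en"},
            "version_info": "owl:versionInfo",
        },
        "@type": "skos:Concept",
    }

    framed = jsonld.frame(doc, frame)
    print(json.dumps(framed, indent=2))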
So the challenge when we work with vocabularies is making this information accessible. We can build platforms to publish data in different formats, so that people can use it directly to create APIs or online fillable forms. I have this linked data information in Turtle: it is complex, maybe boring, maybe not comprehensible. Okay, but I can create a platform where, through the framing I showed before, I produce a JSON API, or I produce CSV, so you just pick the fields you want to see and you get tabular data. Or I can produce a JSON schema. Imagine I want to build an API and populate a field that should be constrained to only the countries that are in this vocabulary. This is a JSON schema, you can see it, that provides an enumeration of all the values contained in that vocabulary. People are not supposed to understand how the vocabulary works, but you can write an API that wraps it and provides it as a JSON schema, so that people who are going to build an API can say: okay, just reference this JSON schema URL, and you will get that vocabulary for free.
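For example, here is a minimal sketch of the kind of schema such a platform could generate; the codes and the $id URL are illustrative:

    import json

    # Country codes as extracted from the vocabulary, e.g. with the
    # SPARQL query shown earlier; this short list is illustrative.
    country_codes = ["ITA", "FRA", "CZE", "SVK"]

    schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "$id": "https://example.org/schemas/country.json",  # hypothetical URL
        "title": "Country",
        "description": "A country code from the EU country vocabulary.",
        "type": "string",
        "enum": country_codes,
    }

    print(json.dumps(schema, indent=2))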
Another thing you can do is embed this into Frictionless Data, a specification that provides metadata for tabular data.
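A sketch of what that embedding might look like as a Frictionless Table Schema, reusing the same illustrative codes; the field name is hypothetical:

    # A Frictionless Table Schema describing a CSV column constrained
    # to the vocabulary's codes.
    table_schema = {
        "fields": [
            {
                "name": "country",
                "type": "string",
                "constraints": {"enum": ["ITA", "FRA", "CZE", "SVK"]},
            }
        ]
    }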
So we are mostly working on enabling people to use, in a simple way, this data that can seem complex and not completely understandable. Now let's shift to semantic APIs. When we have APIs, we want them to reference concepts in vocabularies, so that they provide a complete and machine-readable description of the exchanged concepts. If I send a payload, I want a machine to be able to validate it, not only for its syntax, but also for its semantics.
So how can you build semantic APIs? Semantic APIs should be built using the same vocabularies. When different APIs use the same vocabularies, there is the feature we have seen before, the JSON-LD context, that allows mapping JSON properties to vocabulary terms. In this case we have two API payloads: the first one is in Italian, the second one is in English. How can I know that they map to the same person, for example? I can write a context for my API, the text in red. This context says that the nome field maps to the Italian given-name term on w3id.org, and that cittadinanza maps to the citizenship concept, and for that concept I say that it uses the European country vocabulary. This means that whatever you have in the value, the ITA string, is appended to the base defined in the context. Well, if the other API does the same work with its given name, citizenship, and so on, both payloads can be mapped back to the same vocabulary. So you can see that I can transform back and see that the user has an Italian given name, Mario, and an ITA citizenship, which resolves to the full URI of Italy in the European country vocabulary.
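A hedged sketch of this round trip with PyLD; the vocabulary URIs are illustrative stand-ins for the real Italian and EU ones, and @version triggers JSON-LD 1.1 scoped-context processing:

    from pyld import jsonld

    # An Italian payload whose context maps local field names back to
    # vocabulary URIs.
    payload = {
        "@context": {
            "@version": 1.1,
            "nome": "https://w3id.org/italia/onto/CPV/givenName",
            "cittadinanza": {
                "@id": "https://w3id.org/italia/onto/CPV/hasCitizenship",
                "@type": "@id",
                # Values such as "ITA" are resolved against this base.
                "@context": {
                    "@base": "http://publications.europa.eu/resource/authority/country/"
                },
            },
        },
        "nome": "Mario",
        "cittadinanza": "ITA",
    }

    # Expansion replaces the local field names with full vocabulary URIs,
    # so payloads from different APIs become directly comparable.
    print(jsonld.expand(payload))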
The work that should be done is to design APIs that can integrate across different ecosystems. For example, imagine you have to integrate an API that works in the finance sector with another API that works in the registry sector, or in another financial sector where the regulations are different. You should gather your payloads and check whether the concepts you are using in your APIs are the same. For example, in some cases you may use the concept of a legal person, and in other contexts the concept of a natural person. They may not map across ecosystems, and this means that if you are creating a financial application that only works with people, it's okay; but if you are creating a financial application that should work both for natural persons and for legal persons, for companies, maybe you need to tweak your application before integrating, before mashing up. Otherwise, you may end up with inconsistencies.
So how can this enable interoperability in cross-border services? The basic idea is that the European Commission defined a core vocabulary for person that identifies a common subset of person attributes. On the left you have a registry record in Italy, with a given name, a second name, a surname, and a country. Some of those fields map to the European vocabulary, w3.org/ns/person, so I can map some of these fields; others, like the second name, which at best maps to an alternate name, have no exact mapping. But for this subset I can transform this person record into a person record that can be mapped in all the other European countries, and the same can be done by the other countries. This means that I have a basis for creating interoperable services, so that if you move to Finland, for example, or to Ireland, the basic registry information is available all across Europe.
The problem now is that we have different specification communities. The first is the semantic web one, and it is very complex for web developers, service developers, and so on. The other world, the one of web developers and API developers, is the OpenAPI world. The problem is: how can I bridge those two worlds, which have different requirements? When I design a service that should be available to 60 million people, for example, I have to shape it for billions of requests, so I cannot convey every time all the semantic information needed to describe all the specificity of a service; I cannot convey the complete semantic payload. In the other case, if I have to convey this payload to another country to create an interoperable service, because I want to attend a French university while my records are written using Italian schemas, how can the French university's web service understand those kinds of schemas? We try to bridge the gap.
We leave agencies the freedom to define their own JSON schemas, so they can freely define the fields they want when providing their services, but they should do it in a way that the fields map consistently: the meaning of the fields should be consistent with the Italian ontologies, with the Italian vocabularies. So when you say, for example, given name, it's not the patronymic, it's not the matronymic, it's not the name you had at birth; it's the name you have now, after you changed your name because you didn't like your old name or your surname. It's the name you currently have in the Italian national registry. And the agency that provides the service should provide semantic information in the form of a JSON-LD context. The JSON-LD context is the thing we saw before: an object where every JSON field is mapped back to a URI in a vocabulary. So the country should be mappable, the given name should be mappable, the surname should be mappable.

If I add this kind of information to the schema, I can design the integration before starting to develop. It's an exchange of information that does not happen at runtime, not while the mashup is running, but when two organizations design their APIs: they will check the context, they will check whether the semantics of those APIs are the same, and then they will be able to create a mashup that is semantically coherent. And it can be used at the integration phase, because you can write tests that, if you rely on a vocabulary, download the vocabulary during the test and check whether the information you provided in your tests is coherent, for example, with the job titles that your organization decided on, with the list of countries that your organization intended, or with the localization labels that the web developers of your UI used, verifying that they are the same ones provided by the vocabulary.
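A minimal pytest-style sketch of such a design-time test; the vocabulary URL and its JSON layout are hypothetical:

    import requests

    # Hypothetical endpoint exposing the country vocabulary as JSON.
    VOCAB_URL = "https://example.org/vocabularies/countries.json"

    def test_citizenship_is_in_vocabulary():
        # Download the vocabulary during the test run...
        codes = {entry["identifier"] for entry in requests.get(VOCAB_URL).json()}
        # ...and check that the fixture payload uses a known country code.
        payload = {"given_name": "Mario", "citizenship": "ITA"}
        assert payload["citizenship"] in codes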
So for this, we filed a draft RFC that you are welcome to check, and we even stubbed some interfaces. For example, we implemented a very simple modified Swagger Editor where, while you design your API, if you reference the URL of the Person class, it will make a query on the SPARQL endpoint that stores that vocabulary, and it provides you with all the properties that are stored in the graph database. In this case, the web developer doesn't need to know about vocabularies. He just needs to know that there is a model class for Person, and he can use the properties of that model in different ways, in different orders, in different shapes, but he needs to map each property that he provides in his own schema back to the original properties of the class. And in some cases, for example if there are vocabularies for those specific properties, he can import those properties either in the form of lists or in the form of OpenAPI schemas.
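Here is a sketch of the kind of query such an editor could run, using the SPARQLWrapper library; the endpoint URL is hypothetical and the class URI is illustrative:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Hypothetical SPARQL endpoint of the national catalog.
    sparql = SPARQLWrapper("https://example.org/sparql")
    sparql.setReturnFormat(JSON)

    # Fetch every property whose domain is the Person class, together
    # with its human-readable comment.
    sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?property ?comment WHERE {
        ?property rdfs:domain <https://w3id.org/italia/onto/CPV/Person> .
        OPTIONAL { ?property rdfs:comment ?comment }
    }
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["property"]["value"], row.get("comment", {}).get("value", ""))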
All this happens at design time, and catalogs ensure that API design is consistent within a given ecosystem. So in Italy, we are building this national data catalog for semantic interoperability. It's a long name, but that's it. We already have a set of controlled vocabularies that you can fetch from their URIs. The sources are on GitHub, and those vocabularies are aligned with the European authority tables. Authority tables are very interesting if you have to plan services with thousands of producers and consumers that work independently, because this way you don't need people to sync up among themselves: they can always rely on those authority tables. The national catalog for semantic interoperability will make it possible to find reusable vocabularies and ontologies, to share semantically interoperable schemas and public services, and to ensure that APIs have a consistent meaning and can be mashed up together.

Then we are working on a number of semantic specifications for interoperability. We are registering the YAML media type, because it has not been formalized yet, and in this work we are providing security and interoperability considerations; it's very interesting, I suggest you read it. There is YAML-LD, an ongoing W3C specification that allows expressing all this information in YAML instead of JSON. And then there is a specification to bridge JSON Schema and OpenAPI, so that you can formalize all these concepts better; there, too, you can find interoperability and security considerations. If you are into this stuff, please drop me a line: there is a lot of community work ongoing with the OpenAPI community, the IETF, and the W3C, and we are working on all those tables together. Then there is a very experimental work we are doing to bridge REST APIs and linked data; this work is ongoing on GitHub. And we are even trying to bridge those kinds of things with Frictionless Data, the specification that allows bringing CSV, Excel, and other data exported in not very semantic formats into a well-understandable REST API ecosystem.
Well, I think I am done; I finished quite early. If we have some time, I can show you a couple of demos, but first I think it is better to take some questions. I can also show you a couple of specifications. Questions first.
Okay. You mentioned that you have designed a whole system for around 60 million users. I guess that is not the real number of users connecting to your system. What is the real workload? What is the demand in comparison to what you have designed the system for?

Actually, we designed the system for 60 million users. There are 60 million Italians, right? Yes, and we provide services to all of them. Actually, the scope of the design is even wider because, for example, 60 million is just the users, but we have 10 million companies, and we have vehicles. The goal of our work is not just to design the single systems. Every agency designs its own system, for example the Ministry of the Interior designed the system for the national registry of the resident population; the problem is that you have to make all this information interoperable with all the other agencies that, for example, hold information about vehicles, or about companies, or, for example, with the e-invoicing system that processes every single invoice issued in Italy. When an invoice is processed, you need to ensure that the sender and the recipient are existing legal or natural persons, for example. Actually, in terms of workload, we are speaking of a system of thousands of APIs that are interconnected.
The challenge is not really operational, because if you just want to face these kinds of things operationally, there are best practices for addressing single services. The point is that if every agency designs its system in isolation, when you have to create services for citizens that need to integrate the API of the population registry with the API for companies and the API for fiscal information, for example, and those services are not harmonized, because every single agency designed them optimizing for its own specific workload, then you will have very efficient vertical services, but you are not allowing an API ecosystem to grow, and you are not allowing, for example, local agencies to build services. One of the problems you have, for example, is that in Italy there are, let's say, 400 central agencies that are big, but there are 8,000 municipalities. They are very close to the citizens, and not every municipality has a big budget for creating large services, but they may be able to mash up at least basic services to provide a customized user experience for their citizens, for example for some social services. So the challenge is quite different: it's not just workload, it's optimizing the workload of the country, not just the workload of a single agency, because for a single agency, in general, it's complex. Yes, because it's complex, really complex, but it's doable, because you say, okay, this is for a country, I mean. That's a huge scale. Yeah. The problem is to create something that can serve not only the verticals, but also the locals: a region should be able to mash up APIs that are provided by the major ministries or by nationwide agencies. So the challenge is this one. I understand that I may not have answered your question, but I think that maybe I have described the problem.
So thank you, Roberto, for the talk, and let's give him a round of applause. Okay. Thank you.