
Mastering Generative AI: Tools and Techniques with VS Code, GitHub, Azure


Formal Metadata

Title
Mastering Generative AI: Tools and Techniques with VS Code, GitHub, Azure
Number of Parts
131
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
With the rise of Generative AI, developers are now able to create a wide range of applications that can generate content from simple prompts and context. In this presentation, we will explore how you can leverage the power of Visual Studio Code, GitHub, and Azure to develop, test, and deploy generative AI applications. We will discuss the latest tools and techniques for building and training generative models, and demonstrate how to build a sample application using GPT-4o, VS Code and its extensions. Additionally, we will showcase how to use GitHub for version control and collaboration, and how to deploy and manage your applications using Azure. For both beginners and veterans, join us to learn how you can master the power of generative AI to create innovative applications.

Transcript: English (auto-generated)
Hello, everyone. Welcome to the session all about getting more out of your large language models. My name is Leo, and I am a product manager at Microsoft, currently working on data science, machine learning, and AI tooling. Today, I will talk to you about how
we can gain mastery over the use of generative AI by using the powerful compute available on Azure, OpenAI's large language models, Visual Studio Code, and GitHub. By the end of the session, you will have a basic understanding of how you can build your own generative AI applications,
or intelligent apps, as we call them, as well as a few advanced tips and techniques to truly master the use of generative AI. Having some basic understanding of how Python and generative AI work is helpful for following today's session. If you do not, do not worry. I will still walk you through some of the basics
that you need in case you're not familiar with LLMs, or large language models. OK, now let's get started. Here is the agenda for today's session. I will first discuss the benefits of building a generative
AI application. Next, I will set up a large language model deployment using OpenAI on Azure and connect to it using Python inside of VS Code. I will use GitHub Copilot to help me speed up the coding process. After that, we will iterate on our Python code
by looking at a more in-depth scenario. Then we will make some improvements to our LLM application by looking at some techniques using prompt engineering as well as evaluations on your large language model. Finally, we will end the session by discussing some of the advanced techniques and tips
that you can use when working on your own AI applications. So let's first take a look at the advantages that you can expect from building a generative AI application. First, it can help with innovation and problem solving. LLMs can be a helpful partner in your workflow,
allowing you to gain insight into areas that you're not already familiar with. Second is increased operational efficiency. It can help you optimize tasks, automate processes, and reduce your manual effort. Third is creative content generation. With LLMs' ability to create innovative solutions,
it can help you iterate on existing projects or accelerate prototyping on your new projects. Now let's get started with setting up your development environment. I will show you a demo video on how you can do that.
To get started, we need to choose a language model. We'll be using Azure AI Studio, which is a place to build and deploy your generative AI applications. We first need a project on Azure AI. We can create one by clicking on Create New Project and filling in your own subscription and resource group.
Once we have a project set up, inside the project, Azure AI has a built-in model catalog, where it shows all the language models that are available to use. It includes OpenAI models, Microsoft's own models, and open-source models. For this demo, we will be using GPT-4o, the latest
model from OpenAI with both text and vision capabilities. We can deploy one quickly by clicking on this Deploy button. Once we have the model deployed, we can see it in the Deployments tab. Clicking on the GPT-4o model, you should be able to see the details.
Now we have our large language model deployed. The next step is to connect to this deployment using VS Code with Python code. Let's take a look at how we can do this using our code-first approach. Let's go into VS Code to start coding. Throughout this demo, I will be using GitHub Copilot,
which is an AI coding companion that can help you write code faster and smarter. I am going to first import Azure OpenAI so I can call the language model deployment. While I was typing, gray text showed up. These suggestions are part of GitHub Copilot, and I'm hitting Tab to autocomplete this line.
Next, we need to call the deployment and ask the language model questions. This time, we will use the client.chat.completions.create function. Again, I am relying on Copilot to do most of the autocompletion. Since this code looks mostly correct,
I am going to accept the suggestion. I do need to make a few changes again, and we will modify the system prompt, which is like the rules for the model written by me, the developer, that the user does not need to see. For now, I am going to give it a simple one, then modify the question, which
is a question that the user types in to ask the language model. I will give it a simple question, what is the capital of the Czech Republic? Finally, we can extract the response in the following format and print out the result. With everything looking good so far,
we can test it by running the file. After waiting for a while, we correctly got the response of Prague. Let's review what we just saw in the demo videos. We first navigated to Azure AI Studio's website
and deployed a GPT-4o model, the latest and most capable model from OpenAI. With the deployment, we wrote some basic code to interact with our language model inside of VS Code using Python. As a result, we are able to send it questions and receive answers from this model.
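As a rough sketch of the kind of code written in that demo (the endpoint, API version, environment variable names, and deployment name below are placeholder assumptions, not the exact values shown in the video):

```python
import os

from openai import AzureOpenAI

# Connect to the Azure OpenAI deployment created in Azure AI Studio.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# Ask the deployment a simple question with a simple system prompt.
response = client.chat.completions.create(
    model="gpt-4o",  # name of your deployment, not necessarily the base model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of the Czech Republic?"},
    ],
)

# Extract the text of the first choice and print the result.
print(response.choices[0].message.content)
```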
Now, if you're using OpenAI instead of Azure OpenAI, the process is very similar. Instead of an Azure account, you will just need an OpenAI account to get access to these models, along with your secret keys. Now, you have some basic understanding of how to work with generative AI.
Let's step it up by looking at a more complex scenario. The goal of our scenario here is to build a UI/UX expert for developers to use. Some of its capabilities include answering questions about accessibility and UI elements, analyzing user stories to see whether the UI or the UX
could effectively fulfill those user stories, and also providing feedback and suggesting changes on design and user experience. How this works is that we will leverage the image processing capabilities that GPT-4o offers, allowing us to provide an image of the UI element
that we want to analyze. It will come up with a response accordingly. Now, to see how we can accomplish this, let's take a look at the next demo. Let's take a look at the more complex scenario. The LLM app file contains a few functions that will make the calls to the LLM model.
The read file function simply reads the content of a file. The encode image function converts an image file into its base64 representation, which is needed for the GPT-4o model to process it. The most important function is the getLLMResponse function. It takes a few parameters,
system, question, image, and history. System is a system prompt that the model will use. The question variable is a question that the user asks. Image is the base64 version of the image input, and history is a chat history with a model, so the user can have a back and forth conversation.
The function will combine all of these parameters together into the correct format and send it off to the language model, and return the response and the history. To test this, we can take a look at a test.py file. We will provide the image, system message, and the user question.
The image here is a screenshot of the VS Code website. The system message contains the goal of the model, which is a scenario that we are working towards for this demo. And the question is a user story that the website can accomplish. How it works is that the language model
will read the question, analyze the image, and respond according to the system prompt. We hope to get a response that is similar to, the image can fulfill this user story, along with some concise explanation. Now I can run the test file, and after a few seconds, we got a response, and it is the kind of response that we're looking for.
To recap what we just saw in the video, we were able to create an AI assistant that can help developers with UI/UX-related issues based on the image that we provide to the language model. We wrote code that formats the user's question
together with its conversation history and sends it off to the large language model for it to answer according to its system message. We receive back a response and the updated conversation history from the language model. We then tested the model with an image of the VS Code website and the user story of downloading VS Code.
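A minimal sketch of what the helper functions described in the demo might look like; the exact names and signatures in the LLM app file are not shown in the transcript, so treat these as illustrative assumptions (client is the same AzureOpenAI client from the earlier sketch):

```python
import base64

def read_file(path: str) -> str:
    # Read the text content of a file (for example, a stored system prompt).
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def encode_image(path: str) -> str:
    # Convert an image file into the base64 representation GPT-4o expects.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def get_llm_response(system, question, image_b64, history):
    # Combine the system prompt, prior history, and the new question plus image
    # into the chat format, send it to the model, and return answer and history.
    messages = [{"role": "system", "content": system}, *history]
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    })
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = response.choices[0].message.content
    updated_history = messages[1:] + [{"role": "assistant", "content": answer}]
    return answer, updated_history
```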
The model was able to read the user's input, analyze the image, and respond according to the system prompt. While the language model responded fine when given the ideal scenario that we just saw earlier,
what would happen if we asked it a question that is unexpected? Let's take a look at the next demo. What if we asked it a question that is not intended for the bot to answer? Let's test it out with the question, how to download IntelliJ, with the same image of the VS Code website.
After testing, we got a response with the instructions to download IntelliJ. Although it is correct, it has a few problems. First, providing download instructions is not the goal of the bot. Also, the question and the response are not relevant to the image we provided. One way to fix this is to modify the system message.
Here I have an improved version of it, with the changes being telling the bot to only focus on the image and to refuse to answer anything that is not relevant to the image. Now let's test the same question with the improved prompt, and the model refused to answer, which is what we wanted.
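The full prompts are not reproduced in the transcript, but the change is roughly of this shape (illustrative wording, not the verbatim prompts from the demo):

```python
# Original, permissive system prompt (illustrative).
SYSTEM_PROMPT_V1 = (
    "You are a UI/UX expert. Answer the developer's questions about "
    "accessibility, UI elements, and user experience."
)

# Improved prompt: focus the bot on the provided image and refuse anything else.
SYSTEM_PROMPT_V2 = (
    "You are a UI/UX expert. Only answer questions about the user interface "
    "shown in the provided image. If a question is not relevant to the image, "
    "politely refuse to answer."
)
```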
When asked an unrelated question, our original application responded with a seemingly correct answer, even though that is neither the intended goal nor the answer that we wanted to get. To get around this issue, we used prompt engineering techniques
on the system message to improve the quality of the response that we get from the LLM. As a result, the model now refuses to answer any question that is irrelevant to the goals of the user. Now, with the model responding in the way that we want,
how do we actually know whether the response that we got from the model is good or bad? Let's take a look at the next demo video to see that. How do we know whether the response we got from the LLM is good enough? One of the ways we can do that is to use another LLM as a judge to evaluate the quality of the response.
The eval.py file is set up to do just that. The key function here is eval_response. It will judge the quality of the response given the system message, question, answer, and image. What makes this work is the system message of the evaluator model.
We ask it to rate the response of another model and provide a one-to-five score based on relevance, groundedness, context, and coherence. We also tell it the exact format of the question we will be asking. Another important detail is that we set the temperature of the model to zero,
which means that we will get the same response if we ask the same question, so we have some reproducibility. To simplify the process, I have stored a version of the answers in these text files. Let's first ask it to judge the response of the VS Code user story with the simple system prompt.
We got fives across the board, so we know the response is good. Now let's try the IntelliJ question with the first prompt. We got a few low scores, especially in relevance, groundedness, and context. This is expected because the image has nothing to do with IntelliJ,
and the model was giving responses when we did not want it to. Let's change the prompt to the improved one while keeping the same question. This time, we successfully got fives across the board again, so we know the response to this one is good.
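A condensed sketch of the LLM-as-a-judge idea behind eval.py; the evaluator prompt wording and helper names are assumptions rather than the exact code from the demo, and client is again the AzureOpenAI client from earlier:

```python
EVALUATOR_SYSTEM_PROMPT = (
    "You are an evaluator. Given a system message, a user question, an image, "
    "and a model's answer, rate the answer from 1 to 5 on each of: relevance, "
    "groundedness, context, and coherence. Reply in the exact format "
    "'relevance: N, groundedness: N, context: N, coherence: N'."
)

def eval_response(system, question, answer, image_b64):
    # Ask a second model to judge the quality of the first model's answer.
    messages = [
        {"role": "system", "content": EVALUATOR_SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "text", "text": (
                f"System message: {system}\n"
                f"Question: {question}\n"
                f"Answer: {answer}"
            )},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0,  # deterministic scoring so runs are reproducible
    )
    return response.choices[0].message.content
```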
Let's review the evaluation that we just saw in the demo. We use a technique called LLM as a judge to evaluate the quality of another language model. Through this, we were able to understand the characteristics both of a good and a bad response.
Some of the key indicators here are relevance, groundedness, context, and coherence. Relevance is how relevant the response is to the actual question that you're asking. Groundedness is how closely the response follows the system message. Context is how well the image is being used.
And coherence is how understandable the response is to a human. We were able to assign a numeric score to each of the categories so we can measure the response's quality. Keep in mind that these response scores
also come from another language model, so it can make mistakes. One way to get around this is to write your own evaluation functions, whether that's based on length, quality, or the details that you're looking for.
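For example, a hand-written check that does not rely on another language model might look like this (a trivial illustration, not code from the demo):

```python
def is_concise(answer: str, max_words: int = 150) -> bool:
    # Deterministic length check: flag answers that are too long to be concise.
    return len(answer.split()) <= max_words

def is_refusal(answer: str) -> bool:
    # Naive check that the model refused, for questions unrelated to the image.
    markers = ("sorry", "can't help", "not relevant to the image")
    return any(marker in answer.lower() for marker in markers)
```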
Now, with the application at a good place, you can integrate it into the rest of your workflow. You can either expose it through an API endpoint or use it as a library module. Now that you have an understanding of how to develop LLM applications, here are some advanced techniques and tips that you can use. First, you can use a database cache
between the user question and the language model deployment. By storing some of the commonly asked questions and their responses, you can save cost and decrease latency by never having to query the language model in the first place. Cosmos DB is a recommended database for working with Azure OpenAI, as it can automatically keep track of your questions and responses as you query the OpenAI service.
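A minimal in-memory sketch of that caching idea; in a real deployment the dictionary would be replaced by Cosmos DB or another database, and get_llm_response is the helper from the earlier sketch:

```python
# Cache keyed by the exact question text. A production system would persist
# this in Cosmos DB and likely use a normalized or embedded form of the question.
response_cache: dict[str, str] = {}

def cached_llm_response(system, question, image_b64, history):
    if question in response_cache:
        # Cache hit: no call to the language model, so no extra cost or latency.
        return response_cache[question], history
    answer, history = get_llm_response(system, question, image_b64, history)
    response_cache[question] = answer
    return answer, history
```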
Next is to choose the best response out of the top K responses provided by the LLM. This is what we commonly refer to as the top-K method. You can specify exactly how many responses OpenAI will provide by using the corresponding parameter in the OpenAI API. This gives you greater flexibility in how exactly you want your answers to be.
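In the OpenAI chat completions API, the parameter that controls how many candidate responses come back is n; a sketch of requesting several candidates and picking one (the scoring heuristic is a placeholder assumption, and client is the same as before):

```python
def score_response(text: str) -> int:
    # Placeholder heuristic: prefer shorter answers. Swap in your own scorer,
    # for example an LLM judge like eval_response above.
    return -len(text)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Suggest a name for a UI testing tool."},
    ],
    n=5,  # ask the model for five candidate completions
)

# Choose the best candidate according to the scoring function.
candidates = [choice.message.content for choice in response.choices]
best = max(candidates, key=score_response)
print(best)
```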
To further help you with your prompt engineering and evaluations, you can use a specialized tool such as PromptFlow. Some of its capabilities include code tracing, built-in evaluators, versioning support, and built-in safety checks. Chain of thought is another powerful technique
that you can ask the language model to use. It provides intermediate steps as the language model is responding, and it is helpful in identifying where the language model is making a mistake or has a leap in its logic so you can make changes accordingly.
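Chain of thought is typically triggered purely through the prompt, for instance (illustrative wording):

```python
COT_SYSTEM_PROMPT = (
    "You are a UI/UX expert. Before giving your final answer, reason step by "
    "step: describe what you see in the image, relate it to the user story, "
    "and only then state your conclusion."
)
```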
If your model is not performing up to your standards no matter how you change its system prompt, you can try fine-tuning a model. Fine-tuning is not available for all of the models, but for some you are able to make the language model specialized for your use case so it can have better performance. Lastly, consider using a mixture of small
and large language models, as each model has its own pros and cons in terms of quality, accuracy, and speed of response. Now, let's recap what we learned in this session. We first discussed the benefits of building a generative AI application.
Next, we set up a large language model deployment, GPT-4o, using OpenAI on Azure. We connected to it from VS Code using Python code. Then we looked at how to improve the system message as well as evaluations by looking at some of the
prompt engineering techniques and evaluation techniques. Lastly, we looked at some of the advanced techniques and tips that you can use to make better use of your large language models. I hope now you have a better understanding of how to develop an LLM application. If you need the resources or the code
that's used during today's demo, please go to this GitHub link that you see here. If you have any other questions about LLMs or anything Python and Microsoft related, you can always find us at our Microsoft booth.
Thank you all today for coming, and now I am open for questions. Thank you so much for your presentation. So if you have any questions, you can move to the microphone or you can also ask in the Discord channel.
If there is no other question, I can give you a small gift.