
Formal Metadata

Title
Shabang
Number of Parts
29
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Production Place
Ottawa, Canada

Content Metadata

Abstract
Scripting with Postgres. Sometimes bash is just the way to go! This talk will cover tips and techniques for effective bash scripting with PostgreSQL. It will include guidance about:
- Pros/cons of shell scripts
- Function library creation and use
- Executing SQL
- Setting/getting PostgreSQL data from/into script variables
- Keeping PostgreSQL functions in sync with scripts
- Locking
- Doing work in parallel
- Ensuring cleanup
This is a source-code-heavy talk. Moderate experience with both bash scripting and PostgreSQL is needed to get the most out of it.
Transcript: English (auto-generated)
came up with a way to ensure that that happened reliably. Then some just general techniques that can be useful when working with a database or other things, locking, doing stuff in parallel, and making sure that cleanup happens. So what are some of the pros?
Bash scripting is pretty easy to learn. It's ubiquitous. It's all over the place if you've got Unix, Linux systems. It really is good for automating tasks. You can reasonably debug it, and there's a lot of tools.
There are a lot of existing programs that other people have written; in the way that people have been using Unix and Linux for years, you kind of rest on the shoulders of everyone that's gone before you. There are so many useful utilities out there,
and you can just kind of combine them all together and not have to rewrite all that stuff by using bash scripting. The cons are that it's not that good if you've got really complex tasks to do. And depending on your definition of slow, it's relatively slow. For a lot of things, it's perfectly adequate, but it's not going to be super fast.
You have to be careful if you want to make your scripts robust, and a lot of the scripted stuff you see tends not to be all that robust. And, as I just said, there are some portability issues between different systems, different versions of the shell, and different types of shells.
So all of those things, if you need to be portable, you'd have to worry about. So first, when I do scripting, I like to have a common library so I can reuse the stuff and not have to have the same functions written over and over again.
You don't want that, for all the obvious reasons: you don't want to end up with the same version of a function in 18 different bash scripts, because then trying to keep them in sync becomes impossible. A common library is good for version control, it's good for testing, and it's just a place to aggregate.
So in terms of the common function library that I like to use, this is kind of where I start out, and some of these things we should talk about. So set -u is going to throw an error basically any time a script variable has not been initialized.
You may or may not want this. The other way to do it is within each variable reference: you can put a question mark at the end of it, and that will throw an error if that particular variable doesn't exist. I tend to like to make sure that all my variables have been defined. And you can combine that with set -e, which says that the script should exit immediately if there's an error.
But again, it depends on your context. In some places I've been, they don't like to see this; in other places, it's considered good practice to exit as soon as you have an error. set -x is going to turn on debugging, basically.
It gives you extremely verbose output of everything that's going on in the shell, so you certainly don't want that on all the time. But by having it in here, you can just flip that to a minus sign and go into a debug mode. Even so, that is so verbose that you don't always want it.
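A minimal sketch of those first few header lines (the speaker's exact file isn't shown, so the layout here is illustrative):

```bash
#!/bin/bash
# Top of a common library: a minimal sketch, not the talk's exact code.

set -u    # error on any reference to an uninitialized variable
set -e    # exit immediately on error (refined below to apply only
          # when executed as a script, not when sourced)
#set -x   # flip the "#" off for extremely verbose execution tracing

# Per-variable alternative to set -u: error out if PGPORT is unset.
: "${PGPORT?PGPORT must be set}"
```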
So I often will have if statements around various parts of the code that rely on a variable called DEBUG; you can flip it in one place and then get some extra debug output. Now, this next bit: the reason I'm doing this is that you often will need the base
name of the program that's running and the base directory of the program that's running. But sometimes when you're testing or developing, it's kind of convenient to take this common library and source it to your open shell. And if that's the case, depending on the shell, you might get strange looking values for base name and base directory.
So this bit here allows you to do some testing if you're going to source this thing into your current shell versus when it's executing within some script that you've written.
So this is pretty straightforward. basename is just a command-line program, and $0 is the name of the program that's executing; this just gives me the base name of it. This readlink -m is going to give me the canonical
path. So if someone's called it as ./whatever, you're not going to get ./; you're actually going to get an expanded directory name. Oh, and the reason I only set -e in this case is that if I'm running in a command-
line shell and you turn on that -e and then an error occurs, it closes your shell, which is kind of inconvenient. I often will need some kind of an output directory, so I usually will specify that as something under the base directory.
Obviously, this has got to be someplace you want it to be, and you don't always want to assume it already exists, so you might want to just create it forcibly. Same thing in these examples: I'm going to have some SQL files, so I do the same for a directory called SQL_DIR. And then here you can see I've set these variables as read-only, so I'm effectively treating them like an enum.
And again, in my locking example later on, I'm going to use these as UNLOCK, NOWAIT, and BLOCK, as an enum, so I define them here in the common header.
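Putting those pieces together, a sketch of that part of the header; the variable names (BASE_NAME, BASE_DIR, OUT_DIR, SQL_DIR) and enum values are assumptions:

```bash
# $0 is "bash" or "-bash" when this file is sourced into an interactive
# shell, and the calling script's path when a script executes it.
if [[ "$(basename -- "$0")" == *bash ]]; then
    BASE_NAME="interactive"
    BASE_DIR="$PWD"
else
    BASE_NAME=$(basename "$0")                  # script name, no path
    BASE_DIR=$(dirname "$(readlink -m "$0")")   # canonical expanded path
    set -e    # only auto-exit on error when running as a script, so a
              # sourced failure doesn't close your interactive shell
fi

OUT_DIR="${BASE_DIR}/out"
SQL_DIR="${BASE_DIR}/sql"
mkdir -p "$OUT_DIR" "$SQL_DIR"    # create them if they don't exist yet

# Treated like an enum; used by the locking example later on.
readonly UNLOCK=0 NOWAIT=1 BLOCK=2
```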
Another thing that you often do is you want to create some kind of a log. You want to create some kind of an output. You want a timestamp on it often because you're going to be running this thing over and over again. You want it to sort correctly when you list a directory. So this is just frequently used as something that you're going to append to the name of the output based on a datetime stamp.
And then this part: if you want all of your output, everything that would go to standard out or standard error on the terminal, to also be captured to a log file,
you can basically turn that on by setting this to one, and then this tee is going to write both to the terminal and to the log file. Again, I've seen places where everything just gets sent to a log file, but then when you're running your script on the command line,
it's not very convenient because you're not seeing all that unless you're also tailing the log, which of course you can do, but it's just less convenient.
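A sketch of the timestamp and logging part, under the same assumed names:

```bash
LOG_SUFFIX=$(date +%Y%m%d_%H%M%S)   # sorts correctly in directory listings

LOG_TO_FILE=1                        # set to 0 for terminal-only output
if [[ "$LOG_TO_FILE" -eq 1 ]]; then
    LOG_FILE="${OUT_DIR}/${BASE_NAME}_${LOG_SUFFIX}.log"
    # tee sends everything written to stdout/stderr to the terminal AND
    # appends a copy to the log file.
    exec > >(tee -a "$LOG_FILE") 2>&1
fi
```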
No... yeah, you could do it in a function. And in other contexts, I mean not in this slide deck, but in other contexts, I had a function that would do exactly that.
In this case, at least for these examples, each script is going to include this common library once, so a timestamp from the beginning of the run is more or less appropriate; but yeah, you could definitely put it in a function. So then, after all that stuff at the top of the common library, that's where all the functions get defined.
I'm going to go through some of those as we go through the presentation. So now, every time you write a script that uses the common library, the first thing you've got to do is kind of bootstrap the use of the common library. So this is my shebang right here.
Normally it's good practice to put some kind of a description of what this particular script is supposed to do. Because I'm loading this thing, I'm often loading it from the same directory that the script is in. You might have a standard location that you want to use for that, in which case you wouldn't need to do this. But in my scripts, at least in this example, I'm again grabbing what is the full path of the
base directory that I'm in, and I'm going to make the assumption that the common library is in there. And then I'm going to source it. The difference between sourcing it and running it as a command is that sourcing basically includes it in this script inline,
where it gets evaluated, versus actually starting another process, running it, and returning. In this case, I want all of that stuff available for the rest of my script, so I'm going to source it. The other way to do that, of course, is to just put a dot instead of the word source.
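A sketch of that bootstrap, assuming the library is saved as common.sh next to the script:

```bash
#!/bin/bash
# demo.sh -- shows the common-library bootstrap (file name is assumed).

BASE_DIR=$(dirname "$(readlink -m "$0")")   # full path of this script's dir

if [[ -r "${BASE_DIR}/common.sh" ]]; then
    source "${BASE_DIR}/common.sh"   # "." does the same thing as "source"
else
    echo "FATAL: cannot find ${BASE_DIR}/common.sh" >&2
    exit 1
fi
```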
And then if you can't find it, obviously you want to just bail out. Okay, so now this is the first of the Postgres examples. You know, since this is a Postgres talk, the first thing you want to be able to do is execute a query from your bash script, right?
There's a whole bunch of different ways you could probably do this. In fact, in the real common library that I created at this client, there were probably six or seven different versions of this exact function, or the equivalent.
But what I like to do is I like to make them flexible, so I have a separate host, port, database name, Postgres user, and delimiter, so that I can point this at essentially anywhere. And then I'm going to have the SQL string is also going to get passed in as an argument.
And then I've got this kind of standard way that I'm going to invoke psql with that SQL string. I'll go through some explanation of these things. The first thing to notice is that I'm piping the SQL to psql rather than using -c.
That actually goes back to a discussion I had on the list, with Tom, actually; I think he was the only one who answered my question. If you're using -c and that whole string has multiple statements in
it, delimited by semicolons, the whole string gets passed to the backend as one unit, and if there's an error somewhere, it just halts. Whereas if you were trying to do something like this, where you've got... that's like off the screen, isn't it?
It's a little better. Something like this: you do a BEGIN, you insert a value, you commit; then you do another row and commit, and so on.
You would expect that each one of those things would happen independently, right? Because you've got them wrapped. But if you're using -c and there's an error, in this case a primary-key error,
I don't end up with that third commit ever happening. Whereas if you pipe it in, you'll see you do get both.
And that's because each of those statements is being executed as an independent execution in the back end. It sounds like you'd get around that by using the Python SPDA.
I would guess so. Because that would be the same thing as basically feeding a file to psql.
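To make the difference concrete, here's a small sketch (table name and values are made up; assume t has a primary key):

```bash
SQL="BEGIN; INSERT INTO t VALUES (1); COMMIT;
BEGIN; INSERT INTO t VALUES (1); COMMIT;
BEGIN; INSERT INTO t VALUES (2); COMMIT;"

# With -c the whole string is one unit: after the duplicate-key error on
# the second INSERT, the third INSERT/COMMIT never runs.
psql -c "$SQL"

# Piped in, psql sends the statements one at a time, so the error in the
# middle doesn't stop the third transaction from committing.
echo "$SQL" | psql
```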
So now, in this particular case, I'm actually setting ON_ERROR_STOP to on, so it's not really going to make a difference here. And this is one of the things you need to consider: do you want your script to stop with an error if one of the statements in a multi-statement string errors out?
Or do you want it to continue? That's really a decision you've got to make. In this particular script, I've said that I want it to stop if there are any errors. This -qAt is quiet, unaligned, tuples-only.
Basically, that's going to give me back just the data, without all the decoration that psql normally adds, so that I can actually do something with that data rather than just display it. And then the delimiter allows me to specify exactly what the field separator is going to be. In most cases I would just use a space, but sometimes you might want something else.
An example of that is I've used this sort of thing to run a query and build like an in-list for another SQL statement that I'm kind of building and I want to execute.
Yeah, you can. I think later on I may have an example of it. What you can do is say dollar sign, single quote, backslash t, single quote: $'\t'.
And that specifies that the \t should be treated as an escape, a tab. And then of course this is the normal connection info that I'm going to use.
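Here's a sketch of such an exec_sql function; the argument order and names are assumptions, not the speaker's exact code:

```bash
exec_sql () {
    local host=$1 port=$2 db=$3 user=$4 delim=$5 sql=$6
    # Pipe the SQL in (rather than -c) so multi-statement strings run
    # statement by statement; -qAt = quiet, unaligned, tuples only.
    echo "$sql" | psql -qAt -F "$delim" --set=ON_ERROR_STOP=on \
        -h "$host" -p "$port" -U "$user" "$db"
}
```

For a tab separator you would pass $'\t' as the delimiter, as just discussed.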
Yeah, thanks for reminding me of that. Actually, as I said, I had like five or six different versions of this thing in the other common library that I did.
One of them uses URIs, which were added in Postgres 9.2. So as of 9.2, you can actually build the connection string; it looks just like it would for JDBC, essentially. And you can pass that as the connection string instead of having separate host, port, user, and database name.
You just have it as a URI, and basically you pass it as if it were the database name. I don't show that here, but I have done it.
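A sketch of that URI variant (the connection details here are made up):

```bash
# The URI is passed where the database name would normally go.
conn="postgresql://myuser@localhost:5432/mydb"
echo "SELECT now();" | psql -qAt --set=ON_ERROR_STOP=on "$conn"
```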
Okay, so this is an example of now using that function. I'm going to take... So in order to compress it on the slides, I had to kind of make this a little bit ugly, but I've set the host, the port...
And by the way, I don't know if you know this, but if you want to connect on your local machine using sockets instead of TCP/IP, you can basically specify the socket directory as if it were the host. So that doesn't have to be an IP address. You do need the port, however.
And I need that when I'm playing with it on my own machines, because I typically have different Postgres versions set up to run on different ports and whatnot. So I'm going to set up my connection string and my delimiter, and I'm going to run a query that's basically looking for connections that are idle in transaction.
And now I can just simply say exec SQL and get a result back.
When I run that, is that big enough to see in the back? Can you guys see that in the back? So you can see the form that I get the data back in is relatively easy to use.
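Roughly what that demo does, reusing the assumed exec_sql sketch above (the socket directory stands in for the host):

```bash
HOST=/tmp     # a Unix-socket directory works where a hostname would
PORT=5432
SQL="SELECT pid, state FROM pg_stat_activity
     WHERE state = 'idle in transaction';"

exec_sql "$HOST" "$PORT" postgres postgres '|' "$SQL"
```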
And I got more examples of how it actually does get used. So now this is another version of this, more or less the same thing, except this time I'm going to take a file name as an argument. So it's pretty much exactly the same.
This is useful if you need persistence for your SQL file. So either you want to have that SQL file defined in advance, so maybe you can keep it in version control. I've used it for things like, originally I actually wanted to add to this presentation another example about a script that would control access to a database.
It would let you specify a list of users who should be allowed in, and everyone else would get locked out. One of the things I found when I was developing that script was that revoking CONNECT access to the database doesn't work for superusers, at least it didn't on older versions of Postgres.
Someone told me at some point that they thought that had been changed. Well, what I ended up doing is, for all of the login users that I don't want to be able to log in, I just alter those users and essentially turn them into groups.
So now they can't log in anymore. But because of that, I had to generate SQL statements from the list of current users that I wanted locked out, and I needed to know who those users were in order to unlock them later on. And there was always the possibility that a new user would get created or an existing user would get deleted between
when I locked and when I unlocked. So, using all this dynamic SQL stuff, I would build SQL statements, pipe them out to a file, and then execute those statements using this version of the function.
That was also, by the way, how I figured out that problem with -c: I had a bunch of statements that were altering users, and in this environment part of one of their upgrades ended up dropping some users.
So they would lock the system using the script, then part of their upgrade dropped a user somewhere along the line, and then they would try to unlock; in that list of things getting executed, it tried to unlock a user that no longer existed, and it threw an error.
Yeah, I mean this is a bit contrived because it's just an example for this presentation
but as I described, there are cases where I was dynamically building SQL and I wanted it out in a file, and I wanted that file date-time-stamped and all kinds of other stuff, right? So you definitely don't need that in this particular case, but it demonstrates the technique.
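The file-based variant is a small change to the exec_sql sketch above: feed psql from the file instead of echoing a string:

```bash
exec_sql_file () {
    local host=$1 port=$2 db=$3 user=$4 delim=$5 sql_file=$6
    psql -qAt -F "$delim" --set=ON_ERROR_STOP=on \
        -h "$host" -p "$port" -U "$user" "$db" < "$sql_file"
}
```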
So now this is great. Okay, I can run a query, I can get data back. How do I actually assign that to a variable? Well, if you've got basically a scalar value, a single scalar value, that's pretty straightforward.
Again, the method of calling out, executing another process, is the dollar-sign-paren notation, $( ). What that does is execute this in another context,
just as if you'd used backticks. The traditional way to do it was with backticks; this is the more modern way. Backticks are, I think, even supposed to be deprecated, although I'm not sure they'll ever really go away. What this will do is run that function with this query,
and then, since that is just a single result, it gets assigned right to that variable, and then you can echo it out. It works great. But what happens if you've got multiple rows and multiple columns?
In that case, you can use something called a here string. Anyone ever heard of a here string? Most people have probably heard of here docs. What this does is execute that SQL and assign the output to these two variables in the loop.
And so if I go out to... so now you can see: in that first example I just spit back the answer, but here I can actually interact with those two variables, across rows and columns.
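Both patterns in sketch form, reusing the assumed exec_sql from earlier:

```bash
# Scalar: command substitution assigns the single value directly.
backends=$(exec_sql "$HOST" "$PORT" postgres postgres '|' \
    "SELECT count(*) FROM pg_stat_activity;")
echo "backends: $backends"

# Rows and columns: a here-string (<<<) feeds the output to a read loop,
# splitting each line on the delimiter into the two variables.
while IFS='|' read -r pid state; do
    echo "pid=$pid state=$state"
done <<< "$(exec_sql "$HOST" "$PORT" postgres postgres '|' \
    "SELECT pid, state FROM pg_stat_activity;")"
```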
Okay, so now I'm on to the part where I talk about managing the functions that you're trying to use. As you're building these scripts that call SQL,
a lot of times you're better off if you can wrap a bunch of the functionality that you want in the database inside SQL functions, so that you're basically just calling SELECT some_function() to do the work. But if you do that, as you're maintaining both the function and the script,
they could get out of sync. So what this particular function does is take a schema name, a file name, a list of arg types, and an MD5,
and it just does a lookup in the system catalog to see whether this particular function already exists and the MD5 matches, which means the source of the function matches; then it executes that query.
You may or may not be familiar with this, but the combination of the schema, the function name, and the arg types is actually the function signature; that's how Postgres figures out which specific function you're trying to execute. Then, by taking pg_get_functiondef
of the OID of that function and feeding it to md5(), you're essentially getting a signature on the actual source code of that function. Then there's another bash function you need to support this. Basically, this is a function
that's going to check that signature and see if an exact match exists. If it already exists, it does nothing; if it doesn't, it does a CREATE OR REPLACE to replace it, inline, so that before you use a function you can ensure the correct version of it already exists.
The beginning of this thing defines the function name: the file name, I should say, includes the schema, then the function name, a dash, and then the arg types.
This is going to parse all of that out: these three things take the SQL file name and use cut to pull out pieces of it, and then it does an md5sum of the actual SQL file.
The rest of this then uses that, with the function we just talked about, to look for a match; if it doesn't find one, it just executes the file with the file-execution function we talked about earlier.
That's going to do the CREATE OR REPLACE. Now, there's one more piece that makes this easier to use. Say you're creating a function like this one (again, a trivial, contrived example):
if you execute this to create a function and then use pg_get_functiondef, Postgres actually formats the definition in its own specific way, a little bit nicer. But then, if you're trying to match on an MD5, that's not going to work. So, once you've created the function in the database manually,
this script will find that function and export it to the SQL file in a form that will exactly match the MD5 that comes from running md5 on the pg_get_functiondef output. Does that make sense?
So now, finally, once you've got all that in place and you've got a script that's going to use this function, you specify which SQL file to match against, and then you just call this cr_sql_func and it does that check.
If the function already exists and the MD5 matches, it does essentially nothing. If it doesn't match, it'll take that SQL file on disk and do the CREATE OR REPLACE; and now, when I'm using that function, I can be sure that the function that exists in Postgres matches the one I expect here in my bash script.
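A sketch of the whole check. The file-naming convention (schema.function-argtypes.sql) is taken from the talk, but the query and helper wiring are assumptions, and the real version also matched on the argument types. Note the file must contain exactly what pg_get_functiondef emits, which is what the export step guarantees:

```bash
cr_sql_func () {
    local sql_file=$1
    local base schema fname file_md5
    base=$(basename "$sql_file" .sql)                 # e.g. public.my_func-int4
    schema=$(echo "$base" | cut -d. -f1)              # "public"
    fname=$(echo "$base" | cut -d. -f2 | cut -d- -f1) # "my_func"
    file_md5=$(md5sum "$sql_file" | cut -d' ' -f1)

    # Does a function with this name exist whose definition hashes to the
    # same MD5 as the file on disk?
    local check="SELECT 1
        FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
        WHERE n.nspname = '$schema' AND p.proname = '$fname'
          AND md5(pg_get_functiondef(p.oid)) = '$file_md5';"

    if [[ -z $(exec_sql "$HOST" "$PORT" postgres postgres '|' "$check") ]]; then
        # No exact match: (re)load the CREATE OR REPLACE from the file.
        exec_sql_file "$HOST" "$PORT" postgres postgres '|' "$sql_file"
    fi
}
```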
Okay, so now some more general stuff that is not specific to Postgres, although the examples use it with Postgres. First, a concurrent-execution lock. You may want that because
you've got something that runs for a long time and uses a lot of resources, and you don't want two people manually starting it at the same time, or three. Or maybe it runs from a cron job every five minutes, and if something goes wrong, runs could get stacked up. So you want a way to lock
the use of the script so it can only run once at a time, essentially. One example is a forced vacuum, and this is actually a live example from the environment I was working in.
There was kind of a mix of OLTP and data coming from legacy systems that was getting bulk-loaded into the production systems, and the people doing the data movement didn't want to be bothered with running vacuums or anything like that after they bulk-loaded. I'm not sure how many people here are
familiar with this, but when you bulk-load data, it's good practice to run a vacuum, or vacuum analyze, on the table: one, to update your statistics, and two, to set all the hint bits, because otherwise, the first time a row gets read after it's been loaded,
an actual write will happen, which you may find surprising. So this function makes use of a Linux command called flock. I've set it up so that it can either
be blocking or non-blocking. In the interest of time, I think I'm going to skip trying to run it, but I've got a script in that GitHub repo that demonstrates the use of this thing. So you can call it to either
exit immediately if it can't get the lock, or to block until it can. And the other half of that is unlocking: if you're doing some kind of long operation, you might want to hold the lock for some period of time and then unlock explicitly.
So this is the example function that's calling it. It's going to attempt to grab the lock, and this NOWAIT is telling it basically not to block; if it doesn't get the lock right away with NOWAIT, it then falls back and actually blocks.
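A sketch of the flock-based lock, using the enum values from the common header; the file descriptor number and lock-file path are arbitrary choices:

```bash
LOCK_FILE="/tmp/${BASE_NAME}.lock"

get_lock () {
    local mode=$1                 # NOWAIT or BLOCK from the enum above
    exec 200> "$LOCK_FILE"        # keep fd 200 open on the lock file
    if [[ "$mode" -eq "$NOWAIT" ]]; then
        # -n: fail immediately instead of waiting for the lock
        flock -n 200 || { echo "already running, exiting" >&2; exit 1; }
    else
        flock 200                 # block until the lock is free
    fi
}

release_lock () {
    flock -u 200                  # explicit unlock for long operations
}
```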
The next thing you might want to do is parallel work. Again, a real-life scenario from the same environment: we were working with Slony clusters, and they wanted to be able to do
kind of a snapshot backup of a Slony cluster, and then a way to restore to some point in time for testing purposes. Because not all the tables were OLTP, only the OLTP tables were participating in the Slony replication; the bulk-loaded tables were all
being loaded at all three sites independently. So we needed to be able to load tables at three different sites, and some of them were fairly large. You want to be able to do them all at the same time: you don't want to do one site, sit there blocked in your script until it's done an hour later. You want all three of them done in that hour.
So the example script I've got runs three queries as separate background tasks. bash actually has a wait command, so if you launch three sub-processes asynchronously and then say wait, it'll sit there
and wait for all three of them to finish. It's kind of a nice way to do asynchronous parallelism. This particular script basically has a timed five-second sleep; without parallelism it would take 15 seconds, but it's only going to take five because it runs them in parallel. The first, simple
method is just to background the execution. You can do that right within your bash script: each sub-process gets backgrounded and runs asynchronously, and the wait statement sits there until those three things finish.
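A sketch of that simple version: three backgrounded queries and a wait (each sleeps five seconds, so the whole thing takes about five seconds, not fifteen):

```bash
exec_sql "$HOST" "$PORT" postgres postgres '|' "SELECT pg_sleep(5), 1;" &
exec_sql "$HOST" "$PORT" postgres postgres '|' "SELECT pg_sleep(5), 2;" &
exec_sql "$HOST" "$PORT" postgres postgres '|' "SELECT pg_sleep(5), 3;" &
wait    # blocks until all three background jobs have finished
```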
The problem with that is that you can't capture the results. This method doesn't hook those processes up to your standard out in a way you can capture, so there's no way to gather the result into a variable. There's another way
to do it, using something called coproc, but coproc has some bugs. When I was going through these slides I did some research, and it looks like the bug in coproc is acknowledged, has been there for at least several years, and hasn't been fixed. What I found was that if I wanted to be
able to capture the output of these three executions, I needed a fourth one, which was kind of a dummy, because whichever one was last never got hooked up to the standard out, so you couldn't get that result. By having the fourth one here, it sort of works around that bug in coproc and
lets me read the results back into the buffer. It also has the issue that it outputs warnings about trying to do stuff in parallel, which makes no sense, because it actually works. You can see that this thing executed in five seconds,
and I got all three results, just like I expected.
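The coproc workaround itself isn't reproduced here. If you just need the results in variables, a plainer pattern (not the one from the talk) is to redirect each background job to a temp file and read them back after the wait:

```bash
tmp=$(mktemp -d)
for i in 1 2 3; do
    exec_sql "$HOST" "$PORT" postgres postgres '|' \
        "SELECT pg_sleep(5), $i;" > "$tmp/$i" &
done
wait
for i in 1 2 3; do
    result[$i]=$(< "$tmp/$i")   # gather each job's output into an array
done
rm -rf "$tmp"
```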
Yeah, that's an important safety tip, I suppose: if you're doing stuff on the same database, that could be an issue. In most of the instances where I've done this sort of thing, it's been because I was trying to do something in parallel across multiple hosts, but it certainly would be valid
that you might be trying to do something in parallel on the same host. All right, so finally I'm going to cover cleaning up. Often in these scripts you're going to create some sort of files, temp files;
you may not want to leave them lying around, either for security reasons or just cleanliness reasons. You might have a script that does things as it goes along, and if there's an error, you don't want to leave things in a partially executed state; you want to be able to unroll
some of the effects of what you've done. You can also use this for restarts, and in fact I have: we had a set of scripts being used for a fairly lengthy migration, and because of the length of the migration, some issues on the network,
and the number of times we had to do it, we found that occasionally the thing would bomb out, because of a network issue, right in the middle of the script. It was something that would take hours, and we wanted to be able to start right back up where we were. You can actually use a trap to write out to a file
an indication of where you were in that script, so that when you run it another time, it goes right to where you want it to start and picks back up. To make that easier, I created this cleanup function.
What it does is evaluate the elements of an array, and it does so in reverse order, iterating backwards, so that as I'm using it I can stack undo tasks onto a queue; it's like pushing them
onto the top and then popping them back off as the trap executes at the end. So in these examples, I'm going to set up SQL that truncates table t1, and I'm going to create this element, which is just:
echo what I'm doing, then use that execute-SQL function to execute that specific SQL. I set that as array element 0. Then this SQL is going to insert some values into table t2. These are basically the things I want done if this particular
script errors out; this is what I want for cleanup. And then finally, once I've built that array with these two tasks, I just set a trap on cleanup, and I'm saying I want that trap to run on either a SIGINT or an error,
so either any time the script errors, or when someone gets impatient and hits Ctrl-C, or kills it, right? So now the rest of this script, just for demonstration purposes,
is going to insert some values into t1, truncate t2, and show the count of the records in those tables; and then I've put in a sleep so that you'd have time to hit Ctrl-C and see the effect of the trap executing.
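A sketch of that cleanup machinery, with the undo commands stored as strings in an array (table names t1/t2 are from the demo; the exec_sql wiring is assumed):

```bash
declare -a UNDO   # stack of undo commands, executed in reverse order

cleanup () {
    local i
    for (( i = ${#UNDO[@]} - 1; i >= 0; i-- )); do
        echo "cleanup: ${UNDO[i]}"
        eval "${UNDO[i]}"         # newest undo task runs first
    done
}

UNDO+=("exec_sql $HOST $PORT postgres postgres '|' 'TRUNCATE t1;'")
UNDO+=("exec_sql $HOST $PORT postgres postgres '|' 'INSERT INTO t2 VALUES (1),(2),(3),(4);'")

trap cleanup INT ERR   # Ctrl-C (SIGINT) or any failing command (ERR)
```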
So now you can
see that my trap executed: it inserted back into t2 and truncated t1. And if I go over here, you can see...
you can see that t1 is back to no rows and t2 has four, so my trap successfully reset those tables. And this
just shows you what I just showed you, basically. And if you let it complete successfully, the trap doesn't fire and the changes stay in place. So that's it. Any questions?
Because I didn't know about it
Again, the double brackets are considered the more modern way to do the if test. There is a technical difference in the way it gets executed: the double brackets are actually built into the
bash shell, whereas the old single bracket was, historically, a link to the external test program. Is that correct? Everything I've read seems to recommend that you use the double
brackets now when you're doing the test; what they actually provide is more functionality. The old style may be built in as well, but it stays compatible with the original test semantics.
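A quick illustration of the difference:

```bash
name="my file"

# [[ ]] is a bash keyword: no word splitting of $name, plus extras such
# as pattern matching and && inside the test.
if [[ $name == my* && -n $name ]]; then echo "double brackets matched"; fi

# [ mimics the old external test(1) program: quoting is required and
# there's no pattern matching.
if [ -n "$name" ]; then echo "single brackets: non-empty"; fi
```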
Anything else? I'll take that into consideration
That's a good point and actually in the environment where I was
working, they had official coding guidelines which said you should do that, and I think it's a good idea. But one thing is, if you're building stuff into a library and everyone is just using those functions, that's kind of abstracted away for them anyway. I'm not a very good typist, so I'm lazy
when it comes to that; I tend to use the short arguments.
In fact, in the scenario where I was doing this, we had exactly that, and all of that stuff was getting pushed off to Splunk. So yeah, you can definitely do that.
I think we're actually a couple of minutes over, so if you have any other questions, please find me, I guess, between now and when everyone leaves.