Multiplatform binary packaging and distribution of your client apps

Formal Metadata

Title
Multiplatform binary packaging and distribution of your client apps
Title of Series
Part Number
92
Number of Parts
119
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Place: Berlin

Content Metadata

Subject Area
Genre
Abstract
juliass - Multiplatform binary packaging and distribution of your client apps. Distributing your Python app to clients is a common task that can become hard when “stand alone” and “obfuscated code” come as requirements. Common answers in forums are along the lines of “Python is not the language you’re looking for” or “What are you trying to hide?” but another answer is possible.
Transcript: English (auto-generated)
Hello, is it working? Hello everyone, thank you for attending my talk.
As he said, my name is Julia, and you can contact me on those social accounts. I work at this little startup called biicode in Spain. So what I'm doing this morning is telling you the story of why I decided to package my client application as a binary,
how I did it and the implications I found on the way. So let's start at the beginning. It's April 2013, and we have just finished the prototype of our application, entirely written in Java. We have this new smart engineer on board
who's also very brave because first thing he does as he joins us is try to convince the CTO to move into Python. And he provided very rock solid arguments on why we should do that.
But the CTO wasn't entirely convinced, because with Python, even if you only distribute the Python files, well, it's very easy to decompile, and the investors wouldn't be happy with that
and blah, blah, blah. But I was hating Maven at the moment, slow as hell, and I also wanted to move to Python. So I said, well, just let me do some research on the thing. And I really thought it would only take me a couple of minutes on Google
to find a solution for the problem. So I went and typed "obfuscate Python". And you can laugh at me if you want to, because the answers I found weren't answers at all. They were of this kind:
well, Python is not the tool you need. It wasn't designed that way. It's against its philosophy. Plus, everything that's ever been written in Python is open source. And if you want to do it anyway, it's really hard. And even really compiled applications
can be reverse engineered. And, well, they hack Windows all the time, so they will hack your application too. Quit your job: if your company is trying to do such unethical stuff, you should quit your company right now.
Code protection is overrated. Just writing a legal requirement should be enough. Well for me that's just a bunch of excuses and lies.
I mean, Dropbox originally, I don't know if they still do that, but it was written in Python and was obfuscated. And yeah, they hacked it, and they hacked Windows, yes. But I wish, I wish our application had as many people trying to hack it as Windows or Dropbox have.
So you are telling me I'm trying to do something that's not possible, that I cannot do whatever I want with my own code because people don't do that and because it's hard, really.
In my previous company they have compiled PHP into C so that's not going to stop me. I now wanna do it, I took it personally, I just wanna do it as an intellectual exercise. I wanna discover if I'm capable of doing this.
So the statements I wrote before weren't everything I found. There was a guy suggesting that maybe you could try to use Cython to compile your Python code into C code
and then go on. So that's where I started. So this is the process I came up with. The first step is to take your Python code and convert it to C code with Cython.
Then you compile it with setup. Then you need to package it and create an actual executable, and I used PyInstaller for that. With PyInstaller you get a folder with everything you need: the executable and all the external dependencies you may have.
And you can take that folder and pass it to any auto installer software for your system. Debian packages, setup for Windows, DMG packages for Mac.
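The conversion step described next can be sketched roughly as follows. This is a minimal sketch, not the code from the slides: the helper names (`iter_python_files`, `copy_tree_sources`, `convert_tree`) are hypothetical, and it assumes the Python sources are first copied into a separate build folder, located outside the source tree, so that Cython writes the generated `.c` files there instead of next to the originals. `cythonize` comes from the third-party `Cython.Build` module and is called with `force=False` so that unchanged files are skipped.

```python
import os
import shutil


def iter_python_files(src_root):
    """Yield every .py file below src_root in a stable order."""
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)


def copy_tree_sources(src_root, build_root):
    """Replicate the source tree layout under build_root (hypothetical helper).

    Only .py files are copied. copy2 keeps the original modification time,
    so a previously generated .c file still counts as up to date.
    """
    copied = []
    for py_path in iter_python_files(src_root):
        target = os.path.join(build_root, os.path.relpath(py_path, src_root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(py_path, target)
        copied.append(target)
    return copied


def convert_tree(src_root, build_root):
    """Convert every copied .py file to C with Cython (needs Cython installed)."""
    from Cython.Build import cythonize  # imported lazily: third-party package

    copied = copy_tree_sources(src_root, build_root)
    # force=False lets Cython skip files whose .c output is already newer.
    cythonize(copied, force=False)
    return [os.path.splitext(path)[0] + ".c" for path in copied]
```

Calling `convert_tree("src", "build_c")` (both folder names are placeholders) would leave the original tree untouched and return the paths of the generated C files.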
Well, this is how everything is done. Converting your Python code into C code is actually really easy. Well, I don't know if you can read the code, but what we're doing here is walking through our source tree directory and replicating it into a new folder
because you probably don't want your C files to be placed right next to your Python files. So on every Python file you call the cythonize method, which is cool. You can tell Cython not to force compilation, so if a file has not changed
it won't convert it again, which saves you a lot of time. Well, that's all. Now that you have your C files is where things become a little nasty
and hacky and obscure. I haven't found a way to actually tell sysconfig which compiler flags I want. They seem to be stored in a static dictionary
that's created the first time you call sysconfig.get_config_vars, and what happens in there is you don't really know which entries of the dictionary are being used, and some of the flags are duplicated across various entries,
so this is trial and error mostly. The first thing we're doing here is walking our new source tree of C files and creating an extension for every C file. This thing in here,
the PYREX_WITHOUT_ASSERTIONS macro, is to disable assertions, because you probably don't want assertions in production. And then for different platforms you have to override the flags you don't want, and what happens is that on Unix systems extensions are compiled with debugging symbols in them,
which makes your compiled application bigger and slower, so you probably want to disable them. Then three days ago I discovered that on one of our Mac machines
traces were enabled by default, but on the other one they weren't. I just discovered it, so I had to add this new override here. But once you're finished hacking your sysconfig configuration
you just need to call setup with your array of extensions and everything gets compiled. So now you have your Python application compiled as a native extension but you still depend on some external libraries probably.
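The compilation step just described might look something like the sketch below. It is an illustration under stated assumptions, not the speaker's script: the helper names (`strip_flags`, `scrub_config_vars`, `module_name`, `make_extensions`) are hypothetical, the list of config keys to clean is a guess (the talk itself notes that finding the right entries is trial and error), and `Extension` is taken from setuptools rather than the older distutils.

```python
import os


def strip_flags(flags, unwanted=("-g",)):
    """Remove unwanted compiler flags from a space-separated flag string."""
    return " ".join(flag for flag in flags.split() if flag not in unwanted)


def scrub_config_vars(config_vars, keys=("CFLAGS", "OPT", "PY_CFLAGS"),
                      unwanted=("-g",)):
    """Strip unwanted flags from selected entries of a config dictionary.

    Which keys matter is an assumption; flags can be duplicated across entries.
    """
    for key in keys:
        value = config_vars.get(key)
        if isinstance(value, str):
            config_vars[key] = strip_flags(value, unwanted)
    return config_vars


def module_name(c_path, build_root):
    """Map build_root/pkg/core.c to the dotted module name pkg.core."""
    relative = os.path.splitext(os.path.relpath(c_path, build_root))[0]
    return relative.replace(os.sep, ".")


def make_extensions(build_root):
    """Create one Extension per generated C file (needs setuptools installed)."""
    from setuptools import Extension  # imported lazily: third-party package

    extensions = []
    for dirpath, dirnames, filenames in os.walk(build_root):
        dirnames.sort()
        for name in sorted(filenames):
            if name.endswith(".c"):
                c_path = os.path.join(dirpath, name)
                extensions.append(Extension(
                    module_name(c_path, build_root),
                    sources=[c_path],
                    # Defining this macro disables Cython's assert statements.
                    define_macros=[("PYREX_WITHOUT_ASSERTIONS", None)],
                ))
    return extensions


# In a setup.py one might then write (placeholder names, not executed here):
#   import sysconfig
#   from setuptools import setup
#   scrub_config_vars(sysconfig.get_config_vars())
#   setup(name="myapp", ext_modules=make_extensions("build_c"))
```

Whether mutating the dictionary returned by `sysconfig.get_config_vars()` actually changes the flags used by the build depends on the build tooling and its version, so treat that part as an assumption to verify. Passing `extra_compile_args` to each `Extension` is a documented alternative for adding flags.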
So you want to pack it all together, and as I said before, we are using PyInstaller for this. As we had some problems with external dependencies with PyInstaller, what we did is create a fake main file
which imports the real native extension main file and all the third-party stuff, because sometimes you need to explicitly import sub-modules. Well, this is also a bit of trial and error. So the first thing PyInstaller does is,
you pass this file to it and it creates a specification file, which you can configure a bit so you can tell it where your binary contents are. So we are telling it here, in the first line, to include images
and then some external modules like this, MIMO, it contains binary files in it. So I'm just telling PyInstaller to copy the whole directory. I had some problems with crypto on some machines
so yeah, I did the same. I told PyInstaller to copy the whole thing into my project. Well, that's all. You get a folder with an executable file, which you can copy to your client machine,
preferably going through a standard way of doing that, but that's mostly all. Well, have I achieved my goal of security improvement? Well, with PyInstaller you can package your application
in two different ways. You can package it as a single big file or as a folder which contains everything. The problem with the single big file is that it's compressed and every time you execute it,
it needs to uncompress itself into a temporary file, which works great for graphical interface applications but is really slow for command-line applications, as is our case. So it's really easy for hackers to discover it's Python
because if you package your application as a folder, they are seeing all the files in there and they could recognize stuff. But even if you package it all together, you can execute your application within a program that will print you every assembly line.
Sometimes with extra help like this thing in there, so everyone would recognize that that's running Python on the inside. Well, can they reverse engineer you with that? Probably they can import your native extensions
and invoke your methods to discover what they are doing but they cannot actually see the code. They even have help because if you didn't tell Cython not to, Cython by default will include the docstrings of your methods.
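To make that point concrete, here is a small sketch of what someone can and cannot learn by importing a compiled module. It uses the standard library's C-implemented `math` module as a stand-in for a Cython-built extension, which is an assumption; the helper names `public_api` and `source_is_available` are hypothetical.

```python
import inspect
import math  # stand-in for a compiled extension module


def public_api(module):
    """Map each public callable of a module to its docstring."""
    api = {}
    for name in dir(module):
        attribute = getattr(module, name)
        if not name.startswith("_") and callable(attribute):
            api[name] = attribute.__doc__ or ""
    return api


def source_is_available(obj):
    """Return True if Python source can be retrieved for obj."""
    try:
        inspect.getsource(obj)
        return True
    except (TypeError, OSError):
        # Compiled callables have no Python source to show.
        return False


api = public_api(math)
print(sorted(api)[:5])                 # names are visible
print(api["sqrt"])                     # so are docstrings
print(source_is_available(math.sqrt))  # but not the source
```

Names, call signatures in docstrings, and behaviour can all be probed this way. Cython has a `docstrings` compiler option that can be switched off to avoid shipping them; check the documentation for your Cython version before relying on it.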
But well, it's safer than not doing anything. Other implications you may ask, well I'm using C so is this any more efficient than running just the Python code?
So I did a little benchmark, but first I need to explain what our project does. What the biicode project does is take a C++ project's source and analyze it,
discovering all the interconnections among the files in the project and the external projects you may be using. So it's CPU-bound processing. So in this benchmark I did, on the X axis
I have the number of files in your C++ project, while on the Y axis I have the time to process them, just processing time, not reading from disk, obviously. So what happens is that for small projects
or medium-sized projects, like under 500 files, the efficiency gain in time is around 7%, which in itself is not bad. But for really, really big projects, and this last one is the SDL library,
which has over 2,000 files, the efficiency gain was three seconds, which from a user experience perspective is a lot, and it's a 32% time gain. So I think overall the process wasn't hard,
wasn't difficult, and we gained something on the way. So I think the reaction on the internet wasn't good enough. So, well, time for-
Did you change your Python code or did you just use it as is with Cython? I mean you should ask more questions. If you want I have a series of blog posts.
I have a series of blog posts written with wider snippets of code. I will put this on the internet later so you can go and read them. So time for more questions.
Hello, thank you for your talk. Do you think it's important to cythonize the entire application or would it be sufficient to cythonize only the kernel,
the stuff that you do differently from others and leave everything else in Python? And on a related note, how do you debug this stuff? Could you raise your hand because I'm hearing you on my back. Ah, okay. Well, as it isn't difficult at all
to cythonize the application, I don't mind cythonizing it all or just the processing part. How do I debug it? I run my tests in Python. I can set the debugging level
even when I'm running this cythonized application so I can see all the traces. So I've never found a problem I cannot solve running the application just with Python.
If you've already done this then I assume you're happy with it, especially with the extra performance. But when you were doing your research, did you consider writing a custom loader and maybe taking the marshal module and hacking it up so that everything looks different and sort of obfuscating that way? I preferred to use something
that was already there and working because I knew that I would probably write more bugs than useful stuff if I tried to write my own obfuscator. So as this worked the first time I tried it, I stuck with it.
So I saw that you called the cythonize function manually, but you probably know that there's an extension for the Extension object that automatically does the cythonizing part for you.
So you can actually pass the PYX file to the extension. Is there any reason you did it that way or? No, this is the point I started and I built on so probably the process can be improved a lot. I don't know if calling the extension directly
would allow you this non-compiling again stuff, probably it does. Yeah, it does, yeah. Yeah, so no, there isn't a reason for that. Yeah, because I had an extension written in Cython as well and then I think the way you can pass the compiler options is nicer
when you do it through the extension object. Yeah, but the compiler options I'm hacking here are to the extension library. I'm calling extension, then I'm setting up the options and finally I'm calling set up.
So there is no Cython involved anymore at that stage. Yeah, so what you're saying is that extension chooses some compiler flags by default which you don't want and you have to remove them, okay? Yes. Any more questions?
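The alternative raised in this exchange, handing the source files to Extension objects and letting Cython do the conversion as part of the same step, could be sketched like this. It is a hedged illustration: `extension_kwargs` and `build_extension_modules` are hypothetical names, and it assumes both setuptools and Cython are installed.

```python
import os


def extension_kwargs(source_path, src_root):
    """Build the keyword arguments for one Extension (hypothetical helper)."""
    relative = os.path.splitext(os.path.relpath(source_path, src_root))[0]
    return {
        "name": relative.replace(os.sep, "."),
        "sources": [source_path],
        # Per-extension options live on the Extension object itself.
        "define_macros": [("PYREX_WITHOUT_ASSERTIONS", None)],
    }


def build_extension_modules(source_paths, src_root):
    """Let cythonize convert .py/.pyx sources listed in Extension objects."""
    from setuptools import Extension    # third-party, imported lazily
    from Cython.Build import cythonize  # third-party, imported lazily

    extensions = [Extension(**extension_kwargs(path, src_root))
                  for path in source_paths]
    # cythonize returns Extension objects whose sources are the generated C
    # files; force=False keeps the skip-unchanged behaviour discussed above.
    return cythonize(extensions, force=False)
```

The result can be passed directly as `ext_modules` to `setup`, which removes the separate walk over generated C files at the cost of writing them next to the sources unless a build directory is configured.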
Any more questions? Can I ask a question? What's the implication for testing if you have a binary rather than your? Oh, sorry, yeah, sorry. What's the implication for testing with the binary and are you having to test it all twice? No, because as I said before,
I've never met a bug that happens only with the binary application and not with the Python. We ran some acceptance, I don't know, final tests with the binary, just to be sure, but the big suite of tests is run against the Python code.
Any more? Well, thank you very much, Julia. Thank you very much.