Multiplatform binary packaging and distribution of your client apps
Formal Metadata
Part Number | 92
Number of Parts | 119
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/19990 (DOI)
Production Place | Berlin
EuroPython 2014
Transcript: English (auto-generated)
00:15
Hello, is it working? Hello everyone, thank you for attending my talk.
00:22
As he said, my name is Julia, and you can contact me on those social accounts. I work at this little startup called biicode in Spain. So what I'm doing this morning is telling you the story of why I decided to package my client application as a binary,
00:41
how I did it and the implications I found on the way. So let's start at the beginning. It's April 2013, and we have just finished the prototype of our application, entirely written in Java. We have this new smart engineer on board
01:03
who's also very brave, because the first thing he does when he joins us is try to convince the CTO to move to Python. And he provided rock-solid arguments for why we should do that.
01:22
But the CTO wasn't entirely convinced, because with Python, even if you only distribute the .pyc files, well, it's very easy to decompile, and the investors wouldn't be happy with that
01:41
and blah, blah, blah. But I was hating Maven at the time, slow as hell, and I also wanted to move to Python. So I said, well, just let me do some research on the thing. And I really thought it would only take me a couple of minutes on Google
02:00
to find a solution for the problem. So I went and typed "obfuscate Python". And you can laugh at me if you want to, because the answers I found weren't answers at all. They were of this kind:
02:23
Well, Python is not the tool you need. It wasn't designed that way. It's against its philosophy, plus everything that's ever been written in Python is open source. And if you wanna do it anyway, it's really hard. And even really compiled applications
02:43
can be reverse engineered. And well, they hack Windows all the time, so they will hack your application too. Quit your job: if your company is trying to do such unethical stuff, you should quit your company right now.
03:03
Code protection is overrated. Just writing a legal requirement should be enough. Well for me that's just a bunch of excuses and lies.
03:21
I mean, Dropbox originally, I don't know if they still do that, was written in Python and was obfuscated. And yeah, they hacked it, and they hacked Windows, yes. But I wish, I wish our application had as many people trying to hack it as Windows or Dropbox have.
03:45
So you are telling me I'm trying to do something that's not possible, that I cannot do whatever I want with my own code because people don't do that and because it's hard? Really?
04:02
In my previous company they compiled PHP into C, so that's not going to stop me. Now I wanna do it. I took it personally. I just wanna do it as an intellectual exercise. I wanna discover if I'm capable of doing this.
04:26
So the statements I wrote before weren't everything I found. There was a guy suggesting that maybe you could try to use Cython to compile your Python code into C code
04:41
and then go on from there. So that's where I started. This is the process I came up with. The first step is to take your Python code and convert it to C code with Cython.
05:01
Then you compile it with setup. Then you need to package it and create an actual executable thing, and I used PyInstaller for that. With PyInstaller you get a folder with everything you need: the executable and all the external dependencies you may have.
05:25
And you can take that folder and pass it to any installer-building software for your system: Debian packages, a setup for Windows, DMG packages for Mac.
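As a rough sketch of the first step of this process (Python to C with Cython), the snippet below copies a source tree into a separate folder and runs Cython on the copies without forcing regeneration. It is not the code shown on the slides: the helper names `mirror_sources` and `convert_to_c` and the folder names are hypothetical, and it assumes Cython is installed.

```python
import os
import shutil


def mirror_sources(src_root, build_root):
    """Copy every .py file under src_root to the same relative place
    under build_root and return the list of copied paths."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(build_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            if name.endswith(".py"):
                target = os.path.join(target_dir, name)
                # copy2 keeps the modification time, so an unchanged source
                # does not look newer than a C file generated earlier.
                shutil.copy2(os.path.join(dirpath, name), target)
                copied.append(target)
    return copied


def convert_to_c(py_files):
    """Run Cython on the mirrored files; the .c files land next to them."""
    from Cython.Build import cythonize  # assumption: Cython is installed

    # force=False lets Cython skip files whose generated C is up to date.
    return cythonize(py_files, force=False)
```

A possible call would be `convert_to_c(mirror_sources("src", "build/csrc"))`, with the build folder kept outside the source tree so that generated C files never sit next to the original Python files.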
05:41
Well, this is how everything is done. Converting your Python code into C code is actually really easy. I don't know if you can read the code, but what we're doing here is walking through our source tree directory and replicating it into a new folder,
06:00
because you probably don't want your C files to be placed right next to your Python files. So for every Python file you call the cythonize function, which is cool: you can tell Cython not to force compilation, so if a file has not changed
06:22
it won't convert it again, which saves you a lot of time. Well, that's all. Now that you have your C files, this is where things become a little nasty
06:40
and hacky and obscure. I haven't found a way to actually tell sysconfig which compiler flags I want. They seem to be stored in a static dictionary
07:00
that's created the first time you call sysconfig.get_config_vars, and what happens is that you don't really know which entries of the dictionary are being used, and some of the flags are duplicated across various entries,
07:21
so this is mostly trial and error. The first thing we're doing here is walking our new source tree of C files and creating an Extension for every C file. This thing in here,
07:42
the PYREX_WITHOUT_ASSERTIONS, is to disable assertions, because you probably don't want assertions in production. And then for different platforms you have to override the flags you don't want. What happens is that on Unix systems extensions are compiled with debugging symbols in them,
08:04
which makes your compiled application bigger and slower, so you probably want to disable them. Then three days ago I discovered that on one of our Mac machines
08:22
traces were enabled by default, but on the other one they weren't. I just discovered it, so I had to add this new override here. But once you're finished hacking your sysconfig configuration,
08:41
you just need to call setup with your list of extensions and everything gets compiled. So now you have your Python application compiled as a native extension, but you probably still depend on some external libraries.
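The compile step described above might be sketched as follows. This is an illustration under assumptions, not the code from the slides: the helper names, the list of sysconfig entries to clean, and the regular expression for debug flags are all hypothetical, and `setuptools` is assumed to be installed. The macro name `PYREX_WITHOUT_ASSERTIONS` is the one mentioned in the talk.

```python
import os
import re

# Matches -g, -g3, -ggdb, -ggdb3 and similar debugging flags.
DEBUG_FLAG = re.compile(r"-g(gdb)?\d*")


def strip_debug_flags(flags):
    """Remove debugging-symbol flags from a compiler flag string."""
    return " ".join(f for f in flags.split() if not DEBUG_FLAG.fullmatch(f))


def clean_config_vars(config_vars, keys=("CFLAGS", "OPT", "PY_CFLAGS")):
    """Return a copy of config_vars with debug flags stripped from `keys`.

    Which entries matter is platform dependent (trial and error in the
    talk), so `keys` is only a hypothetical starting point.
    """
    cleaned = dict(config_vars)
    for key in keys:
        if isinstance(cleaned.get(key), str):
            cleaned[key] = strip_debug_flags(cleaned[key])
    return cleaned


def module_name(c_file, root):
    """Turn 'root/pkg/mod.c' into the dotted module name 'pkg.mod'."""
    rel = os.path.splitext(os.path.relpath(c_file, root))[0]
    return rel.replace(os.sep, ".")


def build_extensions(root):
    """Create one Extension per generated C file under root."""
    from setuptools import Extension  # assumption: setuptools is installed

    extensions = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".c"):
                path = os.path.join(dirpath, name)
                extensions.append(Extension(
                    module_name(path, root),
                    sources=[path],
                    # Macro named in the talk to compile assertions out.
                    define_macros=[("PYREX_WITHOUT_ASSERTIONS", None)],
                ))
    return extensions
```

The result of `build_extensions("build/csrc")` could then be passed to `setup(ext_modules=...)`. How the cleaned flags get back into the build is left out on purpose: the talk edits the dictionary that sysconfig caches, and whether a given build tool picks up such edits depends on version and platform, so that part needs checking on each target system.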
09:00
So you want to pack it all together, and as I said before, we are using PyInstaller for this. What we are doing, as we had some problems with external dependencies with PyInstaller, is we created a fake main file
09:22
which imports the real native extension main file and all the third-party stuff, because sometimes you need to explicitly import submodules. Well, this is also a bit of trial and error. So the first thing PyInstaller does is,
09:44
you pass this file to it and it creates a specification file, which you can configure a bit to tell it where your binary contents are. So in the first line here we are telling it to include images
10:00
and then some external modules like this one, MIMO, which contains binary files. So I'm just telling PyInstaller to copy the whole directory. I had some problems with Crypto on some machines,
10:21
so yeah, I did the same: I told PyInstaller to copy the whole thing into my project. Well, that's all. You get a folder with an executable file that you can copy to your client machine,
10:41
preferably going through a standard way of doing that, but that's mostly all. Well, have I achieved my goal of security improvement? Well, with PyInstaller you can package your application
11:05
in two different ways. You can package it as a single big file or as a folder which contains everything. The problem with the single big file is that it's compressed and every time you execute it,
11:21
it needs to uncompress itself into a temporary folder, which works great for graphical interface applications but is really slow for a command-line application, as is our case. So it's really easy for hackers to discover it's Python,
11:42
because if you package your application as a folder, they can see all the files in there and recognize stuff. But even if you package it all together, you can execute your application within a program that will print every assembly line.
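The packaging step described above (a fake main file plus the choice between a folder and a single file) could be sketched like this. It is a hedged illustration rather than the speaker's spec file: the helper names, the `client` name, and the assumption that the compiled main module exposes a `main()` function are all hypothetical, and it assumes PyInstaller is installed. Only the `--onedir`, `--onefile` and `--name` options of PyInstaller are used.

```python
def fake_main_source(real_main, extra_imports):
    """Return the text of a small entry-point script.

    It imports third-party modules explicitly, so the packager can see
    them, and then hands over to the compiled main module.
    """
    lines = ["import {}  # noqa: F401".format(name) for name in extra_imports]
    lines.append("import {} as real_main".format(real_main))
    lines.append("")
    lines.append('if __name__ == "__main__":')
    # Assumption: the compiled module exposes a main() function.
    lines.append("    real_main.main()")
    return "\n".join(lines) + "\n"


def pyinstaller_args(entry_script, one_file=False, name="client"):
    """Build the argument list for a folder or single-file bundle."""
    mode = "--onefile" if one_file else "--onedir"
    return [entry_script, mode, "--name", name]


def package(entry_script, one_file=False):
    """Run PyInstaller in-process on the fake main file."""
    import PyInstaller.__main__  # assumption: PyInstaller is installed

    PyInstaller.__main__.run(pyinstaller_args(entry_script, one_file))
```

Writing `fake_main_source("app.main", ["Crypto"])` to a file and passing that file to `package` would produce the folder layout discussed in the talk; copying whole directories with binary contents, as done in the spec file, is not covered here.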
12:02
Sometimes you even get extra help, like this thing in there, so anyone would recognize that it's running Python on the inside. Well, can they reverse engineer you with that? Probably they can import your native extensions
12:23
and invoke your methods to discover what they are doing, but they cannot actually see the code. They even have help, because if you don't tell Cython not to, Cython by default will include the docstrings of your methods.
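To make this point concrete, here is a small sketch of what someone can learn by importing a compiled module. The standard library's `math` module stands in for one of your own native extensions, which is an assumption made for the sake of a runnable example; the helper names are hypothetical.

```python
import inspect
import math  # stands in for one of your compiled modules


def describe(module):
    """Map each public callable of a module to its first docstring line."""
    report = {}
    for name, obj in inspect.getmembers(module, callable):
        if not name.startswith("_"):
            doc = inspect.getdoc(obj) or ""
            report[name] = doc.splitlines()[0] if doc else ""
    return report


def has_readable_source(obj):
    """Report whether Python source code can be recovered for obj."""
    try:
        inspect.getsource(obj)
        return True
    except (TypeError, OSError):
        return False
```

Calling `describe(math)` lists names and docstrings, while `has_readable_source(math.sqrt)` is false: the names and documentation are visible, the source is not.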
12:42
But well, it's safer than not doing anything. Other implications? You may ask: well, I'm using C, so is this any more efficient than just running the Python code?
13:01
So I did a little benchmark, but first I need to explain what our project does. What the biicode project does is take the source of a C++ project and analyze it,
13:22
discovering all the interconnections among the files in the project and the external projects you may be using. So it's CPU-bound processing. So in this benchmark I did, on the X axis
13:44
I have the number of files in your C++ project, while on the Y axis I have the time to process them, just processing time, not reading from disk, obviously. So what happens is that for small projects
14:00
or medium-sized projects, like under 500 files, the efficiency gain in time is around 7%, which in itself is not bad. But for really, really big projects, and this last one is the SDL library,
14:20
which has over 2,000 files, the efficiency gain was three seconds, which from a user experience perspective is a lot, and it's a 32% time gain. So I think overall the process wasn't hard,
14:45
wasn't difficult, and we gained something on the way. So I think the reaction on the internet wasn't good enough. So, well, time for-
15:01
Did you change your Python code or did you just use it as is with Cython? I mean, you should ask more questions. If you want, I have a series of blog posts.
15:23
I have a series of blog posts written with longer snippets of code. I will put this on the internet later so you can go and read them. So, time for more questions.
15:51
Hello, thank you for your talk. Do you think it's important to cythonize the entire application, or would it be sufficient to cythonize only the kernel,
16:01
the stuff that you do differently from others, and leave everything else in Python? And on a related note, how do you debug this stuff? Could you raise your hand, because I'm hearing you behind my back. Ah, okay. Well, as it isn't difficult at all
16:23
to cythonize the application, I don't mind cythonizing it all or just the processing part. How do I debug it? I run my tests in Python. I can set the debugging level
16:42
even when I'm running the cythonized application, so I can see all the traces. So I've never found a problem I cannot solve by running the application just with Python.
17:02
If you've already done this then I assume you're happy with it, especially with the extra performance. But when you were doing your research, did you consider writing a custom loader and maybe taking the marshal module and hacking it up so that everything looks different, and sort of obfuscating that way? I preferred to use something
17:20
that was already there and working, because I knew that I would probably write more bugs than useful stuff if I tried to write my own obfuscator. So as this worked the first time I tried it, I stuck with it.
17:44
So I saw that you called the cythonize function manually, but you probably know that there's an extension of the Extension object that automatically does the cythonizing part for you.
18:01
So you can actually pass the .pyx file to the Extension. Is there any reason you did it that way, or? No, this is the point I started from and built on, so the process can probably be improved a lot. I don't know if calling the Extension directly
18:21
would allow you this not-compiling-again stuff; probably it does. Yeah, it does, yeah. Yeah, so no, there isn't a reason for that. Yeah, because I had an extension written in Cython as well, and I think the way you can pass the compiler options is nicer
18:43
when you do it through the Extension object. Yeah, but the compiler options I'm hacking here are for the extension library. I'm calling Extension, then I'm setting up the options, and finally I'm calling setup.
19:01
So there is no Cython involved anymore at that stage. Yeah, so what you're saying is that Extension chooses some compiler flags by default which you don't want and you have to remove them, okay? Yes. Any more questions?
19:22
Any more questions? Can I ask a question? What's the implication for testing if you have a binary rather than your? Oh, sorry, yeah, sorry. What's the implication for testing with the binary and are you having to test it all twice? No, because as I said before,
19:42
I've never met a bug that happens only with the binary application and not with the Python one. We run some, I don't know, final tests with the binary, just to be sure, but the big suite of tests is run against the Python code.
20:03
Any more? Well, thank you very much, Julia. Thank you very much.