Old Web Today: Keeping Flash (and other) Retro Web Sites Accessible on the modern web

Formal Metadata

Title
Old Web Today: Keeping Flash (and other) Retro Web Sites Accessible on the modern web
Subtitle
Techniques for combining web archives and emulation using FOSS tools.
Title of Series
Number of Parts
287
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
It's been just over a year since all browsers dropped support for Flash, but much Flash content continues to exist both in web archives and on the live web. This presentation will cover various techniques tried by the Webrecorder project to keep Flash content working in today's browsers, such as running old browsers in containers, running full system emulation in WebAssembly, and using the Ruffle emulator. We will cover the trade-offs between these approaches and the current challenges in combining web archives and emulation, and present the tools we have built to make these approaches possible. As one of the options, we will also talk about the OldWeb.today project (https://oldweb.today/), which provides several JS-based emulators connected to web archives: how it works and how it can be used to run not only Flash, but also old Java applets that may exist in web archives. We will leave time for a Q&A session to answer any questions about the tools and how they could be improved!
Transcript: English (auto-generated)
Hello, I'm Ilya Kreymer, the lead developer for the Webrecorder project, and I'm going to talk about our OldWeb.today system, which is a browser-based emulation system. I'll talk about how to keep retro websites available on the modern web,
and that includes Flash and some other things as well. So, a little bit about the Webrecorder project first. What is Webrecorder? Well, our focus is on building free and open source web archiving tools
and making it as easy as possible to capture and replay high-fidelity web archives, so archiving websites as accurately as possible and playing them back as accurately as possible. We also have some other goals: integration with existing archival systems,
support for decentralized web archiving, and exploring the intersection of web archiving and software emulation, and that's where the OldWeb.today system fits in. Our goal is really to make it as easy as possible for anyone to create, use, and share web archives, and our motto is
"web archiving for all". And so, a little bit about OldWeb.today. The idea is to be able to browse old websites with old browsers. The original version was launched in 2015, with emulators running in Docker containers. That version is still available at classic.oldweb.today, but it requires infrastructure and maintenance
and actually uses an obsolete version of Docker. The latest version of OldWeb.today, which is at oldweb.today, was launched in early 2021, and it is entirely client-side emulation: all emulators run natively in the user's browser itself,
and hopefully this will improve as WebAssembly and JavaScript emulation technology improves. So that's a screenshot of OldWeb.today, and now I'm going to do a quick demo of how it actually works.
So here we have OldWeb.today loaded in my Chrome browser, and inside you can see there's Netscape 3 running. It's loading GeoCities from 1996 using data from the Internet Archive.
And this is actually interactive, so I can click around here. I can click on another link and hopefully that will load. I can go back. I can even change the date here, and
once I do that I can reload the home page and it will load a different version of... oh, looks like there's a JavaScript error here. I can also try switching to a different browser. Let's try Netscape 4. Every time I change the browser it reloads the
emulation environment from scratch. In this case it is running Netscape 4 in Mac OS, I believe System 7, and for some reason that did not work.
As you can see, occasionally there are some hiccups, but luckily we can just reload the page. For some reason it has been a
little bit temperamental lately, so I'll try something else. I think maybe Netscape 4 might not be working as well right now for us, but that's okay. So this is a classic
retro site, this is Pepsi.com. You can see it's kind of from the early 90s. There's all sorts of things here. It's loading in this version of Netscape 3. Let's see what happens if I click on this link here. Yeah, and then it loads this page, and again, this is all loading from
the Internet Archive, so there's a lot of moving parts here. I'm not sure this will get us anything else. And here's kind of "What's New". So I can browse the retro web using retro browsers.
Some other things that we can do: I have a couple of URLs that I could try. So here is... now we're going to reload this again, and
this particular page actually has a Java applet. This is one of the few sites that are still out there that in fact
still have Java and were actually archived. Many of them are sort of academic websites with math- and physics-related Java applets. And so this is actually running a Java applet here.
Let's see if we can look at the Java console. I believe we can. Let's see if it has any information. No, maybe not. Well, I guess it didn't print any messages, but yes, Java is running
on this particular site. So that's one example of what we could do with this. Another example, probably of more use now, is Flash.
So I'm going to load a Flash project, and this time we're using Windows and IE 6. So this is a different emulator.
This is using the v86 emulator and loading the state, so it restored the state of the emulator as soon as the browser was clicked, and it's able to load much more quickly this way. There's still a little bit of a delay in initialization, and I think the network
setup might need some optimization as well. It's also currently loading from the live web, as shown on the left here. So for this example it's not actually using the Internet Archive; it's actually connecting to
a website that is still out there that has this particular Flash project, and hopefully it will load fairly soon.
And this is IE 6, so this should have Flash 9 as well as Java 5.0 installed, and that should be sufficient for this particular piece, which I believe is
from, I think, 2001, it looks like. So that should be sufficient, but it takes a little bit of initial loading time here, so we'll just give it a few more seconds
to fully load, and if not, we'll just move on. As you can see, this is also running Windows 98. So this is a full Windows 98 emulation.
Let's see here. Okay, maybe we'll just move on to something else, since this one doesn't want to load right now.
So we also have Ruffle as an emulator for Flash, and Ruffle
does work pretty well and much faster, since it's not emulating an entire system, it's just emulating Flash. So this loads much more quickly, but you don't get the retro browser, you just get the Flash running in Ruffle, and I think in this version some things don't quite work yet.
So there are some trade-offs with using Ruffle, but Ruffle is also being improved constantly, so that will improve. Let's try one other thing here. So here's a...
Let's try this physics Flash piece. This is again another example of these old academic websites
that have some of these old Java and Flash applets that are still out there or were archived. In this case we are loading from an archived version from 2007,
and this is one of these older Flash projects. We can also try it with one of the Linux browsers here. So let's try Opera 12,
and for the Linux browsers the disk image is a little bit larger, so they take a bit longer to download. They were built with
basically a Tiny Core Linux distro that was set up with a browser as well as Flash installed, and so it's loading up right here.
So hopefully it'll load the same as before, and maybe we'll try the other project that did not work in IE 6 as well, but maybe we could try it in...
Let's try it in Firefox instead, just to see how that goes. Sometimes things don't load in one browser; you could always try it in a different browser, or just reload the page.
And the goal is also to improve the caching so this loads a bit faster, and to optimize the disk images. There are various optimizations that are obviously possible. And so hopefully this will load here,
and we'll see how this goes. The old browsers are also able to play MIDI files. I don't have an example with me here right now, but
that's one other benefit: if you do have MIDI files, whether they were archived or not, they and other file formats can of course be played back.
And so I can click on this here, and this is actually running in Flash 32. This is the latest release of Flash from Adobe that's installed here,
and so it should ideally work as expected, but there's a trade-off in performance, of course, since it's running inside of an emulator. And so with that I can talk more about
how the system actually works. So let me go back to our slides here, and maybe I'll switch to...
just go back to our default here. So, OldWeb.today and web archives. That's another key part that I want to cover. So what is web archiving? Maybe we'll start from there.
It's the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use. That's sort of the official definition from the IIPC, the International Internet Preservation Consortium.
The most well-known web archive out there, of course, is the Internet Archive Wayback Machine, from which I was loading a lot of the content that I just demoed. But there are also many other web archives that I want to mention, such as the UK Web Archive, the Portuguese Web Archive (Arquivo.pt), and the Icelandic Web Archive, and these are at national libraries,
and there are also many other institutions that run their own. And how are these web archives accessed? There's actually a common URL scheme for loading from web archives that I want to very quickly cover.
If you just go to the Wayback Machine you'll see a custom banner and custom URL rewriting. There's a way to get the original unmodified content, using this special form of the timestamp followed by the id_ modifier followed by the URL, and so that allows OldWeb.today to
just plug in these values and load web archives directly in this way. And now a little bit more about the emulators used in OldWeb.today.
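The replay URL form described a moment ago (timestamp, then the `id_` modifier, then the original URL) can be sketched in a few lines. This is an illustration in Python rather than OldWeb.today's own JavaScript; the function name is hypothetical, and the `https://web.archive.org/web/` prefix is an assumption, since other Wayback-style archives use their own prefix.

```python
from datetime import datetime


def wayback_raw_url(original_url: str, when: datetime,
                    prefix: str = "https://web.archive.org/web/") -> str:
    """Build a replay URL of the form <prefix><timestamp>id_/<original URL>.

    The id_ modifier asks the archive for the unmodified capture, without
    the banner or URL rewriting. `prefix` is an assumed default.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss
    return f"{prefix}{timestamp}id_/{original_url}"


if __name__ == "__main__":
    print(wayback_raw_url("http://geocities.com/", datetime(1996, 12, 20)))
```

Pointing `prefix` at another archive that supports the same URL interface would be the only change needed to load from it instead.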
So we have the v86 emulator, which is written in JS and WebAssembly, and that's used for the Windows and Linux environments. Then we have Basilisk II, which is a 68k Mac emulator that was ported to JavaScript and WebAssembly by James Friend,
and I modified it further. This emulator is used for the Mac OS environments. Then we also have Ruffle, which is used for the Flash-only emulation, and that's written in Rust and running in WebAssembly. And so here are the supported environments again.
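As a rough way to picture that split, the mapping from environment to emulator can be written as a small lookup. This is only a sketch in Python: the names `Environment` and `EMULATOR_FOR_OS` are hypothetical, and the example browsers are the ones shown in the demo, not the full list of supported environments.

```python
from dataclasses import dataclass
from typing import Optional

# Which emulator backs each guest OS, as described in the talk.
EMULATOR_FOR_OS = {"macos": "basilisk2", "windows": "v86", "linux": "v86"}


@dataclass(frozen=True)
class Environment:
    browser: str
    os: Optional[str]  # None means no emulated OS: Flash only, via Ruffle

    @property
    def emulator(self) -> str:
        # Ruffle runs in the native browser, so it has no guest OS at all.
        return "ruffle" if self.os is None else EMULATOR_FOR_OS[self.os]


if __name__ == "__main__":
    for env in (Environment("Netscape 4", "macos"),
                Environment("IE 6", "windows"),
                Environment("Opera 12", "linux"),
                Environment("native browser", None)):
        print(f"{env.browser}: {env.emulator}")
```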
This is what we have so far. The disk images were created in each emulator. I used v86 via its web interface to prepare the disk image and then export it, and also to save the state so that it can be restored in
the emulator for faster loading. That was all possible through the web interface of v86. For Basilisk II I just used the desktop version and created disk images that way for each of the Mac browsers. You can see on the left that all of the Mac OS
environments use Basilisk, all the Windows and Linux environments use v86, and then Ruffle is sort of its own setup. I also wanted to cover networking. WebAssembly and JavaScript emulators are great for
video and audio, so that's very well supported, but the networking part, it turns out, is not quite there yet. Essentially we need a
JavaScript- or WebAssembly-based TCP/IP stack to support networking, in order to be able to terminate the TCP connection on the other end, and that is not currently provided in the emulators. Web archives need to be accessed by HTTP
over TCP/IP. v86 supports sending Ethernet frames over a WebSocket, and the JavaScript port of Basilisk II actually doesn't have any networking support, although Basilisk II itself does, and it's possible to get the Ethernet frames out. I wanted to avoid relying on external infrastructure as much as possible,
so not having a server that accepts the WebSocket connections. So we need to get from the Ethernet packet to the HTTP web archive somehow. Luckily there is existing work that I was able to build off of, and that's the
eaas-proxy created by the folks at the University of Freiburg who work on the Emulation as a Service project. They've done a lot of the hard work of actually compiling a TCP stack to WebAssembly
and providing wrappers for sockets that work with native JS streams. So it's a full TCP stack implemented in WebAssembly, and it can terminate the HTTP connection from the emulator in the user's browser,
so that the client in the emulated browser connects to a server that's running in WebAssembly. I'll show a little bit more of how that works in a second.
The networking setup in a nutshell is these two web workers: the emulator worker sends Ethernet frames over a BroadcastChannel to the picoTCP web worker, which processes them as TCP packets and then passes them to the HTTP server, which then proxies to the live web and sends the response back. So that's a very approximate explanation.
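To make the "Ethernet frame in, HTTP request out" step a little more concrete, here is a minimal sketch in Python of unwrapping one frame down to its TCP payload. It is not picoTCP or the eaas-proxy code: a real TCP/IP stack also handles ARP, the handshake, acknowledgements, reassembly, and checksums, all of which are skipped here. The function names and the 10.0.2.15 guest address are assumptions for illustration.

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def tcp_payload(frame: bytes) -> Optional[bytes]:
    """Return the TCP payload carried by an Ethernet II frame, or None."""
    if len(frame) < 14 + 20 + 20:  # Ethernet + minimal IPv4 + minimal TCP
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4          # IHL is counted in 32-bit words
    (total_length,) = struct.unpack_from("!H", ip, 2)
    if ip[9] != IPPROTO_TCP:
        return None
    tcp = ip[ip_header_len:total_length]
    tcp_header_len = (tcp[12] >> 4) * 4         # data offset, also in words
    return tcp[tcp_header_len:]


def make_frame(payload: bytes) -> bytes:
    """Build a bare, checksum-free frame for demonstration only."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0, 64,
                     IPPROTO_TCP, 0,
                     bytes([10, 0, 2, 15]),     # assumed guest address
                     bytes([10, 0, 2, 2]))      # proxy address from the talk
    tcp = struct.pack("!HHIIBBHHH", 1025, 80, 0, 0, 5 << 4, 0x18, 8192, 0, 0)
    return eth + ip + tcp + payload


if __name__ == "__main__":
    print(tcp_payload(make_frame(b"GET / HTTP/1.0\r\n\r\n")))
```

The return path is the mirror image: the HTTP response bytes are wrapped in TCP, then IP, then an Ethernet frame before being handed back to the emulator.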
One other thing I wanted to mention is how the changing of the date works. That works essentially by the home page in the browser being set to this IP address, which
basically then redirects to the actual URL that's set in the emulator configuration. The timestamp is also tracked in the HTTP server proxy, and when the user reloads, it uses the latest version of the timestamp and the URL.
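The home-page trick just described can be sketched as a tiny routing function. This is a guess at the shape of the logic in Python, not the project's actual code: `ReplayState` and `route` are hypothetical names, and 10.0.2.2 is the proxy address the talk names when walking through the diagram.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlsplit

HOME_HOST = "10.0.2.2"  # the emulated browser's home page points here


@dataclass
class ReplayState:
    url: str        # page currently selected in the outer UI
    timestamp: str  # 14-digit archive timestamp, e.g. "19961220000000"


def route(state: ReplayState, requested_url: str) -> Tuple[str, ...]:
    """Decide what the in-browser HTTP server does with one request."""
    if urlsplit(requested_url).hostname == HOME_HOST:
        # Home page hit: bounce the old browser to the configured URL.
        return ("redirect", state.url)
    # Anything else is fetched at whatever timestamp is current right now,
    # so changing the date and reloading picks up the new value.
    return ("fetch", state.timestamp, requested_url)


if __name__ == "__main__":
    state = ReplayState("http://geocities.com/", "19961220000000")
    print(route(state, "http://10.0.2.2/"))
    state.timestamp = "19990101000000"  # user changes the date, then reloads
    print(route(state, "http://geocities.com/"))
```

Keeping the timestamp in the proxy rather than in the guest means the old browser never needs to know it is looking at an archive.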
Okay, so here is a diagram that covers how the system works. The emulated browser on the left loads, let's say, geocities.com. The request goes to the HTTP proxy on the 10.0.2.2 IP,
which means a packet is sent over the BroadcastChannel to the web worker that's running the picoTCP TCP/IP stack.
Within that web worker it gets processed as a TCP/IP packet and goes to the HTTP server, which receives the packet, reads the HTTP request from it, gets the URL,
and sends the request to the Wayback Machine or the live website. Then the same thing happens all the way back: it receives a response and sends the TCP packet through the picoTCP stack, which then becomes an Ethernet frame that is sent to the
emulator in the other web worker, which receives the packet and sends it back to the browser. So there's a lot of moving parts here, and we saw that occasionally things don't quite work, there's a network error. There are a lot of places where things can go wrong, let's just put it that way, so there's
definitely room for improvement in this system to make it more robust. And the overall flow of this is this zoomed-out view of the architecture. What happens in the browser is we have the inputs for the URL, the date, and the toggle of
whether we're loading the live site or a web archive, and that gets connected to the emulator running in the web worker, which also separately controls the canvas, so there's a separate connection there back to the main browser thread that does the rendering.
It also communicates with the TCP/IP stack in a separate web worker. The emulator also pulls disk images from remote storage, and that's the emulator disk images and saved states. One other part is that we need a CORS proxy, because
the web archives can't be accessed in the browser from other sites. So essentially there's a separate proxy that's required to load external websites and the web archives, and that goes through Cloudflare currently.
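Two small steps from this walkthrough lend themselves to a sketch: reading the URL out of the old browser's HTTP request, and routing the outbound fetch through a CORS proxy. The Python below is illustrative only. The function names are hypothetical, the proxy prefix `https://cors-proxy.example/proxy/` is a placeholder, and simple prefixing is an assumption about how such a proxy might be addressed, not a description of the deployed Cloudflare setup.

```python
from typing import Optional


def requested_url(raw_request: bytes) -> Optional[str]:
    """Extract the absolute URL from a raw HTTP/1.x request, if possible."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None
    _method, target, _version = parts
    if target.startswith("http://"):
        return target  # proxy-style request line with an absolute URL
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return f"http://{value.strip()}{target}"
    return None


def via_cors_proxy(upstream_url: str, proxy_prefix: str) -> str:
    """Prefix the upstream URL so the fetch goes through the CORS proxy."""
    return proxy_prefix + upstream_url


if __name__ == "__main__":
    raw = b"GET /index.html HTTP/1.0\r\nHost: geocities.com\r\n\r\n"
    url = requested_url(raw)
    if url is not None:
        print(via_cors_proxy(url, "https://cors-proxy.example/proxy/"))
```

In the archive case, the upstream URL passed to the proxy would be the timestamped replay URL rather than the live one.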
And that's a very brief overview of the architecture. So what about Flash, and Java as well? As we saw, it is possible to run both Flash and Java, because we have these browsers that have Flash and Java, and Shockwave
even, installed. Many Flash projects do work. We can support the latest Flash, 32. But there are still some limitations. One is that we only have older browsers: it's just Firefox 10 and Opera 12,
and that may not be sufficient for projects that were built, say, within the last couple of years. And many Flash and Java projects were just not fully archived. It's very common to find web pages that have Flash and Java embedded, but some of the SWFs and JAR files are just missing because they weren't crawled
initially. And then there are performance limitations: as you saw, it can obviously be slow loading Flash and Java occasionally, so it's not as fast as Ruffle. And that brings us to the Ruffle emulator, which is also supported in OldWeb.today.
The web page loads directly in the native browser and the emulator is only used for the Flash. Obviously it's much faster, since there's no full OS emulation and no need for custom networking. However, Ruffle is also still in development, there's not yet full support for Flash, and there may be subtle differences from
the official Adobe Flash that's running in the full OS emulation. And of course you don't get the retro browser, but you do get performance, if you're looking for just Flash specifically.
So I just wanted to cover in this matrix some of the trade-offs between different ways of supporting Flash in web archives. There's Ruffle, as I just mentioned, which doesn't yet support all Flash, but can run pretty well, with full-fidelity speed, video, and audio,
and it doesn't require any server-side infrastructure, so there's no CPU cost per user. Then there's the OldWeb.today JavaScript emulation system. It can support all Flash up to Flash 32, but as we saw, there's not quite the full performance yet,
and there's also no cost per user, as it's all client-side. And then finally there's the remote browser approach, which is the oldest approach that we've tried. I didn't cover it this time, as there isn't quite enough time for it. It can support all Flash and it can run at full fidelity,
since it's running on the server, but there's also a CPU cost per user, as you have to run a new instance of the container for each user. OldWeb.today covers the first two,
Ruffle and the OldWeb.today JavaScript emulation. We have a separate system, the pywb remote browsers, which I'll link to, that covers this other approach. And I just want to end with a few links. There's a blog post about OldWeb.today and a link to a few of the
repos, including the main OldWeb.today repo as well as the forks of some of the emulators that I've used. And then there are other Webrecorder tools that also work with Flash, such as the Docker container-based system, and we have an extension
that injects Ruffle and allows for archiving with Flash. And some future work: obviously optimization, adding new environments, making networking more robust, improving rewriting of HTTPS URLs for old browsers, which isn't quite always working,
supporting additional archives besides the Internet Archive that also support the Wayback Machine URL interface, and perhaps support for uploading archives directly and integrating with other Webrecorder tools. So thank you for listening, and I'd be happy to answer any questions in the time remaining.