This is a transcript of What's Up With That Episode 9, a 2023 video discussion between Sharon and Charlie.
The transcript was automatically generated by speech-to-text software. It may contain minor errors.
Site Isolation is a major part of Chrome's security. What exactly is it? How does it fit into navigation? What about security? Today's special guest telling us all about it is Charlie, who made it happen. He's also worked all over navigation, making sure it works with all its complexities and remains secure.
0:00 SHARON: Hello, and welcome to "What's Up With That?", the series that demystifies all things Chrome. I'm your host, Sharon, and today we're talking about site isolation. What exactly is it? How does it fit into navigation? What about security? Today's special guest telling us all about it is Charlie. He helped make site isolation happen. He's worked on Chrome since before the launch, though as an intern, and since then, he has worked all over navigation, including things like the process model, site isolation, and just making sure that changes to that are all secure and that things still work. So welcome, Charlie.
0:30 CHARLIE: Thank you for having me.
0:30 SHARON: OK, let's start off with what is site isolation?
0:36 CHARLIE: So site isolation is a way to use Chrome's sandbox to try to protect websites from each other. So it's a way to improve the browser security model.
0:43 SHARON: OK, we like security. And can you tell us a bit about what a sandbox is?
0:50 CHARLIE: Yeah. So the sandbox is a mechanism that tries to keep web pages contained within the renderer process even if something goes wrong. So if an attacker finds a bug to exploit, it should still be hard for them to get out and install malware on your computer or do things outside the renderer process.
1:05 SHARON: OK. Last video, we talked all about the different types of processes and what they all do. So why are we particularly concerned about renderer processes in this case?
1:17 CHARLIE: Sure. So renderer processes really have the most attack surface. A browser's job is to go out and get web pages from websites you don't necessarily trust, pull down code, and run that on your machine. And most of that code is running within this sandboxed renderer process. So an attacker may be able to run code in there and try and find bugs to exploit. The renderer process is where most of those bugs are going to be. It's where the attacker has the most options and direct control. So we want that to be locked down as much as possible.
1:55 SHARON: OK. Right. So how exactly does this work? How am I getting attacked?
2:02 CHARLIE: Right. So all software tends to have bugs, and an attacker will try to find ways to exercise those bugs in the code to let them accomplish their goals. So maybe they find that there's some parsing error, and so the code in the web browser does the wrong thing when you give it some input. And for an attacker on the web, that input could be something in HTML or JavaScript that makes the browser do something wrong, and maybe they can use that to their advantage.
2:36 SHARON: So say I do get attacked. What's the worst that can happen? Should I really be concerned about this?
2:42 CHARLIE: Well, that's exactly what we think about in the browser security model: what's the worst that can happen? How can we make that not be as bad as it could be? So in the old days when browsers were first introduced, a browser was basically just a program, all one process. And it would fetch content from the web, and so if something went wrong, there was no sandbox. There was no other protection. You were just relying on there not being bugs in the browser. But if something did go wrong, that web page could then install malware on your computer and your whole machine would be compromised. And so that might give them access to files on your disk or other things that you have access to on the network, like your bank account and so on, which, obviously, is a big deal.
3:28 SHARON: Right. Yeah, I would like to not have other people have that. OK, cool. So can you tell us a bit about how site isolation actually works? What is the mechanism behind it? What is going on?
3:41 CHARLIE: Sure. So when Chrome launched, we were using the sandbox to try and prevent that first type of attack of installing malware on your machine or having access to the file system or to the network, but we wanted it to do more to protect websites from each other. And to do that, you have to treat each renderer process like it can only load pages from one website. And if you go to visit a different website, that should be in a different process. And so there's a bunch of aspects of site isolation for, well, OK, as you go from one website to another, we need to use a different process, but the big one that made this such a large change to the browser was making cross-site iframes run in a different process.
4:30 SHARON: What is an iframe?
4:30 CHARLIE: So an iframe is basically a web page embedded inside of another web page. So you can think about this as an ad or a YouTube video. It might be from a different origin from the top-level page that you're viewing, but it's another web page embedded inside it. And so it has a different security context that it's running in.
4:54 SHARON: You mentioned it might be from a different origin, and it might be useful to know what the difference between a site and an origin is, especially as it relates to what we call site isolation.
5:00 CHARLIE: Yeah, so we're being specific in using the word site isolation instead of origin isolation. A site is a little broader, so it's a registered domain name plus a scheme, so https://example.com would be an example of a site, but you might have many origins within that as you get into subdomains. So if you had foo.example.com and bar.example.com, those would be different origins within the example.com site. The web security model is all about origins. Those foo.example.com and bar.example.com shouldn't be able to access each other, but there are some old web APIs that have stuck with us, like being able to modify something called document.domain, where two different origins in the same site can sometimes access and modify each other, and we don't know in advance if they're going to do this. So therefore, we have to put everything from a site in the same process because we can't move things from one process to another later. We hope that someday we can get rid of that. There is some work in progress for that to go away. Then maybe we can isolate origins.
6:10 SHARON: Cool. So the site isolation stuff is all in the browser, so that's the browser security model. What's the difference between that and the web security model? Are these the same?
6:16 CHARLIE: They're certainly related to each other, but they're a little different. So the web security model is conceptually what web pages can do, in general: what are they allowed to access from another website or another origin, or from things on your machine, camera, and microphone, and things like that. And the browser security model is more about how we build that and how we enforce the web security model, but also provide some extra lines of defense in case things go wrong. So that incorporates things like the sandbox, the multi-process architecture, site isolation. What can we do to make it harder for attackers to accomplish their goals, even if there are bugs?
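The site-versus-origin distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `PUBLIC_SUFFIXES` is a tiny hypothetical stand-in for the real Public Suffix List, and `origin_of` and `site_of` are made-up helper names, not anything from Chrome.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List. A real browser
# consults the full list to find the registered domain.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Origin = (scheme, full host, port)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def site_of(url):
    """Site = (scheme, registered domain), ignoring subdomains and port."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Try the longest candidate suffix first, then keep one extra label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            registered = ".".join(labels[max(i - 1, 0):])
            return (parts.scheme, registered)
    # Unknown suffix: fall back to the whole host (assumption for this sketch).
    return (parts.scheme, parts.hostname)


foo = "https://foo.example.com/page"
bar = "https://bar.example.com/other"
same_site = site_of(foo) == site_of(bar)          # same site
same_origin = origin_of(foo) == origin_of(bar)    # different origins
```

Under these assumptions, foo.example.com and bar.example.com compare equal as sites but not as origins, which is why they end up sharing a process under site isolation.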
7:04 SHARON: It seems like good stuff to have. So a couple other, maybe definitions to get through. So what is a security context?
7:10 CHARLIE: Yeah. So that's the environment where this code is running. In the web, it's something like an HTML document or a worker, like a service worker, someplace where code is running from what we would call a security principal, which is, for the web, something like an origin. So if you have an HTML document you've gotten from example.com, that's running in a web page in the browser that has a security context. And an ad from a different origin would be a different security context.
7:49 SHARON: Are a security context and a security principal always the same, or are there times where those are different?
7:55 CHARLIE: No, you can have two different security contexts, like two different documents, that have the same security principal, and they might be able to access each other. Or they might be living in different processes, but still have access to the same cookies or local storage, things on disk. So the principal is the entity that has access to something.
8:16 SHARON: When people think of site isolation, often, they think about navigation as well, partly because that's how our teams are structured, so how exactly do these relate, and where in the life of a navigation - name of a talk, want to go watch - does site isolation stuff happen?
8:34 CHARLIE: Yeah, so they're definitely related. So navigation is about how you get from one web page to another, and that might be a different security context, different security principal. And I got interested and involved with navigation because of site isolation, my interest in that. And as you think of the web browser as an operating system for running programs, it's how you're getting from one program to another. So it would make sense that as you go from one website to another, you get a new container for that, a new process. So that was one part of how I got involved with navigation: building what we call a cross-process navigation. So you have to start in one renderer process and then be able to end up in a different renderer process with all the various parts of the life of a navigation, where you go out to the network and ask for the web page. And maybe you have to run some beforeunload events first to see if you were actually allowed to leave, or maybe the user has some unsaved data. All the timing of that is tricky, and then you switch to the new process at the right time. So navigation has a lot of different corner cases and complexity that then get involved with the process model so that you can do this in any type of navigation, in any frame. And so that's where our team ends up involved in both site isolation work and the navigation code in the browser.
10:06 SHARON: Right. What a cool team. So you mentioned the process model, and that is related, but not the same as the multi-process architecture. So let's just quickly mention what the differences there are, because in this case, it is important.
10:22 CHARLIE: Yes. So the process model for the browser is how we decide what goes into each process, and specifically, we're talking about renderer processes and web pages here, where we can decide, as we create new tabs and visit websites in those tabs, which renderer processes we are going to use. So without site isolation, maybe it's that each newly created tab gets its own process. But anything you visit within a given tab stays in the same process. Or maybe you can do some cross-process transitions within that tab as long as you're not breaking scripting between existing pages. So site isolation defines a process model that says you can never put web pages from two different websites in the same renderer process, and then that provides a bunch of constraints for how navigation works.
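As a rough illustration of the rule that pages from two different websites never share a renderer process, here is a toy Python sketch. The class and method names are hypothetical, and a real browser weighs more than the site when picking a process; this only shows the core constraint.

```python
from itertools import count


class SitePerProcessModel:
    """Toy process model: one renderer process per site (hypothetical names)."""

    def __init__(self):
        self._next_id = count(1)
        self._process_for_site = {}

    def process_for(self, site):
        # Reuse the process already dedicated to this site, or create one.
        if site not in self._process_for_site:
            self._process_for_site[site] = next(self._next_id)
        return self._process_for_site[site]

    def place_frames(self, frames):
        # frames: iterable of (frame_name, site) pairs for a single page.
        return {name: self.process_for(site) for name, site in frames}


model = SitePerProcessModel()
placement = model.place_frames([
    ("main", "https://news.example"),
    ("ad-iframe", "https://ads.example"),         # cross-site iframe
    ("comments-iframe", "https://news.example"),  # same-site iframe
])
```

In this sketch the cross-site ad iframe lands in its own process, while the same-site comments iframe shares the main frame's process.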
11:16 SHARON: And then the multi-process architecture is more just the fact that we have these different processes.
11:22 CHARLIE: Right. It makes this possible, because it gives us this ability to run browser code and renderer code separately and plug-in code and other utilities and network service that - yeah.
11:27 SHARON: Yeah, because back in the day, that wasn't the case. That's what made Chrome different.
11:34 CHARLIE: Right. So when Chrome launched, we were moving from this more monolithic browser architecture that was common at the time, where everything ran in one process, to a separate browser process and a renderer process that was sandboxed, and we could play around with different process models. So when Chrome launched, part of the internship that I was doing was looking at what should go in each renderer process. What process model should we use? And we thought site isolation would be great, but you can't really do that yet. It's too complicated to get the iframe things to work. So maybe we can do a hybrid where sometimes we swap to a new renderer process as you go from one website to another at the top level, but then other times, you'll end up with multiple sites in the same process. And it was like that until we were able to ship site isolation much later.
12:23 SHARON: Cool. So this sounds, conceptually, like it makes sense. You want to have different sites/different origins in different renderer processes, and it sounds like it shouldn't be that hard, but it is/was/still is very hard. So can you briefly just tell us about how and why navigation is hard? Because other people who don't work on browsers at all or tech or even people in Chrome, I feel like, they're just like, isn't navigation just done? This just works, right? So why is there still a team doing this, and what is so hard about it?
12:59 CHARLIE: That was often the most common question we would get when we were explaining what work we were doing on site isolation: oh, doesn't it already work that way? And it's like, yeah, I wish. Yeah, so there are two parts to that. There is, why is navigation hard, and why is site isolation hard? So tying into any kind of navigation thing is tricky because of how many different types of navigation and corner cases there are. As you're going from one page to another, is it redirecting to a different website, or does it end up not actually giving you a web page back? Maybe it's a download. Is it not moving to a new document at all and it's just a navigation within the same document, which has different properties? There are a lot of things that we need to keep track of in the navigation system and how it affects the back-forward history that make it tricky. And then it continues to get more complicated over time, as we add new fancy features to the browser. So there are lots of things that we've layered on top of that with back-forward cache and pre-rendering and new navigation APIs for interacting with session history, which make things faster and nicer for web developers, but also provide even more ways that navigation can get into interesting corner cases, like why didn't we think about pre-rendering a page with a sandboxed iframe that might cause a different path to happen? So that's where a lot of the complexity in navigation comes from and why there are ongoing challenges, even though it's something that seems like it has worked from the beginning. Site isolation being hard is related to the fact that you can navigate in any frame in a page, and iframes being embedded is something that we used to just handle entirely within the renderer process.
So this is a fun way to think about the multi-process architectures that shipped around when Chrome was launched and then other browsers that did similar things was we could take the rendering engines that had existed already for a decade or so from existing browsers and just run multiple copies of them. So as you open up a new tab, we’ve got another copy of WebKit, which is the rendering engine we were using at the time, and we had to make changes to make it work in the renderer process talking to the browser process, but we didn't really need to change fundamentally how it rendered a web page. And so it was in charge of deciding what network requests it was going to make for getting iframe content and then rendering the iframe and where a click was going to go, that kind of thing. And to do out-of-process iframes, you need the iframe inside the page to be rendered in an entirely separate renderer process. And that is a big change to how the rendering engine works. And so that was what took all the time and what made site isolation a multi-year project, where we had to fundamentally introduce these new data structures, like render frame host and representations of each frame in the browser process, change how the rendering engine worked, and then change all the features in the browser that assumed the renderer would take care of this. And now, we need to handle them spread across multiple processes.
16:28 SHARON: How did that fit in with the forking of WebKit into Blink, which is what the rendering engine in Chrome is now?
16:34 CHARLIE: Yeah, so the fork was absolutely necessary to do this. We pretty much had to wait until that happened, because we didn't have as much flexibility to make large, architectural changes to WebKit as we were sharing it with other browsers, like Safari and so on. We were looking into ways that we might be able to approximate what we want, but as the decision to fork WebKit into Blink was made, it opened the door and gave us a chance to say, we can do this now. Let's go ahead and dive in and make site isolation happen.
17:14 SHARON: That makes sense. In a quite early talk, it was probably from 10 years ago now, Darin gave a talk, and he was saying how having per site, having each renderer have just one site in it was like the Holy Grail, and he seemed very excited about it. So that makes sense because of the -
17:34 CHARLIE: Yeah, and it feels like the natural use of a sandbox in a browser. The reason that we got all these questions, like isn't that how it already works, is that it's such a natural fit. We have a container for running a web page, so what is the unit that you want to put in the container? It's a website that you're visiting. And the fact that we couldn't easily pull them apart into different processes was totally an artifact of how web browsers were originally built that didn't foresee this - oh, they're being used as complicated programs with different security principals.
18:13 SHARON: Yeah, in a different talk, John from Episode 3 content had mentioned that site isolation was basically the biggest change to Chrome since it launched and probably is still the case. So yeah, it was a project.
18:29 CHARLIE: Yeah, it was a long project, and we had a lot of help from many people across the Chrome team, but it was cool to get to this outcome, where we could then say, now we have processes that are locked to a single security principal, so it's nice to get to that outcome.
18:47 SHARON: So for people on the Chrome team now, what do you wish they knew about site isolation/navigation in terms of as an engineer? Because before, I was on a different team, and someone on my team said, oh, you should know how navigation works. And I said, yeah, that sounds like a great idea, but how? So what are things that people should just keep in mind when they're out and about doing their stuff that usually isn't directly interacting with navigation even?
19:14 CHARLIE: Right. Yeah, so I think that the biggest thing to keep in mind is to limit what we put into a renderer process or what a renderer process has access to, to not include cross-site data. And we already have to have this mentality in Chrome that we don't trust the renderer process. If it sends an IPC or Mojo call to the browser process, we should assume that it might be lying or asking for things that it shouldn't have access to. And I think it's in the back of a lot of people's heads already that, OK, I shouldn't let it go get a file from disk, but also, we don't want it to mix data from different sites. It shouldn't be able to lie and say, oh, I'm origin x, please give me data from there. Because that's often how APIs used to work in Chrome: the renderer process would say what origin it's asking for, and please give me the cookie for that.
20:12 SHARON: That sounds bananas.
20:12 CHARLIE: Yeah. Now, it sounds crazy. And so we think that the browser process should already know, based on who's asking, what they have access to. So in order to avoid site isolation bypasses, that's really what developers should keep in mind. So for features like Autofill or something, it's easy to think, oh, it would be nice for me to just have that data on hand in the renderer process and I can just put it in when it's needed. No, you should keep it out of the renderer, and then only provide the data that's needed.
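The idea that the browser process decides access based on who is asking, rather than on what the renderer claims, can be sketched as follows. Everything here is hypothetical (the class, the method names, the in-memory cookie store); it is a toy model of the principle, not Chrome's actual enforcement code.

```python
class BrowserProcess:
    """Toy browser process that locks each renderer to one site."""

    def __init__(self):
        self._site_lock = {}  # renderer id -> site it is locked to
        self._cookies = {}    # site -> list of cookie strings

    def lock_renderer(self, renderer_id, site):
        # A renderer is locked once and can never be re-locked to another site.
        current = self._site_lock.get(renderer_id)
        if current is not None and current != site:
            raise ValueError("renderer is already locked to another site")
        self._site_lock[renderer_id] = site

    def store_cookie(self, site, cookie):
        self._cookies.setdefault(site, []).append(cookie)

    def get_cookies(self, renderer_id):
        # Note what is missing: the renderer does not pass an origin or site.
        # The browser derives it from the lock, so a compromised renderer
        # has nothing to lie about.
        site = self._site_lock.get(renderer_id)
        if site is None:
            raise PermissionError("unknown or unlocked renderer")
        return list(self._cookies.get(site, []))


browser = BrowserProcess()
browser.lock_renderer(1, "https://bank.example")
browser.lock_renderer(2, "https://evil.example")
browser.store_cookie("https://bank.example", "session=secret")
```

With this shape of API, renderer 2 can only ever receive cookies for the site it is locked to, no matter what messages it sends.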
20:51 SHARON: In security-discuss circles, another term you hear often is a renderer escape or renderer bypass or whatever. Is that the same as a site isolation bypass, or are those different?
21:00 CHARLIE: Yeah, so sandbox escape is a common term that is used for when an attacker has found some bug already, and then they are able to escalate their privilege to affect the browser process or get out of the browser process and to the operating system. So a sandbox escape is a lot worse than a site isolation bypass. It would give the attacker control of your computer and let them install malware and things. So for sandbox escapes, we want to have as many boundaries as possible to try to prevent that from happening. A site isolation bypass is not as bad as a full sandbox escape, but it would be a way that an attacker could get access to another website's data or attack that website. So maybe it's able to trick the browser into giving it cookies from that site or using the permissions that have been granted to another website. And then renderer compromise would be another type of exploit that happens entirely within the renderer process. That's one where the attacker has found some bug and can run whatever native code they want within the renderer process, and that's what we're trying to contain with the sandbox and what site isolation tries to make even less useful to the attacker. Because even if you can run any code you want within the renderer process, you shouldn't be able to install malware because of the sandbox, and you shouldn't be able to access other sites' data because of site isolation.
22:47 SHARON: Yeah, I think when I was learning about site isolation and stuff, I was like, whoa, there is a lot going on, and most people just have no idea about it. And in terms of other bugs and whatnot, something that is often mentioned is Spectre, and that still affects things. And the only mention is on Wikipedia: in the Mitigation section of Spectre, they mention site isolation, but I was like, this should have its own page, so maybe one day -
23:20 CHARLIE: Maybe one day.
23:20 SHARON: one of us is going to write a thing about that. But yeah, that's kind of the bug, right? So can you just talk about that?
23:25 CHARLIE: Yeah, so Spectre and Meltdown were certainly a big change to the security landscape for browsers. At a high level, those are attacks that are based on the micro-architectural parts of the CPU. The way that the basic CPU hardware works, there are ways to leak data that weren't anticipated. And we can view it as giving attackers what we call an arbitrary read primitive, something that can access anything in your address space in a process. You can think about it as the CPU wants to not stop and wait for going and accessing data from RAM, so it thinks, well, I'll just guess what the answer is going to be and then keep running some instructions. And if I was right in my guess, the next several steps are done already, and I can just move on from there. And if I was wrong, well, I just throw away that work, and I do the right thing, and we move on, and everybody is fine. But attackers found that while you're doing those extra steps ahead of time, you're also affecting the caches on the CPU, and cache timing attacks let you find out what work was done there. So some very clever researchers found that you can do some things in those extra steps that happen in this speculative state to find out what data is in addresses you don't have access to. And so in places where we thought some check in the renderer process could say, oh, you don't have access to this thing from another website, we're fine, now you could get access to it, just based on how CPUs work, without needing any bugs in the browser. So now, we're thinking, OK, we're running JavaScript, and if it can leak things from the renderer process, we can't have data worth stealing in the renderer process. You could try to find ways to prevent those attacks, but those ended up being difficult. And ultimately, we found that it wasn't really feasible to prevent the attacks in all the forms that they could happen.
So site isolation became the first line of defense to say, data from other websites, data worth stealing, should not be in the renderer process where a Spectre attack could get access to it. Now, that was actually one of the big, exciting events that helped us accelerate the work on site isolation and get it launched when that was discovered in 2017 or 2018.
26:24 SHARON: So at that point, site isolation was mostly done, and it was just getting it out?
26:24 CHARLIE: Yeah, it was really interesting. So we'd been working on it for several years for a different reason: we wanted it to be a second line of defense against compromised renderer processes. We assume people are going to find bugs in the renderer process, in V8 or in Blink or things like that, and we wanted that to not be as big of a problem. We wanted to say, OK, whatever. There isn't data worth stealing in that process. We had already shipped some initial uses of out-of-process iframes in 2017 for extensions, and we were working on trying to do some sort of initial steps towards using site isolation for some websites and see how that goes when we found out about Spectre and Meltdown. And so that next six months or so was very accelerated: OK, we've got to get everything else working with the way that site isolation interacted with DevTools and extensions and printing and a bunch of other features in the browser that we needed to get working. And so it was an interesting accelerated rollout, where we even had an optional mode and an enterprise policy where you could say, I don't care if printing doesn't work, turn on site isolation so that Spectre attacks won't find other data worth stealing in the process. And then we got to where it was working well enough that we could ship it for all desktop users in, I think it was Chrome 67 in mid 2018. So it was good that we were far enough along that we were able to ship the full thing within a few months.
28:19 SHARON: Very cool. Yeah, I mean, those are all the things that make navigation hard, like extensions as part of it, and there are just all these things, and all of these go through navigation and are affected, so that's very exciting. So what is the state of site isolation now, and are there still going to be changes? That was a few years ago, so are things still happening?
28:45 CHARLIE: Yeah, we're still trying to make several different improvements. We've made several improvements since the launch. That initial launch, since it was mostly focused on Spectre, didn't have all the defenses we wanted against compromised renderer processes, because the Spectre attack can't affect actual running code. It can't go and lie to the browser process. It won't give you full control over what's running in the renderer process, but it can leak all the data that's in there. So anything that a web page can pull into a renderer process can be leaked. So after that initial launch, we needed to go and actually finish the compromised renderer defenses and say, OK, all the IPCs that come out of the renderer, make sure they can't lie and steal someone else's data, so get all the browser process enforcements in place. Another big thing after that was getting it to work on Android, where we wanted this defense. We have a much different set of resource constraints on mobile devices, where there's not nearly as much memory and renderer processes are often killed or just discarded. So there, we couldn't isolate all websites from each other. We had to use heuristics to say, here are the sites that need it the most, so sites where users log in, in general, or sites where this particular user is logged in, or other signals that this site probably needs some protection. We'll give those isolation, and then other ones can share a renderer process. So we've tried to improve those heuristics and isolate as many sites as we can there. And then things that we weren't initially isolating from each other, we have been able to. So extensions was an example where we started by just making sure extensions didn't share a process with web pages, but now, we make sure that no extensions can share a process with each other.
And we're trying to get to where we could isolate all origins from each other, depending on what resources are available, but there are some changes with, basically, deprecating document.domain that are in flight that might make that possible.
30:57 SHARON: So say I have a fancy computer, and I just want maximum site isolation because I care about security. How do I go get that?
31:03 CHARLIE: Yeah, so there are some experimental ways to do that. You can go into the chrome://flags page, where you can turn on and off different features and experiments that are in progress. And there's one there called strict origin isolation, which will ensure that all origins within various sites are isolated from each other, and that works on desktop and Android. It'll just create slightly more processes than we do today. Similarly, on Android, if you wanted to isolate all sites, there is an option for full site isolation there called site-per-process. You could use that or strict origin isolation to get maximum site isolation today.
31:51 SHARON: So another platform that Chrome does exist on is iOS. So can we do anything there? Why is that not in [INAUDIBLE]
31:58 CHARLIE: So Chrome for iOS has to use Apple's WebKit rendering engine today, and current versions don't have site isolation, and we don't have the ability to run our own rendering engine that has support for it. So we don't have it today, but my understanding is that WebKit is working on site isolation as well, and actually, Firefox has also shipped their version of site isolation, so it's pretty cool to see other browser vendors building this as well. And so if that were made available to other third-party browsers on iOS, then maybe it could be used there. But at the moment, we're constrained, and we can't ship it on that platform.
32:47 SHARON: In terms of how the internet happens, this seems like a good thing to just have generally. So is it possible that this could be a spec one day that any browser should implement, or is it - because it's under the hood and it's not something that's maybe necessarily visible to websites, maybe that's not part of it, but is this an option?
33:04 CHARLIE: Yeah. I think it ties back to the earlier question about web security model versus browser security model. As for the web-visible parts of this, it's meant to be transparent to the websites. There are no behavior changes to the web platform by turning on site isolation. There aren't meant to be. And so it's not really a spec-visible thing, it's more part of the browser's architecture, the same way that there's no spec for sandboxes in a browser. You could build a browser that doesn't have a sandbox, but today, the best practice is to have better security by having a sandbox. So I think the relevant thing for web specs is just that we don't introduce APIs that don't work when different origins are in different processes. And that sounds like, well OK, that makes sense, and thankfully, we were sort of in that state to begin with, and in some places we got lucky. Like postMessage, which is a mechanism for sending a message to another origin, is asynchronous, so the two origins don't need to run in the same process because that message will be delivered at a later time. So we can send it to a different process running on a different thread. Some places we got unlucky, like document.domain, where web APIs said that different origins can script each other if they agree that it's OK, as long as they're in the same site, and that constrained us in the process model. So we're trying to improve things about the web spec. You could almost say that deprecating document.domain is a way of seeing the browser security model and the web security model align with each other to say, OK, we want to use processes. We want this asynchronous boundary. You shouldn't be able to script other origins from the same site. So I think that's the closest: making sure that specced APIs fit well with this multi-process site isolation world.
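To see why an asynchronous API like postMessage fits a multi-process design, here is a toy Python model of message passing between two frames. The `Frame` class and its methods are hypothetical and only mimic the shape of the idea: the sender queues a message and returns immediately, and the receiver handles it later on its own turn.

```python
from collections import deque


class Frame:
    """Toy frame with an inbox; stands in for a document in some process."""

    def __init__(self, origin):
        self.origin = origin
        self.inbox = deque()
        self.received = []

    def post_message(self, target, data):
        # Returns immediately and yields no result from the target, so the
        # target could just as well live in another process behind an IPC pipe.
        target.inbox.append((self.origin, data))

    def run_pending(self):
        # The receiver drains its inbox later, on its own schedule.
        while self.inbox:
            source_origin, data = self.inbox.popleft()
            self.received.append((source_origin, data))


page = Frame("https://news.example")
widget = Frame("https://widget.example")

page.post_message(widget, "hello")
delivered_immediately = len(widget.received)  # nothing has been handled yet
widget.run_pending()
```

Because the sender never waits on the receiver, nothing in this exchange requires the two frames to share memory or a thread. A synchronous API, where one origin directly scripts another, offers no such seam.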
35:12 SHARON: There are some headers and tags and whatever that websites can use to alter how the browser handles things though, right?
35:23 CHARLIE: Yes, absolutely. And those are both good ways that websites can more effectively isolate themselves, in general, both from web visible behavior and from the browser's architecture, and ways that, in browsers that don't have full site isolation, that don't have out-of-process iframes in all cases, web pages might still be able to get some of the isolation benefits using those APIs. And so those are things like cross-origin opener policy, which says, for example, if I open a pop up to a different website, there's not going to be any communication between me and that pop up. So it's OK to put them in different processes, and they can be better isolated from each other. That's good from an architecture perspective. It's also nice from a web perspective in that you don't have to worry about is the window.opener variable in the pop up able to be used to do sneaky things to the page that opened it. So there's nice, web-visible reasons to use something like a cross-origin opener policy to keep them protected from each other. So that's one example of that. There's others as well.
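As a rough illustration of a site opting in, here is a minimal Python WSGI sketch that sends the `Cross-Origin-Opener-Policy: same-origin` response header. The header name and value are part of the web platform; the `app` function and the page body are hypothetical and only meant to show where the header goes.

```python
def app(environ, start_response):
    """Tiny WSGI app whose responses ask the browser to sever cross-origin opener links."""
    body = b"<!doctype html><title>demo</title><p>Isolated page</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Cross-origin windows that open this page, or that it opens,
        # no longer get a usable window.opener reference to it.
        ("Cross-Origin-Opener-Policy", "same-origin"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, one option is the standard library server: `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`.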
36:46 SHARON: Something I've seen around that is a web spec is content security policy. Is that related to any of this at all?
36:52 CHARLIE: It kind of is. Yeah, so content security policy is another way for websites to tell the browser better ways to secure that site. And so some of it is useful for saying I want to do a better job preventing cross-site scripting attacks on my page, so don't run a script if you find it in these random places. It should only come from these URLs or in these contexts on my page. So that's more about what happens in a given renderer process, but there are some places where content security policy does overlap a bit with site isolation. There is a sandbox value you can put into a content security policy header that makes it get treated like a sandboxed iframe. And while we don't yet have support for putting sandboxed iframes in another process, that is work that's in progress and we're hoping to ship before long. And so CSP headers that say sandbox will also be able to be isolated from the rest of their site. So if they have some kind of untrustworthy content in them, that won't be able to attack the rest of the site.
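For a concrete picture of the two uses mentioned here, the following Python sketch assembles a `Content-Security-Policy` header value that both restricts script sources and includes the `sandbox` directive. `script-src` and `sandbox` are real CSP directives; the `build_csp` helper and the `https://static.example.com` URL are hypothetical.

```python
def build_csp(directives):
    """Join a mapping of directive name -> list of values into a CSP header value."""
    parts = []
    for name, values in directives.items():
        # A directive with no values (such as a bare "sandbox") is just its name.
        parts.append(" ".join([name, *values]))
    return "; ".join(parts)


policy = build_csp({
    # Only run scripts from this origin and one assumed static host.
    "script-src": ["'self'", "https://static.example.com"],
    # Treat the document as if it were loaded in a sandboxed iframe.
    "sandbox": [],
})
```

The resulting string would be sent as the value of the `Content-Security-Policy` response header.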
38:04 SHARON: OK. Yeah, so it's that difference between the web versus browser, what's visible, what's an option versus how it's actually implemented.
38:11 CHARLIE: Right.
38:11 SHARON: Cool. So a lot of this, we've talked about security a lot, and I think for people who don't know about security, the image you have is people trying to break into - like "I'm in," that whole thing, and that's very much not what's going on here, because we're not trying to break things. So can you tell us just a bit about the difference between offensive and defensive security and how this is one of those?
38:38 CHARLIE: Yeah, so a lot of attention in the security space goes to big, exciting, flashy attacks that are found. On the offensive side, "look, I found a way to break the security of this thing," and we have big vulnerability reward bounties to reward when people find these things so we can get them fixed. So even on the defensive side, you want people working on offensive security, looking for these bugs, looking for things that need to be fixed so we can defend users. But the defensive side is super important and I find it a satisfying place to be, even if it isn't always as glamorous. It's like, you have to have all the defenses in place and all of these different attacks that are found, it's like, yeah, we need to fix them, and we need to find ways to make that less likely. But ultimately, this is the real goal, is we want to have systems that we can trust, that are safe to use, and that we can go and visit untrustworthy web content and not have to worry about it. You need these extra lines of defense. You need all these different ways of defending the product and shipping security fixes fast, all the things that security works on in a defensive sense so that people can use these systems and depend on them in their lives. So that's the fun and fulfilling part of this, even if it isn't quite as glamorous as "I found a sandbox escape," but those are fun to look at too.
40:17 SHARON: I heard security described as a bunch of layers of Swiss cheese. So you have all these different layers of mitigations to try to keep bad things from happening, but each of them is not perfect. And if the holes in those layers line up, then that's where you get a vulnerability. So in this very approximate metaphor, what are the neighboring slices of cheese to site isolation? What other defensive things are related to this and are trying to achieve the same goal?
40:46 CHARLIE: Sure. Yeah, so there's going to be holes in any layer that you build. We have bugs in software, and in site isolation's case, it's trying to put this boundary between the renderer process, where we assume everything is compromised already, and the data that the attacker wants to get to, other websites, data on your machine and so on. So the adjacent layers of Swiss cheese would be within the renderer process, we do have security checks that try to say we have same origin policy checks, things that try to keep certain data opaque to a web page so the JavaScript can't look at it. Those checks in the renderer process do matter. Today, we do have multiple origins from the same site in the same process. The renderer process' job is to make sure that they don't attack each other. But there's some fairly large Swiss cheese holes in that layer that we try to fix whenever we find them. And so site isolation's job is to be the next layer, which won't have holes in the same places, hopefully. Its holes, site isolation bypasses, might be, oh, there's some way for the renderer process to ask the browser process for something it shouldn't have access to, and it tricks it, and it gets access to that. We hope that it's tough to line those holes up, that an attacker has to find both a bug in the renderer process and a bug in site isolation and luck out in that those bugs line up and you can get to one from the other in order to get access to another website's data. And then the next layer of Swiss cheese would be all the things that the browser process does to keep the renderer isolated from the user's machine and the sandbox itself, that you shouldn't have access to the OS APIs and so on. So those would be other ways to try and get beyond site isolation to other things.
42:48 SHARON: That makes sense. Yeah, when I first heard about it, I was like, oh, that's such a fun way to think about it, really. It's a good visual seeing, OK, this is how things go wrong. All right, cool. Do you have any other fun stories about site isolation, making it happen, stuff since then?
43:08 CHARLIE: I mean, it's been a really fun journey the whole way. There's been different projects and different exploratory phases, where we weren't sure what was going to work or what we needed to get done. I've worked with a bunch of great interns and people who have been on the team on early phases like getting postMessage to work across renderer processes, later phases about what would it look like to build out-of-process iframes using something like the plugin infrastructure, just is this feasible? Or what is it that we could protect that a particular renderer process is allowed to ask for? Can we keep allowing JavaScript data from other websites into a renderer process, while blocking your bank account information from getting in? Those both look like network responses from different websites, but one has to be let through for compatibility reasons, and one has to be blocked. Can we build that? Are we doing a good job of keeping that sensitive data out? These are things that we had some great PhD interns working with us on, and ultimately, got us to where we could ship this and protect a lot of data. So it's fun working with all those people along the way.
44:35 SHARON: Yeah, that sounds very cool. These days, so earlier on, you mentioned people whose questions were like, why doesn't this already happen? So these days, it does happen more or less like that. So what kind of questions or misconceptions do you still see folks who typically work on Chrome still have when it comes to this kind of stuff?
44:52 CHARLIE: I think it's often assuming that navigation is simpler than it is and not realizing how many corner cases matter and how all of these different features that have been built on top of navigation interact with each other. So I think that's where we spend a lot of our time these days, beyond the "we want to improve site isolation." We want to make these abstractions easier for other people to understand. So I think that's one of the big challenges now, is how many different directions the navigation code has been pulled and how those things interact with each other.
45:24 SHARON: Right. And that's kind of - was intentional initially, right? You don't want everyone who works on Chrome to have to know how all of this works, but then when you hide it so well, they're like, oh, this is fine. I'll just do my thing. It'll just be my one thing, but then everyone has such a thing, and then it becomes too many things. Yeah, I used to work on a different part of Chrome that was not related to this, and you see some of these big classes, like WebContents or whatever. You're like, oh, I'll just get what I need from that, and things will be fine, but you just don't even have any idea of all the things that could go wrong. So it's cool that someone is out here trying to keep that under control.
46:00 CHARLIE: And I'm glad there's a lot of efforts to try to improve the APIs for how we expose these things, WebContents to WebContentsObserver, which is growing into quite a large API with many users, looking at ways to make these APIs easier to use and harder to make mistakes with. So I think those are worthwhile efforts.
46:20 SHARON: OK. Cool. Well, I think that covers all of it. Now, folks know how isolation works. Problem solved. This is great. All right, thank you very much. Great.
46:34 CHARLIE: Thanks. Oh, no. What? OK, hold on.