This is a transcript of What's Up With That Episode 7, a 2023 video discussion between Sharon and Daniel.
The transcript was automatically generated by speech-to-text software. It may contain minor errors.
Due to technical issues, timestamps were not available for this episode. The transcript below uses 00:00 placeholders instead.
Mojo is used to communicate between processes. How does that happen? What can go wrong? Is mojo the same as mojom? Today’s special guest telling us all about it is Daniel. Daniel is an IPC reviewer and has written much of the guidance and documentation around it. He’s also worked on cross-process synchronization, navigation and hardening measures to mitigate security risks.
00:00 SHARON: Hello. And welcome to “What's Up with That,” the series that demystifies all things Chrome. I'm your host Sharon. And today, we're talking about Mojo. How do we communicate between processes? What can go wrong? What is mojom? Today's special guest to answer all of that and more is Daniel. You know him from the unparalleled volume of code reviews he does, including IPC review, for which he wrote the documentation and guidelines. And in addition, he has worked on navigation, cross-process synchronization, and hardening measures to help mitigate security bugs. So hello, Daniel. Welcome to the program.
00:00 DANIEL: Thank you.
00:00 SHARON: Thank you for being here. First question, what is Mojo?
00:00 DANIEL: Mojo is basically Chrome's IPC system for talking between processes.
00:00 SHARON: All right, that sounds pretty good. That sounds like what we're here to talk about. So today, we're going to cover some questions around Mojo. There are a couple of Chrome University talks and some documentation that are really good to explain the basics of how Mojo works. So those will be linked below. Check those out too. Today's questions are ones you might have if you've watched those videos, maybe some follow-up questions. So you mentioned IPC. Does that include RPC? Or is it just inter-process communication?
00:00 DANIEL: So personally, I kind of think of them as the same thing. But I guess RPC is probably more general. Because it could include calls over the network, right? Mojo doesn't go over the network today.
00:00 SHARON: OK. So it mostly is between the processes we have in Chrome.
00:00 DANIEL: That's correct. Yeah. You also have things like gRPC, right, from Google, for making network API calls. But yeah, that's not under the scope of Mojo.
00:00 SHARON: OK. Cool. Very briefly, we have a thing called Legacy IPC that I think is a long-term project in the works to get it removed. Anything briefly there?
00:00 DANIEL: Yeah. Legacy IPC is what we used before Mojo. It was based on a bunch of clever or horrible hacks, depending how you're looking at it, using C preprocessor macros. We still have it around because NaCl and PPAPI actually use Legacy IPC. So eventually, when we don't have NaCl support, we can get rid of Legacy IPC altogether hopefully.
00:00 SHARON: Any day now.
00:00 DANIEL: Any day now.
00:00 SHARON: Any day now. OK. So what we'll do now is I think we'll just rattle through some definitions because they'll come up a bunch throughout it. And they're words that probably you've heard before but have maybe a special meaning in the context of Mojo. So the first of these is Mojo versus .mojom. I've seen both of them. What is the difference?
00:00 DANIEL: So I think people kind of use them interchangeably in some contexts. But usually, mojom is specifically the file that defines your interfaces, structs, and other types that are going over Mojo IPC. Mojo is just kind of the general name for this system, right? Mojom is specifically a file that defines these kind of types.
00:00 SHARON: OK. That's cool. Next is pipes.
00:00 DANIEL: OK, yeah, so Mojo, basically, all the higher-level stuff that we actually use, most of the time, is built on top of this primitive called a message pipe. So Mojo message pipe always has two ends. It's actually bidirectional. So basically, the idea is you can create a pipe. And then you give the endpoints to whoever you want. And those two endpoints can talk to each other.
00:00 SHARON: And that seems related to the next one, which is capabilities, in terms of passing things around.
00:00 DANIEL: Yeah. So capabilities is kind of a pretty generic term. In Mojo, I think we would kind of think of it as using interfaces to grant capabilities to processes. So for example, if your renderer has permission to, say, use file system stuff, right, we would give it an interface, like a message pipe with an interface that's bound to an interface for accessing the file system. Or if it can record audio for WebRTC, right, we would give it an interface for recording audio, right? But the idea is we wouldn't just have this giant interface with all these methods and then have to permission check, each time someone calls a method, that they have permission, right? We would only give you the interface if you have permission. And if you don't have permission, you don't have the interface at all. And you can't use the capability.
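As a rough illustration of the capability idea Daniel describes, here is a toy Python model, not Mojo's actual API. The `Broker` and `Geolocation` classes, the `request_interface` method, and the permission names are all hypothetical. The point it sketches is that the permission check happens once, when the interface is handed out, rather than on every method call.

```python
class Geolocation:
    """Hypothetical interface; holding a reference to it is the capability."""

    def get_position(self):
        return (0.0, 0.0)  # placeholder coordinates, not real data


# Hypothetical registry of interfaces the broker knows how to create.
INTERFACES = {"geolocation": Geolocation}


class Broker:
    """Hypothetical browser-side broker that decides what to hand out."""

    def __init__(self, granted_permissions):
        self._granted = set(granted_permissions)

    def request_interface(self, name):
        # Checked once, at request time, not on each later method call.
        if name in INTERFACES and name in self._granted:
            return INTERFACES[name]()
        return None  # no permission: no interface, so no capability


trusted = Broker({"geolocation"})
untrusted = Broker(set())

geo = trusted.request_interface("geolocation")
print(geo.get_position())
print(untrusted.request_interface("geolocation"))
```

In this sketch the untrusted caller never gets an object to call methods on, which mirrors "if you don't have permission, you don't have the interface at all."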
00:00 SHARON: Can you have multiple capabilities and interfaces per pipe?
00:00 DANIEL: So that probably kind of gets into the associated stuff.
00:00 SHARON: OK. We‘ll get there. We’ll get there. That's coming up. OK. Next one on our list of words is bindings.
00:00 DANIEL: Yeah, so I think when most people think of Mojo and using Mojo, the bindings layer is probably what they're thinking of. So this is stuff like the remotes, receivers, and the glue that actually makes these calls between processes. There's a lot of Mojo underneath that backing it all. In fact, rockot actually rewrote the entire backend that Mojo is built on top of recently to use something called ipcz for efficiency and other reasons.
00:00 SHARON: OK. He's one of the ones that gave one of those Chrome University talks, which is very good. So go check that out. Cool. Moving along, we have remotes, one of the things you just mentioned, I think.
00:00 DANIEL: Yeah. So earlier, I mentioned message pipes. Remotes, and receivers - they kind of come as a pair - are kind of an abstraction on top of message pipes to make it a bit easier to use. Because, with message pipes, it's basically you stuff bytes in one end, and you get bytes out the other end, right? And no one wants to deal with that. And basically, the idea with remotes and receivers, remotes are basically a way of making a Mojo call. A receiver is a way of handling a Mojo call. Yeah.
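To make the layering concrete, here is a minimal Python sketch of a two-ended, bidirectional pipe with a remote and a receiver on top of it. This is a toy model under stated assumptions, not the Mojo bindings: the `Endpoint`, `Remote`, `Receiver`, and `LoggerImpl` classes and the `method:arg` byte encoding are invented for illustration.

```python
from collections import deque


class Endpoint:
    """One end of a toy bidirectional message pipe."""

    def __init__(self):
        self.inbox = deque()  # bytes waiting to be read at this end
        self.peer = None      # the other end of the same pipe

    def write(self, data: bytes) -> None:
        self.peer.inbox.append(data)

    def read(self) -> bytes:
        return self.inbox.popleft()


def create_message_pipe():
    """Create a pipe and return its two entangled endpoints."""
    a, b = Endpoint(), Endpoint()
    a.peer, b.peer = b, a
    return a, b


class Remote:
    """Turns a method call into bytes stuffed into one end of the pipe."""

    def __init__(self, endpoint):
        self._endpoint = endpoint

    def call(self, method: str, arg: str) -> None:
        self._endpoint.write(f"{method}:{arg}".encode())


class Receiver:
    """Reads bytes off the other end and dispatches to an implementation."""

    def __init__(self, endpoint, impl):
        self._endpoint = endpoint
        self._impl = impl

    def dispatch_one(self) -> None:
        method, arg = self._endpoint.read().decode().split(":", 1)
        getattr(self._impl, method)(arg)


class LoggerImpl:
    def __init__(self):
        self.lines = []

    def log(self, message: str) -> None:
        self.lines.append(message)


remote_end, receiver_end = create_message_pipe()
remote = Remote(remote_end)
impl = LoggerImpl()
receiver = Receiver(receiver_end, impl)

remote.call("log", "hello")
remote.call("log", "world")
receiver.dispatch_one()
receiver.dispatch_one()
print(impl.lines)
```

The caller only ever sees `remote.call(...)` and the implementer only sees `log(...)`; the byte handling stays inside the two wrapper classes, which is the convenience the bindings layer is described as providing.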
00:00 SHARON: OK. Neat. And then up next, we have pending.
00:00 DANIEL: OK, yeah. So to take a step back to get the broader picture, when you use the bindings, you can create a remote. And that always comes with another endpoint, right? Because a Mojo message pipe has two endpoints. So you always get a remote and a receiver together. Pending is basically the form of remotes and receivers that they are in when you can transfer them, right? So something has to be pending if you want to, say, send it from one thread to another. Because Mojo message pipe endpoints, they're all thread-bound - I think sequence-bound, technically. But yeah, so if you want to move things between threads or between processes, they have to be in pending form. Pending just kind of means it's not handling - it's not reading things off the message pipe or trying to send things. You can't use it in that form. You would have to turn it from a pending into an actual remote or receiver to use it, right? And we have pending forms of both remotes and receivers for type safety.
00:00 SHARON: Right. Can you briefly explain what sequence-bound means?
00:00 DANIEL: Yeah, so I think a few years ago now, we kind of rewrote the task scheduling system in Chrome. And the idea was to abstract out some of the ideas and make things a bit more flexible, right? Because, otherwise, a lot of code was just creating threads, even though it didn't always need like a dedicated OS thread, right? And so sequences are an abstraction on top of that. And a sequence just promises that, when you PostTask to it, it runs tasks in that order. But we could have multiple sequences on the same thread. That's kind of an implementation detail. That same sequence could potentially even run on different threads at times, right? So it's an abstraction. But in theory, people shouldn't have to think about it.
00:00 SHARON: Right.
00:00 DANIEL: Not always true, but usually true.
00:00 SHARON: OK, so it's kind of like - in other places, it would be kind of a thread. It's the thing you interact with. This is a unit of stuff happening.
00:00 DANIEL: Yeah. It's kind of Chrome's thread basically.
00:00 SHARON: OK. Cool. Another thing you mentioned already, associated.
00:00 DANIEL: Yeah. So the kind of tricky part sometimes with Mojo is message ordering is only guaranteed on the same message pipe. So if you have a remote and receiver and you send stuff, it's guaranteed that the receiver will get things in the order you sent them in, right? If you call ABC, it will get ABC. But if you have two remote and receiver pairs - if I call ABC on one and then DEF on the other, assuming they both go to the same process, there's actually no guarantee that ABC will happen before DEF, right? It could be any kind of interleaving of those kind of things.
00:00 SHARON: Right.
00:00 DANIEL: So associated is basically a way for remotes and receivers to share an underlying message pipe.
00:00 SHARON: Oh, OK.
00:00 DANIEL: Yeah. It's a bit tricky because the way it actually happens is, when you create an associated remote and receiver, it kind of gets tied to the message pipe it's passed over, right? So when you have a remote, and you pass a pending associated receiver or a pending associated remote over it, it gets tied to use that same underlying message pipe. It's kind of implicit. It usually just works. But yeah, sometimes you have to think about the details, and it gets complicated.
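The ordering point can be sketched with plain FIFO queues. The following is a toy Python model, not Mojo code: each `deque` stands in for one message pipe, and the hypothetical `deliver` helper picks which pipe to read from at random to mimic the lack of any cross-pipe ordering guarantee. Order within one pipe is always preserved, so sharing a pipe (the "associated" case) preserves the overall send order.

```python
import random
from collections import deque


def deliver(pipes, rng):
    """Drain several pipes, choosing which pipe to read from at random.

    Each pipe is FIFO, so order is kept within a pipe but not across pipes.
    """
    delivered = []
    pending = [p for p in pipes if p]
    while pending:
        pipe = rng.choice(pending)
        delivered.append(pipe.popleft())
        pending = [p for p in pipes if p]
    return delivered


rng = random.Random(0)

# Two independent pipes: ABC sent on one, DEF sent on the other.
pipe1 = deque(["A", "B", "C"])
pipe2 = deque(["D", "E", "F"])
independent = deliver([pipe1, pipe2], rng)

# Associated: both interfaces share one pipe, so send order is preserved.
shared = deque(["A", "B", "C", "D", "E", "F"])
associated = deliver([shared], rng)

print(independent)  # some interleaving of ABC and DEF
print(associated)
```

With two pipes, the only thing a receiver can rely on is that A precedes B precedes C, and D precedes E precedes F; with the shared pipe, the full sequence arrives exactly as sent.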
00:00 SHARON: OK, this sounds - this feels a bit like this strong ref counting of, maybe we don't want to do this ourselves. But we can get into that more later.
00:00 DANIEL: Yeah. Yeah. Yeah.
00:00 SHARON: OK. And the last thing on the list of definitions is entangled.
00:00 DANIEL: Yeah, so that's I think -
00:00 SHARON: Quantum Mojo.
00:00 DANIEL: Yes. Quantum Mojo. I think that's usually referring to the receiver-remote pair that Mojo has. It's not a super precise term. And I don't think we use it widely. But it does show up in a bunch of the comments, I guess. But yeah, usually, when it says entangled, if you have a remote, the entangled endpoint is the receiver on the other side or vice versa. If you have the receiver, then it's the remote on the other end.
00:00 SHARON: Right. Yeah. OK. Probably all the other words that mean a similar thing have been heavily overloaded already, like connected.
00:00 DANIEL: Yeah. Yeah. It's a bit hard to write comments for Mojo. We know it could use improvements. But yeah, trying to find ways to write this sort of information precisely without like writing novels is always a bit tricky.
00:00 SHARON: It is tough. OK. So let's briefly talk about how Mojo is used. So I think the most typical case - the canonical case, I feel like, is between the browser and the renderer.
00:00 DANIEL: Yeah.
00:00 SHARON: Right? Is that the case?
00:00 DANIEL: Yeah, I think that's fair to say that maybe that's where most of the IPC in Chrome happens because Chrome is a web browser.
00:00 SHARON: Right. And I've heard it described as letting web pages get things that they want from the browser. So Mojo is used in that process. Like a web page wants maybe - I don't know - a file or something. And it uses Mojo to get that. So apart from - what are all the kinds of things a web page might want from the browser or want it to do that it would use Mojo for?
00:00 DANIEL: Yeah, so I think that's a pretty big question. So there's kind of a set of core capabilities like a web page always has, right? So for example, it can always navigate somewhere, kind of various things to manage the loading state or to load some resources and that sort of stuff, right? So every web page will probably have URL-loader factories or the frame interface for managing this sort of thing, right? And then there are additional capabilities that aren't necessarily exposed to everything, right? Obviously, on the web, you have all sorts of things gated by permissions, like file system access, clipboard, audio recording, video recording, and that sort of thing, right? And that's the thing where the renderer could go to the browser and be like, hey, give me an interface for geolocation or something, right? And assuming it passes the permission checks and other checks, we would give it back the geolocation interface, right? We would grant it the capability by passing it that interface.
00:00 SHARON: OK.
00:00 DANIEL: Yeah. That's the general sort of idea. It gets - as always, it gets a bit messy, right? Because there are edge cases where things have to work slightly differently. But in general, that's kind of the flow we try to follow.
00:00 SHARON: So basically, it sounds like the renderer wants something that is kind of OS-level, right, like camera or audio. And because we don't trust renderers, we have to do that through the browser. So this is how it gets to the browser. And then, through whatever other magic happens -
00:00 DANIEL: Right. So yeah, there's some central places where we register what interfaces are even exposed to a process, right? But that registration is usually also - has other logic, like, should we even grant this thing, right? Does the origin - does the document requesting this have a secure origin? Did the user give it permissions potentially? It all kind of depends. There's a wide gamut of things you might want to check. But yeah, that's the general idea, this central point to kind of broker these sort of capabilities out.
00:00 SHARON: OK. Cool. So within the browser still, are there - what are other examples of uses of Mojo that are not browser-to-renderer or back? Is there renderer-to-renderer?
00:00 DANIEL: Yeah. So like any other kind of thing that evolves over time, Chrome has gotten quite complicated. So there's, I think, a bunch of other things actually running in utility processes now. Like I think - but don't quote me on this - a lot of device code can do this. And so what actually happens is the renderer will talk to the browser, right? And the browser will be like, you can use it, right? And it will actually maybe spin up the utility even for the renderer and give it access. It can pass the message pipe endpoints. It can pass a remote back to the renderer and the receiver off to the utility process. And then the renderer can talk to the utility directly. And that actually kind of comes in for the other question about renderer-to-renderer communication. We have these things called service workers, which can do interesting things with page loads, like support offline apps and that sort of thing. And the way that works is you can't necessarily, from the renderer, go directly to another renderer. But the renderer, if we know it's controlled by a service worker in that document, we can give it a URL-loader factory that will actually go and talk to the service worker. In that sense, there is renderer-to-renderer communication happening, but it's brokered. It's not just a free for all.
00:00 SHARON: Why don't we want free for all, direct renderer-to-renderer communication?
00:00 DANIEL: Well, it would probably complicate the kind of trying to - so the thing with Mojo is it's very flexible. It's very easy to be - let any two endpoints in Chrome talk to each other. But with that flexibility is also a certain amount of danger, basically. We want to be able to - when things are exposed to another process, we want to be able to audit them, from a security perspective and just from a stability perspective as well. If we just kind of made it a free-for-all, it would probably become pretty hard to figure out what can talk to what? How is the permission checked? Where is it checked? So by kind of centralizing these checks in the browser interface broker, for example, the idea is we make it a bit easier to understand how the system - like, what it's exposing, and what the attack surface is, and that sort of thing.
00:00 SHARON: Yeah. There's a lot of stuff that's very combinatorial explosion to me, and this seems like it's trying to limit that a little bit.
00:00 DANIEL: Yeah. There's always going to be things that we can't catch, obviously. But that is kind of the general idea. By kind of limiting it through a central kind of broker area, we can figure out, if someone wants to audit it, they can be like, OK, we are exposing these things to the renderer process. Oh, no, we're exposing WebUI. Is that checked? It is, so we're OK. But that sort of thing, yeah.
00:00 SHARON: OK. Can you explain a bit more about what service workers are? For those of us who might not be familiar, it sounds like they're kind of between a browser and a renderer process, maybe.
00:00 DANIEL: So I'm actually not the best person to talk about service workers. But at a very high level, they're workers that aren't confined to the lifetime of a page, of a document necessarily. And that's why they can intercept network loads. They can also do some storage stuff. And I think some notifications are tied to service workers and other capabilities. I'm not super familiar with them. I just know how they work at a high level and that they can be used to implement offline support for apps, as one example. But all sorts of other things you could think.
00:00 SHARON: All right. That makes sense. Cool. So those are, within Chrome browser, uses of Mojo. So let's talk about some adjacent Mojo use cases. So before I used to work on Fuchsia, and they have something called FIDL. It stands for Fuchsia Interface Definition Language. And to anyone who might have seen it, it looks a lot like Mojo. So can you tell us a bit about that and how that works?
00:00 DANIEL: So I wasn't actually super involved with Mojo at that point. But my understanding is FIDL was basically forked from an earlier version of Mojo, and then they evolved it in their own direction. And FIDL has kind of a lot of interesting things about it. And if we had infinite time in Chrome, it would be nice to integrate some of those features back. But my understanding is FIDL is very specific to Fuchsia. But they also have kind of this similar idea to Chrome where I think you only expose a FIDL interface - if you give someone a FIDL interface, you're granting them the capability to do that thing. So in that sense, it's quite similar to Mojo. But yeah, because of the shared heritage, I expect it probably looks pretty similar, but there are definitely some differences.
00:00 SHARON: Yeah. Something I heard a lot was that Fuchsia was a capabilities-based operating system. And it wasn't until I started seeing more Mojo stuff that I was like, Oh, that's what that means!
00:00 DANIEL: Yeah, yeah, yeah.
00:00 SHARON: That's the same capabilities. And it looks a lot like Mojo. And I think, from the case of using it, I think the only thing you might notice is that they have more bindings in different languages. So in Chrome, it's mostly C++. Are there any non-C++ Mojo usages, really?
00:00 DANIEL: There are, actually. So there's Java. That was one of the motivations for doing this is to make it a bit easier to implement an endpoint in Java. Because before people had to write a bunch of JNI boilerplate to jump from the C++ IPC handling over to Javaland. Mojo kind of abstracts that away at some cost. There's been some persistent concerns about binary size from the Java bindings from the Android team. And they could probably be improved. There's also the JavaScript and TypeScript bindings. I believe Chrome mostly uses the TypeScript bindings these days for things like WebUI. I know some WPTs also use the JavaScript endpoints for injecting test fakes or mocks and that sort of thing.
00:00 SHARON: Oh, cool! I didn't know about that. Cool. So that's that. And then another kind of OSey thing is LaCrOS. I'm not super familiar with this, but I understand that Mojo is used in an interesting way in LaCrOS. So can you tell us about that?
00:00 DANIEL: So LaCrOS is basically an effort to make it easier to update Chrome on ChromeOS devices. Before, it was kind of this monolithic thing because Chrome was also responsible for the windowing environment, Ash, on ChromeOS. And so it was sometimes a bit difficult to uprev Chrome if there is a critical security fix or whatever. And LaCrOS is an effort to kind of decouple these. So basically, it turns ChromeOS into more of an OS kind of environment. And what's left on the LaCrOS Chrome - it's what it's called - is really just browser related. So it's still kind of a work in progress. But in the future, Ash Chrome - right now we have Ash Chrome, which can show WebUI still. But in the future, that would actually - WebUI would be displayed in LaCrOS Chrome. And it would just be like an Ash backend without any Blink renderer and that sort of thing. And there's a bunch of Mojo to basically communicate between Ash Chrome and LaCrOS Chrome. There's some constraints there. It uses versioned interfaces, which is something you won't find too much of elsewhere in Chrome, other than some ARC stuff.
00:00 SHARON: What are these interfaces?
00:00 DANIEL: So versioned just means that these interfaces have backwards compatibility constraints because Ash Chrome and LaCrOS Chrome don't necessarily ship together. We want to be able to update LaCrOS Chrome.
00:00 SHARON: That's the point.
00:00 DANIEL: Yeah, exactly. So we have to be able to tolerate some amount of skew between the interfaces. But we have to do it in a way that's backwards compatible. And so versioned interfaces are a way to more or less guarantee that, assuming you follow the rules. And we have some checks to make sure you don't break the rules, generally speaking. But yeah, there's some complexity because of that. If you want to deprecate methods or remove fields, you can deprecate methods and remove them eventually, but fields are a bit trickier, and that sort of thing.
00:00 SHARON: It's like the whole Proto thing of you want them to be optional because they're never going away, or something.
00:00 DANIEL: Yeah. So Proto has an advantage over Mojo in this respect, because they identify their fields with tag numbers. And so you can just omit fields completely. Whereas, Mojo, we actually reserve space in the struct for it. And that means, once you have a field there in a versioned interface, you can never really get rid of it. You have to keep it there even if you're not using it. In the future, maybe you might use it for something else if it's no longer needed. But yeah, it becomes a bit tricky because of that sort of thing.
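The contrast between tag numbers and reserved space can be sketched with Python's `struct` module. This is a deliberately simplified illustration and an assumption throughout: it is neither the real protobuf wire format (which uses varints and wire types) nor the real Mojo struct layout (which has headers and alignment rules). The field names `width`, `height`, and the retired middle field are hypothetical.

```python
import struct


# Tag-based idea: every field carries its own tag number on the wire,
# so a writer can simply leave out a field that is no longer used.
def encode_tagged(fields):
    return b"".join(
        struct.pack("<Bi", tag, value) for tag, value in fields.items()
    )


def decode_tagged(data):
    return {tag: value for tag, value in struct.iter_unpack("<Bi", data)}


# Fixed-layout idea: every field lives at a fixed offset in the struct,
# so a retired field still has to occupy its slot.
LAYOUT = "<iii"  # width, retired field, height


def encode_fixed(width, height):
    return struct.pack(LAYOUT, width, 0, height)  # middle slot kept but unused


def decode_fixed(data):
    width, _retired, height = struct.unpack(LAYOUT, data)
    return width, height


tagged = encode_tagged({1: 640, 3: 480})  # tag 2 is simply omitted
fixed = encode_fixed(640, 480)            # slot 2 is still on the wire

print(decode_tagged(tagged), len(tagged))
print(decode_fixed(fixed), len(fixed))
```

In the fixed layout, dropping the middle slot would shift `height` to a different offset and break any reader built against the old layout, which is the sense in which the field can never really go away.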
00:00 SHARON: Yeah. Because I guess with regular Mojo, it's meant to just work within one monolith of the browser. So that, at least, has all the same version, and is not - the version skew is not something that was initially planned for.
00:00 DANIEL: Right. It all ships as kind of one monolithic block. You can kind of refactor freely across the system. When you have versioned interfaces, it becomes trickier. You have to follow a deprecation process. I think LaCrOS, at one point, was kind of like a three-milestone, three-version thing before you could remove old APIs. But don't quote me on that.
00:00 SHARON: Right. OK, interesting. Changing gears a bit here, so let's go back to talking about receivers and remotes and the different states they can be in. So some - these are all kind of words I've seen. I'm not that familiar with Mojo. I haven't done too much cross-process stuff. But you see words like, bound, connected, disconnected. I've seen all these words before. I know what they mean, but I don't think I know what they mean in this context. So can you explain?
00:00 DANIEL: Yeah. So I think maybe the simplest way to think of it is bound is when a remote or receiver isn't null. Why would it be null? If you just default construct a Mojo remote that's not bound to - you just default construct one, it won't be bound to anything. It'll be null internally. If you try to make a method call on it, it will crash. You actually have to create that Mojo message pipe that's backing it to, quote, unquote, “bind” it. So when you create that underlying Mojo message pipe, that's what it means to go from unbound to bound. And this is kind of a bit tricky sometimes. I notice this kind of mistake pretty often. Sometimes it's very easy to call BindNewPipeAndPass, like, pending - I don't even know what the function is called. We gave it a really long name to try to be descriptive, and now no one can ever remember what the actual invocation is. But when you call that thing, the remote or receiver that you're calling it on becomes bound synchronously at that point. Even though there's no other side attached to the entangled endpoint, it's still considered bound because it's no longer null. You could create a Mojo remote. You could bind it. You could immediately start making method calls on it, even though the other end hasn't been passed anywhere. And what will happen is all that stuff would just be queued internally. And so when it becomes connected is when the other endpoint basically goes from pending to - actually, no, that's not true. Sorry. It's actually considered connected, too.
00:00 SHARON: OK.
00:00 DANIEL: Yeah. When you bind it, it's considered both bound and connected.
00:00 SHARON: OK.
00:00 DANIEL: Yeah. The disconnection, if there is one, is always kind of asynchronous. Internally, there's some control IPCs that do heartbeats and sort of stuff to see what's alive and that sort of thing. I don't know those details. You would have to ask rockot, who is probably the only person who knows those details at this point.
00:00 SHARON: Oh, no!
00:00 DANIEL: So yes, let us all hope for rockot's continual safety. But yeah, when you create a remote or receiver and you bind it, it's both bound and connected. If you have a remote, you can start making method calls on it immediately. You don't have to wait for the other side to turn from pending to a receiver, for example. Everything would just get queued. And disconnected is just when either endpoint is dropped. So if you drop the remote, the receiver will become disconnected, if you destroy the remote. Or if you destroy the receiver, the remote will become disconnected. But that's an asynchronous process because it's always asynchronous, even if you're in process. But it just happens at some point. And the tricky part here is if you have a bound thing, it can be disconnected. You can still make method calls on it. And that's OK. But your method calls will just disappear into thin air. Whether or not that's desirable kind of depends on what you're doing.
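The bound, connected, and disconnected states described here can be walked through with a small state model. This is a toy Python sketch, not the Mojo C++ bindings: the class and method names are hypothetical (loosely echoing the names mentioned in the conversation), and the `poll` method is an invented stand-in for the asynchronous disconnect notification.

```python
from collections import deque


class Pipe:
    """Shared state for one toy message pipe."""

    def __init__(self):
        self.queue = deque()        # calls sent but not yet dispatched
        self.receiver_alive = True  # False once the receiver end is dropped


class PendingReceiver:
    """Transferable receiver end; it cannot dispatch anything until bound."""

    def __init__(self, pipe):
        self.pipe = pipe


class Remote:
    def __init__(self):
        self._pipe = None  # default-constructed: unbound (null)
        self._connected = False

    def is_bound(self):
        return self._pipe is not None

    def is_connected(self):
        return self._connected

    def bind_new_pipe_and_pass_receiver(self):
        # Bound and connected right away, before anyone holds the other end.
        self._pipe = Pipe()
        self._connected = True
        return PendingReceiver(self._pipe)

    def call(self, message):
        if not self.is_bound():
            raise RuntimeError("call on unbound remote")  # stands in for a crash
        if self._pipe.receiver_alive:
            self._pipe.queue.append(message)
        # Otherwise the call is silently dropped.

    def poll(self):
        """Invented stand-in for the asynchronous disconnect notification."""
        if self.is_bound() and not self._pipe.receiver_alive:
            self._connected = False


class Receiver:
    def __init__(self, pending, handler):
        self._pipe = pending.pipe
        self._handler = handler

    def dispatch_all(self):
        while self._pipe.queue:
            self._handler(self._pipe.queue.popleft())

    def reset(self):
        """Drop the receiver end; the pipe can never be reconnected."""
        self._pipe.receiver_alive = False
        self._pipe.queue.clear()


handled = []
remote = Remote()
pending = remote.bind_new_pipe_and_pass_receiver()
remote.call("first")  # queued: nobody is listening yet

receiver = Receiver(pending, handled.append)
receiver.dispatch_all()  # the queued call is delivered now

receiver.reset()
still_connected = remote.is_connected()  # notification not delivered yet
remote.call("lost")  # bound but disconnected: disappears into thin air
remote.poll()
print(handled, still_connected, remote.is_bound(), remote.is_connected())
```

The last line shows the combination the conversation warns about: the remote is still bound, so calls on it do not fail, but it is no longer connected, so those calls go nowhere.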
00:00 SHARON: So going back to what you just said, can you have a case where you have one of the ends of a pipe disconnect, and then reconnect it? Or is the only way to disconnect one of the ends after you have connected it is to destroy the object that represents one of those ends?
00:00 DANIEL: So disconnection is a permanent thing. You can't reconnect something that was disconnected. There's some Mojo underlying system - I don't know what I would call it - but like low level Mojo APIs that you can use to fuse message pipes together. But even those won't turn a disconnected message pipe back into a connected one. The idea with the kind of endpoints is, once they're entangled, they're always kind of that pair. So if either endpoint gets destroyed, it becomes disconnected. And this could also happen if the other process crashes. Your endpoint that's remaining alive, whether that's a remote or receiver, will become disconnected at some point, but no guarantee when exactly. There's no ordering guarantees there.
00:00 SHARON: OK. So whenever ordering and stuff comes up, like a concern - a common concern is like deadlocks or all sorts of synchronizing issues. So what are some of the concerns? Are deadlocks a common concern? How do we handle this? Because this seems very fraught with all of the typical, distributed, async problems that exist.
00:00 DANIEL: So if you're not using synchronous IPCs, you probably won't hit deadlocks unless you're actually writing code that is blocking on receiving a remote IPC. In general, I haven't seen code written like this in Chrome because I think most developers are like, well, I probably shouldn't block waiting for that reply because that's not a great thing. Obviously, you'll see this sort of thing in tests because it's much more convenient in tests. But in actual production code, I don't think this is a thing that happens. Where this could run into problems more is with sync IPCs. So by default, Mojo methods are all async. You have to actually give it a sync attribute if you want to be able to make a sync call on it. And what that means is, if you use the synchronous version of the method, it will actually just wait until it gets - until the remote process, or whatever, the other end calls the reply callback to let you know that it's done. And there's a lot of trickiness involved there because, when you're just waiting for the remote thing to reply, there were concerns because - before Mojo IPC, with legacy IPC, you could also have sync calls. But the way we tried to ensure safety was to make sure that the sync IPCs only ever went in one direction. So they only go renderer to browser, and not browser to renderer as well.
00:00 SHARON: Because we don't want to block the browser ever.
00:00 DANIEL: I mean, we don't want to block the browser. But we also don't want to end up with sync call cycles where the browser process is waiting for a sync reply from the renderer, and the renderer is waiting for a sync reply from the browser. That would be bad.
00:00 SHARON: That would be bad.
00:00 DANIEL: Mojo tries to avoid this problem by saying, if I'm waiting for a reply to my message, to that sync call I made, and someone else makes a sync call to me, I better let that through and handle it and let them know just to avoid deadlocks. But this is also problematic in another way, because it means the messages you're getting sent may be reordered, basically. So what this means is, say, I make a sync call from the renderer to the browser. The browser sends us some async IPCs, like A and B. And we see those. And we're like, OK, we're in the middle of a sync call. We're not going to handle them right now. And then, for some reason, someone added a sync call from the browser to the renderer. And so the browser goes to the renderer. And the renderer is like, hey, I better handle that sync - that incoming sync IPC. And it handles C. But at this point, you haven't handled A or B yet. And if you were kind of assuming that A and B would happen before C, that's no longer the case. It's pretty messy, which is why we've actually considered switching the behavior of sync IPCs to no interrupt by default rather than allowing sync interrupts, basically, is how it currently works. We actually had some security bugs kind of around this sort of message reordering thing. Really, the whole takeaway from this is don't use sync IPCs if you can avoid it in any way. They do add a lot of complexity, just for the considerations. Obviously, they aren't great performance-wise because they are blocking - if you don't need it, please, please, don't use them.
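The A, B, C scenario above can be traced with a few lines of Python. This is a toy model of the described behavior, not Mojo code: the message names come from the example in the conversation, and the `(name, is_sync)` tuples and the two queues are assumptions made for illustration.

```python
from collections import deque

# Messages arriving at the renderer, in the order the browser sent them.
# Each entry is (name, is_sync).
incoming = deque([("A", False), ("B", False), ("C", True)])

handled = []
deferred = deque()

# The renderer is blocked in its own sync call to the browser. While blocked,
# it lets incoming sync calls through (to avoid deadlock) and defers async ones.
while incoming:
    name, is_sync = incoming.popleft()
    if is_sync:
        handled.append(name)
    else:
        deferred.append(name)

# The renderer's own sync call returns; deferred async messages now run in order.
while deferred:
    handled.append(deferred.popleft())

print(handled)  # ['C', 'A', 'B']
```

The browser sent A, B, C, but the renderer handled C first, so any code that assumed A and B had already run when C arrives would be wrong.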
00:00 SHARON: Is that the main takeaway of today is don't use sync IPCs, if at all possible.
00:00 DANIEL: I mean, that is definitely one thing I would like people to remember just because, yeah, if you can avoid it, it will make things - it will make life much easier down the road, most likely.
00:00 SHARON: So to make your life and Daniel's life easier down the road, try to minimize use of sync IPCs. So, of course, what are some cases where they are used now, and cases where they are currently used but we would hope to transition away from them?
00:00 DANIEL: Hmm. That's a hard question, mostly because I don't have Code Search pulled up right now.
00:00 SHARON: Right, fair enough.
00:00 DANIEL: I know there's some sync stuff around GPU and render stuff. A lot of the older web APIs weren't written with promises in mind. So for example, I think document.cookie involves a sync IPC to go get whatever the latest cookie is from the cookie jar. We've added some caching there to make it better, but fundamentally, those sorts of things need to happen synchronously. So we don't have much of a choice. Interestingly enough, I think Android WebView actually has some sync IPCs from the browser to the GPU, I want to say. Don't quote me on that. I don't understand that code at all, despite having reviewed a lot of those CLs. But I'm given to understand that it's necessary. So yeah, I mean, I don't know that we're actively migrating anything away from sync IPC at this point. I know people have worked on optimizing cookie access. And so we will reduce the amount of sync IPCs, but never completely eliminate them, I think. Luckily, I think a lot of the new web APIs are using promises, so they can be async. They don't need to be sync. And life is great.
00:00 SHARON: OK. That's good.
00:00 DANIEL: Yeah. There is also some, I think, additional kind of Google integrations with Chrome. I think previously they were pretty complex because it was just trying to translate a Java code base into C++. There was a bunch of assumptions around sync calls. So they wrote sync IPCs kind of to wrap all that in their helper utility process. And that definitely led to some problems with deadlocks because we would make a Mojo sync IPC. And then to simulate the environment Java would have had, it would have - it spun a run loop internally. But it got into deadlocks. So don't write sync IPCs. Do yourself a favor.
00:00 SHARON: Do yourself a favor. That's right. So when it comes to all of this async/sync, mostly the async stuff - and you mentioned binding earlier. Something we see a lot in Chrome is callbacks. So these are used for async stuff. And you also see them bound. Is that the same binding as Mojo binding or is that - no.
00:00 DANIEL: No, it's completely different.
00:00 SHARON: It's completely different. Is there much intersection between callbacks and Mojo? These are both heavily used in async situations. Do they intersect?
00:00 DANIEL: Yeah. So it's actually kind of a known - I guess I would call it a wart at this point that our way of writing async code leads to kind of hard-to-follow code. If you want to make a Mojo message call and do something after it replies, you bind a reply callback. And that's kind of the case of how async code in Chrome often works. You create callbacks, and then you wait for this other thing to be done, and call your async callback. But it kind of means that trying to read the control flow of the program can be pretty tricky sometimes. You have to be like, oh, this thing has an async callback. Let me see what it's bound to. So you go in Code Search. You look at the caller. You're like, oh, it's bound to this onFooDone thing. Let me go look at onFooDone. And then if onFooDone has more async work, you're just kind of chasing these chains all over the place. And that's kind of the case with Mojo. I think Mojo uses callbacks just because that's kind of our language for it in Chrome. It would be nice to do better. There was a bunch of exploration around some sort of promise-based idea a while back. Ultimately, we didn't implement that because it was felt it would be hard to migrate everything. And it was kind of hard to justify prioritizing that. But we've played with a lot of other ideas since then to try to make these sorts of things a bit easier to write. If you're chaining two callbacks, you can use a callback helper called Then. There's also something called SequenceBound, which can help you if you have two objects that live on different sequences. You don't have to post tasks yourself. SequenceBound handles that under the hood for you and binds the callbacks and whatever.
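To make the idea of chaining two callbacks concrete, here is a minimal sketch that uses `std::function` as a stand-in for Chromium's callback types. The free function `Then` and the aliases `ParseCallback` and `DoneCallback` are hypothetical names for illustration only; Chromium's real helper has a different signature and works on its own callback classes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback types for this sketch (not Chromium types).
using ParseCallback = std::function<int(const std::string&)>;
using DoneCallback = std::function<void(int)>;

// Returns one callback that runs `first` and feeds its result into `second`,
// so the caller hands a single callback to the async operation instead of
// writing an intermediate "on done" method by hand.
std::function<void(const std::string&)> Then(ParseCallback first,
                                             DoneCallback second) {
  assert(first && second);  // Both stages must be non-empty.
  return [first = std::move(first),
          second = std::move(second)](const std::string& input) {
    second(first(input));
  };
}
```

A caller might write `auto chained = Then(parse, report); chained("mojo");`, where `parse` and `report` are whatever two stages need to run back to back. The control flow still hops between callbacks, but the chain is visible at the one place where it is built.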
00:00 SHARON: Right, right. Yeah, we're still migrating off of legacy IPC. So to introduce another migration at this point seems ambitious.
00:00 DANIEL: There's kind of varying opinions on this, obviously.
00:00 SHARON: Well, they're not here right now. So what are your opinions, if you want to share them.
00:00 DANIEL: I mean, it would be really nice if we could improve on this. I know that now that we're slowly getting C++20, thanks to Peter Kasting's work, I think there will probably be some exploration around coroutines and if that's something that we could use to help us migrate to simpler patterns for async code. It is kind of a very open-ended question now because there's also things like Rust that are up and coming, and figuring how to do async Rust and async in Chrome, in C++, and making that all mesh together is probably going to be a pretty complex problem.
00:00 SHARON: Probably.
00:00 DANIEL: Yeah.
00:00 SHARON: Probably.
00:00 DANIEL: Yeah.
00:00 SHARON: So kind of transitioning a bit to more security things, and also as it ties into callbacks and async, is when you bind a thing - because memory safety and use-after-free and whatnot are a major problem that we have from a security perspective, especially because C++ and all of that. So when it comes to passing around these things that are async, you don't know when they'll be done, if you're passing in things that you're calling from - like in the callbacks, how do you make sure that they're still around when you need them and that call doesn't become either a crash, like null dereference, or worse, a use-after-free? Is this a big concern we have? How are we dealing with it?
00:00 DANIEL: Yeah. So if you're using Mojo, quote, unquote, "the normal way", you're probably safe-ish. So what I mean by the normal way is, you have a class. It needs to make Mojo calls. And it owns the Mojo remote. And the way that works is if you make calls on the remote, but then your class is destroyed, it will kind of cancel any reply callbacks. You will never get them. So you don't have to worry about that case. And that's kind of nice. But there's, obviously, a lot of other ways for things to go wrong. In particular, if the lifetime of the class is tied to the lifetime of the Mojo message pipe, like, if it gets disconnected, you destroy this. That's kind of an area that's a bit fraught with peril. We've had this problem with self-owned receivers. A self-owned receiver is basically a shorthand way of creating an implementation for handling Mojo messages that deletes itself as soon as the message pipe is disconnected. And at first glance, this kind of seems a very natural pattern. If I'm disconnected, I don't need to be there. Just delete this. But it becomes problematic if other people are holding pointers to you. We had this problem, I think, a lot with - so a common kind of scope - for IPCs between browser and renderer, a common kind of anchoring point is the RenderFrame(Host) or RenderFrame, right. And what would happen is we -
00:00 SHARON: What is a RenderFrame or RenderFrame(Host)?
00:00 DANIEL: Yeah. So it kind of corresponds to, basically, either the main frame or an iframe. And it's just kind of responsible for dealing with all the fun logic of navigating, loading the page, and if the page wants to do other stuff, figuring out how to get it to the code that actually knows how to do the extra stuff, like the capabilities thing. So a common problem we had was the RenderFrame(Host) could be destroyed, like if you remove an iframe from the document. The RenderFrame(Host) could be destroyed. But what would happen is people would grant capabilities using interfaces, but these interfaces would be self-owned receivers. And what would happen is the self-owned receiver would have a raw pointer to the RenderFrame(Host), but it wouldn't be destroyed with the RenderFrame(Host) because it's a self-owned receiver. And the thing controlling its lifetime is whoever holds the other endpoint. In this case, that's a renderer that might be malicious or compromised. And so without any way to guarantee that the RenderFrame(Host) will outlive the self-owned receiver, it becomes dangerous. We had a lot of use-after-free bugs from this, actually. And that's why we added something called Document Service. And if you're writing web APIs and you need to implement IPCs, and your thing is kind of roughly scoped to the lifetime of the document, it's highly encouraged to use something like Document Service rather than a self-owned receiver. That way you don't need to hold a raw pointer to RenderFrame(Host) yourself. We guarantee the lifetimes are more or less correct. Obviously, kind of with anything of this nature, if other people hold pointers to you, you still need to be sure that you're clearing them, or you're ref counted or something. It's hard to give a one-size-fits-all fix for this sort of thing. Document Service is kind of the closest we have. There's a couple other helpers along those lines. And if your code can fit within that framework, it will probably make your code a bit more robust against those kind of problems.
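The lifetime idea behind scoping a service to its frame can be sketched without any Mojo machinery. The following is a simplified model under stated assumptions, not the real Document Service implementation: `Frame` stands in for RenderFrame(Host), `FrameScopedService` is a hypothetical service type, and `live_count` exists only so the example can observe destruction.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Frame;  // Hypothetical stand-in for RenderFrame(Host).

// A service whose lifetime is controlled by the frame, not by the remote end.
class FrameScopedService {
 public:
  FrameScopedService(Frame& frame, int& live_count)
      : frame_(frame), live_count_(live_count) {
    ++live_count_;
  }
  ~FrameScopedService() { --live_count_; }
  FrameScopedService(const FrameScopedService&) = delete;
  FrameScopedService& operator=(const FrameScopedService&) = delete;

  Frame& frame() const { return frame_; }

 private:
  Frame& frame_;     // Safe: the frame owns this object, so it outlives it.
  int& live_count_;  // Test-only bookkeeping for this sketch.
};

class Frame {
 public:
  // The frame owns every service it hands out, so destroying the frame
  // destroys the services and no back-reference can dangle.
  FrameScopedService& AddService(int& live_count) {
    services_.push_back(
        std::make_unique<FrameScopedService>(*this, live_count));
    assert(&services_.back()->frame() == this);
    return *services_.back();
  }

 private:
  std::vector<std::unique_ptr<FrameScopedService>> services_;
};
```

Contrast this with the self-owned shape, where the service deletes itself only when the other endpoint disconnects: there, a compromised renderer decides how long the object (and its pointer to the frame) lives.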
00:00 SHARON: It sounds like, yeah, avoiding ref counting, or strong ref counting, we want to generally do that because that's easy to get wrong. And probably just general good advice or good practices to not use a T* to use a global pointer.
00:00 DANIEL: Well -
00:00 SHARON: raw_ptr instead.
00:00 DANIEL: Ref counting has its place. But it's a bit tricky to use correctly. And in Chrome, we've traditionally tried to discourage it if it's not needed. And then, also, with the T* thing, with the MiraclePtr and BackupRefPtr work, I think we've actually turned on some enforcement that you can't actually have T* fields anymore.
00:00 SHARON: Oh, cool.
00:00 DANIEL: So that's an additional layer of safety, which is nice.
00:00 SHARON: Things that have changed since the first episode. Wow!
00:00 DANIEL: Yes. It's great. You can use raw_ptr or raw_ref. And you should be doing that where possible, just because that way, if you mess up, or you forget about an edge case, it turns into, hopefully, a mostly nonexploitable kind of stability bug, rather than an, oh my gosh. It's a critical-severity security bug. We must ship a fix out ASAP.
00:00 SHARON: So that's how lifetimes can cause problems. So in the case of this - so it sounds like the bad thing that will happen in this case is a general memory safety, use-after-free problem. So there's nothing necessarily Mojo-specific about what can go wrong in this case where the problems are being sync and async.
00:00 DANIEL: So yeah, it's not so much about async and sync but just remembering that the thing - like if you're implementing an interface, the other thing calling into you, whether it's a remote process or not, may be malicious, especially if it's from the renderer. We have to assume that the renderer is compromised. And that means it's better to try to structure things in a way that either Mojo will enforce invariants, or that impossible things can't happen. So one common area where we have these sort of issues is maybe something will pass like two arrays of stuff. And I don't know - say instead of passing a bunch of pixels, it passes all the reds in one array, all the greens in one array, and all the blues in one array. And then it just assumes those are the same length. That's not a safe assumption if it's coming from the renderer, so you would have to check that. But it would be better to structure code in ways that didn't require checking all these assumptions. So in this contrived case, it would be better to have a pixel type, and then have an array of pixels, because then you have to specify RGB. And it's guaranteed that you won't have an array mismatch because you won't be passing multiples of them. So just stuff like that. It's really hard to go over all the ways things can go wrong. We did try to do that. And I think the document is 20-plus pages. It's a doc of guidelines for IPCs, like what reviewers and reviewees could, in theory, look for. But it is massive. It'd be nice if it could be more compact, but I think that's kind of the nature of people can write whatever they want. And there are all sorts of creative ways to get into trouble with these sort of things.
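The contrived pixel example translates directly into types. This is a sketch of the two shapes in plain C++ rather than mojom; `Pixel` and `FromParallelArrays` are hypothetical names, and the three-array function shows the check a receiver would have to remember when the data comes from an untrusted sender.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Safer shape: one element carries all three channels, so a
// std::vector<Pixel> cannot have mismatched channel lengths.
struct Pixel {
  uint8_t r;
  uint8_t g;
  uint8_t b;
};

// Fragile shape: three parallel arrays. Nothing in the types forces them to
// be the same length, so the receiver must validate before indexing.
std::optional<std::vector<Pixel>> FromParallelArrays(
    const std::vector<uint8_t>& reds,
    const std::vector<uint8_t>& greens,
    const std::vector<uint8_t>& blues) {
  if (reds.size() != greens.size() || greens.size() != blues.size()) {
    return std::nullopt;  // Untrusted input violated the assumed invariant.
  }
  std::vector<Pixel> pixels;
  pixels.reserve(reds.size());
  for (std::size_t i = 0; i < reds.size(); ++i) {
    pixels.push_back(Pixel{reds[i], greens[i], blues[i]});
  }
  assert(pixels.size() == reds.size());
  return pixels;
}
```

With the array-of-pixels shape, the length check disappears entirely because the mismatch cannot be expressed, which is the point of structuring the interface so that impossible things can't happen.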
00:00 SHARON: Yeah. As an IPC reviewer, when you look when someone is making a change, adding, removing - maybe not removing, but adding things, what are the first things you check for when you are reviewing a new or updated IPC?
00:00 DANIEL: So the first things I will look at are the CL description and the comments in the mojom. And if I can't really figure out what the change is about from there, if I have extra time on my hands, I will go look at the bug. I will go read any design docs that were linked and try to kind of reverse engineer. But in general, that is the first thing I look for because I want to understand what they want to do at a high level. There's no point in trying to nitpick things here and there in the implementation details if the operation that's being exposed is fundamentally unsafe. If someone's writing a file system interface, and it provides the capability to read any file, and they want to pass that to the renderer, that is fundamentally unsafe. And there's no point in reviewing the implementation. So you want to review the overall high-level ideas, and make sure you understand those. That's what I personally go for because sometimes I think it's very easy, if you're writing a CL, to be, like, I know the context behind it. I'm fixing X bug or fixing Y bug. But it's easy to forget that someone else coming in reading it - the IPC reviewer is not going to know every feature like the back of their hands. And so giving them the context to be, like, oh, this is a fix for Y, and we need it because Z, really helps the review. And also having these comments in the mojom can help document constraints, or what is this going to be used for, or how will it be used, what is it expected to do, if you implement it? If you call it with - if something is nullable, you can pass nothing for it. What does that mean? Is that just an "I didn't feel like figuring out the test" kind of thing, or does it actually have some significance? Like documenting those sort of things.
00:00 SHARON: Who would do something like that and not have figured out the tests first?
00:00 DANIEL: I have never done anything like that.
00:00 SHARON: Yeah.
00:00 DANIEL: Yeah. But once those kind of high-level things are more out of the way, then it's easier to review the rest of the CL in the context of that. But without that background context, it can be quite tricky to do IPC reviews sometimes. And the other thing I would say is I would encourage people to send out reviews to IPC reviewers sooner. I kind of understand that people don't want to spam, like, the people that they're asking to review. But people, if they feel like they don't need to review it yet, they can ignore the CL until it is ready to review. But sometimes it's useful to peek in and glance and be like, yeah, this is about the right shape. I have no concerns that require immediate action. Because what's really unfortunate is if you're at the end of - I don't know - a three-week review, and you're like, oh, you shouldn't do it this way. You actually need to re-engineer this entire thing and hook it up this other different way over here. That's just not fun for anyone. It's not fun for the reviewer to give that kind of feedback. And it's not fun to get that kind of feedback either.
00:00 SHARON: Yeah. I'm sure we've all been on at least one end of this kind of interaction before, so for sure. So would you say IPC review is basically a security review for IPC? Or are you reviewing for additional stuff beyond that?
00:00 DANIEL: That's the minimal scope. Some people, depending on how familiar they are with the area, may have ideas beyond that. But the kind of expected scope - what it's expected to cover is, basically, does this IPC make sense to add? Is it safe? What are some additional things we need to consider if the sender or the receiver is malicious? And this extra layer of scrutiny is just because, historically, before we had IPC review, we actually had a lot of security bugs due to - it's really easy to write this code because day to day, you're like, oh, I'm just working in the same process. Everything is fine. I can assume that people won't violate my invariants. If I say this thing must always be called with at least one item in the array, I can assume there will always be one item in the array. But that all goes out the window if you have to assume a malicious attacker in the renderer. And so the IPC reviewer is usually just coming in more with a hostile mindset, like ways things could go wrong, basically. In that sense, very much a security review. But to be clear, it's very different from the security review for launches. That's an entirely different thing. Sometimes there might be times when an IPC reviewer is like, I don't know. This seems a bit potentially dangerous. Has this gone through any sort of launch review yet? And at that point, you might punt it to a security review. It's not super common, though.
00:00 SHARON: OK.
00:00 DANIEL: Yeah.
00:00 SHARON: OK. Yeah. Lots of reviews of all kinds. And I think what you said about the reviewer not having all the context applies to lots of reviews. In a launch review, you have so many fields you need to get approved. All of these people don't have the same context as you. And the same is true for IPC reviews. So are there any cases where something about the actual design of the Mojo interface itself went wrong that caused a problem that you can tell us about?
00:00 DANIEL: I don't think I have a prepared example.
00:00 SHARON: That's fine. It's cool.
00:00 DANIEL: We can edit one in in post-production.
00:00 SHARON: We can edit one in in post-production. So you're going to sort out an example very shortly.
00:00 DANIEL: Sure. Let's go with that.
00:00 SHARON: Yeah, let's go with that. And then moving - so best practices, any - when it comes to introducing new IPCs? So you mentioned getting review early, just a quick kind of sanity-check situation. Do you have any other tips or best practices for IPC reviews?
00:00 DANIEL: Well, you could go read the 20-plus page IPC guidelines doc and try to memorize it. I don't recommend that, though. I would say, in general, it probably comes down just to several things. It's better not to have stateful interfaces. And so what I mean by that is an interface where it's like, hey, you must call the init method before you do anything else, or else it will explode. We don't want that because that means all your other methods have to check that init has been called. And otherwise, they'll explode. Depending on who your caller is, they may or may not be trustworthy, and that sort of thing. They kind of - sorry.
00:00 SHARON: Do we want a lot of Mojo calls to generally be idempotent, too?
00:00 DANIEL: They don't need to be idempotent, necessarily. But when it's a very complex set of state transitions, that is where things can get into trouble. And obviously, there are some situations where this is unavoidable. And you'll just have to deal with it. But if you can avoid it, like if you have an init method, it might be worth it to create a factory interface. This is what I usually recommend. Obviously, it's a bit more boilerplate, and it's not the nicest always. But it can also save some headache down the road. We definitely had some IPCs in the past where this was a problem, just because malicious code could not call the init method. Or it could call it twice and cause a use-after-free. So if you can factor these out into separate interfaces, that can be a very helpful thing. And the other thing is - and I mean, it really goes along with the first - try to structure things in a way that a malicious - if the other end, if they're malicious, they can't violate the invariants. So the contrived pixel example, but also using things like struct traits, rather than having each thing be like, hey, let me validate all the data, or call a function to validate all the data, try to write struct traits if you have this sort of validation logic. And so that validation kind of happens centrally in one place. And everyone using the type doesn't need to go, I don't know - data is valid, or something. Because if someone forgets, then, boom, potential security bug. So yeah, that sort of thing. It's very general. But if we wanted to get into specifics, we would be here for a couple of days.
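The factory recommendation can be illustrated in plain C++. This is a sketch only: `Logger` and `LoggerFactory` are hypothetical classes standing in for a pair of Mojo interfaces, and the empty-prefix rule is an invented invariant chosen to show where validation lives.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical service that needs a prefix before it can do anything useful.
// There is no Init() method: an instance can only be obtained fully set up.
class Logger {
 public:
  std::string Format(const std::string& message) const {
    return "[" + prefix_ + "] " + message;
  }

 private:
  friend class LoggerFactory;
  explicit Logger(std::string prefix) : prefix_(std::move(prefix)) {
    assert(!prefix_.empty());  // Internal invariant, enforced by the factory.
  }

  const std::string prefix_;
};

// The factory is the only way to get a Logger, so "method called before
// init" and "init called twice" cannot be expressed by any caller.
class LoggerFactory {
 public:
  // Validates untrusted input in one place; returns nullptr on bad input.
  std::unique_ptr<Logger> Create(std::string prefix) const {
    if (prefix.empty()) {
      return nullptr;
    }
    return std::unique_ptr<Logger>(new Logger(std::move(prefix)));
  }
};
```

The cost is the extra boilerplate mentioned above, a second type whose only job is construction. The benefit is that none of `Logger`'s methods need an "has init been called?" check, and the validation sits in one central spot rather than in every user of the type.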
00:00 SHARON: OK, OK, a couple of days, all right. I think we might have lost people after at least the second day. I think we might.
00:00 DANIEL: Yeah.
00:00 SHARON: Yeah. And then moving on from that now, mostly a personal question, sometimes you have a function. It's a Mojo call. You click it, and there are no callers, like in Code Search, I mean. So why are there no callers? Why are they not shown? Does it mean I can just delete this interface? OpenURL, who needs that?
00:00 DANIEL: OK. Yeah. So if you want to find out what's calling a Mojo method, the most reliable way is to go to the mojom definition first, and then click - get the cross references from there. And the reason for this is because, I guess, it's a quirk. I don't know what you want to call it.
00:00 SHARON: A feature.
00:00 DANIEL: A feature, yeah, we'll go with that. It sounds nicer. When we generate the C++ definitions for a mojom type, like an interface or struct, we actually generate two, what's called, variants. So one is - I call it the regular variant. It uses STL types, like std::string, std::map, all the fun things that you're normally - sorry - base::flat_map. It doesn't use std::map. But you get the idea. It's all the kind of regular container types. And the other variant is what's called the Blink variant. And Blink uses WTF::String. It has its own hash map type, its own vector type, et cetera. And so if you have a Blink variant of an interface, when you pass arrays, it'll be passed as WTF::Vector. And you're probably like, why did we do this? Why are we hurting ourselves?
00:00 SHARON: [INAUDIBLE] like WTF Mojo.
00:00 DANIEL: Yeah, something like that. And the idea behind this is we already had to do a conversion in the past. The way things worked is we handled IPCs in the content layer, like in content/renderer, or chrome/renderer, or whatever. But then we had to pass the data across what's called the Blink public API. And the Blink public API would take all these STL types and marshal them into the WTF types. And that means copying a bunch of string data or copying a bunch of vectors or maps or whatever. And so it's not great from an efficiency perspective. So we were like, well, we have to deserialize this data already for Mojo. So why don't we just turn it into the right type to begin with? So that's kind of what that's all about. So the problem with this is, especially if you're in Blink, or in Content Browser, or something, if you click on a Mojo - like on a call that you know is a Mojo call, it will find the callers to that variant. So if you're on the browser side, there might - sorry.
00:00 SHARON: OK. When you say the Mojo file, there are - typically, there's the .mojom file, and there's like .mojom.h. So you mean the first?
00:00 DANIEL: Yeah, I mean the first. Don't look at the generated files for Code Search.
00:00 SHARON: In general.
00:00 DANIEL: It's because of this feature with variants that sometimes you'll kind of get zero callers. But actually, your caller's in content, but you're handling it in Blink - yeah, it's a mess.
00:00 SHARON: Yeah, all right. Because I've done that before, where I click a function. I don't realize it's a Mojo call because it's overriding something. And it's not immediately obvious. And you're like, oh, no one's calling it. We should just remove it. But it's something that's very long and very clearly important looking.
00:00 DANIEL: Yeah, yeah, yeah.
00:00 SHARON: And you're like, why are there no callers? Good tip! All right, I think that is all of our questions. If someone watched this and was like, wow, Mojo, this is so cool. Where can they go to learn more? We'll link the long 20-page doc and some other documentation. But beyond that, what can people do if they're just like, I love me some IPC?
00:00 DANIEL: Well, I think one thing that's in pretty shabby shape perpetually is the documentation for Mojo. We have tried to sort of incrementally improve it. We did sit down and try to write docs for it a while back. But over time, I think people have questions. And we haven't always had the time to go back and update the documentation to reflect the questions people are having. But if you do have questions, please, always ask them. There's a chromium-mojo mailing list for public questions. There's a chrome-mojo one for internal questions. And there's also the Mojo channel on the Slack. If you have questions, if you're hitting weird compile errors with struct traits, I know that's always kind of a big mess. Please, please, do ask questions. There's usually someone lurking on there who's happy to help with -
00:00 SHARON: They're all very helpful.
00:00 DANIEL: But don't be silent. Because if you're silent, we don't know things are a problem. And if we don't know it's a problem, it's kind of hard to fix. But in general, we do try. Reach out. Mojo is not supposed to be intentionally hard to use. And if you do find that's the case, please, ask us, because people who work on Mojo don't always understand the tricky parts. They're like, oh, this all makes sense. But they already have that entire framework in their mind. Whereas someone kind of coming into it is kind of like, this makes no sense. This is dumb. We should - why doesn't it work like X? And then we might change it to work like X, or we might update the documentation to be like, it can't work like X because some reason. And that's just helpful for everyone in the long run.
00:00 SHARON: I mean, as people often say, if you're new, you have perspective, which is you are seeing this. You're not just used to how it works, including the good and the bad parts. So yeah, it's a good time to ask questions. All right, well, that sounds great. Thank you very much, Daniel. Thank you for being here on the show. And we will see you all -
00:00 DANIEL: Thank you!
00:00 SHARON: next time. Cool, cool. We're relatively centered. No.