Problem #2: copying the scene is expensive
The second problem with this approach is that it requires serializing large parts of the document before sending them to the plugin.
It turns out that people can create very, very large documents in Figma to the point of hitting memory limits. For example, on Microsoft’s design systems file (which we spent a month optimizing last year), it took 14 seconds just to serialize the document and send it to the plugin, before the plugin could even run. Given that most plugins are going to involve quick actions like “swap two items in my selection”, this would make plugins unusable.
Loading the data incrementally or lazily also isn’t really an option, because:
- It would involve months of re-architecting the core product.
- Any API that may need to wait on a piece of data that hasn’t arrived yet will now be asynchronous.
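To make the second point concrete, here is a minimal sketch of how a quick action like “swap two items” changes shape once any read might have to wait for data. The names `swapPositionsSync`, `LazyNode`, and `swapPositionsAsync` are invented for illustration and are not part of Figma's API.

```javascript
// Hypothetical sketch: none of these names are Figma's real API.

// Fully loaded document: property reads are plain synchronous lookups.
function swapPositionsSync(a, b) {
  const ax = a.x;
  a.x = b.x;
  b.x = ax;
}

// Lazily loaded document: any read may need to wait for data to arrive,
// so every accessor returns a Promise and every caller must be async.
class LazyNode {
  constructor(x) {
    this._x = x;
  }
  getX() {
    // Simulate waiting on data that has not arrived yet.
    return new Promise((resolve) => setTimeout(() => resolve(this._x), 0));
  }
  setX(value) {
    this._x = value;
    return Promise.resolve();
  }
}

async function swapPositionsAsync(a, b) {
  const [ax, bx] = await Promise.all([a.getX(), b.getX()]);
  await Promise.all([a.setX(bx), b.setX(ax)]);
}
```

The asynchronous version does the same work, but the `await`s spread to every function that touches the document, which is the usability cost described above.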
In summary, because Figma documents can contain very large amounts of data with a lot of interdependencies, this approach wasn’t going to work for us.
With that approach ruled out, we had to backtrack in our research.
We went back to the drawing board and spent two long weeks discussing a variety of approaches. As the simple solution didn’t work out, we had to give serious consideration to more exotic ideas. There were many — too many to fill the margins of this blog post.
But most approaches had one or more major disqualifying flaws:
- Have an API that would be too difficult to use (e.g. accessing the document using a REST API or a GraphQL-like method)
- Depend on browser features that browser vendors have removed or are trying to remove (e.g. synchronous XHR plus a service worker, shared buffers)
- Require significant research work or re-architecting of our application that could take months before we can even validate that it can work (e.g. load a copy of Figma in an iframe and sync via CRDTs, hack green threads into JavaScript with generators by cross-compiling)
At the end of the day, we concluded that we had to find a way to create a model where plugins can directly manipulate the document. Writing a plugin should feel like a designer automating their actions. So we knew we’d have to allow plugins to run on the main thread.
Implications of running on the main thread
Before we dive into Attempt #2, we need to take a step back and re-examine what it means to allow plugins to run on the main thread. After all, we didn’t consider it at first because we knew that it could be dangerous. Running on the main thread sounds an awful lot like eval(UNSAFE_CODE).
The benefits of running on the main thread are that plugins can:
- Directly edit the document rather than a copy of it, eliminating loading time issues.
- Run our complex component updating and constraints logic without needing to have two copies of that code.
- Make synchronous API calls in situations where you’d expect a synchronous API. There would be no confusion with loading or flushing updates.
- Be written in a more intuitive way: plugins are just automating actions that the user would otherwise do manually using our UI.
However, now we have these problems:
- Plugins can hang, and there is no way to interrupt a plugin.
- Plugins can make network requests as figma.com.
- Plugins can access and modify global state. This includes modifying our UI, creating dependencies on internal application state outside the API, or doing downright malicious things like changing the value of ({}).__proto__, which poisons every new and existing JavaScript object.
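The last point is easy to reproduce. The sketch below is a demonstration only (the function name and the `isAdmin` property are made up for illustration): it writes a property through `({}).__proto__`, checks that objects created both before and after the write see it, and then cleans up.

```javascript
// Demonstration only: shows why shared global state is dangerous.
function demonstratePrototypePoisoning() {
  const existing = {}; // created before the "plugin" runs

  // What a malicious plugin could do on the main thread:
  ({}).__proto__.isAdmin = true;

  const created = {}; // created after the "plugin" runs
  const result = {
    existingPoisoned: existing.isAdmin === true,
    newPoisoned: created.isAdmin === true,
  };

  // Undo the damage so the rest of the program is unaffected.
  delete Object.prototype.isAdmin;
  return result;
}
```

Both flags come back `true` because `({}).__proto__` is `Object.prototype`, which every plain object inherits from.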
We decided that we could drop the first requirement, the ability to interrupt a hung plugin. When plugins freeze, it affects the perceived stability of Figma. However, our plugin model works such that plugins only ever run on explicit user action. By changing the UI when a plugin runs, freezes would always be attributed to the plugin. It also means that it is not possible for a plugin to “break” a document.
What does it mean for eval to be dangerous?
To deal with the issue of plugins being able to make network requests and access global state, we must first understand exactly what it means that “eval-ing arbitrary JavaScript code is dangerous”.
If a variant of JavaScript, let’s call it SimpleScript, had only the ability to do arithmetic such as 7 * 24 * 60 * 60, it would be quite safe to eval.
You can add some features to SimpleScript like variable assignment and if statements to make it more like a programming language, and it would still be very safe. At the end of the day, it still essentially boils down to doing arithmetic. Add function evaluation, and now you have lambda calculus and Turing completeness.
In other words, JavaScript doesn’t have to be dangerous. In its most reductionist form, it’s merely an extended way of doing arithmetic. What is dangerous is when it has access to input and output. This includes network access, DOM access, etc. It’s the browser APIs that are dangerous.
And APIs are all global variables. So hide the global variables!
Hiding the global variables
Now, hiding the global variables sounds good in theory, but it’s difficult to create secure implementations by merely “hiding” them. You might consider, for example, removing all properties on the window object, or setting them to null, but the code could still get access to global values such as ({}).constructor. It would be very challenging to find all the possible ways in which some global value might leak.
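Here is a minimal sketch of one such leak. The helper `runWithHiddenGlobals` is a hypothetical naive sandbox written for this illustration: it shadows a few well-known global names, yet untrusted code can still climb from an object literal to the `Function` constructor and from there to the real global object.

```javascript
// Naive "sandbox" (hypothetical): shadow well-known global names with
// undefined by declaring them as function parameters.
function runWithHiddenGlobals(code) {
  const hidden = ["window", "globalThis", "self", "document", "fetch"];
  const fn = new Function(...hidden, code);
  return fn(); // every hidden name is undefined inside `code`
}

// Direct access is blocked...
const direct = runWithHiddenGlobals("return typeof globalThis;");

// ...but ({}).constructor is Object, and Object.constructor is Function.
// A function built by the Function constructor runs in the real global
// scope, so `this` inside it is the real global object.
const leaked = runWithHiddenGlobals(
  'return ({}).constructor.constructor("return this")();'
);
```

In this sketch `direct` is the string `"undefined"`, while `leaked` is the very global object the sandbox tried to hide.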
Rather, we need some stronger form of sandboxing where those global values never existed in the first place.
Consider the previous example of a hypothetical SimpleScript that only supports arithmetic. It’s a straightforward CS 101 exercise to write an arithmetic evaluation program. In any reasonable implementation of this program, SimpleScript would simply be unable to do anything other than arithmetic.
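As a sketch of that exercise, here is one possible evaluator for a SimpleScript limited to numbers, the four arithmetic operators, and parentheses. The function name and the grammar are assumptions made for this illustration. Nothing in it looks up variables or globals, so a script has no way to reach beyond arithmetic.

```javascript
// A minimal SimpleScript evaluator: numbers, + - * /, and parentheses.
// It has no access to variables, globals, or I/O by construction.
function evaluateSimpleScript(source) {
  const tokens = source.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
  // Reject any character the grammar does not know about.
  if (tokens.join("") !== source.replace(/\s+/g, "")) {
    throw new SyntaxError("Unexpected character in input");
  }
  let pos = 0;

  // expression := term (("+" | "-") term)*
  function parseExpression() {
    let value = parseTerm();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm() {
    let value = parseFactor();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := number | "(" expression ")" | "-" factor
  function parseFactor() {
    const token = tokens[pos++];
    if (token === "(") {
      const value = parseExpression();
      if (tokens[pos++] !== ")") throw new SyntaxError("Expected ')'");
      return value;
    }
    if (token === "-") return -parseFactor();
    if (token === undefined || !/^\d/.test(token)) {
      throw new SyntaxError("Expected a number");
    }
    return Number(token);
  }

  const result = parseExpression();
  if (pos !== tokens.length) throw new SyntaxError("Unexpected trailing input");
  return result;
}
```

Calling `evaluateSimpleScript("7 * 24 * 60 * 60")` returns 604800, while an input such as `fetch(1)` is rejected with a `SyntaxError` because `fetch` is simply not part of the language.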
Now, expand SimpleScript to support more language features until it becomes JavaScript, and this program is called an interpreter, which is how JavaScript, a dynamic interpreted language, is run.
Implementing JavaScript is too much work for a small startup like ours. Instead, to validate this approach, we took Duktape, a lightweight JavaScript interpreter written in C, and compiled it to WebAssembly.
To confirm that it works, we ran test262, the standard JavaScript test suite, on it. It passes all ES5 tests except for a few unimportant failures. To run plugin code with Duktape, we would call the eval function of the compiled interpreter.
What are the properties of this approach?
- This interpreter runs in the main thread, which means we can create a main-thread based API.
- It’s secure in a way that’s easy to reason about. Duktape does not support any browser APIs — and that’s a feature! Furthermore, it runs as WebAssembly which itself is a sandboxed environment that has no access to browser APIs. In other words, plugin code can communicate with the outside world only through explicit whitelisted APIs by default.
- It’s slower than regular JavaScript since this interpreter is not a JIT, but that’s ok.
- It requires the browser to compile a medium-size WASM binary, which has some cost.
- Browser debugging tools don’t work by default, but we spent a day implementing a console for the interpreter to validate that it’d be at least possible to debug plugins.
- Duktape only supports ES5, but it’s already common practice in the web community to cross-compile newer JavaScript versions using tools such as Babel.
(Aside: a few months later, Fabrice Bellard released QuickJS which supports ES6 natively.)
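To illustrate the “explicit whitelisted APIs” property from the list above, here is a hypothetical host-side bridge. It is not Duktape's or Figma's actual interface, and `createHostBridge`, `callHost`, and the `getSelectionCount` entry are invented names. The idea is only that the interpreter hands the host an API name plus arguments, and the host refuses anything not in its table.

```javascript
// Hypothetical host-side bridge; all names are illustrative only.
function createHostBridge(allowedApis) {
  // A Map built from own enumerable entries avoids accidental hits on
  // inherited properties such as "constructor" or "__proto__".
  const table = new Map(Object.entries(allowedApis));
  return function callHost(name, ...args) {
    if (!table.has(name)) {
      throw new Error(`API not allowed: ${name}`);
    }
    return table.get(name)(...args);
  };
}

// Example: the only capability exposed to plugin code is one function.
const callHost = createHostBridge({
  getSelectionCount: () => 2,
});
```

With this shape, a request for `getSelectionCount` succeeds, and a request for anything else, such as `fetch` or `constructor`, throws before any host code runs.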
Now, compiling a JavaScript interpreter! Depending on your inclinations or aesthetics as a programmer, you might either think:
THIS IS AWESOME!
or
…really? A JavaScript engine in a browser that already has a JavaScript engine? What next, an operating system in a browser?
And some amount of suspicion is healthy! It is best to avoid re-implementing the browser unless we absolutely have to. We already spent a lot of effort implementing an entire rendering system. It was necessary for performance and cross-browser support, and we are glad we did it, but we still try not to re-invent the wheel.
This is not the approach we ended up going with. There’s an even better approach. However, it was important to cover as a step towards understanding our final sandboxing model which is more complicated.
While we had a promising approach compiling a JS interpreter, there was one more tool to look at. We found a technology called the Realms shim created by the folks at Agoric.
This technology describes creating a sandbox and supporting plugins as a potential use case. A promising description! The Realms API looks roughly like this: