My team and I are building a GraphQL service that wraps multiple RESTful JSON services. The GraphQL server connects to backend services such as Zendesk, Salesforce, and even Shopify itself.
Our use case involves returning results from several of these backend services in a single GraphQL query. When the GraphQL server queries them, each backend service can take multiple seconds to respond, and a query that takes many seconds to complete is a terrible experience.
Since we’re running the GraphQL server in Ruby, we don’t get the asynchronous IO that comes with the Node.js version of GraphQL. Because of this, the GraphQL resolvers run serially instead of in parallel, so a GraphQL query that fetches from five backend services, each taking one second to respond, takes five seconds to run.
For our use case, a GraphQL query that takes five seconds is a bad experience. We would prefer two seconds or less. This means optimizing how GraphQL makes the HTTP requests to the backend services. Our idea is to parallelize those HTTP requests.
To parallelize those HTTP requests we took a look at non-blocking HTTP libraries, different GraphQL resolvers, and Ruby concurrency primitives.
Knowing that running the HTTP requests in parallel was the direction to explore, we first took a look at the Ruby library Typhoeus. Typhoeus offers a simple abstraction for performing parallel HTTP requests by wrapping the C library libcurl. One of the many possible ways to use it is to queue several requests on a Hydra and run them all together.
After playing around with Typhoeus, we quickly found out that it wasn’t going to work without extending the GraphQL Ruby library. It became clear that it was nontrivial to wrap a GraphQL resolver’s life cycle with a Hydra from Typhoeus. A Hydra is basically a Future that runs multiple HTTP requests in parallel and returns when all requests are complete.
We also took a look at GraphQL Ruby’s lazy execution features. We had hoped that lazy execution would automatically optimize by running resolvers in parallel. It didn’t. Oh well.
We also tried a contorted version of lazy execution. I can’t remember why or how we came up with this method, but it was clearly overcomplicated for no good reason and didn’t work 😆
Threads and Futures
We looked back and understood the shortcomings of the earlier methods – namely, we had to find a concurrency method that would allow us to do the HTTP requests in the background without blocking the main thread until it needed the data. Based on this understanding we took a look at some Ruby concurrency primitives – both Futures (from the Concurrent Ruby library), and Threads.
I highly recommend using higher-level concurrency primitives such as Futures because of their well-defined and simple APIs, but to hastily hack something together and see if it would work, I experimented with Threads.
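To make the serial-versus-parallel difference concrete, here is a minimal sketch that uses only Ruby's standard library. It is not the Threads experiment from this post: the service list, the 0.2-second delay, and the helpers fetch_from, timed, fetch_serially, and fetch_with_threads are all made up for illustration, and sleep stands in for the HTTP round trip.

```ruby
SERVICES = %w[zendesk salesforce shopify].freeze
DELAY = 0.2 # seconds; made-up latency per backend

# Stand-in for a slow backend call; sleep plays the role of the HTTP round trip.
def fetch_from(service)
  sleep(DELAY)
  "#{service} data"
end

# Runs the block and returns [block result, elapsed seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Serial: each call waits for the previous one to finish.
def fetch_serially
  SERVICES.map { |service| fetch_from(service) }
end

# Threaded: start every call first, then collect the results.
def fetch_with_threads
  threads = SERVICES.map { |service| Thread.new { fetch_from(service) } }
  threads.map(&:value) # Thread#value waits for the thread and returns its block's result
end

_serial_results, serial_elapsed = timed { fetch_serially }
_threaded_results, threaded_elapsed = timed { fetch_with_threads }

puts format("serial:   %.2fs", serial_elapsed)
puts format("threaded: %.2fs", threaded_elapsed)
```

The serial version should take roughly the sum of the delays, while the threaded version should take roughly as long as the slowest single call, because a sleeping (or IO-bound) Ruby thread lets other threads run.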
My teammate ended up figuring out a working example of Futures faster than I could hack my Threads example together. (I’m glad they did, for reasons we’ll see next.) Their prototype used Futures directly inside the GraphQL resolvers.
It’s not clear at first, but according to the GraphQL Ruby docs, any GraphQL resolver can either return the data or return something that can later return the data. We use the latter by returning a Concurrent::Future from each resolver and declaring lazy_resolve(Concurrent::Future, :value!) in the GraphQL schema. This means that when a resolver returns a Concurrent::Future, the lazy_resolve part tells GraphQL Ruby to call :value! on the future when it really needs the data.
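The "return the data, or return something that can return the data" contract can be modelled in a few lines of plain Ruby. The sketch below is a toy, not GraphQL Ruby's implementation: ToyExecutor and its sync method are hypothetical names, and a stdlib Thread (forced with value) stands in for Concurrent::Future so the example runs without any gems.

```ruby
# Toy model of a lazy-resolve registry. Not GraphQL Ruby's actual code.
class ToyExecutor
  def initialize
    @lazy_methods = {} # class => name of the method that yields the real value
  end

  # Loosely mirrors declaring lazy_resolve(SomeClass, :method_name) in a schema.
  def lazy_resolve(klass, method_name)
    @lazy_methods[klass] = method_name
  end

  # Returns plain values untouched; forces objects whose class was registered.
  def sync(value)
    _klass, method_name = @lazy_methods.find { |klass, _| value.is_a?(klass) }
    method_name ? value.public_send(method_name) : value
  end
end

executor = ToyExecutor.new
executor.lazy_resolve(Thread, :value) # Thread plays the role of Concurrent::Future

plain = "already here"
lazy = Thread.new do
  sleep(0.1) # pretend this is a slow HTTP request
  "fetched in the background"
end

puts executor.sync(plain) # returned as-is
puts executor.sync(lazy)  # blocks until the thread finishes, then returns its result
```

The point of the registry is that resolvers never need to know who will force their result or when; they only need to return an object of a class the executor has been told how to unwrap.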
What does all of this mean? When GraphQL goes to fulfill a query, all the resolvers involved with the query quickly spawn Futures that start executing in the background. GraphQL then moves to the phase where it builds the result. Since it now needs the data from the Futures, it calls the potentially blocking operation value! on each Future.
The beautiful thing here is that we don’t have to worry about whether the Futures have finished fetching their data yet. This is because of the powerful contract we get with using Futures: the call to value! (or even just value) will block until the data is available.
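The blocking contract and the two phases described above can be sketched with a small stand-in built on Thread. TinyFuture is a hypothetical class written for this illustration; it only imitates the part of Concurrent::Future discussed here, where value returns nil if the work failed and value! re-raises the error. The resolver names and payloads are made up.

```ruby
# Minimal stand-in for a future, built on Thread. A sketch of the contract only;
# Concurrent::Future runs on a thread pool and offers much more.
class TinyFuture
  def initialize(&block)
    @thread = Thread.new do
      begin
        [:ok, block.call]
      rescue StandardError => e
        [:error, e]
      end
    end
  end

  # Blocks until the work finishes; returns nil if it failed.
  def value
    status, payload = @thread.value
    status == :ok ? payload : nil
  end

  # Blocks until the work finishes; re-raises the error if it failed.
  def value!
    status, payload = @thread.value
    raise payload if status == :error

    payload
  end
end

# Phase 1: every "resolver" starts its work and returns immediately.
futures = {
  tickets: TinyFuture.new { sleep(0.2); ["ticket-1"] },
  orders: TinyFuture.new { sleep(0.2); ["order-7"] }
}

# Phase 2: building the result forces each future, waiting only if it is still running.
result = futures.transform_values(&:value!)
puts result.inspect
```

Because both futures start in phase 1, the wait in phase 2 is roughly the length of the slowest request rather than the sum of all of them.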
We ended up settling on the last design: using Futures to let the main thread push as much asynchronous work as possible into the background.
As our thought process shows, all we needed was a way to start a long-running HTTP request and give control back to the main thread as fast as possible. Our early ideas around concurrent HTTP request libraries (Typhoeus) made it clear that we were on the right track, but we didn’t yet fully understand the problem.
Part of that was not understanding the GraphQL Ruby library. Part of it was also being fuzzy on our concurrency primitives and libraries. Once we had taken a look at GraphQL Ruby’s lazy execution features, it became clear to us that we needed to kick off the HTTP request and immediately give control back to the GraphQL Ruby library. Once we understood this, the solution became clear, and a few prototypes that used Futures made us confident in it.
I enjoyed the problem-solving process we went through, as well as the writing that resulted from it. The process taught both of us some valuable engineering lessons about collaborative, up-front prototyping and design, since neither of us could have achieved this outcome alone. Additionally, writing about this success can help others facing the same problem and introduce them to the different techniques we met along the way.