Instrumenting Thread Stalling in Ruby Applications
In my previous post about how IO-bound Rails applications really are, I pointed at a common pitfall: CPU starvation can look like slow IOs.
start = Time.now
database_connection.execute("SELECT ...")
query_duration = (Time.now - start) * 1000.0
puts "Query took: #{query_duration.round(2)}ms"
In the above example, the instrumentation is meant to tell you how long it took for the database to answer your query, but the measured duration may also include the time needed for the Ruby thread to re-acquire the GVL, for the Ruby GC to run, or even for the operating system’s scheduler to resume the process.
Thankfully, in recent years Ruby added some new APIs that help measure these things.
GC.total_time
Database queries and other IOs can often result in lots of allocations. For instance, if you select 100 database rows with a dozen columns each, you can probably expect a thousand or more allocations, and any of these might trigger a GC run inside the code block you are timing.
As such, it can be a good idea to keep an eye on how much time Ruby is spending in GC. To help with that, back in 2021 I resurrected an old feature request on the Ruby bug tracker and convinced Koichi Sasada to implement the new GC.total_time API.
This accessor is a monotonic counter that represents the number of nanoseconds spent in the GC. So to tell how much time a particular block of code spent in GC, you can do a simple subtraction:
def time_in_gc
  before = GC.total_time
  yield
  diff_ms = (GC.total_time - before) / 1_000_000.0
  puts "gc_time: #{diff_ms.round(2)}ms"
end

time_in_gc do
  2_000_000.times { Object.new }
end
gc_time: 24.18ms
Now of course, if you are running a multi-threaded application, you can’t just subtract the time spent in GC from the measured IO time, because another thread might be responsible for it. But it’s still a good idea to instrument it and display it next to the IO duration.
That’s why, starting from Rails 7.2, I added this measurement to the Rails instrumentation API.
Every ActiveSupport::Notifications event now has an associated gc_time, and Rails request logs include the overall time spent in GC.
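For instance, here’s a rough sketch of how you could surface it yourself; I’m assuming Rails 7.2 or later here, and that gc_time is reported in milliseconds like duration is:

# Subscribing with a single-argument block yields an
# ActiveSupport::Notifications::Event object.
ActiveSupport::Notifications.subscribe("sql.active_record") do |event|
  Rails.logger.debug(
    "sql duration=#{event.duration.round(2)}ms gc_time=#{event.gc_time.round(2)}ms"
  )
end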
GVL Instrumentation API
Even more common than GC is GVL contention. If you configure your application to use too many threads, a thread may have to wait a long time before it can resume after finishing some IOs.
That’s why in Ruby 3.2 I added a new C API to allow instrumenting the GVL.
This is quite a low-level API, and you need a C extension to integrate with it, but I wrote [gvltools](https://github.com/Shopify/gvltools) for that purpose, John Hawthorn wrote the gvl_timing gem, and there’s also gvl-tracing from Ivo Anjo.
Here’s how gvltools can be used to distinguish actual IO time from GVL wait time:
require "bundler/inline"

gemfile do
  gem "bigdecimal" # for trilogy
  gem "trilogy"
  gem "gvltools"
end

GVLTools::LocalTimer.enable

def measure_time
  realtime_start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond)
  gvl_time_start = GVLTools::LocalTimer.monotonic_time
  yield
  realtime = Process.clock_gettime(Process::CLOCK_MONOTONIC, :float_millisecond) - realtime_start
  gvl_time = GVLTools::LocalTimer.monotonic_time - gvl_time_start
  gvl_time_ms = gvl_time / 1_000_000.0
  io_time = realtime - gvl_time_ms
  puts "realtime: #{realtime.round(2)}ms, gvl_wait: #{gvl_time_ms.round(2)}ms, io: #{io_time.round(1)}ms"
end
trilogy = Trilogy.new

# Measure a first time with just the main thread
measure_time do
  trilogy.query("SELECT 1")
end

def fibonacci(n)
  return n if (0..1).include?(n)
  fibonacci(n - 1) + fibonacci(n - 2)
end

# Spawn 5 CPU-heavy threads
threads = 5.times.map do
  Thread.new do
    loop do
      fibonacci(25)
    end
  end
end

# Measure again with the background threads
measure_time do
  trilogy.query("SELECT 1")
end
If you run this example, you can see that on the first measurement, the GVL wait time is pretty much zero, but on the second, it adds a massive half-a-second overhead:
realtime: 0.22ms, gvl_wait: 0.0ms, io: 0.2ms
realtime: 549.29ms, gvl_wait: 549.22ms, io: 0.1ms
The downside of this API, however, is that it adds some overhead to Ruby’s thread scheduler. I never really managed to come up with a precise figure for how much overhead; perhaps it’s negligible, but until I do, it’s a bit hard to justify integrating it as a Rails default.
That being said, recently Yuki Nishijima from Speedshop open-sourced a middleware to hook this new instrumentation API into various APM services, so it might progressively see broader usage.
Operating System Scheduler
The one remaining thing that could cause IO operations to appear longer than they really are is the operating system’s scheduler. Unless you are running your application on dedicated hardware and spawning no more than one Ruby process per core, it can happen that the operating system doesn’t immediately resume a process after it is done blocking on some IO.
I’m unfortunately not aware of a really good way to measure this.
The best I’ve found is /proc/<pid>/schedstat on Linux:
# cat /proc/1/schedstat
40933713 1717706 178
The second number in that list is the number of nanoseconds the given process spent in the “runqueue”, in other words, waiting to be assigned a CPU core so it can resume work.
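If you wanted to read it from Ruby, here’s a minimal sketch; the helper name is something I made up, and it assumes a Linux /proc filesystem:

# /proc/self/schedstat contains three fields: time spent on-CPU (ns),
# time spent waiting in the runqueue (ns), and the number of timeslices run.
def runqueue_wait_ms
  _on_cpu_ns, wait_ns, _timeslices = File.read("/proc/self/schedstat").split
  wait_ns.to_i / 1_000_000.0
end

puts "runqueue_wait: #{runqueue_wait_ms.round(2)}ms"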
But reading /proc around every IO would be a bit heavy-handed, so it’s not something I’ve ever integrated into application monitoring. Instead, we monitor it more globally, on a per-machine basis, as an indication that we’re running too many processes in our containers.
Conclusion
So while it’s very hard to measure exactly how long a particular IO operation took, there are a few general metrics that can help you get a better picture of how much “thread stalling” your application is experiencing.
Also, I don’t keep a close eye on the various application performance monitoring services out there, but my hope when adding these new APIs was that they would eventually integrate them into their offerings. I’m not sure whether any have done so already.