What's wrong with the JSON gem API?
As I mentioned at the start of my Optimizing Ruby’s JSON series of posts, performance isn’t why I put myself forward as the gem’s new maintainer.
The actual reason is that the gem has many APIs that I think aren’t very good, and some that are outright dangerous.
As a gem user, it’s easy to be annoyed at deprecations and breaking changes. They’re noisy and create extra work, so I entirely understand that people may suffer from deprecation fatigue. And while you occasionally run into mostly cosmetic deprecations that aren’t really worth the churn they cause (which annoys me a lot too), most of the time there’s a good reason for them. That reason is just very rarely conveyed to users, and even more rarely discussed, so let’s do that for once.
So I’d like to go over some of the API changes and deprecations I’ve already implemented or will likely implement soon, since it’s a good occasion to explain why each change is valuable, and to talk about API design more broadly.
Dealing With Deprecations in Ruby
But before I delve into deprecated APIs, I’d like to mention how to effectively deal with deprecations in modern Ruby.
Since Ruby 2.7, warning messages emitted with Kernel#warn are categorized, and one of the available categories is :deprecated. By default, deprecation warnings are silenced; to display them, you must enable the :deprecated category like so:
Warning[:deprecated] = true
It is highly recommended to do so in your test suite, so much so that Rails and Minitest do it by default.
However, if you are using RSpec, you’ll have to do it yourself in your spec_helper.rb file: we’ve tried to get RSpec to do it too for over four years now, without success. But I’m still hopeful it will eventually happen.
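In the meantime, a one-line addition at the top of your spec_helper.rb does the trick:

# spec_helper.rb
Warning[:deprecated] = true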
Another useful thing to know about Ruby’s Kernel#warn method is that under the hood it calls the Warning.warn method, allowing you to redefine it and customize its behavior.
For instance, you could turn warnings into errors like this:
module Warning
  def warn(message, ...)
    raise message
  end
end
Doing so both ensures warnings aren’t missed and helps track them down, as you’ll get an exception with a full backtrace rather than a warning that points at a single call site that may not necessarily help you find the problem.
This is a pattern I use in most of my own projects, and one I also included in Rails’ own test suite.
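If raising for every warning is too blunt, here’s a sketch of a more selective variant (my own, not something the gem ships) that only raises for the :deprecated category, which Warning.warn has received as a keyword since Ruby 3.0:

# Deprecation warnings only reach Warning.warn when the :deprecated
# category is enabled (Warning[:deprecated] = true, as shown above).
Warning.singleton_class.prepend(Module.new do
  def warn(message, category: nil)
    if category == :deprecated
      raise message # turn deprecations into hard errors with a backtrace
    else
      super # let every other warning behave as usual
    end
  end
end)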
For larger projects, where being deprecation-free all the time may be complicated, there’s also the more sophisticated deprecation_toolkit gem.
The create_additions Option
Now, let’s start with the API that convinced me to request maintainership.
Do you know the difference between JSON.load and JSON.parse?
There’s more than one, but the main difference is that JSON.load has a different set of options enabled by default, notably one that is a massive footgun: create_additions: true.
This option is so bad that RuboCop’s default set of rules bans JSON.load outright for security reasons, and it has been involved in more than one security vulnerability.
Let’s dig into what it does:
require "json"
class Point
class << self
def json_create(data)
new(data["x"], data["y"])
end
end
def initialize(x, y)
@x = x
@y = y
end
end
document = <<~'JSON'
  {
    "json_class": "Point",
    "x": 123.456,
    "y": 789.321
  }
JSON
p JSON.parse(document)
# => {"json_class" => "Point", "x" => 123.456, "y" => 789.321}
p JSON.load(document)
# => #<Point:0x00000001007f6d08 @x=123.456, @y=789.321>
So what the create_additions: true parsing option does is that when the parser notices an object with the special key "json_class", it resolves the constant and calls .json_create on it with the object.
By itself, this isn’t really a security vulnerability, as only classes with a .json_create method can be instantiated this way. But if you’ve been using Ruby for a long time, this may remind you of similar issues with gems like YAML, where similar capabilities were exploited.
That’s the problem with these sorts of duck-typed APIs: they are way too global.
You can have a piece of code using JSON.load that is perfectly safe on its own, but if it’s embedded in an application that also loads some other piece of code that defines some .json_create methods you weren’t expecting, you may end up with an unforeseen vulnerability.
But even if you don’t define any json_create methods, the gem will always define one on String:
>> require "json"
>> JSON.load('{"json_class": "String", "raw": [112, 119, 110, 101, 100]}')
=> "pwned"
Here again, you’d need to find some specific circumstances to exploit this, but you can probably see how the trick could be used to bypass a validation check of some sort.
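As a sketch of what that could look like (the safe_load wrapper and its blocklist check are hypothetical):

require "json"

# Hypothetical guard that scans the raw payload before loading it.
def safe_load(payload)
  raise ArgumentError, "forbidden value" if payload.include?("pwned")
  JSON.load(payload)
end

# The blocklisted string never appears in the raw payload...
safe_load('{"json_class": "String", "raw": [112, 119, 110, 101, 100]}')
# => "pwned" -- ...yet create_additions reassembles it from byte values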
So what do I plan to do about it? Several things.
First, I deprecated the implicit create_additions: true option. If you use JSON.load for that feature, a deprecation warning will be emitted, asking you to use JSON.unsafe_load instead:
require "json"
Warning[:deprecated] = true
JSON.load('{"json_class": "String", "raw": [112, 119, 110, 101, 100]}')
# /tmp/j.rb:3: warning: JSON.load implicit support for `create_additions: true`
# is deprecated and will be removed in 3.0,
# use JSON.unsafe_load or explicitly pass `create_additions: true`
That being said, given how wonky this feature is, I’m also considering extracting it into another gem.
This used to be impossible, as it was baked deep into both the C and Java parsers, but I recently refactored it into pure Ruby code using a callback exposed by the parsers.
Now you can provide a Proc to JSON.load, and the parser will invoke it for every parsed value, allowing you to substitute one value for another:
cb = ->(obj) do
  case obj
  when String
    obj.upcase
  else
    obj
  end
end

p JSON.load('["a", {"b": 1}]', cb)
# => ["A", {"B" => 1}]
Prior to that change, JSON.load already accepted a Proc, but its return value was ignored.
The nice thing is that this callback now also serves as a much safer and more flexible way to handle the deserialization of rich objects. For instance, you could implement something like this:
types = {
  "range" => MyRangeType
}

cb = ->(obj) do
  case obj
  when Hash
    if type = types[obj["__type"]]
      type.load(obj)
    else
      obj
    end
  else
    obj
  end
end

p JSON.load('{"__type": "range", "min": 1, "max": 10}', cb)
# => #<MyRangeType ...> (assuming MyRangeType.load builds one from the hash)
While this requires more code from the user, it gives much tighter control over the deserialization. More importantly, it isn’t global anymore: if a library uses this feature to deserialize trusted data, its callback is never going to be invoked by another library, unlike with the old Class#json_create API.
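A quick illustration of that locality, reusing the upcasing callback from earlier:

upcase = ->(obj) { obj.is_a?(String) ? obj.upcase : obj }

p JSON.load('["a"]', upcase) # => ["A"]
p JSON.load('["a"]')         # => ["a"] -- other callers are unaffected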
The obvious solution would have been to follow the same route as YAML, with its permitted_classes argument, but in my opinion, it wouldn’t have addressed the root of the problem, and it makes for a very unpleasant API to use.
Instead, I believe this Proc interface provides the same functionality as before, but in a way that is both more flexible and safer.
I think this is a clear case for deprecation, given it is very rarely needed, has security implications, and surprises users.
Parsing of Duplicate Keys
Another behavior of the parser I recently deprecated is the treatment of duplicate keys. Consider the following code:
p JSON.parse('{"a": 1, "a": 2}')["a"]
What do you think it should return? You could argue that the first key or the last key should win, or that this should result in a parse error.
Unfortunately, JSON is a bit of a “post-specified” format, in the sense that it started as an extremely simple document. All that document says about “objects” is:
An object is an unordered set of name/value pairs. An object begins with { and ends with }. Each name is followed by : and the name/value pairs are separated by ,.
That’s it; that’s the extent of the specification. As you can see, there is no mention of what a parser should do if it encounters a duplicate key.
Later on, various standardisation bodies tried to specify JSON based on the implementations out there.
Hence, we now have IETF’s STD 90, also known as RFC 8259, which states:
Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.
In other words, it acknowledges most implementations return the last seen pair, but doesn’t prescribe any particular behavior.
There’s also the ECMA-404 standard:
The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.
Which is pretty much the specification-language equivalent of: 🤷‍♂️.
The problem with under-specified formats is that they can sometimes be exploited, the classic example being HTTP request smuggling.
And while it wasn’t an exploitation per se, a security issue at HackerOne happened in part because of this behavior. Technically, the bug was on the JSON generation side, but if the JSON gem’s parser didn’t silently accept duplicate keys, they would have caught it early in development.
That’s why, starting from version 2.13.0, JSON.parse accepts a new allow_duplicate_key: keyword argument, and if not explicitly allowed, a deprecation warning is emitted whenever a duplicate key is encountered:
require "json"
Warning[:deprecated] = true
p JSON.parse('{"a": 1, "a": 2}')
# => {"a" => 2}
# /tmp/j.rb:4: warning: detected duplicate key "a" in JSON object.
# This will raise an error in json 3.0 unless enabled via `allow_duplicate_key: true`
# at line 1 column 1
As mentioned in the warning message, I plan to change the default behavior to be an error in the next major version, but of course it will always be possible to explicitly allow for duplicate keys, for the rare cases where it’s needed.
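For those rare cases, passing the option explicitly keeps the current last-key-wins behavior without the warning:

p JSON.parse('{"a": 1, "a": 2}', allow_duplicate_key: true)
# => {"a" => 2}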
Here again, I think this deprecation is justified: duplicate keys are rare, but also almost always a mistake. I expect few people will need to change anything, and the ones who do will likely learn about a previously unnoticed mistake in their application.
The to_json And to_s Methods
Before you gasp in horror, don’t worry: I don’t plan on deprecating the Object#to_json method, ever. It is way too widespread for that to ever be acceptable.
But that doesn’t mean this API is good, nor that nothing should be done about it.
At the center of the json gem’s API, there’s the notion that objects can define for themselves how they should be serialized into JSON by responding to the to_json method.
At first sight, it seems like a perfectly fine API: an interface that objects can implement, fairly classic object-oriented design.
Here’s an example that changes how Time objects are serialized. By default, json will call #to_s on objects it doesn’t know how to handle:
>> puts JSON.generate({ created_at: Time.now })
{"created_at":"2025-08-02 13:03:32 +0200"}
But we can instruct it to instead serialize Time using the ISO 8601 / RFC 3339 format:
class Time
  def to_json(...)
    iso8601(3).to_json(...)
  end
end
>> puts JSON.generate({ created_at: Time.now })
{"created_at":"2025-08-02T13:05:04.160+02:00"}
This seems all well and good, but the problem, like with the .json_create method, is that this is a global behavior.
An application may very well need to serialize dates in different ways in different contexts.
Worse, in the context of a library, say an API client that needs to serialize Time in a specific way, it’s not really possible to use this API at all: you can’t assume it’s acceptable to change such a global behavior, given you know nothing about the application in which you’ll run.
So to me, there are two problems here. First, using #to_s as a fallback works for a few types, like dates, but it is really not helpful for the overwhelming majority of other objects:
>> puts JSON.generate(Object.new)
"#<Object:0x000000011ce214a0>"
I really can’t think of a situation in which this is the behavior you want. If JSON.generate ends up calling to_s on an object, I’m willing to bet that 99% of the time, the developer didn’t intend for that object to be serialized, or forgot to implement #to_json on it.
Either way, it would be far more useful to raise an error and require that an explicit way to serialize that unknown object be provided.
The second problem is that it should be possible to customize a given type’s serialization locally, instead of globally.
In addition, returning a String as a JSON fragment isn’t great either, because it means recursively calling generators, and makes it possible to generate invalid documents:
class Broken
  def to_json
    to_s
  end
end
>> Broken.new.to_json
=> "#<Broken:0x0000000123054050>"
>> JSON.parse(Broken.new.to_json)
#> JSON::ParserError: unexpected character: '#<Broken:0x000000011c9377a0>'
#> at line 1 column 1
Those are the problems the new JSON::Coder API is meant to solve.
By default, JSON::Coder will only serialize types that have a direct JSON equivalent: Hash, Array, String / Symbol, Integer, Float, true, false, and nil. Any type that doesn’t have a direct JSON equivalent produces an error:
>> MY_JSON = JSON::Coder.new
>> MY_JSON.dump({a: 1})
=> "{\"a\":1}"
>> MY_JSON.dump({a: Time.new})
#> JSON::GeneratorError: Time not allowed in JSON
But it does allow you to provide a Proc to define the serialization of all other types:
MY_JSON = JSON::Coder.new do |obj|
  case obj
  when Time
    obj.iso8601(3)
  else
    obj # return `obj` to fail serialization
  end
end
>> MY_JSON.dump({a: Time.new})
=> "{\"a\":\"2025-08-02T14:03:15.091+02:00\"}"
Contrary to the #to_json method, here the Proc is expected to return a JSON-compatible primitive, so you don’t have to concern yourself with JSON escaping rules and such, which is much safer.
But if for some reason you do need to, you still can, using JSON::Fragment:
MY_JSON = JSON::Coder.new do |obj|
  case obj
  when SomeRecord
    JSON::Fragment.new(obj.json_blob)
  else
    obj # return `obj` to fail serialization
  end
end
With this new API, it’s now much easier for a gem to customize JSON generation in a local way.
Now, as I said before, I absolutely don’t plan to deprecate #to_json, nor even the behavior that calls #to_s on unknown objects.
Even though I think it’s a bad API, and its replacement is far superior, the #to_json method has been at the center of the json gem from the beginning, and migrating away from it would require a massive amount of work from the community.
The decision to deprecate an API should always weigh the benefits against the costs. Here, the cost is so massive that it is unimaginable for me to even consider it.
load_default_options / dump_default_options
Another set of APIs I’ve marked as deprecated are the various _default_options accessors.
>> puts JSON.dump("http://example.com")
"http://example.com"
>> JSON.dump_default_options[:script_safe] = true
>> puts JSON.dump("http://example.com")
"http:\/\/example.com"
The concept is simple: you can globally change the default options received by certain methods.
At first sight, this might seem like a convenience: it allows you to set an option without having to pass it around at potentially dozens of different call sites.
But just like #to_json and other APIs, this change applies to the entire application, including dependencies that may not expect standard JSON methods to behave differently.
And that’s not hypothetical: I personally ran into a gem that was using JSON to fingerprint some object graphs, e.g.:
def fingerprint
  Digest::SHA1.hexdigest(JSON.dump(some_object_graph))
end
That fingerprinting method was well tested in the gem, and had been working well in a few dozen applications, until one day someone reported a bug in the gem. After some investigation, I figured out that the host application in question had modified JSON.dump_default_options, causing the fingerprints to be different.
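A minimal reproduction of that failure mode might look like this (the data is made up, but the mechanism is the same):

require "digest"
require "json"

data = { "url" => "http://example.com" }

before = Digest::SHA1.hexdigest(JSON.dump(data))
JSON.dump_default_options[:script_safe] = true # done by the host application
after = Digest::SHA1.hexdigest(JSON.dump(data))

before == after
# => false -- "/" is now escaped as "\/", so the serialized bytes differ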
If you think about it, these sorts of global settings aren’t very different from monkey patching:
JSON.singleton_class.prepend(Module.new {
  def dump(obj, proc = nil, opts = {})
    opts = opts.merge(script_safe: true)
    super
  end
})
The overwhelming majority of Rubyists are very aware of the potential pitfalls of monkey patching, and some absolutely loathe it; yet for some reason, these sorts of global configuration APIs don’t get frowned upon nearly as much.
In some cases, they make sense: if the configuration is for an application or a framework (a framework essentially being an application skeleton), there’s not really a need for local configuration, and a global one is simpler and easier to reason about. But in a library, which may in turn be used by multiple other libraries with different configuration needs, they’re a problem.
Amusingly, this sort of API was one of the justifications for the currently experimental namespace feature in Ruby 3.5.0dev, which shows the json gem is not the only one with this problem.
Here again, a better solution is the JSON::Coder API: if you want to centralize your JSON generation configuration across your codebase, you can allocate a singleton with your desired options:
module MyLibrary
  JSON_CODER = JSON::Coder.new(script_safe: true)

  def do_things(...)
    JSON_CODER.dump(...)
  end
end
As a library author, you can even allow your users to substitute the configuration for one of their choosing:
module MyLibrary
  class << self
    attr_accessor :json_coder
  end

  @json_coder = JSON::Coder.new(script_safe: true)

  def do_things(...)
    MyLibrary.json_coder.dump(...)
  end
end
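A host application could then swap in a coder of its own, for example:

# Hypothetical host application code overriding the library default:
MyLibrary.json_coder = JSON::Coder.new do |obj|
  obj.respond_to?(:to_h) ? obj.to_h : obj
end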
Thankfully, from what I can see of the gem’s usage, these APIs were very rarely used, so while they’re not a major hindrance, I figured the cost-benefit balance is positive. And if someone really needs to set an option globally, they can monkey-patch JSON; the effect is the same, and at least it’s more honest.
Conclusion
As mentioned previously, the decision to deprecate shouldn’t be taken lightly. It’s important to have empathy for the users who will have to deal with the fallout, and there are few things more annoying than cosmetic deprecations.
Yet it is also important to recognize when an API is error-prone or even outright dangerous, and deprecations are sometimes a necessary evil to correct course.
Also, as you probably noticed, a common theme among most of the APIs I don’t like in the json gem is global behavior and configuration.
I’m not certain why that is. Part of it might be that, as Rubyists, we value simplicity and conciseness, and that historically the community built its ethos as a reaction against overly verbose and ceremonial enterprise Java APIs, with their dependency injection frameworks and whatnot.
A bit of global state or behavior can sometimes bring a lot of simplicity, but it’s a very sharp tool that needs to be handled with extreme care.