Can Rust prevent logic errors?


I'm a Rust enthusiast, and as such I often talk about the things that make Rust appealing to me. Sometimes these conversations revolve around reliability and correctness, and then I usually mention that I feel Rust helps me write software with fewer bugs, a claim that is usually brushed off as improbable. My experience tells me otherwise, so in the absence of research, I'd like to at least offer my point of view.

Let's start with a definition of a software bug:

A software bug is an error, flaw or fault in the design, development, or operation of computer software that causes it to produce an incorrect or unexpected result, or to behave in unintended ways

When coding in Rust you quickly notice that whole categories of bugs, such as data races, are impossible to introduce in (safe) Rust.

Additionally, Rust requires variables to be explicitly marked as mutable, and it distinguishes between mutable and immutable borrows, so it's typically harder to introduce problems caused by unexpected mutations.
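A small sketch of what this looks like in practice (the function names are illustrative). Bindings are immutable unless declared with `mut`. A function can only modify a caller's value if its signature asks for a mutable borrow (`&mut`), and the caller then has to write `&mut` at the call site as well:

```rust
/// Takes a shared (immutable) borrow: it can read `scores` but not change them.
fn total(scores: &[u32]) -> u32 {
    scores.iter().sum()
}

/// Takes a mutable borrow: the signature announces that `scores` will be modified.
fn add_bonus(scores: &mut [u32], bonus: u32) {
    for score in scores.iter_mut() {
        *score += bonus;
    }
}

/// Returns the total before and after applying the bonus.
fn demo() -> (u32, u32) {
    // `mut` is required here; without it, `add_bonus(&mut scores, 5)` fails to compile.
    let mut scores = vec![10, 20, 30];
    let before = total(&scores);
    // The caller must write `&mut` explicitly, so the mutation is visible here.
    add_bonus(&mut scores, 5);
    (before, total(&scores))
}
```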

Combined, all of these features, and more, let you write applications that almost never crash. In other words, they have practically no runtime exceptions, except maybe for things like running out of memory.

I spent a large part of my career writing Ruby, where runtime exceptions are not only common but, in many big applications, practically impossible to keep under control. Pretty much every big application I've seen had an error tracker full of errors. Each individual exception in the tracker is, of course, fixable, but when you have hundreds of them you need a dedicated effort to get to "inbox zero".

The problem is, not all bugs are equal. Some bugs don't have much impact on the customer experience. A common occurrence in Rails apps is an exception that isn't handled properly and leads to a 500 error instead of a 404. Is it a big problem? Not really: the customer wouldn't have seen what they expected anyway, so not much harm done. It's similar in many other cases. Maybe it's not a big deal, maybe it happens very rarely, maybe it's a bot hitting a non-existent page. Whatever the reason, if a bug doesn't have much impact on user experience, it's often ignored. Maybe a ticket is opened to fix it, but then it often lands at the bottom of a never-ending backlog.

At some point it turns out that there are too many errors in the error tracker, and fixing them would require considerable effort. But there is no time, and besides, those are minor problems anyway. When a new error shows up in the tracker, you don't deal with it right away; you first check whether it's actually new, or even actually problematic. You can't rely on the error tracker anymore.

It's not a hypothetical problem, either. I've seen multiple outages that could have been detected much sooner if only error tracker notifications meant anything. If you have hundreds of errors happening thousands of times a day, you can't really turn on notifications for every single one. You have to periodically check whether anything worth looking into has shown up.

Imagine how much nicer it feels when you don't have runtime exceptions at all. And again, it is not hypothetical. I co-wrote a real-time production app in Rust, able to handle more than a million websocket connections on one server, that had literally zero runtime errors for more than a year. That is something I have never achieved in any other language.

What about logic errors, though?

So far I've spent quite a bit of time talking about runtime errors. And while for me it is a serious issue, many people brush it off with a statement similar to: "whatever, most bugs are logic errors anyway, so the fact that Rust prevents a few simple issues doesn't matter". The argument goes like this: there is no research showing that statically typed languages, or Rust specifically, result in fewer bugs. Yes, you may have fewer bugs of a certain type, but most bugs are caused by logic errors anyway.

At this point it's worth differentiating between logic errors and runtime errors. When a runtime error happens, an application either crashes or returns an error to the user. Logic errors are a bit sneakier, as they produce an incorrect result without crashing the program. Look at this function for example:

fn add(a: u64, b: u64) -> u64 {
    a - b
}

There is clearly a logic error here, and it can't be easily found by conventional tools. The function won't produce an error, but it won't do what it's supposed to do, i.e. add two unsigned integers. On top of that, it can overflow: add(0, u64::MAX) will panic in a debug build and return 1 in a release build, because overflow checks are disabled by default in release mode (you can catch this kind of thing with a clippy lint, by the way).
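Rust also lets you spell out the overflow behaviour explicitly instead of depending on the build profile. The sketch below uses only the standard `checked_*` and `wrapping_*` integer methods, and the function names are illustrative. This only makes the overflow visible. The logic error of subtracting instead of adding is still there.

```rust
/// The intended operation, with overflow reported as `None`
/// instead of a debug-build panic or a release-build wrap-around.
fn checked_add_u64(a: u64, b: u64) -> Option<u64> {
    a.checked_add(b)
}

/// What the buggy `add` does in a release build, made explicit.
fn buggy_add_wrapping(a: u64, b: u64) -> u64 {
    a.wrapping_sub(b)
}

/// The same buggy subtraction, but with the overflow surfaced as `None`.
fn buggy_add_checked(a: u64, b: u64) -> Option<u64> {
    a.checked_sub(b)
}
```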

One could argue that this specific case could still be prevented if we used a generic type instead of u64:

use std::ops::Add;

fn add<A: Add<Output = A>>(a: A, b: A) -> A {
    a - b // error: `A` has no `Sub` bound, so `-` is not allowed
}

This example will fail to compile, because A is only required to implement the Add trait, which allows using the + operator. The approach quickly loses its strength, though, if the calculation is a bit more complex. We might need more traits, and then who knows whether the code is correct or not:

use std::ops::{Add, Sub, Mul};

fn calculate<A>(a: A, b: A) -> A
where
    A: Add<Output = A> + Sub<Output = A> + Mul<Output = A> + Copy,
{
    a - b + a * b
}

Once the calculate function requires several operations like addition, subtraction and multiplication, the trait bounds permit all of them in any combination, so we can't ensure correctness just by specifying the argument types.
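A sketch that makes this concrete (the function names are made up for illustration). With `Add<Output = A>` the generic `add` accepts only `+`. Once a function's bounds allow several operators, though, a wrong formula type-checks just as well as the right one:

```rust
use std::ops::{Add, Mul, Sub};

/// Only `+` is available for `A`, so writing `a - b` here would not compile.
fn add<A: Add<Output = A>>(a: A, b: A) -> A {
    a + b
}

/// The intended formula: a - b + a * b.
fn calculate<A>(a: A, b: A) -> A
where
    A: Add<Output = A> + Sub<Output = A> + Mul<Output = A> + Copy,
{
    a - b + a * b
}

/// A mistaken formula with exactly the same bounds: it compiles without complaint.
fn calculate_wrong<A>(a: A, b: A) -> A
where
    A: Add<Output = A> + Sub<Output = A> + Mul<Output = A> + Copy,
{
    a + b - a * b
}
```

The types only restrict which operations are available, not how they are combined.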

So, clearly, Rust can't prevent all bugs (and I'm not sure anyone says otherwise!). But now the question is: can it prevent at least some logic bugs?

Expressiveness

I would argue that an expressive language can sometimes help you prevent logic errors that you would otherwise introduce. Let's take a simple for loop as an example. Consider this JavaScript code:

let animals = ["dog", "cat", "fish"];
for (let i = 0; i <= animals.length; i++) {
  console.log(animals[i]);
}

Can you spot the problem? Instead of writing i < animals.length I wrote i <= animals.length, which would run the loop 4 times, including the case where i is equal to 3. Of course there is another way to loop through an array in JavaScript, which prevents "off by one" errors:

for (let animal of animals) {
  console.log(animal);
}

As simple as this example is, I hope it clearly shows that language constructs can, in fact, sometimes prevent logic errors. Of course, one could argue that this is not a logic error, but I agree with the Wikipedia definition:

In computer programming, a logic error is a bug in a program that causes it to operate incorrectly, but not to terminate abnormally (or crash). A logic error produces unintended or undesired output or other behaviour, although it may not immediately be recognized as such.

In the case of the code above, the index-based for loop would not crash; it would just print an extra line in the console saying undefined. Of course, some variations of off-by-one errors could result in a runtime exception, but that's not always the case.
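For comparison, here is a sketch of the same loop in Rust (the helper names are illustrative). Iterating over a slice directly leaves no index to get wrong. When you do need an index, `get` returns an `Option` rather than JavaScript's silent `undefined`. Plain indexing with `animals[3]` would panic instead of printing a bogus line.

```rust
/// Iterates over the elements themselves, so an off-by-one bound can't happen.
/// Returns the printed lines so the behaviour can be checked.
fn print_animals(animals: &[&str]) -> Vec<String> {
    let mut printed = Vec::new();
    for animal in animals {
        println!("{animal}");
        printed.push(animal.to_string());
    }
    printed
}

/// Index-based access that can't silently read past the end.
fn animal_at<'a>(animals: &[&'a str], i: usize) -> Option<&'a str> {
    animals.get(i).copied()
}
```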

Digression: outside mainstream languages

This post is mainly about mainstream programming languages, and Rust specifically, but outside general-purpose mainstream languages there are lots of solutions for writing formally verified software. Two examples are Agda and Coq. There are also multiple tools for model checking and other formal verification methods that can be used alongside programming languages, like TLA+.

That's outside of the scope of this post, though, so let's get back to Rust.

Logic errors turned into type errors

When using a language like Rust, you can sometimes end up with a type-related compilation error where another language would give you a logic error at runtime.

Let's consider the sqlx library in Rust. Sqlx is a library for interacting with SQL databases, and one interesting feature it has is checking, at compile time, that the types in your code match the database. For example, imagine the following database schema in SQL:

create table authors (
  id bigserial primary key not null,
  name text
);

and the following structure in Rust:

#[derive(Serialize, Deserialize, Debug)]
struct Author {
    id: i64,
    name: String,
}

Now imagine we tried to fetch all the authors from the database and compile a list of their names:

let authors = sqlx::query_as!(Author, r#"SELECT * FROM authors"#)
        .fetch_all(db)
        .await?;

// map through the authors list returning their names and join all of the names
// with a comma
let authors_list = authors.into_iter().map(|author| author.name).collect::<Vec<String>>().join(", ");

Even though it might look correct, this code will not compile. It results in the following error:

   |
11 | let authors = sqlx::query_as!(Author, r#"SELECT * FROM authors"#)
   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `From<std::option::Option<std::string::String>>` is not implemented for `std::string::String`

Why is it happening? The error says:

the trait `From<std::option::Option<std::string::String>>` is not implemented for `std::string::String`

which is maybe not the best error message of all time, but what it means is that sqlx is trying to convert an Option<String> into a String, and there's no such conversion. Rust has no null value, so cases that could potentially produce a null usually use the Option type, which is either Some(...) or None. Because the schema doesn't declare the name column as not null, it may contain null values, so sqlx correctly maps the name column to Option<String>. The problem is that the name field in the Author struct is of type String, which can't represent a missing value. In other words, Rust forces us to decide whether we actually want the name to be optional. We have to either make it optional in the program code or add a not null constraint to disallow null values altogether.

This is a type of error I've seen countless times in my career. Someone forgets to add a not null constraint, the code everywhere assumes the column will always be present, and eventually a null value ends up in the table and breaks something. Sometimes it's a runtime error, sometimes a logic error. In this particular case, in a language like Ruby, it might result in a logic error with weird-looking output, like:

N.K. Jemisin, , George R.R. Martin
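For comparison, here is how the Rust side might look once we choose to make `name` optional. The compiler then makes us decide what a missing name means when building the list. This is a sketch without sqlx so it runs on its own. `author_names` is a hypothetical helper that skips authors whose name is NULL instead of producing an empty entry.

```rust
/// Mirrors the `authors` table: `name` is nullable, so it is an `Option`.
#[derive(Debug)]
struct Author {
    id: i64,
    name: Option<String>,
}

/// Joins the names with a comma, skipping authors whose name is NULL.
/// `as_deref` turns `&Option<String>` into `Option<&str>`.
fn author_names(authors: &[Author]) -> String {
    authors
        .iter()
        .filter_map(|author| author.name.as_deref())
        .collect::<Vec<&str>>()
        .join(", ")
}
```

Skipping is just one possible choice. Substituting a placeholder with `unwrap_or("unknown")` would be another. The point is that the choice has to be written down.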

Concurrency

As I mentioned in the beginning, you can't introduce data races in safe Rust. That alone prevents a whole class of logic errors. When writing concurrent code, it's very easy to forget to guard access to a variable shared between threads, which can introduce a data race. And in high-level languages, data races often show up as logic errors. In C a data race is undefined behavior, so you never know what will happen, but in a language like Ruby or Python a data race might result either in a runtime exception or in a wrong answer.

Here, again, we can see that the language can prevent logic errors. What's more, I think that the way Rust prevents data races (i.e. by enforcing ownership rules) makes it easier to prevent race conditions as well. Let's take a serious real-life problem that was caused by a data race and resulted in a race condition. At some point GitHub encountered a problem with users being randomly logged in as other users. Although it affected only a small percentage of users, it made for very bad PR. Imagine owning a company and learning that a random person could have looked at and cloned all of your source code. Kinda scary.

The blog post I linked explains the situation quite well, but hopefully I can summarize it here so you don't have to read it all (I recommend it, though!). The problem was caused by Unicorn, a web server for Ruby applications. For performance, it reused a hash map holding request data: whenever a request finished, the hash map was cleared and reused for the next request. As one thread can only handle one request at a time, this was perfectly safe. The problem was that, somewhere down the line, a background thread for handling exceptions was introduced in GitHub's code. That thread modified session data, also stored in the request hash map, which opened the door to a data race and a race condition: the request hash map had been passed to a separate thread while Unicorn was still holding onto it and reusing it for the next request.

The biggest issue in this situation is that even if you looked at the code in Unicorn, you couldn't have predicted this kind of behavior. Once you pass data to another method in Ruby, you have no idea where it will land and what will happen to it (and by the way, this is also true for most mainstream languages, even statically typed ones like Go, when you pass objects by reference). You can't prevent downstream code from modifying the value you pass, and you can't prevent it from passing that value to another thread.

In Rust, on the other hand, you have tools for this kind of control. If you create a request data hash map as a plain HashMap, nobody can send it to another thread for modification while you're still holding onto it, because moving it to another thread means giving up your ownership. To actually share it, it would have to be wrapped in something like Arc + Mutex, i.e. Arc<Mutex<HashMap>>. Now, I want to be perfectly clear: it is possible to introduce exactly the same bug in Rust if you try hard enough, but it would be highly unlikely, because:

Most importantly, libraries don't typically share resources behind a Mutex. They may hand you the data with ownership, but then they can't keep using it for modification afterwards. Or they can lend you the data, but with a limited lifetime, so even if it's a mutable borrow you wouldn't be able to pass it to a thread that outlives the borrow. Thus errors like the one with Unicorn are almost impossible to introduce in Rust.
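A minimal sketch of the two options described above. `RequestData`, `hand_off` and `share_with_background` are hypothetical names loosely modelled on the Unicorn story, not Unicorn's or any library's API. A plain `HashMap` can only be moved into a thread, after which the caller no longer has it. Sharing it while still using it requires an explicit `Arc<Mutex<...>>` that shows up in the types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-request data.
type RequestData = HashMap<String, String>;

/// Moving a plain map into a thread transfers ownership: the caller can't
/// reuse it for the next request until the thread hands it back.
fn hand_off(mut data: RequestData) -> RequestData {
    thread::spawn(move || {
        data.insert("session".to_string(), "from background thread".to_string());
        data
    })
    .join()
    .unwrap()
}

/// Sharing the map while still using it requires opting in explicitly,
/// for example with `Arc<Mutex<...>>`, which makes the sharing visible.
fn share_with_background(data: &Arc<Mutex<RequestData>>) {
    let shared = Arc::clone(data);
    thread::spawn(move || {
        shared
            .lock()
            .unwrap()
            .insert("session".to_string(), "shared".to_string());
    })
    .join()
    .unwrap();
}
```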

I think that the biggest value here lies in having to explicitly consider data ownership. This also makes programming in Rust harder, because you have to actually make all of those decisions, similarly to error handling. A lot of Rust examples use unwrap() just because handling all of the possible errors can get rather verbose, which shows how much we "sweep under the rug" in languages where error handling is dynamic and based on exceptions.
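A small illustration of that trade-off, using a hypothetical `parse_port` helper. `unwrap()` keeps the code short but turns bad input into a panic. Returning a `Result` and using `?` makes the possible failure part of the signature, so the caller has to decide what happens.

```rust
use std::num::ParseIntError;

/// Quick version: panics on input like "http", i.e. a runtime crash.
fn parse_port_unwrap(input: &str) -> u16 {
    input.trim().parse().unwrap()
}

/// Explicit version: the failure case is visible in the return type.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    let port = input.trim().parse::<u16>()?;
    Ok(port)
}
```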

Access control

As mentioned in the previous section, Rust has tools for controlling mutability and access. One type of bug I've seen in the wild is caused by mutating things that shouldn't be mutated, and it often results in a logic error.

It typically goes like this:

  1. Pass a non-primitive value to a function.
  2. The function needs to calculate something, and it treats the passed value as if it owns it, typically by using it as an intermediate cache or even to store the result.
  3. The calling function doesn't expect any mutation to happen.

Let's look at the following Ruby code:

def calculate_project_cost(tasks, hourly_rate)
  total_cost = 0

  tasks.each do |task|
    estimated_hours = task[:estimated_hours]

    # If a task is labeled as 'complex', increase the estimated_hours by 30%
    if task[:complexity] == 'complex'
      task[:estimated_hours] = (estimated_hours * 1.3).round(2)
    end

    total_cost += task[:estimated_hours] * hourly_rate
  end

  total_cost
end

In this function we go through the list of tasks, each described by a complexity and an estimated number of hours. To make the project estimate less risky, we want to increase the budget of each complex task by 30% before calculating the total project cost at our hourly rate. The function uses the original task as a holder for a temporary value, thus mutating it. Of course, it would be wiser to create a separate variable for the calculation, or even better to use functional-style methods like map and sum on the array. But we all know what happens when people are under time pressure: mistakes happen, and code that should never have been written inevitably gets written. I've also seen serious production issues caused by this type of bug. It's usually hard to debug, too, as the mutation can happen multiple levels deep, with the effects of the bug not obviously related to the code doing the modification.

In Rust, when you pass a variable to a function that calculates something, you typically pass only an immutable reference, or in Rust terminology, the function borrows it. A function that calculates a value but takes ownership of its input arguments or needs a mutable borrow would certainly raise some eyebrows. There are functions that calculate or generate data and accept mutable references, but usually only to store the result (for example, it's common to pass a mutable reference to an array or slice as a buffer to write to).
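Here is a sketch of the Ruby function above translated to Rust, assuming a hypothetical `Task` struct and `Complexity` enum. Because the function takes `&[Task]`, assigning to `task.estimated_hours` inside it would not compile. The padded value has to live in a local variable.

```rust
#[derive(Debug, Clone, Copy)]
enum Complexity {
    Simple,
    Complex,
}

#[derive(Debug)]
struct Task {
    estimated_hours: f64,
    complexity: Complexity,
}

/// Borrows the tasks immutably, so the caller's data stays untouched.
fn calculate_project_cost(tasks: &[Task], hourly_rate: f64) -> f64 {
    tasks
        .iter()
        .map(|task| {
            let hours = match task.complexity {
                // Pad complex tasks by 30%, rounded to two decimal places.
                Complexity::Complex => (task.estimated_hours * 1.3 * 100.0).round() / 100.0,
                Complexity::Simple => task.estimated_hours,
            };
            hours * hourly_rate
        })
        .sum()
}
```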

Some commenters pointed out that this is a contrived example, because people would most likely use map + sum instead of each. And while, yes, it is a bit contrived, I can assure you that you can absolutely find this kind of code in production. It took me 2 minutes of GitHub search to find an example in the Ruby repository itself.

If you want to see a bug caused by unwanted mutation in the wild, here's an issue in Rails and a commit where I fixed it. I know it's old, but it was easier to look through my own commits that include .dup or .freeze than through all commits to the framework, and I knew for a fact I had fixed at least a few issues like that.

The issue was that when a default model value in ActiveRecord was modified in place, the default value itself was modified too, which resulted in this unexpected behaviour:

author = Author.new
author.name # returns ""
author.name << "N.K. Jemisin" # we modify the name in place
another_author = Author.new
another_author.name # returns "N.K. Jemisin"

What's especially interesting about this particular issue is that the author of the original code already used dup on the default attributes. It just wasn't enough, because dup used on a hash will only duplicate the hash itself, not the values:

h = { foo: "" }
h1 = h.dup
h1[:foo] << "bar"
h[:foo] # returns "bar"
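In Rust, `clone()` on a `HashMap<&str, String>` clones every `String` value as well, so the equivalent of the snippet above leaves the original untouched. A minimal sketch (`clone_and_modify` is an illustrative name):

```rust
use std::collections::HashMap;

/// Clones the map and appends to the copy's value; returns (original, copy).
fn clone_and_modify() -> (HashMap<&'static str, String>, HashMap<&'static str, String>) {
    let h: HashMap<&'static str, String> = HashMap::from([("foo", String::new())]);
    // `clone` calls `String::clone` for every value, allocating new strings.
    let mut h1 = h.clone();
    if let Some(value) = h1.get_mut("foo") {
        value.push_str("bar");
    }
    (h, h1)
}
```

This depends on the value type's `Clone` implementation. A map of `Rc<RefCell<String>>` values, for instance, would share the strings after cloning, but that sharing is spelled out in the type.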

Enums

This is an easy one. In Rust, whenever we deal with a fixed set of options, chances are very high we will use an enum. In other languages it will often be just a loosely defined list of possible states. Consider the following code:

def process_payment(status)
  case status
  when :pending
    # process a pending payment
  when :authorized
    # process an authorized payment
  end
end

This works great until someone adds an additional payment status and you have to find all of the places where the payment status might be handled. And while you don't necessarily have to handle every status in every single place, there is a chance you will miss a place where you do need to handle all of them, unless the compiler yells at you. This is not a contrived example, either. I'd bet most Rails applications have at least one attribute like status stored as a symbol.

In Rust you would typically use an enum. Matching on an enum results in a compile error by default if one of the variants is not handled:

enum PaymentStatus {
    Pending,
    Authorized,
}

fn process_payment(status: PaymentStatus) {
    match status {
        PaymentStatus::Pending => todo!(),
        PaymentStatus::Authorized => todo!(),
    }
}

Of course, it is possible not to use an enum in this case, but that's rather uncommon in Rust, because enums are so easy to use. You can emulate similar behaviour in Ruby or other languages that lack enums, but people tend to use the solutions that are readily available and popular.
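To see the exhaustiveness check in action, here is a sketch in which the enum has gained a third, hypothetical `Refunded` status. Every `match` without a `_` catch-all arm now has to handle it. Deleting the `Refunded` arm below makes compilation fail with error E0004 (non-exhaustive patterns).

```rust
#[derive(Debug, Clone, Copy)]
enum PaymentStatus {
    Pending,
    Authorized,
    // Newly added variant: every exhaustive `match` must now handle it.
    Refunded,
}

fn process_payment(status: PaymentStatus) -> &'static str {
    // No `_ =>` arm on purpose, so the compiler checks that all variants are covered.
    match status {
        PaymentStatus::Pending => "processing a pending payment",
        PaymentStatus::Authorized => "processing an authorized payment",
        PaymentStatus::Refunded => "processing a refunded payment",
    }
}
```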

Beyond the language

Although the Rust language alone already gives you quite a lot of guarantees and tools for writing correct software, I would argue the value lies not only in the language. One of the examples in this post uses the sqlx library, but there are other widely used libraries that take a similar approach. For example, even though it's not in the standard library, when dealing with JSON or other serialization formats you would almost always use the serde library, which makes it easy to handle serialization in a type-safe way.

But even beyond everyday libraries, there is value in having a community that cares about correctness. There are many techniques and tools you can use to catch more bugs, up to and including proof verification.

Some popular languages don't have even the basic tools of this kind (I'm looking at you, Ruby), and what's more, a lot of these tools can't be implemented for many languages. For example, proof verification would be extremely hard (if not impossible) to implement for dynamic languages.

Conclusion

I hope you now agree with me that Rust gives you tools that help prevent at least some logic errors, especially when you write idiomatic Rust. Of course, it's still very much possible to shoot yourself in the foot in Rust, but I'd argue Rust is one of the languages with a relatively small number of footguns.