03 Jan 2019

Lints, Syntax Parsing, and You

In my previous post I ran into a number of issues and confusion around clippy’s #[clippy::author] annotation and autogenerated code. Instead of continuing with clippy’s documentation, I’m going to jump over to llogiq’s blogpost on writing clippy lints and see what I can learn about lint implementation and the necessary datatypes.

Because clippy is a Rust toolchain component, and the clippy version on https://crates.io is no longer maintained, documentation tools like docs.rs are unavailable for browsing clippy’s types. Fortunately, Rust has a very strong offline documentation story, and generating documentation for a project is as simple as cargo doc --open. This can be a little tricky to get set up when you’re halfway through implementing a feature and looking for information in the documentation though, because the project has to successfully compile as part of generating the offline documentation.

Since clippy uses Rust compiler types to implement lints, we also need to have access to the documentation for rustc, Rust’s compiler, on the nightly toolchain; this can be accessed at https://doc.rust-lang.org/nightly/nightly-rustc/rustc/index.html. Unfortunately this set of compiler documentation is not available offline which seems like a gap in Rust’s documentation story.

Steps to Writing a Clippy Lint

I’ve broken the process I went through to implement this lint into steps to more clearly separate the different parts of a lint, the tools and processes involved at each stage, and problems and errors I encountered along the way as well as solutions I found where applicable.

For quick reference, here is a list of the steps involved:

Step One: Write an Example

The first step in writing a lint, as with most code, is to write the piece that uses the code you’re going to write. If we were writing regular code this might take the form of one or more test cases, but because we’re writing a lint we want to instead write an example of the code we’re trying to lint against. I’ve already done this earlier when I tried to use the #[clippy::author] directive, so I’m going to reuse the same example while using llogiq’s blogpost as a guide for my own lint.

Here’s the small code example we want to lint against:

pub struct MyStruct {
    id: usize
}

impl MyStruct {
    pub fn get_id(&self) -> usize {
        self.id
    }
}

fn main() {
   let s = MyStruct { id: 42 };
   s.get_id();
}

Step Two: Declare the Lint

After writing the example to lint against, we need to actually define the lint; this is done using the declare_clippy_lint! macro from the clippy_lints crate. I tried compiling clippy after declaring the lint and ran into the following error:

cargo check
   Compiling clippy_lints v0.0.212
error: cannot find macro `declare_tool_lint!` in this scope
  --> clippy_lints/src/lib.rs:53:9
   |
53 |           declare_tool_lint! { pub clippy::$name, Warn, $description, report_in_external_macro: true }
   |           ^^^^^^^^^^^^^^^^^
   |
  ::: clippy_lints/src/getter_prefix.rs:3:1
   |
3  | / declare_clippy_lint! {
4  | |     pub GETTER_PREFIX,
5  | |     style,
6  | |     "prefixing a getter with `get_`, which does not follow convention"
7  | | }
   | |_- in this macro invocation

error: aborting due to previous error

error: Could not compile `clippy_lints`.
warning: build failed, waiting for other jobs to finish...
error: cannot find macro `declare_tool_lint!` in this scope
  --> clippy_lints/src/lib.rs:53:9
   |
53 |           declare_tool_lint! { pub clippy::$name, Warn, $description, report_in_external_macro: true }
   |           ^^^^^^^^^^^^^^^^^
   |
  ::: clippy_lints/src/getter_prefix.rs:3:1
   |
3  | / declare_clippy_lint! {
4  | |     pub GETTER_PREFIX,
5  | |     style,
6  | |     "prefixing a getter with `get_`, which does not follow convention"
7  | | }
   | |_- in this macro invocation

error: aborting due to previous error

error: Could not compile `clippy_lints`.

To learn more, run the command again with --verbose.

cargo can’t find the declare_tool_lint! macro in the current scope, which is not an error I was expecting to see because I didn’t think I was using that macro in my lint definition. It turns out that the definition of the declare_clippy_lint! macro uses the declare_tool_lint! macro, so this second macro must be brought into scope before the first can be used. I’m not very familiar with macros or metaprogramming in Rust, but after looking at a number of other lints it seems like all of them use the declare_tool_lint! macro so I will too.

Here is the lint code so far, including the first line which imports the dependent lint:

use crate::rustc::declare_tool_lint;

declare_clippy_lint! {
    pub GETTER_PREFIX,
    style,
    "prefixing a getter with `get_`, which does not follow convention"
}

As part of declaring the lint, we also want to document the lint by describing what it does and include some examples of code that will not pass the lint and alternatives that will pass the lint rules. By following a standard convention, documentation written for each lint can be extracted and added to the lint list which provides a filterable list of all of clippy’s lints.

Rust provides a lot of tools for documenting Rust code, the cornerstone of which is documentation comments. Documentation comments support Markdown syntax for formatting and are used to generate browsable HTML pages, without requiring the author to maintain separate written documentation. Documentation comments also drive other tools such as docs.rs.

Here is a (very rough) initial documentation comment for the lint, including examples of bad and good code samples:

/// **What it does:** Checks for the `get_` prefix on getters.
///
/// **Why is this bad?** The Rust API Guidelines section on naming
/// [specifies](https://rust-lang-nursery.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter)
/// that the `get_` prefix is not used for getters in Rust code unless
/// there is a single and obvious thing that could reasonably be gotten by
/// a getter.
///
/// **Known problems:** Exceptions not yet implemented.
///
/// **Example:**
///
/// ```rust
/// // Bad
/// impl B {
///     fn get_id(&self) -> usize {
///         ..
///     }
///}
///
/// // Good
/// impl G {
///     fn id(&self) -> usize {
///         ..
///     }
/// }
/// ```

Step Three: Register the Lint with Clippy

This is a largely automated process thanks to some clippy tooling as described in the How Clippy Works section of the contribution documentation. Simply (re-)run util/dev update_lints as necessary, which autogenerates the majority of clippy_lints/src/lib.rs to declare lints and lint groups.

Alternatively, this can be done manually by adding the lint to the correct lint group inside the register_plugins function.

Regardless of how the lint is added to the lint groups, the lint must be registered with the lint registry by introducing the lint as either an early or late lint pass. In the case of this lint, which will be implemented as an early lint pass, the following line adds it to the lint registry as declared inside the register_plugins function:

reg.register_early_lint_pass(box naming::GetterPrefix);

Step Four: Start Implementing Lint Passes

Now that the lint is defined and registered with clippy, we can start implementing the logic to lint against the example we implemented in Step One.

For most lints there are two traits that need to be implemented so the lint can be registered with rustc_plugin::registry::Registry: the LintPass trait and one of either the EarlyLintPass or the LateLintPass trait. LintPass is apparently necessary to provide descriptions of the possible lints the lint can emit, but in all of the lints I have looked at, the LintPass implementation always takes the form:

impl LintPass for Pass {
    fn get_lints(&self) -> LintArray {
        lint_array!(LINT_NAME)
    }
}

so I’m surprised there hasn’t been a #[derive] annotation written for it or some kind of macro that would reduce the repetition.

As for EarlyLintPass or LateLintPass, the choice of which trait to implement comes down to the kind of information a given lint needs about the code it is linting: EarlyLintPass methods only provide abstract syntax tree (AST) information, whereas LateLintPass methods are, as the name implies, executed later in the compilation process and contain type information. Since this lint is only interested in function names and checking them against known patterns, I’ve decided to implement the EarlyLintPass trait. There seems to be a lot of overlap in method signatures between the two types of lint pass, so if I need to switch to LateLintPass to get access to additional type information it should not be too difficult to transition over.

Step Five: Inspect and Interpret the AST

Most everything up until now has been boilerplate for getting the lint set up and correctly registered with clippy’s infrastructure. Now that we have everything set up, we need to figure out how to tell the lint to match against the undesired function names we’ve already written in our test. To do this, we need to first understand how the rustc compiler sees the code we have written, which for rustc is represented as an abstract syntax tree (AST), and then parse that tree to match against undesired nodes within the tree that correspond to the Rust source code. To retrieve this representation, we can use rustc to generate the AST and then inspect it manually to get a sense of the program structure; rustc has the -Z option for controlling various debug options (see rustc --help), and one of these options tells rustc to print the AST as JSON and halt compilation: -Z ast-json.

The full command to print the AST as JSON, specifying the individual test for this lint whose code we want to inspect, is:

$ rustc tests/ui/naming.rs -L target/debug -Z ast-json

The initial output of this command is utterly unreadable because rustc prints it as a single line of JSON, so I’m going to use jq to pretty-print the AST so it’s easier to read:

{
    "module": {
        "inner": {
            "lo": 426,
            "hi": 611
        },
        "items": [
            [snip 1604 lines]
        ],
        "inline": true
    },
    "attrs": [],
    "span": {
        "lo": 426,
        "hi": 611
    }
}

In all, the AST for my 12-line test file ended up expanding to a 1627-line (pretty-printed) JSON file. The items key and its 1604 lines are where the actually interesting AST information is in the data structure, containing information about every identifier, implementation, attribute, etc in the code. Within the items array, the name of the function in the test, get_id, appears 5 times in various contexts such as "variant": "Impl" and "variant": "MethodCall". The following is what I believe to be the beginning of the AST for the get_id function:

{
    "id": 20,
    "ident": "get_id",
    "vis": {
        "node": "Public",
        "span": {
        "lo": 485,
        "hi": 488
        }
    },
    "defaultness": "Final",
    "attrs": [],
    "generics": {
        "params": [],
        "where_clause": {
            "id": 21,
            "predicates": [],
            "span": {
                "lo": 0,
                "hi": 0
            }
        },
        "span": {
            "lo": 0,
            "hi": 0
        }
    },
    ...
}

Specifically, the following are the nodes that describe the pub fn get_id text literals of the function signature:

"tokens": [
    {
        "variant": "Token",
        "fields": [
            {
                "lo": 485,
                "hi": 488
            },
            {
                "variant": "Ident",
                "fields": [
                    "pub",
                    false
                ]
            }
        ]
    },
    {
        "variant": "Token",
        "fields": [
            {
                "lo": 489,
                "hi": 491
            },
            {
                "variant": "Ident",
                "fields": [
                    "fn",
                    false
                ]
            }
        ]
    },
    {
        "variant": "Token",
        "fields": [
            {
                "lo": 492,
                "hi": 498
            },
            {
                "variant": "Ident",
                "fields": [
                    "get_id",
                    false
                ]
            }
        ]
    },
]

Now that the relevant sections of the AST have been identified, we can use this information to get a better understanding of how rustc (and thus tools like clippy) understand the written Rust code in the source file. This information can be useful at many different stages of lint implementation, particularly if we need to debug out lint or we are not matching the expected nodes in our lint code.

Step Six: Implement EarlyLintPass

The EarlyLintPass trait requires the implementation of one of its provided methods to perform the lint work. Looking at the list of provided methods, however, I’m not really sure where to start or which method to implement. A lot of rustc’s internals seem to be chronically under-documented, which makes it very difficult to understand how different internals are used (what’s the difference between check_item and check_item_post, for example?) or the subtle (or not so subtle) differences between various types or methods.

My strategy in learning about the rustc internals so far has been a combination of looking at existing lints and trying to identify what the internals do based on the lint’s goals and existing code, as well as inspecting not the EarlyLintPass methods themselves but rather the associated types such as syntax::ast::Item or syntax::ast::Local which are generally better documented than the methods that use them. These associated types also match very closely if not exactly to the AST structure produced by rustc, so by identifying the structure of the information in the AST as above, it is easier to choose the correct function to implement based on its associated type which contains the same information as the already-identified AST subtree(s).

Based on my understanding, it looks like the method that I want to implement is check_item which will yield syntax::ast::Item instances whose syntax::ast::ItemKind we can then match on for function declarations.

My initial implementation of the EarlyLintPass trait based on the above yielded the following:

impl EarlyLintPass for GetterPrefix {
    fn check_item(&mut self, cx: &EarlyContext<'_>, item: &ast::Item) {
        if let ast::ItemKind::Fn(..) = item.node {
            let name = item.ident.name;
            if name.as_str().starts_with("get_") {
                span_lint(cx, GETTER_PREFIX, item.span, "prefixing a getter with `get_` does not follow naming conventions");
            }
        }
    }
}

This implementation checks each Item in the AST and looks for nodes identified as ItemKind::Fn; that is, functions declared in the source. For matching nodes, the name is extracted from the node identifier and is checked to see if it starts with the get_ prefix. The span_lint function comes from clippy’s lint utils and is what actually contextualizes lint warnings and errors around the offending code and outputs the lint message.

However, it turns out that if let ast::ItemKind::Fn(..) only matches the main function name of the test case, rather than my expectation that it would match all functions declared within the file.

Instead, since we are interested in functions implemented for a type, we can use the check_impl_item method which will yield syntax::ast::ImplItem instances whose syntax::ast::ImplItemKind we can then match on for methods like so:

impl EarlyLintPass for GetterPrefix {
    fn check_impl_item(&mut self, cx: &EarlyContext<'_>, implitem: &ast::ImplItem) {
        if let ast::ImplItemKind::Method(..) = implitem.node {
            let name = implitem.ident.name;
            if name.as_str().starts_with("get_") {
                span_lint(
                    cx,
                    GETTER_PREFIX,
                    implitem.span,
                    "prefixing a getter with `get_` does not follow naming conventions"
                );
            }
        }
    }
}

The above implementation works very similarly to the initial implementation, but instead of checking Items looking for ItemKind::Fn, it instead looks at ImplItems in the AST and matches ImplItemKind::Method nodes (since implementation nodes also include const declarations, types and type aliases, traits, and macros in addition to methods).

To test an individual lint without the full clippy test harness (or to see println!s or other debugging statements more clearly), we can use the following clippy-driver incantation and specify a single UI test file, tests/ui/naming.rs:

$ CLIPPY_TESTS=true cargo run --bin clippy-driver -- -L ./target/debug tests/ui/naming.rs

This lint pass implementation results in the following (successful) lint warning:

warning: prefixing a getter with `get_` does not follow naming conventions
  --> tests/ui/naming.rs:15:5
   |
15 | /     pub fn get_id(&self) -> usize {
16 | |         self.id
17 | |     }
   | |_____^
   |
   = note: #[warn(clippy::getter_prefix)] on by default
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#getter_prefix

Step Seven: Generate the .stderr File for the New Lint

To test its behaviour, clippy uses UI tests to check that the output of the compiler is exactly as expected. The .stderr file clippy will use to check the newly implemented lint is automatically generated whenever the tests are run using cargo test, but rather than running all of the tests at this point we can run just the test for the individual lint by specifying TESTNAME=ui/naming where ui/naming is the UI test to run:

$ TESTNAME=ui/naming cargo test --test compile-test

To update the .stderr (and .stdout, if applicable) files in tests/ui/, we use the provided update script (the correct incantation for an individual lint can be found in the output of cargo test with a specified TESTNAME as we did above):

$ tests/ui/update-references.sh 'target/debug/test_build_base' 'naming.rs'

This will create the tests/ui/naming.stderr file for the lint.

Step Eight: Iterate

Now that the bare bones of the lint is implemented and it has stderr output as expected, the lint can be iterated on to add more functionality and identify possible false positives that need to be mitigated.

In the case of this lint, I added test cases that cover the exceptions to the naming rule and then introduced an additional check in the lint to see if the matched function name appeared in this exceptions list:

const ALLOWED_METHOD_NAMES: [&'static str; 5] = [
    "get",
    "get_mut",
    "get_unchecked",
    "get_unchecked_mut",
    "get_ref"
];

impl EarlyLintPass for GetterPrefix {
    fn check_impl_item(&mut self, cx: &EarlyContext<'_>, implitem: &ast::ImplItem) {
        if let ast::ImplItemKind::Method(..) = implitem.node {
            let name = implitem.ident.name.as_str().get();
            if name.starts_with("get_") && !ALLOWED_METHOD_NAMES.contains(&name) {
                span_lint(
                    cx,
                    GETTER_PREFIX,
                    implitem.span,
                    "prefixing a getter with `get_` does not follow naming conventions",
                );
            }
        }
    }
}

Step Nine: Run the Full Test Suite

Up until now I have only been running tests for the lint I have been working on to speed up the feedback cycle. Before moving on, I want to run the complete test suite to make sure that the tests still pass with my additions, and if any existing lints need to be changed to conform to the new lint.

In doing so I discovered a get_unit function defined in the unused_unit lint, but since this definition was localized to one lint file I was able to change the function name to pass the getter prefix lint and still maintain the lint’s original functionality. Changing this function name also meant that the tests/ui/unused_unit.stderr file was out of date, which was updated using the provided tests/ui/update-all-references.sh script.

Step Ten: Linting Clippy with Local Changes

The last step before submitting a pull request to rust-lang/rust-clippy is to make sure that all lints that have already been defined pass clippy (that is, there are no suggestions reported by clippy for its own codebase). Running clippy locally and addressing any issues found ahead of submitting a pull request will cut down on the feedback cycle and speed up the pull request review process. The recommended way to do this is by building clippy and then running it with all lint groups (including internal and pedantic) turned on:

$ cargo build
$ `pwd`/target/debug/cargo-clippy clippy --all-targets --all-features -- -D clippy::all -D clippy::internal -D clippy::pedantic

I was not able to run this command successfully, as it resulted in what appear to be dynamic linker errors on my machine:

$ clippy/target/debug/cargo-clippy --all-targets --all-features -- -D clippy::all -D clippy::internal -D clippy::pedantic
error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `clippy/target/debug/clippy-driver rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro` (signal: 6, SIGABRT: process abort signal)
--- stderr
dyld: Library not loaded: @rpath/librustc_driver-b630426988dbbdb0.dylib
  Referenced from: clippy/target/debug/clippy-driver
  Reason: image not found

However, even without running clippy locally, in the case of this getter prefix lint clippy would not pass local clippy. It turns out that there a number of rustc methods that do not follow the API naming convention, and I will need to consult with other clippy developers to find out if we will need to add more exceptions to the lint than we had originally identified. The one method that is particularly problematic is LintPass::get_lints because it appears in every lint that has already been defined in clippy.

Submitting the Lint

Here’s the implemented lint in its entirety:

The UI test file tests/ui/naming.rs:

// Copyright 2018 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

pub struct MyStruct {
    id: usize
}

impl MyStruct {
    pub fn get_id(&self) -> usize {
        self.id
    }

    pub fn get(&self) -> usize {
        self.id
    }

    pub fn get_mut(&mut self) -> usize {
        self.id
    }

    pub fn get_unchecked(&self) -> usize {
        self.id
    }

    pub fn get_unchecked_mut(&mut self) -> usize {
        self.id
    }

    pub fn get_ref(&self) -> usize {
        self.id
    }
}

fn main() {
   let mut s = MyStruct { id: 42 };
   s.get_id();
   s.get();
   s.get_mut();
   s.get_unchecked();
   s.get_unchecked_mut();
   s.get_ref();
}

The lint implementation clippy_lints/src/naming.rs:

// Copyright 2018 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use crate::rustc::lint::{EarlyContext, EarlyLintPass, LintArray, LintPass};
use crate::rustc::{declare_tool_lint, lint_array};
use crate::syntax::ast;
use crate::utils::span_lint;

/// **What it does:** Checks for the `get_` prefix on getters.
///
/// **Why is this bad?** The Rust API Guidelines section on naming
/// [specifies](https://rust-lang-nursery.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter)
/// that the `get_` prefix is not used for getters in Rust code unless
/// there is a single and obvious thing that could reasonably be gotten by
/// a getter.
///
/// The exceptions to this naming convention are as follows:
/// - `get` (such as in
///   [`std::cell::Cell::get`](https://doc.rust-lang.org/std/cell/struct.Cell.html#method.get))
/// - `get_mut`
/// - `get_unchecked`
/// - `get_unchecked_mut`
/// - `get_ref`
///
/// **Known problems:** None.
///
/// **Example:**
///
/// ```rust
/// // Bad
/// impl B {
///     fn get_id(&self) -> usize {
///         ..
///     }
/// }
///
/// // Good
/// impl G {
///     fn id(&self) -> usize {
///         ..
///     }
/// }
///
/// // Also allowed
/// impl A {
///     fn get(&self) -> usize {
///         ..
///     }
/// }
/// ```
declare_clippy_lint! {
    pub GETTER_PREFIX,
    style,
    "prefixing a getter with `get_`, which does not follow convention"
}

#[derive(Copy, Clone)]
pub struct GetterPrefix;

#[rustfmt::skip]
const ALLOWED_METHOD_NAMES: [&'static str; 5] = [
    "get",
    "get_mut",
    "get_unchecked",
    "get_unchecked_mut",
    "get_ref"
];

impl LintPass for GetterPrefix {
    fn get_lints(&self) -> LintArray {
        lint_array!(GETTER_PREFIX)
    }
}

impl EarlyLintPass for GetterPrefix {
    fn check_impl_item(&mut self, cx: &EarlyContext<'_>, implitem: &ast::ImplItem) {
        if let ast::ImplItemKind::Method(..) = implitem.node {
            let name = implitem.ident.name.as_str().get();
            if name.starts_with("get_") && !ALLOWED_METHOD_NAMES.contains(&name) {
                span_lint(
                    cx,
                    GETTER_PREFIX,
                    implitem.span,
                    "prefixing a getter with `get_` does not follow naming conventions",
                );
            }
        }
    }
}

Once the basic lint functionality was completed, I wanted to open a pull request against rust-lang/rust-clippy as soon as possible. This is my first time writing a lint for clippy, so I wanted to start getting feedback from other clippy developers and incorporate their suggestions to improve the lint code as well as my understanding of how clippy works. Once I got the lint working I also realized there were a number of decisions to be made where I did not have enough Rust ecosystem or clippy-specific knowledge to answer, so opening a pull request would be a great way to get those questions answered within the context of the code I had written for the lint.

My pull request for this getter prefix name lint can be found at rust-lang/rust-clippy#3616 which details the current state of the lint implementation and its progress getting merged into clippy.

There is still some work to be done before I feel confident this lint can be merged. First, I need to go through clippy more thoroughly and identify existing lints that need to be changed to pass the newly introduced lint; resolving the error I identified in Step Ten should get me well on the way to achieving this.

I would also like to implement some machine-applicable renaming suggestions that would remove the get_ prefix from method names so that functions that fail to meet this Rust API naming convention can be automatically fixed by rustfix. I’m looking into the mechanisms clippy provides to make these kinds of suggestions, and span_lint_and_sugg looks like a likely candidate for teaching rustfix about these renaming rules.

Lastly I will of course need to implement the feedback on the pull request from other clippy developers so the lint will be accepted for inclusion in clippy.

Conclusion

I’m really glad I was able to write this lint for clippy. I had found the original issue, rust-lang/rust-clippy#1673, back in October 2018 and thought it was the perfect size for getting my feet wet writing a lint. Some parts of implementing this lint turned out to be more difficult than I had anticipated, most notably around the sparseness or complete lack of rustc internals documentation which was surprising given the Rust community’s focus on writing documentation and the generally high quality documentation available across the ecosystem, but a little trial and error and some println! debugging pointed me in the right direction in the end.

A special thanks to Manish Goregaokar, Philipp Krones, Matthias Krüger, and hcpl for their input, feedback, and help at various points throughout this process, and llogiq for their blog post on writing clippy lints which was very helpful in finding my way to a working lint implementation.

There is a particular quote by Charles H. Spurgeon that I tend to associate with new years and new beginnings: “Begin as you mean to go on, and go on as you began”. The way that I began 2019 was writing Rust code, and I fully intend to go on writing Rust code for the rest of 2019 and beyond. I think that’s a worth-while New Year’s resolution, don’t you?