Squiggly Lines: Background lint- and type-checking for Clojure in emacs.

In the alternate universe where I am very honest, my résumé contains a line item for the aggregate year I've spent fiddling with init.el. It does not, however, list emacs lisp among the languages I know, because (in this alternate universe) I have too much self-respect to flaunt cut-and-paste skills.

In the present universe, this is going to be awkward, because I will have to present really crappy elisp code without apology.

What this is about.

One of the things you get in an industrial strength IDE, in exchange for giving up your favorite editor, is a sort of ghost-pairing assistant who draws squiggly red lines under your coding infelicities without requiring you to trawl through batch compilation output.
There are a number of Clojure IDEs in various stages of development, but none of them really perform this kind of magic. Cursive is moving in that direction, but in a way that may limit itself, by not taking advantage of existing checking tools and those inherent in the compiler. And it's closed source. Boo hiss.
Cider is open source and lives in emacs, but gets its Clojure insights from a sideboard Clojure process, for which it is essentially a presentation layer. This is a sensible approach that fuses efforts across many projects rather than trying to duplicate all the functionality in one parallel code base.
But it doesn't have squiggly lines.
While googling about, I realized it would not be that hard to add them.
And the general technique is sort of interesting.
But my solution needs work, is probably duplicative, and I need advice.

The code below is of course on github.

flycheck

flycheck is an extension that gives emacs the ability to highlight coding errors and warnings, in a wide variety of languages, in near real time, just as we've come to expect from fancy IDEs. Importantly,

the error checking runs in the background, without you having to request it, and
the errors show up as discreet annotations in the source buffer, with elaboration available through mouse/cursor-over.

The fancy IDEs, of course, are generally written in modern languages closely related to the languages they support, so, while IntelliJ and Eclipse are incredibly polished and impressive, they do start out with some advantages over flycheck, which has a much broader language support remit and a significantly more grizzled platform to build upon.

Unlike its predecessor, flymake, flycheck is not distributed with the base distribution of emacs, but it is easily installed from MELPA. While you're at it, it will also be helpful to install flycheck-pos-tip, which displays errors and warnings as tool tips rather than in the rather overburdened minibuffer.

Since roughly the beginning of December, flycheck has allowed relatively straightforward customization, letting you extend it beyond the mere 43 languages that it natively supports.

flycheck and cider

Most flycheck checkers are implemented using an external command that checks the code in the current buffer and emits its complaints in some parsable form. Due to the long startup times of JVM programs, and the tendency of JVM language linters to be written in the languages they lint (3 times fast please), there hasn't been any support in flycheck for such languages. To get acceptable performance, we'd really need a persistent JVM process, with ongoing two-way communication.

That sort of thing is a pain in the neck to get right, but, fortunately for us, it is also a popular Bulgarian pastime, so we have Cider. Cider uses its persistent connection to the REPL to provide a vast set of features, including code completion, documentation lookup, code browsing, etc.

It also makes Clojure errors a bit more palatable, formatting them nicely and highlighting the offending code in the source buffer. Still, the error checking experience is not quite as smooth as what one has for Java or Scala in Eclipse or IntelliJ. The main difference is that Cider doesn't disguise the batch nature of error checking operations. Errors emerge in a punctuated fashion, when you throw an exception, and they show up one at a time in a popup buffer.

Using cider's utilities for asynchronous communication with a persistent Clojure process together with flycheck's asynchronous error handling, we can now get a little closer to the IDE experience.

linting and type-checking

In a charitable mood, you might call me a core.typed enthusiast, rather than complaining that I never shut up about it. Whatever your approach, I'm probably not going to stop talking about it, because I strongly believe that strongly believing in unmitigated dynamic typing does a disservice to a computer language I have come to love. Enthusiasm aside, I don't exactly love the ritual of getting up the nerve to run (check-ns) and then bracing for an eruption of text. If I could run the check in the background periodically and gently flag the type transgressions for later review, that would be a fine thing.

Another tool I've discovered recently is eastwood, a more general purpose linter for Clojure. It does quite a lot, from detecting typos that will crash immediately at runtime

wrong-arity: Function on var #'clojure.core/map called with 1 args,
  but it is only known to take one of the following args:
  [f coll]  [f c1 c2]  [f c1 c2 c3]  [f c1 c2 c3 & colls]...

to subtler indications that you probably made a mistake

unused-ret-vals: Lazy function call return value is discarded:

to complaints that will probably feel a bit pedantic at first

    unlimited-use: Unlimited use of (clojure.walk clojure.pprint)

(because these namespaces show up in a use without :only or :refer).

In a recent release of eastwood, it became possible to invoke the linter from the REPL, which means it can be invoked by cider, and thus by flycheck.

Asynchronous everything

Emacs lisp code runs in a single thread, which means we can't just ask Clojure (or any process) to perform a task and then block on its response. Since both cider and flycheck make a living by communicating with external processes, they work with and provide tools for asynchronous callback.

cider callbacks

The canonical cider expression evaluator looks like this,

(cider-tooling-eval "(do (println \"hello, how do you do?\") 42)"
  (nrepl-make-response-handler buffer
    ;; value handler: receives the final return value of the expression
    (lambda (buffer value)
      (message "The final value should be \"42\" == %s" value))
    ;; stdout handler
    (lambda (buffer stdout)
      (message "Very well, thank you."))
    ;; stderr handler
    (lambda (buffer stderr)
      (message "Nothing here.  Move along."))
    '()))

with hooks to capture, respectively, the final return value of the expression, standard out and standard error. Since eastwood communicates via formatted output lines, while check-ns-info returns type errors in structured form, we'll get to use two of these three.

flycheck callbacks

The core of a canonical flycheck checker looks like this,

 (defun my-flycheck-checker-start (checker callback)
  (let ((buffer (current-buffer))
        (errors ()))
    (push (flycheck-error-new-at 37      ;; line
                                 1       ;; column
                                 'error
                                 "Something terrible happened."
                                 :checker  checker
                                 :buffer   buffer
                                 :filename "foo.clj")
          errors)
    (funcall callback 'finished errors)))

packing infelicities into flycheck error structures and passing them on to the provided callback.

flycheck + cider

Since all our error information is coming from evaluations of Clojure code, both the accumulation of error structures and the final invocation of the flycheck callback will occur within cider response handler callbacks, something like this,

 (defun my-flycheck-checker-start (checker callback)
  (let ((buffer (current-buffer))
        (errors ()))
    (cider-tooling-eval "(check-something)"
      (nrepl-make-response-handler buffer
        (lambda (buffer value)
          (mapc (lambda (e) (push e errors)) (parse-the-return-value value))
          (funcall callback 'finished errors))
        (lambda (buffer stdout)
          (mapc (lambda (e) (push e errors)) (parse-some-output stdout)))
        (lambda (buffer stderr)
          (message "whoops"))
        '()))))

assuming we have written parse-something-or-other functions to convert the clojure responses into error structures.

things I learned and haven't learned about callbacks

All of these anonymous callback functions would be a bit silly if they couldn't close over variables in their lexical scope, so be sure to (setq lexical-binding t) (or include -*- lexical-binding: t; -*- on the first line).
The cider stdout callback will likely be invoked more than once. I believe it gets called on every fflush, as it sometimes receives multiple lines.
The cider value callback seems to be invoked only once, so one may invoke the flycheck callback from within it.
cider-tooling-evals are queued, so I can do a whole bunch of them and count on them all having completed when I receive a value from the last.
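
To make the last two observations concrete, here is a minimal sketch that needs neither cider nor flycheck. Everything named my-fake-* is a hypothetical stand-in invented for illustration: a plain list plays the role of the eval queue, and the handlers close over a shared errors list, which only works with lexical binding turned on. It assumes, as I observed above, that responses come back in submission order.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical stand-ins only: none of this is cider's or flycheck's API.

(defvar my-fake-queue nil
  "Pending (VALUE . HANDLER) pairs, oldest first.")

(defun my-fake-eval (value handler)
  "Queue HANDLER to be called later with VALUE, like an asynchronous eval."
  (setq my-fake-queue (append my-fake-queue (list (cons value handler)))))

(defun my-fake-drain ()
  "Deliver queued responses in the order they were submitted."
  (while my-fake-queue
    (let ((job (pop my-fake-queue)))
      (funcall (cdr job) (car job)))))

(defun my-fake-checker-start (callback)
  "Queue two fake checks, then hand all results to CALLBACK exactly once."
  (let ((errors '()))
    ;; Each handler closes over ERRORS; this needs lexical binding,
    ;; because the `let' has long since exited when the handlers run.
    (my-fake-eval "type error" (lambda (v) (push v errors)))
    (my-fake-eval "lint warning" (lambda (v) (push v errors)))
    ;; Sentinel: when this handler runs, the earlier ones already have.
    (my-fake-eval "true"
                  (lambda (_v)
                    (funcall callback 'finished (reverse errors))))))
```

Calling my-fake-checker-start returns immediately with nothing reported; only when my-fake-drain delivers the queued responses does the callback fire, once, with both results.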

Invoking typed clojure and parsing its output.

Normally, one types (check-ns) in the REPL and sits back to enjoy the printed output. One can, alternatively, call check-ns-info, which returns the same information as a vector of ex-info exceptions, where the ex-data is of the form {:env {:file "foo.clj" :column 1 :line 73}} and the usual multiline output is in the exception message, which we can just merge into the :env. Cider seems to use bencode to pass data between clojure and elisp, but I got impatient trying to figure out how, so I used JSON instead:

(setq cmdf-tc "(do (require 'clojure.core.typed)
                   (require 'clojure.data.json)
                   (clojure.data.json/write-str
                     (map (fn [e] (assoc (:env (ex-data e)) :msg (.getMessage e)))
                          (:delayed-errors (clojure.core.typed/check-ns-info '%s)))))")

With the appropriate namespace formatted in, this string will form the first argument to cider-tooling-eval. In the value callback, we decode the JSON -- maps become alists and keywords become symbols -- and turn the entire thing into a list of tuples (file line column msg),

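(require 'json)

(defun get-rec-from-alist (al ks)
  "Return the values in alist AL for the keys KS, in the order of KS."
  (mapcar (lambda (k) (cdr (assoc k al))) ks))

(defun parse-tc-json (s)
  "Decode the doubly quoted JSON string S into (file line column msg) tuples."
  (let ((ws (json-read-from-string (json-read-from-string s))))
    (mapcar (lambda (w) (get-rec-from-alist w '(file line column msg))) ws)))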

for further conversion into flycheck errors:

(defun tuple-to-error (w checker buffer fname)
  "Convert W of form '(file, line, column, message) to flycheck error object.
Uses CHECKER, BUFFER and FNAME unmodified."
  (pcase-let* ((`(,file ,line ,column ,msg) w))
    (flycheck-error-new-at line column 'error msg
               :checker checker
               :buffer buffer
               :filename fname)))

Two things of which I am not proud: First, there really must be a better way to extract a cross-section of alist values. Second, the JSON string comes to us EDN quoted, which means we must remove a lot of extra \\s; hence the double json-read-from-string.
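
For what it's worth, here is a small self-contained sketch of both points, using only json.el and alist-get from stock emacs. The my-tc-* names and the sample payload are made up for illustration, and the sketch assumes the value arrives as a JSON document wrapped in one extra layer of string quoting, which is what I seem to be getting. It is one possible tidying, not necessarily the better way I'm hoping someone will point out.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'json)

(defun my-tc-decode (value)
  "Decode VALUE, a JSON document that is itself wrapped in a quoted string.
The first pass strips the outer quoting; the second parses the real JSON."
  (json-read-from-string (json-read-from-string value)))

(defun my-tc-tuples (value)
  "Turn the wrapped JSON VALUE into a list of (file line column msg) tuples."
  (mapcar (lambda (w)
            ;; With json.el's defaults, objects decode to alists keyed by symbols.
            (list (alist-get 'file w)
                  (alist-get 'line w)
                  (alist-get 'column w)
                  (alist-get 'msg w)))
          (my-tc-decode value)))

;; Made-up sample: build the doubly quoted payload with json-encode
;; rather than escaping it by hand.
(defvar my-tc-sample
  (json-encode
   "[{\"file\":\"foo.clj\",\"line\":73,\"column\":1,\"msg\":\"Type mismatch\"}]")
  "A hypothetical wrapped payload, for experimenting in the scratch buffer.")
```

Evaluating (my-tc-tuples my-tc-sample) gives one tuple per error map, in the order the keys are listed in my-tc-tuples.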

The entire type-check wraps up to,

    (cider-tooling-eval (format cmdf-tc ns)
     (nrepl-make-response-handler buffer
      (lambda (_buffer value)
        (message "Finished core.typed check.")
        (mapc (lambda (w) (push (tuple-to-error w checker buffer fname) errors))  (parse-tc-json value)))
      (lambda (_buffer out))
      (lambda (_buffer err))
      '()))

where we're using only one cider hook, so most of the code is do-nothing boilerplate.

At this point, we haven't sent the error list to flycheck, because this can only be done once, and there are more checks to be done.

Invoking eastwood and parsing its output.

Eastwood speaks in formatted output lines like

the/file/name.clj:472:10:a message of some sort

They're easy to parse, at least if one makes the lazy assumption that file names never contain colons.

(setq cmdf-ew "(do (require 'eastwood.lint)
    (eastwood.lint/eastwood {:source-paths [\"src\"] :namespaces ['%s] } ))")
(defun parse-ew (out)
  "Parse eastwood output OUT into a list of (file line column msg) tuples."
  (let ((r "^\\([^[:space:]]+\\):\\([[:digit:]]+\\):\\([[:digit:]]+\\):[[:space:]]*\\(.*\\)"))
    (delq nil
          (mapcar (lambda (s)
                    (when (string-match r s)
                      (list (match-string 1 s)                     ;; file
                            (string-to-number (match-string 2 s))  ;; line
                            (string-to-number (match-string 3 s))  ;; col
                            (match-string 4 s))))                  ;; msg
                  (split-string out "\n")))))

Once the errors have been parsed into tuples, we do almost the same thing as we did for core.typed,

    (cider-tooling-eval (format cmdf-ew ns)
     (nrepl-make-response-handler
      buffer
      (lambda (_buffer _value) (message "Finished eastwood check."))
      (lambda (_buffer out)
        (mapc (lambda (w) (push (tuple-to-error w checker buffer fname) errors))
              (parse-ew out)))
      (lambda (_buffer err))
      '()))

with the main difference being that the action occurs in the stdout callback.
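
Since the stdout callback can fire several times, one thing I have not verified is whether a chunk can end in the middle of a line. If it can, parsing each chunk on its own would lose or mangle that line. A hedged sketch of a guard, in which my-make-line-splitter is a hypothetical helper and not part of cider: buffer the trailing partial line and pass along only complete lines.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helper, not part of cider or flycheck.

(defun my-make-line-splitter (on-line)
  "Return a function that accepts output chunks.
It calls ON-LINE once for each complete line, holding back any
trailing partial line until a later chunk finishes it."
  (let ((pending ""))
    (lambda (chunk)
      (let* ((text (concat pending chunk))
             (lines (split-string text "\n")))
        ;; The last element is \"\" if the chunk ended on a newline,
        ;; otherwise it is an unfinished line to keep for next time.
        (setq pending (car (last lines)))
        (dolist (line (butlast lines))
          (funcall on-line line))))))
```

The returned function could be called from the stdout handler in place of parsing each chunk directly, with on-line doing the regexp match for a single line. Whether the extra bookkeeping is needed depends on how the output really gets chunked, which I don't know.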

etc

The final cider-tooling-eval does nothing but detect that the others have run and then passes the errors back to flycheck for display.

    (cider-tooling-eval "true"
        (nrepl-make-response-handler
         buffer
         (lambda (_buffer _value)
           (message "Finished all clj checks.")
           (funcall callback 'finished errors))
         (lambda (_buffer out))
         (lambda (_buffer err))
         '()))

What it looks like

The following is actual production code from a well-known Fortune 500 company:

(defn fly-tests []
  (inc "foo")
  (map inc [1 2 3])
  (+ 3))

In emacs, it looks like this:

squiggle

Place the cursor right before inc, and we quickly see the first error of our ways, courtesy of core.typed:

squiggle

The next complaint is from eastwood, which observes that we're setting up a lazy map calculation and then never using it.

squiggle

The last is also from eastwood, which, again, highlights something that is not an error per se but probably indicates a mistake:

squiggle

To do:

Find out either that someone has done this already, or that there's a good reason not to do it, or both.
Deal with the not so rare circumstance that either eastwood or core.typed fails catastrophically. It might be helpful to run lein check or the equivalent first, but ideally not in a one-time process.
Add configuration options, in case someone doesn't have all the linters installed and on the classpath.
General error handling. Under circumstances I don't entirely understand, it has been necessary to turn flycheck on and off to restore sanity.
Performance optimizations - perhaps throttling and/or narrowing.
