- There was this perplexing article in Ars Technica about replacing Fortran for scientific computation.
- The main performance advantage of Fortran is that its restrictive data structures facilitate vector optimization.
- There are many ways to match and even outperform Fortran, while using other languages.
- But it's OK to use Fortran if you want to.
- Ars Technica notwithstanding, Clojure, Haskell and Julia are not locked in competition to replace Fortran.
- Ars Technica notwithstanding, Fibonacci sequences may be something we learned about once, but it is difficult to disguise their irrelevance to scientific computing.
Ten or fifteen pages into a typical Ars Technica piece on the latest generation of graphics cards, I am forced to sigh, acknowledging that years of grueling study would never enable me to write with the erudition and verbosity of the author I'm reading.
Then along came this survey of languages for scientific computation, and I finally saw the light. The trick is that you don't always have to be right. With that liberating thought, now it's my turn.
For concreteness, we'll define scientific computing as computer programs written by physical and natural scientists (not mathematicians or "data scientists") to perform time-consuming numerical calculations and simulations, which need to run as fast as possible, and where throwing more hardware at the problem is not economically feasible.
These conditions do not always hold, but they provide a useful framework for discussion, and they seem to be the implicit assumptions made in the Ars Technica article.
Not using Fortran
I cut my teeth on Fortran. It's where I wrote my very first (and only) bubble sort and (I admit this is peculiar) quite a few device drivers. I still have an uncanny ability to judge the exact moment when auto-repeat has emitted precisely six spaces, which accounts for my reputation as the life of any party. Notwithstanding this jolly history, I'm very OK charting a life course whereon I expect never to type another line of Fortran.
In circumstances requiring the performance said to be delivered only by Fortran, I would go one of three ways. First - honestly the most likely - I would discover that someone already wrote Fortran code for the same problem 40 years ago. Second, I could write it in C++, by which I mostly but not quite mean C. Third, I'd notice that the core cycle consumption of my code is in stuff that could be replaced by calls to libraries customized to my hardware platform, such as Intel's Math Kernel Library or ATLAS (discussed below), whereupon it wouldn't matter much from a performance perspective what I wrote the rest of the code in. These strategies work because the performance benefits of Fortran lie almost entirely in peephole optimization - maximally exploiting the processor in the tight loops. Most of these optimizations are performed by multi-language OSS toolchains like gcc and llvm; almost all of them are performed by CPU vendor C/C++ compilers like icc; and absolutely all of them are employed in the hand-tuned code of CPU vendor math libraries.
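The third strategy looks like this in Python terms (a sketch assuming a NumPy build linked against a tuned BLAS, which is the norm for binary installs): once the hot loop is a library call, the speed of the glue language is beside the point.

```python
import numpy as np

# A matrix product written in interpreted Python would be hopeless, but the
# @ operator hands the O(n^3) inner loops to whatever BLAS this NumPy was
# linked against (MKL, OpenBLAS, ATLAS, ...). The surrounding Python costs
# microseconds; the dgemm call underneath is where all the cycles go.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))
b = rng.standard_normal((500, 500))
c = a @ b  # dispatched to the tuned BLAS, not to Python bytecode
```

The same division of labor is available from C++, Julia, or, for that matter, Fortran itself; what matters is which library ends up running the loops.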
Go ahead, use Fortran
The one innate advantage of Fortran from a performance perspective is that the most complex data structures it offers are arrays of basic numerical types. Results and inputs can never overlap, and data are always aligned to be maximally digestible by the processor's vector instructions. In less restrictive languages, you have to follow instructions like these, which aren't particularly onerous.
Other than not having to read the foregoing link, the main reasons to use Fortran are:
- It's what you know, everyone around you is using it.
- You don't really care about code quality and extensibility as long as you can get it to work.
For the record, these are very good reasons. They explain my Fortran device drivers, which I wrote largely to support my own graduate research. Most scientists are writing code that will not be used for very long or by many other people and will ultimately be verified by independent implementation in other labs anyway. Training themselves to reconceptualize programming would be a waste of time. Personally, I found it to be an enjoyable waste of time, which is why I'm no longer a scientist. (Also, I didn't want to live in Podunk.)
If you don't love Fortran but have an existing mass of gnarly Fortran code that you don't want to rewrite, it is very much an option to call it from another language. Fortran's calling convention is essentially C's, except every argument has to be a pointer and 2D arrays are transposed. Any language that can call C can call Fortran too. Julia makes the process ridiculously easy, but it's never more than a few googles away in any language.
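The transposition point is just a consequence of storage order, which you can see without compiling a line of Fortran. A small Python sketch (the offset formulas are the standard ones, computed by hand here rather than by any compiler):

```python
# Fortran stores 2D arrays column-major; C stores them row-major. Element
# (i, j) of an nrows x ncols Fortran array therefore occupies the slot where
# a C array of the transposed shape (ncols, nrows) keeps element (j, i) -
# which is why arrays appear "transposed" across the language boundary.

def c_offset(i, j, nrows, ncols):
    """Row-major (C) linear index of element (i, j)."""
    return i * ncols + j

def fortran_offset(i, j, nrows, ncols):
    """Column-major (Fortran) linear index of element (i, j)."""
    return i + j * nrows

nrows, ncols = 3, 4
for i in range(nrows):
    for j in range(ncols):
        # Same storage location, indices swapped and shape transposed:
        assert fortran_offset(i, j, nrows, ncols) == c_offset(j, i, ncols, nrows)
```

So a caller who hands a C-order matrix to a Fortran routine is, from the routine's point of view, handing it the transpose; either transpose up front or rewrite the math to work on the transposed operands.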
If you just want to be part of the club, consider using, in the language of your choice, single letters from i through n as integer counter variables and array subscripts. Your comp sci friends may snicker, but the code will look a lot more like the math behind it.
ATLAS shrugged - Outperforming Fortran
Have you ever wondered why it takes so long to install NumPy from source? Most of the time is spent in ATLAS, which does a brute force search over combinations of loop unrolling, multithreading and various other code substitutions, in search of the optimal implementations for a host of numerical routines (mostly matrix related). The reason they don't do this work on their own time and hardware is that optimality is highly dependent on exact configurations of CPU, cache and memory.
When you compare ATLAS-optimized to pre-compiled BLAS operations (irrespective of the language they were pre-compiled from), the difference can be staggering, far more than you can get by manual fiddling and experimenting with compiler flags, and often better than vendor-supplied libraries.
This, by the way, explains why JVM-based matrix math performance is so crappy. With the hard constraint of doing optimization just-in-time, there's no way to compete with someone who had hours to do it.
Another approach to linear algebra optimization is taken by the Eigen template library for C++. Here, the advantage comes from template metaprogramming, which allows custom rearrangement and inlining, specializing to fixed size operations, eliding temporaries by combining successive operations and spoon-feeding the optimizing compiler.
With a lengthy template instantiation stage, compilation takes longer, but of course not as long as ATLAS would have spent, and the resulting optimizations, while different from ATLAS's, seem to be as valuable.
Among ATLAS, Eigen and vendor libraries, which one benchmarks fastest will depend on specific circumstances, the most crucial of which is whether the benchmarks are being presented by, respectively, ATLAS, Eigen or the vendor. But no matter who does the benchmarking, the results are vastly better than what you would achieve coding the algorithms yourself in Fortran, despite its ostensible inherent advantages.
Functional Programming and Concurrency
When thinking about scientific computing, we should keep in mind that concurrency comes in three flavors:
- Managing a large number of asynchronous clients and events, and keeping them from destroying each other.
- Extracting the maximum floating-point operations per second from a single computer.
- Dividing a calculation into multiple, identical pieces that are "embarrassingly parallel" in that they don't interact until their results are harvested.
When we say that a language, particularly a functional language, makes it easy to "reason about" concurrency, we almost always mean the first of these. It's an extremely difficult and important problem in real world applications, but it's generally not what scientists are worried about.
Maximizing FLOPS, as we've discussed, is a complicated game. The main reason that functional programming is mentioned in this context is aliasing: as with Fortran, you don't have to worry about result arrays overlapping with input, but in the case of FP it's because the arrays are immutable. Immutability is not the easiest, and certainly not the cheapest way to deal with aliasing, and if it's achieved with pointer-rich persistent data structures, it won't even help you with vectorizing.
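The aliasing hazard itself is easy to demonstrate without any vector hardware. A pure-Python sketch of what goes wrong when code (or an optimizer) assumes input and output don't overlap:

```python
def shift_right_naive(a, n):
    """Copy a[0:n] onto a[1:n+1] in place, element by element, front to back.
    Correct only if source and destination do not overlap."""
    for k in range(n):
        a[k + 1] = a[k]

a = [1, 2, 3, 4, 5]
shift_right_naive(a, 4)
# The ranges overlap, so each write clobbers the next read:
# a is now [1, 1, 1, 1, 1], not the intended [1, 1, 2, 3, 4].

b = [1, 2, 3, 4, 5]
for k in reversed(range(4)):  # iterating back to front handles the overlap
    b[k + 1] = b[k]
# b == [1, 1, 2, 3, 4]
```

A Fortran compiler may vectorize the forward loop precisely because the standard forbids such overlap between arguments; a C compiler has to prove non-aliasing or be told about it (e.g. via `restrict`); a functional language sidesteps the question by never mutating the input at all, at whatever cost that entails.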
Embarrassingly parallel computations, it goes without saying, do not become vastly easier or more difficult as a result of language choice.
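For completeness, here is the shape of such a computation in Python; the kernel and the domain decomposition are invented for illustration, and the only line that knows about parallelism is the `Pool.map` call:

```python
from multiprocessing import Pool

def kernel(bounds):
    """Hypothetical per-chunk work: sum of squares over a half-open range."""
    lo, hi = bounds
    return sum(x * x for x in range(lo, hi))

# Split the domain into independent, non-interacting chunks...
chunks = [(i * 250_000, (i + 1) * 250_000) for i in range(4)]

if __name__ == "__main__":
    # ...farm them out, and harvest the results at the end. Replacing
    # pool.map with the built-in map gives the identical (serial) answer.
    with Pool(4) as pool:
        partials = pool.map(kernel, chunks)
    total = sum(partials)
```

The equivalent scaffolding in Clojure, Haskell, Julia, or Fortran-plus-MPI is different in syntax and identical in structure: partition, dispatch, harvest.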
Assuming that it wasn't all a stunt by Marcel Duchamp's geeky younger brother, here, apparently, is what you need to know about scientific computing:
- There are exactly three contenders for the Fortran crown: Clojure, Haskell and Julia.
- The best way to compare them is by using them to compute the Fibonacci series to arbitrary precision.
This is asinine. Clojure, Haskell and Julia are wonderful languages, but they are not competing to displace Fortran. If you really need to view scientific computing from a Game of Thrones perspective, it would be dangerous to ignore Python and C++ (House Targaryen and White Walkers, respectively). Whether you like them or not, both of these languages are actually used by scientists, which is why things like NumPy and Eigen even exist.
Of the three ostensible combatants, Julia is the most likely to appeal to current users of Fortran, if only because they'll be able to read it easily and because, as mentioned earlier, it's extremely compatible with existing Fortran libraries. The REPL experience alone may be enough to draw them in, because the typical scientist's approach to code involves a lot of trial and error, which is vastly more pleasant without a compile cycle. I like multiple dispatch, but I don't think that's why scientists will like Julia. And now I'm going to stop talking about Julia, because I haven't actually coded in it, even though that isn't a good enough reason to stop some people.
Haskell is a terrific workout partner. I won't dispute that it can be employed to write useful code, but most people who learn it will tell you that it made them a better programmer - in another language. Most of what Ars Technica has to say about Haskell is accurate, but it does make one wonder why they chose to mention it in an article about Fortran and scientific computing. The fact that some guy at Lebanon Valley College taught a course using it is not compelling.
It's hard to imagine a worse example to use when evaluating languages for scientific computing. The irreducible challenge in computing Fibonacci sequences is efficiently maintaining an arbitrary precision integer, which has little to do with the vectorized floating point calculations so important to the simulation of physical systems. If, however, you decide to shove Fibonacci code examples into an article about Fortran, and you are for some reason OK with having all the heavy lifting done in someone else's BigInt implementation, you should probably provide examples that don't crash within seconds. If your program never makes it past a minute of execution, why the hell were you worried about its computational efficiency?
Ars gives us:

```clojure
(def fibs (lazy-cat [0N 1N] (map + fibs (rest fibs))))
(defn fib [n] (nth fibs n))
```

```haskell
fib :: Integer -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
```
For any challengingly large n, both of these programs will gorge on memory until they die. The Clojure example will run out of heap, having created a sort of snake that never relinquishes its head, while the Haskell version will exhaust the stack. Of course, when they don't die, you end up with a number of roughly n·log(φ)/log(10) ≈ 0.21·n digits, so if you try this at home, be sure to do something with the result other than printing it out.
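That digit count is easy to check with a few lines of stdlib Python (the iterative generator here uses the usual fib(0) = 0 convention):

```python
from math import log10

def fib(n):
    """Iterative Fibonacci: fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# fib(n) grows like phi**n / sqrt(5), so its decimal length is about
# n * log10(phi) ~ 0.209 * n digits.
phi = (1 + 5 ** 0.5) / 2
n = 1000
digits = len(str(fib(n)))            # exact digit count: 209
estimate = int(n * log10(phi)) + 1   # growth-rate heuristic: also 209
```

Note that this loop runs in linear time and constant memory; Python's built-in arbitrary-precision integers do all the heavy lifting.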
While it still doesn't have anything to do with scientific computing, you can of course find Fibonacci numbers in Clojure and Haskell without running out of memory immediately. You're just going to end up doing it the way you would have done it procedurally, and then tying it up in a functional bow. Off the top of my head:
```clojure
(defn fib2 [n]
  (second (nth (iterate (fn [[i1 i2]] [i2 (+ i1 i2)]) [0N 1N]) n)))
```
```haskell
fib2 :: Int -> Integer
fib2 n = fib' n 0 1
  where fib' :: Int -> Integer -> Integer -> Integer
        fib' 0 _  i2 = i2
        fib' n i1 i2 = fib' (n-1) i2 (i1+i2)
```
The Fibonacci sequence is an even better example of a bad example than I thought. Recently, Intel introduced three new instructions (MULX, ADCX and ADOX) to facilitate arbitrary precision integer arithmetic. On Intel hardware, presumably, the efficiency of any non-pathological Fibonacci generator will depend entirely on whether the library or classes you use to manipulate large integers employ them. As with matrix operations, the choice of language fades in importance compared to the choice of library.