Hindley-Milner is cool, but if you're planning on implementing a language with type inference, I strongly suggest you take inspiration from bidirectional type inference, notably Pierce-Turner[1] and subsequently Dunfield-Krishnaswami[2]. These algorithms are capable of inferring much more complex types and generally produce better error messages. Typically, very few programs remain untypeable when you combine Dunfield-Krishnaswami type inference with user-written annotations on major function definitions.
Why are they capable of inferring much more complex types? Which sort of types can Hindley-Milner infer – what sort of types can Pierce-Turner and Dunfield-Krishnaswami respectively infer? Which programs still can't be typed using the latter method?
Dunfield-Krishnaswami handles higher-rank polymorphism but is probably most famous for just being a very nice presentation of bidirectional typechecking. Algorithm W (Hindley-Milner) is only able to infer rank-1 (prenex) polymorphism, introduced at generalization points (usually "let"s).
Here is what they say in the introduction of the second paper:
> Bidirectional propagation of type information allows the types of parameters of anonymous functions to be inferred. When an anonymous function appears as an argument to another function, the expected domain type is used as the expected type for the anonymous abstraction, allowing the type annotations on its parameters to be omitted. A similar, but even simpler, technique infers type annotations on local variable bindings.
From this I gather it means syntactically bidirectional. (But I don't understand this really, so someone should correct me :P)
The algorithm is split into two mutually recursive components, "inference" and "checking" (upward and downward), which run together. This turns out to be a really flexible setting for playing with language and type features, as it's often possible to figure out how to extend a bidirectional typer/inferrer with new features.
edit: but it’s incomplete: it never gets around to discussing subtyping, and thus to justifying its own existence. I think the main point is that, due to subtyping and other complexities, we may not be able to synthesize the type of an expression just by looking at it; but if its type has been determined by other constraints, we can still check that that type is valid for the expression.
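For a concrete feel of the two directions, here's a toy bidirectional checker (my own illustrative sketch, not the algorithm from either paper; the `Ty`/`Expr` names are made up):

```haskell
-- Toy bidirectional typechecker: "infer" synthesizes a type upward,
-- "check" pushes an expected type downward, and the two call each other.
data Ty = TInt | TFun Ty Ty deriving (Eq, Show)

data Expr
  = Lit Int          -- integer literal
  | Var String       -- variable
  | Lam String Expr  -- \x -> body, with NO annotation on x
  | App Expr Expr    -- application
  | Ann Expr Ty      -- annotation: (e :: t)

type Ctx = [(String, Ty)]

-- Upward direction: synthesize a type from the expression alone.
infer :: Ctx -> Expr -> Maybe Ty
infer _   (Lit _)   = Just TInt
infer ctx (Var x)   = lookup x ctx
infer ctx (Ann e t) = if check ctx e t then Just t else Nothing
infer ctx (App f a) = do
  TFun dom cod <- infer ctx f
  if check ctx a dom then Just cod else Nothing
infer _   (Lam _ _) = Nothing  -- a bare lambda can't be synthesized

-- Downward direction: check an expression against a known type.
check :: Ctx -> Expr -> Ty -> Bool
check ctx (Lam x body) (TFun dom cod) =
  check ((x, dom) : ctx) body cod  -- the parameter's type comes for free
check ctx e t = infer ctx e == Just t
```

Checking `Ann (Lam "x" (Var "x")) (TFun TInt TInt)` succeeds with no annotation on `x`, which is exactly the "anonymous function as argument" trick from the quoted paragraph.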
The trend seems to be for languages to have forward inference of types within functions, while declaring them in function definitions. Assignments which initialize a variable also implicitly declare its type. C++ started this with "auto", but it's really because programmers had become used to dynamic languages where assignment set type. Go and Rust both do this.
Inferring types of function parameters and results isn't as useful. Combined with generics, it gets really complicated. It seems better to have those declared explicitly. Programmers need to be able to read function signatures when programming. They're an essential part of the documentation. Also, whole-program type inference and separate compilation inherently conflict.
That's pretty much where things seem to be settling out. Even Python now has optional type declarations for function signatures.
> The trend seems to be for languages to have forward inference of types within functions, while declaring them in function definitions. Assignments which initialize a variable also implicitly declare its type. C++ started this with "auto", but it's really because programmers had become used to dynamic languages where assignment set type. Go and Rust both do this.
Note that this is quite different to, and simpler than, HM inference. It doesn't require unification like the article describes and is something all statically typed languages practically have to do anyway: even with full annotations, validating `T x = f()` (or equivalent) requires type checking the `f()` expression to see that it actually is a T. And thus, this style of "inference" can be gained by simply not requiring the T type to be written and just taking the result of type checking `f()` as gospel, and is what C++ and Go do. (There's a bunch of subtlety/complexity for this in C++, but that's due to C++, not this form of "inference".)
Proper HM (or similar) type inference will use information from how variables are manipulated after their declaration to decide on the type, and so is more than what's already done for the purposes of type checking. This is what Rust does, breaking the supposed trend.
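A tiny illustration in Haskell (the same idea as Rust's local inference): nothing at the binding site of `n` says what it is; the use later in the body is what pins it down, which check-the-initializer "inference" cannot do.

```haskell
-- No annotation on `n`: its type is fixed by the later use. `length`
-- returns Int, so the comparison forces n :: Int and f :: Int -> Bool.
f n = n > length "hello"
```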
Although even with local type inference you usually want to do some unification based inference also, at least to infer the type arguments for generic functions. But it's still simpler than inferring the most general types for functions themselves.
> Inferring types of function parameters and results isn't as useful. Combined with generics, it gets really complicated.
I wonder why you think so? The inferred types in a language like Haskell or OCaml are almost always exactly what the programmer would have written. So long as a language has principal type inference, the worst case is that the inferred type is a bit more general than you expected.
> Also, whole-program type inference and separate compilation inherently conflict.
There is no conflict here, actually. All type variables in definitions exported from a module are generalizable. As such, no uses in other modules will further refine those types.
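A sketch of the point in Haskell (names are mine): a binding is generalized where it is defined, so importing modules only ever instantiate the scheme; they can't push constraints back into it.

```haskell
-- `pair` generalizes to (forall a. a -> (a, a)) at its definition site.
-- The uses at Int and String below each instantiate a fresh copy of `a`;
-- neither use narrows the type that other importers see.
pair :: a -> (a, a)
pair x = (x, x)

useAtInt :: (Int, Int)
useAtInt = pair 1

useAtString :: (String, String)
useAtString = pair "hi"
```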
> I wonder why you think so? The inferred types in a language like Haskell or OCaml are almost always exactly what the programmer would have written
I have no experience with ML-using languages but in Scala, which has inferior but still useful type inference, it is recommended to add explicit types to public functions/methods because otherwise, implementation details or errors might alter the signature.
Take for example:
data Tree = Node Tree Tree | Leaf Integer
makeEmptyTree x = Leaf x
I assume `Leaf` will be inferred as a type, while the programmer might have meant `Tree` to be the public API. (or vice-versa)
This is largely to do with subtyping and it's well known that systems with user-defined nominal subtyping don't have good inference. Scala's inference is really in the middle: squarely better than C/Java and much worse than Haskell/OCaml.
C++ most certainly did not start this. It had been in ML (originally developed by Robin Milner) for decades. It has been a feature of every ML-derived language, and Scala had it back in 2004.
Things are "settling out" at that stage with non-ML-derived languages because that is the most inference you can get without going full Hindley-Milner. It doesn't mean that inferring types of function parameters and return values isn't as useful.
I don't really like where type inference is settling out. Maybe it does come from programmers becoming used to dynamic languages but it doesn't mean it's better.
I find
val x = fn()
to be almost maliciously hiding the type when
int x = fn()
is just as simple and gives you much more information, especially when reading code outside an IDE such as on GitHub. val/let/def and the like are just low-information cruft.
A lot of times I'll find myself creating an extra local variable just to have the type info in the variable name since otherwise it's invisible from just reading the code. Essentially adding an extra unchecked type system layer I have to manually maintain:
val listOfThings = functionThatHopefullyReturnsThings()
But then again, Julia is a relatively new language that aims for high performance in scientific computing while using type inference. Helps to have multiple dispatch that tells you how many definitions there are for a function.
Julia however does not use Hindley-Milner[1]. I'm under the impression that HM is not compatible with multiple dispatch, which would be very unfortunate as typeclasses can serve the same purpose while maintaining compatibility with HM.
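As a sketch of what "typeclasses serving the same purpose" might look like (my own toy example), a multi-parameter class dispatches on the types of several arguments at once, resolved statically and still within an HM-style discipline:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Dispatch on the types of BOTH arguments, chosen at compile time.
class Combine a b where
  combine :: a -> b -> String

instance Combine Int Int where
  combine x y = "ints: " ++ show (x + y)

instance Combine Int String where
  combine x s = "mixed: " ++ show x ++ s
```

Unlike Julia's multiple dispatch, the instance here is picked during typechecking rather than at run time, which is how it stays compatible with HM-style inference.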
If the editor is aware of the type inference, you can display the signature without having to type it. It can be useful, as you automatically get generic functions, which can save a lot of time if you decide to refactor later and use, say, floats instead of doubles: you don't have to edit any functions, just start using floats.
Is this what Rust uses in IntelliJ? I know they have built some tooling that IDEs delegate to for features such as autocomplete, rather than building it n times in n editors.
Any editor or IDE with F# or TypeScript support can show the inferred type for anything you have selected or hover over. Ionide (an F# plugin for VS Code) can even show the inferred type over each let binding (including functions) within the text editor itself.
I dare say we can call this "yes, for many popular languages, in many modern IDEs".
Might I suggest taking a fresh look at IDEs? It sounds like you might be stuck in text-editor land, wondering how green the grass is on the other side (or if they even have grass). IDEs do indeed provide grass, and a significant portion of the time it'll even trim and weed itself when you look at it and wonder if it's time to mow.
IDEs promise a lot, and in many cases they deliver on at least some of those promises. But they're all-or-nothing: to benefit, you must use their text editor, their debugger, their supported suite of compilers, their documentation browser, their everything.
A trend has emerged recently: editor-agnostic language oracles. These tools expose an IPC interface for querying, and in some cases even mutating (i.e. refactoring), source code. They often incorporate a compiler so the type information you get is perfect, and exactly what the compiler sees. Integration with the programmable text editor of your choice (vim, Emacs, Atom, Sublime Text, VSCode…) is a simpler matter of making the right IPC calls.
I think this is the sweet spot. I would hate to be locked into an IDE's poor text editor or buggy interface just because they offer some convenient tools.
Some examples of editor-agnostic tooling:
* CIDER for Clojure and ClojureScript (via weasel or figwheel). The primary editor plugin is Emacs, but vim and Sublime Text support exists outside the main repository.
* rtags and ycmd for C and C++
* tern for Javascript
* racer and RLS for Rust
* eclipse.jdt.ls for Java
* guru for Go
Microsoft has even created a protocol for these daemons to follow, which makes editor integration even simpler: https://github.com/Microsoft/language-server-protocol/wiki/P...
Oh yeah, very much agreed (though debuggers are rarely bound so tightly, in my experience). And all the language-server-protocol stuff seems exciting and fantastic, and I hope it brings about a new breed of tools.
But that's probably still a year or two away at least for any given language for basic support, much less reaching parity with mature IDEs. In the meantime...
The editor I had to write for Full Metal Jacket (my visual dataflow language) infers types by a mechanism similar to Prolog unification when you draw edges to connect vertices/function calls, and won't let you draw edges where the types at each end are incompatible. I imagine it would be a lot harder to do this in an editor for a text-based language, so it would probably be left to the compiler to check, which would have to build its own graph connecting function calls.
Incidentally, things have moved on in the functional programming world from vanilla Hindley-Milner types, and there's a lot of interest now in dependent types, where types can depend on values, including values computed at run time: e.g. if you're drawing on a window, the x-coordinate's type is a number from 0 to the window's width, which might change while the program is running. This adds an extra layer of type safety, eliminates a further class of errors, and allows the compiler to safely remove some run-time checks.
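In today's Haskell you can only approximate that window example with a runtime-checked smart constructor (a dependently typed language like Idris or Agda could move the bound into the type and discharge the check at compile time). A sketch with made-up names:

```haskell
-- Runtime-checked stand-in for a dependent "number from 0 to width" type.
newtype Coord = Coord Int deriving (Eq, Show)

mkCoord :: Int -> Int -> Maybe Coord  -- width -> candidate x -> coordinate
mkCoord width x
  | 0 <= x && x < width = Just (Coord x)
  | otherwise           = Nothing
```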
Merlin [1] can provide type information for OCaml and integrates with emacs, vim and sublime (at least). One can easily select any expression and get its type. From experience, it's very convenient for spotting mistakes that lead to properly typed code, but not with the expected type(s).
One drawback of the Hindley-Milner type system is that its typecheckers are necessarily non-compositional. The basic idea here is that this can result in "non-symmetrical" error messages[1]: given something like
MkPair :: a -> b -> Pair a b
not :: Bool -> Bool
succ :: Int -> Int
foo x = MkPair (succ x) (not x)
the error message will either complain that `x` has type `Bool` but it should be `Int`, or it will complain that `x` has type `Int` but it should be `Bool` -- depending on what implementation you use! For example, in Haskell, Hugs 98 will emit the first error message, but GHC 7.10 will emit the second (see the full example in my slides[2]). And in some sense, neither is right (and certainly neither is as helpful for the programmer as it could be).
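You can see where the asymmetry comes from with a toy unifier (illustrative, not either compiler's actual code): which type ends up in the "expected" slot of the message is purely an artifact of argument order, i.e. of which constraint the traversal reached first.

```haskell
-- Minimal unifier over ground types. The error message bakes in an
-- expected/actual asymmetry that depends only on traversal order.
data T = TI | TB | TFn T T deriving (Eq, Show)

unify :: T -> T -> Either String ()
unify a b | a == b = Right ()
unify (TFn a1 r1) (TFn a2 r2) = unify a1 a2 >> unify r1 r2
unify a b = Left ("Cannot unify " ++ show a ++ " with " ++ show b)
```

`unify TI TB` and `unify TB TI` report mirrored errors, which is exactly the Hugs-vs-GHC situation above.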
So next time you're implementing vanilla HM, maybe consider a compositional type system[3] instead, which can give the following, much more informative error message:
Cannot unify 'Int' with 'Bool' when unifying 'x':
Cannot unify 'Int' with 'Bool' in the following context:

    MkPair (succ x)           (not x)
    :: Bool -> Pair Int Bool  :: Bool
    x :: Int                  x :: Bool
I have a simple implementation here[4], and am about to release a new version which works on `haskell-src-exts`'s AST to support more of Haskell 98 syntax (but still for vanilla HM only).
Hindley-Milner is awesome and surprisingly not super hard to implement. Shameless plug: I wrote a minimal implementation in OCaml back in college: https://github.com/prakhar1989/type-inference
I first became interested in programming language theory back in high school when a guy in an IRC chat, who was a student at UIUC, asked me to do his homework for him. The assignment was to implement Hindley-Milner type checking with unification in OCaml. Thanks, lazy IRC guy!
[1] http://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf
[2] https://arxiv.org/pdf/1306.6032.pdf
The other direction also holds:
Now we want to ensure that e conforms to type t. Consider, for example, a function:

    foo :: a -> b

and an expression whose type we know already:

    e :: a

Therefore we can say that when we apply `foo` to `e`, we'll yield something of type `b`. In effect, you perform 2 passes: forwards inference, and then backwards inference.
P.S.: I'm not actually a PL person. This is from my casual reading, so my understanding may well be very wrong.
https://people.mpi-sws.org/~joshua/bitype.pdf
https://en.wikipedia.org/wiki/C_Sharp_3.0#Local_variable_typ...
> forward inference of types within functions, while declaring them in function definitions.

The claim was about that part, not about type inference in general.
I'd agree if we're talking about top-level only, though.
[1] https://stackoverflow.com/questions/28078089/is-julia-dynami...
D also has this.
https://dlang.org/spec/declaration.html#AutoDeclaration
Does any actual editor do that? Few text editors for programs are even aware of definitions in other modules.
http://www.mew.org/~kazu/proj/ghc-mod/en/
It works with many text editors, like Emacs or vim.
edit: the trend today is to work with the "language server protocol":
http://langserver.org/
Haskell people are writing new tooling based on LSP to supersede ghc-mod:
https://github.com/alanz/haskell-lsp
http://caml.inria.fr/svn/ocaml/trunk/emacs/caml-types.el
[1] https://opam.ocaml.org/blog/merlin-2-0-0-released/
(It also does that for C++ and C# and whatnot for inferred types, but I thought I'd seen other editors do it too...)
[1] https://gergo.erdi.hu/blog/2010-10-23-the_case_for_compositi...
[2] https://gergo.erdi.hu/talks/2016-06-compty/CompTy.pdf
[3] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.8...
[4] https://github.com/gergoerdi/hm-compo/blob/master/src/Langua...
But also thanks to that detour, this article is a nice read on OCaml in particular: http://okmij.org/ftp/ML/generalization.html
I had to rewrite mine (https://github.com/chewxy/hm) twice, finally removing typeclasses to make it fast enough for an inference system I wrote.