Anything new? #7
One important TODO which I considered would be more pedagogical materials and more docs for numbered packages. I'm a bit lazy for this unfortunately, and I'm more motivated to figure out new stuff. My general idea about packages is that they should be the simplest possible implementations of features, which are still sensible in the sense that you could scale them up to production strength without conceptual change. Anyway, new packages that I considered:
I shall note that I believe the glued evaluation from smalltt to be critically important in production-strength implementation. I would like to add glued elaboration here as a package as well. But I also want to implement at some point the "fully scoped" version of smalltt, with arbitrary local metas, because it seems to be conceptually the most elegant and also the most efficient elaborator, which would also play along well with instance constraints, postponed constraints and let-generalization.
The evaluator which I have in all packages is already on par with Agda
This is very easy with hardwired types, like in Agda. Would be harder to do in an extensible/generic way.
I'm interested in this as well but have no strong opinions yet, and haven't deeply researched the existing approaches.
A de Bruijn version of Holes would be awesome :)
@AndrasKovacs, in case you are interested: I just finished HOAS to JVM bytecode https://github.com/molikto/mlang/blob/master/src-main/src/main/scala/mlang/compiler/PlatformEvaluator.scala#L903. It doesn't "full reduce" though; as in cubicaltt, there are some complications.
Also, may I ask what exactly a "glued evaluator" is? My understanding is that when solving a meta, a whnf is needed, and a glued evaluator is able to trace back to a much simpler non-reduced term, so the solved meta is more readable.
Nice. Do you have benchmark data? Could you perhaps benchmark this as well? I've been interested in HOAS JIT for a while, haven't gotten around to testing it.
See smalltt and the slides linked from there. The point is that whenever we attempt to solve a meta with a value, we want to
The two things are at odds with each other, unfortunately; the first requires fully unfolding (preferably with a call-by-need strategy), the second requires avoiding unfolding as much as possible. A glued evaluator computes two (or more) differently represented values from the same syntactic term. I used it in smalltt, to always have the two specialized values available when solving a meta. It also avoids work duplication compared to having two different evaluators instead of just one glued evaluator.
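A minimal Python sketch of the glued-value idea described above (all names here are hypothetical, and this is not smalltt's actual code): a value headed by a top-level definition carries both its un-unfolded head+spine form and a lazily computed unfolded form. Quoting for meta solutions reads the small side; conversion checking forces the unfolded side.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VLam:
    body: Callable  # value -> value

@dataclass
class VVar:
    name: str
    spine: tuple = ()

@dataclass
class VGlued:
    head: str           # top-level definition name, kept un-unfolded
    spine: tuple        # arguments it has been applied to
    unfolded: Callable  # thunk producing the fully unfolded value

def apply_val(f, a):
    if isinstance(f, VLam):
        return f.body(a)
    if isinstance(f, VVar):
        return VVar(f.name, f.spine + (a,))
    if isinstance(f, VGlued):
        # grow the small spine; unfold lazily on the other side
        return VGlued(f.head, f.spine + (a,), lambda: apply_val(f.unfolded(), a))
    raise TypeError(f)

def quote_small(v):
    # quoting for meta solutions / display: prefer the small glued spine
    if isinstance(v, (VGlued, VVar)):
        head = v.head if isinstance(v, VGlued) else v.name
        return (head, tuple(quote_small(a) for a in v.spine))
    if isinstance(v, VLam):
        return ("lam", quote_small(v.body(VVar("x"))))
    raise TypeError(v)

def force(v):
    # conversion checking forces the unfolded side
    while isinstance(v, VGlued):
        v = v.unfolded()
    return v
```

For example, a glued identity function applied to a variable quotes to a small `id y` term, while forcing it computes through to `y`.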
I just tried it: https://github.com/molikto/mlang/blob/master/library/01_foundations/00_builtin.poor
forcing 10M: seems not to terminate in reasonable time
forcing 1M / conversion 1M, polymorphic: ~5s in both branches
conversion checking the non-polymorphic version at 10M: seems fast in both branches (why?)
I don't know if my implementation is efficient though... let me debug and see what's happening... I wonder why Coq native_compute failed to compile...
Tried on the JVM (implemented in Scala): 10M in 8ms.
Fixed a bug in the full-reduction branch; now forcing 10M takes 2ms (even faster than Scala?!!!), but conversion checking seems NOT affected... still 5s and overflow. I guess full reduction is really good for computing normal forms of closed terms.
Also, this is what I do for reify: https://github.com/molikto/mlang/blob/master/src-main/src/main/scala/mlang/compiler/semantic/10_value.scala#L122 Basically it tries to see whether this value is a whnf computed from another term; if so, the other term is used as the meta solution, as it should be simpler.
A bit suspicious to me; maybe you are not doing the same amount of work. For forcing, you could also use a
You are right! In the full-evaluation mode I measured the wrong thing. So: trivial forcing used time: 106ms; using count: 1716ms. This doesn't include the time to generate the JVM bytecode and load it into the JVM; I only measured the time to evaluate the JVM expression.
For conversion checking, I think it is mostly because the recursive function will stack-overflow on such a deep structure; it would not take longer than 5s if we changed it to a loop (if possible?). Sorry for the wrong results. So the correct results are:
trivial forcing 10M:
conversion 1M:
conversion 10M: stack overflow on both
I wonder why my whnf implementation is so slow here; maybe HOAS doesn't help with evaluation speed, only when you do cbv with HOAS (full reduction). I wonder about a full-reduction cubical evaluator: currently, computing "problem" https://github.com/mortberg/cubicaltt/blob/pi4s3_nobug/examples/problem.ctt#L519 in cubicaltt takes 40ms, and in mlang it takes 200ms. A full-reduction cubical evaluator would be computing open terms a lot of the time, because that is just how cubical theories work, and you need "renormalization", which might be a waste of time (because later you need to do it again after a "swap"). So I am not sure full reduction would help a cubical evaluator... but I guess I am not doing it now; it is a lot of change to make full reduction work generally. What I have on the full-reduction branch is just a hack to make the Church numerals work.
@molikto my expectation is that JIT HOAS should always be faster, for cubical as well. Also, I did some Haskell benchmarking which showed that the difference between HOAS closed evaluation and HOAS open-term normalization is not big, maybe 10-20%. I think the difference should not be dramatic in other systems either.
@AndrasKovacs it seems cubicaltt uses cbv and renormalizes after context restriction (although it is in Haskell, a lazy language; I don't know if this makes any difference?). mlang instead always works with whnf. In cbv you need to renormalize, in cbn you don't, so this complicates things. But I don't actually know if this matters that much. Maybe I should try converting mlang to cbv... (but then you still seem to need whnf in typechecking? because you need a good small term for the meta solution?).
@molikto What do you mean by "renormalize"? Can you give a link to a line in cubicaltt? Cubicaltt uses call-by-need AFAIK. |
https://github.com/mortberg/cubicaltt/blob/master/Eval.hs#L106 here, after a nominal operation (act, swap, etc.), neutral values need to be eliminated again (compare the VPair and
OK, I see what you mean by renormalization. I think it is just inefficient design in cubicaltt. They seem to use nominal sets for interval variables but normal NbE-style vars and evaluation for "normal" variables. IMO, nominal sets should just be removed entirely, and instead one can handle interval vars similarly to normal vars. That way, this instance of the "renormalization" issue disappears as well.
Yes.
I would think that this is actually necessary in cubical type theories because of the odd behavior of dimension variables. The problem is that if you perform a renaming which sends previously distinct dimension variables to the same thing, you can eliminate a (now trivial) coercion or composition. In particular,

The upshot of all of this is that all the current implementations of cubical type theories I'm aware of have to be very careful to do seemingly redundant work.
@jozefg hello, are you implementing XTT? :-)
@molikto Sadly I think none of us are currently working on implementing XTT. In my case, it's time constraints and some other projects I'm more interested in at the moment, along with the fact that I'm unsure of how to extend the theory of XTT with some additional features I would like.
@jozefg I'm aware of such reductions in CTTs. My observation is specifically about evaluation in cubicaltt, which seems to have rather major efficiency issues. I haven't looked at the cubicaltt source closely before; I've noticed this just now. On this line, we see that path application uses the

Unsurprisingly, the solution is to get rid of naive substitution. If we want to keep interval variables and expressions separate from the non-interval ones, we can just have dual or explicitly interleaved value environments, and handle interval substitutions the same way as non-interval ones. This setup is also compatible with the cubical reduction rules.
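The environment-based treatment of interval variables suggested above might look like the following toy Python sketch (terms as plain tuples, all constructor names hypothetical; this is neither cubicaltt nor mlang code): path application extends an interval environment instead of performing a term-level substitution.

```python
def eval_interval(r, ienv):
    # interval terms: ("ivar", i) or constants ("i0",) / ("i1",)
    if r[0] == "ivar":
        return ienv[r[1]]
    return r  # constants evaluate to themselves

def eval_term(t, env, ienv):
    tag = t[0]
    if tag == "var":
        return env[t[1]]
    if tag == "ivar":
        return ienv[t[1]]
    if tag == "plam":
        # a path closure captures BOTH environments
        return ("pclo", t[1], env, ienv)
    if tag == "papp":
        f = eval_term(t[1], env, ienv)
        r = eval_interval(t[2], ienv)
        if f[0] == "pclo":
            _, body, cenv, cienv = f
            # interval "substitution" is just environment extension
            return eval_term(body, cenv, cienv + [r])
        return ("npapp", f, r)  # stuck: neutral path application
    raise ValueError(tag)
```

Applying a path lambda to an interval constant never walks the body term looking for the variable; the binding is resolved by environment lookup, just like for ordinary variables.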
@AndrasKovacs I am interested in what you are saying, but I don't see how it is possible to treat interval substitutions "the same" as we treat other substitutions. What I mean is the following: In the Coquand-style nbe algorithms, as you know, we never do substitutions; instead, we have closures, and all the "substitutions" that need to happen correspond exactly to instantiation of closures. It would be nice if cubical type theory's interval variables worked this way! Unfortunately, there are plenty of times where you apparently need to do a "substitution" which is non-local, in the sense that you do not have your hands on some closure that you can just instantiate. Have you thought about these algorithms and whether there is a way around this problem? I have been wanting to think about it again. Unless we can fix this, I think cubical type theory may be kind of hopeless as far as efficient implementation is concerned.
(Of course, I am aware that cubicaltt and redtt suffer from some very naive mistakes in implementation: but I am talking about the apparently more difficult problems which can't be chalked up to hurried implementation.)
I guess you mean the cases when we have to pattern match under abstraction, e.g. in composition computation rules? I agree that these are not so straightforward. But a) there should be at least several possible implementations which are not nearly as bad as recursively de-sharing closures, and b) even if we handle the tricky cases naively, the simple cases, such as path lambdas and applications, can still benefit greatly from a closure-based implementation. I note that the issue of things not lining up neatly as closure instantiation is not unique to cubical TT. In fact, it pops up when elaborating plain old type theory as well. A classic example is inferring a type for an annotated lambda. This is similar to a (simplified) coercion computation case, where we have

Here, the computation rule peeks under the closure. The obvious way to match on

I list some solutions for doing the value-closure conversion.
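One naive way to do this kind of value-closure conversion can be sketched as follows (purely illustrative, all names hypothetical, not taken from any implementation in the thread): instantiate the closure at a fresh neutral variable and dispatch on the head constructor of the resulting value.

```python
import itertools

_fresh = itertools.count()

def fresh_var():
    # a neutral value standing for a fresh bound variable
    return ("neutral", next(_fresh))

def peek(closure):
    """closure is a value-level function; return its body's head constructor."""
    return closure(fresh_var())[0]

def coe_rule(closure):
    # a coercion-like rule dispatching on what it finds under the binder
    head = peek(closure)
    if head == "sigma":
        return "coe-sigma"  # a real rule would recurse into the components
    if head == "pi":
        return "coe-pi"
    return "coe-neutral"
```

For example, `coe_rule(lambda i: ("sigma", ("A", i), ("B", i)))` dispatches to the sigma case. The downside, as discussed above, is that re-closing the components after peeking requires quotation or some sharing-preserving trick.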
So there are definitely many things to try. I now feel a bit motivated to explore this and attempt a high-performance cubical TT implementation. There's a chance I'll start doing this a couple of weeks from now, when the current paper deadline season is over.
I would be very interested in continuing to discuss this, and perhaps collaborating; our research group is beginning to resume thinking about how to properly evaluate cubical type theory, so it would seem to be good timing. I agree that in the case of implementing coe, some kind of closure/value munging can be done, but the problem is that there really do appear to be cases that do not fall under that category. These are deep in the algorithm of composition in glue (or in V + formal composite types, if you follow the AFH style); I tried a number of "smart" things of the style you are advocating, and it was unfortunately late in the process that I found they didn't scale up to implementing the harder parts of the evaluation semantics of cubical type theory. Perhaps a careful analysis of composition in glue / V / fcom can provide some insight, but at this point I am quite cautious about any approach which doesn't allow me to just substitute a dimension variable non-locally (even though I am aware that is very inefficient). In redtt, we had some delayed substitutions going on, but these were incorrectly implemented and very hard to debug.
Regarding calling quote from eval, I have not considered doing this because I do not know how I would prove such an algorithm correct in principle (this approach was considered back in the original OTT paper, but later abandoned when they found an alternative method which was more conventional). Of course, I believe that one must interleave quotation and evaluation in order to implement features like regularity; it is perhaps a blessing that we don't have regularity in most versions of cubical type theory ;-)
@AndrasKovacs Mortberg suggested that I check out the pi4s3_nobug branch in cubicaltt when I was working on mlang. This is the branch with the most updated computation rules, I think.
@jonsterling Thanks for the comments. I don't yet have a good grasp on gluing composition; hopefully I will be more qualified to comment after I've delved deeper into this.
NbE with universes has to include quote in the model together with the correctness of everything, and I expect that this wouldn't change when moving to NbE for cubical TT. So it doesn't seem far-fetched to me.
@AndrasKovacs I am not sure what you mean about nbe with universes; it is not the case that eval and quote are mutually recursive in nbe with universes.
BTW, I have recently been studying your "glued evaluators" and it is really fascinating; I think I understand in principle what you are doing, but at the technical level I got a little confused... My problem was that I was trying to look at the code and find the moment at which you did something different on the local vs the global side (where the "rubber meets the road"); could you point me to some decisive line of code which would give me the insight? Once I understand it a bit better, I'd like to explore the possibility of adopting this style in my future implementations of type theory.
Inspired by your idea of glued evaluation, I've done the following in Sixty. I've found it to be quite easy to implement and it seems to allow many of the same optimisations. The gist of it is to add an extra domain constructor, which tracks a head and spine that isn't unfolded, as well as a lazy unfolded value. Quotation uses the head and spine to get small terms. Unification first tries unifying the head and spine before diving into the unfolded value. Scope checking does something similar. I'd be interested to hear your thoughts on this @AndrasKovacs. I'm pretty sure it isn't quite as efficient as your version, but I've found it quite pleasant to work with.
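The unification strategy described here might look like this stripped-down Python sketch (values as plain tuples, names hypothetical; this is not Sixty's actual code): compare the cheap un-unfolded head and spine first, and only force the lazy unfolded values when that fails.

```python
# value representations:
#   ("glued", head, spine, unfold)  -- un-unfolded head+spine plus a thunk
#   ("rigid", head, spine)          -- an ordinary neutral value

def unify(u, v):
    if u[0] == "glued" and v[0] == "glued":
        if u[1] == v[1] and len(u[2]) == len(v[2]) and \
           all(unify(a, b) for a, b in zip(u[2], v[2])):
            return True  # cheap success: the unfoldings were never forced
        # same-head spine mismatch or different heads: unfold and retry
        return unify(u[3](), v[3]())
    if u[0] == "glued":
        return unify(u[3](), v)
    if v[0] == "glued":
        return unify(u, v[3]())
    # rigid-rigid: heads must agree, spines pointwise
    return (u[1] == v[1] and len(u[2]) == len(v[2])
            and all(unify(a, b) for a, b in zip(u[2], v[2])))
```

Two values glued to the same head with equal spines unify without ever touching their unfoldings, which is the whole point: the expensive unfolded comparison is a fallback, not the default.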
@ollef Looks pretty cool! I just copied your idea and implemented it in my "cool-tt" prototype.
To my knowledge it is the case. Both here and here, quoting is included in the semantics, i.e. a type is interpreted as a dependent presheaf together with quote and unquote maps. Now, if we're looking at the above NbE implementations as they are, quoting is not given mutually with the evaluation morphism into the model; in a hypothetical pattern-matching definition in Agda,

Without universes, e.g. as in here,

I'm being a bit hypocritical here, because in my TT implementations I never actually (operationally) use NbE given by modified glued models in presheaves over renamings. Instead I use purely syntax-directed NbE with Abel-style "liftable terms", which I understand way less on a formal level than normalization-by-gluing. Anyway, I'm saying that if NbE extends from MLTT with universes to cubical TT and the basics don't change too much, then I'm always able to quote terms to nfs when giving the model, and with Russell universes as in Thierry's version, I can also quote types to normal types. So implementing a regularity rule seems feasible, assuming that it's not blocked by other issues.
@AndrasKovacs I think there may be some misunderstanding about how quotation works in your summary of the literature. In Thierry's paper that you link to, quotation is indeed bundled into the computability data of each type, but it is not "mutually recursive" with evaluation. In fact, evaluation isn't even defined until after the entire model is constructed; evaluation is obtained from the initiality of syntax. With all this said, there is quite a bit of asymmetry between the role of evaluation in operational accounts of nbe (e.g. Abel-style nbe, which we implement) and semantic accounts of nbe (Altenkirch-Kaposi 2016[?], Coquand 2019, etc.). It is quite easy to see however that the algorithm for evaluation never calls quote or readback, if you "extract" it from the semantic constructions (e.g. those of Thierry's paper). That's all I meant when I said that there is no mutual recursion. Anyway, I'm open to using a mutual recursion if it is totally necessary, but I see it as a "code smell", a real indication that something about the setup isn't yet understood properly.
I was referring to this point when talking about using closures vs. meta-level functions for implementation.
Thierry's paper clearly calls quote in the interpretation of

Perhaps it's better to avoid using the term "mutual recursion", which is a bit fuzzy, and just talk about (dependent) models. To phrase my point differently: in Thierry's model, if I wanted to, I could quote things to nf and decide conversion/strengthening. E.g. I could add a definitional equality to the syntax which says that the Bool -> Bool -> Bool equality testing function shortcuts to
I will have to take a look at Thierry's paper again when I have more time, but let me just say that it is totally unnecessary to call quote in eval of bool induction. I have implemented that dozens of times.
@jonsterling It's necessary if we want nfs to be eta-long for function types and negative types in general. In that case there is no inclusion of arbitrary neutral terms into nfs, only those which have neutral types or positive types. I agree that this is something that we totally ignore in practical implementation, because we use a separate conversion checking algorithm that can handle etas. But if the goal is to get decidable conversion with function eta out of normalization-by-gluing, then this is essential.
@AndrasKovacs I do not believe that is true. It is very easy to achieve eta long normal forms for negative types without calling quote from eval; or if I am missing something, can you please supply a counterexample?
@AndrasKovacs I spent my walk to the office thinking about your comments -- I still haven't had time to check Thierry's paper again, but now I understand the point about bool-ind in semantic NBE. The computability data for the motive must of course bundle a family of quotation functions (really, we should call these "reification" functions, not quotation! Quotation is a totally different thing). However, as you know, when you defunctionalize the naive semantic algorithm, this ceases to be necessary. But based on this discussion, I think I see that these semantic versions of nbe may actually provide a path to justifying these very non-traditional algorithms which call quote from eval.

Regarding the eta-long negative types, I am still puzzled by the comment. This is always achieved by implementing quote in a type-directed manner, and eta-long-ness is not affected by anything you do in
I can attempt an explanation why (eta-long negative types)+(positive types) requires calling quote from eval. Let's work with STLC with functions and Bool, and we have the recursor
We have normals and neutrals as:
Let's do a simple presheaf model over renamings; every type is a presheaf. We interpret Bool as follows:
So
Now for
We need to somehow turn a stuck
Which is what Thierry did in the note. I don't have a super pithy or intuitive take on this issue, unfortunately. There is more discussion in Abel & Sattler. They have an alternative, more sophisticated implementation for STLC which avoids quoting from eval, and supports some eta laws for coproducts as well. I don't know if the same would work in dependent TT.

What if we don't want eta-long functions, but want positive types? Then any neutral term is normal, and we can define a plain presheaf model, where neutral terms can be included in any semantic type, so that every A is interpreted as

I also remember that having a universe + eta-long negative types also requires quoting from eval or some other trickery. Unfortunately, I don't recall details at all; maybe there's something with the interpretation of type codes.
@AndrasKovacs I see what you mean about needing to call reify and reflect, but please note that in Abel's defunctionalized nbe, these are just generators of the semantic domain, so there is no actual recursive call happening. This is why, as I have been trying to convey for a while, there is no call to

I agree with your general point about semantic nbe, but the difference here is very important: calling an actual quotation function during evaluation is kind of nutty because it results in walking the entire term, and it has been known for many years that we can totally avoid such calls in the implementation of standard MLTT (yes, with eta-long normal forms at negative type, and weak positive types); this is possible through Abel's defunctionalization of nbe. On the other hand, it is not clear that more ambitious uses of actual quotation during evaluation can be accounted for in the same sensible way. Let me list below some applications of quote-during-eval which do not appear to fit into Abel's defunctionalized paradigm, and currently seem to require an actual recursive call in the algorithm:
This is why I am making a distinction between genuine recursive calls, and those which may be easily defunctionalized. I think I agree with you now that there may not be a problem mathematically justifying a mutual recursion, but it remains that this is a kind of bizarre and expensive thing to do, indicating that we do not yet fully understand the operational aspects of the type theories which seem to require (1-4) above.
Can you give a reference for "Abel's defunctionalized nbe"? It's not clear to me what that means.
I mean the style of nbe where you use closures instead of "higher order", described in his habilitation thesis.
The main contribution was to kill the 'higher order' aspect of nbe algorithms by adding inert constructors to the semantic domain for reification and reflection, which are then interpreted appropriately by the

Old-school algorithms for nbe on an untyped domain required a semantic domain which is not a set; domain theory was used in the 90s to solve the recursive domain equation. In 2020, the situation is more clear:
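A minimal untyped NbE evaluator in roughly this defunctionalized style can be sketched in Python (illustrative only, names hypothetical): reflected neutrals are inert `("up", ...)` values interpreted by the semantic application function, so eval never calls reify; reify walks values only after evaluation is done.

```python
# terms use de Bruijn indices; neutrals use de Bruijn levels

def eval_(t, env):
    tag = t[0]
    if tag == "var":
        return env[-(t[1] + 1)]       # index counts binders inward
    if tag == "lam":
        return ("clo", t[1], env)     # a first-order closure, not a host function
    if tag == "app":
        return apply_(eval_(t[1], env), eval_(t[2], env))
    raise ValueError(tag)

def apply_(f, a):
    if f[0] == "clo":
        _, body, env = f
        return eval_(body, env + [a])
    # f is ("up", neutral): reflection is inert, application just grows the spine
    return ("up", ("napp", f[1], a))

def reify(v, depth):
    if v[0] == "clo":
        # go under the binder with a reflected fresh variable (eta-expanding)
        return ("lam", reify(apply_(v, ("up", ("nvar", depth))), depth + 1))
    return reify_ne(v[1], depth)

def reify_ne(n, depth):
    if n[0] == "nvar":
        return ("var", depth - n[1] - 1)  # convert level back to index
    return ("app", reify_ne(n[1], depth), reify(n[2], depth))

def nf(t):
    return reify(eval_(t, []), 0)
```

Note that `apply_` handles the reflected case by constructing a bigger neutral rather than by any recursive appeal to reification, which is exactly the "inert constructor" point being made above.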
I like the typed approach best, but I don't yet have a programming language which can represent it ;-) I need QIITs.
Is the "reify/reflect inert node" essentially the same as this? I haven't made the effort to look into algorithms which quote to eta-long forms efficiently, because I find them practically inferior to implementing conversion checking directly in the semantic domain. These algorithms are between the Most Practical and the Formally Nicest; they are inefficient and still unwieldy formally. In my mind the Practical and Formal approaches are, perhaps unfortunately, rather separate. Hence, when justifying non-linear patterns or regularity, I'm approaching this purely from the semantic nbe standpoint, and I'm content to have any inefficient solution. I admit this is not a good situation, but I have no idea how to make formally nice nbe fast, or fast nbe nicely formal.
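Checking conversion directly in the semantic domain, as preferred here, can be sketched as follows (a self-contained toy representation, names hypothetical): eta for functions is handled by applying both sides to a fresh variable, so eta-long normal forms are never materialized.

```python
# values: ("lam", f) with f a host function, or ("ne", head, spine)

def app(f, a):
    if f[0] == "lam":
        return f[1](a)
    return ("ne", f[1], f[2] + (a,))  # stuck application grows the spine

def conv(u, v, depth):
    if u[0] == "lam" or v[0] == "lam":
        # eta: compare both sides applied to the same fresh variable
        x = ("ne", ("var", depth), ())
        return conv(app(u, x), app(v, x), depth + 1)
    # neutral-neutral: same head, pointwise convertible spines
    return (u[1] == v[1] and len(u[2]) == len(v[2])
            and all(conv(a, b, depth) for a, b in zip(u[2], v[2])))
```

For instance, this identifies a neutral `f` with its eta-expansion `lam x. f x` without producing any normal form at all, which is the practical advantage over quoting to eta-long terms first.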
Yes, that's the idea; Danny, Lars and I also spent some time trying to explain Abel's work in our recent paper: https://dl.acm.org/doi/10.1145/3341711 BTW, I strongly agree with you that in practice there is no need to quote to eta-long normal forms because you should definitely implement conversion directly in the semantic domain.
@jonsterling @AndrasKovacs will solving the non-local substitution issue make cubical type theory efficient enough to compute Brunerie's number though? I am not sure... I think the computation rules for transpGlue produce really big intermediate terms when computing something like Brunerie's number
I don’t know for sure, but I think it is a prerequisite for making more progress on the problem :) I have some new ideas this week on avoiding naive dimension substitutions which I will try to experiment on soon.
What's the plan after poly instantiation? I wish there could be a fastReduce that throws the term onto an abstract machine to do the computation; that sounds cool.