-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implied End Tags not Handled Correctly #110
Comments
So one hypothetical way to deal with this would be to add an <ul>
<li>a
</li><li>b
</li></ul> The closing I think creating the |
I have been looking into where to best address this. The selectors VM already maintains a stack of currently open elements. This means that in order for implicitly self closing elements not to confuse the selector VM the VM needs to be updated accordingly. As far as I can see the VM is entirely fed from the However I'm entirely unsure how the system works. In particular there is this tag sink and tree builder system internally which I'm not at all familiar with so I'm a bit stuck here with the limited time I have to spend on this. If someone were to point me at the right place to add this logic I'm happy to provide a patch. |
So my current understanding is that tokens are effectively only produced via the The main The other challenge I see is that one effectively would need to either move the stack out of the Going by the HTML5 spec the actual logic of the implicit end tag handling is specified in "13.2.6.3":
These "generate implied end tags" callouts are happening on a few start tags (like The quickest hack were I believe to expose the token stack from the |
Generating implied end tags is in general can't be done on streams and requires full AST. lol-html was never designed to operate on ASTs, rather it provides a simplistic "token tree" representation of the markup. Also, it goes against lol-html paradigm of rewriting/modifying only the content specified by selectors. There are lots of other corner cases in HTML that we can hit once we step into implicit tag nesting territory, so it's better stick with the current paradigm of the "token tree". If you need something more advanced and streaming/memory consumption is not a concern I would suggest to use solutions that perform full HTML parsing (e.g. html5ever) In your example I see the following:
is |
I take your word for it, but I'm not entirely sure why that is true. The system already needs to maintain some sense of tag stack and that stack should be sufficient to figure out if there are tags that need to close and such events can then be emitted as events.
The issue is that on the one hand the selectors are wrong if the end tags are missing, on the other that there is an API to change the tag name which is not safe to use on tags which might have implied end tags.
That is the output. I'm assuming it thinks that the |
ok, then this is not good.
We wanted to keep "token tree" construction rules as simple as possible - it's much easier to explain/document simple nesting rules, also we need to do as little implicit modifications as possible due to the security considerations. Partial tree construction rules implementation (implied end tags in this case) is a slippery slope. It still will have lots of corner cases where it will do the wrong thing as we can't always have proper stack as spec requires us doing streaming parsing (see infamous Adoption Agency Algorithm as a prominent example). With all that being said, considering how the rewriting is broken according to your example, I'm afraid we don't have any other choice than to try it. We shouldn't emit implied end tags as real tokens, instead let's just popup elements from the selector's vm stack according to the rules. The performance needs to be considered and we would probably need some new optimisations |
I agree that the stream based approach is unlikely to be able to cover all cases so I think this will largely come down to "how likely is certain type of HTML in the real world". In my particular case I started running into these cases as implied end tags particularly around lists, tables and paragraphs are super common and I needed to respond to the "end of tag" situation myself. So just fixing the VM's stack won't be enough as currently the API does not provide a way to respond to the end of a tag which however is necessary in a few situations. And to some degree this has been recognized now with the introduction of #109. |
While working on #109 I noticed that one can create a bit of an odd situation when trying to rewrite tags in the presence of optional end tags.
Take this list:
When I select
li
and change the tag todiv
I end up with the following output:This tests shows the issue
I believe the system would need to emit "virtual" end tags for when implicitly closed tags are encountered. I believe the system also does not handle this tag stack correctly otherwise which probably should confuse the css selector as well.
The text was updated successfully, but these errors were encountered: