Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve processing when nodes have large numbers of children #2080

Open
cbevan-figma opened this issue Oct 2, 2024 · 5 comments
Open

Improve processing when nodes have large numbers of children #2080

cbevan-figma opened this issue Oct 2, 2024 · 5 comments

Comments

@cbevan-figma
Copy link

Problem

I have an SVG with ~25k children under the same parent, and it takes ~9 seconds to optimize. Unfortunately these sorts of SVGs are not entirely unusual for my use case :( Notably though, with a small change I reduced the time to ~3 seconds.

Suggested solution

You're probably aware of the two operations that turn the overall traversal into O(N2): detachNodeFromParent() (which recreates the array while removing just one element) and the if (parentNode.children.includes(node)) line in xast.js. I addressed these in a local change by deferring the node removals using a set. The changes I made were hardly revolutionary, but I'm posting them below for clarity. The downside of course is that they'd technically introduce an incompatibility with plugins that remove siblings without calling detachNodeFromParent.

Maybe something like this is worth including in a future major change though, one where the release could announce the small difference in behavior?

export type XastElement = {
  // existing stuff
  pendingRemovals?: Set<XastChild>;
};
// similarly for XastRoot
// visit() in xast.js
if (node.type === 'element') {
  if (!parentNode.pendingRemovals?.has(node)) {
    for (const child of node.children) {
      visit(child, visitor, node);
    }
    // apply pending child removals
    if (node.pendingRemovals) {
      const childrenToRemove = node.pendingRemovals;
      node.children = node.children.filter(
        (child) => !childrenToRemove.has(child),
      );
      node.pendingRemovals = undefined;
    }
  }
}
const detachNodeFromParent = (node, parentNode) => {
  // defer removal to avoid O(N^2) complexity - can be a big hit for large files
  if (parentNode.pendingRemovals) {
    parentNode.pendingRemovals.add(node);
  } else {
    parentNode.pendingRemovals = new Set([node]);
  }
};

Workaround

I'm using this deferred removal in some custom plugins to speed them up (just adding an apply step in the exit callback). I also tried working around the issue by splitting the ~25k siblings into subgroups as a first step, but the grouping causes problems with some other parsing we do later down the line, so it's not ideal.

Related

Note this is a similar issue to #1969, so feel free to merge if you like, but I'm initially posting it here for visibility so it doesn't just disappear as a comment.

@AidanWelch
Copy link

3 seconds is good compared to

npx svgo .\public\svg\composition.svg -o .\public\svg\composition.min.svg

composition.svg:
Done in 2322623 ms!
9428.81 KiB - 25.7% = 7003.774 KiB

nearly 30 minutes I just got

@johnkenny54
Copy link
Contributor

I've been working on a version that addresses some of these performance issues. The short term solution I came up with is to modify detachNodeFromParent() so that it removes the parentNode property, and this is used as a flag for the visit() function, so if an element detaches itself in enter(), the visit() function skips visiting its children (see https://github.com/svg-utils/svgo-ll/blob/main/lib/xast.js).

The only way I see that this could get weird is if an element detaches its siblings, but as far as I can see this isn't happening.

detachNodeFromParent() is still slow, but I think the long term solution to that problem is to get rid of it entirely. Plugins can either (a) have parents update their children rather than children detaching themselves, or (b) defer the deletion until an exit() call. Either way, the update should be safe (since visit() is done with the children), and more efficient (since you can do a bulk update rather than detaching one node at a time).

@cbevan-figma
Copy link
Author

@johnkenny54 - looks good to me! It should work better than what I was thinking, too, since it doesn't require new properties.

What do you think about having detachNodeFromParent() only delete parentNode and not update the parent's children immediately? visitChild() could update a node's children array after processing them all, removing the ones that have an invalid parentNode. (The loop that processes each child could also set a boolean if it spots one in this case, which would avoid doing unnecessary filtering)

It'd be nice if detachNodeFromParent() could become O(1) since then the entire traversal would be O(N); it does seem like the small theoretical incompatibility could be worth it, and in practice shouldn't affect anything. I'm of course not as familiar with this code as you though :)

I did start changing our custom plugins to traverse children and process them in bulk, but it doesn't seem quite as obvious to plugin authors to do it that way for a first implementation.

@johnkenny54
Copy link
Contributor

My personal preference is to deprecate detachNodeFromParent() and not add complexity to visitChild(). Sometimes it adds a little complexity to the plugin to avoid detachNodeFromParent(), but seems like it's worth it to avoid the scalability issues when files have elements with a large number of children.

@cbevan-figma
Copy link
Author

I see where you're coming from. I'd just note though that at the moment the default is for plugins to have performance problems at scale (including the built-in ones), and it takes an extra step for people to realize and improve the situation (as for my case), so I'm interested in ways to make the default path have better performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants
@AidanWelch @johnkenny54 @cbevan-figma and others