Proposed text for RTT computation and ACK_MP scheduling #217

Merged
merged 6 commits into from
Jul 5, 2023
Changes from 5 commits
66 changes: 38 additions & 28 deletions draft-ietf-quic-multipath.md
@@ -889,31 +889,36 @@ Terrestrial | 100ms | 350ms
Satellite | 350ms | 600ms
{: #fig-example-ack-delay title="Example of ACK delays using multiple paths"}

Using the default algorithm specified in {{QUIC-RECOVERY}} would result
in suboptimal performance, computing average RTT and standard
deviation from series of different delay measurements of different
combined paths. At the same time, early tests showed that it is
desirable to send ACKs through the shortest path because a shorter
ACK delay results in a tighter control loop and better performances.
The tests also showed that it is desirable to send copies of the ACKs
on multiple paths, for robustness if a path experiences sudden losses.

An early implementation mitigated the delay variation issue by using
time stamps, as specified in {{QUIC-Timestamp}}. When the timestamps
are present, the implementation can estimate the transmission delay
on each one-way path, and can then use these one way delays for more
efficient implementations of recovery and congestion control
algorithms.

If timestamps are not available, implementations could estimate one
way delays using statistical techniques. For example, in the example
shown in Table 1, implementations can use "same path"
measurements to estimate the one way delay of the terrestrial path to
about 50ms in each direction, and that of the satellite path to about
300ms. Further measurements can then be used to maintain estimates
of one way delay variations, using logic similar to Kalman filters.
But statistical processing is error-prone, and using time stamps
provides more robust measurements.
The ACK_MP frames describe packets that were sent on the specified path,
but they may be received through any available path. There is an
understandable concern that if successive acknowledgements are received
on different paths, the measured RTT samples will fluctuate widely,
and that might result in poor performance. In fact, this concern is
probably not justified.

The computed values reflect both the state of the network path and the
scheduling decisions by the sender of the ACK_MP frames. In the example
above, we may assume that the ACK_MP will be sent over the terrestrial
link, because that provides the best response time. In that case, the
computed RTT value for the satellite path will be about 350ms. This is
lower than the 600ms that would be measured if the ACK_MP came over
the satellite channel, but it is still the right value for computing,
for example, the PTO: if an ACK_MP is not received after more
than 350ms, either the data packet or its ACK_MP was probably lost.

Collaborator

Isn't the PTO 3xRTT?

Collaborator

However, I'm not sure about this. In this example you assume that the terrestrial path will be used in a stable way for ACKs. In the paragraph above you describe ACKs received on different paths, which is a different case. So if I send half my ACKs over each path, I will see an average RTT of 475ms, which is too low for those ACKs that are sent over the satellite path.

Contributor Author

@huitema, Jul 3, 2023

Commenting on this PR after the merge: in the example that you cite, the smoothed RTT will actually be very close to the RTT via the shortest path, plus maybe one ACK delay. ACK_MP frames are redundant. If you send ACKs alternately on the short and long paths, the ACK on the short path will arrive before the one on the long path. When the ACK on the long path arrives, its "highest" packet will already have been acked as part of a range carried on the short path, so it will not contribute to the RTT computation.
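
A minimal sketch of this argument, not taken from any implementation. It assumes one-way delays of 300ms (satellite, data direction and long return) and 50ms (terrestrial return), which are consistent with the 350ms and 600ms figures in the table, one ACK per packet, packets sent every 10ms, and ACKs alternating between the two return paths. The `RttSampler` class and `simulate` function are hypothetical names. The "newly acknowledged" test of RFC 9002 Section 5.1 is simplified to a comparison with the largest packet number acknowledged so far.

```python
from dataclasses import dataclass, field


@dataclass
class RttSampler:
    """Hypothetical sender-side state for one packet number space."""
    sent_at: dict = field(default_factory=dict)  # packet number -> send time (ms)
    largest_acked: int = -1
    samples: list = field(default_factory=list)

    def on_sent(self, pn, now_ms):
        self.sent_at[pn] = now_ms

    def on_ack(self, largest, now_ms):
        # Simplified rule: an ACK produces an RTT sample only if its
        # largest acknowledged packet was not acknowledged before.
        if largest <= self.largest_acked:
            return None
        self.largest_acked = largest
        sample = now_ms - self.sent_at[largest]
        self.samples.append(sample)
        return sample


def simulate(num_packets=41, interval_ms=10, fwd_ms=300,
             short_back_ms=50, long_back_ms=300):
    """Send packets on the long path; ACK them alternately on each path."""
    sampler = RttSampler()
    arrivals = []
    for pn in range(num_packets):
        sent = pn * interval_ms
        sampler.on_sent(pn, sent)
        # Even packets are acknowledged on the short path, odd on the long one.
        back = short_back_ms if pn % 2 == 0 else long_back_ms
        arrivals.append((sent + fwd_ms + back, pn))
    # Process ACKs in the order they reach the data sender.
    for now_ms, largest in sorted(arrivals):
        sampler.on_ack(largest, now_ms)
    return sampler.samples


samples = simulate()
print(len(samples), set(samples))
```

Under these assumptions every ACK that travels on the long path arrives after a short-path ACK covering a higher packet number, so only short-path samples are recorded. The effect depends on several ACKs being sent per RTT and on a large delay difference between the paths.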

Collaborator

Only if you send ACKs multiple times per RTT and the latency difference between the paths is large.

So maybe we should rather not make any default recommendations but just explain the problems and non-problems in different alternatives...?

(btw. which merge are you talking about? This PR is not merged yet)

Contributor Author

@huitema, Jul 4, 2023

Merge: I was misled by the PR search feature. Sorry.

General idea:

1- loss detection will be mostly triggered by the ACKs sent on the shortest return path, because with RACK most loss detection is triggered by packet number comparisons, not by timers. That means losses are detected faster if enough ACKs travel on the shortest return path, which tends to improve performance quite a bit.

2- ACKing packets faster leads to lower memory utilisation, as packets get out of the retransmit queue faster.

3- Still, we need timer measurements to compute the PTO and deal with the loss of the "last packet". Just feeding all RTT samples into the algorithms works, because the combination of SmoothedRTT and RTTvar will capture both the characteristics of the paths and whatever algorithm is implemented by the peer. The PTO formula incorporates both average and variance, and thus the PTO ends up making sense.

Contributor Author

From RFC 9002:

PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
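
As a rough illustration of how this formula behaves with mixed samples, here is a sketch that feeds RTT samples into an exponentially weighted estimator and evaluates the PTO. It makes several assumptions: the ACK delay adjustment and min_rtt handling of RFC 9002 are omitted, the variance is updated before the mean (the RFC 6298 order), `max_ack_delay` is 25ms, and `kGranularity` is 1ms. `RttEstimator` and `pto_after` are hypothetical names.

```python
K_GRANULARITY_MS = 1    # kGranularity, assumed to be 1ms
MAX_ACK_DELAY_MS = 25   # assumed max_ack_delay of the peer


class RttEstimator:
    """Simplified smoothed RTT / RTT variance estimator (no ACK delay handling)."""

    def __init__(self):
        self.smoothed_rtt = None
        self.rttvar = None

    def update(self, sample_ms):
        if self.smoothed_rtt is None:
            # First sample initializes both estimates.
            self.smoothed_rtt = sample_ms
            self.rttvar = sample_ms / 2
            return
        # Variance first, using the previous smoothed RTT, then the mean.
        self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.smoothed_rtt - sample_ms)
        self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * sample_ms

    def pto(self):
        return (self.smoothed_rtt
                + max(4 * self.rttvar, K_GRANULARITY_MS)
                + MAX_ACK_DELAY_MS)


def pto_after(samples):
    """Return (smoothed_rtt, pto) after feeding all samples in order."""
    est = RttEstimator()
    for s in samples:
        est.update(s)
    return est.smoothed_rtt, est.pto()


# All ACK_MP frames return on the terrestrial path.
stable = pto_after([350] * 50)
# Samples alternate between terrestrial and satellite returns.
mixed = pto_after([350, 600] * 25)
print(stable, mixed)
```

With alternating 350ms and 600ms samples, the smoothed RTT settles between the two values, but the variance term grows enough that the resulting PTO exceeds the 600ms satellite round trip in this sketch.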


The simplest implementation is to compute smoothedRTT and RTTvar per
{{Section 5.3 of QUIC-RECOVERY}} regardless of the path through which ACK_MP frames are
received. This algorithm will provide good results,
except if the set of paths changes and the ACK_MP sender
revisits its sending preferences. This is not very
different from what happens on a single path if the routing changes.
The RTT, RTT variance and PTO estimates will rapidly converge to
reflect the new conditions.
There is however an exception: some congestion
control functions rely on estimates of the minimum RTT. It might be prudent
for nodes to remember the path over which the ACK_MP that produced
the minimum RTT was received, and to restart the minimum RTT computation
if that path is abandoned.
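
One possible way to implement the minimum RTT precaution described above is sketched here. The `MinRttTracker` class, its method names, and the integer path identifiers are hypothetical and are not defined by the draft.

```python
class MinRttTracker:
    """Tracks the minimum RTT and the path on which the ACK_MP producing it arrived."""

    def __init__(self):
        self.min_rtt_ms = None
        self.min_rtt_ack_path = None

    def on_rtt_sample(self, rtt_ms, ack_path_id):
        # Remember the return path together with the minimum itself.
        if self.min_rtt_ms is None or rtt_ms < self.min_rtt_ms:
            self.min_rtt_ms = rtt_ms
            self.min_rtt_ack_path = ack_path_id

    def on_path_abandoned(self, path_id):
        # Restart the computation only if the abandoned path is the one
        # that delivered the ACK_MP behind the current minimum.
        if path_id == self.min_rtt_ack_path:
            self.min_rtt_ms = None
            self.min_rtt_ack_path = None


tracker = MinRttTracker()
tracker.on_rtt_sample(600, ack_path_id=1)  # ACK_MP returned via satellite
tracker.on_rtt_sample(350, ack_path_id=0)  # ACK_MP returned via terrestrial
tracker.on_path_abandoned(0)               # terrestrial path goes away
tracker.on_rtt_sample(600, ack_path_id=1)
print(tracker.min_rtt_ms)
```

Without the reset, a congestion controller would keep using the 350ms minimum even though no remaining return path can produce it.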

## Packet Scheduling

@@ -931,9 +936,14 @@ implementation.

Note that this implies that an endpoint may send and receive ACK_MP
frames on a path different from the one that carried the acknowledged
packets. A reasonable default consists in sending ACK_MP frames on the
path they acknowledge packets, but the receiver must not assume its
peer will do so.
packets. As noted in {{compute-rtt}} the values computed using
the standard algorithm reflect both the characteristics of the
path and the scheduling algorithm of ACK_MP frames. The estimates will converge
faster if the scheduling strategy is stable, but besides that

Collaborator

Not sure if convergence is the right term here. I think it might actually not work if you keep using multiple paths for ACKs.

Contributor Author

I think "convergence" generally means that the algorithm has produced results that reflect the current network state, and that the smoothed RTT and RTTvar values will reflect that.

Contributor Author

Just pushed a fix in the latest commit.

implementations can choose between multiple strategies such as sending
ACK_MP frames on the path they acknowledge packets, or sending
ACK_MP frames on the shortest path, which results in shorter control loops
and thus better performance.
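
The two strategies named above could be expressed as a small selection function. This is only a sketch: `pick_ack_path`, the strategy names, and the use of a per-path RTT estimate to decide which path is "shortest" are assumptions of the example, not part of the draft.

```python
def pick_ack_path(acked_path_id, path_rtt_ms, strategy="shortest"):
    """Choose the path on which to send an ACK_MP frame.

    acked_path_id: path that carried the packets being acknowledged.
    path_rtt_ms:   mapping of available path id -> current RTT estimate (ms).
    """
    if strategy == "same":
        # Send the ACK_MP on the path whose packets it acknowledges.
        return acked_path_id
    if strategy == "shortest":
        # Send the ACK_MP on the path with the lowest RTT estimate.
        return min(path_rtt_ms, key=path_rtt_ms.get)
    raise ValueError(f"unknown strategy: {strategy}")


# Path 0 is terrestrial, path 1 is satellite (same-path RTTs from the table).
rtts = {0: 100, 1: 600}
print(pick_ack_path(1, rtts, "same"), pick_ack_path(1, rtts, "shortest"))
```

Whichever strategy is used, the peer still cannot assume it, because the RTT estimates it computes reflect this scheduling choice as well as the path characteristics.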

## Retransmissions
