3 posts tagged with "tcp"

tcp_nodelay

· 8 min read

tcp_nodelay

The solution to the small-packet problem

Clearly an adaptive approach is desirable. One would expect a
proposal for an adaptive inter-packet time limit based on the
round-trip delay observed by TCP. While such a mechanism could
certainly be implemented, it is unnecessary. A simple and
elegant solution has been discovered.

The solution is to inhibit the sending of new TCP segments when
new outgoing data arrives from the user if any previously
transmitted data on the connection remains unacknowledged. This
inhibition is to be unconditional; no timers, tests for size of
data received, or other conditions are required. Implementation
typically requires one or two lines inside a TCP program.
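In short: if previously sent data has not yet been ACKed, hold back new TCP segments. The rule is unconditional: no timers, no tests on the size of the data received, no other conditions. Implementing it takes only a line or two in the TCP program.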
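
The rule above can be sketched as a toy sender. This is only an illustration under loud assumptions: the class name `NagleSender` and its methods are hypothetical, a single ACK is assumed to acknowledge everything outstanding, and window and segment-size limits are ignored.

```python
class NagleSender:
    """Toy model of the RFC 896 rule; not a real TCP implementation."""

    def __init__(self):
        self.unacked = False  # True while any transmitted data is unacknowledged
        self.pending = b""    # user data held back by the rule
        self.sent = []        # segments handed to the (imaginary) network

    def write(self, data: bytes) -> None:
        """The user writes data; send only if nothing is outstanding."""
        self.pending += data
        if not self.unacked:
            self._flush()

    def on_ack(self) -> None:
        """An ACK arrives (assumed to cover all outstanding data)."""
        self.unacked = False
        if self.pending:
            self._flush()

    def _flush(self) -> None:
        # Everything queued goes out as one segment.
        self.sent.append(self.pending)
        self.pending = b""
        self.unacked = True


sender = NagleSender()
for ch in (b"a", b"b", b"c"):
    sender.write(ch)
print(sender.sent)  # only the first byte went out; the rest is queued
sender.on_ack()
print(sender.sent)  # the queued bytes went out together
```

On real sockets this behavior is what the `TCP_NODELAY` option turns off, for example `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)` in Python.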


At first glance, this solution seems to imply drastic changes in
the behavior of TCP. This is not so. It all works out right in
the end. Let us see why this is so.
At first glance this looks like a drastic change to TCP's behavior. In practice it is not; the overall behavior barely changes. Let us see why.

When a user process writes to a TCP connection, TCP receives some
data. It may hold that data for future sending or may send a
packet immediately. If it refrains from sending now, it will
typically send the data later when an incoming packet arrives and
changes the state of the system. The state changes in one of two
ways; the incoming packet acknowledges old data the distant host
has received, or announces the availability of buffer space in
the distant host for new data. (This last is referred to as
"updating the window"). Each time data arrives on a connection,
TCP must reexamine its current state and perhaps send some
packets out. Thus, when we omit sending data on arrival from the
user, we are simply deferring its transmission until the next
message arrives from the distant host. A message must always
arrive soon unless the connection was previously idle or
communications with the other end have been lost. In the first case,
the idle connection, our scheme will result in a packet being
sent whenever the user writes to the TCP connection. Thus we do
not deadlock in the idle condition. In the second case, where
the distant host has failed, sending more data is futile anyway.
Note that we have done nothing to inhibit normal TCP retransmission
logic, so lost messages are not a problem.
In short: when a user process writes to a TCP connection, the TCP stack receives that data and either holds it for later or sends it immediately.

RFC 896 Congestion Control in IP/TCP Internetworks 1/6/84

Examination of the behavior of this scheme under various conditions
demonstrates that the scheme does work in all cases. The
first case to examine is the one we wanted to solve, that of the
character-oriented Telnet connection. Let us suppose that the
user is sending TCP a new character every 200ms, and that the
connection is via an Ethernet with a round-trip time including
software processing of 50ms. Without any mechanism to prevent
small-packet congestion, one packet will be sent for each character,
and response will be optimal. Overhead will be 4000%, but
this is acceptable on an Ethernet. The classic timer scheme,
with a limit of 2 packets per second, will cause two or three
characters to be sent per packet. Response will thus be degraded
even though on a high-bandwidth Ethernet this is unnecessary.
Overhead will drop to 1500%, but on an Ethernet this is a bad
tradeoff. With our scheme, every character the user types will
find TCP with an idle connection, and the character will be sent
at once, just as in the no-control case. The user will see no
visible delay. Thus, our scheme performs as well as the no-control
scheme and provides better responsiveness than the timer
scheme.

The second case to examine is the same Telnet test but over a
long-haul link with a 5-second round trip time. Without any
mechanism to prevent small-packet congestion, 25 new packets
would be sent in 5 seconds.* Overhead here is 4000%. With the
classic timer scheme, and the same limit of 2 packets per second,
there would still be 10 packets outstanding and contributing to
congestion. Round-trip time will not be improved by sending many
packets, of course; in general it will be worse since the packets
will contend for line time. Overhead now drops to 1500%. With
our scheme, however, the first character from the user would find
an idle TCP connection and would be sent immediately. The next
24 characters, arriving from the user at 200ms intervals, would
be held pending a message from the distant host. When an ACK
arrived for the first packet at the end of 5 seconds, a single
packet with the 24 queued characters would be sent. Our scheme
thus results in an overhead reduction to 320% with no penalty in
response time. Response time will usually be improved with our
scheme because packet overhead is reduced, here by a factor of
4.7 over the classic timer scheme. Congestion will be reduced by
this factor and round-trip delay will decrease sharply. For this
case, our scheme has a striking advantage over either of the
other approaches.
________
* This problem is not seen in the pure ARPANET case because the
IMPs will block the host when the count of packets
outstanding becomes excessive, but in the case where a pure
datagram local net (such as an Ethernet) or a pure datagram
gateway (such as an ARPANET / MILNET gateway) is involved, it
is possible to have large numbers of tiny packets
outstanding.
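
The two Telnet cases can be replayed with a small simulation. This is a rough sketch, not anything from the RFC: the function names are made up, each packet is assumed to carry a 40-byte header (the figure the RFC uses later for its 512+40 byte datagram), the ACK is assumed to arrive exactly one round trip after the send, and overhead is taken as header bytes divided by data bytes.

```python
HEADER_BYTES = 40  # assumed header size per packet


def simulate_nagle(rtt_ms, interval_ms, n_chars):
    """Return a list of (send_time_ms, payload_len) under the RFC 896 rule."""
    packets = []
    buffered = 0    # characters held back so far
    ack_due = None  # time the outstanding packet's ACK arrives; None if idle
    for i in range(n_chars):
        now = i * interval_ms
        # Process any ACK that arrived before this keystroke.
        while ack_due is not None and ack_due <= now:
            t, ack_due = ack_due, None
            if buffered:
                packets.append((t, buffered))
                buffered = 0
                ack_due = t + rtt_ms
        buffered += 1
        if ack_due is None:  # idle connection: send at once
            packets.append((now, buffered))
            buffered = 0
            ack_due = now + rtt_ms
    if buffered:  # whatever is left goes out when the last ACK arrives
        packets.append((ack_due, buffered))
    return packets


def overhead_percent(packets):
    """Header bytes as a percentage of payload bytes."""
    payload = sum(n for _, n in packets)
    return 100 * HEADER_BYTES * len(packets) / payload


ethernet = simulate_nagle(rtt_ms=50, interval_ms=200, n_chars=25)
long_haul = simulate_nagle(rtt_ms=5000, interval_ms=200, n_chars=25)
print([n for _, n in ethernet], overhead_percent(ethernet))
print([n for _, n in long_haul], overhead_percent(long_haul))
```

With these assumptions the Ethernet run sends every character on its own (4000% overhead), while the long-haul run sends one 1-character packet followed by one 24-character packet (320% overhead), matching the figures quoted in the text.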

We use our scheme for all TCP connections, not just Telnet connections.
Let us see what happens for a file transfer data connection
using our technique. The two extreme cases will again be
considered.

As before, we first consider the Ethernet case. The user is now
writing data to TCP in 512 byte blocks as fast as TCP will accept
them. The user's first write to TCP will start things going; our
first datagram will be 512+40 bytes or 552 bytes long. The
user's second write to TCP will not cause a send but will cause
the block to be buffered. Assume that the user fills up TCP's
outgoing buffer area before the first ACK comes back. Then when
the ACK comes in, all queued data up to the window size will be
sent. From then on, the window will be kept full, as each ACK
initiates a sending cycle and queued data is sent out. Thus,
after a one round-trip time initial period when only one block is
sent, our scheme settles down into a maximum-throughput condition.
The delay in startup is only 50ms on the Ethernet, so the
startup transient is insignificant. All three schemes provide
equivalent performance for this case.

Finally, let us look at a file transfer over the 5-second round
trip time connection. Again, only one packet will be sent until
the first ACK comes back; the window will then be filled and kept
full. Since the round-trip time is 5 seconds, only 512 bytes of
data are transmitted in the first 5 seconds. Assuming a 2K window,
once the first ACK comes in, 2K of data will be sent and a
steady rate of 2K per 5 seconds will be maintained thereafter.
Only for this case is our scheme inferior to the timer scheme,
and the difference is only in the startup transient; steady-state
throughput is identical. The naive scheme and the timer scheme
would both take 250 seconds to transmit a 100K byte file under
the above conditions and our scheme would take 254 seconds, a
difference of 1.6%.
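
The 250 versus 254 second figures can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes 1K = 1024 bytes and that exactly one window of data moves per round trip; the function name and parameters are made up for illustration.

```python
def transfer_seconds(file_bytes, window_bytes, rtt_s, first_block_bytes=None):
    """Time to move a file at one window per round trip.

    If first_block_bytes is given, only that much is sent during the
    first round trip (the startup transient of the RFC 896 scheme).
    """
    if first_block_bytes is None:
        return file_bytes / window_bytes * rtt_s
    remaining = file_bytes - first_block_bytes
    return rtt_s + remaining / window_bytes * rtt_s


K = 1024  # assumption: the RFC's "K" means 1024 bytes
others = transfer_seconds(100 * K, 2 * K, 5)
ours = transfer_seconds(100 * K, 2 * K, 5, first_block_bytes=512)
print(others, ours, f"{(ours - others) / others:.1%}")
```

Under these assumptions the other schemes take 250.0 seconds and the RFC 896 scheme takes 253.75 seconds, which rounds to the 254 seconds quoted above.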

Thus, for all cases examined, our scheme provides at least 98% of
the performance of both other schemes, and provides a dramatic
improvement in Telnet performance over paths with long round trip
times. We use our scheme in the Ford Aerospace Software
Engineering Network, and are able to run screen editors over Ethernet
and talk to distant TOPS-20 hosts with improved performance
in both cases.

Related reading

The TCP protocol

· 3 min read

State machine

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              |  LISTEN |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+
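
The diagram can also be read as a lookup table from (state, event) to (next state, action). The sketch below is one hypothetical encoding of the arrows drawn above; the names `TRANSITIONS`, `step`, and `run` are made up, an empty action stands for the diagram's "x", and nothing beyond the drawn arrows (no resets, no error handling) is modelled.

```python
# (state, event) -> (next state, action); an empty action is the diagram's "x"
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): ("LISTEN", "create TCB"),
    ("CLOSED", "active OPEN"): ("SYN SENT", "create TCB, snd SYN"),
    ("LISTEN", "rcv SYN"): ("SYN RCVD", "snd SYN,ACK"),
    ("LISTEN", "SEND"): ("SYN SENT", "snd SYN"),
    ("LISTEN", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN SENT", "rcv SYN"): ("SYN RCVD", "snd ACK"),
    ("SYN SENT", "rcv SYN,ACK"): ("ESTAB", "snd ACK"),
    ("SYN SENT", "CLOSE"): ("CLOSED", "delete TCB"),
    ("SYN RCVD", "rcv ACK of SYN"): ("ESTAB", ""),
    ("SYN RCVD", "CLOSE"): ("FIN WAIT-1", "snd FIN"),
    ("ESTAB", "CLOSE"): ("FIN WAIT-1", "snd FIN"),
    ("ESTAB", "rcv FIN"): ("CLOSE WAIT", "snd ACK"),
    ("FIN WAIT-1", "rcv ACK of FIN"): ("FINWAIT-2", ""),
    ("FIN WAIT-1", "rcv FIN"): ("CLOSING", "snd ACK"),
    ("FINWAIT-2", "rcv FIN"): ("TIME WAIT", "snd ACK"),
    ("CLOSING", "rcv ACK of FIN"): ("TIME WAIT", ""),
    ("CLOSE WAIT", "CLOSE"): ("LAST-ACK", "snd FIN"),
    ("LAST-ACK", "rcv ACK of FIN"): ("CLOSED", ""),
    ("TIME WAIT", "Timeout=2MSL"): ("CLOSED", "delete TCB"),
}


def step(state, event):
    """Follow one arrow; raise if the diagram has no such arrow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(state, events):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state, _action = step(state, event)
    return state


# A client that opens, then closes first.
print(run("CLOSED", ["active OPEN", "rcv SYN,ACK", "CLOSE",
                     "rcv ACK of FIN", "rcv FIN", "Timeout=2MSL"]))
```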

Congestion control

1. Slow start and congestion avoidance 2. Fast retransmit

Slow start

What is slow start for?

Fast retransmit

Fast recovery

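My understanding

What is TCP, fundamentally?
Fundamentally, it is a byte stream.
Why is there an endianness problem? Because a byte stream is only ever a stream of bytes: a single byte reads the same on machines of either endianness, and the problem only appears once a value spans several bytes.
How does TCP make sure no data is lost?
Sequence numbers plus retransmission. Why are sequence numbers enough?
Because sequence numbers map one-to-one onto the data (strictly, TCP numbers bytes rather than packets), so the sequence numbers and the segments are isomorphic. Does retransmission cause any problems?
Because sequence numbers and data correspond one-to-one, delivery is idempotent, so retransmission causes no problems.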
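
Both answers can be illustrated in a few lines. This is a simplified sketch, not how a real TCP stack is written: the `Receiver` class is hypothetical, sequence numbers are taken to be byte offsets starting at 0, and retransmitted segments are assumed to carry the same bytes as the original.

```python
import struct

# Endianness: one byte is unambiguous, a multi-byte integer is not.
value = 0x01020304
print(struct.pack(">I", value))  # big-endian (network order) byte layout
print(struct.pack("<I", value))  # little-endian byte layout


class Receiver:
    """Keeps each segment by sequence number, so duplicates are harmless."""

    def __init__(self):
        self.segments = {}  # seq (byte offset) -> payload

    def on_segment(self, seq: int, payload: bytes) -> None:
        # A retransmitted segment has the same seq, so storing it is a no-op.
        self.segments.setdefault(seq, payload)

    def stream(self) -> bytes:
        """Return the contiguous bytes received so far, in order."""
        out = b""
        for seq in sorted(self.segments):
            if seq != len(out):  # a gap: later data must wait
                break
            out += self.segments[seq]
        return out


rx = Receiver()
rx.on_segment(2, b"llo")  # arrives out of order
rx.on_segment(0, b"he")
rx.on_segment(0, b"he")   # retransmission: changes nothing
print(rx.stream())
```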

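Why is a state machine needed? Because a state machine describes how the connection moves from one state to the next, which makes the whole flow explicit.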

Related reading

TCP, message queues, Paxos, and ordering

· One min read

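How do we guarantee that messages are delivered reliably?

Premise: every message is split into small chunks.

How do we make sure no message is lost?

Each chunk maps to an id. As long as every id is present, the message is guaranteed to be complete (nothing is lost, because the ids are totally ordered and the content each id maps to is itself not lost).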
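
A completeness check along these lines might look as follows. It is a sketch under one extra assumption the post does not state: the ids are the consecutive integers 0 to `total - 1`. The function name `missing_ids` is hypothetical.

```python
def missing_ids(received_ids, total):
    """Return the chunk ids in 0..total-1 that have not arrived yet."""
    return sorted(set(range(total)) - set(received_ids))


# Chunks may arrive out of order or more than once; only presence matters.
arrived = [0, 2, 2, 4, 1]
gaps = missing_ids(arrived, total=5)
print(gaps)       # ids still to be retransmitted
print(not gaps)   # the message is complete only when no id is missing
```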