TCP(7P) | Protocols | TCP(7P) |
tcp, TCP - Internet Transmission Control Protocol
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

s = socket(AF_INET, SOCK_STREAM, 0);
s = socket(AF_INET6, SOCK_STREAM, 0);
t = t_open("/dev/tcp", O_RDWR);
t = t_open("/dev/tcp6", O_RDWR);
Programs can access TCP using the socket interface as a SOCK_STREAM socket type, or using the Transport Level Interface (TLI) where it supports the connection-oriented (T_COTS_ORD) service type.
A checksum over all data helps TCP provide reliable communication. Using a window-based flow control mechanism that makes use of positive acknowledgements, sequence numbers, and a retransmission strategy, TCP can usually recover when datagrams are damaged, delayed, duplicated or delivered out of order by the underlying medium.
TCP provides several socket options, defined in
<netinet/tcp.h>
and
described throughout this document, which may be set using
setsockopt(3SOCKET) and read
using getsockopt(3SOCKET).
The level argument for these calls is the protocol
number for TCP, available from
getprotobyname(3SOCKET).
IP level options may also be used with TCP. See
ip(7P) and
ip6(7P).
Sockets utilizing TCP are either “active” or “passive”. Active sockets initiate connections to passive sockets. By default, TCP sockets are active. A passive socket must have its local IPv4 or IPv6 address and TCP port number bound with the bind(3SOCKET) system call after the socket is created. Calling the listen(3SOCKET) system call after binding makes the socket passive and establishes a queueing parameter (the backlog) for it. Connections to the passive socket can then be received using the accept(3SOCKET) system call. An active socket initiates a connection with the connect(3SOCKET) call, optionally after binding. If an active socket has not been bound by the time connect(3SOCKET) is called, then the operating system will choose a local address and port for the application.
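The following sketch shows both roles over the IPv4 loopback address. The helper names open_passive() and open_active(), the loopback address, and the backlog of 5 are illustrative assumptions. The passive side binds to port 0 so that the system picks a free port; the active side calls connect() without binding first.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Passive side (hypothetical helper): bind to 127.0.0.1 and port 0,
 * then listen().  The chosen port is stored in *portp in network
 * byte order.
 */
int
open_passive(in_port_t *portp)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);
	int fd;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0);

	if (bind(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1 ||
	    listen(fd, 5) == -1 ||
	    getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
		(void) close(fd);
		return (-1);
	}
	*portp = sin.sin_port;
	return (fd);
}

/*
 * Active side (hypothetical helper): connect() without bind(), so the
 * system chooses the local address and port.
 */
int
open_active(in_port_t port)
{
	struct sockaddr_in sin;
	int fd;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = port;

	if (connect(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1) {
		(void) close(fd);
		return (-1);
	}
	return (fd);
}
```

A caller would then pass the listening descriptor to accept(3SOCKET) to obtain a connected socket for each incoming connection.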
If incoming connection requests include an IP source route option, then the reverse source route will be used when responding.
By using the special value INADDR_ANY with IPv4, or the unspecified address (all zeroes) with IPv6, the local IP address can be left unspecified in the bind() call by either active or passive TCP sockets. This feature is usually used if the local address is either unknown or irrelevant. If left unspecified, the local IP address will be bound at connection time to the address of the network interface used to service the connection. For passive sockets, this is the destination address used by the connecting peer. For active sockets, this is usually an address on the same subnet as the destination or default gateway address, although the rules can be more complex. See Source Address Selection in inet6(7P) for a detailed discussion of how this works in IPv6.
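To illustrate the wildcard binding described above, this sketch binds a listener to INADDR_ANY and reads local addresses back with getsockname(3SOCKET). The helper names listen_wildcard() and local_address() are hypothetical. The listener itself reports 0.0.0.0, while a socket returned by accept(3SOCKET) for a connection made to 127.0.0.1 reports 127.0.0.1, the destination address used by the peer.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind to INADDR_ANY and a system-chosen port,
 * then listen.  The port (network byte order) is stored in *portp.
 */
int
listen_wildcard(in_port_t *portp)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);
	int fd;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* left unspecified */
	sin.sin_port = htons(0);

	if (bind(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1 ||
	    listen(fd, 5) == -1 ||
	    getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
		(void) close(fd);
		return (-1);
	}
	*portp = sin.sin_port;
	return (fd);
}

/*
 * Hypothetical helper: format the local IPv4 address of fd.  On an
 * accepted socket this is the address the peer connected to.
 */
int
local_address(int fd, char *buf, socklen_t buflen)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);

	if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
		return (-1);
	return (inet_ntop(AF_INET, &sin.sin_addr, buf, buflen) != NULL ?
	    0 : -1);
}
```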
Note that no two TCP sockets can be bound to the same port unless the bound IP addresses are different. IPv4 INADDR_ANY and IPv6 unspecified addresses compare as equal to any IPv4 or IPv6 address. For example, if a socket is bound to INADDR_ANY or the unspecified address and port N, no other socket can bind to port N, regardless of the binding address. This special consideration of INADDR_ANY and the unspecified address can be changed using the socket option SO_REUSEADDR. If SO_REUSEADDR is set on a socket doing a bind, IPv4 INADDR_ANY and the IPv6 unspecified address do not compare as equal to any IP address. This means that as long as the two sockets are not both bound to INADDR_ANY, the unspecified address, or the same IP address, the two sockets can be bound to the same port.
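A minimal sketch of setting SO_REUSEADDR before bind(3SOCKET), assuming an IPv4 address given in dotted-decimal form; the helper name bind_reuseaddr() is hypothetical. The option has to be set before the bind() call it is meant to affect.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket with SO_REUSEADDR set, then
 * bind it to ip (dotted decimal) and port (host byte order).
 * Returns the socket, or -1 on failure.
 */
int
bind_reuseaddr(const char *ip, unsigned short port)
{
	struct sockaddr_in sin;
	int fd, on = 1;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);

	/* Must precede bind() to change the address comparison. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof (on)) == -1)
		goto fail;

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		goto fail;
	if (bind(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1)
		goto fail;
	return (fd);

fail:
	(void) close(fd);
	return (-1);
}
```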
If an application does not want to allow another socket using the SO_REUSEADDR option to bind to a port its socket is bound to, the application can set the socket-level (SOL_SOCKET) option SO_EXCLBIND on a socket. An option value of 1 enables the option, and a value of 0 disables it. Once this option is enabled on a socket, no other socket can be bound to the same port.
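A sketch of enabling SO_EXCLBIND before binding. The helper name set_exclbind() is hypothetical, and the fallback that reports ENOPROTOOPT on systems that do not define SO_EXCLBIND is an assumption added so that the code also compiles elsewhere.

```c
#include <sys/socket.h>
#include <errno.h>

/*
 * Hypothetical helper: set SO_EXCLBIND to 1 (enabled) before bind() so
 * that no other socket, even one using SO_REUSEADDR, can bind to the
 * same port.  Assumed fallback: ENOPROTOOPT where it is undefined.
 */
int
set_exclbind(int fd)
{
#ifdef SO_EXCLBIND
	int on = 1;

	return (setsockopt(fd, SOL_SOCKET, SO_EXCLBIND, &on, sizeof (on)));
#else
	(void) fd;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```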
When a peer is sending data, it will only send up to the
advertised “receive window”, which is determined by how much
more data the recipient can fit in its buffer. Applications can use the
socket-level option SO_RCVBUF
to increase or
decrease the receive buffer size. Similarly, the socket-level option
SO_SNDBUF
can be used to allow TCP to buffer more
unacknowledged and unsent data locally.
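A minimal sketch of requesting buffer sizes; the helper names set_buffers() and get_rcvbuf() are hypothetical. The system may round or limit the requested size, so reading it back with getsockopt(3SOCKET) shows the value actually in effect.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: request send and receive buffers of `size`
 * bytes.  Call before listen() or connect() so that a larger window
 * can be negotiated when the connection is established.
 */
int
set_buffers(int fd, int size)
{
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof (size)) == -1)
		return (-1);
	return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof (size)));
}

/* Hypothetical helper: return the receive buffer size in effect. */
int
get_rcvbuf(int fd)
{
	int size;
	socklen_t len = sizeof (size);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) == -1)
		return (-1);
	return (size);
}
```

For example, calling set_buffers(fd, 128 * 1024) on a new socket and then get_rcvbuf(fd) reports the receive buffer size the system actually granted; 128 KB is an illustrative value.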
Under most circumstances, TCP will send data when it is written by the application. When outstanding data has not yet been acknowledged, though, TCP will gather small amounts of output to be sent as a single packet once an acknowledgement has been received. Usually referred to as Nagle's Algorithm (RFC 896), this behavior helps prevent flooding the network with many small packets.
However, for some highly interactive clients (such as remote
shells or windowing systems that send a stream of keypresses or mouse
events), this batching may cause significant delays. To disable this
behavior, TCP provides a boolean TCP-level socket option, TCP_NODELAY.
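A sketch of toggling TCP_NODELAY and reading it back; set_nodelay() and get_nodelay() are hypothetical helper names. IPPROTO_TCP is used as the level because it holds the TCP protocol number.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: disable Nagle's algorithm (1) or re-enable it (0). */
int
set_nodelay(int fd, int nodelay)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay,
	    sizeof (nodelay)));
}

/* Hypothetical helper: 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int
get_nodelay(int fd)
{
	int val;
	socklen_t len = sizeof (val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```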
Conversely, for other applications, it may be desirable for TCP
not to send out any data until a full TCP segment can be sent. To enable
this behavior, an application can use the TCP-level socket option
TCP_CORK. When set to a non-zero value, TCP will
only send out a full TCP segment. When TCP_CORK
is
set to zero after it has been enabled, all currently buffered data is sent
out (as permitted by the peer's receive window and the current congestion
window).
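A sketch of the cork, write, uncork pattern described above. set_cork() and send_corked() are hypothetical helpers, the ENOPROTOOPT fallback is an assumption for systems that do not define TCP_CORK, and a short write is treated as an error only to keep the example small.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: set (1) or clear (0) TCP_CORK. */
int
set_cork(int fd, int on)
{
#ifdef TCP_CORK
	return (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof (on)));
#else
	(void) fd;
	(void) on;
	errno = ENOPROTOOPT;	/* assumed fallback */
	return (-1);
#endif
}

/*
 * Hypothetical helper: write a header and a body while corked so TCP
 * sends only full segments, then uncork to flush the remainder.
 */
int
send_corked(int fd, const void *hdr, size_t hlen, const void *body,
    size_t blen)
{
	int rv = 0;

	if (set_cork(fd, 1) == -1)
		return (-1);
	/* For brevity, a short write counts as a failure. */
	if (write(fd, hdr, hlen) != (ssize_t)hlen ||
	    write(fd, body, blen) != (ssize_t)blen)
		rv = -1;
	/* Clearing TCP_CORK sends out all currently buffered data. */
	if (set_cork(fd, 0) == -1)
		rv = -1;
	return (rv);
}
```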
Still other latency-sensitive applications rely on receiving a
quick notification that their packets have been successfully received. To
satisfy the requirements of those applications, setting the
TCP_QUICKACK
option to a non-zero value will
instruct the TCP stack to send an acknowledgement immediately upon receipt of
a packet, rather than waiting to acknowledge multiple packets at once.
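A minimal sketch of requesting immediate acknowledgements; set_quickack() is a hypothetical helper, and the ENOPROTOOPT fallback for systems that do not define TCP_QUICKACK is an assumption.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>

/*
 * Hypothetical helper: ask the stack to acknowledge each received
 * packet immediately (on != 0) rather than batching acknowledgements.
 */
int
set_quickack(int fd, int on)
{
#ifdef TCP_QUICKACK
	return (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof (on)));
#else
	(void) fd;
	(void) on;
	errno = ENOPROTOOPT;	/* assumed fallback */
	return (-1);
#endif
}
```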
TCP provides an urgent data mechanism, which may be invoked using
the out-of-band provisions of
send(3SOCKET). The caller may mark
one byte as “urgent” with the MSG_OOB
flag to send(3SOCKET). This sets
an “urgent pointer” pointing to this byte in the TCP stream.
The receiver on the other side of the stream is notified of the urgent data
by a SIGURG
signal. The
SIOCATMARK
ioctl(2) request returns a value
indicating whether the stream is at the urgent mark. Because the system
never returns data across the urgent mark in a single
read(2) call, it is possible to advance
to the urgent data in a simple loop which reads data, testing the socket
with the SIOCATMARK ioctl() request, until it reaches the mark.
The initial congestion window (cwnd) can be overridden with the TCP-level socket option TCP_INIT_CWND. An application can use this option to set the initial cwnd to a specified number of TCP segments. This applies both when the connection first starts and when it restarts after an idle period. The process must have the PRIV_SYS_NET_CONFIG privilege if it wants to specify a number greater than that calculated by RFC 3390.
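A sketch of setting TCP_INIT_CWND. set_init_cwnd() is a hypothetical helper; the 32-bit unsigned option type and the ENOPROTOOPT fallback where the option is not defined are both assumptions. Without PRIV_SYS_NET_CONFIG, a value above the RFC 3390 calculation is expected to be refused.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <errno.h>

/*
 * Hypothetical helper: set the initial cwnd to `segs` TCP segments.
 * The option value is assumed to be a 32-bit unsigned integer.
 */
int
set_init_cwnd(int fd, uint32_t segs)
{
#ifdef TCP_INIT_CWND
	return (setsockopt(fd, IPPROTO_TCP, TCP_INIT_CWND, &segs,
	    sizeof (segs)));
#else
	(void) fd;
	(void) segs;
	errno = ENOPROTOOPT;	/* assumed fallback */
	return (-1);
#endif
}
```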
The operating system also provides alternative congestion control algorithms that may be more appropriate for your application, including the CUBIC algorithm described in RFC 8312. These can be configured system-wide using ipadm(1M), or on a per-connection basis with the TCP-level socket option TCP_CONGESTION, whose argument is the name of the algorithm to use (for example “cubic”). If the requested algorithm does not exist, then setsockopt() will fail, and errno will be set to ENOENT.
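A sketch of selecting an algorithm per connection with TCP_CONGESTION. set_congestion() is a hypothetical helper; passing strlen(name) as the option length, without the terminating NUL, is an assumption, as is the ENOPROTOOPT fallback where the option is not defined.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <errno.h>

/*
 * Hypothetical helper: select a congestion control algorithm by name,
 * for example "cubic".  An unknown name fails with errno ENOENT.
 */
int
set_congestion(int fd, const char *name)
{
#ifdef TCP_CONGESTION
	/* Assumed: the length excludes the terminating NUL. */
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
#else
	(void) fd;
	(void) name;
	errno = ENOPROTOOPT;	/* assumed fallback */
	return (-1);
#endif
}
```

Calling set_congestion(fd, "cubic") on a new socket would request CUBIC for that connection if it is available.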
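An application can enable keep-alive probing on a connection with the socket-level option SO_KEEPALIVE. When enabled, the first keep-alive probe is sent out after a TCP connection is idle for two hours. If the peer does not respond to the probe within eight minutes, the TCP connection is aborted. An application can alter the probe behavior using the following TCP-level socket options:

TCP_KEEPALIVE_THRESHOLD
    Determines the interval for sending the first probe. The option value is an unsigned integer in milliseconds. The system default is controlled by the TCP ndd parameter tcp_keepalive_interval. The minimum value is ten seconds. The maximum is ten days, while the default is two hours.

TCP_KEEPALIVE_ABORT_THRESHOLD
    Determines how long to wait for a response to the probes before aborting the connection. The option value is an unsigned integer in milliseconds. The system default is controlled by the TCP ndd parameter tcp_keepalive_abort_interval. The default is eight minutes.

TCP_KEEPIDLE
    Like TCP_KEEPALIVE_THRESHOLD, determines the interval for sending the first probe, except that the option value is an unsigned integer in seconds. It is provided primarily for compatibility with other Unix flavors.

TCP_KEEPCNT
    The number of unanswered keep-alive probes to send before aborting the connection.

TCP_KEEPINTVL
    The interval, in seconds, between successive unanswered keep-alive probes.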
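A sketch of enabling keep-alive and adjusting the probe schedule with the seconds-based options listed above. enable_keepalive() is a hypothetical helper, the #ifdef guards are an assumption for portability, and values a caller might pass (for example 60, 10, and 5) are illustrative.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: enable SO_KEEPALIVE and, where the options are
 * defined, send the first probe after `idle` seconds, space unanswered
 * probes `intvl` seconds apart, and abort after `cnt` of them.
 */
int
enable_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)) == -1)
		return (-1);
#ifdef TCP_KEEPIDLE
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof (idle)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPINTVL
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof (intvl)) == -1)
		return (-1);
#endif
#ifdef TCP_KEEPCNT
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
	    sizeof (cnt)) == -1)
		return (-1);
#endif
	return (0);
}
```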
Turn on the window scale option in one of the following ways:

- An application can set the SO_SNDBUF or SO_RCVBUF size in the setsockopt() option to be larger than 64K. This must be done before the program calls listen() or connect(), because the window scale option is negotiated when the connection is established. Once the connection has been made, it is too late to increase the send or receive window beyond the default TCP limit of 64K.
- Use ndd(1M) to modify the configuration parameter tcp_wscale_always. If tcp_wscale_always is set to 1, the window scale option will always be set when connecting to a remote system. If tcp_wscale_always is 0, the window scale option will be set only if the user has requested a send or receive window larger than 64K. The default value of tcp_wscale_always is 1.

Regardless of the value of tcp_wscale_always, the window scale option will always be included in a connect acknowledgement if the connecting system has used the option.

Turn on SACK capabilities in the following way:
Use ndd(1M) to modify the configuration parameter tcp_sack_permitted. If tcp_sack_permitted is set to 0, TCP will not accept SACK or send out SACK information. If tcp_sack_permitted is set to 1, TCP will not initiate a connection with the SACK permitted option in the SYN segment, but will respond with the SACK permitted option in the SYN|ACK segment if an incoming connection request has the SACK permitted option. This means that TCP will only accept SACK information if the other side of the connection also accepts SACK information. If tcp_sack_permitted is set to 2, TCP will both initiate and accept connections with SACK information. The default for tcp_sack_permitted is 2 (active enabled).

Turn on the TCP ECN mechanism in the following way:
Use ndd(1M) to modify the configuration parameter tcp_ecn_permitted. If tcp_ecn_permitted is set to 0, TCP will not negotiate ECN with a peer, even if the peer supports the ECN mechanism. If tcp_ecn_permitted is set to 1, TCP will not tell a peer that it supports the ECN mechanism when initiating a connection. However, it will tell a peer that it supports the ECN mechanism when accepting a new incoming connection request if the peer indicates in the SYN segment that it supports the ECN mechanism. If tcp_ecn_permitted is set to 2, then in addition to negotiating ECN with a peer when accepting connections, TCP will indicate in the outgoing SYN segment that it supports the ECN mechanism when it makes active outgoing connections. The default for tcp_ecn_permitted is 1.

Turn on the timestamp option in the following way:
Use ndd(1M) to modify the configuration parameter tcp_tstamp_always. If tcp_tstamp_always is 1, the timestamp option will always be set when connecting to a remote machine. If tcp_tstamp_always is 0, the timestamp option will not be set when connecting to a remote system. The default for tcp_tstamp_always is 0.

Regardless of the value of tcp_tstamp_always, the timestamp option will always be included in a connect acknowledgement (and all succeeding packets) if the connecting system has used the timestamp option.

Use the following procedure to turn on the timestamp option only when the window scale option is in effect:
Use ndd(1M) to modify the configuration parameter tcp_tstamp_if_wscale. Setting tcp_tstamp_if_wscale to 1 will cause the timestamp option to be set when connecting to a remote system, if the window scale option has been set. If tcp_tstamp_if_wscale is 0, the timestamp option will not be set when connecting to a remote system. The default for tcp_tstamp_if_wscale is 1.

Protection Against Wrap Around Sequence Numbers (PAWS) is always used when the timestamp option is set.
The operating system also supports multiple methods of generating initial sequence numbers. One of these methods is the improved technique suggested in RFC 1948. It is strongly recommended that sequence number generation parameters be set as close to boot time as possible. This prevents sequence number problems on new connections that reuse the connection ID (the same local and remote addresses and ports) of earlier connections whose sequence numbers were generated with a different method. The svc:/network/initial:default service configures initial sequence number generation. The service reads the configuration file /etc/default/inetinit to determine which method to use.
The /etc/default/inetinit file is an unstable interface, and may change in future releases.
$ gcc -std=c99 -Wall -o client client.c -lsocket
$ cat client.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct addrinfo hints, *gair, *p;
	int fd, rv, rlen;
	char buf[1024];
	int y = 1;

	if (argc != 3) {
		fprintf(stderr, "%s <host> <port>\n", argv[0]);
		return (1);
	}

	memset(&hints, 0, sizeof (hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;

	if ((rv = getaddrinfo(argv[1], argv[2], &hints, &gair)) != 0) {
		fprintf(stderr, "getaddrinfo() failed: %s\n",
		    gai_strerror(rv));
		return (1);
	}

	/* Try each address in turn until a connection succeeds. */
	for (p = gair; p != NULL; p = p->ai_next) {
		if ((fd = socket(
		    p->ai_family,
		    p->ai_socktype,
		    p->ai_protocol)) == -1) {
			perror("socket() failed");
			continue;
		}

		if (connect(fd, p->ai_addr, p->ai_addrlen) == -1) {
			/* Report the error before close() can change errno. */
			perror("connect() failed");
			close(fd);
			continue;
		}

		break;
	}

	if (p == NULL) {
		freeaddrinfo(gair);
		fprintf(stderr, "failed to connect to server\n");
		return (1);
	}

	freeaddrinfo(gair);

	/* Enable keep-alive probes on the connection. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &y, sizeof (y)) == -1) {
		perror("setsockopt(SO_KEEPALIVE) failed");
		return (1);
	}

	/* Copy everything the server sends to standard output. */
	while ((rlen = read(fd, buf, sizeof (buf))) > 0) {
		fwrite(buf, rlen, 1, stdout);
	}

	if (rlen == -1) {
		perror("read() failed");
	}

	fflush(stdout);

	if (close(fd) == -1) {
		perror("close() failed");
	}

	return (0);
}
$ ./client 127.0.0.1 8080
hello
$ ./client ::1 8080
hello
$ gcc -std=c99 -Wall -o server server.c -lsocket
$ cat server.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Log the number of bytes sent and the client's address and port. */
void
logmsg(struct sockaddr *s, int bytes)
{
	char dq[INET6_ADDRSTRLEN];

	switch (s->sa_family) {
	case AF_INET: {
		struct sockaddr_in *s4 = (struct sockaddr_in *)s;
		inet_ntop(AF_INET, &s4->sin_addr, dq, sizeof (dq));
		fprintf(stdout, "sent %d bytes to %s:%d\n",
		    bytes, dq, ntohs(s4->sin_port));
		break;
	}
	case AF_INET6: {
		struct sockaddr_in6 *s6 = (struct sockaddr_in6 *)s;
		inet_ntop(AF_INET6, &s6->sin6_addr, dq, sizeof (dq));
		fprintf(stdout, "sent %d bytes to [%s]:%d\n",
		    bytes, dq, ntohs(s6->sin6_port));
		break;
	}
	default:
		fprintf(stdout, "sent %d bytes to unknown client\n",
		    bytes);
		break;
	}
}

int
main(int argc, char *argv[])
{
	struct addrinfo hints, *gair, *p;
	int sfd, cfd;
	int slen, wlen, rv;

	if (argc != 3) {
		fprintf(stderr, "%s <port> <message>\n", argv[0]);
		return (1);
	}

	slen = strlen(argv[2]);

	memset(&hints, 0, sizeof (hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE;

	if ((rv = getaddrinfo(NULL, argv[1], &hints, &gair)) != 0) {
		fprintf(stderr, "getaddrinfo() failed: %s\n",
		    gai_strerror(rv));
		return (1);
	}

	/* Bind to the first wildcard address that works. */
	for (p = gair; p != NULL; p = p->ai_next) {
		if ((sfd = socket(
		    p->ai_family,
		    p->ai_socktype,
		    p->ai_protocol)) == -1) {
			perror("socket() failed");
			continue;
		}

		if (bind(sfd, p->ai_addr, p->ai_addrlen) == -1) {
			/* Report the error before close() can change errno. */
			perror("bind() failed");
			close(sfd);
			continue;
		}

		break;
	}

	if (p == NULL) {
		freeaddrinfo(gair);
		fprintf(stderr, "server failed to bind()\n");
		return (1);
	}

	freeaddrinfo(gair);

	if (listen(sfd, 1024) != 0) {
		perror("listen() failed");
		return (1);
	}

	fprintf(stdout, "waiting for clients...\n");

	for (int times = 0; times < 5; times++) {
		struct sockaddr_storage stor;
		socklen_t alen = sizeof (stor);
		struct sockaddr *addr = (struct sockaddr *)&stor;

		if ((cfd = accept(sfd, addr, &alen)) == -1) {
			perror("accept() failed");
			continue;
		}

		/* write() may send less than requested; stop on error. */
		wlen = 0;
		do {
			ssize_t n = write(cfd, argv[2] + wlen, slen - wlen);
			if (n == -1) {
				perror("write() failed");
				break;
			}
			wlen += n;
		} while (wlen < slen);

		logmsg(addr, wlen);

		if (close(cfd) == -1) {
			perror("close(cfd) failed");
		}
	}

	if (close(sfd) == -1) {
		perror("close(sfd) failed");
	}

	fprintf(stdout, "finished.\n");

	return (0);
}
$ ./server 8080 $'hello\n'
waiting for clients...
sent 6 bytes to [::ffff:127.0.0.1]:59059
sent 6 bytes to [::ffff:127.0.0.1]:47448
sent 6 bytes to [::ffff:127.0.0.1]:54949
sent 6 bytes to [::ffff:127.0.0.1]:55186
sent 6 bytes to [::1]:62256
finished.
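A socket operation may fail if:

EISCONN
    A connect() operation was attempted on a socket on which a connect() operation had already been performed.

ETIMEDOUT
    A connection was dropped due to excessive retransmissions.

ECONNRESET
    The remote peer forced the connection to be closed.

ECONNREFUSED
    The remote peer actively refused connection establishment (usually because no process is listening on the port).

EADDRINUSE
    A bind() operation was attempted on a socket with a network address/port pair that has already been bound to another socket.

EADDRNOTAVAIL
    A bind() operation was attempted on a socket with a network address for which no network interface exists.

EACCES
    A bind() operation was attempted with a “reserved” port number and the effective user ID of the process was not the privileged user.

ENOBUFS
    The system ran out of memory for internal data structures.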
K. Ramakrishnan, S. Floyd, and D. Black, The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168, September 2001.
M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, TCP Selective Acknowledgement Options, RFC 2018, October 1996.
S. Bellovin, Defending Against Sequence Number Attacks, RFC 1948, May 1996.
D. Borman, B. Braden, V. Jacobson, and R. Scheffenegger, Ed., TCP Extensions for High Performance, RFC 7323, September 2014.
J. Postel, Transmission Control Protocol - DARPA Internet Program Protocol Specification, RFC 793, Network Information Center, SRI International, Menlo Park, CA, September 1981.
Administrative actions on the svc:/network/initial:default service, such as enabling, disabling, or requesting restart, can be performed using svcadm(1M). The service's status can be queried using the svcs(1) command.
January 7, 2019 | OmniOS |