Computer Networking:
a Top-Down Approach (8th ed.):
Notes of "Select" Lectures
Brief Content
Course Information
Related Resources
Foreword
Chapter 1 Computer Networks and the Internet
Chapter 2 Application Layer
Chapter 3 Transport Layer
Chapter 4 Network Layer - data plane
Chapter 5 Network Layer - control plane
Chapter 6 The Link Layer
Chapter 7 Wireless and Mobile Networks
Chapter 8 Security
Computer Networks
Professor Jim Kurose
COMPSCI 453
College of Information and Computer Sciences
University of Massachusetts
Class textbook:
Computer Networking: a Top-Down Approach (8th ed.) J.F. Kurose, K.W. Ross, Pearson, 2020
http://gaia.cs.umass.edu/kurose_ross
Book Edition | 8th ed. en-us | 7th ed. zh-cn
Book home | gaia.cs.umass.edu | HZCOURSE.COM
Book PDF | LibGen | 用户1496968452581's Juejin blog
PowerPoint | gaia.cs.umass.edu | URL path exists but is not given here
Solutions etc. | |
Online lectures | gaia.cs.umass.edu, bilibili |
High quality reading notes | | 王文萱's CSDN post of reading notes; GuoYi's GitHub post of English PowerPoint notes
With my second attempt at the postgraduate entrance exam doomed, I started to teach myself computer science subjects that might be useful in any of my possible future circumstances: eventually receiving an interview invitation from a school and accepting it, preparing for an entry-level developer interview, or failing to be admitted this year and making a third attempt. I decided to start with computer networking, by watching online lectures and reading the slides of the book Computer Networking. However, I found it too challenging to acquire the knowledge without a translator. Meanwhile, the absence of terminology translation does not match real-life study and work. Most annoying of all, the knowledge in my mind was not well organized. To cope with these issues, I believe it is wise to write down, rearrange, or even simply copy and paste the key points. Given limited time, this series of notes covers only core topics (e.g., concepts and principles) such as the overview, the important layers, and their protocols.
• The content of this series of notes is based on (from most to least):
• Parts of Professor Jim Kurose's lecture videos and PowerPoints that I consider important.
• The original book, mainly for chapter/section numbering. I also use it for figure screenshots, definition/mechanism checks and comprehensive knowledge acquisition.
• The Chinese version of the book for terminology referencing.
• Others' notes.
• Microsoft Word operations.
• Share the document via "Post to blog".
• Convert numbers to text (thanks to cmt); resize all pictures to 100%.
• Publish it to cnblogs as draft.
• cnblogs operations.
• Set titles, tags, alias URL, etc. for posts.
• Copy the needed fragment of the draft to the corresponding post.
A nuts-and-bolts and a services description. What is a protocol?
Billions of connected computing devices:
• hosts (主机)= end systems (端系统)
• running network apps (应用) at Internet's "edge" (边缘)
Packet switches (分组交换机): forward packets (分组) (chunks of data)
• routers (路由器) , switches (交换机)
Communication links (通信链路)
• fiber (光纤) , copper (铜) , radio (无线电) , satellite (卫星)
• transmission rate (传输速率): bandwidth (带宽)
Networks
• collection of devices, routers, links: managed by an organization
Internet: "network of networks"
• Interconnected ISPs (Internet Service Providers, 因特网服务提供商)
Protocols are everywhere
• control sending, receiving of messages (报文)
• e.g., HTTP (Web), streaming video, Skype, TCP, IP, WiFi, 4G, Ethernet
Internet standards
• RFC: Request for Comments (请求评论)
• IETF: Internet Engineering Task Force (因特网工程任务组)
Infrastructure that provides services to applications:
• Web, streaming video, multimedia teleconferencing, email, games, e-commerce, social media, inter-connected appliances, …
provides programming interface to distributed applications (分布式应用程序):
• "hooks" allowing sending/receiving apps to "connect" to, use Internet transport service
• provides service options, analogous to postal service
Network protocols:
• computers (devices) rather than humans
• all communication activity in Internet governed by protocols
Protocols (协议) define the format, order of messages sent and received among network entities, and actions taken on message transmission, receipt
Access networks, physical media
Network edge (网络边缘):
• hosts (主机): clients and servers
• servers often in data centers (数据中心)
Access networks (接入网), physical media (物理媒体):
• wired, wireless communication links
Network core (网络核心):
• interconnected routers
• network of networks
Q: How to connect end systems to edge router?
• residential access nets
• institutional access networks (school, company)
• mobile access networks (WiFi, 4G/5G)
Frequency division multiplexing (FDM, 频分复用): different channels transmitted in different frequency bands
HFC: hybrid fiber coax (混合光纤同轴)
• asymmetric: up to 40 Mbps – 1.2 Gbps downstream transmission rate, 30-100 Mbps upstream transmission rate
network of cable, fiber attaches homes to ISP router
• homes share access network to cable headend
Use existing telephone line to central office DSLAM (digital subscriber line access multiplexer, 数字用户线接入复用器)
• data over DSL phone line goes to Internet
• voice over DSL phone line goes to telephone net
• 24-52 Mbps dedicated downstream transmission rate
• 3.5-16 Mbps dedicated upstream transmission rate
• Shared wireless access network connects end system to router
• via base station (基站) aka "access point" (接入点)
Wireless local area networks (WLANs)
• typically within or around building (~100 ft)
• 802.11b/g/n (WiFi): 11, 54, 450 Mbps transmission rate
Wide-area cellular access networks
• provided by mobile, cellular (蜂窝) network operator (10's km)
• 10's Mbps
• 4G cellular networks (5G coming)
companies, universities, etc.
mix of wired, wireless link technologies, connecting a mix of switches and routers
• Ethernet: wired access at 100 Mbps, 1 Gbps, 10 Gbps
• WiFi: wireless access points at 11, 54, 450 Mbps
high-bandwidth links (10s to 100s Gbps) connect hundreds to thousands of servers together, and to Internet
Host: sends packets of data
host sending function:
• takes application message
• breaks into smaller chunks, known as packets, of length L bits
• transmits packet into access network at transmission rate R
• link transmission rate, aka link capacity, aka link bandwidth
packet transmission delay = time needed to transmit L-bit packet into link = L (bits) /R (bits/sec)
bit: propagates (传播) between transmitter/receiver (发射器—接收器) pairs
physical link: what lies between transmitter & receiver
guided media (导引型媒体):
• signals propagate in solid media: copper, fiber, coax
unguided media (非导引型媒体):
• signals propagate freely, e.g., radio
Twisted pair (TP): two insulated (绝缘的) copper wires
• Category 5: 100 Mbps, 1 Gbps Ethernet
• Category 6: 10 Gbps Ethernet
Coaxial cable: two concentric (同心的) copper conductors
bidirectional
broadband:
• multiple frequency channels on cable
• 100's Mbps per channel
Fiber optic cable: glass fiber carrying light pulses, each pulse a bit
high-speed operation:
• high-speed point-to-point transmission (10's-100's Gbps)
low error rate:
• repeaters spaced far apart
• immune to electromagnetic noise
Wireless radio: signal carried in various "bands" in electromagnetic spectrum
no physical "wire"
broadcast, "half-duplex" (半双工,sender to receiver)
propagation environment effects:
• reflection
• obstruction by objects
• interference/noise
Radio link types:
• Wireless LAN (WiFi)
• 10-100's Mbps; 10's of meters
• wide-area (e.g., 4G cellular)
• 10's Mbps over ~10 Km
• Bluetooth: cable replacement
• short distances, limited rates
• terrestrial (陆地的) microwave
• point-to-point (点对点); 45 Mbps channels
• satellite
• up to 45 Mbps per channel
• 270 msec end-end delay
Forwarding, routing; packet switching; circuit switching; a network of networks
Mesh (网状物) of interconnected routers
Packet-switching (分组交换): hosts break application-layer messages into packets
• network forwards packets from one router to the next, across links on path from source to destination
Forwarding (转发):
• aka "switching" (交换)
• local action: move arriving packets from router's input link to appropriate router output link
Routing (路由):
• global action: determine source-destination paths taken by packets
• routing algorithms
Packet transmission delay (时延): takes L/R seconds to transmit (push out) L-bit packet into link at R bps
Store and forward: entire packet must arrive at router before it can be transmitted on next link
One-hop (跳) numerical example:
• L = 10 Kbits
• R = 100 Mbps
• one-hop transmission delay = 0.1 msec
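The one-hop arithmetic can be reproduced with a short Python sketch. The helper names are hypothetical, and the multi-hop function assumes identical links with no processing, queueing or propagation delay, so store and forward simply charges L/R once per link.

```python
def transmission_delay(L_bits: float, R_bps: float) -> float:
    """Time (seconds) to push an L-bit packet onto a link of rate R bps."""
    return L_bits / R_bps


def store_and_forward_delay(L_bits: float, R_bps: float, n_links: int) -> float:
    """End-end delay of one packet over n_links identical links.

    Each router must receive the whole packet before sending it on, so the
    transmission delay L/R is paid once per link. Processing, queueing and
    propagation delays are ignored in this sketch.
    """
    return n_links * transmission_delay(L_bits, R_bps)


if __name__ == "__main__":
    L = 10_000         # 10 Kbits
    R = 100_000_000    # 100 Mbps
    print(f"one hop:    {transmission_delay(L, R) * 1e3:.3f} ms")
    print(f"three hops: {store_and_forward_delay(L, R, 3) * 1e3:.3f} ms")
```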
Queueing occurs when work arrives faster than it can be serviced
Packet queuing and loss: if arrival rate (in bps) to link exceeds transmission rate (bps) of link for some period of time:
• packets will queue, waiting to be transmitted on output link
• packets can be dropped (lost) if memory (buffer) in router fills up
End-end resources allocated to, reserved for "call" between source and destination
• in diagram, each link has four circuits.
• call gets 2nd circuit in top link and 1st circuit in right link.
• dedicated (专用的) resources: no sharing
• circuit-like (guaranteed) performance
• circuit segment idle if not used by call (no sharing)
• commonly used in traditional telephone networks
Frequency Division Multiplexing (FDM, 频分复用)
• optical, electromagnetic frequencies divided into (narrow) frequency bands
• each call allocated its own band, can transmit at max rate of that narrow band
Time Division Multiplexing (TDM, 时分复用)
• time divided into slots
• each call allocated periodic slot(s), can transmit at maximum rate of (wider) frequency band (only) during its time slot(s)
Is packet switching a "slam dunk winner" (必定成功的事;稳操胜券的事) ?
• great for "bursty" data – sometimes has data to send, but at other times not
• resource sharing
• simpler, no call setup
• excessive congestion possible: packet delay and loss due to buffer overflow
• protocols needed for reliable data transfer, congestion control (拥塞控制)
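Why sharing works for bursty traffic can be quantified with a binomial model. The numbers below are assumed values in the spirit of the textbook's packet-vs-circuit comparison: a 1 Gbps link, users who send at 100 Mbps while active and are active 10% of the time, independently. Circuit switching could admit only 10 such users; with 35 users sharing the link by packet switching, the probability that more than 10 are active at once (the only case where the link is oversubscribed) comes out at roughly 0.0004.

```python
from math import comb


def prob_more_than(k: int, n_users: int, p_active: float) -> float:
    """P(more than k of n_users are active at once), users independent."""
    return sum(
        comb(n_users, i) * p_active**i * (1 - p_active) ** (n_users - i)
        for i in range(k + 1, n_users + 1)
    )


if __name__ == "__main__":
    # Assumed scenario: 1 Gbps link / 100 Mbps per active user -> 10 users fit.
    link_capacity_users = 1_000 // 100
    # Packet switching admits 35 users; trouble only if more than 10 are active.
    p = prob_more_than(link_capacity_users, n_users=35, p_active=0.1)
    print(f"P(>10 of 35 users active) = {p:.6f}")
```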
hosts connect to Internet via access Internet Service Providers (ISPs)
access ISPs in turn must be interconnected
• so that any two hosts (anywhere!) can send packets to each other
resulting network of networks is very complex
• evolution driven by economics, national policies
At "center": small # of well-connected large networks
• "tier-1" commercial ISPs (e.g., Level 3, Sprint, AT&T, NTT), national & international coverage
• content provider networks (e.g., Google, Facebook, 内容提供商网络): private network that connects its data centers to Internet, often bypassing tier-1, regional ISPs
Packet delay and loss, end-end throughput
Packets queue in router buffers, waiting for turn for transmission
• queue length grows when arrival rate to link (temporarily) exceeds output link capacity
packet loss occurs when memory to hold queued packets fills up
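$d_{\text{nodal}} = d_{\text{proc}} + d_{\text{queue}} + d_{\text{trans}} + d_{\text{prop}}$
$d_{\text{proc}}$: Processing Delay (处理时延)
• check bit errors
• determine output link
• typically < microseconds
$d_{\text{queue}}$: Queueing Delay (排队时延)
• time waiting at output link for transmission
• depends on congestion level of router
$d_{\text{trans}}$: Transmission Delay (传输时延)
• L: packet length (bits)
• R: link transmission rate (bps)
• $d_{\text{trans}} = L/R$
$d_{\text{prop}}$: Propagation Delay (传播时延)
• d: length of physical link
• s: propagation speed (~$2 \times 10^8$ m/sec)
• $d_{\text{prop}} = d/s$
$d_{\text{trans}}$ and $d_{\text{prop}}$ are very different: $d_{\text{trans}}$ depends on packet length and link rate, while $d_{\text{prop}}$ depends on link length and signal speed.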
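To see how different d_trans and d_prop can be, here is a small Python sketch of the nodal-delay sum. The packet size, link rate and distance are hypothetical, and d_proc and d_queue are passed in rather than modelled.

```python
def nodal_delay(d_proc: float, d_queue: float, L_bits: float, R_bps: float,
                d_meters: float, s_mps: float = 2e8) -> float:
    """d_nodal = d_proc + d_queue + d_trans + d_prop (all in seconds)."""
    d_trans = L_bits / R_bps    # time to push all L bits onto the link
    d_prop = d_meters / s_mps   # time for one bit to cross the link
    return d_proc + d_queue + d_trans + d_prop


if __name__ == "__main__":
    # Hypothetical: 1,500-byte packet, 10 Mbps link, 2,000 km of fiber,
    # processing and queueing delays assumed negligible.
    L = 1500 * 8
    print(f"d_trans = {L / 10e6 * 1e3:.2f} ms")       # depends on L and R only
    print(f"d_prop  = {2_000e3 / 2e8 * 1e3:.2f} ms")   # depends on d and s only
    print(f"d_nodal = {nodal_delay(0, 0, L, 10e6, 2_000e3) * 1e3:.2f} ms")
```

Doubling the link rate halves d_trans but leaves d_prop untouched; only a shorter link or a faster medium changes d_prop.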
a: average packet arrival rate (packets/sec)
L: packet length (bits)
R: link bandwidth (bit transmission rate)
$La/R$ = arrival rate of bits / service rate of bits: "traffic intensity"
$La/R \approx 0$: avg. queueing delay small
$La/R \to 1$: avg. queueing delay large
$La/R > 1$: more "work" arriving than can be serviced; average delay infinite!
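A quick sketch of the traffic-intensity ratio, with a hypothetical 1 Mbps link and 1,000-bit packets. It only classifies La/R as below 1 or not; it does not model how large the delay actually gets, which also depends on how bursty the arrivals are.

```python
def traffic_intensity(a: float, L: float, R: float) -> float:
    """La/R: bits arriving per second divided by bits the link can send per second."""
    return L * a / R


if __name__ == "__main__":
    R = 1_000_000  # hypothetical 1 Mbps link
    L = 1_000      # hypothetical 1,000-bit packets
    for a in (100, 900, 1_100):  # average arrivals, packets/sec
        rho = traffic_intensity(a, L, R)
        trend = "unstable (delay unbounded)" if rho >= 1 else "stable"
        print(f"a = {a:>5} pkt/s -> La/R = {rho:.2f}: {trend}")
```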
the queue (aka buffer) preceding a link has finite capacity
packet arriving to full queue dropped (aka lost)
lost packet may be retransmitted by previous node, by source end system, or not at all
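The drop behaviour can be modelled with a bounded FIFO. This is a minimal sketch assuming a simple drop-tail policy (the arriving packet is discarded when the buffer is full); real routers may use other drop policies, and recovering lost packets is left to other nodes or layers, as noted above.

```python
from collections import deque


class DropTailQueue:
    """Finite FIFO buffer in front of an output link (hypothetical model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0

    def arrive(self, packet) -> bool:
        """Enqueue packet; return False (and count a loss) if the buffer is full."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def transmit(self):
        """Remove and return the head-of-line packet, or None if empty."""
        return self.buffer.popleft() if self.buffer else None


if __name__ == "__main__":
    q = DropTailQueue(capacity=3)
    for pkt in range(5):  # burst of 5 packets with no departures in between
        q.arrive(pkt)
    print("queued:", list(q.buffer), "dropped:", q.dropped)
```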
What do "real" Internet delay & loss look like?
traceroute program: provides delay measurement from source to router along end-end Internet path towards destination. For all i:
• sends three packets that will reach router i on path towards destination (with time-to-live (生存时间) field value of i)
• router i will return packets to sender
• sender measures time interval between transmission and reply
Throughput: rate (bits/time unit) at which bits are being sent from sender to receiver
• instantaneous (瞬时) : rate at given point in time
• average (平均): rate over longer period of time
bottleneck link (瓶颈链路)
link on end-end path that constrains end-end throughput
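per-connection end-end throughput, with 10 connections (fairly) sharing a backbone bottleneck link of R bits/sec (Rc: client access link rate, Rs: server access link rate): min(Rc, Rs, R/10)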
in practice: Rc or Rs is often bottleneck
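The bottleneck rule is just a minimum over the rates on the path. A sketch, assuming the connections share the backbone link R equally; the rates used in the checks are hypothetical.

```python
def per_connection_throughput(Rc: float, Rs: float, R: float, n_conn: int = 10) -> float:
    """min(Rc, Rs, R/n_conn): the slowest link on the path sets the end-end rate.

    Assumes n_conn connections share the backbone link R equally and each
    connection has its own client access link Rc and server access link Rs.
    """
    return min(Rc, Rs, R / n_conn)


if __name__ == "__main__":
    print(per_connection_throughput(Rc=2e6, Rs=5e6, R=100e6))  # Rc is the bottleneck
    print(per_connection_throughput(Rc=2e6, Rs=5e6, R=10e6))   # R/10 is the bottleneck
```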
Layered architecture, encapsulation.
Networks are complex, with many "pieces":
• hosts
• routers
• links of various media
• applications
• protocols
• hardware, software
layers: each layer implements a service
• via its own internal-layer actions
• relying on services provided by layer below
Why layering?
Approach to designing/discussing complex systems:
• explicit structure allows identification, relationship of system's pieces
• layered reference model for discussion
• modularization eases maintenance, updating of system
• change in layer's service implementation: transparent to rest of system
• e.g., in the air-travel analogy, a change in gate procedure doesn't affect the rest of the system
Layered Internet protocol stack (协议栈)
application: supporting network applications
• HTTP, IMAP, SMTP, DNS
transport: process-process data transfer
• TCP, UDP
network: routing of datagrams from source to destination
• IP, routing protocols
link: data transfer between neighboring network elements
• Ethernet, 802.11 (WiFi), PPP
physical: bits "on the wire"
Application exchanges messages (报文) to implement some application service using services of transport layer
Transport-layer protocol transfers M (e.g., reliably) from one process to another, using services of network layer
• transport-layer protocol encapsulates application-layer message, M, with transport-layer header Ht to create a transport-layer segment (报文段)
• Ht used by transport layer protocol to implement its service
Network-layer protocol transfers transport-layer segment [Ht | M] from one host to another, using link layer services
• network-layer protocol encapsulates transport-layer segment [Ht | M] with network-layer header Hn to create a network-layer datagram (数据报)
• Hn used by network layer protocol to implement its service
Link-layer protocol transfers datagram [Hn | [Ht | M]] from host to neighboring host, using physical-layer services
• link-layer protocol encapsulates network datagram [Hn | [Ht | M]] with link-layer header Hl to create a link-layer frame (帧)
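Encapsulation is easiest to see as nesting. The sketch below uses symbolic header names (Ht, Hn, Hl) instead of real header fields, so it shows only the structure [Hl | [Hn | [Ht | M]]] and how a receiver peels one header per layer; it is not a real packet format.

```python
from dataclasses import dataclass


@dataclass
class Encapsulated:
    header: str      # symbolic placeholder: "Ht", "Hn" or "Hl"
    payload: object  # the unit handed down by the layer above


def encapsulate(message: str) -> Encapsulated:
    segment = Encapsulated("Ht", message)    # transport layer: [Ht | M]
    datagram = Encapsulated("Hn", segment)   # network layer:   [Hn | [Ht | M]]
    frame = Encapsulated("Hl", datagram)     # link layer:      [Hl | [Hn | [Ht | M]]]
    return frame


def decapsulate(frame: Encapsulated) -> str:
    """Receiver strips one header per layer, bottom-up, until M is left."""
    unit = frame
    while isinstance(unit, Encapsulated):
        unit = unit.payload
    return unit


def show(unit) -> str:
    """Render the nesting in the same bracket notation as the notes."""
    if isinstance(unit, Encapsulated):
        return f"[{unit.header} | {show(unit.payload)}]"
    return unit
```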
What can bad actors do? What defenses?
Internet not originally designed with (much) security in mind
• original vision: "a group of mutually trusting users attached to a transparent network"
• Internet protocol designers playing "catch-up"
• security considerations in all layers!
We now need to think about:
• how bad guys can attack computer networks
• how we can defend networks against attacks
• how to design architectures that are immune to attacks
Denial of Service (DoS, 拒绝服务): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus (伪造的) traffic
1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts (受害主机)
Packet "sniffing" (嗅探分组):
broadcast media (shared Ethernet, wireless)
promiscuous network interface reads/records all packets (e.g., including passwords!) passing by
IP spoofing (IP哄骗): injection of packet with false source address
Authentication (鉴别): proving you are who you say you are
• cellular networks provide hardware identity via SIM card; there is no such hardware assist in the traditional Internet
confidentiality (机密性): via encryption
integrity checks (完整性检查): digital signatures prevent/detect tampering (篡改)
access restrictions: password-protected VPNs
firewalls: specialized "middleboxes" (中间盒) in access and core networks:
• off-by-default: filter incoming packets to restrict senders, receivers, applications
• detecting/reacting to DoS attacks
From 1961 until today!
Early packet-switching principles
Internetworking, new and proprietary networks
New protocols, a proliferation of networks
Commercialization, the Web, new applications
Scale, SDN, mobility, cloud
Applications; client-server, P2P, sockets, APIs; transport services
Server:
• always-on host
• permanent IP address
• often in data centers, for scaling (扩展)
Clients:
• contact, communicate with server
• may be intermittently (间歇) connected
• may have dynamic IP addresses
• do not communicate directly with each other
• examples: HTTP, IMAP, FTP
Peer-peer (P2P) architecture:
• no always-on server
• arbitrary end systems directly communicate
• peers request service from other peers, provide service in return to other peers
• self scalability – new peers bring new service capacity, as well as new service demands
• peers are intermittently connected and change IP addresses
• complex management
• example: P2P file sharing
Process (进程): program running within a host
• within same host, two processes communicate using inter-process communication (defined by OS)
• processes in different hosts communicate by exchanging messages (报文)
client (客户) process: process that initiates communication
server (服务器) process: process that waits to be contacted
• note: applications with P2P architectures have client processes & server processes
• process sends/receives messages to/from its socket (套接字)
• socket analogous to door
• sending process shoves message out door
• sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
• two sockets involved: one on each side
To receive messages, process must have identifier (标识符)
host device has unique 32-bit IP address
Q: does IP address of host on which process runs suffice for (足够,足以) identifying the process?
• A: no, many processes can be running on same host
identifier includes both IP address (IP地址) and port numbers (端口号) associated with process on host.
example port numbers:
• HTTP server: 80
• mail server: 25
to send HTTP message to gaia.cs.umass.edu web server:
• IP address: 128.119.245.12
• port number: 80
Data integrity:
• some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
• other apps (e.g., audio) can tolerate some loss
Throughput:
• some apps (e.g., multimedia) require a minimum amount of throughput to be "effective"
• other apps ("elastic apps" (弹性应用)) make use of whatever throughput they get
Timing:
• some apps (e.g., Internet telephony, interactive games) require low delay to be "effective"
Security:
• encryption, data integrity, …
Internet transport protocols services
TCP service:
• reliable transport (可靠的传输) between sending and receiving process
• flow control (流量控制): sender won't overwhelm receiver
• congestion control (拥塞控制): throttle (抑制) sender when network overloaded
• connection-oriented (面向连接的): setup required between client and server processes
• does not provide: timing, minimum throughput guarantee, security
UDP service:
• unreliable data transfer between sending and receiving process
• does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
Vanilla TCP & UDP sockets:
• no encryption
• cleartext (明文) passwords sent into socket traverse Internet in cleartext (!)
Transport Layer Security (TLS)
• provides encrypted TCP connections
• data integrity
• end-point authentication
TLS implemented in application layer
• apps use TLS libraries, which in turn use TCP
• cleartext sent into "socket" traverse Internet encrypted
An application-layer protocol defines
• types of messages exchanged (交换的报文类型),
• e.g., request, response
• message syntax (报文的语法):
• what fields in messages & how fields are delineated (描述)
• message semantics (报文的语义)
• meaning of information in fields
• rules for when and how processes send & respond to messages
Open protocols (开放的协议):
• defined in RFCs, everyone has access to protocol definition
• allows for interoperability (相互操作)
• e.g., HTTP, SMTP
Proprietary protocols (专用协议):
• e.g., Skype, Zoom
Overview, statelessness, HTTP messages, cookies, caching, HTTP/2
• web page consists of objects (对象), each of which can be stored on different Web servers
• object can be HTML file, JPEG image, Java applet, audio file, …
• web page consists of base HTML-file (HTML基本文件) which includes several referenced objects (引用对象), each addressable by a URL (host name + path name)
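A URL names both the server that holds an object and the object's path on that server. Python's standard `urllib.parse.urlsplit` separates the two; the URL below is a hypothetical example.

```python
from urllib.parse import urlsplit

# Hypothetical object URL, for illustration only
url = "http://www.someschool.edu/someDept/pic.gif"
parts = urlsplit(url)

print(parts.hostname)  # host name: which Web server stores the object
print(parts.path)      # path name: which object on that server
```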
• HTTP (hypertext transfer protocol): Web's application-layer protocol
• client/server model:
• client: browser (浏览器) that requests, receives, (using HTTP protocol) and "displays" Web objects
• server: Web server sends (using HTTP protocol) objects in response to requests
HTTP uses TCP:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
• TCP connection closed
HTTP is "stateless":
• server maintains no information about past client requests
aside
• protocols that maintain "state" are complex!
• past history (state) must be maintained
• if server/client crashes, their views of "state" may be inconsistent, must be reconciled (折中)
Non-persistent HTTP:
1. TCP connection opened
2. at most one object sent over TCP connection
3. TCP connection closed
downloading multiple objects required multiple connections
Persistent HTTP:
• TCP connection opened to a server
• multiple objects can be sent over a single TCP connection between client and that server
• TCP connection closed
User enters URL: www.someSchool.edu/someDepartment/home.index
(containing text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80 "accepts" connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
RTT (Round-Trip Time, 往返时间, definition): time for a small packet to travel from client to server and back
HTTP response time (per object):
• one RTT to initiate TCP connection
• one RTT for HTTP request and first few bytes of HTTP response to return
• object/file transmission time
Non-persistent HTTP response time = 2RTT+ file transmission time
Non-persistent HTTP issues:
• requires 2 RTTs per object
• OS overhead for each TCP connection
• browsers often open multiple parallel TCP connections to fetch referenced objects in parallel
Persistent HTTP (HTTP 1.1):
• server leaves connection open after sending response
• subsequent HTTP messages between same client/server sent over open connection
• client sends requests as soon as it encounters a referenced object
• as little as one RTT for all the referenced objects (cutting response time in half)
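The RTT counting above can be turned into a back-of-the-envelope comparison for the example of 1 base HTML file + 10 JPEG objects. This sketch assumes serial (not parallel) non-persistent connections, persistent requests sent back-to-back, a hypothetical RTT, and file transmission times that default to zero.

```python
def nonpersistent_time(n_objects: int, rtt: float, t_file: float = 0.0) -> float:
    """Serial non-persistent HTTP: each object costs 2 RTTs + its transmission time."""
    return n_objects * (2 * rtt + t_file)


def persistent_pipelined_time(n_referenced: int, rtt: float, t_file: float = 0.0) -> float:
    """Persistent HTTP, requests for referenced objects sent back-to-back.

    1 RTT to open TCP + 1 RTT for the base HTML file, then about one more
    RTT for all referenced objects together (their transmission times add up).
    """
    base = 2 * rtt + t_file
    referenced = rtt + n_referenced * t_file
    return base + referenced


if __name__ == "__main__":
    rtt = 0.1  # hypothetical 100 ms RTT
    print(f"non-persistent: {nonpersistent_time(11, rtt):.1f} s")
    print(f"persistent:     {persistent_pipelined_time(10, rtt):.1f} s")
```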
two types of HTTP messages: request, response
HTTP request message: ASCII (human-readable format)
request line (请求行; GET, POST, HEAD commands):
GET /index.html HTTP/1.1\r\n
header lines (首部行):
Host: www-net.cs.umass.edu\r\n
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Connection: keep-alive\r\n
\r\n
(\r\n: carriage return character, line-feed character; a carriage return, line feed at the start of a line indicates the end of the header lines)
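Because the request is plain ASCII text, it can be assembled by hand. This Python sketch (the function name is hypothetical) builds a GET request with the same layout: request line, header lines each ending in \r\n, and an empty line ending the headers. A real client would then write these bytes into a TCP socket connected to port 80.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def build_get_request(host: str, path: str,
                      headers: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble an HTTP/1.1 GET request: request line, header lines, blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; one extra CRLF (an empty line) ends the headers.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("www-net.cs.umass.edu", "/index.html",
                            {"Connection": "keep-alive"}))
```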
POST method:
• web page often includes form input
• user input sent from client to server in entity body of HTTP POST request message
GET method (for sending data to server):
• include user data in URL field of HTTP GET request message (following a '?'):
www.somesite.com/animalsearch?monkeys&banana
HEAD method:
• requests headers (only) that would be returned if specified URL were requested with an HTTP GET method.
PUT method:
• uploads new file (object) to server
• completely replaces file that exists at specified URL with content in entity body of PUT HTTP request message
HTTP response message:
status line (状态行; protocol, status code, status phrase):
HTTP/1.1 200 OK
header lines (首部行):
Date: Tue, 08 Sep 2020 00:53:20 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.9 mod_perl/2.0.11 Perl/v5.16.3
Last-Modified: Tue, 01 Mar 2016 18:57:50 GMT
ETag: "a5b-52d015789ee9e"
Accept-Ranges: bytes
Content-Length: 2651
Content-Type: text/html; charset=UTF-8
\r\n
data, e.g., requested HTML file:
data data data data data ...
status code appears in 1st line in server-to-client response message.
some sample codes:
200 OK
• request succeeded, requested object later in this message
301 Moved Permanently
• requested object moved, new location specified later in this message (in Location: field)
400 Bad Request
• request msg not understood by server
404 Not Found
• requested document not found on this server
505 HTTP Version Not Supported
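A client reads the status line first. This is a sketch of splitting it into version, code and phrase, plus the conventional meaning of the code's first digit (1xx informational through 5xx server error). It assumes a well-formed line that includes a reason phrase.

```python
from typing import Tuple

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, phrase = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), phrase


def status_class(code: int) -> str:
    """The first digit of the status code gives its general class."""
    return CLASSES.get(code // 100, "unknown")
```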
Web sites and client browser use cookies to maintain some state between transactions
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user's host, managed by user's browser
4) back-end database at Web site
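Components 1) to 3) can be illustrated with Python's standard `http.cookies.SimpleCookie`. The cookie name `id` and value `1678` are hypothetical; on a real site the value would also be the key into the back-end database (component 4), which is omitted here.

```python
from http.cookies import SimpleCookie

# 1) Server side: the response to a new user carries a set-cookie header line.
server_cookie = SimpleCookie()
server_cookie["id"] = "1678"                # hypothetical ID, also a back-end DB key
set_cookie_line = server_cookie.output()    # 'Set-Cookie: id=1678'

# 3) Browser side: store the value in the browser's cookie jar.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_line.split(":", 1)[1].strip())

# 2) Next request to the same site: send the stored value back.
cookie_line = "Cookie: " + "; ".join(f"{k}={m.value}" for k, m in browser_jar.items())
print(set_cookie_line)
print(cookie_line)
```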
What cookies can be used for:
• authorization
• shopping carts
• recommendations
• user session state (Web e-mail)
Challenge: How to keep state?
• at protocol endpoints: maintain state at sender/receiver over multiple transactions
• in messages: cookies in HTTP messages carry state
aside
cookies and privacy:
• cookies permit sites to learn a lot about you on their site.
• third party persistent cookies (tracking cookies) allow common identity (cookie value) to be tracked across multiple web sites
Goal: satisfy client requests without involving origin server
• user configures browser to point to a (local) Web cache
• browser sends all HTTP requests to cache
• if object in cache: cache returns object to client
• else cache requests object from origin server, caches received object, then returns object to client
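The hit/miss logic above fits in a few lines. A toy sketch, assuming a caller-supplied `fetch_from_origin` function (hypothetical) and no expiry at all; checking whether a cached copy is still up to date is what conditional GET addresses.

```python
from typing import Callable, Dict


class WebCache:
    """Toy proxy cache: serves hits locally, fetches misses from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url in self.store:               # hit: origin server not involved
            return self.store[url]
        obj = self.fetch_from_origin(url)   # miss: act as a client to the origin
        self.store[url] = obj               # cache the received object
        return obj                          # act as a server to the browser
```

Usage: `WebCache(some_fetch).get("/a.gif")` contacts the origin only on the first call for a given URL.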
Web caches (aka proxy servers, 代理服务器)
• Web cache acts as both client and server
• server for original requesting client
• client to origin server
• server tells cache about object's allowable caching in a response header line (e.g., Cache-Control)
Why Web caching?
• reduce response time for client request
• cache is closer to client
• reduce traffic on an institution's access link
• Internet is dense with caches
• enables "poor" content providers to more effectively deliver content
Goal: don't send object if cache has up-to-date cached version
• no object transmission delay (or use of network resources)
• client: specify date of cached copy in HTTP request
If-modified-since: <date>
• server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified
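On the server side, the conditional GET decision reduces to a date comparison. A sketch, assuming the server knows the object's last-modified time as a timezone-aware datetime; it parses the If-modified-since value with the standard `email.utils.parsedate_to_datetime` and returns only the status line.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_response(last_modified: datetime,
                             if_modified_since: Optional[str]) -> str:
    """Status line a server would send for a (possibly conditional) GET."""
    if if_modified_since is not None:
        cached_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_date:
            return "HTTP/1.0 304 Not Modified"  # cached copy fresh: no object sent
    return "HTTP/1.0 200 OK"                     # object included in the body


if __name__ == "__main__":
    lm = datetime(2016, 3, 1, 18, 57, 50, tzinfo=timezone.utc)
    print(conditional_get_response(lm, "Tue, 01 Mar 2016 18:57:50 GMT"))
```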
Key goal: decreased delay in multi-object HTTP requests
HTTP/2: [RFC 7540, 2015] increased flexibility at server in sending objects to client:
• methods, status codes, most header fields unchanged from HTTP 1.1
• transmission order of requested objects based on client-specified object priority (not necessarily FCFS)
• push unrequested objects to client
• divide objects into frames, schedule frames to mitigate head-of-line (HOL) blocking
HTTP 1.1: client requests 1 large object (e.g., video file) and 3 smaller objects
HTTP/2: objects divided into frames, frame transmission interleaved (尤指将片状物插入,夹进)
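The effect of framing on HOL blocking can be shown with a toy scheduler that sends one frame per time slot. The object sizes (one 10-frame object followed by three 1-frame objects) are hypothetical, and round-robin is used as a simple stand-in for HTTP/2's priority-based scheduling.

```python
from collections import deque
from typing import Dict


def completion_times(sizes: Dict[str, int], interleave: bool) -> Dict[str, int]:
    """Time slot at which each object's last frame is sent (one frame per slot).

    sizes maps object name -> number of frames (>= 1), in request order.
    interleave=False: HTTP 1.1-style FCFS, one whole object after another.
    interleave=True: round-robin, one frame from each pending object in turn.
    """
    done: Dict[str, int] = {}
    slot = 0
    if not interleave:
        for name, n in sizes.items():
            slot += n
            done[name] = slot
        return done
    queue = deque(sizes.items())
    while queue:
        name, remaining = queue.popleft()
        slot += 1
        if remaining == 1:
            done[name] = slot
        else:
            queue.append((name, remaining - 1))
    return done


if __name__ == "__main__":
    sizes = {"O1": 10, "O2": 1, "O3": 1, "O4": 1}
    print("FCFS:       ", completion_times(sizes, interleave=False))
    print("interleaved:", completion_times(sizes, interleave=True))
```

With FCFS the three small objects wait behind O1; interleaving delivers them in the first few slots at the cost of finishing O1 slightly later.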
HTTP/2 over single TCP connection means:
• recovery from packet loss still stalls (暂缓;搁置;停顿) all object transmissions
• as in HTTP 1.1, browsers have incentive (激励) to open multiple parallel TCP connections to reduce stalling, increase overall throughput
• no security over vanilla TCP connection
• HTTP/3: adds security, per object error- and congestion-control (more pipelining,流水线) over UDP
• more on HTTP/3 in transport layer
2.3
Internet hosts, routers:
• IP address (IP地址, 32 bit) - used for addressing datagrams
• "name" - used by humans
• Domain Name System (域名系统, DNS):
• a distributed database (分布式数据库) implemented in a hierarchy of DNS servers
• an application-layer protocol that allows hosts to query the distributed database (address/name translation)
• hostname-to-IP-address translation
• host aliasing (主机别名)
• canonical (规范), alias names
• mail server aliasing (邮件服务器别名)
• load distribution (负载分配)
• replicated (冗余的) Web servers: many IP addresses correspond to one name
Q: Why not centralize DNS?
• single point of failure (单点故障)
• traffic volume (通信容量)
• distant centralized database (远距离的集中式数据库)
• maintenance (维护)
A: doesn't scale (有可扩展能力)!
Client wants IP address for www.amazon.com; 1st approximation:
• client queries root server to find .com DNS server
• client queries .com DNS server to get amazon.com DNS server
• client queries amazon.com DNS server to get IP address for www.amazon.com
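The three-step lookup can be mimicked with dictionaries standing in for the root, TLD and authoritative servers. Everything here is hypothetical: the server names are plain labels, the address comes from the 192.0.2.0/24 documentation range, and no network queries are made.

```python
from typing import List, Tuple

# Hypothetical, hand-written data standing in for real DNS servers
ROOT = {"com": "tld-com-server"}
SERVERS = {
    "tld-com-server": {"amazon.com": "auth-amazon-server"},
    "auth-amazon-server": {"www.amazon.com": "192.0.2.10"},
}


def resolve(hostname: str) -> Tuple[str, List[str]]:
    """Walk root -> TLD -> authoritative; return (address, servers contacted)."""
    labels = hostname.split(".")
    tld = labels[-1]                     # "com"
    domain = ".".join(labels[-2:])       # "amazon.com"
    trail = ["root"]
    tld_server = ROOT[tld]               # root: who handles .com?
    trail.append(tld_server)
    auth_server = SERVERS[tld_server][domain]   # TLD: who handles amazon.com?
    trail.append(auth_server)
    address = SERVERS[auth_server][hostname]    # authoritative: the address
    return address, trail
```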
Root DNS servers (根DNS服务器)
• official, contact-of-last-resort by name servers that cannot resolve name
• incredibly important Internet function
• Internet couldn't function without it!
• DNSSEC – provides security (authentication, message integrity)
• ICANN (Internet Corporation for Assigned Names and Numbers) manages root DNS domain
Top-level domain (TLD) servers (顶级域名服务器)
• responsible for .com, .org, .net, .edu, .aero, .jobs, .museum, and all top-level country domains, e.g.: .cn, .uk, .fr, .ca, .jp
• Network Solutions: authoritative registry for .com, .net TLD
• Educause: .edu TLD
Authoritative DNS servers (权威DNS服务器)
• organization's own DNS server(s), providing authoritative hostname to IP mappings for organization's named hosts
• can be maintained by organization or service provider
Local DNS server (本地DNS服务器)
• when host makes DNS query, it is sent to its local DNS server
• Local DNS server returns reply, answering:
• from its local cache of recent name-to-address translation pairs
• forwarding the query into DNS hierarchy
• each ISP has local DNS server; to find yours:
• MacOS: % scutil --dns
• Windows: >ipconfig /all
• local DNS server doesn't strictly belong to hierarchy
Example: host at engineering.nyu.edu wants IP address for gaia.cs.umass.edu
Iterative query (迭代查询):
• contacted server replies with name of server to contact
• "I don't know this name, but ask this server"
Recursive query (递归查询):
• puts burden of name resolution on contacted name server
• heavy load at upper levels of hierarchy?
• once (any) name server learns mapping, it caches mapping, and immediately returns a cached mapping in response to a query
• caching improves response time
• cache entries time out (disappear) after some time (TTL)
• TLD servers typically cached in local name servers
• cached entries may be out-of-date
• if named host changes IP address, may not be known Internet-wide until all TTLs expire!
• best-effort (尽力而为) name-to-address translation!
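A sketch of TTL-based caching in a name server, assuming the current time is passed in explicitly (a real server would read a clock such as `time.monotonic()`). It also shows why cached entries can be out of date: nothing invalidates an entry before its TTL expires.

```python
from typing import Dict, Optional, Tuple


class DnsCache:
    """Name-to-address cache whose entries disappear after their TTL."""

    def __init__(self):
        self.entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        self.entries[name] = (address, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None:
            return None
        address, expiry = entry
        if now >= expiry:       # timed out: must ask the DNS hierarchy again
            del self.entries[name]
            return None
        return address          # may be stale if the host changed address!
```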
RR (resource records, 资源记录) format: (name, value, type, ttl)
type=A
• name is hostname
• value is IP address
type=NS
• name is domain
• value is hostname of authoritative name server for this domain
type=CNAME
• name is alias name for some "canonical" (the real) name
• value is canonical name
type=MX
• value is name of SMTP mail server associated with name
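The four RR types can be represented as (name, value, type, ttl) tuples. The sketch below uses hypothetical example.com records and shows how a resolver might follow a CNAME to the canonical name before finding the A record. Real MX records also carry a preference value, omitted here.

```python
from typing import List, NamedTuple, Optional


class RR(NamedTuple):
    name: str
    value: str
    type: str
    ttl: int


# Hypothetical records, for illustration only
RECORDS = [
    RR("www.example.com", "server1.example.com", "CNAME", 3600),
    RR("server1.example.com", "192.0.2.7", "A", 3600),
    RR("example.com", "dns.example.com", "NS", 86400),
    RR("example.com", "mail.example.com", "MX", 3600),
]


def lookup(name: str, rtype: str, records: List[RR] = RECORDS) -> Optional[str]:
    """Return the value of a (name, rtype) record, following CNAME aliases."""
    for _ in range(8):  # bound the number of alias hops to avoid loops
        match = next((r for r in records if r.name == name and r.type == rtype), None)
        if match:
            return match.value
        alias = next((r for r in records if r.name == name and r.type == "CNAME"), None)
        if alias is None:
            return None
        name = alias.value  # continue with the canonical name
    return None
```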
DNS query and reply messages both have the same format.
Inserting records into DNS (e.g., for a new domain):
• register name at DNS registrar (注册登记机构)
• create authoritative server locally with the IP address
DDoS (分布式拒绝服务) attacks
• bombard root servers with traffic
• not successful to date
• traffic filtering
• local DNS servers cache IPs of TLD servers, allowing root server bypass
• bombard TLD servers
• potentially more dangerous
Spoofing (哄骗) attacks
• intercept (截获) DNS queries, returning bogus (伪造的) replies
• DNS cache poisoning (毒害)
• RFC 4033: DNSSEC authentication services
Socket abstraction, UDP and TCP socket programming
Goal: learn how to build client/server applications that communicate using sockets
Socket: door between application process and end-to-end transport protocol
• UDP: unreliable datagram
• TCP: reliable, byte stream-oriented
Application example:
1. client reads a line of characters (data) from its keyboard and sends data to server
2. server receives the data and converts characters to uppercase
3. server sends modified data to client
4. client receives modified data and displays line on its screen
UDP: no "connection" between client and server:
• no handshaking before sending data
• sender explicitly attaches IP destination address and port # to each packet
• receiver extracts sender IP address and port# from received packet
UDP: transmitted data may be lost or received out-of-order
Application viewpoint:
• UDP provides unreliable transfer of groups of bytes ("datagrams") between client and server processes
Client/server socket interaction (UDP):
Python UDPClient:
```python
from socket import *                 # include Python's socket library

serverName = 'hostname'
serverPort = 12000

# create UDP socket (the client's door for talking to the server)
clientSocket = socket(AF_INET, SOCK_DGRAM)

# get user keyboard input
message = input('Input lowercase sentence:')

# attach server name, port to message; send into socket
clientSocket.sendto(message.encode(), (serverName, serverPort))

# read reply characters from socket into string
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)

# print out received string and close socket
print(modifiedMessage.decode())
clientSocket.close()
```
Python UDPServer:
```python
from socket import *

serverPort = 12000

# create UDP socket
serverSocket = socket(AF_INET, SOCK_DGRAM)

# bind socket to local port number 12000
serverSocket.bind(('', serverPort))
print('The server is ready to receive')

# loop forever
while True:
    # read from UDP socket into message, getting client's address (client IP and port)
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # send upper case string back to this client
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)
```
Client must contact server
• server process must first be running
• server must have created socket (door) that welcomes client's contact
Client contacts server by:
• Creating TCP socket, specifying IP address, port number of server process
• when client creates socket: client TCP establishes connection to server TCP
when contacted by client, server TCP creates new socket for server process to communicate with that particular client
• allows server to talk with multiple clients
• source port numbers used to distinguish clients
Application viewpoint:
• TCP provides reliable, in-order byte-stream transfer ("pipe") between client and server processes
Client/server socket interaction (TCP):
Python TCPClient:
```python
from socket import *

serverName = 'servername'
serverPort = 12000

# create TCP socket for server, remote port 12000; connect() sets up the connection
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))

sentence = input('Input lowercase sentence:')
# no need to attach server name, port: the connection already identifies the server
clientSocket.send(sentence.encode())

# read reply bytes from the TCP socket
modifiedSentence = clientSocket.recv(1024)
print('From Server:', modifiedSentence.decode())
clientSocket.close()
```
Python TCPServer:
```python
from socket import *

serverPort = 12000

# create TCP welcoming socket
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))

# server begins listening for incoming TCP requests
serverSocket.listen(1)
print('The server is ready to receive')

# loop forever
while True:
    # server waits on accept() for incoming requests, new socket created on return
    connectionSocket, addr = serverSocket.accept()

    # read bytes from socket (but not address as in UDP)
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())

    # close connection to this client (but not welcoming socket)
    connectionSocket.close()
```
Transport-layer services and protocols. Transport layer actions.
• provide logical communication (逻辑通信) between application processes running on different hosts
• transport protocol actions in end systems:
• sender: breaks application messages into segments (报文段), passes to network layer
• receiver: reassembles segments into messages, passes to application layer
• two transport protocols available to Internet applications
• TCP, UDP
• network layer: logical communication between hosts
• transport layer: logical communication between processes
• relies on, enhances, network layer services
Sender:
• is passed an application-layer message
• determines segment header fields values
• creates segment
• passes segment to IP
Receiver:
• receives segment from IP
• checks header values
• extracts application-layer message
• demultiplexes (多路分解) message up to application via socket
TCP: Transmission Control Protocol
• reliable (可靠), in-order delivery
• congestion control (拥塞控制)
• flow control
• connection setup
UDP: User Datagram Protocol
• unreliable (不可靠), unordered delivery
• no-frills (不提供不必要服务的) extension of "best-effort" (尽力而为) IP
Services not available:
• delay guarantees
• bandwidth guarantees
What is multiplexing, demultiplexing? How is it done? How does it work in TCP and UDP?
Multiplexing (多路复用) at sender:
handle data from multiple sockets (套接字), add transport header (later used for demultiplexing)
Demultiplexing (多路分解) at receiver:
use header info to deliver received segments to correct socket
• host receives IP datagrams
• each datagram has source IP address, destination IP address
• each datagram carries one transport-layer segment
• each segment has source, destination port number
• host uses IP addresses & port numbers to direct segment to appropriate socket
Recall:
• when creating socket, must specify host-local port #:
• DatagramSocket mySocket1 = new DatagramSocket(12534);
• when creating datagram to send into UDP socket, must specify
• destination IP address
• destination port #
• when receiving host receives UDP segment:
• checks destination port # in segment
• directs UDP segment to socket with that port #
Consequently, IP/UDP datagrams with the same dest. port # but different source IP addresses and/or source port numbers will be directed to the same socket at the receiving host
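This behavior can be observed with real sockets on the loopback interface. The sketch below uses Python's standard `socket` module. One server socket is bound to a single port (port 0 asks the OS for any free port). Two client sockets therefore get two different source ports. Both datagrams still arrive at the same server socket. The sketch assumes the machine allows local UDP sockets.

```python
from socket import socket, AF_INET, SOCK_DGRAM

# One server socket bound to a single local port (0 = let the OS pick a free one)
server = socket(AF_INET, SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server_addr = server.getsockname()

# Two different client sockets, hence two different source ports
client_a = socket(AF_INET, SOCK_DGRAM)
client_b = socket(AF_INET, SOCK_DGRAM)
client_a.sendto(b"from A", server_addr)
client_b.sendto(b"from B", server_addr)

# Both datagrams are demultiplexed to the SAME socket, because UDP uses only
# the destination port; recvfrom() reveals the differing source addresses.
server.settimeout(2.0)
received = [server.recvfrom(2048) for _ in range(2)]

for sock in (server, client_a, client_b):
    sock.close()
```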
• TCP socket identified by 4-tuple:
• source IP address
• source port number
• dest IP address
• dest port number
• demux: receiver uses all four values (4-tuple) to direct segment to appropriate socket
• server may support many simultaneous TCP sockets:
• each socket identified by its own 4-tuple
• each socket associated with a different connecting client
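A similar loopback experiment shows the TCP case. Two clients connect to one welcoming socket, and `accept()` returns a separate connection socket for each client. The hypothetical helper `four_tuple` reads the (source IP, source port, dest IP, dest port) of segments arriving at the server. It takes these values from `getpeername()` and `getsockname()`. The two sockets share the destination half but differ in the source port.

```python
from socket import socket, create_connection, AF_INET, SOCK_STREAM

# Welcoming socket on one local port (0 = let the OS pick a free one)
welcome = socket(AF_INET, SOCK_STREAM)
welcome.bind(("127.0.0.1", 0))
welcome.listen(2)
server_addr = welcome.getsockname()

# Two clients connect to the same server IP address and port
client1 = create_connection(server_addr)
client2 = create_connection(server_addr)

# accept() returns a NEW socket for each connection
conn1, _ = welcome.accept()
conn2, _ = welcome.accept()

def four_tuple(conn):
    """(source IP, source port, dest IP, dest port) of segments arriving at the server."""
    src_ip, src_port = conn.getpeername()   # the client's end
    dst_ip, dst_port = conn.getsockname()   # the server's end
    return (src_ip, src_port, dst_ip, dst_port)

tuples = [four_tuple(conn1), four_tuple(conn2)]

for s in (conn1, conn2, client1, client2, welcome):
    s.close()
```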
• Multiplexing, demultiplexing: based on segment, datagram header field values
• UDP: demultiplexing using destination port number (only)
• TCP: demultiplexing using 4-tuple: source and destination IP addresses, and port numbers
• Multiplexing/demultiplexing happen at all layers
UDP segment structure. The Internet checksum.
• "no frills," "bare bones" Internet transport protocol
• "best effort" service, UDP segments may be:
• lost
• delivered out-of-order to app
• connectionless:
• no handshaking between UDP sender, receiver
• each UDP segment handled independently of others
• no connection establishment (which can add RTT delay)
• simple: no connection state at sender, receiver
• Small packet header overhead
• Finer application-level control over what data is sent, and when. No congestion control.
• UDP can blast away as fast as desired!
• can function in the face of congestion
• UDP use:
• streaming multimedia apps (loss tolerant, rate sensitive)
• DNS
• SNMP (Simple Network Management Protocol, 简单网络管理协议)
• HTTP/3
• if reliable transfer needed over UDP (e.g., HTTP/3):
• add needed reliability at application layer
• add congestion control at application layer
RFC 768
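RFC 768 defines an 8-byte UDP header made of four 16-bit fields: source port, destination port, length (header plus data, in bytes) and checksum. The sketch below packs and unpacks this header with the standard `struct` module. `build_udp_header` and `parse_udp_header` are illustrative names. The checksum is left at 0 here; in IPv4, an all-zero checksum means that none was computed.

```python
import struct

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the 8-byte UDP header: four 16-bit fields in network (big-endian) byte order."""
    length = 8 + len(payload)            # length covers header + data, in bytes
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

def parse_udp_header(segment):
    """Unpack the first 8 bytes of a UDP segment into its four header fields."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}
```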
UDP sender actions:
• is passed an application-layer message
• determines UDP segment header fields values
• creates UDP segment
• passes segment to IP
UDP receiver actions:
• checks UDP checksum header value
• extracts application-layer message
• demultiplexes message up to application via socket
Goal: detect errors (i.e., flipped bits) in transmitted segment
Sender:
• Treat contents of UDP segment (including UDP header fields and IP addresses) as sequence of 16-bit integers
• Checksum (校验和): the one's complement (bit inversion) of the one's complement sum of segment content
• Checksum value put into UDP checksum field
Receiver:
• Compute checksum of received segment
• Check if computed checksum equals checksum field value:
• Not equal - error detected
• Equal - no error detected. But maybe errors nonetheless? More later ….
Example: add two 16-bit integers
1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Note: when adding numbers, a carryout from the most significant bit needs to be added to the result
Weak protection!
1 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0
1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 wraparound
1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 sum
0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 checksum
Even though numbers have changed (bit flips), no change in checksum!
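The arithmetic in both examples can be checked with a short function. The sketch treats the data as a list of 16-bit integers. It adds them with end-around carry (the wraparound above) and then inverts the result. The receiver-side check relies on the fact that data plus checksum sums to all ones when no error is detected. The function names are illustrative.

```python
def ones_complement_sum(words):
    """16-bit one's complement sum: add, then wrap any carry out of bit 15 back in."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total

def internet_checksum(words):
    """Checksum = one's complement (bit inversion) of the one's complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF

def verify(words, checksum):
    """Receiver side: data plus checksum sums to 0xFFFF if no error is detected."""
    return ones_complement_sum(list(words) + [checksum]) == 0xFFFF
```

Both the original pair and the bit-flipped pair give 0100010001000011, which is exactly the weakness shown above.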
Protocol mechanisms for reliable data transfer (rdt). Building an rdt protocol. Pipelining. Go-back-N. Selective Repeat.
We will:
• incrementally develop sender, receiver sides of reliable data transfer protocol (可靠数据传输协议, rdt)
• consider only unidirectional data transfer (单向数据传输)
• but control info will flow in both directions!
• use finite-state machines (FSM, 有限状态机) to specify sender, receiver
• rdt1.0: underlying channel perfectly reliable
• no bit errors
• no loss of packets
• separate FSMs for sender, receiver:
• sender sends data into underlying channel
• receiver reads data from underlying channel
• rdt2.0: underlying channel may flip bits in packet
• checksum to detect bit errors
• the question: how to recover from errors?
• Positive acknowledgements (ACKs, 肯定确认): receiver explicitly tells sender that packet received OK
• Negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
• sender retransmits packet on receipt of NAK
stop-and-wait (停等)
sender sends one packet, then waits for receiver response
rdt2.0: the FSM Representation
Note: "state" of receiver (did the receiver get my message correctly?) isn't known to sender unless somehow communicated from receiver to sender
that's why we need a protocol!
What happens if ACK/NAK corrupted?
• sender doesn't know what happened at receiver!
• can't just retransmit: possible duplicate
Handling duplicates:
• sender retransmits current packet if ACK/NAK corrupted
• sender adds sequence number (序号) to each packet
• receiver discards (doesn't deliver up) duplicate packet (冗余分组)
rdt2.1: Discussion
Sender:
• sequence number added to packet
• two sequence numbers (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
• state must "remember" whether "expected" packet should have sequence number of 0 or 1
Receiver:
• must check if received packet is duplicate
• state indicates whether 0 or 1 is expected packet sequence number
• note: receiver cannot know if its last ACK/NAK was received OK at sender
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last packet received OK
• receiver must explicitly include sequence number of packet being ACKed
• duplicate ACK (冗余ACK) at sender results in same action as NAK: retransmit current packet
As we will see, TCP uses this approach to be NAK-free
rdt3.0: channels with errors and loss
New channel assumption: underlying channel can also lose packets (data, ACKs)
• checksum, sequence numbers, ACKs, retransmissions will be of help … but not quite enough
Approach: sender waits "reasonable" amount of time for ACK
• retransmits if no ACK received in this time
• if packet (or ACK) just delayed (not lost):
• retransmission will be duplicate, but sequence numbers already handle this!
• receiver must specify sequence number of packet being ACKed
• use countdown timer (倒计数定时器) to interrupt after "reasonable" amount of time
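• U_sender: utilization (利用率), the fraction of time the sender is busy sending
• example: 1 Gbps link, 15 ms prop. delay, 8000 bit packet
• time to transmit packet into channel: D_trans = L/R = 8000 bits / 10^9 bits/sec = 8 µs
• stop-and-wait sends one packet every RTT + L/R, with RTT = 2 × 15 ms = 30 ms (ignoring ACK transmission time): U_sender = (L/R) / (RTT + L/R) = 0.008 ms / 30.008 ms ≈ 0.00027
• rdt3.0 protocol performance stinks!
• Protocol limits performance of underlying infrastructure (channel)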
Pipelining (流水线): sender allows multiple, "in-flight", yet-to-be-acknowledged packets
• range of sequence numbers must be increased
• buffering at sender and/or receiver
3-packet pipelining increases utilization by a factor of 3.
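The utilization numbers can be reproduced with a small helper (the function name is illustrative). It assumes the sender transmits n packets back to back per RTT and that ACK transmission time is negligible. The example link has R = 1 Gbps, L = 8000 bits and RTT = 2 × 15 ms = 30 ms. With n = 1 the helper gives the stop-and-wait value, and with n = 3 it gives three times as much.

```python
def utilization(n, packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy when it sends n packets per RTT + L/R
    (ACK transmission time ignored; capped at 1 once the pipe is full)."""
    d_trans = packet_bits / rate_bps          # L/R, seconds per packet
    return min(1.0, n * d_trans / (rtt_s + d_trans))

stop_and_wait = utilization(1, 8000, 1e9, 0.030)   # about 0.00027
pipelined_3 = utilization(3, 8000, 1e9, 0.030)     # three times as much
```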
• Go-Back-N (GBN) sender: "window" of up to N consecutive transmitted but unACKed packets
• k-bit sequence number in packet header
• Cumulative acknowledgement (累计确认): ACK(n): ACKs all packets up to, including sequence number n
• on receiving ACK(n): move window forward to begin at n+1
• Timer for oldest in-flight packet
• timeout(n): retransmit packet n and all higher sequence number packets in window
• GBN receiver, ACK-only: always send an ACK for the correctly-received packet with the highest in-order sequence number so far
• may generate duplicate ACKs
• need only remember rcv_base
• on receipt of out-of-order packet:
• can discard (don't buffer) or buffer: an implementation decision
• re-ACK packet with highest in-order sequence number
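The following is a toy Go-Back-N run under several simplifying assumptions:
• sequence numbers are not wrapped
• ACKs are never lost
• the sender processes the cumulative ACK after each burst of sends
• a burst that makes no progress stands in for a timeout

`lose_once` is a hypothetical set of packets whose first transmission is dropped.

```python
def go_back_n(total, window, lose_once):
    """Toy Go-Back-N run over packets 0..total-1.
    Returns (every transmission in order, packets delivered to the upper layer)."""
    lost = set(lose_once)             # hypothetical: first transmission of these is dropped
    send_base, next_seq = 0, 0        # sender window state
    expected = 0                      # receiver state: next in-order seq # it wants
    sent_log, delivered = [], []
    while send_base < total:
        # send every packet the window allows
        while next_seq < min(send_base + window, total):
            sent_log.append(next_seq)
            if next_seq in lost:
                lost.discard(next_seq)        # dropped this time only
            elif next_seq == expected:        # in-order: deliver to upper layer
                delivered.append(next_seq)
                expected += 1
            # an out-of-order packet is discarded; the receiver re-ACKs expected - 1
            next_seq += 1
        if expected > send_base:
            send_base = expected              # cumulative ACK(expected - 1): slide window
        else:
            next_seq = send_base              # timeout: go back N, resend from send_base
    return sent_log, delivered
```

Consider 6 packets, a window of 3, and packet 1 lost once. The sender transmits 0, 1, 2 and 3. It then times out and goes back to resend 1, 2 and 3 before finishing with 4 and 5. The receiver discards 2 and 3 the first time because they arrive out of order.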
• Selective Repeat (SR): receiver individually acknowledges all correctly received packets
• buffers packets, as needed, for eventual in-order delivery to upper layer
• Sender times-out/retransmits individually for unACKed packets
• sender maintains timer for each unACKed packet
• Sender window
• N consecutive sequence numbers
• limits sequence numbers of sent, unACKed packets
Sender:
• Data received from above: if next available sequence number is in window, send packet
• Timeout(n): resend packet n, restart timer
• ACK(n) received in [send_base, send_base+N-1]: mark packet n as received; if n is the smallest unACKed packet, advance window base to next unACKed sequence number
Receiver:
• Packet n in [rcv_base, rcv_base+N-1] correctly received: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order packets), advance window to next not-yet-received packet
• Packet n in [rcv_base-N, rcv_base-1] correctly received: ACK(n)
• Otherwise: ignore
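The SR receiver rules can be sketched as a small class. The sketch assumes sequence numbers are taken modulo a sequence-number space of size M and that packets arrive uncorrupted. The class and attribute names are illustrative. Out-of-order packets are buffered. Consecutive packets starting at rcv_base are delivered together. Packets from the previous window are re-ACKed.

```python
class SRReceiver:
    """Selective Repeat receiver with window size N over sequence numbers 0..M-1."""

    def __init__(self, window, seq_space):
        self.N, self.M = window, seq_space
        self.rcv_base = 0
        self.buffer = {}          # out-of-order packets: seq -> data
        self.delivered = []       # data passed up to the upper layer, in order

    def _offset(self, seq):
        # how far seq lies ahead of rcv_base, modulo the sequence number space
        return (seq - self.rcv_base) % self.M

    def receive(self, seq, data):
        """Handle a correctly received packet; return the ACK number to send, or None."""
        off = self._offset(seq)
        if off < self.N:                          # seq in [rcv_base, rcv_base+N-1]
            self.buffer.setdefault(seq, data)     # buffer (a duplicate is harmless)
            while self.rcv_base in self.buffer:   # deliver consecutive in-order packets
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base = (self.rcv_base + 1) % self.M
            return seq                            # send ACK(n)
        if off >= self.M - self.N:                # seq in [rcv_base-N, rcv_base-1]
            return seq                            # re-ACK an already delivered packet
        return None                               # otherwise: ignore
```

Here is an example with N = 4 and M = 8. Receiving packet 1 and then packet 0 first buffers 1. The arrival of 0 then delivers both packets and moves rcv_base to 2. A later duplicate of packet 0 falls in [rcv_base-N, rcv_base-1], so it is only re-ACKed.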
Example:
• sequence numbers 0, 1, 2, 3
• a window size of three.
• receiver can't see sender side
• receiver behavior identical in both cases!
• something's (very) wrong!
Q: What relationship is needed between sequence # size and window size to avoid problem in scenario (b)?
The window size must be less than or equal to half the size of the sequence number space for SR protocols.
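The rule can be checked with a small computation. Suppose the receiver got packets 0..N-1 and ACKed them all, so its window slid forward. Every ACK was then lost, and the sender retransmits the old packets. The sketch below lists the sequence numbers that could mean either an old packet or a new one. The helper names are illustrative. With sequence numbers 0 to 3 and a window of three, 0 and 1 are ambiguous. This ambiguity is why the receiver cannot tell the two cases apart.

```python
def ambiguous_seqs(window, seq_space):
    """Sequence numbers that could be EITHER a retransmitted old packet or a new one,
    after the receiver ACKed packets 0..window-1 and all those ACKs were lost."""
    old = {s % seq_space for s in range(window)}                     # possible retransmissions
    new_window = {s % seq_space for s in range(window, 2 * window)}  # what receiver now expects
    return sorted(old & new_window)

def sr_window_ok(window, seq_space):
    """SR rule: window size <= half the size of the sequence number space."""
    return window <= seq_space // 2
```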
The TCP connection and segment, RTT estimation and timeout, flow control
RFCs: 793, 1122, 2018, 5681, 7323
• Reliable, in-order byte stream:
• no "message boundaries"
• Cumulative acknowledgements
• Pipelining:
• TCP congestion and flow control set window size
• Flow controlled:
• sender will not overwhelm receiver
• Connection-oriented (面向连接的):
• handshaking (exchange of control messages) initializes sender, receiver state before data exchange
• Full-duplex service (全双工服务):
• bi-directional data flow in same connection
• Point-to-point (点对点):
• one sender, one receiver
• Three-way handshake (三次握手).
• Send buffer (发送缓存):
• Maximum segment size (MSS, 最大报文段长度)
• Maximum transmission unit (MTU, 最大传输单元)
• TCP segments (TCP报文段).
• Sequence numbers:
• byte stream "number" of first byte in segment's data
• Acknowledgements:
• seq # of next byte expected from other side
• cumulative acknowledgement (累积确认)
• Q: how receiver handles out-of-order segments
• A: TCP spec doesn't say; it's up to the implementor
Q: how to set TCP timeout value?
• Longer than RTT, but RTT varies!
• Too short: premature timeout, unnecessary retransmissions
• Too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• Ignore retransmissions
• SampleRTT will vary, want estimated RTT "smoother"
• Average several recent measurements, not just current SampleRTT
EstimatedRTT = (1 – α) · EstimatedRTT + α · SampleRTT
• Exponential weighted moving average (EWMA, 指数加权移动平均)
• Influence of past sample decreases exponentially fast
• Typical value: α = 0.125
• Timeout interval: EstimatedRTT plus "safety margin"
• Large variation in EstimatedRTT: want a larger safety margin
TimeoutInterval = EstimatedRTT + 4 · DevRTT
• DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:
DevRTT = (1 – β) · DevRTT + β · | SampleRTT – EstimatedRTT |
(typically, β = 0.25)
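The two averages and the timeout rule translate directly into code. The sketch uses the typical α = 0.125 and β = 0.25. Two details are borrowed from RFC 6298 because the notes do not spell them out:
• initialization: the first sample sets EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2
• update order: DevRTT is computed with the previous EstimatedRTT

RFC 6298's 1-second lower bound on the timeout is omitted.

```python
class RttEstimator:
    """EWMA estimates of RTT and its deviation, plus the TimeoutInterval rule.

    Initialization and update order follow RFC 6298; its minimum timeout and
    clock-granularity rules are left out of this sketch.
    """

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Feed one SampleRTT (from a segment that was NOT retransmitted)."""
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, then EstimatedRTT is updated
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        """TimeoutInterval = EstimatedRTT + 4 * DevRTT."""
        return self.estimated_rtt + 4 * self.dev_rtt
```

Feeding SampleRTTs of 100 ms and then 120 ms gives timeouts of 300 ms and 272.5 ms. The interval shrinks as the deviation estimate settles.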
Event: data received from application above
• Create segment with sequence number
• Sequence number is byte-stream number of first data byte in segment
• Start timer if not already running
• Think of timer as for oldest unACKed segment
• Expiration interval: TimeoutInterval
Event: Timer timeout
• Retransmit segment that caused timeout
• Restart timer
Event: ACK receipt
• If ACK acknowledges previously unACKed segments
• Update what is known to be ACKed
• Start timer if there are still unACKed segments
Fast retransmit (快速重传):
If sender receives 3 additional ACKs for same data ("triple duplicate ACKs"), resend unACKed segment with smallest sequence number
• likely that unACKed segment lost, so don't wait for timeout
Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
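The following sketch adds fast retransmit to the ACK-receipt event and tracks only byte numbers. The class name and the `retransmit` callback are hypothetical stand-ins for a real TCP's segment handling. Timers are left out.

```python
class FastRetransmitSender:
    """ACK handling of a simplified TCP sender: cumulative ACKs plus fast retransmit."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, retransmit):
        self.send_base = 0            # sequence number of the oldest unACKed byte
        self.dup_acks = 0             # duplicate ACKs seen for send_base
        self.retransmit = retransmit  # hypothetical callback: resend segment at this seq #

    def on_ack(self, ack_num):
        if ack_num > self.send_base:
            # ACK acknowledges previously unACKed data (cumulative)
            self.send_base = ack_num
            self.dup_acks = 0
            # a real sender would also restart its timer if data is still unACKed
        elif ack_num == self.send_base:
            # duplicate ACK: the receiver is still waiting for byte send_base
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                # triple duplicate ACKs: resend smallest unACKed segment, no timeout wait
                self.retransmit(self.send_base)
```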
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
Flow control (流量控制): receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.
• TCP receiver "advertises" free buffer space in rwnd field in TCP header
• RcvBuffer size set via socket options (typical default is 4096 bytes)
• Many operating systems autoadjust RcvBuffer
• Sender limits amount of unACKed ("in-flight") data to received rwnd
• Guarantees receive buffer will not overflow
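The rules use the textbook's byte counters: LastByteRcvd and LastByteRead at the receiver, and LastByteSent and LastByteAcked at the sender. The advertised window is rwnd = RcvBuffer - (LastByteRcvd - LastByteRead). The sender keeps LastByteSent - LastByteAcked <= rwnd. Here is a sketch of both rules with illustrative function names:

```python
def receive_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """rwnd = RcvBuffer - (LastByteRcvd - LastByteRead): buffer size minus the
    bytes that were received but not yet read by the application."""
    buffered = last_byte_rcvd - last_byte_read
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("receive buffer overflow (or inconsistent byte counters)")
    return rcv_buffer - buffered

def may_send(nbytes, last_byte_sent, last_byte_acked, rwnd):
    """Sender rule: unACKed (in-flight) data must stay within rwnd after sending."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd
```

Suppose a 4096-byte buffer holds 2000 unread bytes. The receiver then advertises 2096 bytes. A sender with 1000 bytes already in flight may send another 1000 bytes but not 1200. In Python, RcvBuffer corresponds to the SO_RCVBUF socket option, which is set with sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size). The effective size is OS-dependent; Linux, for example, doubles the requested value.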
Before exchanging data, sender/receiver "handshake":
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters (e.g., starting seq #s)
Two-way handshake
Q: Will 2-way handshake always work in network?
• Variable delays
• Retransmitted messages (e.g., req_conn(x)) due to message loss
• Message reordering
• Can't "see" other side
Two-way handshake scenarios:
TCP three-way handshake (三次握手)
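The sequence and acknowledgement numbers exchanged in the three-way handshake are as follows:
• the client's SYN carries Seq = x
• the server's SYNACK carries Seq = y and ACK = x + 1
• the client's final ACK carries ACK = y + 1 and may already carry data

The sketch below writes these down. The function name and tuple layout are illustrative, and random initial sequence numbers stand in for whatever a real TCP chooses. Arithmetic is modulo 2^32 because TCP sequence numbers are 32-bit.

```python
import random

MOD = 2**32   # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(client_isn=None, server_isn=None):
    """Return (sender, flags, seq, ack) for the three handshake segments."""
    x = client_isn if client_isn is not None else random.randrange(MOD)
    y = server_isn if server_isn is not None else random.randrange(MOD)
    return [
        ("client", "SYN",     x,             None),           # SYNbit=1, Seq=x
        ("server", "SYN+ACK", y,             (x + 1) % MOD),  # SYN consumes one seq #
        ("client", "ACK",     (x + 1) % MOD, (y + 1) % MOD),  # may carry data
    ]
```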
Closing a TCP connection:
• Client, server each close their side of connection
• send TCP segment with FIN bit = 1
• Respond to received FIN with ACK
• on receiving FIN, ACK can be combined with own FIN
• Simultaneous FIN exchanges can be handled
Causes and costs of congestion, approaches to congestion control
Congestion:
• informally: "too many sources attempting to send data at too high a rate"
• manifestations:
• long delays (queueing in router buffers)
• packet loss (buffer overflow at routers)
• different from flow control!
• a top-10 problem!
Congestion control: too many senders, sending too fast
Flow control: one sender too fast for one receiver
Simplest scenario:
• one router, infinite buffers
• input, output link capacity: R
• two flows
• no retransmissions needed
Q: What happens as arrival rate λ_in approaches R/2?
One cost of a congested network—large queuing delays are experienced as the packet-arrival rate nears the link capacity.
• one router, finite buffers
• sender retransmits lost, timed-out packet
• application-layer input = application-layer output: λ_in = λ_out
• transport-layer input includes retransmissions: λ'_in ≥ λ_in
First, the unrealistic case
• Host A sends a packet only when a buffer is free.
The slightly more realistic case
• the sender retransmits only when a packet is known for certain to be lost.
Another cost of a congested network—the sender must perform retransmissions in order to compensate for dropped (lost) packets due to buffer overflow.
Finally, the more realistic case:
• the sender may time out prematurely and retransmit a packet that has been delayed in the queue but not yet lost.
Yet another cost of a congested network—unneeded retransmissions by the sender in the face of large delays may cause a router to use its link bandwidth to forward unneeded copies of a packet.
• four senders
• multi-hop paths
• timeout/retransmit
If λ'_in is extremely large for all connections, the A–C end-to-end throughput goes to zero in the limit of heavy traffic.
Yet another cost of dropping a packet due to congestion—when a packet is dropped along a path, the transmission capacity that was used at each of the upstream links to forward that packet to the point at which it is dropped ends up having been wasted.
• End-end congestion control (端到端拥塞控制):
• no explicit feedback from network
• congestion inferred from observed loss, delay
• approach taken by TCP
• Network-assisted congestion control (网络辅助的拥塞控制):
• routers provide direct feedback to sending/receiving hosts with flows passing through congested router
• may indicate congestion level or explicitly set sending rate
• TCP ECN, ATM, DEC DECnet protocols
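AIMD (additive increase, multiplicative decrease)
approach: senders can increase sending rate until packet loss (congestion) occurs, then decrease sending rate on loss event
Additive increase: increase sending rate by 1 MSS every RTT until loss is detected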
Multiplicative decrease detail: sending rate is
Cut in half on loss detected by triple duplicate ACK (TCP Reno)
Cut to 1 MSS (maximum segment size) when loss detected by timeout (TCP Tahoe)
Why AIMD?
AIMD – a distributed, asynchronous algorithm – has been shown to:
optimize congested flow rates network wide!
have desirable stability properties
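The resulting sawtooth can be traced in a few lines. The sketch assumes cwnd is measured in MSS, with one update per RTT. It also assumes that loss is detected by triple duplicate ACKs, so cwnd is halved as in TCP Reno. `loss_rounds` is a hypothetical input listing the RTTs at which a loss occurs.

```python
def aimd_trace(rounds, loss_rounds, start_cwnd=1.0, mss=1.0):
    """cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd, trace = start_cwnd, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(cwnd / 2, mss)     # multiplicative decrease: cut in half
        else:
            cwnd += mss                   # additive increase: +1 MSS per RTT
    return trace
```

For example, `aimd_trace(8, {4}, start_cwnd=10)` climbs 10, 11, 12, 13, 14, halves to 7 after the loss, and then climbs again to 8 and 9.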
TCP congestion control: details
TCP sending behavior:
roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes, so TCP rate ≈ cwnd/RTT bytes/sec
TCP sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
cwnd is dynamically adjusted in response to observed network congestion (implementing TCP congestion control)
Classic TCP; Explicit Congestion Notification, delay-based TCP, fairness
TCP slow start
when connection begins, increase rate exponentially until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd by 1 MSS for every ACK received
summary: initial rate is slow, but ramps up exponentially fast
TCP: from slow start to congestion avoidance
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is set to 1/2 of cwnd just before loss event
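The sketch below combines slow start, the switch to linear growth and ssthresh in a per-RTT trace. It follows the TCP Tahoe style, in which every loss sets ssthresh to half of cwnd and resets cwnd to 1 MSS. The `events` list is hypothetical input. Capping the doubling at ssthresh simplifies the per-ACK growth of a real sender.

```python
def cwnd_trace(events, ssthresh=64.0):
    """Per-RTT cwnd (in MSS), Tahoe-style. `events` has one entry per RTT: "ok" or "loss"."""
    cwnd, trace = 1.0, []
    for ev in events:
        trace.append(cwnd)
        if ev == "loss":
            ssthresh = cwnd / 2                # remember half the window before the loss
            cwnd = 1.0                         # back to 1 MSS, slow start again
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: doubles each RTT (capped)
        else:
            cwnd += 1.0                        # congestion avoidance: +1 MSS per RTT
    return trace
```

Suppose ssthresh is initially 8 and a loss occurs at cwnd = 12. cwnd grows 1, 2, 4, 8 and then linearly from 9 to 12. After the loss ssthresh becomes 6, so slow start restarts at 1 and turns linear again at 6.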
Summary: TCP congestion control
Is there a better way than AIMD to "probe" for usable bandwidth?
Insight/intuition:
Wmax: sending rate at which congestion loss was detected
congestion state of bottleneck link probably (?) hasn't changed much
after cutting rate/window in half on loss, initially ramp to Wmax faster, but then approach Wmax more slowly
K: point in time when TCP window size will reach Wmax
K itself is tuneable
increase W as a function of the cube of the distance between current time and K
larger increases when further away from K
smaller increases (cautious) when nearer K
TCP CUBIC default in Linux, most popular TCP for popular Web servers
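The cubic shape can be written as W(t) = C·(t - K)^3 + Wmax, where t is the time since the last loss. K is chosen so that W starts at the reduced window and reaches Wmax at t = K. The sketch below uses illustrative parameter names. beta = 0.5 matches the halving described above. RFC 8312 (the CUBIC specification) instead uses C = 0.4 and a multiplicative decrease factor of 0.7.

```python
def cubic_window(t, w_max, c=0.4, beta=0.5):
    """CUBIC-style window W(t) = c*(t - K)**3 + w_max, t = seconds since the last loss.

    K is chosen so that W(0) = beta * w_max (the window just after the cut)
    and W(K) = w_max.
    """
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max
```

Because (t - K)^3 is flat around t = K, the window approaches Wmax slowly. Past K it probes beyond Wmax increasingly fast.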
TCP and the congested "bottleneck link"
TCP (classic, CUBIC) increase TCP's sending rate until packet loss occurs at some router's output: the bottleneck link
understanding congestion: useful to focus on congested bottleneck link
Keeping sender-to-receiver pipe "just full enough, but no fuller": keep bottleneck link busy transmitting, but avoid high delays/buffering
Explicit congestion notification (ECN)
TCP deployments often implement network-assisted congestion control:
two bits in IP header (ToS field) marked by network router to indicate congestion
policy to determine marking chosen by network operator
congestion indication carried to destination
destination sets ECE bit on ACK segment to notify sender of congestion
involves both IP (IP header ECN bit marking) and TCP (TCP header C,E bit marking)
Delay-based approach:
RTTmin - minimum observed RTT (uncongested path)
uncongested throughput with congestion window cwnd is cwnd/RTTmin
if measured throughput "very close" to uncongested throughput
increase cwnd linearly /* since path not congested */
else if measured throughput "far below" uncongested throughput
decrease cwnd linearly /* since path is congested */
congestion control without inducing/forcing loss
maximizing throughput ("keeping the pipe just full…") while keeping delay low ("…but not fuller")
a number of deployed TCPs take a delay-based approach
BBR deployed on Google's (internal) backbone network
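The delay-based rule can be sketched as one update step. The thresholds `close` and `far` stand for "very close" and "far below". They, and the step size, are hypothetical choices for illustration. Delay-based TCPs such as TCP Vegas or BBR use their own, more elaborate rules.

```python
def delay_based_update(cwnd, rtt_min, measured_throughput,
                       close=0.9, far=0.5, step=1.0):
    """One cwnd update: compare measured throughput with cwnd / RTTmin.
    cwnd in MSS (or bytes), throughput in the same unit per second."""
    uncongested = cwnd / rtt_min
    ratio = measured_throughput / uncongested
    if ratio >= close:
        return cwnd + step               # path not congested: increase linearly
    if ratio <= far:
        return max(cwnd - step, step)    # path congested: decrease linearly
    return cwnd                          # in between: hold
```

Here is an example. With cwnd = 10 MSS and RTTmin = 0.1 s, the uncongested throughput is 100 MSS/s. A measured 98 MSS/s grows cwnd to 11, and a measured 40 MSS/s shrinks it to 9.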
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Q: is TCP Fair?
Example: two competing TCP sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
Is TCP fair?
A: Yes, under idealized assumptions:
same RTT
fixed number of sessions only in congestion avoidance
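The fairness argument can be checked numerically under the same idealized assumptions: two flows with the same RTT, both in congestion avoidance, and both seeing a loss whenever their combined rate exceeds R. Additive increase leaves the gap between the flows unchanged, while each multiplicative decrease halves it, so the rates converge. The sketch below uses an illustrative function name, with rates in arbitrary units.

```python
def two_flow_aimd(x1, x2, capacity, steps):
    """Two AIMD flows on one bottleneck with synchronized losses."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease: gap is halved
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase (slope 1): gap unchanged
    return x1, x2
```

Starting from very unequal rates such as 10 and 70 on a link of capacity 100, the two rates end up practically equal after a few dozen loss events.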
Fairness: must all network apps be "fair"?
multimedia apps often do not use TCP
do not want rate throttled by congestion control
instead use UDP:
send audio/video at constant rate, tolerate packet loss
there is no "Internet police" policing use of congestion control
application can open multiple parallel connections between two hosts
web browsers do this, e.g., link of rate R with 9 existing connections:
new app asks for 1 TCP, gets rate R/10
new app asks for 11 TCPs, gets R/2
TCP Evolution. HTTP/3, QUIC: functionality in the application layer.
TCP, UDP: principal transport protocols for 40 years
different "flavors" of TCP developed, for specific scenarios:
moving transport–layer functions to application layer, on top of UDP
HTTP/3: QUIC
application-layer protocol, on top of UDP
increase performance of HTTP
deployed on many Google servers, apps (Chrome, mobile YouTube app)
adopts approaches we've studied in this chapter for connection establishment, error control, congestion control
error and congestion control: "Readers familiar with TCP's loss detection and congestion control will find algorithms here that parallel well-known TCP ones." [from QUIC specification]
connection establishment: reliability, congestion control, authentication, encryption, state established in one RTT
multiple application-level "streams" multiplexed over single QUIC connection
separate reliable data transfer, security
common congestion control
TCP (reliability, congestion control state) + TLS (authentication, crypto state)
2 serial handshakes
QUIC: reliability, congestion control, authentication, crypto state
1 handshake
QUIC: streams: parallelism, no HOL (head-of-line) blocking
Forwarding versus routing; data plane, control plane; network service model.
Per-router versus SDN control plane.
Link-layer: services, implementation context.