================= 0. LOGISTICS, REVIEW, OUTLINE (5 minutes) =================
* Logistics: Assignment 1 will be out on Friday, Sep 15. Due Sep 29.
** Upload on Piazza as a zip file, as a private post visible only to me and Sai Anirudh.
** PLEASE DO NOT upload publicly anywhere.
** Upload multiple times if needed by editing the post.
* Review: Last class:
** Internet's design goals
** Where are the 5 layers?
** Packet & circuit switching
* This class: We'll get concrete, much less abstract :) But try to keep the big picture in mind.
** Finish up network performance from last class: throughput and latency in packet switching
** Identifying hosts and routers on the Internet (IP addresses and domain names)
** DNS: Mapping domain names to IPs
** Socket programming: UDP Sockets, TCP Sockets

================ 1. PERFORMANCE OF A NETWORK (10 minutes) ================

1.1 Throughput
----------------------------
* Aggregate metric: data delivered per second
* Different at each layer because of protocol overheads (give an example)
* In-order delivery reduces throughput further if bytes are lost: later bytes must wait for the lost ones to be retransmitted

1.2 Latency
---------------------------
* Per-packet metric: how long does it take for a packet to get from one end to the other?
>>>>>>>>>>>DEMO: Could do a ping demo here if time permits.

1.3 What do applications care about?
---------------------------
* File transfer: throughput
* Audio calls: latency
* Video conferencing: both
* Will later see more precise ways of capturing application preferences

1.4 Performance differences between packet, circuit switching
--------------------------
* In circuit switching, performance is guaranteed once the circuit is set up.
* But, a call might be rejected if there isn't enough network capacity.
* In packet switching, performance is variable depending on who else is on the network.
* But, you are never prevented from getting on to the network.

========================== 2. IDENTIFYING HOSTS ON THE INTERNET (15 minutes) ==========================

2.1 IP addresses
---------------------------
* 32-bit number in Internet Protocol version 4. Written in dotted-quad notation (e.g., 192.168.1.1).
* Two kinds of IP addresses: public and private
* Public IPs: An IP that can be routed to (routers can find a path to this IP).
** E.g., a server that needs to be on a well-known IP address.
* Private IPs: For devices on a private network
** A group of machines doing number crunching or data processing
** A laptop at home.
** Allows us to reuse the same addresses in different independent networks.
** Helps with IPv4 address exhaustion.
** But, doesn't make sense to routers on the global Internet; they can't route to it. For instance, can't easily use a laptop as a server.
* What if a host on a private IP network needs to get to the outside world?
** Example: Laptop wants to reach www.google.com
** A device called a network-address translator (NAT) translates between the private and public addresses
** Typically, one public address for many private addresses
** >>> DEMO: Type "IP address" into Google to find out your public IP address.
** >>> DEMO: Maybe do this from my phone as well.
* Private IPs are an example of hierarchy:
** Analogy: Room numbers only make sense within a building, building address is global
** Allows us to scale to a large number of hosts
** More examples of hierarchy in routing layer section.
* Allotment:
** IANA controls public IPs
** Self-assign private IPs so long as they're in a few designated ranges (192.168.*.*, 10.*.*.*, 172.16.*.* through 172.31.*.*, etc.)

2.2 Domain names
----------------------
* Human-readable analogue of IP addresses
* Hierarchical structure: Top-level domains (.com, .net, .org, etc.), then subdomains (google, nyu)

======================= 3. DOMAIN NAME SYSTEM =======================
* Question: How do we map domain names to IP addresses?
* What's the dumbest way to do this: a single mapping file
** This is how it worked before 1984. What do you think this file was called?
HOSTS.TXT
** A few hundred hosts on the Internet then (show them the graph).
** Call up the operator at SRI to get yourself added to it.
** Get HOSTS.TXT over the Internet by asking your friend for it :)
* Clearly unsustainable as the Internet grew.
** DNS evolved as an automated solution to this problem.
** Think of it as a globally available hash table / dictionary.
* How does DNS work?
** Hierarchy of servers similar to the domain name hierarchy:
--> root server (logically just one, physically replicated)
--> TLD servers (one each for .com, .org, etc.)
--> authoritative servers (one each for FB, Google, NYU)
--> local server (provided as a convenience, outside the hierarchy)
** So how do you look up www.google.com?
--> Start at your local DNS server.
--> If it already has the IP address, it returns it, you're done.
--> Otherwise, start at the root DNS server and go downwards.
** Recursive vs. iterative DNS
--> Iterative: the server points you to the next server in the chain and returns right away.
--> Recursive: the server takes on responsibility for completing the lookup itself.
* DEMO with dig +trace
* But, this recursive or iterative process takes too much time.
** Each arrow can be several ms.
** Too much load on the higher levels of the hierarchy
** How do we fix this?
** Standard systems trick: Caching
** Similar to processor memory hierarchy
* Finally, who allocates domain names? (COULD OMIT IF NEEDED)
** New TLDs are created by ICANN (the organization that oversees IANA)
** ICANN decides who operates the 13 root servers at specific IPs.
--> Really no difference among these 13 (think of it as one root server system)
--> Each of the 13 is also replicated.
--> Verisign is one example of a company operating a root server.
** ICANN also decides who controls allocation of each TLD
--> These are called registries (e.g., Verisign)
** Registries sell subdomains to registrars
--> GoDaddy is one registrar
** Registrars sell to web site owners
* Why have two identifiers (domain names and addresses)?
** Domain names are easier to remember
** Addresses are easier for routers to handle when finding where to send your packets.

======================== 4. SOCKET PROGRAMMING =======================
* How do applications talk to the network?
* The socket programming interface (been around since 1983. Great example of longevity.)
* We'll discuss the one in Python
* UDP first and then TCP.
* Mostly going to be a show-and-tell on the computer, but here are the important concepts.
* UDP Sockets
** the socket() method
** the bind() method
** the sendto() method
** the recv() method
* What if we want to receive from two sockets at once?
** recv() blocks
** can use non-blocking recv(), but polling in a loop consumes too much CPU
** the select() method: wait on multiple sockets at once. Wake up if any one has data.
* Aside: select() is a great example of modularity
** Designed to accommodate certain kinds of file descriptors
** But also works for sockets, because sockets are file descriptors at their core.
* TCP Sockets: a few additional complications
** Need to synchronize sender and receiver so that they agree where the bytestream starts
** listen(), connect(), accept()
* accept() returns a new socket? Why?
** Can't tie up the original listening socket in data communication
* After accept(), communication is similar to UDP, but with a few key differences
** UDP is message oriented. TCP is bytestream oriented.
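
A minimal Python sketch of the socket calls listed in this section, meant as a starting point for the show-and-tell rather than the exact demo code. The function names (udp_select_demo, tcp_accept_demo), the loopback address 127.0.0.1, the 2048-byte buffer, and the 5-second timeout are all assumptions chosen for illustration. The first function waits on two UDP sockets at once with select(); the second shows that accept() hands back a new socket and that TCP delivers a bytestream, not messages.

```python
import select
import socket


def udp_select_demo():
    """Receive on two UDP sockets at once using select()."""
    # Two receivers bound to loopback; port 0 asks the OS for any free port.
    rx_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx_a.bind(("127.0.0.1", 0))
    rx_b.bind(("127.0.0.1", 0))

    # The sender never calls bind(); the OS picks its source port.
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(b"hello a", rx_a.getsockname())
    tx.sendto(b"hello b", rx_b.getsockname())

    received = {}
    pending = [rx_a, rx_b]
    while pending:
        # Sleep until at least one socket has data (assumed 5 s safety timeout).
        readable, _, _ = select.select(pending, [], [], 5.0)
        if not readable:
            break  # timed out with nothing to read
        for sock in readable:
            name = "a" if sock is rx_a else "b"
            received[name] = sock.recv(2048)  # one recv() returns one datagram
            pending.remove(sock)

    for sock in (rx_a, rx_b, tx):
        sock.close()
    return received


def tcp_accept_demo():
    """Show that accept() returns a new socket and TCP is a bytestream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The kernel completes the handshake using the listen queue,
    # so connect() returns even though accept() has not run yet.
    client.connect(listener.getsockname())

    # accept() returns a separate socket for this one connection;
    # the listener stays free to accept other clients.
    conn, _peer = listener.accept()
    is_new_socket = conn is not listener

    client.sendall(b"abc")
    client.sendall(b"def")
    client.close()  # closing marks the end of the bytestream

    # Message boundaries are not preserved: read until recv() returns b"".
    chunks = []
    while True:
        data = conn.recv(2048)
        if not data:
            break
        chunks.append(data)

    conn.close()
    listener.close()
    return is_new_socket, b"".join(chunks)


if __name__ == "__main__":
    print(udp_select_demo())
    print(tcp_accept_demo())
```

Note that the TCP receiver may see the two sendall() calls as one chunk or as several; only the concatenated bytes are predictable. In the UDP half, each recv() returns exactly one datagram.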