Bulk Transport Protocol Call
Friday, March 18, 2005

Attendees:
Stanislav Shalunov
Shawn McKee
Matt Zekauskas
Steve Senger
Larry Dunn
Susan Evett (scribe)

Call started at 12:01 p.m. EST.

Agenda
A. Review of previous action items
B. New business

A. Review of Previous Action Items

1. Stanislav continued to pursue Internet2 WG status; at this point, it is in the queue and it is only a matter of time until it takes effect. [Done]

2. Steven to release a new version of the API document -- he sent out Draft 3 this morning. He made several of the changes; needs more information on:

a. - x_sockerror [Done]

b. - Statistics include availability of data. [Done]

c. - x_socket() to include a last argument of (X_SOCKET *) [Done]

d. - Max library datagram -- Gu raised several questions about this, both during and after the call.

For Steven’s purposes, he’d like to construct the message as an indestructible unit -- either it is delivered intact or not at all. He could then break his messages up into smaller bits. Out-of-order delivery of messages doesn’t affect Steven’s application, so he’s not planning to address it. Larry was concerned that out-of-order delivery could have an effect on how long you wait before declaring messages ‘lost’.

Steven thought about allowing messages to have a size larger than the MTU. Stas suggested that IPv4 messages be allowed up to 2x the size of the MTU, and that IPv6 message size not be restricted -- a limit would be too constraining for future developments. Stas would be fine with a 16-bit message-size field but he’d be happier with a 32-bit field.

Matt noted that it didn’t make sense to him to fragment stuff at the IP level. Stas suggested that, to reduce the number of context switches, you could issue your data in 64 KB UDP datagrams and have the kernel fragment them for you (old NFS trick); he can’t think of any other reason to fragment at the IP layer. Larry asked whether you would be reusing existing system functionality vs. writing it each time.
Stas felt that, mostly, this is existing functionality (though not done very well). Larry asked if this was working well for large UDP datagrams -- Matt said that, at the LAN level, yes, but he wouldn’t use it outside that context.

Stas reported that errors (undetected by UDP) occurred in very long streams at a rate of 30-40 per terabyte. The problem is that the UDP checksum is too short, so this requires other end-to-end (e2e) checksums. One option: fragment before the data hits the IP layer and do our own checksums. Matt didn’t know how this would pan out in user space. Matt noted that, if you lose one fragment, you have to resend the entire message. Stas argued that, if performance doubles, it would still be worth it. Another argument against very large packets.

Steven suggested running an experiment -- Larry was concerned that we could not sufficiently test this. Steven asked how the group planned to resolve this -- Stas and Matt offered to forward Matt Mathis’ IPPM WG Internet-Draft to the group. Stas recommended keeping the option of taking the burden away from the library and putting it into the IP stack. Stas felt it was too early to call, so the API should include both. The implementation would need to react to this and the protocol spec would need to know about it. Use setsockopt for both datastreams and datagrams.

e. - A flag to tell the library whether to set DF or use PMTUD -- Steven used different elements than suggested by the group. Discussion ensued. The group felt that the solution he offered was workable.

You pass a single array and a single status flag to indicate what you want from it. Flag values are: readable, writable, exception. This is used in the select call: you pass the union of the arrays with the union of the flags and, when it returns, figure out what happened. ‘Select’ sets a status variable that you can test after the call returns to see if it has satisfied the conditions you asked for.
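
As an illustration only, the select-style call described above might look roughly like the following Python sketch. It is not taken from Draft 3: the names X_READABLE, X_WRITABLE, X_EXCEPTION, Entry, and x_select are hypothetical, and the standard select() call and a local socket pair stand in for the transport library and its sockets.

```python
import select
import socket
from dataclasses import dataclass

# Hypothetical flag values; the draft API's real names are not in these notes.
X_READABLE = 0x1
X_WRITABLE = 0x2
X_EXCEPTION = 0x4


@dataclass
class Entry:
    """One element of the single array passed to the select-style call."""
    sock: socket.socket
    want: int        # union of the conditions the caller is asking about
    status: int = 0  # set by x_select(): which of those conditions hold


def x_select(entries, timeout=None):
    """Hypothetical wrapper: returns how many entries had a condition met."""
    # Translate the per-entry flags into the three lists select() expects.
    rlist = [e.sock for e in entries if e.want & X_READABLE]
    wlist = [e.sock for e in entries if e.want & X_WRITABLE]
    xlist = [e.sock for e in entries if e.want & X_EXCEPTION]
    r, w, x = select.select(rlist, wlist, xlist, timeout)

    ready = 0
    for e in entries:
        # Record, per entry, which of the requested conditions were satisfied.
        e.status = 0
        if e.sock in r:
            e.status |= X_READABLE
        if e.sock in w:
            e.status |= X_WRITABLE
        if e.sock in x:
            e.status |= X_EXCEPTION
        if e.status:
            ready += 1
    return ready


if __name__ == "__main__":
    a, b = socket.socketpair()  # local pair standing in for transport sockets
    a.send(b"ping")
    entries = [Entry(a, X_READABLE | X_WRITABLE),
               Entry(b, X_READABLE | X_WRITABLE)]
    count = x_select(entries, timeout=1.0)
    for e in entries:
        print("wanted", e.want, "got", e.status)
    a.close()
    b.close()
```

The caller fills in `want` for each entry, makes one call, and then tests `status` on each entry to figure out what happened, which matches the single-array, single-flag shape the group found workable.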
Computation -- based on the stats structure (cumulative counts over the life of the socket, or last 2 calls to this function, for example). Larry asked if it was possible for the application to identify what the stats structure would be for the rate data (computation) that it is getting back. Stas didn’t feel that the application should be burdened with maintaining too much state; having it in the library (one time) would be more economical than having it in each application (requiring updates to the software). Steven felt that, during the life of an application, he’d be able to call up these statistics -- which would work for his type of application.

Stas asked if there would be a way to have short-term and long-term rates. Larry suggested another possibility -- Steven felt that, to do this, you need to separate out two different types of variables -- one to do the computation and one to provide data. Steven suggested that he create a strawman for the group’s review.

f. - A way to configure underlying send() size [Done]

3. Gu will collect his "gotchas" on programming with Windows. Gu collected a significant number and sent them to the group for review. Stas wrote to the Microsoft Asia contact -- Matt noted that this was the same guy who did the original I2 Land Speed Record with Abilene. [Done]

B. New Business -- none.

ACTION ITEMS:

a. Steven to modify API such that: 1) messages with a 32-bit size field are allowed for IPv6 and 2) messages are delivered intact (or not at all).

b. Steven to modify the API such that: datagram description should be disable PD.

c. Steven to develop a strawman document on the use of two different types of variables (one to do computation and one to provide data).

d. Stas will poll the group on a good time to meet at the Spring Member Meeting (same time or switch to breakfast).

Next call is at the same time on April 1.

Call ended at 12:57 p.m. EST.