First TMF Switch-Off: Organizational Meeting (Minutes)

The Measurement Factory

These are the notes from the switch-off organizational meeting. The meeting CFP is available elsewhere.

Table of Contents

1. Attendees
2. Executive summary
3. Scheduling
4. Client/server-side modifications and DSR
5. L4 versus L7
6. L7 features
    6.1 Load balancing algorithm
    6.2 Server-side persistent connections
    6.3 Source IP address preservation
7. Traffic aggregation and ingress points
8. Miscellaneous
9. To Do

This page talks about various issues discussed at the meeting. Further switch-off preparations will be based, in part, on the decisions reached at the meeting. However, some issues remain unresolved and old preferences might change. Further public discussions will be held on the mailing list. We welcome any constructive input.

Meeting slides (PostScript) and a network setup figure are available.

1. Attendees

We invited all companies with load balancing products that we knew about. Besides The Measurement Factory, the following companies sent their representatives.

ARA Networks, Dell, Extreme Networks, and LVS are probably interested in competing but did not send their representatives to the meeting.

2. Executive summary

The first switch-off will include both Layer 4 and Layer 7 workloads. Meeting attendees agreed on the baseline bench design and some key workload parameters. Direct Server Return using a loopback VIP alias on the server may be allowed (provided the server configuration does not change), but entries using DSR will be clearly marked on the report, with the trade-offs explained. Server-side persistent connections may be turned off at a vendor's request. There will be a limit of 4 ingress points, but vendors will be able to use L2 aggregation devices. The switch-off will be scheduled for the last week of July.

3. Scheduling

Everyone felt that the proposed schedule was too aggressive. Attendees suggested inserting an additional two weeks between the workload/code freeze and switch-off registration.

We should also avoid scheduling conflicts with SuperComm (June 3).

4. Client/server-side modifications and DSR

One of the major issues for TMF is whether, or to what extent, to allow client- and/or server-side modifications. For example, some LB products require certain IP alias addresses to be bound to loopback interfaces. Other products may not strictly require loopback binding but may perform significantly better with it. Some LB products require server-side software ``plug-ins'' to balance load.

For TMF, the issue is consistency and fairness. We want benches to be identical to the greatest extent possible. It is perhaps unfair to compare product A that uses custom server-side software and product B that works without such modifications.

This discussion quickly came to focus on Direct Server Return (DSR). When DSR is in use, reply traffic from servers to clients does not pass through the LB device (or passes through the device on Layer 2 only). Because the LB device doesn't handle this traffic in L4+, it can usually achieve higher throughput.
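
To make the throughput argument concrete, here is a rough back-of-the-envelope sketch (in Python) of how many bytes per transaction bypass the LB under DSR. The request size is an assumed ballpark figure, not a workload parameter.

    # Rough illustration of why DSR lightens the LB's load: with a mean object
    # size around 10-11 KB, most bytes flow server-to-client, and under DSR
    # those reply bytes bypass the LB at L4+. The request size is an assumed
    # ballpark figure, not a workload parameter.

    REQUEST_BYTES = 400      # assumed typical HTTP request size
    REPLY_BYTES = 10_500     # roughly the 10-11 KB mean object size

    through_lb_no_dsr = REQUEST_BYTES + REPLY_BYTES
    through_lb_dsr = REQUEST_BYTES   # replies go directly from server to client

    print("bytes through LB per transaction, no DSR: %d" % through_lb_no_dsr)
    print("bytes through LB per transaction, DSR:    %d" % through_lb_dsr)
    print("fraction removed from LB data path: %.0f%%"
          % (100.0 * REPLY_BYTES / through_lb_no_dsr))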

Attendees seemed to feel that it is wrong or unfair to compare DSR and non-DSR products. There was a lot of discussion about what a LB customer would allow or require. Since there was no consensus, the discussion was postponed. Later, it turned out that since other differences in traffic handling (e.g., source IP rewriting and persistent connections) should be allowed to accommodate more vendors, allowing DSR is acceptable as long as DSR entries are clearly marked on the report and the trade-offs of using DSR are documented. This ``equal but separate'' policy will allow more vendors to participate in the test while allowing customers to discriminate based on their unique preferences.

Note that DSR cannot work with L7 workloads because the switch has to terminate each TCP connection, pretending to be an origin server. Hence, DSR applies to L4 tests only.

TMF will collect more input from the switching community before making a final decision on the DSR issue.

5. L4 versus L7

The next big issue is whether to test an L4 or an L7 configuration. Attendees generally agree that customers today mostly deploy L4 configurations, perhaps as many as 80% of deployments. However, they also agree that when buying equipment today, L7 performance and features are significant factors. Customers may be using L4 now but plan to use L7 in the foreseeable future.

The consensus was to include an L7 workload in the switch-off mix. Each switch-off participant will choose whether to run L4 tests, L7 tests, or both.

6. L7 features

6.1 Load balancing algorithm

Having agreed that there should be an L7 workload at the switch-off, we next discussed some of the particulars. L7 load balancing usually means forwarding based on URLs, Host headers, or cookies. URL-based forwarding seems to be the most common and most ``intuitive''. The consensus was to offer URL-based switching only.

URL-based switching can be broken down into the following categories: prefix matching, suffix matching, general pattern matching, and hashing. To test URL hashing for correctness, Polygraph would need to understand and implement each vendor's hash algorithm. There was rough consensus that URL suffix matching is the most popular.

TMF proposed the following scheme: Polygraph will use three URL extensions (.JPG, .GIF, and .HTM), each uniformly popular (1/3 of requests). This URL suffix matching workload requires a minimum of three servers. Someone suggested that the minimum should be 6 servers so that the LB device can demonstrate both ``group selection'' and ``server selection'' within a group. Participants agreed that having 6 servers as the minimum server-side configuration is acceptable.

Since we want Polygraph (back-end) servers to be evenly balanced as server load increases, we need to keep the number of Polygraph servers a multiple of 3, adding more servers as needed: start with 6 servers, then 9, then 12, and so on.
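
As an illustration of the proposed scheme, the following sketch shows group selection by URL suffix followed by server selection within the group. The server names and the round-robin choice within a group are assumptions for illustration, not Polygraph or vendor configuration.

    # Minimal sketch of the proposed URL-suffix switching scheme: three uniformly
    # popular extensions, one server group per extension. With 6 servers total,
    # each group has 2 members; growing to 9 or 12 keeps the count a multiple of
    # 3 so the groups stay evenly balanced. Illustrative only.

    import itertools

    GROUPS = {
        ".jpg": ["srv-jpg-1", "srv-jpg-2"],
        ".gif": ["srv-gif-1", "srv-gif-2"],
        ".htm": ["srv-htm-1", "srv-htm-2"],
    }

    # Round-robin counters for the "server selection" step within each group.
    _rr = {ext: itertools.count() for ext in GROUPS}

    def pick_server(url):
        """Group selection by URL suffix, then server selection within the group."""
        for ext, servers in GROUPS.items():
            if url.lower().endswith(ext):
                return servers[next(_rr[ext]) % len(servers)]
        raise ValueError("URL does not match any configured extension: " + url)

    if __name__ == "__main__":
        for u in ["/img/a.JPG", "/img/b.GIF", "/index.HTM", "/img/c.JPG"]:
            print(u, "->", pick_server(u))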

Regardless of the URL set partitioning, we have to maintain a mean object size around 10-11KB (median ~4KB) while keeping per-extension size distributions realistic. TMF will propose a workload configuration that satisfies these principles.
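
A quick sanity check of this constraint: with the three extensions equally popular, the overall mean is just the average of the per-extension means. The per-extension sizes below are placeholders, not proposed workload parameters.

    # Back-of-the-envelope check of the object size constraint: equal popularity
    # (1/3 each) makes the overall mean a simple average of per-extension means.
    # The per-extension values are assumed for illustration.

    popularity = {".jpg": 1.0/3, ".gif": 1.0/3, ".htm": 1.0/3}
    mean_size_kb = {".jpg": 12.0, ".gif": 4.5, ".htm": 15.0}   # assumed values

    overall_mean = sum(popularity[ext] * mean_size_kb[ext] for ext in popularity)
    print("overall mean object size: %.1f KB" % overall_mean)  # ~10.5 KB here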

6.2 Server-side persistent connections

HTTP/1.1 persistent connection reuse prompted an interesting discussion. The persistent connection model is relatively well established in PolyMix workloads, and TMF was planning to use similar parameters for L4/7 testing. This proposal met with strong objections, especially for the L7 tests.

TMF pointed out that a LB product is always allowed to close a persistent connection by adding a ``Connection: close'' header. However, some vendors seemed to feel that this was an unfair burden and would cause performance degradation. They argued that a performance-oriented customer can always configure the servers to disable persistent connections.
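
For illustration, the header rewrite TMF referred to amounts to something like the following sketch; the function is hypothetical, not any vendor's implementation.

    # Sketch of the header-rewriting option: an LB that does not want to keep a
    # connection persistent can inject "Connection: close" into the message it
    # forwards. Illustration of the HTTP mechanism only.

    def force_close(headers):
        """Return a copy of the HTTP headers with persistence disabled."""
        out = {k: v for k, v in headers.items() if k.lower() != "connection"}
        out["Connection"] = "close"   # peer must close after this message
        return out

    if __name__ == "__main__":
        request = {"Host": "server-1", "Connection": "keep-alive"}
        print(force_close(request))   # {'Host': 'server-1', 'Connection': 'close'}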

With the L4/7 workloads, where LB devices are directly attached to back-end servers with ``zero'' latency, there seems to be little performance incentive to support persistent connections. TMF reluctantly agreed to make persistent connections an optional feature of the L4/7 workloads.

The proposed maximum number of requests per client-side connection has a zipf(64) distribution; the proposed server-side distribution is zipf(16). Attendees found these values inappropriate for a server-side-oriented workload. They argued that clients should have fewer requests per connection because, in reality, the same connection pool would be used to access more than one server; the reverse is true on the server side because an origin server talks to one LB device rather than to many clients. The workloads will be adjusted accordingly.
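
To show what these parameters imply, the sketch below compares the mean requests per connection under the two distributions, assuming zipf(N) denotes a distribution over 1..N with probability proportional to 1/k; that reading of the notation is an assumption, not a statement of Polygraph internals.

    # Compare the two proposed requests-per-connection distributions, assuming
    # zipf(N) means p(k) proportional to 1/k for k = 1..N.

    def zipf_mean(n):
        """Mean of a truncated Zipf distribution over 1..n with p(k) ~ 1/k."""
        weights = [1.0 / k for k in range(1, n + 1)]
        total = sum(weights)
        return sum(k * w for k, w in zip(range(1, n + 1), weights)) / total

    if __name__ == "__main__":
        print("client-side zipf(64) mean requests/connection: %.1f" % zipf_mean(64))
        print("server-side zipf(16) mean requests/connection: %.1f" % zipf_mean(16))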

6.3 Source IP address preservation

There was some concern that some LB devices may change the source IP address of packets/connections forwarded to back-end servers. Attendees suggested that results should indicate which source IP addresses the back-end servers actually see. The report should also document the trade-offs of IP rewriting.

When the source IP address can be changed, some LB products may ``merge'' two connections into one (e.g., when two client requests for the same URL arrive at the same time). Such merging looks like cache hits. Polygraph is able to detect this condition, and it would appear in the results. Some LB devices can make better use of server-side persistent connections because connections from different clients can be reused if the source IP is rewritten.

7. Traffic aggregation and ingress points

Since the number of client-side PCs is large for L4/7 tests, and real LB devices have only a few ingress points, TMF proposed to include optional L2 aggregation hardware in the bench setup at no cost to the participant (as far as price/performance reporting is concerned). See the network setup diagram.

Attendees also agreed that a maximum of 4 ingress points must be enforced for the device under test. All Polygraph PCs use 100Mbit cables, but aggregation devices can be used to produce Gbit feeds.
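
As a quick arithmetic check of this arrangement, at most 40 fully loaded 100Mbit links fit behind 4 Gbit ingress points without oversubscription; the calculation below is illustrative only, not a bench requirement.

    # Quick arithmetic behind the aggregation scheme: 4 ingress points at
    # 1 Gbit/s each can absorb up to 40 fully loaded 100 Mbit/s Polygraph links.

    INGRESS_PORTS = 4
    INGRESS_GBPS = 1.0       # per ingress point
    PC_LINK_MBPS = 100       # each Polygraph PC

    max_pcs = int(INGRESS_PORTS * INGRESS_GBPS * 1000 / PC_LINK_MBPS)
    print("100 Mbit links that fit without oversubscription:", max_pcs)  # 40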

8. Miscellaneous

Polygraph supports HTTP-based health checks if a LB can be configured to send requests for a specific URL. Health check requests will appear in the totals (total requests, total connections); we can also report the number of health checks separately. TCP setup-only connections (with no HTTP request) appear as errors in Polygraph logs. ICMP-based health checks are transparent to Polygraph.
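
As an illustration of separate health-check reporting, the sketch below tallies requests for a dedicated probe URL apart from workload requests; the URL and the log representation are hypothetical, not Polygraph output.

    # Separate health-check requests from workload totals, assuming the LB probes
    # a dedicated URL. The URL and log format are assumptions for illustration.

    HEALTH_URL = "/lb-health-check"   # assumed probe target

    def split_counts(requested_urls):
        """Return (total requests, health-check requests) for a list of URLs."""
        total = len(requested_urls)
        health = sum(1 for u in requested_urls if u == HEALTH_URL)
        return total, health

    if __name__ == "__main__":
        log = ["/a.htm", HEALTH_URL, "/b.jpg", HEALTH_URL, "/c.gif"]
        total, health = split_counts(log)
        print("total: %d, health checks: %d, workload: %d"
              % (total, health, total - health))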

TMF needs to propose a better mix of client-side packet delays and loss.

Caching of traffic will be prohibited during the switch-off. Load balancers that are also caching proxies are encouraged to participate in TMF cache-offs, where the WebAxe workload will be offered to test reverse proxies and Web accelerators.

9. To Do