Network Architecture in the Data Center
Top-of-rack switching has become the norm in data centers. What does that mean for data center connectivity solutions?
With the advent of top-of-rack (ToR) switching and SFP+ direct-attach copper cables, more data centers are able to quickly implement cost-effective 10G-and-beyond connections. ToR designs originally popped up because data center managers ran out of room on their aggregation switches, so they needed to add incremental capacity. Now, the trending architecture of the data center network has changed to all but eliminate end-of-row (EoR) switches in favor of ToR.
Until fairly recently, data centers were designed with dedicated racks in mind – server, storage, and switch racks. However, about seven years ago, with the introduction of 10G connections into the data center, topologies started to change. In addition to incremental growth, data center managers started implementing ToR switches for a number of other reasons:
- Cable reduction: The cable from the server to the switch is less than three meters and the connections get aggregated in the ToR switch, so the cables running outside the rack get reduced by at least a factor of eight. This also helps with cable management throughout the data center.
- Latency and throughput: Switch manufacturers touted that adding ToR would lower latency, but that is a bit misleading. What really happens when you add a ToR switch is that you get higher data rates closer to the servers, so latency appears to decrease when, in reality, the data is simply moving at the higher rate for more of its path. Inherently, 10-Gigabit Ethernet has 1/5 the latency that Gigabit Ethernet does. It also obviously has the capability of 10 times the throughput, provided that your switches can support line rate (that is, their backplanes can handle 10 Gb/s). So implementing 10-Gigabit Ethernet in the equipment access layer of your data center can seriously reduce the amount of time data takes to get from initiator to destination. This, of course, is more critical in some vertical markets than in others, such as the financial sector, where microseconds can make a difference of millions of dollars made or lost. However, if latency is so critical, why not use an InfiniBand network instead of Ethernet ToR switches? It seems like InfiniBand already has these issues solved without adding another switch layer to the network.
- Fiber: A ToR switch not only gets higher data rates closer to the server, it also gets fiber there.
- Non-blocking/Clos networking: In order to support Clos networking, you most likely need to use a ToR switch. A Clos network is a multi-stage network, and its main advantage is that it requires fewer crosspoints to produce a non-blocking structure. It is difficult and can be more costly to implement a non-blocking network without Clos.
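To put rough numbers on the crosspoint savings of a multi-stage design, here is a minimal sketch that compares a single crossbar with a symmetric three-stage Clos network. It assumes the classic strict-sense non-blocking condition of m = 2n - 1 middle-stage switches, where n is the number of inputs per ingress switch and r is the number of ingress (and egress) switches. The function names are illustrative only and are not taken from any vendor tool.

```python
def crossbar_crosspoints(ports: int) -> int:
    """Crosspoints in a single-stage crossbar with `ports` inputs and outputs."""
    return ports * ports


def clos_crosspoints(n: int, r: int) -> int:
    """Crosspoints in a symmetric three-stage Clos network.

    n: inputs per ingress switch (and outputs per egress switch)
    r: number of ingress switches (and egress switches)
    Assumes m = 2n - 1 middle switches (strict-sense non-blocking).
    """
    m = 2 * n - 1
    edge = 2 * r * n * m      # r ingress (n x m) plus r egress (m x n) switches
    middle = m * r * r        # m middle switches, each r x r
    return edge + middle


if __name__ == "__main__":
    n, r = 32, 32             # 32 * 32 = 1,024 ports in total
    print("crossbar:", crossbar_crosspoints(n * r))   # 1048576
    print("clos:    ", clos_crosspoints(n, r))        # 193536
```

Under these assumptions, a 1,024-port fabric needs roughly one-fifth as many crosspoints as a single crossbar. The advantage only appears at scale: for very small port counts, the three-stage design can need more crosspoints than the crossbar.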
The diagrams below illustrate ToR and EoR configurations.
So which is the best data center architecture? Mainly, it depends on what you are trying to achieve. There are many considerations when moving to a ToR topology. Here are the main ones:
- Active versus passive racks: ToR switches typically do not cost that much on their own. However, with a “traditional” structured cabling approach, the rack would hold a passive patch panel that probably costs, at most, 1/10 as much as the switch.
- Longevity: While the installation of the structured cabling is expensive, if you choose the latest and greatest like CAT6A or CAT7 for copper and OM3 or OM4 for fiber, it typically lasts at least 10 years (and could last longer). It can stay there for at least two and possibly three network equipment upgrades. ToR switches will probably need to be replaced every three to five years – with every network upgrade.
- Heating/cooling: Something most data center managers tend to forget when ToR switches are added is the heat. We visited several data centers with ToR switching and found that after about a month, some of the ports were exhibiting very high bit-error rates (BERs), to the point where they would lose their connection. This occurred because the switch is deeper than the servers that are stacked below it, so it trapped the exhaust heat at the top of the rack where some “questionable” copper patch cords were being used. This heat caused out-of-spec insertion loss on these copper patch cords and, therefore, bit errors high enough to shut down the port. Replacing the “cheap” patch cords with high-quality ones and making cabinet fans run continuously took care of the issue, but not without many hours of troubleshooting and downtime.
- Stranded ports: There are more switch ports than you can actually use within a rack. Some people call this oversubscription. Our definition (and the industry’s) for oversubscription is just the opposite, so this term will not be used here. But the complaint is this: cabinets are typically 42U or 48U high. Each server, if you’re using rack servers, is 1U or 2U. You need a UPS, which is typically 2U or 4U, and your ToR switch takes up 1U or 2U. So the maximum number of servers you can have in a rack would realistically be 40. Most data centers have far fewer than this, around 30. In order to connect all of these servers to a ToR switch, you would need a 48-port switch. So you’re paying for 18 ports that you will most likely never use. Or, sometimes, data center managers may want to use these extra ports, so they connect them to servers in the next cabinet, which results in a cabling nightmare.
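The stranded-port arithmetic above is easy to rerun for other rack densities. The sketch below uses the figures from the text (a 48-port switch and about 30 servers per rack). The function name and the `links_per_server` parameter are assumptions added for illustration, with one switch connection per server as the default.

```python
def stranded_ports(switch_ports: int, servers: int, links_per_server: int = 1) -> int:
    """Return the number of ToR switch ports left unused in a rack."""
    used = servers * links_per_server
    if used > switch_ports:
        raise ValueError("rack needs more ports than the switch provides")
    return switch_ports - used


if __name__ == "__main__":
    # Typical rack from the text: about 30 servers on a 48-port switch.
    print(stranded_ports(48, 30))   # 18
    # Realistic maximum from the text: 40 servers in the rack.
    print(stranded_ports(48, 40))   # 8
```

Even a fully populated rack leaves ports idle in this model, which is why the cost of unused ports belongs in any ToR versus EoR comparison.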
Popular ToR switches are Arista’s 7048 series, Brocade’s FCX series, Cisco’s Catalyst 4900 series or Nexus 3000 series, and Extreme’s X650 series.
Most data centers today still have a combination of ToR and EoR architectures, but we see a high level of adoption of ToR now for the reasons stated above. The connectivity solutions for ToR started as 10GBASE-CR and 10G SFP+ direct-attach copper (DAC) cables (twinax), but 10GBASE-T (twisted-pair) has now become a mainstream connection as well. As we move into 25G or 40G servers within the next few years, we anticipate that these connections will again start as SFP28 or QSFP+ DACs or active optical cables (AOCs), since these solutions are already available. If/when we see 25GBASE-T or 40GBASE-T, the volume of connections will once again be shared.
To learn more about how the adoption of ToR switches is affecting connectivity in the data center, see Bishop & Associates’ report entitled, “Multi-Gigabit Datacom Connectors and Cable Assemblies Market.”