Review of Computer Centre Upgrade Plans
A project was set up in late 1999 to prepare a plan to upgrade CERN's
computer centre to meet the infrastructure demands of LHC offline computing. The
project is now at a stage where major investment is required for the
construction of a new electrical substation. An external review is therefore
being organised to validate the various assumptions and design choices and to
comment on the adequacy and cost effectiveness of the upgrade plans.
Review Committee Members: Edwin Bringardner (PSINet/Meyrin), Lluis
Miralles (IFAE Barcelona & ATLAS), Jonathan Rodin (PSINet/Meyrin)
LHC Offline Computing Requirements
The offline computing needs of the LHC experiments were established during
the
"Hoffmann Review" in 2000. For the first year of operation at full
luminosity (2008), the total requirements for the computing centre at CERN
are estimated at:
· CPU capacity: 2,300KSpecInt95
· Disk storage: 2.3PB
· Tape storage: 11.5PB
· Tape I/O bandwidth: 3.2GB/s
The computer centre infrastructure must be sufficient to accommodate this
level of equipment and the foreseen upgrades.
Electrical Power Distribution
Estimation of Impact
In 1999, our estimation was that a dual-processor PC would have a power
consumption of 100W; i.e., we considered that technology developments would lead
to improved SpecInt/Watt performance as raw SpecInt performance increased. Based
on likely CPU performance evolution, we estimated a need for 15,000+/-2,000
systems in 2007. Including general infrastructure led to a predicted computing
load of 2MW.
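For concreteness, a minimal sketch of the arithmetic behind the 1999 estimate; the infrastructure allowance below is our assumption, chosen so that the total matches the 2MW figure quoted above.

    # Cross-check of the 1999 power estimate.
    systems = 15_000              # predicted dual-processor PCs in 2007 (+/-2,000)
    watts_per_system = 100        # assumed consumption per dual-processor PC
    pc_load_mw = systems * watts_per_system / 1e6
    print(pc_load_mw)             # 1.5 MW from PCs alone
    # General infrastructure (disk and tape servers, networking, etc.)
    # makes up the difference; 0.5 MW is an assumed residual chosen to
    # reach the quoted 2 MW total.
    infrastructure_mw = 0.5
    print(pc_load_mw + infrastructure_mw)   # ~2.0 MW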
Since then, LHC startup has been delayed one year. Using the same
assumptions for CPU performance evolution, we predict a need for 8,000 systems
in 2008 (first year at full luminosity). However, we have come to believe that
the energy efficiency of PC systems is not improving over time. Instead, PC
power needs can be calculated as 1W per SpecInt95. On this basis, the computing
load in 2008 will be approximately 2.5MW.
How should we translate computing requirements in
SpecInt into power demand in kW?
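As background for this question, the sketch below shows the translation we currently apply, using the 1W per SpecInt95 rule and the Hoffmann Review CPU requirement; the non-CPU overhead is an assumed figure bringing the total to the ~2.5MW quoted above.

    # Translate the CPU requirement (KSpecInt95) into power (kW) using
    # the 1 W/SpecInt95 rule described above.
    cpu_kspecint95 = 2_300          # CPU requirement for 2008
    watts_per_specint95 = 1.0       # empirical rule: efficiency not improving
    cpu_load_kw = cpu_kspecint95 * watts_per_specint95  # kSpecInt95 x W = kW
    print(cpu_load_kw)              # 2300 kW from CPU capacity alone
    overhead_kw = 200               # assumed allowance for disk/tape servers etc.
    print((cpu_load_kw + overhead_kw) / 1_000)   # ~2.5 MW total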
Visits to other computer centres showed that all assumed a power factor
of 0.7 for computing equipment and protected against high levels of 3rd
harmonics by sizing the neutral conductor at twice the cross section of phase
conductors. Measurements on computing equipment installed at CERN show such
provision to be appropriate if no special requirements are placed on the power
supplies of the computing equipment. Stricter requirements during PC purchasing,
however, could enable us to size the electrical installation for a power factor
of 0.8 with a consequent reduction in cost.
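A minimal sketch of why the power factor matters for sizing: the apparent power the installation must be rated for is S = P / pf, so a given real load demands a larger installation at pf 0.7 than at pf 0.8. The 2.5MW load is the estimate from above.

    # Apparent power (kVA) the installation must be sized for: S = P / pf.
    real_load_kw = 2_500                 # predicted 2008 computing load
    for pf in (0.7, 0.8):
        print(pf, round(real_load_kw / pf))  # 0.7 -> 3571 kVA, 0.8 -> 3125 kVA
    # Sizing for pf 0.7 requires roughly 450 kVA more apparent-power
    # capacity; this is the initial-cost penalty weighed below against
    # recurrent PC purchasing costs.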
We have chosen to size everything assuming a power factor of 0.7 as we
believe the higher recurrent costs associated with strict criteria during PC
purchasing over the 20+ year lifetime of the upgraded computer centre would more
than offset the reduction in initial cost of the electrical infrastructure.
Comments on our choices in this area would be
welcome, as would any comments on how to size PDUs appropriately: is
it best to "derate" PDUs (e.g. use PDUs rated at 400A to support loads of 250A)
or are PDUs with large cross-section neutral busbars preferable?
In addition to the computing load, the proposed substation must support the
HVAC and general building services loads, ensuring adequate separation.
Comments on our arrangements would be appreciated.
Autonomy Considerations
Our review of outside computing centres showed universal reliance on rotary
UPS systems to ensure continued service in the event of primary supply failure.
As CERN has an auto-transfer mechanism which allows the backup French (or Swiss)
supply to cover failures of the primary Swiss (or French) supply,
installation of a dedicated rotary UPS to cover the physics load was not
considered cost effective. Rather, we accept the risk that the physics load will
be shed if the auto-transfer mechanism fails and the primary supply is down for
more than 5-10 minutes. We estimate that such failures will occur once every
5-10 years.
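A minimal sketch of the exposure these estimates imply over an assumed 20-year lifetime of the upgraded centre:

    # Expected number of physics-load-shedding events implied by the
    # failure estimate above, for both bounds of the 5-10 year interval.
    lifetime_years = 20                # assumed lifetime of the upgraded centre
    for mean_interval_years in (5, 10):
        print(mean_interval_years, lifetime_years / mean_interval_years)
    # i.e. between 2 and 4 load-shedding events over the lifetime.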
Comments on the estimated reliability of the
auto-transfer system and the shedding of physics load would be appreciated.
Critical computing equipment (site networking and servers for accelerator
operation, home directories, mail and web services and business databases) will
have backup coverage from the general site diesel generators. A maximum of
1.4MVA is available for the computer centre. Taking into account safety and air
conditioning loads, the maximum equipment load that can be supported is 250kW.
Dedicated areas for critical computing equipment will be provided in the ground
floor and basement machine rooms.
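The sketch below illustrates one way the 1.4MVA diesel budget could reduce to a 250kW equipment load; the power factor and the combined safety and air-conditioning allowance are illustrative assumptions, not the project's actual breakdown.

    # Illustrative breakdown of the 1.4 MVA diesel budget (assumed figures;
    # the actual safety and air-conditioning allowances may differ).
    diesel_kva = 1_400
    power_factor = 0.7                     # as assumed for computing equipment
    usable_kw = diesel_kva * power_factor  # ~980 kW of real power
    safety_and_hvac_kw = 730               # assumption: residual after 250 kW
    print(usable_kw - safety_and_hvac_kw)  # 250 kW for critical equipment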
Comments on the overall provision for maintenance of
service in the event of power failure would be appreciated.
Proposed Solution
The space demands for the required substation are estimated at 450m2
plus 150m2 for transformers. A total area of 220m2 is
available in B513 today. Reuse of this space is possible provided that services
are maintained. Various locations for the additional space were considered, but
siting the additional space next to the existing space was seen as an overriding
requirement to minimise cable lengths and installation problems. We therefore
propose the creation of additional space under the car park next to B513.
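A one-line check of the additional area implied by these figures, assuming the existing 220m2 can be fully reused:

    # Additional substation floor area required (m2).
    required_m2 = 450 + 150   # substation plus transformers
    existing_m2 = 220         # reusable space in B513 today
    print(required_m2 - existing_m2)   # 380 m2 of new space under the car park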
Comments on the validity of our arguments for siting
the substation would be appreciated.
We have considered two possible suppliers of static UPS equipment, MGE and
Gutor. The different technologies used lead to different space requirements. As
the supplier of the physics UPS will not be chosen until 2006, we consider that
we must design the substation to accommodate both solutions.
Comments on our approach in this area would be
appreciated. We need an acceptable balance between short and long term costs. In
terms of UPS systems, we would appreciate comments on the relative merits of
delta conversion and double conversion.
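On the delta versus double conversion question, the recurrent-cost side can be framed as an efficiency comparison. The efficiencies and tariff in the sketch below are illustrative assumptions only, not vendor figures.

    # Illustrative annual-loss comparison of UPS topologies (assumed
    # efficiencies and electricity tariff; actual figures will differ).
    load_kw = 2_500
    hours_per_year = 8_760
    tariff_chf_per_kwh = 0.10                      # assumed tariff
    for name, efficiency in (("double conversion", 0.93),
                             ("delta conversion", 0.97)):
        loss_kw = load_kw * (1 / efficiency - 1)   # power lost in the UPS
        annual_chf = loss_kw * hours_per_year * tariff_chf_per_kwh
        print(name, round(loss_kw), round(annual_chf))  # kW lost, CHF/year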
We consider that the creation of the new substation must be complete by the
end of 2004 at the very latest as the growth in physics computing load will
otherwise cause the B513 load on the secured (diesel) supply to exceed 1.4MVA,
putting at risk the continuity of the critical computing services.
Comments on our timing of the substation
construction would be appreciated.
Possible Future Expansion to 4MW
If computing electrical power loads scale with SpecInt capacity, expansion of
the computing capacity during LHC operation could lead to computing loads of up
to 4MW. We consider the design of a substation that can be smoothly upgraded
from 2.5MW to 4MW to be difficult at this stage, given uncertainties in the
future HVAC load.
Comments on our approach in this area would be
appreciated. Again, we need to pay attention to long term costs, but we cannot
delay design of the substation if it is to be operational by the end of 2004.
HVAC
Estimation of Impact
We consider that the power delivered to computing systems is transformed to
heat with 100% efficiency. Overheads (such as solar heating in summer) are
evaluated at between 600kW and 1,000kW, depending on the floor area used and the use of
machine room air conditioning for office areas.
Given the limited clearance of the newly created basement machine room (2.8m
above a 0.7m false floor), we consider the maximum load that can be cooled to be
500W/m2.
Comments on our assumptions concerning maximum heat
loads per m2 would be appreciated. Are we being conservative in our view of
acceptable environmental conditions?
Assuming equipment with a consumption of 500kW is installed in the basement,
the ground floor machine room must cope with a heat load of 2MW. Over the
existing area of ~1450m2, this is equivalent to 1.4kW/m2.
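A minimal check of the heat-load densities quoted above, using the assumption that power in equals heat out:

    # Heat-load density check: power delivered == heat produced (100%).
    total_load_kw = 2_500
    basement_kw = 500                # limited to 500 W/m2 by low clearance
    ground_floor_area_m2 = 1_450
    density = (total_load_kw - basement_kw) / ground_floor_area_m2
    print(round(density, 2))         # 1.38 kW/m2, i.e. the quoted ~1.4 kW/m2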
Again, comments on the feasibility of this heat load
would be appreciated.
Possible Future Expansion to 4MW
Assuming a heat load of 1.4kW/m2 can be supported in the machine
room, we consider that by using the Barn area, we can support a
total load of
3.3MW, although an upgrade of the chilled water network would be required.
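A sketch of how the 3.3MW total could decompose, assuming the Barn supports the same 1.4kW/m2 density; the Barn area below is inferred from the stated total, not a measured figure.

    # Decomposition of the 3.3 MW extension figure (Barn area inferred).
    basement_kw = 500
    ground_floor_kw = 1.4 * 1_450            # ~2030 kW at 1.4 kW/m2
    barn_kw = 3_300 - basement_kw - ground_floor_kw
    print(round(barn_kw), round(barn_kw / 1.4))   # ~770 kW over ~550 m2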
Comments on the extension proposal would be
appreciated.
Space
Estimation of Impact
Simplistic "box size" estimations in 1999 suggested a total space requirement
of 2,000m2, an increase of 500m2 on the space then
available. It later became clear that HVAC capacity was the more relevant factor
when assessing space requirements. Assuming an average capacity of 1kW/m2,
we would need a total area of 2,500m2. Bellwater, a computer centre
construction firm, undertook a study
of B513 in 2000 and suggested that additional space be made available by
constructing a mezzanine floor over the Barn area. This option was rejected,
partly due to short term difficulties in clearing the Barn prior to construction
but mostly as we considered this would limit future expansion.
The former tape vault in the basement of B513 has been converted to a machine
room area at a cost of approximately 1.8MCHF.
Comments on our decision and on the suitability and
quality of the converted machine room area would be appreciated.
Fire Detection/Suppression
Our visits to other computer centres showed that all opted to maintain
services for as long as possible in the event of smoke detection. Consequently,
almost all had installed some form of inert gas fire suppression system,
triggered automatically but only on a "double knock": confirmation of smoke by
low-sensitivity detectors after an initial alert from high-sensitivity,
laser-based systems.
We have taken the view, at least for the basement machine room, that cutting
equipment power once smoke has been detected will be sufficient to prevent any
progression to an actual fire. (This is the approach taken in the experimental
areas at CERN.) The room is divided into 3 zones, power to each of which can be
cut independently, again with a "double knock" system.
Comments on our decision and on the fire detection
arrangements would be appreciated.
We are not yet sure how best to proceed in the ground floor machine room. We
consider that zoning the room for fire detection purposes is not possible due to
the large ceiling height (7m). On the other hand, separation of the room into small
areas would reduce the maximum load that could be cooled. We have considered
smoke detection systems just above rack height (at 2-2.5m) and closed racks with
dedicated smoke detection, but are not convinced that either solution is
appropriate (for example, closed racks are problematic from the HVAC point of
view).
We welcome any comments on possible fire detection
and suppression systems. In particular, comments on water based ("Hi-Fog")
systems would be most welcome.