Les Robertson
10 November 1999
Notes on the ST-IT coordination meeting - 2 November 1999
Present: Anne Funken, Les Robertson, Tim Smith, Dave Underhill, Mario Vergari
1. Background
IT Division has requested assistance from ST in the long-term planning of the infrastructure for LHC offline computing, in particular the power and cooling requirements of the computing fabrics that will be installed in Building 513 (Computer Centre). ST Division has appointed Anne Funken (ST/EL) as the project leader responsible for coordinating the medium- and long-term evolution of the technical infrastructure of B.513. This meeting was the first of a series of regular meetings to establish the infrastructure requirements for LHC computing, provide initial cost estimates, and make a long-term plan that takes account of the medium-term requirements and their evolution.
2. People attending the meeting
Anne Funken - ST/EL - ST coordinator for B.513
Tim Smith - IT/PDP - planning of LHC physics computing facilities
Dave Underhill - IT/CIO - group leader, central infrastructure & operations
Mario Vergari - IT/PDP - operations manager of physics data processing services
Les Robertson - IT deputy division leader
3. Scope & relationship to other activities
It was agreed that the present series of meetings will cover the following areas.
Not included in the scope of these meetings are:
4. Discussion of issues to be examined
Dave had provided a list of discussion points (attached). The conclusions and actions are listed below.
Power
An initial estimate of the power requirements for CMS in 2006 has been produced by Les (see table below). It is assumed that the total requirement in 2006 will be five times the CMS figure, to cover ATLAS, ALICE, LHCb, other experiments and neutrinos, giving a total sustained load of 2 MW.
The estimates assume that in 2006 the base power consumption of a small system box with two disks will be similar to that of current examples. Measurements of the constant and peak (start-up) power consumption of current PCs should be made (action: Mario). Current processor power consumption is about 25-30 Watts per packaged chip; this is likely to increase slightly.
CMS farm 2006 - physical space & power

processors
      4   cpus/box (400 SI95/box)
      1   sub-farm per rack
      5   sq.m. floor area per rack
     40   sub-farms
    200   sq.m. per farm
    175   Watts per box
  1'400   boxes
    245   KWatts per farm

tape
     20   drives per stack
      5   stacks per farm
      2   sq.m. per stack
     10   sq.m. per farm
     75   Watts per drive
      8   KWatts per farm
    100   GB per cartridge
      2   PB per farm
 20'000   cartridges per farm
  6'000   cartridges per silo
      4   silos per farm
    1.5   KWatts per silo
      6   KWatts per library
    120   sq.m. library + drives per farm
     14   KWatts library + drives per farm

disks
     16   disks/shelf (1.6 TB/shelf)
      1   disk shelf/array
      1   shelf/controller pair
      2   shelf slots per array
      9   arrays per 19" rack
 14'400   GB per rack
    340   arrays per farm
     38   racks per farm
    1.1   sq.m. per rack
     50   sq.m. per farm
    300   Watts per disk tray
    100   Watts per controller pair
    400   Watts per array
    140   KWatts per farm

totals
    400   KWatts power
    370   sq.m. floorspace
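As a quick consistency check, the totals can be reproduced from the per-unit figures in the table; a minimal sketch (all inputs are the table's own numbers, and the small discrepancies are rounding):

    # Back-of-envelope consistency check of the CMS 2006 farm figures,
    # using only the per-unit numbers quoted in the table above.
    boxes, watts_per_box = 1400, 175
    cpu_kw = boxes * watts_per_box / 1000        # 245 kW - matches the table

    arrays = 340
    watts_per_array = 300 + 100                  # disk tray + controller pair
    disk_kw = arrays * watts_per_array / 1000    # 136 kW - quoted as 140 (rounded)

    tape_kw = 8 + 6                              # drives + library, 14 kW

    farm_kw = cpu_kw + disk_kw + tape_kw         # ~395 kW - quoted as 400
    farm_sqm = 200 + 120 + 50                    # 370 sq.m. floorspace

    site_mw = 5 * 400 / 1000                     # 5 x CMS => 2 MW sustained load
    print(farm_kw, farm_sqm, site_mw)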
The space requirements are such that it will be necessary to install equipment in the tape vault and the former motor generator room in the basement of B.513.
Tim will make an estimate of the evolution for the pre-LHC period (next four years). (Action: Tim).
Tim will also investigate the possibilities for central control of the power at the rack and server level. (Action Tim).
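One possibility worth examining is an SNMP-manageable power distribution unit (PDU) per rack. A minimal sketch, assuming such a device; the object identifier, community string and host name below are placeholders, not a real vendor interface (the actual OID comes from the vendor's MIB):

    # Illustrative sketch only: switching one outlet of a hypothetical
    # SNMP-managed PDU. The OID is a placeholder - vendor specific.
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity,
                              Integer, setCmd)

    OUTLET_CTL_OID = '1.3.6.1.4.1.99999.1.1'   # placeholder, from vendor MIB

    def switch_outlet(pdu_host, outlet, state):
        """Set one outlet (state: 1 = on, 2 = off) via an SNMP SET."""
        error_indication, error_status, _, _ = next(setCmd(
            SnmpEngine(),
            CommunityData('private'),            # write community string
            UdpTransportTarget((pdu_host, 161)),
            ContextData(),
            ObjectType(ObjectIdentity('%s.%d' % (OUTLET_CTL_OID, outlet)),
                       Integer(state))))
        if error_indication or error_status:
            raise RuntimeError('SNMP SET failed: %s'
                               % (error_indication or error_status))

    # Example: power off outlet 3 of the PDU feeding a given rack
    # switch_outlet('pdu-rack12.cern.ch', 3, 2)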
An uninterruptible power supply (UPS) is required to cover short power outages. For longer interruptions (> 15 minutes) an alternative power source is required, such as diesel-powered generators. ST Division is at present reviewing the future of the Meyrin site stand-by generators and expects to reach a conclusion in the next few months.
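As a rough illustration of what the 15-minute threshold implies, a minimal sketch of the stored energy the UPS would need in order to carry the projected load until the generators are on line (the 2 MW figure comes from section 4; that the whole load is UPS-protected is an assumption):

    # Rough UPS ride-through sizing, assuming the full projected load
    # must be carried until the stand-by generators take over.
    load_kw = 2000                              # projected sustained load (2 MW)
    bridge_minutes = 15                         # threshold quoted above
    energy_kwh = load_kw * bridge_minutes / 60  # 500 kWh of stored energy
    print(energy_kwh)                           # 500.0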
Dave will verify the warranty conditions for the current UPS, and check the anticipated battery lifetime. (Action Dave).
A secondary power supply is at present available; it is used during the annual emergency power-off (EPO) exercise.
ST will review the complete power supply and distribution situation in B.513, and make proposals and cost estimates for evolving this to satisfy the medium and long-term requirements. (Action: Anne).
Cooling
ST will review the cooling situation in B.513, and make proposals and cost estimates for evolving this to satisfy the medium and long-term requirements. (Action: Anne).
Smoke detection
There is a working group with representatives of ST, TIS and IT looking into the smoke detection situation in the computer room. This should also cover the areas in the basement in which it is intended to install computing equipment.
The strategy for system packaging must be addressed. Currently, systems are stacked in open racks, and smoke detection is performed at the level of the room. This makes it difficult or impossible to consider cutting power automatically. Compartmentalisation should be studied (such as packaging systems in closed racks with local smoke detection), as should the strategies adopted by other major computer centres.
(Action: Dave - report on smoke detection working group; organise visits to other computer centres; Mario - investigate closed rack solutions).
Building 186 Annexe
[The following information was established after the meeting.] The status of this annexe is that a pre-study has been carried out by the EP/SMI group (Alan Ball) in order to establish cost estimates, and budget approval is being sought within EP Division. The project has not yet been discussed with ST.
5. Next Meeting
The next meeting will take place on 1 February 2000 at 10:00 in B.513-2.023.
Attachment: A few thoughts on the issue of infrastructure for Building 513
Dave Underhill
Power
What will be the power requirement?
What infrastructure would need changing?
Do we need to replace what we have or can we build on top of the existing equipment?
How to distribute power within the building?
Current power distribution does not give flexibility.
Must know from where each piece of equipment is powered, what its requirements are, and how it can be switched off/on.
Should be able to control power to equipment remotely and under process control.
Can we profit from equipment with dual power connections?
CERN backup supply via diesels?
Secondary source for EPO testing?
What are the power needs for cooling?
UPS
What will be the need?
Does all the equipment need protection? I would presume yes.
Should it protect the cooling as well?
How long do we need to maintain power?
Enough to overcome power spikes and variations.
Enough to allow CERN backup service (diesels?) to take the load.
Enough to be able to shutdown equipment.
Could be fast if automated from process control or done remotely.
Very slow if manual.
Current batteries need replacing soon.
Currently 3 * 400 kVA modules, with a fourth on site but without batteries (see the sketch below).
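For comparison with the 2 MW projected in the main notes, a back-of-envelope check of what the installed modules can deliver (the 0.8 power factor is an assumption, not stated in the notes):

    # Installed UPS capacity vs the projected LHC-era load.
    modules, kva_per_module = 3, 400
    power_factor = 0.8                       # assumed - not stated in the notes
    capacity_kw = modules * kva_per_module * power_factor   # 960 kW deliverable
    projected_kw = 2000                      # 2 MW from section 4
    print(capacity_kw, capacity_kw < projected_kw)   # 960.0 True - well short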
Cooling
Currently cool air is forced down from the ceiling, with free flow throughout the room.
Free and mixed cooling from Autumn to Spring.
Should we go for closed-in racks with built-in cooling?
Which is more efficient?
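To give a feel for the scale of the question, a minimal sketch of the airflow needed to remove one farm's heat load by forced air (the 10 degree inlet/outlet temperature rise is an assumption):

    # Airflow required to remove heat with air cooling: Q = P / (rho * cp * dT)
    p_watts = 400e3        # one farm's heat load, from the CMS table (400 kW)
    rho = 1.2              # density of air, kg/m^3
    cp = 1005              # specific heat of air, J/(kg.K)
    dt = 10.0              # assumed inlet/outlet temperature rise, K
    flow_m3_per_s = p_watts / (rho * cp * dt)   # ~33 m^3/s (~120'000 m^3/h)
    print(flow_m3_per_s)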
Smoke
Currently smoke is detected as air flows through tubes spread across the false ceiling.
This detects smoke in the room, but not necessarily where in the room it comes from.
Just one monitor at present, although we have an offer for a second.
Smoke detection in the room would imply an EPO of the room.
Should we go for localised smoke detection equipment?
Incorporated in closed equipment racks?
Proximity detectors like those for the STK silos?
Should detection invoke automatic action?
Rundown / power off / extinction
The Pompiers (CERN fire brigade) are just 3 minutes away.
Racking
Current racking designed for handling equipment from floor level.
But it’s a high room so why not go higher and access from steps?
How often will installed equipment need to be accessed manually?
Build a gallery with walkway?
What about cabling, and are cable lengths an issue?
Will the false floor support it?