
Power is the Lifeblood

By Peter Curtis

Continuous, clean, and uninterrupted power is the lifeblood of any data center, especially one that operates 24 hours a day, 7 days a week. Critical enterprise power is the power without which an organization would quickly fail to achieve business objectives. Today more than ever, enterprises of all types and sizes demand 24/7 system availability, regardless of the technological sophistication of the equipment or the demands placed upon that equipment. Business losses due to downtime alone total billions of dollars a year globally. These developments have led to the creation of the Mission Critical Industry.

Critical industries must constantly and systematically evaluate their Mission Critical systems, assess and reassess their risk tolerance versus the cost of downtime, and plan for future upgrades in equipment and services to ensure uninterrupted power supply in the years ahead. Simply put, minimizing unplanned downtime reduces risk.

Providing continuous operation under all foreseeable risks of failure, such as power outages, equipment breakdown, and internal fires, requires modern design techniques. These techniques include redundant systems and components, standby power generation, fuel systems, automatic transfer and static switches, clean, high-quality power, UPS systems, cooling systems, raised access floors, and fire protection, as well as Probability Risk Assessment modeling software to predict potential future outages and develop maintenance and upgrade action plans for all major systems.
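To illustrate why redundant systems and components matter, the following is a minimal sketch of how redundancy changes the availability of a power path. It assumes that components fail independently and uses the textbook series and parallel availability formulas; the function names and all availability figures are hypothetical and chosen only for illustration, not taken from any real facility.

```python
from math import prod


def series_availability(availabilities):
    """Every component in the chain must work, so availabilities multiply."""
    return prod(availabilities)


def parallel_availability(availability, units):
    """At least one of `units` identical, independent units must work."""
    return 1.0 - (1.0 - availability) ** units


# Hypothetical availabilities for a utility feed, a UPS, and a distribution panel.
utility, ups, panel = 0.9995, 0.999, 0.9999

# Single UPS versus two redundant UPS modules in the same power path.
single_path = series_availability([utility, ups, panel])
redundant_path = series_availability([utility, parallel_availability(ups, 2), panel])

print(f"single UPS path:    {single_path:.6f}")
print(f"redundant UPS path: {redundant_path:.6f}")
```

Under these assumptions the redundant path is always at least as available as the single path. In practice, common-cause failures and human error weaken the independence assumption, which is one reason the design techniques above are combined with maintenance and training.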

Also vital to the facility’s life cycle is good communication between upper management and facilities management. Only when both sides fully appreciate the role of the design, maintenance, and operation of the critical infrastructure (including the potential risk of downtime and recovery time) can they fund and implement an effective plan.

Risk Assessment

Critical industries require an extraordinary degree of planning and evaluation. The first step is to identify the costs and risks of downtime. The Mission Critical infrastructure must change over time to support the company’s continuous growth. Routine maintenance and upgrading equipment alone will not ensure continuous power. Employing new methods of distributing critical power, understanding capital constraints, and developing processes that minimize human error will improve recovery time should critical systems be hit by base-building failures.

Probability Risk Assessment (PRA) addresses the hazards to data center uptime. PRA looks at the probability of failure of each type of electrical power equipment and helps predict availability, number of failures per year, and annual downtime. PRA also facilitates assessing these important steps:

  • Engineering and Design,
  • Project Management,
  • Testing and Commissioning,
  • Documentation,
  • Education and Training,
  • Operation and Maintenance,
  • Employee Certification,
  • Risk Indicators related to ignoring Facility Life Cycle Process, and
  • Standards and Benchmarking.
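The three quantities that PRA helps predict (availability, number of failures per year, and annual downtime) can be sketched for a single piece of equipment as follows. This is a simplified illustration, not the method used by any particular PRA software: it assumes a steady-state model driven by mean time between failures (MTBF) and mean time to repair (MTTR), and the function names and figures are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time the equipment is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours, mttr_hours):
    """One failure occurs per (MTBF + MTTR) cycle of operation and repair."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours, mttr_hours):
    """Expected hours per year that the equipment is out of service."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical equipment: fails about once every ten years, takes 10 hours to repair.
mtbf, mttr = 87_590.0, 10.0
print(f"availability:      {availability(mtbf, mttr):.6f}")
print(f"failures per year: {failures_per_year(mtbf, mttr):.3f}")
print(f"annual downtime:   {annual_downtime_hours(mtbf, mttr):.2f} hours")
```

With these assumed figures the equipment fails about 0.1 times per year and is down for about one hour per year. A full assessment would combine such figures across every type of electrical power equipment in the facility.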

Capital Costs vs. Operation Costs

Many organizations associate disaster recovery and business continuity with IT and communications functions, missing other critical areas that can seriously affect their businesses. Power outages affect employees, facilities, customer service, billing, and customer and public relations. All areas require a well-thought-out strategy based on recovery time objectives, cost, and profitability impact that considers the following:

  • maximum allowable delay prior to initiation of the recovery process;
  • timeframe required to execute the recovery process once it begins;
  • minimum computer configuration required to process critical applications;
  • minimum communication devices and backup circuits for critical applications;
  • minimum space requirements for essential staff members and equipment;
  • total cost involved in the recovery process; and
  • total loss as a result of downtime.
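Several of the considerations above can be combined into a rough estimate of what an outage costs. The sketch below is only an illustration under stated assumptions: it treats business loss as a constant hourly rate for the whole outage, and the class name, field names, and dollar figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    delay_hours: float     # maximum allowable delay before recovery starts
    recovery_hours: float  # time to execute the recovery once it begins
    recovery_cost: float   # total cost involved in the recovery process
    loss_per_hour: float   # assumed constant business loss per hour of downtime

    def outage_hours(self) -> float:
        """Total time the business is down: delay plus recovery."""
        return self.delay_hours + self.recovery_hours

    def total_loss(self) -> float:
        """Business loss during the outage plus the cost of recovering."""
        return self.outage_hours() * self.loss_per_hour + self.recovery_cost


# Hypothetical plan: 1 hour delay, 3 hours to recover, $20,000 to execute,
# and an assumed $50,000 of lost business per hour of downtime.
plan = RecoveryPlan(delay_hours=1.0, recovery_hours=3.0,
                    recovery_cost=20_000.0, loss_per_hour=50_000.0)
print(f"outage: {plan.outage_hours():.1f} hours, total loss: ${plan.total_loss():,.0f}")
```

Comparing this figure for different plans gives a simple way to weigh the capital cost of shortening the delay or recovery time against the loss it avoids.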

Change Management

All companies need an Emergency Preparedness Plan. Sometimes, backup power fails to operate, fails to initiate power generation, or experiences a mechanical failure or exhaustion of fuel supply. A thorough problem-tracking system and a strong change management system are essential.

Change management is a process that crosses departments and must be coordinated and used by all participants in order to work effectively. Management must plan for the future and make decisions to support the anticipated needs of the organization, especially in emergency situations.

Testing and Commissioning

Facilities engineers must work with the factory, field engineers, and independent test consultants to coordinate testing and calibration. Critical circuit breakers must be tested and calibrated prior to receiving any critical electrical load. The objective is easy to define: maintain a high level of safety and reliability from equipment, components, and systems. Maintenance programs should be continuously improving. In the Mission Critical industry, most maintenance personnel perform maintenance without reviewing prior maintenance records. The Mission Critical industry’s focus on physical enhancements stems from its early history, when companies forgot about Mission Critical after the design and construction phase.

About 25 years ago, when the Mission Critical Facility Engineering Industry was in its infancy, the technology was simple. However, as more computer hardware occupied the data center, the design of the electrical and mechanical systems supporting the electrical load became more complicated, as did the business applications. As businesses increasingly rely on this infrastructure, companies invest more capital dollars to improve its uptime. The Mission Critical industry can no longer manage its critical systems as it once did. Today, the sophistication of the data center infrastructure necessitates continual refreshing of documentation. Surprisingly, human factors are perhaps the most poorly understood aspect of process safety and reliability management.

Balancing system design and training operating staff cost effectively is essential to critical infrastructure planning. When designing a mission critical facility, the ease of maintainability is vital. A recipe for human error exists when systems are complex, especially if key system operators and documentation of Emergency Action Procedures (EAP) and Standard Operating Procedures (SOP) are not immediately available. A simple electrical system design allows for quicker and easier troubleshooting. In addition, the design process must include a detailed budgeting and auditing plan.

Companies should take every opportunity to perform preventive maintenance thoroughly and completely, especially in Mission Critical facilities. If they do not, the next opportunity will come at a much higher price: downtime, lost business, and lost potential clients, not to mention the safety issues that arise when technicians rush to fix a maintenance problem. Companies should do it correctly ahead of time, without shortcuts. Plug and play does not work with critical systems.

Education and Training

Despite attaining high levels of technological standards in the Mission Critical industry, most of today’s financial resources remain allocated for planning, engineering, equipment procurement, project management, and continued research and development. The diversity among Mission Critical systems hinders people’s ability to fully understand and master all necessary equipment and information.

Operation and Maintenance

Reliability and facility infrastructure health are not guaranteed simply by investing in and installing new equipment. Unexpected failures can compromise even the most robust facility infrastructure without appropriate testing and maintenance. How can the data processing or facility manager ensure that the critical system is as reliable as possible? Planning and impact assessment, engineering and design, project management, testing and commissioning, documentation, staff education and training, and operations and maintenance are vital. Elimination of any one of these steps will severely reduce reliability and business resiliency.

When building a data processing center in an existing building, a competent engineer will make the most of the existing electrical systems. A company should only hire electrical contractors who are experienced in data processing installations and should have experienced testing firms inspect and integrate all equipment, test all circuit breakers, and use thermal-scan equipment to find “hot spots” due to improper connections or faulty equipment. Finally, companies should plan for routine shutdowns of the facility to perform preventive maintenance on electrical equipment. These initial decisions will determine the systems’ ultimate reliability, as well as the ease of system maintenance.

In soliciting testing firms, a company should ask for sample reports, testing procedures, and references. Seek experienced professionals from within and outside the company: information systems, property and operations managers, space planners, and the best consultants in the industry for all engineering disciplines. The bottom line is to have proven organizations working on the project.

Employee Certification

Technology is advancing faster than ever. Companies continue to make large investments in new technologies to keep up to date. While technology can now solve the technical problems of linkages and equipment interaction, it will not do so without well-trained personnel. Employee training and certification are crucial not only to keep up with advanced technology, but also to promote quick emergency response.

Standards and Benchmarking

Benchmarking is a process driven by the participants whose goal is to improve their organization. It teaches participants about successful practices in other organizations and allows them to draw on those cases to develop solutions for their own organizations. True process benchmarking identifies the “hows” and “whys” of performance gaps and helps organizations learn to enhance their practices.

Remember: if a company can’t find the time to do it right the first time… when will it find the time to do it over? And how much business will downtime cost the organization? These days, Mission Critical is an organization’s lifeblood, and facilities managers are the organization’s medical professionals.

Peter M. Curtis is the founder of Power Management Concepts, LLC, based in New York. Curtis received a Master of Science degree in Energy Management from New York Institute of Technology in 1994. He is a 1983 graduate of New York Institute of Technology with a Bachelor of Science degree in Electro-Mechanical Computer Technology. He has over 20 years of experience working in the Mission Critical Facilities Engineering industry in the areas of Banking and Finance, Defense, Electric and Water Utilities, Energy Management and Education. His strengths include in-depth knowledge of computer-integrated systems and extensive utilization of on-line interface and facilities operations-maintenance management expertise. For more information, please visit www.powermanage.com