Power is the Lifeblood
By Peter Curtis
Continuous, clean, and uninterrupted power is the lifeblood of
any data center, especially one that operates 24 hours a day,
7 days a week. Critical enterprise power is the power without
which an organization would quickly fail to achieve business
objectives. Today more than ever, enterprises of all types and
sizes demand 24/7 system availability, regardless of the technological
sophistication of the equipment or the demands placed upon that
equipment. Business losses due to downtime alone total billions
of dollars a year globally. These developments have led to the
creation of the Mission Critical industry.
Organizations in critical industries must constantly and systematically
evaluate their Mission Critical systems, assess and reassess their
risk tolerance against the cost of downtime, and plan future equipment
and service upgrades to ensure an uninterrupted power supply in
the years ahead. Simply put, minimizing unplanned downtime reduces
risk.
Providing continuous operation under all foreseeable risks of
failure, such as power outages, equipment breakdown, and internal
fires, requires modern design techniques. These techniques include
redundant systems and components, standby power generation, fuel
systems, automatic transfer and static switches, power quality
conditioning, UPS systems, cooling systems, raised access floors,
and fire protection, as well as Probability Risk Assessment modeling
software to predict potential future outages and develop maintenance
and upgrade action plans for all major systems.
Also vital to the facility’s life cycle is good communication
between upper management and facilities management. Only when both
sides fully understand the design, maintenance, and operation of
the critical infrastructure (including the potential risk of downtime
and the recovery time involved) can they fund and implement an
effective plan.
Risk Assessment
Critical industries require an extraordinary degree of planning
and evaluation. The first step is to identify the costs and risks
of downtime. Mission Critical infrastructure must change over time
to support the company’s continued growth, and routine maintenance
and equipment upgrades alone will not ensure continuous power.
Employing new methods of distributing critical power, understanding
capital constraints, and developing processes that minimize human
error will all improve recovery time should critical systems be
hit by base-building failures.
Probability Risk Assessment (PRA) addresses threats to data center
uptime. PRA looks at the probability of failure of each type of
electrical power equipment and helps predict availability, the
number of failures per year, and annual downtime. PRA also helps
assess these important steps:
- Engineering and Design,
- Project Management,
- Testing and Commissioning,
- Documentation,
- Education and Training,
- Operation and Maintenance,
- Employee Certification,
- Risk Indicators related to ignoring the Facility Life Cycle Process, and
- Standards and Benchmarking.
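The article does not spell out the arithmetic behind a PRA, so the following is only a minimal sketch of one common reliability model, under stated assumptions: each piece of equipment is described by a failure rate (failures per year) and a mean repair time, components fail independently, and redundant units are modeled as parallel blocks. The component names and every number are hypothetical. The sketch shows how failures per year, annual downtime, and availability relate, and why redundancy raises availability while equipment in series lowers it.

```python
"""PRA-style availability sketch (assumed model, hypothetical numbers)."""
from dataclasses import dataclass
from math import prod

HOURS_PER_YEAR = 8760.0


@dataclass
class Component:
    name: str
    failures_per_year: float  # expected failure rate (lambda); hypothetical value
    hours_per_repair: float   # mean time to restore after a failure; hypothetical

    @property
    def annual_downtime_hours(self) -> float:
        # Expected forced downtime per year = failure rate x repair time.
        return self.failures_per_year * self.hours_per_repair

    @property
    def availability(self) -> float:
        # Fraction of the year the component is expected to be in service.
        return 1.0 - self.annual_downtime_hours / HOURS_PER_YEAR


def series(availabilities):
    """Every block must work, so (assuming independence) availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """Redundant blocks: the group is down only if all members are down at once."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def downtime_hours(availability):
    """Convert an availability figure back into expected hours down per year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical equipment data for illustration only.
    utility = Component("utility feed", failures_per_year=1.0, hours_per_repair=2.0)
    generator = Component("standby generator", failures_per_year=0.5, hours_per_repair=12.0)
    ups = Component("UPS module", failures_per_year=0.2, hours_per_repair=8.0)
    pdu = Component("PDU", failures_per_year=0.05, hours_per_repair=4.0)

    source = parallel([utility.availability, generator.availability])
    ups_pair = parallel([ups.availability, ups.availability])  # two redundant modules
    system = series([source, ups_pair, pdu.availability])

    for c in (utility, generator, ups, pdu):
        print(f"{c.name}: {c.failures_per_year} failures/yr, "
              f"{c.annual_downtime_hours:.2f} h/yr down, A={c.availability:.6f}")
    print(f"System availability: {system:.8f}")
    print(f"Expected system downtime: {downtime_hours(system) * 60:.2f} min/yr")
```

The model deliberately ignores transfer switch and static switch behavior, common-mode failures (for example a shared fuel supply), and a generator’s probability of failing to start, all of which a real PRA tool would need to capture.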
Capital Costs vs. Operation Costs
Many organizations associate disaster recovery and business continuity
with IT and communications functions, missing other critical areas
that can seriously affect their businesses. Power outages affect
employees, facilities, customer service, billing, and customer
and public relations. All of these areas require a well-thought-out
strategy based on recovery time objectives, cost, and profitability
impact that considers the following:
- maximum allowable delay prior to initiation of the recovery
process;
- timeframe required to execute the recovery process once it
begins;
- minimum computer configuration required to process critical
applications;
- minimum communication device and backup circuits for critical
applications;
- minimum space requirements for essential staff members and
equipment;
- total cost involved in the recovery process; and
- total loss resulting from downtime.
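The time and cost considerations above can be turned into a rough annual comparison. The sketch below is a simplified illustration, not a method taken from the article: it covers only the time and cost items (delay before recovery starts, recovery duration, recovery cost, and loss from downtime), treats the recovery time objective (RTO) as a single per-outage limit, and uses hypothetical plan names and dollar figures throughout.

```python
"""Recovery-plan cost comparison (hypothetical figures, simplified model)."""
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    name: str
    annual_capital_cost: float      # annualized capital spend, $/yr (hypothetical)
    annual_operating_cost: float    # maintenance, staff, testing, fuel, $/yr
    outages_per_year: float         # expected outage events per year
    delay_before_recovery_h: float  # delay before the recovery process starts
    recovery_duration_h: float      # time to execute recovery once it begins

    def outage_hours(self) -> float:
        # One outage lasts the initiation delay plus the recovery execution time.
        return self.delay_before_recovery_h + self.recovery_duration_h

    def meets_rto(self, rto_hours: float) -> bool:
        # Does a single outage end within the recovery time objective?
        return self.outage_hours() <= rto_hours

    def expected_annual_loss(self, loss_per_hour: float) -> float:
        # Expected business loss from downtime over one year.
        return self.outages_per_year * self.outage_hours() * loss_per_hour

    def total_annual_cost(self, loss_per_hour: float) -> float:
        # Capital + operations + expected downtime loss, all per year.
        return (self.annual_capital_cost + self.annual_operating_cost
                + self.expected_annual_loss(loss_per_hour))


def cheapest_acceptable(plans, loss_per_hour, rto_hours):
    """Lowest total annual cost among plans that meet the RTO (None if none do)."""
    ok = [p for p in plans if p.meets_rto(rto_hours)]
    return min(ok, key=lambda p: p.total_annual_cost(loss_per_hour), default=None)


if __name__ == "__main__":
    loss_per_hour = 250_000.0  # hypothetical revenue and productivity loss per hour
    plans = [
        RecoveryPlan("status quo", 0.0, 80_000.0, 2.0, 1.0, 6.0),
        RecoveryPlan("redundant UPS + tested generator", 300_000.0, 150_000.0, 0.5, 0.25, 1.0),
    ]
    for p in plans:
        print(f"{p.name}: outage {p.outage_hours():.2f} h, "
              f"total ${p.total_annual_cost(loss_per_hour):,.0f}/yr")
    best = cheapest_acceptable(plans, loss_per_hour, rto_hours=4.0)
    print("Preferred:", best.name if best else "no plan meets the RTO")
```

Comparing total annual cost, rather than capital cost alone, is what lets a more expensive plan win when the downtime loss it avoids is large enough.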
Change Management
All companies need an Emergency Preparedness Plan. Sometimes,
backup power fails to start, fails to initiate power generation,
or suffers a mechanical failure or runs out of fuel. A thorough
problem-tracking system and a strong change management system
are essential.
Change management is a process that crosses departments and
must be coordinated and used by all participants in order to work
effectively. Management must plan for the future and make decisions
that support the anticipated needs of the organization, especially
in emergencies.
Testing and Commissioning
Facilities engineers must work with factory engineers, field engineers,
and independent test consultants to coordinate testing and calibration.
Critical circuit breakers must be tested and calibrated before
they receive any critical electrical load. The objective is easy to
define: maintain a high level of safety and reliability in equipment,
components, and systems. Maintenance programs should improve
continuously. Yet in the Mission Critical industry, most maintenance
personnel perform maintenance without reviewing prior maintenance
records.
The Mission Critical industry’s focus on physical enhancements
stems from its early history, when companies stopped paying attention
to their Mission Critical systems once design and construction
were complete.
About 25 years ago, when the Mission Critical Facility Engineering
industry was in its infancy, the technology was simple. However,
as more computer hardware occupied the data center, the electrical
and mechanical systems supporting that load became more complicated,
as did the business applications. As businesses rely more heavily
on this infrastructure, companies invest more capital dollars to
improve its uptime. The Mission Critical industry can no longer
manage its critical systems as it once did. Today, the sophistication
of data center infrastructure requires documentation to be updated
continually. Surprisingly, human factors are perhaps the most poorly
understood aspect of process safety and reliability management.
Balancing system design and operating staff training cost-effectively
is essential to critical infrastructure planning. When designing
a Mission Critical facility, ease of maintenance is vital.
Complex systems are a recipe for human error, especially if key
system operators and documentation of Emergency Action Procedures
(EAP) and Standard Operating Procedures (SOP) are not immediately
available. A simple electrical system design allows quicker
and easier troubleshooting. In addition, the design process must
include a detailed budgeting and auditing plan.
Companies should take every opportunity to perform preventive
maintenance thoroughly and completely, especially in Mission Critical
facilities. If they do not, the next opportunity will come at a much
higher price: downtime, lost current business, and lost potential
clients, not to mention the safety issues that arise when technicians
rush to fix a maintenance problem. Companies should do it correctly
ahead of time, without shortcuts. Plug and play does not work with
critical systems.
Education and Training
Despite the high technological standards attained in the Mission
Critical industry, most of today’s financial resources remain
allocated to planning, engineering, equipment procurement, project
management, and continued research and development, leaving
comparatively little for education and training. The diversity
among Mission Critical systems hinders people’s ability to fully
understand and master all the necessary equipment and information.
Operation and Maintenance
Reliability and facility infrastructure health are not guaranteed
simply by investing in and installing new equipment. Unexpected
failures can compromise even the most robust facility infrastructure
without appropriate testing and maintenance. How can the data
processing or facility manager ensure that the critical system
is as reliable as possible? Planning and impact assessment, engineering
and design, project management, testing and commissioning, documentation,
staff education and training, and operations and maintenance
are all vital. Eliminating any one of these steps will severely
reduce reliability and business resiliency.
When building a data processing center in an existing building,
a competent engineer will make the most of the existing electrical
systems. A company should hire only electrical contractors who
are experienced in data processing installations and should have
experienced testing firms inspect and integrate all equipment,
test all circuit breakers, and use thermal-scan equipment to find “hot
spots” caused by improper connections or faulty equipment. Finally,
companies should plan for routine shutdowns of the facility to
perform preventive maintenance on electrical equipment. These initial
decisions will determine the systems’ ultimate reliability,
as well as the ease of system maintenance.
When soliciting testing firms, a company should ask for sample reports,
testing procedures, and references. It should also seek experienced
professionals from within and outside the company: information systems,
property, and operations managers, space planners, and the best
consultants in the industry for all engineering disciplines. The
bottom line is to have proven organizations working on the project.
Employee Certification
Technology is advancing faster than ever. Companies continue
to make large investments in new technologies to keep up to date.
While technology can now solve the technical problems of linkages
and equipment interaction, it will not do so without well-trained
personnel. Employee training and certification are crucial not
only to keep up with advanced technology, but also to promote
quick emergency response.
Standards and Benchmarking
Benchmarking is a participant-driven process whose goal is to
improve the participants’ organizations. It teaches participants
about successful practices in other organizations and allows them
to draw on those cases to develop solutions for their own organizations.
True process benchmarking identifies the “hows” and “whys” behind
performance gaps and helps organizations learn how to enhance their
practices.
Remember: if a company can’t find the time to do it right
the first time… when will it find the time to do it over?
And how much business will the resulting downtime cost the
organization? These days, Mission Critical is an organization’s
lifeblood, and facilities managers are the organization’s
medical professionals.
Peter M. Curtis is the founder of Power Management Concepts,
LLC, based in New York. Curtis received a
Master of Science degree in Energy Management from New York
Institute of Technology in 1994. He is a 1983 graduate of New
York Institute of Technology with a Bachelor of Science degree
in Electro-Mechanical Computer Technology. He has over 20 years
of experience working in the Mission Critical Facilities Engineering
industry in the areas of Banking and Finance, Defense, Electric
and Water Utilities, Energy Management, and Education. His strengths
include in-depth knowledge of computer-integrated systems, extensive
use of online interfaces, and expertise in facilities operations
and maintenance management. For more information, please visit www.powermanage.com.