The Reliability Manager – Boiler and Thermal Systems is responsible for developing and implementing reliability strategies to ensure the safe, efficient, and continuous operation of boiler systems and thermal energy equipment. This role leads reliability-centered maintenance (RCM) programs, condition monitoring efforts, and root cause analyses to drive asset performance, minimize downtime, and extend equipment life. The Reliability Manager serves as the technical authority for mechanical reliability issues across all steam and thermal system assets.
Develop and implement reliability plans for boilers, heat exchangers, superheaters, economizers, piping systems, and auxiliary thermal equipment.
Integrate reliability engineering best practices into maintenance programs and capital project designs.
Lead development of preventive, predictive, and condition-based maintenance (CBM) schedules tailored to critical thermal equipment.
Use the computerized maintenance management system (CMMS, e.g., SAP, Maximo) to track asset performance, failure history, and maintenance efficiency.
Conduct in-depth failure investigations for unplanned outages and equipment failures.
Apply structured problem-solving methodologies (e.g., 5 Whys, fishbone diagrams, FMEA) to identify root causes and implement corrective actions.
Oversee the implementation of advanced inspection and monitoring techniques (e.g., infrared thermography, ultrasonic testing, vibration analysis).
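Manage boiler inspections, including non-destructive testing (NDT), wall-thickness measurement, and corrosion mapping, and ensure code compliance.
Monitor and benchmark thermal system KPIs such as availability, reliability (mean time between failures, MTBF), maintainability (mean time to repair, MTTR), and overall equipment effectiveness (OEE).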
Recommend improvements based on trend analysis and reliability data.
Ensure boiler and thermal systems adhere to relevant safety standards and regulatory requirements (e.g., ASME BPVC, API 579, OSHA, NFPA).
Support Process Safety Management (PSM) and Reliability Integrity Program (RIP) audits.
Coordinate with engineering, maintenance, operations, and safety teams to implement reliability initiatives.
Train maintenance and operations staff on reliability practices and thermal system fundamentals.
Support capital planning, asset replacement, and refurbishment strategies for aging boiler systems.
Collaborate with procurement and engineering for reliability-focused equipment selection and vendor evaluation.
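For candidates less familiar with the KPIs named above, the following is a minimal sketch of how MTBF, MTTR, and availability are commonly computed from outage history. The `OutageRecord` type, its field names, and the sample figures are hypothetical and purely illustrative. In practice the data would come from the CMMS, and a site may define these KPIs differently.

```python
from dataclasses import dataclass


@dataclass
class OutageRecord:
    """One unplanned outage (hypothetical record layout, not a CMMS schema)."""
    uptime_hours: float   # operating hours before the failure occurred
    repair_hours: float   # hours needed to restore the unit to service


def mtbf(records):
    """Mean time between failures: average operating hours per failure."""
    return sum(r.uptime_hours for r in records) / len(records)


def mttr(records):
    """Mean time to repair: average restoration hours per failure."""
    return sum(r.repair_hours for r in records) / len(records)


def availability(records):
    """Availability estimated as MTBF / (MTBF + MTTR)."""
    up, down = mtbf(records), mttr(records)
    return up / (up + down)


# Illustrative figures only, not real plant data.
history = [
    OutageRecord(uptime_hours=700, repair_hours=20),
    OutageRecord(uptime_hours=500, repair_hours=10),
    OutageRecord(uptime_hours=900, repair_hours=30),
]

print(f"MTBF: {mtbf(history):.1f} h")                  # MTBF: 700.0 h
print(f"MTTR: {mttr(history):.1f} h")                  # MTTR: 20.0 h
print(f"Availability: {availability(history):.3f}")    # Availability: 0.972
```

This form of availability counts only unplanned downtime. Planned outages such as scheduled boiler inspections would need to be handled separately, according to the site's own KPI definitions.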
Boiler System Reliability Improvement:
Minimize unplanned outages and failures through data-driven reliability programs and lifecycle engineering.
Thermal Efficiency and Asset Health Monitoring:
Monitor and enhance thermal equipment performance using digital tools, real-time sensors, and diagnostics.
Failure Mode Analysis & Preventive Measures:
Identify critical failure modes in boiler and thermal systems and mitigate them proactively through design changes or maintenance.
Compliance with Pressure Equipment Codes:
Ensure ongoing certification and safety of pressure-retaining systems in accordance with ASME and jurisdictional codes.
Downtime Reduction and Uptime Maximization:
Use condition monitoring and reliability analytics to increase plant availability and reduce mean time to repair (MTTR).
Maintenance Strategy Alignment:
Align reliability and maintenance efforts with operational goals, safety standards, and energy efficiency targets.
Bachelor’s degree in Mechanical Engineering or Reliability Engineering (Master’s preferred).
Certified Reliability Engineer (CRE) or equivalent credential strongly preferred.
8–12 years of experience in reliability or maintenance engineering with a strong focus on boilers, steam systems, and thermal utilities.
Strong knowledge of the ASME Boiler and Pressure Vessel Code (BPVC), API, NFPA, and other applicable standards.
Proficiency in CMMS software, reliability modeling tools, and condition monitoring technologies.
Excellent leadership, analytical, and communication skills.