This job is expired.

Reliability & Monitoring Engineer II

Nextracker

Nashville, TN

Full-Time

Apr 21, 2026

Engineering

Job Description

The Reliability & Monitoring Engineer II is responsible for fleet-level monitoring, incident analysis, and reliability insights for Nextpower-supported utility-scale solar tracker systems. This role provides real-time system visibility, post-event analysis, and actionable intelligence that support rapid recovery and long-term asset reliability, particularly following severe weather and other high-impact events.

This position goes beyond basic monitoring execution. The engineer is expected to own complex investigations, help shape monitoring logic and workflows, and act as a technical leader within the Remote Monitoring Center (RMC). The ideal candidate brings strong experience in robotics, software, APIs, and/or SRE/operations in complex distributed or cyber-physical systems and applies those skills to a new domain (solar and trackers).

Operating within a portfolio-based support model, the Reliability & Monitoring Engineer translates monitoring data into clear technical insights that improve system uptime, inform customer communication, and strengthen long-term asset performance. This is a desk-based role within the Nextpower organization, focused on proactive monitoring, analytical investigation, and continuous operational improvement, working closely with the U.S. Technical Services organization and the Manager, Remote Monitoring & Asset Resilience (U.S.). The role operates within a structured coverage model in the Remote Monitoring Center, with engineers working staggered shifts to maintain daytime and early evening monitoring coverage and ensure effective handoffs between team members.

Key Objectives

Deliver High-Quality Fleet Monitoring: Continuously monitor utility-scale tracker fleets to detect abnormal system behavior, communication failures, and offline assets across customer portfolios, and propose improvements to alerting and workflows.

Lead Incident Analysis & Root Cause Investigation: Perform and often lead structured incident analysis and Root Cause Analysis (RCA) for alarms, outages, and post-weather events, producing clear, technically sound findings and recommendations.

Support Technical Services & Customer Communication: Provide monitoring-based insights and documentation that enhance Technical Services’ ability to resolve issues quickly, communicate effectively with customers, and support warranty and commercial obligations.

Drive Reliability Insights, Automation & Operational Improvement: Identify recurring issues and systemic risks, and contribute to the refinement of monitoring thresholds, alert logic, automation, and operational playbooks that improve asset resilience and reduce manual effort.

Core Responsibilities

Fleet Monitoring & Operational Awareness

· Monitor utility-scale solar tracker fleets using web-based monitoring platforms, including NX Navigator, to maintain real-time awareness of system status.

· Identify abnormal system states, communication failures, and offline assets across assigned customer portfolios, and drive deeper analysis of patterns across multiple sites.

· Support remote operational actions during high-wind and severe weather events, including coordination of tracker stow and recovery activities under the direction of the Manager, Remote Monitoring & Asset Resilience.

· Maintain clear situational awareness across active customer sites, including key alarms, stow states, communication health, and emerging risk signals.

· Log and track monitoring observations, ensuring key events are captured in internal systems and aligned with established RMC workflows and SOPs.

· Provide input into coverage models, alert tuning, and monitoring standards to improve RMC effectiveness and reduce alert fatigue.

Incident Response & Reliability Analysis

· Perform structured Root Cause Analysis (RCA) for system alarms, outages, and post-weather events using operational data, logs, SCADA-like signals, and environmental inputs.

· Correlate tracker behavior, monitoring signals, and weather data to determine probable failure mechanisms and reliability risks.

· Produce clear, technically sound incident summaries and RCA documentation for customers, Technical Services, and internal stakeholders.

· Support warranty-aligned documentation and evidence collection, ensuring events are captured in a way that supports potential warranty claims and risk assessments.

· Participate in and, where appropriate, lead post-event reviews, providing data-driven input on incident timelines, system behavior, and key contributing factors.

· Use experience with software systems, APIs, or robotics/automation to propose more robust detection mechanisms, health checks, or automated validation routines.

Customer & Technical Services Support

· Provide monitoring-based technical analysis to support customer issues managed by the Technical Services team and other customer-facing functions.

· Translate complex system behavior into clear, actionable insights that enable Technical Services to prioritize and execute field or remote actions.

· Ensure that incident records, timelines, and findings meet internal service expectations and quality standards for accuracy, completeness, and clarity.

· Support preparation of materials for customer calls, reports, and follow-ups by supplying data extracts, charts, and concise technical summaries derived from monitoring platforms.

· Act as a trusted technical partner to Technical Services, helping refine what “good” analysis and documentation look like for high-priority incidents.

Performance Trends, Automation & Continuous Improvement

· Identify recurring issues, performance degradation patterns, and systemic reliability risks across the monitored fleet, using both manual analysis and analytical tooling.

· Recommend improvements to monitoring thresholds, alerting logic, and response workflows, helping to reduce false alarms and improve signal-to-noise ratio.

· Use experience with APIs, scripting, and automation (e.g., Python, REST APIs, data pipelines) to suggest or prototype improvements that reduce manual data pulls, standardize common analyses, or improve visibility into key reliability indicators.

· Support refinement of monitoring tools, dashboards, and operational playbooks in partnership with the Manager, Remote Monitoring & Asset Resilience and cross-functional stakeholders.

· Participate in pilots or trials of new monitoring features, analytics capabilities, or alert configurations, providing structured feedback on effectiveness and usability.

· Take ownership of at least one improvement area (e.g., a class of alarms, a dashboard, a subset of sites, or a specific reliability theme) and drive it from problem definition through to measurable impact.

Cross-Functional Collaboration & Documentation

· Partner with Engineering, Product, Operations, and Technical Services teams to share monitoring-based field intelligence and support long-term reliability improvements.

· Contribute to the creation and maintenance of SOPs, monitoring playbooks, training materials, and internal knowledge bases used by the Remote Monitoring Center.

· Document findings, workflows, and lessons learned in a clear and reusable format to support team scaling and onboarding.

· Support knowledge sharing and best-practice development within the monitoring and reliability team, including informal coaching or mentoring of other engineers on tools, workflows, and analysis methods.

· Bring a software/robotics/system design perspective into conversations with Product and Engineering, helping to translate field/monitoring signals into concrete product or control-system changes.

Qualifications

· Bachelor’s degree in Engineering, Computer Science, Mechatronics/Robotics, Electrical Engineering, or a related technical field; equivalent relevant experience will be considered.

· 4+ years of experience in reliability engineering, SRE/operations, robotics/automation, fleet monitoring, or operations centers dealing with complex distributed or cyber-physical systems.

· Strong experience with monitoring, automation, or control of complex systems, such as robotics, manufacturing automation, OT/ICS, data centers, or cloud services.

· Prior experience in solar, energy, or grid operations is a plus but not required; must be comfortable learning a new physical domain (PV, trackers, inverters, weather impacts).

· Demonstrated experience performing root cause analysis using operational and monitoring data (metrics, logs, time-series, event histories), including structured post-incident reviews.

· Strong analytical skills with high attention to detail and a structured, data-driven problem-solving approach.

· Clear technical writing skills and the ability to communicate findings to both technical and non-technical audiences, including customers and senior stakeholders.

Skills & Competencies

Technical & Analytical

· Proficiency with web-based monitoring platforms, observability stacks, or fleet analytics tools; familiarity with NX Navigator or similar systems is highly desirable but can be learned.

· Ability to interpret time-series data, alarms, and event logs to diagnose performance and reliability issues across a fleet of assets.

· Strong comfort using tools such as Python, SQL, Excel, or similar analytical tools for data analysis, visualization, and reporting.

· Experience working with APIs and data integration (e.g., REST APIs, webhooks, log/metrics pipelines) to move data between systems or automate routine monitoring tasks.

· Experience with robotics, control systems, or automation (e.g., embedded systems, motion control, industrial protocols) is a strong plus.

· Familiarity with dashboarding and analytical tools such as Power BI and Databricks is a nice-to-have, particularly for building or interacting with reliability and performance dashboards.

· Understanding of weather-driven operational risk, or demonstrated ability to reason about external risk factors impacting system performance.

Collaboration & Communication

· Strong written and verbal communication skills, with the ability to craft concise incident summaries, RCA documents, and status updates.

· Proven ability to work cross-functionally with Technical Services, Engineering, Product, and Operations teams, often across time zones.

· Customer- and stakeholder-focused mindset, ensuring information is accurate, timely, and tailored to audience needs.

· Ability to influence and drive adoption of improved monitoring practices and standards, even without formal people management responsibilities.

Execution & Operational Discipline

· Strong organizational skills with the ability to prioritize and manage multiple events and monitoring tasks concurrently in an incident-driven environment.

· Reliability and consistency in following established SOPs, workflows, and documentation standards, while also identifying where they should evolve.

· Adaptability to evolving operational needs, portfolio growth, and changes in monitoring tools or processes.

· Comfort operating in a fast-paced environment that may require occasional support during off-hours events as required by coverage models.

· Demonstrated ownership mindset—takes initiative to identify problems, propose solutions, and follow through to implementation and measurement.

Nextpower offers a comprehensive benefits package. We provide health care coverage, dental and vision, 401(k) participation including company matching, company-paid holidays with unlimited paid time off, generous discretionary company bonuses, life and disability protection, and more. Employees in certain positions may be eligible for stock compensation. All plans are in accordance with relevant plan documents. For more information on Nextpower's benefits, please view our company website at www.Nextpower.com.

Pay is based on market location and may vary based on factors including experience, skills, education and other job-related reasons. The annual salary range for this position is $985,000.00-$100,000.00. (Applicable to California)

At Nextpower, we are driving the global energy transition with an integrated clean energy technology platform that combines intelligent structural, electrical, and digital solutions for utility-scale power plants. Our comprehensive portfolio enables faster project delivery, higher performance, and greater reliability, helping our customers capture the full value of solar power. Our talented worldwide teams are redefining how solar power plants are designed, built, and operated every day with smart technology, data-driven insights, and advanced automation. Together, we’re building the foundation for the world’s next generation of clean energy infrastructure.

Nextpower is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

We are Nextpower

About Nextracker

Nextracker, a Flex company, provides intelligent solar tracker solutions for utility-scale and distributed generation projects that transform PV plant performance with advanced data monitoring and control software and global services.

Nextracker has been the number-one global market-share solar tracker company for several years running, according to research firm Wood Mackenzie. We have delivered or fulfilled more than 50 GW of smart solar trackers for projects on five continents, including some of the largest solar farms in the world. Our TrueCapture and NX Navigator smart monitoring and control software platforms have revolutionized tracker performance and represent our commitment to continuous innovation.

We’re creative, collaborative, committed problem-solvers from diverse backgrounds. And we’re on a mission to be the world’s leading energy solutions company delivering the most intelligent, reliable, and productive solar power for future generations. Headquartered in the San Francisco Bay Area, Nextracker has offices in Australia, Brazil, China, India, Mexico, and Spain.
