Remaining Useful Life Prediction

EE 541: A Computational Introduction to Deep Learning — Final Project

Introduction

Remaining useful life (RUL) prediction estimates how many operational cycles remain before a machine fails. This is central to predictive maintenance—scheduling repairs before failure occurs rather than reacting to breakdowns or replacing components on fixed schedules. The challenge lies in learning degradation patterns from multivariate sensor data where failure modes are complex and equipment operates under varying conditions.

Dataset

The NASA Turbofan Engine Degradation Simulation Dataset (C-MAPSS) contains run-to-failure data from turbofan engines. The dataset includes four subsets (FD001, FD002, FD003, FD004) with increasing complexity based on operating conditions and failure modes.

FD001 (Recommended Starting Point):

  • 100 engines for training, 100 for testing
  • Single operating condition
  • Single failure mode
  • 21 sensor measurements per timestep
  • Run-to-failure trajectories of varying length (128-362 cycles)

In the training set, each engine runs until failure, providing a complete degradation trajectory. The test set contains only partial trajectories; you must predict the RUL at the final observed timestep.

Sensor Measurements Include:

  • Temperatures (various locations in the engine)
  • Pressures (fan, bypass, core stages)
  • Fan and core speeds
  • Fuel flow and bleed enthalpy
  • Other operational parameters

Dataset Access: https://data.nasa.gov/dataset/cmapss-jet-engine-simulated-data

The data is provided as text files with space-separated values. Each row represents one timestep with engine ID, cycle number, three operational settings, and 21 sensor readings.
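
A minimal loading sketch with pandas, assuming the standard train_FD001.txt file name; the column names below are this sketch's own convention, not fixed by the dataset:

    import pandas as pd

    # Column layout per row: engine ID, cycle number, three operational
    # settings, then 21 sensor readings.
    cols = (["unit", "cycle", "setting_1", "setting_2", "setting_3"]
            + [f"sensor_{i}" for i in range(1, 22)])

    # sep=r"\s+" handles the space-separated format, including repeated spaces.
    train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)

    print(train.groupby("unit")["cycle"].max().describe())  # trajectory lengths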

Problem Statement

Build a deep learning system that predicts remaining useful life given current and historical sensor readings. This is a regression problem where the input is a time-series window of multivariate sensor data and the output is a scalar representing remaining operational cycles.

Alternative Problem Formulations

Binary Health Classification: Instead of predicting exact RUL, classify engines as healthy or degraded (likely to fail within N cycles). This simplifies the problem to binary classification and may be more robust when exact failure timing is uncertain. Choose a threshold (e.g., 30 cycles before failure) to define the degradation boundary.
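
Given an array of per-sample RUL values, constructing these labels is a single comparison; a sketch (the helper name and the 30-cycle default are illustrative):

    import numpy as np

    def binary_health_labels(rul, threshold=30):
        """1 = degraded (fails within `threshold` cycles), 0 = healthy."""
        return (np.asarray(rul) <= threshold).astype(np.int64)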

Multi-Horizon Prediction: Predict RUL at multiple future timesteps simultaneously (e.g., RUL at current time, 10 cycles ahead, 20 cycles ahead). This multi-task formulation tests whether the model learns consistent degradation trajectories.
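
Targets for this formulation follow from the single-horizon definition: an engine h cycles in the future has h fewer cycles remaining. A sketch (the helper name and horizons are illustrative):

    import numpy as np

    def multi_horizon_targets(rul, horizons=(0, 10, 20)):
        """Stack RUL targets at several future offsets, floored at zero
        for samples within h cycles of failure."""
        rul = np.asarray(rul)
        return np.stack([np.maximum(rul - h, 0) for h in horizons], axis=1)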

Anomaly Detection: Frame the problem as detecting when sensor readings deviate from healthy operation patterns. Train only on early-life data (first 50% of each engine’s life) and detect anomalies in later cycles. This tests unsupervised learning of degradation.
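
Carving out the early-life training split is a short group-wise operation; a sketch using the train DataFrame from the loading example above:

    # Keep only the first 50% of each engine's trajectory for training.
    early = (train.groupby("unit", group_keys=False)
                  .apply(lambda g: g.iloc[: len(g) // 2]))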

Time-Series Data and Temporal Windows

Unlike static classification problems, RUL prediction requires understanding how sensor values evolve over time. An engine’s current state depends on its degradation history, not just current sensor readings.

Temporal Windows

A common approach is sliding windows over the time-series. For an engine at cycle \(t\), use the previous \(w\) cycles as input: sensor readings from cycles \([t-w+1, \ldots, t]\). This creates a fixed-size input regardless of total engine life.

For example, with window size \(w=50\) and 21 sensors, the input is a \(50 \times 21\) matrix. This can be treated as an “image” for 2D convolution or processed with 1D convolutions over the time dimension.

The target RUL at cycle \(t\) is the number of cycles until failure:

\[ \text{RUL}(t) = t_{\text{failure}} - t \]

where \(t_{\text{failure}}\) is the cycle when the engine fails.
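
A windowing sketch under these definitions, assuming X is a (T, num_sensors) array of readings for one engine, ordered by cycle:

    import numpy as np

    def make_windows(X, w=50):
        """Slide a length-w window over one engine's trajectory. Returns
        windows of shape (T-w+1, w, num_sensors) and, for each window,
        the RUL at its final cycle (0 at the failure cycle)."""
        T = X.shape[0]
        windows = np.stack([X[t - w:t] for t in range(w, T + 1)])
        rul = np.arange(T - w, -1, -1)  # RUL(t) = t_failure - t for t = w..T
        return windows, rul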

Early-Life Considerations

Engines show minimal degradation early in life: sensors remain stable, so the readings carry little information about the eventual failure time. Some approaches clip the RUL target to a maximum value (e.g., 125 cycles), giving all early cycles the same target and focusing learning on the degradation phase rather than on uninformative healthy operation.
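
Applied to the rul array from the windowing sketch above, the clip is one line (125 is the example cap):

    rul_clipped = np.minimum(rul, 125)  # flat target early in life, linear decay after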

Suggested Approach

Data Preprocessing: Sensor readings have different scales and units. Normalization (zero mean, unit variance) for each sensor ensures no sensor dominates due to scale. Some sensors remain constant or have minimal variation—removing these reduces dimensionality without losing information.
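
Continuing from the loading sketch above, a preprocessing sketch (the 1e-6 variance threshold is an arbitrary choice):

    # Drop sensors that are (near-)constant across the training set.
    sensor_cols = [c for c in train.columns if c.startswith("sensor_")]
    keep = [c for c in sensor_cols if train[c].std() > 1e-6]

    # Fit normalization statistics on training data only; reuse them at test time.
    mu, sigma = train[keep].mean(), train[keep].std()
    train[keep] = (train[keep] - mu) / sigma
    # test[keep] = (test[keep] - mu) / sigma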

Window Selection: Choose a window size that captures degradation patterns without being computationally prohibitive. Longer windows provide more context but increase memory usage and sequence length. Experiment with windows from 30-50 cycles as a starting point.

Data Augmentation: Add small Gaussian noise to sensor readings to improve robustness. Time-shifting windows (using cycles \([t-w, \ldots, t-1]\) instead of \([t-w+1, \ldots, t]\)) creates additional training samples from the same engine trajectory.
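
A noise-injection sketch for the first option (the 0.01 standard deviation is an illustrative value, sensible only after per-sensor normalization):

    import numpy as np

    def add_sensor_noise(windows, std=0.01, rng=None):
        """Jitter sensor readings with small Gaussian noise for augmentation."""
        if rng is None:
            rng = np.random.default_rng(0)
        return windows + rng.normal(0.0, std, size=windows.shape)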

Architecture Considerations: Time-series data can be processed as 2D inputs (time × sensors) with 2D convolutions. Alternatively, 1D convolutions over the time dimension with multiple channels (one per sensor) capture temporal patterns. Pooling reduces temporal dimension while preserving learned features.
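
As one illustration of the 1D-convolution option, a small PyTorch sketch (layer widths and kernel sizes are arbitrary choices, not tuned recommendations):

    import torch.nn as nn

    class RULConvNet(nn.Module):
        """1D convolutions over time, one input channel per sensor."""
        def __init__(self, num_sensors=21):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(num_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),          # halve the temporal dimension
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # pool remaining timesteps to one vector
            )
            self.head = nn.Linear(64, 1)  # linear output for unbounded regression

        def forward(self, x):             # x: (batch, window, num_sensors)
            x = x.transpose(1, 2)         # -> (batch, num_sensors, window)
            return self.head(self.features(x).squeeze(-1)).squeeze(-1)

Trained with nn.MSELoss(), this directly optimizes the squared error that RMSE reports; the adaptive pooling also makes the network indifferent to window size.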

Evaluation Metrics: Root mean squared error (RMSE) measures prediction accuracy:

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\text{RUL}_{\text{true},i} - \text{RUL}_{\text{pred},i})^2} \]

Mean absolute error (MAE) is less sensitive to outliers:

\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|\text{RUL}_{\text{true},i} - \text{RUL}_{\text{pred},i}| \]

NASA also defines a scoring function that penalizes late predictions (predicting failure too late) more heavily than early predictions:

\[ s = \sum_{i=1}^{n} \begin{cases} e^{-d_i/13} - 1 & \text{if } d_i < 0 \\ e^{d_i/10} - 1 & \text{if } d_i \geq 0 \end{cases} \]

where \(d_i = \text{RUL}_{\text{pred},i} - \text{RUL}_{\text{true},i}\).
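
All three metrics take only a few lines of NumPy; a sketch, with d defined as above (predicted minus true):

    import numpy as np

    def rmse(rul_true, rul_pred):
        return np.sqrt(np.mean((rul_true - rul_pred) ** 2))

    def mae(rul_true, rul_pred):
        return np.mean(np.abs(rul_true - rul_pred))

    def nasa_score(rul_true, rul_pred):
        d = rul_pred - rul_true  # d >= 0 means a late prediction, penalized harder
        return np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0)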

Dataset Considerations

Engine-Level Variation: Each engine has a unique degradation trajectory even under identical operating conditions. Some engines fail after 150 cycles, others after 350 cycles. Your model must generalize across this variability.

Sensors and Noise: Sensor measurements contain noise from instrument error and environmental factors. Some variation is measurement noise, not degradation signal. Distinguish between noise and meaningful trends.

Operating Conditions: FD001 has a single operating condition, but FD002-FD004 include multiple flight regimes. Sensor readings depend on both degradation and the current operating mode. Per-condition normalization or operating-condition-specific models may be necessary for the more complex subsets.

Test Set Structure: The test set provides partial trajectories—engines that haven’t failed yet. You predict RUL at the last observed cycle. True RUL values are provided separately for evaluation.
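
Assembling the test inputs then amounts to taking each engine's final w cycles; a sketch assuming a test DataFrame loaded like train above and the keep sensor list from the preprocessing sketch:

    import numpy as np

    # One prediction per test engine: the window covering its last w cycles.
    w = 50
    last_windows = np.stack([
        g[keep].to_numpy()[-w:]  # final w cycles of this trajectory
        for _, g in test.groupby("unit")
        if len(g) >= w           # trajectories shorter than w need padding instead
    ])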

Technical Notes

Computational Requirements: Time-series windows create moderate-sized inputs (e.g., \(50 \times 21\) for 50-cycle windows). Training on 100 engines with varying lengths produces thousands of windowed samples. Pre-processing and caching windowed data avoids repeated computation.

Sequence Length: Engines run for 100-400 cycles, producing many overlapping windows. A 200-cycle engine with window size 50 yields 151 training samples (one window ending at each cycle from 50 through 200). This creates substantial training data from relatively few engines.

Regression Output: Unlike classification, regression produces unbounded continuous outputs. Ensure the output layer uses a linear activation (or ReLU if RUL must be non-negative). The network must learn the scale of RUL values (tens to hundreds of cycles).

Expected Outcomes

Your analysis should examine:

  • How prediction error varies with RUL: are predictions more accurate near failure or early in life?
  • Which sensors contribute most to prediction, identified by analyzing learned features or performing ablation studies.
  • Performance across different window sizes, to understand temporal dependencies.
  • Failure cases where predictions are far from the true RUL, with hypotheses for why the model struggles.
  • Predicted RUL trajectories visualized against true RUL, to judge whether the model captures degradation trends or makes erratic predictions.