This Website is not fully compatible with Internet Explorer.
For a more complete and secure browsing experience please consider using Microsoft Edge, Firefox, or Chrome

Probabilistic Modelling

6. A Numerical Solution to the Challenge Problem

In the previous section, the NAFEMS Stochastics challenge problem was introduced as a reliability problem with uncertain resistance R and uncertain stress S. The task is to estimate the probability that the stress exceeds the resistance based on a small number of available measurement datapoints.

This section shows how the problem can be written as a probabilistic model.

Numerical Simulation

The vampire example could be solved by direct counting because the variables were simple yes-or-no outcomes. The NAFEMS challenge problem is different: resistance and stress are continuous quantities, and their true means and standard deviations are not known. They must be inferred from small samples.

Stan uses Hamiltonian Monte Carlo with the No-U-Turn Sampler, often abbreviated as NUTS. The details of these algorithms are beyond the scope of this course, but the main idea is that Stan generates samples from the joint posterior distribution of the unknown parameters.

These samples do not just give a single best-fit value. They describe what values of the parameters are plausible after combining the observed data with the assumptions in the model.

 

 

The Model

The model assumes that both resistance and stress are normally distributed. The unknown parameters are:

  • μR, the mean resistance;
  • σR, the standard deviation of the resistance;
  • μS, the mean stress;
  • σS, the standard deviation of the stress.

In Stan, these are treated as parameters because they are not known directly. The observed measurements of R and S are then modelled as data generated from normal distributions with these unknown means and standard deviations.

R ~ normal(mu_R, sigma_R); S ~ normal(mu_S, sigma_S);

The model also includes weak prior assumptions for the unknown means and standard deviations. These priors mainly express that the quantities are positive and finite, with a slight preference for smaller values over very large ones.

Assumptions

A useful feature of probabilistic modelling is that the assumptions must be written down. In this example, the model assumes that R and S are positive, finite and normally distributed. This may not be perfect, but it makes the reasoning transparent and open to improvement.

This is similar to defining a conceptual model in engineering simulation. Before running a finite element model, the analyst must decide what physics, geometry, boundary conditions and material behaviour are represented. In probabilistic modelling, the analyst must decide what data-generating process is assumed.

Generated quantities and the limit state

After the model parameters have been estimated, Stan can generate derived quantities. For each posterior sample, values of resistance and stress are drawn, and the limit state is calculated:

g = R - S

If g < 0, the stress is greater than the resistance and the component fails.

Because the difference of two normally distributed quantities is also normally distributed, the limit state can be represented by:

μg = μR - μS

σg = sqrt(σR2 + σS2)

The probability of failure is then:

pf = P(g < 0)

Solution

The model is run from R using the rstan package. In the course material, the model is run with four independent chains and many samples per chain. Stan then returns posterior samples for the parameters and derived quantities.

As with any numerical simulation, the result should be checked. Stan reports diagnostic quantities such as:

  • n_eff, the effective number of samples;
  • Rhat, which indicates whether the independent chains converged to similar results.

Values of Rhat close to 1 indicate that the chains are giving consistent results. If the diagnostics are poor, the numerical result should not be trusted without further investigation.

Result

The Stan output gives posterior distributions rather than only one answer. For example, the model estimates the mean resistance to be around 478 MPa and the mean stress to be around 353 MPa, but there is still uncertainty in these values because the sample sizes are small.

The limit state distribution has a positive mean, so the part is usually predicted to survive. However, there is overlap between the resistance and stress distributions, which creates a non-zero probability of failure.

In the course material, counting the posterior samples where g < 0 gives a probability of failure of approximately 0.027, or 2.7%.

Aleatory and epistemic uncertainty

This result includes two kinds of uncertainty:

  • Aleatory uncertainty: the natural variability of R and S. This remains even if more data are collected.
  • Epistemic uncertainty: uncertainty in the estimated parameters, caused by the limited number of measurements. This could be reduced by collecting more data.

The course material also evaluates the uncertainty in the aleatory probability of failure itself. The median estimate is about 0.010, while higher quantiles are much larger, around 0.070 at the 90% level and 0.109 at the 95% level. This shows that the most likely probability of failure is small, but the epistemic uncertainty is still high.

This is the key engineering message: the result is not just a probability of failure. It is a probability of failure with uncertainty around it. That uncertainty matters when deciding whether the result is sufficient for an engineering decision.

S​tan model

transformed data {
  vector[5] R = [503, 460, 486, 466, 475]'; // [MPa]
  vector[10] S = [377, 278, 332, 331, 395, 394, 387, 362, 300, 381]'; // [MPa]
}

parameters {
  real<lower=0> mu_R;
  real<lower=0> sigma_R;
  real<lower=0> mu_S;
  real<lower=0> sigma_S;
}

model {
  mu_R ~ exponential(.001);
  sigma_R ~ exponential(.001);
  mu_S ~ exponential(.001);
  sigma_S ~ exponential(.001);
  S ~ normal(mu_S, sigma_S);
  R ~ normal(mu_R, sigma_R);
}

generated quantities {
  real R_post = normal_rng(mu_R, sigma_R);
  real S_post = normal_rng(mu_S, sigma_S);
  real g = R_post-S_post;
  // aleatory probability of failure using property of sum of normal distributions
  real pf = normal_cdf(0 | mu_R - mu_S, sqrt(sigma_R^2+sigma_S^2));
}