Add A3I neuro-probabilistic model + survival analysis

This commit is contained in:
2024-11-01 11:29:58 +01:00
parent e9b9b4835c
commit 7dbab460a8
4 changed files with 163 additions and 4 deletions

View File

@ -1,4 +1,4 @@
\chapter{Arrivals prediction: Hospital emergency room} \label{ch:ap_hospital}
\section{Data}
@ -22,7 +22,7 @@ Each row of the dataset represents a patient and the features are:
\section{Approaches}
\begin{remark}
MSE assumes that the conditional distribution $\prob{y | x; \theta}$ of the predictions follows a normal distribution (i.e., $y \sim \mathcal{N}(\mu(x; \theta), \sigma)$).
\end{remark}
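\begin{remark}
To make this link explicit, the negative log-likelihood of such a Gaussian with fixed $\sigma$ reduces to the squared error up to constants:
\[ -\log \prob{y \mid x; \theta} = \frac{(y - \mu(x; \theta))^2}{2\sigma^2} + \log\left(\sigma \sqrt{2\pi}\right) \]
so minimizing the MSE is equivalent to maximizing the likelihood under this assumption.
\end{remark}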
\begin{remark}
@ -107,7 +107,7 @@ Each row of the dataset represents a patient and the features are:
Some considerations must be made:
\begin{descriptionlist}
\item[Only positive rates]
As $\hat{\lambda}$ must be positive, it is possible to combine a logarithm (i.e., assume that the MLP outputs a log-rate) and an exponentiation to achieve this (a code sketch is given after this list):
\[
\begin{split}
\log(\hat{\lambda}) &= \texttt{MLP}(x) \\
@ -115,6 +115,10 @@ Each row of the dataset represents a patient and the features are:
\end{split}
\]
\begin{remark}
A strictly positive activation function can also be used.
\end{remark}
\item[Standardization]
The input of the network can be standardized. On the other hand, standardizing the output is wrong as the Poisson distribution is discrete.
@ -133,4 +137,59 @@ Each row of the dataset represents a patient and the features are:
\end{remark}
\end{descriptionlist}
\end{description}
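\begin{remark}
A minimal implementation sketch of the log-rate formulation (PyTorch is assumed here; the layer sizes and the dummy data are illustrative and not taken from the dataset):
\begin{verbatim}
import torch
import torch.nn as nn

# The MLP outputs log(lambda_hat); the exponentiation is handled by the
# Poisson negative log-likelihood, so the predicted rate is always positive.
mlp = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.PoissonNLLLoss(log_input=True)   # expects the log-rate as input
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

x = torch.randn(64, 8)                        # standardized inputs (dummy)
y = torch.poisson(torch.full((64, 1), 3.0))   # integer counts (dummy targets)

log_rate = mlp(x)                             # log(lambda_hat) = MLP(x)
loss_fn(log_rate, y).backward()
optimizer.step()

rate = torch.exp(log_rate)                    # lambda_hat = exp(MLP(x)) > 0
\end{verbatim}
\end{remark}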
\begin{remark}
With one-hot encoding, a linear model effectively becomes a lookup table: instead of being constrained to a straight line in the original input, it can assign an arbitrary value to each category.
\indenttbox
\begin{example}
\phantom{}
\begin{minipage}[t]{0.3\linewidth}
Consider the dataset:
\begin{table}[H]
\centering
\footnotesize
\begin{tabular}{c|c}
\toprule
$x$ & $y$ \\
\midrule
$0$ & $1$ \\
$1$ & $4$ \\
$2$ & $2$ \\
\bottomrule
\end{tabular}
\end{table}
\end{minipage}
\begin{minipage}[t]{0.65\linewidth}
A linear model learns a straight line:
\[ f(x) = \alpha x \]
\end{minipage}\\[1em]
\begin{minipage}[t]{0.3\linewidth}
With the dataset:
\begin{table}[H]
\centering
\footnotesize
\begin{tabular}{ccc|c}
\toprule
$x_0$ & $x_1$ & $x_2$ & $y$ \\
\midrule
$1$ & $0$ & $0$ & $1$ \\
$0$ & $1$ & $0$ & $4$ \\
$0$ & $0$ & $1$ & $2$ \\
\bottomrule
\end{tabular}
\end{table}
\end{minipage}
\begin{minipage}[t]{0.65\linewidth}
A linear model learns a linear combination:
\[ f(x) = \alpha x_0 + \beta x_1 + \gamma x_2 \]
which makes it possible to fit the dataset exactly with $\alpha=1$, $\beta=4$, and $\gamma=2$.
\end{minipage}
\end{example}
\end{remark}
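\begin{remark}
A short sketch of the lookup-table behaviour (NumPy is assumed; the weights are the ones fitted in the example above):
\begin{verbatim}
import numpy as np

weights = np.array([1.0, 4.0, 2.0])     # alpha, beta, gamma

def onehot(category, num_categories=3):
    v = np.zeros(num_categories)
    v[category] = 1.0
    return v

# The dot product with a one-hot vector simply selects weights[category],
# i.e. the linear model acts as a lookup table.
for category, target in [(0, 1.0), (1, 4.0), (2, 2.0)]:
    assert weights @ onehot(category) == target
\end{verbatim}
\end{remark}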
\begin{remark}
Having access to the full distribution makes it possible to compute any statistic of interest. For instance, it is possible to plot the mean and standard deviation.
\end{remark}
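\begin{remark}
For instance, under the Poisson model above, both follow directly from the predicted rate:
\[ \mathbb{E}[y \mid x] = \hat{\lambda}(x) \qquad \text{sd}[y \mid x] = \sqrt{\hat{\lambda}(x)} \]
\end{remark}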
\end{description}

View File

@ -253,4 +253,104 @@ Predict RUL with a classifier $f_\varepsilon$ (for a chosen $\varepsilon$) that
\end{enumerate}
\end{enumerate}
\end{description}
\end{description}
\subsection{Survival analysis model}
\begin{remark}
Chronologically, this approach has been presented after \Cref{ch:ap_hospital}.
\end{remark}
\begin{description}
\item[Survival analysis model] \marginnote{Survival analysis model}
Probabilistic model to estimate the survival time of an entity.
\item[Survival analysis formalization]
Consider a random variable $T$ to model the survival time. The simplest model can be defined as:
\[ t \sim \prob{T} \]
The remaining survival time depends on the current time $\bar{t}$, therefore we have that:
\[ t \sim \prob{T \mid \bar{t}} \]
The remaining survival time also depends on the past sensor readings $X_{\leq \bar{t}}$ and on the future readings $X_{> \bar{t}}$ (at this stage, we want to capture what affects the survival time, even if we do not have access to some variables), therefore we have that:
\[ t \sim \prob{T \mid \bar{t}, X_{\leq \bar{t}}, X_{> \bar{t}}} \]
\begin{description}
\item[Marginalization]
Average over all the possible outcomes of a random variable to cancel it out.
For this problem, we do not have access to the future sensor readings, so we can marginalize them out:
\[ t \sim \prob{T \mid \bar{t}, X_{\leq \bar{t}}} = \underset{X_{> \bar{t}} \sim \prob{X_{> \bar{t}}}}{\mathbb{E}} \left[ \prob{T \mid \bar{t}, X_{\leq \bar{t}}, X_{> \bar{t}}} \right] \]
\end{description}
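\begin{remark}
Written out explicitly (in discrete form for readability), the marginalization is a weighted average over the possible future readings:
\[ \prob{T \mid \bar{t}, X_{\leq \bar{t}}} = \sum_{x} \prob{X_{> \bar{t}} = x} \, \prob{T \mid \bar{t}, X_{\leq \bar{t}}, X_{> \bar{t}} = x} \]
\end{remark}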
\begin{remark}
In probabilistic terms, the regression approach previously taken can be modelled as:
\[ t \sim \mathcal{N}(\mu(X_{\bar{t}}), \sigma) \]
where $\mu(\cdot)$ is the regressor.
Compared to the survival analysis model, there are the following differences:
\begin{itemize}
\item The regressor only reasons on the sensor readings $X_{\bar{t}}$ at a single time step.
\item The regressor does not consider the current time $\bar{t}$.
\item The regressor assumes a normal distribution with constant variance.
\end{itemize}
\end{remark}
\item[Neuro-probabilistic model]
We can assume that the survival time follows a normal distribution whose mean and standard deviation are both predicted by the model:
\[ t \sim \mathcal{N}(\mu(X_{\bar{t}}, \bar{t}), \sigma(X_{\bar{t}}, \bar{t})) \]
\begin{remark}
The readings of a single time step are used as it can be shown that using multiple time steps does not yield significant improvements for this dataset.
\end{remark}
\begin{description}
\item[Architecture]
Use a neural network that outputs both $\mu$ and $\sigma$, which are then passed to a distribution head (a minimal code sketch is given after the figure).
\end{description}
\begin{figure}[H]
\centering
\includegraphics[width=0.7\linewidth]{./img/_rul_neuroprobabilistic.pdf}
\end{figure}
\end{description}
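\begin{remark}
A minimal sketch of the two-headed network with a Normal distribution head (PyTorch is assumed; layer sizes, feature counts, and the dummy data are illustrative):
\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuroProbabilisticRUL(nn.Module):
    def __init__(self, num_sensors=24, hidden=64):
        super().__init__()
        # Backbone conditioned on the readings X_t and the current time t_bar
        self.backbone = nn.Sequential(nn.Linear(num_sensors + 1, hidden),
                                      nn.ReLU())
        self.mu_head = nn.Linear(hidden, 1)      # mean of the survival time
        self.sigma_head = nn.Linear(hidden, 1)   # scale of the survival time

    def forward(self, x_t, t_bar):
        h = self.backbone(torch.cat([x_t, t_bar], dim=-1))
        mu = self.mu_head(h)
        sigma = F.softplus(self.sigma_head(h)) + 1e-6   # keep sigma > 0
        return torch.distributions.Normal(mu, sigma)    # distribution head

model = NeuroProbabilisticRUL()
x_t = torch.randn(32, 24)         # sensor readings at time t_bar (dummy)
t_bar = torch.rand(32, 1) * 100   # current time (dummy)
t_obs = torch.rand(32, 1) * 150   # observed survival times (dummy)

dist = model(x_t, t_bar)
loss = -dist.log_prob(t_obs).mean()   # negative log-likelihood
loss.backward()
\end{verbatim}
\end{remark}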
% \subsection{Survival function}
% \begin{description}
% \item[Censoring] \marginnote{Censoring}
% Hide key events from the dataset.
% \begin{remark}
% For this dataset, it is more realistic to use partial runs as run-to-failure experiments are expensive to obtain.
% \begin{figure}[H]
% \centering
% \includegraphics[width=0.7\linewidth]{./img/_rul_censoring.pdf}
% \caption{Example of partial runs}
% \end{figure}
% \end{remark}
% \item[Survival function] \marginnote{Survival function}
% Given the random variable $T$ to model survival time, the survival function $S$ is defined as:
% \[ S(\bar{t}) = \prob{T > \bar{t}} \]
% In other words, it is the probability of surviving beyond time $\bar{t}$.
% For this problem, the survival function can account for past sensor readings:
% \[ S(\bar{t}, X_{\leq \bar{t}}) = \prob{T > \bar{t} \mid X_{\leq \bar{t}}} \]
% \item[Hazard function] \marginnote{Hazard function}
% Given the random variable $T$ to model survival time, the hazard function $\lambda$ is defined as:
% \[ \lambda(\bar{t}) = \prob{T \leq \bar{t} \mid T > \bar{t}-1} \]
% In other words, it is the conditional probability of not surviving at time $\bar{t}$ knowing that the entity survived until time $\bar{t}-1$.
% With discrete time, the survival function can be factorized using the hazard function:
% \[ S(\bar{t}) = (1-\lambda(\bar{t})) \cdot (1 - \lambda(\bar{t}-1)) \cdot \dots \]
% For this problem, the hazard function is the following:
% \[
% \begin{gathered}
% \lambda(\bar{t}, X_{\bar{t}}) = \prob{T \leq \bar{t} \mid T > \bar{t}-1, X_{\bar{t}}} \\
% S(\bar{t}, X_{\bar{t}}) = (1-\lambda(\bar{t}, X_{\bar{t}})) \cdot (1 - \lambda(\bar{t}-1, X_{\bar{t}-1})) \cdot \dots
% \end{gathered}
% \]
% \end{description}
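% \begin{remark}
% A small numeric illustration (the hazard value is arbitrary): with a constant hazard $\lambda(\bar{t}) = 0.1$ at every step, the probability of surviving the first three steps is:
% \[ S(3) = (1 - 0.1)^3 = 0.729 \]
% \end{remark}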