Tiny changes <noupdate>

2024-12-26 12:07:49 +01:00
parent d0229c69dc
commit 8d60699ad1


@@ -80,13 +80,13 @@
\]
\begin{description}
\item[Language model] By considering a language model without context, the priors are computed as $\prob{w | \varepsilon} = \frac{\texttt{count}(w)}{\vert \texttt{COCA} \vert}$ (where $\vert \texttt{COCA} \vert = \num{404253213}$):
\item[Language model] By considering a language model without context, the priors are computed as $\prob{w | \varnothing} = \frac{\texttt{count}(w)}{\vert \texttt{COCA} \vert}$ (where $\vert \texttt{COCA} \vert = \num{404253213}$):
\begin{table}[H]
\centering
\footnotesize
\begin{tabular}{ccl}
\toprule
$w$ & $\texttt{count}(w)$ & $\prob{w | \varepsilon}$ \\
$w$ & $\texttt{count}(w)$ & $\prob{w | \varnothing}$ \\
\midrule
\texttt{actress} & \num{9321} & $0.0000231$ \\
\texttt{cress} & \num{220} & $0.000000544$ \\
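A minimal Python sketch of this prior computation, using only the corpus size and the two counts from the table above (the dictionary layout and printing format are illustrative):

# Unigram priors P(w | no context) = count(w) / |COCA|, with the counts from the table.
COCA_SIZE = 404_253_213

counts = {
    "actress": 9_321,
    "cress": 220,
}

priors = {w: c / COCA_SIZE for w, c in counts.items()}

for w, p in priors.items():
    print(f"P({w}) = {p:.3g}")
# actress -> ~2.31e-05, cress -> ~5.44e-07, matching the table.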
@@ -125,7 +125,7 @@
\footnotesize
\begin{tabular}{cl}
\toprule
$w$ & $\prob{x | w} \prob{w | \varepsilon}$ \\
$w$ & $\prob{x | w} \prob{w | \varnothing}$ \\
\midrule
\texttt{actress} & $2.7 \cdot 10^{-9}$ \\
\texttt{cress} & $0.00078 \cdot 10^{-9}$ \\
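As a sketch of the noisy-channel ranking summarized in this table, each candidate w is scored by the product of a channel (error) model P(x | w) and the unigram prior, and the highest-scoring candidate is chosen. The channel probabilities below are illustrative placeholders, picked only so that the products match the values shown, not figures taken from this excerpt:

# Noisy-channel score for an observed typo x: P(x | w) * P(w | no context).
COCA_SIZE = 404_253_213

candidates = {
    # word: (COCA count, assumed channel probability P(x | w))
    "actress": (9_321, 1.17e-4),  # channel value is an illustrative assumption
    "cress": (220, 1.44e-6),      # channel value is an illustrative assumption
}

scores = {w: chan * (cnt / COCA_SIZE) for w, (cnt, chan) in candidates.items()}

print(scores)                                      # actress ~2.7e-09, cress ~7.8e-13
print("correction:", max(scores, key=scores.get))  # -> actress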
@@ -155,7 +155,7 @@
\bottomrule
\end{tabular}
\end{table}
This allows measuring the likelihood of a sentence as:
This allows measuring the likelihood of a word within its context as:
\[
\begin{split}
\prob{\texttt{versatile \underline{actress} whose}} &= \prob{\texttt{actress} | \texttt{versatile}} \prob{\texttt{whose} | \texttt{actress}} = 210 \cdot 10^{-10} \\
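A small sketch of this contextual score, assuming a table of bigram probabilities is available; the two values below are placeholders chosen only so that their product reproduces the 210·10^-10 above:

# Score of a candidate word in its local context:
# P(candidate | previous word) * P(next word | candidate).
bigram = {
    # assumed bigram probabilities, consistent with the product shown above
    ("versatile", "actress"): 0.000021,
    ("actress", "whose"): 0.001,
}

def context_likelihood(prev: str, cand: str, nxt: str) -> float:
    return bigram[(prev, cand)] * bigram[(cand, nxt)]

print(context_likelihood("versatile", "actress", "whose"))  # 2.1e-08, i.e. 210e-10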
@@ -293,7 +293,7 @@
\begin{remark}[Perplexity intuition]
Perplexity can be seen as a measure of a language model's surprise when evaluating a sequence.
Alternatively, it can also be seen as a weighted average branching factor (i.e., average number of possible unique next words that follow any word, accounting for their probabilities). For instance, consider a vocabulary of digits and a training corpus where every digit appears with uniform probability $0.1$. The perplexity of any sequence using a 1-gram model is:
Alternatively, it can also be seen as a weighted average branching factor (i.e., average number of possible unique next words that follow any word accounting for their probabilities). For instance, consider a vocabulary of digits and a training corpus where every digit appears with uniform probability $0.1$. The perplexity of any sequence using a 1-gram model is:
\[ \texttt{PP}(w_{1..N}) = \left( 0.1^{N} \right)^{-\frac{1}{N}} = 10 \]
Now consider a training corpus where $0$ occurs $91\%$ of the time and the other digits $1\%$ of the time. The perplexity of the sequence \texttt{0 0 0 0 0 3 0 0 0 0} is:
\[ \texttt{PP}(\texttt{0 0 0 0 0 3 0 0 0 0}) = \left( 0.91^9 \cdot 0.01 \right)^{-\frac{1}{10}} \approx 1.73 \]
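Both numbers can be checked with a few lines of Python; the perplexity function below is a direct transcription of the 1-gram formula, and nothing else is assumed:

import math

def perplexity(probs: list[float]) -> float:
    # PP(w_1..N) = (prod_i P(w_i)) ** (-1/N) for a 1-gram model.
    return math.prod(probs) ** (-1 / len(probs))

# Uniform model: every digit has probability 0.1.
print(round(perplexity([0.1] * 10), 2))           # 10.0

# Skewed model: P(0) = 0.91, other digits 0.01; sequence 0 0 0 0 0 3 0 0 0 0.
print(round(perplexity([0.91] * 9 + [0.01]), 2))  # 1.73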