diff --git a/src/year2/natural-language-processing/sections/_italian_llm.tex b/src/year2/natural-language-processing/sections/_italian_llm.tex
index aba367b..20c722e 100644
--- a/src/year2/natural-language-processing/sections/_italian_llm.tex
+++ b/src/year2/natural-language-processing/sections/_italian_llm.tex
@@ -13,7 +13,7 @@
 Language model pre-trained on the Italian language.
 
 \begin{remark}
-    Minerva's pre-training corpus is actually composed by both Italian and English datasets.
+    Minerva's pre-training corpus is actually composed of both Italian and English datasets.
     Initially, English was used for benchmarking due to the lack of Italian benchmarks.
     However, it is also useful for tasks intrinsically in English (e.g., coding).
 \end{remark}
diff --git a/src/year2/natural-language-processing/sections/_speech.tex b/src/year2/natural-language-processing/sections/_speech.tex
index ffefe75..4b7f06c 100644
--- a/src/year2/natural-language-processing/sections/_speech.tex
+++ b/src/year2/natural-language-processing/sections/_speech.tex
@@ -103,7 +103,7 @@ \section{Tasks}
 \begin{description}
-    \item[Automatic speech recognition (ASP)]
+    \item[Automatic speech recognition (ASR)]
         Convert a sound signal into text.
 
         \begin{example}
@@ -160,7 +160,7 @@
 \begin{description}
     \item[Speech foundation model (SFM)] \marginnote{Speech foundation model (SFM)}
-        Transformer-based model pre-trained on speech. A common architecture is composed by:
+        Transformer-based model pre-trained on speech. A common architecture is composed of:
         \begin{descriptionlist}
             \item[Feature extractor]
                 Converts the waveform into a low-dimensional representation (e.g., by using convolutions).
@@ -179,7 +179,7 @@
     \item[Multimodal model] \marginnote{Multimodal model}
         Model able to handle multiple modalities (e.g., speech and text).
-        The main considerations to take into account when working with multimodel models are:
+        The main considerations to take into account when working with multimodal models are:
         \begin{descriptionlist}
             \item[Representation]
                 Decide how to encode different modalities into the same embedding space.
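
Note on the ASR hunk (_speech.tex @@ -103): the notes only define the task ("convert a sound signal into text"). For readers who want to see the task end to end, here is a minimal sketch using Hugging Face's pipeline API with a pretrained checkpoint; the model choice `openai/whisper-tiny` and the file name `audio.wav` are illustrative assumptions, not anything prescribed by the notes.

# Minimal ASR sketch: transcribe a local audio file with a pretrained model.
# Assumes `transformers` is installed and ffmpeg is available to decode audio.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = asr("audio.wav")   # path to any speech recording (hypothetical file)
print(result["text"])       # the transcription as plain text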
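
Note on the SFM hunk (_speech.tex @@ -160): the patched text describes a common SFM layout as a convolutional feature extractor followed by a Transformer. A minimal PyTorch sketch of that layout follows, assuming that layout only; the class name `TinySFM` and all sizes (kernel widths, strides, `d_model`, layer counts) are illustrative, not taken from any real model.

# Sketch of the SFM layout from the notes: strided 1-D convolutions turn the
# raw waveform into frame embeddings, which a Transformer encoder contextualises.
import torch
import torch.nn as nn

class TinySFM(nn.Module):
    def __init__(self, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        # Feature extractor: downsample the waveform into low-dimensional frames.
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2), nn.GELU(),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

    def forward(self, waveform):                           # (batch, samples)
        x = self.feature_extractor(waveform.unsqueeze(1))  # (batch, d_model, frames)
        x = x.transpose(1, 2)                              # (batch, frames, d_model)
        return self.encoder(x)                             # contextualised frames

model = TinySFM()
out = model(torch.randn(2, 16000))  # two 1-second clips at 16 kHz
print(out.shape)                    # torch.Size([2, 1599, 256])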
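
Note on the multimodal hunk (_speech.tex @@ -179): the "Representation" item says the modalities must be encoded into the same embedding space. One common way to realise this is a learned projection per modality into a shared, normalised space; the sketch below assumes exactly that (CLIP-style projections), and the class name `SharedSpace` and the dimensions are hypothetical.

# Sketch of a shared embedding space: each modality gets its own projection,
# after which speech and text vectors can be compared directly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpace(nn.Module):
    def __init__(self, speech_dim=256, text_dim=768, shared_dim=128):
        super().__init__()
        self.speech_proj = nn.Linear(speech_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)

    def forward(self, speech_feats, text_feats):
        s = F.normalize(self.speech_proj(speech_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return s @ t.T  # cosine similarity between every speech/text pair

sims = SharedSpace()(torch.randn(4, 256), torch.randn(4, 768))
print(sims.shape)  # torch.Size([4, 4]): one score per speech/text pair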