mirror of
https://github.com/NotXia/unibo-ai-notes.git
synced 2025-12-14 18:51:52 +01:00
Fix typos <noupdate>
@@ -13,7 +13,7 @@
 Language model pre-trained on the Italian language.
 
 \begin{remark}
-    Minerva's pre-training corpus is actually composed by both Italian and English datasets.
+    Minerva's pre-training corpus is actually composed of both Italian and English datasets.
 
     Initially, English was used for benchmarking due to the lack of Italian benchmarks. However, it is also useful for tasks intrinsically in English (e.g., coding).
 \end{remark}
@@ -103,7 +103,7 @@
 \section{Tasks}
 
 \begin{description}
-    \item[Automatic speech recognition (ASP)]
+    \item[Automatic speech recognition (ASR)]
        Convert a sound signal into text.
 
        \begin{example}
@@ -160,7 +160,7 @@
 
 \begin{description}
    \item[Speech foundation model (SFM)] \marginnote{Speech foundation model (SFM)}
-       Transformer-based model pre-trained on speech. A common architecture is composed by:
+       Transformer-based model pre-trained on speech. A common architecture is composed of:
        \begin{descriptionlist}
            \item[Feature extractor]
                Converts the waveform into a low-dimensional representation (e.g., by using convolutions).
@@ -179,7 +179,7 @@
    \item[Multimodal model] \marginnote{Multimodal model}
        Model able to handle multiple modalities (e.g., speech and text).
 
-       The main considerations to take into account when working with multimodel models are:
+       The main considerations to take into account when working with multimodal models are:
        \begin{descriptionlist}
            \item[Representation]
                Decide how to encode different modalities into the same embedding space.