mirror of https://github.com/NotXia/unibo-ai-notes.git (synced 2025-12-15 02:52:22 +01:00)
Fix typos <noupdate>
@@ -48,7 +48,7 @@ At level $i$, a value is assigned to the variable $X_i$ and
 constraints involving $X_1, \dots, X_i$ are checked.
 In case of failure, the path is not further explored.

-A problem of this approach is that it requires to backtrack in case of failure
+A problem with this approach is that it requires to backtrack in case of failure
 and reassign all the variables in the worst case.


@@ -81,7 +81,7 @@ If the domain of a variable becomes empty, the path is considered a failure and
 \item $X_1 = 2 \hspace{1cm} X_2 :: [\cancel{1}, \cancel{2}, 3] \hspace{1cm} X_3 :: [\cancel{1}, \cancel{2}, 3]$
 \item $X_1 = 2 \hspace{1cm} X_2 = 3 \hspace{1cm} X_3 :: [\cancel{1}, \cancel{2}, \cancel{3}]$
 \end{enumerate}
-As the domain of $X_3$ is empty, search on this branch fails and backtracking is required.
+As the domain of $X_3$ is empty, a search on this branch fails and backtracking is required.
 \end{example}


@@ -103,7 +103,7 @@ If the domain of a variable becomes empty, the path is considered a failure and
 Consider the variables and constraints:
 \[ X_1 :: [1, 2, 3] \hspace{0.5cm} X_2 :: [1, 2, 3] \hspace{0.5cm} X_3 :: [1, 2, 3] \hspace{1cm} X_1 < X_2 < X_3 \]

-We assign the variables in lexicographic order. At each step we have that:
+We assign the variables in lexicographic order. At each step, we have that:
 \begin{enumerate}
 \item $X_1 = 1 \hspace{1cm} X_2 :: [\cancel{1}, 2, \cancel{3}] \hspace{1cm} X_3 :: [\cancel{1}, 2, 3]$ \\
 Here, we assign $X_1=1$ and propagate to unassigned constraints.
@@ -120,7 +120,7 @@ If the domain of a variable becomes empty, the path is considered a failure and
 Consider the variables and constraints:
 \[ X_1 :: [1, 2, 3] \hspace{0.5cm} X_2 :: [1, 2, 3] \hspace{0.5cm} X_3 :: [1, 2, 3] \hspace{1cm} X_1 < X_2 < X_3 \]

-We assign the variables in lexicographic order. At each step we have that:
+We assign the variables in lexicographic order. At each step, we have that:
 \begin{enumerate}
 \item $X_1 = 1 \hspace{1cm} X_2 :: [\cancel{1}, 2, \cancel{3}] \hspace{1cm} X_3 :: [\cancel{1}, \cancel{2}, 3]$ \\
 Here, we assign $X_1=1$ and propagate to unassigned constraints.
@@ -252,5 +252,5 @@ This class of methods can be applied statically before the search or after each
 Generalization of arc/path consistency.
 If a problem with $n$ variables is $n$-consistent, the solution can be found without search.

-Usually it is not applicable as it has exponential complexity.
+Usually, it is not applicable as it has exponential complexity.
 \end{description}
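
As an illustrative aside, the propagation scheme these hunks describe (assign a variable, prune the domains of the unassigned ones, fail as soon as a domain becomes empty) can be sketched in Python. This is a minimal sketch written for this page, not code from the repository; it hard-codes the chain constraint $X_1 < X_2 < \dots$ of the example above.

def forward_checking(domains, assignment=()):
    """domains: candidate values for the still-unassigned variables of
    the chain X_1 < X_2 < ...; returns a full assignment or None."""
    if not domains:
        return assignment                          # all variables assigned
    for value in domains[0]:
        # Propagation: remove values incompatible with `value` from the
        # domains of the later variables.
        pruned = [[v for v in d if v > value] for d in domains[1:]]
        if any(len(d) == 0 for d in pruned):
            continue                               # empty domain: branch fails
        solution = forward_checking(pruned, assignment + (value,))
        if solution is not None:
            return solution
    return None

print(forward_checking([[1, 2, 3]] * 3))           # (1, 2, 3)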
@@ -45,7 +45,7 @@ an iteration of the minimax algorithm can be described as follows:

 \item[Propagation]
 Starting from the parents of the leaves, the scores are propagated upwards
-by labeling the parents based on the children's score.
+by labeling the parents based on the children's scores.

 Given an unlabeled node $m$, if $m$ is at a \textsc{Max} level, its label is the maximum of its children's score.
 Otherwise (\textsc{Min} level), the label is the minimum of its children's score.
@@ -90,7 +90,7 @@ an iteration of the minimax algorithm can be described as follows:

 \section{Alpha-beta cuts}
 \marginnote{Alpha-beta cuts}
-Alpha-beta cuts (pruning) allows to prune subtrees whose state will never be selected (when playing optimally).
+Alpha-beta pruning (cuts) allows to prune subtrees whose state will never be selected (when playing optimally).
 $\alpha$ represents the best choice found for \textsc{Max}.
 $\beta$ represents the best choice found for \textsc{Min}.

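
As an illustrative aside, here is a minimal Python sketch of minimax with alpha-beta cuts matching the description above; `children` and `score` are hypothetical problem-specific callbacks, not part of the notes.

import math

def alphabeta(state, depth, alpha, beta, is_max, children, score):
    kids = children(state)
    if depth == 0 or not kids:
        return score(state)
    if is_max:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, score))
            alpha = max(alpha, value)              # best choice found for Max
            if alpha >= beta:
                break                              # cut: Min will never allow this subtree
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, score))
        beta = min(beta, value)                    # best choice found for Min
        if beta <= alpha:
            break                                  # cut: Max will never choose this subtree
    return value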
@@ -32,7 +32,7 @@ Intelligence is defined as the ability to perceive or infer information and to r
 \item[Symbolic AI (top-down)] \marginnote{Symbolic AI}
 Symbolic representation of knowledge, understandable by humans.

-\item[Connectionist approach (bottom up)] \marginnote{Connectionist approach}
+\item[Connectionist approach (bottom-up)] \marginnote{Connectionist approach}
 Neural networks. Knowledge is encoded and not understandable by humans.
 \end{description}

@@ -124,7 +124,7 @@ A \textbf{feed-forward neural network} is composed of multiple layers of neurons
 The first layer is the input layer, while the last is the output layer.
 Intermediate layers are hidden layers.

-The expressivity of a neural networks increases when more neurons are used:
+The expressivity of a neural network increases when more neurons are used:
 \begin{descriptionlist}
 \item[Single perceptron]
 Able to compute a linear separation.
@@ -158,7 +158,7 @@ The expressivity of a neural networks increases when more neurons are used:
 \item[Deep learning] \marginnote{Deep learning}
 Neural network with a large number of layers and neurons.
 The learning process is hierarchical: the network exploits simple features in the first layers and
-synthesis more complex concepts while advancing through the layers.
+synthesizes more complex concepts while advancing through the layers.
 \end{description}


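
As an illustrative aside, the claim that a single perceptron computes a linear separation can be checked directly: a threshold on a weighted sum splits the plane with a line. The weights below are hand-picked for this sketch, not taken from the notes.

def perceptron(x1, x2, w1=1.0, w2=1.0, threshold=1.5):
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

# Linearly separable function (logical AND): only (1, 1) lies above the line.
assert [perceptron(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]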
@@ -12,9 +12,9 @@
 In other words, for each $s \in \mathcal{S}$, $\mathcal{N}(s) \subseteq \mathcal{S}$.

 \begin{example}[Travelling salesman problem]
-Problem: find an Hamiltonian tour of minimum cost in an undirected graph.
+Problem: find a Hamiltonian tour of minimum cost in an undirected graph.

-A possible neighborhood of a state applies the $k$-exchange that guarantees to maintain an Hamiltonian tour.
+A possible neighborhood of a state applies the $k$-exchange that guarantees to maintain a Hamiltonian tour.
 \begin{figure}[ht]
 \begin{subfigure}{.5\textwidth}
 \centering
@@ -37,7 +37,7 @@

 \item[Global optima]
 Given an evaluation function $f$,
-a global optima (maximization case) is a state $s_\text{opt}$ such that:
+a global optimum (maximization case) is a state $s_\text{opt}$ such that:
 \[ \forall s \in \mathcal{S}: f(s_\text{opt}) \geq f(s) \]

 Note: a larger neighborhood usually allows to obtain better solutions.
@@ -55,7 +55,7 @@
 \marginnote{Iterative improvement (hill climbing)}
 Algorithm that only performs moves that improve the current solution.

-It does not keep track of the explored states (i.e. may return in a previously visited state) and
+It does not keep track of the explored states (i.e. may return to a previously visited state) and
 stops after reaching a local optima.

 \begin{algorithm}
@@ -169,7 +169,7 @@ moves can be stored instead but, with this approach, some still not visited solu
 \marginnote{Iterated local search}
 Based on two steps:
 \begin{descriptionlist}
-\item[Subsidiary local search steps] Efficiently reach a local optima (intensification).
+\item[Subsidiary local search steps] Efficiently reach a local optimum (intensification).
 \item[Perturbation steps] Escape from a local optima (diversification).
 \end{descriptionlist}
 In addition, an acceptance criterion controls the two steps.
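
As an illustrative aside, iterative improvement (hill climbing) as described in these hunks fits in a few lines; `neighbors` and `f` are hypothetical callbacks. The loop accepts only improving moves and stops at a local optimum, which is exactly what the perturbation steps of iterated local search are meant to escape.

def hill_climbing(initial, neighbors, f):
    current = initial
    while True:
        best = max(neighbors(current), key=f, default=None)
        if best is None or f(best) <= f(current):
            return current                 # local optimum: no improving move
        current = best                     # only improving moves are performed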
@@ -194,7 +194,7 @@ Population based meta heuristics are built on the following concepts:
 \begin{descriptionlist}
 \item[Adaptation] Organisms are suited to their environment.
 \item[Inheritance] Offspring resemble their parents.
-\item[Natural selection] Fit organisms have many offspring, others become extinct.
+\item[Natural selection] Fit organisms have many offspring while others become extinct.
 \end{descriptionlist}

 \begin{table}[ht]
@@ -244,10 +244,10 @@ Genetic operators are:
 \includegraphics[width=0.2\textwidth]{img/_genetic_mutation.pdf}
 \end{center}
 \item[Proportional selection]
-Probability of a individual to be chosen as parent of the next offspring.
+Probability of an individual to be chosen as parent of the next offspring.
 Depends on the fitness.
 \item[Generational replacement]
-Create the new generation. Possibile approaches are:
+Create the new generation. Possible approaches are:
 \begin{itemize}
 \item Completely replace the old generation with the new one.
 \item Keep the best $n$ individual from the new and old population.

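
As an illustrative aside, proportional (roulette-wheel) selection can be sketched with the standard library; the fitness function is a hypothetical callback assumed to return positive values.

import random

def proportional_selection(population, fitness):
    # The probability of an individual being chosen as a parent is
    # proportional to its fitness.
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=1)[0]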
@@ -124,7 +124,7 @@ The direction of the search can be:

 \subsection{Deductive planning}
 \marginnote{Deductive planning}
-Formulates the planning problem using first order logic to represent states, goals and actions.
+Formulates the planning problem using first-order logic to represent states, goals and actions.
 Plans are generated as theorem proofs.

 \subsubsection{Green's formulation}
@@ -163,7 +163,7 @@ The main concepts are:
 \end{example}

 \item[Frame axioms]
-Besides the effects of actions, each state also have to define for all non-changing fluents their frame axioms.
+Besides the effects of actions, each state also has to define for all non-changing fluents their frame axioms.
 If the problem is complex, the number of frame axioms becomes unreasonable.
 \begin{example}[Moving blocks]
 \[ \texttt{on(U, V, S)} \land \texttt{diff(U, X)} \rightarrow \texttt{on(U, V, do(MOVE(X, Y, Z), S))} \]
@@ -247,7 +247,7 @@ Kowalsky's formulation avoids the frame axioms problem by using a set of fixed p
 Actions can be described as:
 \[ \texttt{poss(S)} \land \texttt{pact(A, S)} \rightarrow \texttt{poss(do(A, S))} \]

-In the Kowalsky's formulation, each action requires a frame assertion (in Green's formulation, each state requires frame axioms).
+In Kowalsky's formulation, each action requires a frame assertion (in Green's formulation, each state requires frame axioms).

 \begin{example}[Moving blocks]
 An initial state can be described by the following axioms:\\[0.5em]
@@ -381,7 +381,7 @@ def strips(problem):
 Since there are non-deterministic choices, the search space might become very large.
 Heuristics can be used to avoid this.

-Conjunction of goals are solved separately, but this can lead to the \marginnote{Sussman anomaly} \textbf{Sussman anomaly}
+Conjunction of goals is solved separately, but this can lead to the \marginnote{Sussman anomaly} \textbf{Sussman anomaly}
 where a sub-goal destroys what another sub-goal has done.
 For this reason, when a conjunction is encountered, it is not immediately popped from the goal stack
 and is left as a final check.
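
As an illustrative aside, the operator application underlying STRIPS can be sketched with sets of ground facts; the blocks-world facts below are hypothetical examples for this page, not the notes' formulation.

def apply_operator(state, preconditions, add_list, delete_list):
    if not preconditions <= state:                 # preconditions must hold
        raise ValueError("operator not applicable")
    return (state - delete_list) | add_list        # delete, then add effects

s0 = {"on(a,table)", "on(b,table)", "clear(a)", "clear(b)"}
s1 = apply_operator(s0,
                    preconditions={"clear(a)", "clear(b)", "on(a,table)"},
                    add_list={"on(a,b)"},
                    delete_list={"clear(b)", "on(a,table)"})
print(s1)   # {'on(b,table)', 'clear(a)', 'on(a,b)'} (set order may vary)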
@@ -698,7 +698,7 @@ In macro-operators, two types of operators are defined:
 \item[Macro] Set of atomic operators. Before execution, this type of operator has to be decomposed.
 \begin{description}
 \item[Precompiled decomposition]
-The decomposition is known and described along side the preconditions and effects of the operator.
+The decomposition is known and described alongside the preconditions and effects of the operator.
 \item[Planned decomposition]
 The planner has to synthesize the atomic operators that compose a macro operator.
 \end{description}
@@ -708,7 +708,7 @@ In macro-operators, two types of operators are defined:
 \begin{itemize}
 \item $X$ must be the effect of at least an atomic action in $P$ and should be protected until the end of $P$.
 \item Each precondition of the actions in $P$ must be guaranteed by previous actions or be a precondition of $A$.
-\item $P$ must not threat any causal link.
+\item $P$ must not threaten any causal link.
 \end{itemize}

 Moreover, when a macro action $A$ is replaced with its decomposition $P$:
@@ -747,7 +747,7 @@ def hdpop(initial_state, goal, actions, decomposition_methods):

 \section{Conditional planning}
 \marginnote{Conditional planning}
-Conditional planning is based on the open world assumption where what is not in the initial state is unknown.
+Conditional planning is based on the open-world assumption where what is not in the initial state is unknown.
 It generates a different plan for each source of uncertainty and therefore has exponential complexity.

 \begin{description}
@@ -768,7 +768,7 @@ It generates a different plan for each source of uncertainty and therefore has e


 \section{Reactive planning}
-Reactive planners are on-line algorithms able to interact with the dynamicity the world.
+Reactive planners are online algorithms able to interact with the dynamicity of the world.

 \subsection{Pure reactive systems}
 \marginnote{Pure reactive systems}
@@ -777,7 +777,7 @@ The choice of the action is predictable. Therefore, this approach is not suited

 \subsection{Hybrid systems}
 \marginnote{Hybrid systems}
-Hybrid planners integrate the generative and reactive approach.
+Hybrid planners integrate the generative and reactive approaches.
 The steps the algorithm does are:
 \begin{itemize}
 \item Generates a plan to achieve the goal.

@@ -76,7 +76,7 @@ def expand(node, problem):
 \subsection{Strategies}
 \begin{description}
 \item[Non-informed strategy] \marginnote{Non-informed strategy}
-Domain knowledge not available. Usually does an exhaustive search.
+Domain knowledge is not available. Usually does an exhaustive search.

 \item[Informed strategy] \marginnote{Informed strategy}
 Use domain knowledge by using heuristics.
@@ -112,7 +112,7 @@ Always expands the least deep node. The fringe is implemented as a queue (FIFO).
 \hline
 \textbf{Completeness} & Yes \\
 \hline
-\textbf{Optimality} & Only with uniform cost (i.e. all edges have same cost) \\
+\textbf{Optimality} & Only with uniform cost (i.e. all edges have the same cost) \\
 \hline
 \textbf{\makecell{Time and space\\complexity}}
 & $O(b^d)$, where the solution depth is $d$ and the branching factor is $b$ (i.e. each non-leaf node has $b$ children) \\
@@ -238,7 +238,7 @@ estimate the effort needed to reach the final goal.
 \subsection{Best-first search}
 \marginnote{Best-first seacrh}
 Uses heuristics to compute the desirability of the nodes (i.e. how close they are to the goal).
-The fringe is ordered according the estimated scores.
+The fringe is ordered according to the estimated scores.


 \begin{description}

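
As an illustrative aside, best-first search keeps the fringe as a priority queue ordered by the heuristic estimate; in this sketch `expand`, `h` and `goal_test` are hypothetical problem callbacks.

import heapq

def best_first_search(start, goal_test, expand, h):
    fringe = [(h(start), 0, start)]        # (score, tie-breaker, node)
    counter = 1
    visited = set()
    while fringe:
        _, _, node = heapq.heappop(fringe) # most desirable node first
        if goal_test(node):
            return node
        if node in visited:
            continue
        visited.add(node)
        for child in expand(node):
            heapq.heappush(fringe, (h(child), counter, child))
            counter += 1
    return None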
@@ -2,7 +2,7 @@

 \begin{description}
 \item[Swarm intelligence] \marginnote{Swarm intelligence}
-Group of locally-interacting agents that
+Group of locally interacting agents that
 shows an emergent behavior without a centralized control system.

 A swarm intelligent system has the following features:
@@ -15,7 +15,7 @@
 \item The system adapts to changes.
 \end{itemize}

-Agents interact between each other and obtain positive and negative feedbacks.
+Agents interact with each other and obtain positive and negative feedbacks.

 \item[Stigmergy] \marginnote{Stigmergy}
 Form of indirect communication where an agent modifies the environment and the others react to it.
@@ -45,7 +45,7 @@ They also tend to prefer paths marked with the highest pheromone concentration.
 \begin{itemize}
 \item Nodes are cities.
 \item Edges are connections between cities.
-\item A solution is an Hamiltonian path in the graph.
+\item A solution is a Hamiltonian path in the graph.
 \item Constraints to avoid sub-cycles (i.e. avoid visiting a city multiple times).
 \end{itemize}
 \end{example}
@@ -123,7 +123,7 @@ The algorithm has the following phases:
 \item[Initialization]
 The initial nectar source of each bee is determined randomly.
 Each solution (nectar source) is a vector $\vec{x}_m \in \mathbb{R}^n$ and
-each of its component is initialized constrained to a lower ($l_i$) and upper ($u_i$) bound:
+each of its components is initialized constrained to a lower ($l_i$) and upper ($u_i$) bound:
 \[ \vec{x}_m\texttt{[}i\texttt{]} = l_i + \texttt{rand}(0, 1) \cdot (u_i - l_i) \]

 \item[Employed bees]
@@ -139,7 +139,7 @@ The algorithm has the following phases:
 Onlooker bees stochastically choose their food source.
 Each food source $\vec{x}_m$ has a probability associated to it defined as:
 \[ p_m = \frac{\texttt{fit}(\vec{x}_m)}{\sum_{i=1}^{n_\text{bees}} \texttt{fit}(\vec{x}_i)} \]
-This provides a positive feedback as more promising solutions have a higher probability to be chosen.
+This provides a positive feedback as more promising solutions have a higher probability of being chosen.

 \item[Scout bees]
 Scout bees choose a nectar source randomly.
@@ -166,7 +166,7 @@ The algorithm has the following phases:
 \section{Particle swarm optimization (PSO)}
 \marginnote{Particle swarm optimization (PSO)}

-In a bird flock, the movement of the individuals tend to:
+In a bird flock, the movement of the individuals tends to:
 \begin{itemize}
 \item Follow the neighbors.
 \item Stay in the flock.
@@ -174,8 +174,8 @@ In a bird flock, the movement of the individuals tend to:
 \end{itemize}
 However, a model based on these rules does not have a common objective.

-PSO introduces as common objective the search of food.
-Each individual that finds food can:
+PSO introduces as a common objective the search for food.
+Each individual who finds food can:
 \begin{itemize}
 \item Move away from the flock and reach the food.
 \item Stay in the flock.
@@ -197,7 +197,7 @@ Applied to optimization problems, the bird flock metaphor can be interpreted as:
 \end{descriptionlist}

 Given a cost function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ to minimize (gradient is not known),
-PSO initializes a swarm of particles (agents) whose movement is guided by the best known position.
+PSO initializes a swarm of particles (agents) whose movement is guided by the best-known position.
 Each particle is described by:
 \begin{itemize}
 \item Its position $\vec{x}_i \in \mathbb{R}^n$ in the search space.

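
As an illustrative aside, the PSO movement rule for a single particle, guided by its personal best and the best-known position of the swarm, can be sketched as below; the coefficient values are typical defaults, not taken from the notes.

import random

def pso_step(x, v, personal_best, global_best, w=0.7, c1=1.5, c2=1.5):
    new_v = [w * v[i]
             + c1 * random.random() * (personal_best[i] - x[i])   # cognitive pull
             + c2 * random.random() * (global_best[i] - x[i])     # social pull
             for i in range(len(x))]
    new_x = [x[i] + new_v[i] for i in range(len(x))]
    return new_x, new_v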
Binary file not shown.
Binary file not shown.
@@ -7,7 +7,7 @@
 \item[Business process management] \marginnote{Business process management}
 Methods to design, manage and analyze business processes by mining data contained in information systems.

-Business processes help in making decisions and automations.
+Business processes help in making decisions and automation.

 \item[Business process lifecycle] \phantom{}
 \begin{description}
@@ -50,7 +50,7 @@



-\section{Business process modelling}
+\section{Business process modeling}

 \begin{description}
 \item[Activity instance] \marginnote{Activity instance}
@@ -73,30 +73,30 @@
 \end{description}


-\subsection{Control flow modelling}
+\subsection{Control flow modeling}

 \begin{description}
-\item[Process modelling types] \phantom{}
+\item[Process modeling types] \phantom{}
 \begin{description}
 \item[Procedural vs declarative] \phantom{}
 \begin{description}
-\item[Procedural] \marginnote{Procedural modelling}
+\item[Procedural] \marginnote{Procedural modeling}
 Based on a strict ordering of the steps.
 Uses conditional choices, loops, parallel execution, events.

 Subject to the spaghetti-like process problem.

-\item[Declarative] \marginnote{Declarative modelling}
+\item[Declarative] \marginnote{Declarative modeling}
 Based on the properties that should hold during execution.
-Uses concepts as: executions, expected executions, prohibited executions.
+Uses concepts such as executions, expected executions, prohibited executions.
 \end{description}

 \item[Closed vs open] \phantom{}
 \begin{description}
-\item[Closed] \marginnote{Closed modelling}
+\item[Closed] \marginnote{Closed modeling}
 The execution of non-modelled activities is prohibited.

-\item[Open] \marginnote{Open modelling}
+\item[Open] \marginnote{Open modeling}
 Constraints to allow non-modelled activities.
 \end{description}
 \end{description}
@@ -104,13 +104,13 @@

 The most common combination of approaches are:
 \begin{descriptionlist}
-\item[Closed procedural process modelling]
-\item[Open declarative process modelling]
+\item[Closed procedural process modeling]
+\item[Open declarative process modeling]
 \end{descriptionlist}



-\section{Closed procedural process modelling}
+\section{Closed procedural process modeling}

 \begin{description}
 \item[Process model]
@@ -230,14 +230,14 @@ The most common combination of approaches are:
 \begin{table}[H]
 \centering
 \begin{tabular}{c|c}
-\textbf{Petri nets} & \textbf{Business process modelling} \\
+\textbf{Petri nets} & \textbf{Business process modeling} \\
 \hline
 Petri net & Process model \\
 Transitions & Activity models \\
 Tokens & Instances \\
 Transition firing & Activity execution \\
 \end{tabular}
-\caption{Petri nets and business process modelling concepts equivalence}
+\caption{Petri nets and business process modeling concepts equivalence}
 \end{table}


@@ -299,7 +299,7 @@ De-facto standard for business process representation.
 Drawn as a thin-bordered circle.

 \item[Intermediate event]
-Event occurring after the start of a process, but before its end.
+Event occurring after the start of a process but before its end.

 \item[End event]
 Indicates the end of a process and optionally provides its result.
@@ -355,7 +355,7 @@ De-facto standard for business process representation.



-\section{Open declarative process modelling}
+\section{Open declarative process modeling}

 Define formal properties for process models (i.e. more formal than procedural methods).
 Properties defined in term of the evolution of the process (similar to the evolution of the world in modal logics)
@@ -423,7 +423,7 @@ Based on constraints that must hold in every possible execution of the system.
 \end{description}

 \item[Semantics]
-The semantic of the constraints can be defined using LTL.
+The semantics of the constraints can be defined using LTL.

 \item[Verifiable properties] \phantom{}
 \begin{description}
@@ -497,7 +497,7 @@ Based on constraints that must hold in every possible execution of the system.
 \item[Process discovery] \marginnote{Process discovery}
 Learn a process model representative of the input event log.

-More formally, a process discovery algorithm is a function that maps an event log into a business process modelling language.
+More formally, a process discovery algorithm is a function that maps an event log into a business process modeling language.
 In our case, we map logs into Petri nets (preferably workflow nets).

 \begin{remark}
@@ -591,7 +591,7 @@ Based on constraints that must hold in every possible execution of the system.
 \item[Model evaluation]
 Different models can capture the same process described in a log.
 This allows for models that are capable of capturing all the possible traces of a log but
-are unable provide any insight (e.g. flower Petri net).
+are unable to provide any insight (e.g. flower Petri net).

 \begin{figure}[H]
 \centering
@@ -608,7 +608,7 @@ Based on constraints that must hold in every possible execution of the system.
 \item[Precision] \marginnote{Precision}
 How the model is able to capture rare cases.
 \item[Generalization] \marginnote{Generalization}
-How the model generalize on the training traces.
+How the model generalizes on the training traces.
 \end{descriptionlist}
 \end{description}

@@ -618,7 +618,7 @@ Based on constraints that must hold in every possible execution of the system.

 \begin{description}
 \item[Descriptive model discrepancies] \marginnote{Descriptive model}
-The model need to be improved.
+The model needs to be improved.

 \item[Prescriptive model discrepancies] \marginnote{Prescriptive model}
 The traces need to be checked as the model cannot be changed (e.g. model of the law).
@@ -632,14 +632,14 @@ Based on constraints that must hold in every possible execution of the system.
 \begin{description}
 \item[Token replay] \marginnote{Token replay}
 Given a trace and a Petri net, the trace is replayed on the model by moving tokens around.
-The trace is conform if the end event can be reached, otherwise it is not.
+The trace is conform if the end event can be reached, otherwise, it is not.

 A modified version of token replay allows to add or remove tokens when the trace is stuck on the Petri net.
 These external interventions are tracked and used to compute a fitness score (i.e. degree of conformance).

 Limitations:
 \begin{itemize}
-\item Fitness tend to be high for extremely problematic logs.
+\item Fitness tends to be high for extremely problematic logs.
 \item If there are too many deviations, the model is flooded with tokens and may result in unexpected behaviors.
 \item It is a Petri net specific algorithm.
 \end{itemize}
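
As an illustrative aside, a much simplified token replay can be sketched in Python. The net encoding (an activity consumes from and produces into sets of places) and the fitness proxy are assumptions of this sketch, not the full algorithm.

def token_replay(trace, transitions, initial_marking, final_places):
    marking = dict(initial_marking)        # place -> number of tokens
    missing = 0
    for activity in trace:
        consumed, produced = transitions[activity]
        for place in consumed:
            if marking.get(place, 0) > 0:
                marking[place] -= 1
            else:
                missing += 1               # token added externally (deviation)
        for place in produced:
            marking[place] = marking.get(place, 0) + 1
    for place in final_places:             # the end event must be reachable
        if marking.get(place, 0) > 0:
            marking[place] -= 1
        else:
            missing += 1
    remaining = sum(marking.values())      # leftover tokens are also deviations
    return missing == 0 and remaining == 0, missing, remaining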
@@ -251,12 +251,12 @@ The following algorithms can be employed:
 \subsection{Open world assumption}

 \begin{description}
-\item[Open world assumption] \marginnote{Open world assumption}
-If a sentence cannot be inferred, its truth values is unknown.
+\item[Open-world assumption] \marginnote{Open-world assumption}
+If a sentence cannot be inferred, its truth value is unknown.
 \end{description}

-Description logics are based on the open world assumption.
-To reason in open world assumption, all the possible models are split upon encountering an unknown facts
+Description logics are based on the open-world assumption.
+To reason in open world assumption, all the possible models are split upon encountering unknown facts
 depending on the possible cases (Oedipus example).


@@ -45,7 +45,7 @@ RETE is an efficient algorithm for implementing rule-based systems.
 A pattern can test:
 \begin{descriptionlist}
 \item[Intra-element features] Features that can be tested directly on a fact.
-\item[Inter-element features] Features that involves more facts.
+\item[Inter-element features] Features that involve more facts.
 \end{descriptionlist}

 \item[Conflict set] \marginnote{Conflict set}
@@ -60,11 +60,11 @@ RETE is an efficient algorithm for implementing rule-based systems.
 \begin{descriptionlist}
 \item[Alpha-network] \marginnote{Alpha-network}
 For intra-element features.
-The outcome is stored into alpha-memories and used by the beta network.
+The outcome is stored in alpha-memories and used by the beta network.

 \item[Beta-network] \marginnote{Beta-network}
 For inter-element features.
-The outcome is stored into beta-memories and corresponds to the conflict set.
+The outcome is stored in beta-memories and corresponds to the conflict set.
 \end{descriptionlist}
 If more rules use the same pattern, the node of that pattern is reused and possibly outputting to different memories.
 \end{description}
@@ -83,7 +83,7 @@ The best approach depends on the use case.

 \subsection{Execution}
 By default, RETE executes all the rules in the agenda and
-then checks possible side effects that modified the working memory in a second moment.
+then checks for possible side effects that modify the working memory in a second moment.

 Note that it is very easy to create loops.

@@ -162,7 +162,7 @@ RETE-based rule engine that uses Java.
 Event detected outside an event processing system (e.g. a sensor). It does not provide any information alone.

 \item[Complex event] \marginnote{Complex event}
-Event generated by an event processing system and provides higher informative payload.
+Event generated by an event processing system and provides a higher informative payload.

 \item[Complex event processing (CEP)] \marginnote{Complex event processing}
 Paradigm for dealing with a large amount of information.
@@ -185,7 +185,7 @@ Drools supports CEP by representing events as facts.
 \end{description}

 \item[Expiration]
-Mechanism to specify an expiration time to events and discard them from the working memory.
+Mechanism to specify an expiration time for events and discard them from the working memory.

 \item[Temporal reasoning]
 Allen's temporal operators for temporal reasoning.

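
As an illustrative aside, the alpha/beta split that RETE is built on can be mimicked in a few lines: alpha memories hold the facts passing intra-element tests, and a beta join over shared values (inter-element features) yields the conflict set. The grandparent rule and the tuple encoding of facts are hypothetical, and this deliberately omits RETE's incremental memories.

facts = [("parent", "ann", "bob"), ("parent", "bob", "carl"), ("age", "ann", 70)]

alpha_parent = [f for f in facts if f[0] == "parent"]     # alpha memory

# Beta join for: parent(X, Y), parent(Y, Z) -> grandparent(X, Z)
conflict_set = [(a[1], b[2])
                for a in alpha_parent
                for b in alpha_parent
                if a[2] == b[1]]
print(conflict_set)   # [('ann', 'carl')]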
@@ -17,7 +17,7 @@
 Properties:
 \begin{itemize}
 \item Should be applicable to almost any special domain.
-\item Combining general concepts should not incur in inconsistences.
+\item Combining general concepts should not incur in inconsistencies.
 \end{itemize}

 Approaches to create ontologies:
@@ -35,7 +35,7 @@
 \item[Category] \marginnote{Category}
 Used in human reasoning when the goal is category-driven (in contrast to specific-instance-driven).

-In first order logic, categories can be represented through:
+In first-order logic, categories can be represented through:
 \begin{descriptionlist}
 \item[Predicate] \marginnote{Predicate categories}
 A predicate to tell if an object belongs to a category
@@ -158,7 +158,7 @@ A property of objects.

 \section{Semantic networks}
 \marginnote{Semantic networks}
-Graphical representation of objects and categories connected through labelled links.
+Graphical representation of objects and categories connected through labeled links.

 \begin{figure}[h]
 \centering
@@ -189,7 +189,7 @@ Graphical representation of objects and categories connected through labelled li

 \begin{description}
 \item[Limitations]
-Compared to first order logic, semantic networks do not have:
+Compared to first-order logic, semantic networks do not have:
 \begin{itemize}
 \item Negations.
 \item Universally and existentially quantified properties.
@@ -202,7 +202,7 @@ Graphical representation of objects and categories connected through labelled li
 This approach is powerful but does not have a corresponding logical meaning.

 \item[Advantages]
-With semantic networks it is easy to attach default properties to categories and
+With semantic networks, it is easy to attach default properties to categories and
 override them on the objects (i.e. \texttt{Legs} of \texttt{John}).
 \end{description}

@@ -213,7 +213,7 @@ Graphical representation of objects and categories connected through labelled li
 Knowledge that describes an object in terms of its properties.
 Each frame has:
 \begin{itemize}
-\item An unique name
+\item A unique name
 \item Properties represented as pairs \texttt{<slot - filler>}
 \end{itemize}

@@ -3,7 +3,7 @@
 \begin{description}
 \item[Probabilistic logic programming] \marginnote{Probabilistic logic programming}
 Adds probability distributions over logic programs allowing to define different worlds.
-Joint distributions can also be defined over worlds and allows to answer to queries.
+Joint distributions can also be defined over worlds and allow to answer to queries.
 \end{description}


@@ -44,7 +44,7 @@ It may be useful to first have a look at the "Logic programming" section of
 Variables appearing in a fact are quantified universally.
 \[ \texttt{A(X).} \equiv \forall \texttt{X}: \texttt{A(X)} \]
 \item[Rules]
-Variables appearing the the body only are quantified existentially.
+Variables appearing in the body only are quantified existentially.
 Variables appearing in both the head and the body are quantified universally.
 \[ \texttt{A(X) :- B(X, Y).} \equiv \forall \texttt{X}, \exists \texttt{Y} : \texttt{A(X)} \Leftarrow \texttt{B(X, Y)} \]

@@ -72,7 +72,7 @@ It may be useful to first have a look at the "Logic programming" section of
 \end{descriptionlist}

 \item[SLD resolution] \marginnote{SLD}
-Prolog uses SLD resolution with the following choices:
+Prolog uses a SLD resolution with the following choices:
 \begin{descriptionlist}
 \item[Left-most] Always proves the left-most literal first.
 \item[Depth-first] Applies the predicates following the order of definition.
@@ -204,7 +204,7 @@ Therefore, if \texttt{qj, \dots, qn} fails, there won't be backtracking and \tex
 Adding new axioms to the program may change the set of valid theorems.
 \end{description}

-As first-order logic in undecidable, closed-world assumption cannot be directly applied in practice.
+As first-order logic is undecidable, the closed-world assumption cannot be directly applied in practice.

 \item[Negation as failure] \marginnote{Negation as failure}
 A negated atom $\lnot A$ is considered true iff $A$ fails in finite time:
@@ -222,9 +222,8 @@ Therefore, if \texttt{qj, \dots, qn} fails, there won't be backtracking and \tex
 \begin{itemize}
 \item If \texttt{L$_i$} is positive, apply the normal SLD resolution.
 \item If \texttt{L$_i$} = $\lnot A$, prove that $A$ fails in finite time.
 If it succeeds, \texttt{L$_i$} fails.
 \end{itemize}
-\item Solve the goal \texttt{:- L$_1$, \dots, L$_{i-1}$, L$_{i+1}$, \dots L$_m$}.
+\item Solve the remaining goal \texttt{:- L$_1$, \dots, L$_{i-1}$, L$_{i+1}$, \dots, L$_m$}.
 \end{enumerate}

 \begin{theorem}
@@ -407,7 +406,7 @@ father(mario, paola).
 The operator \texttt{T =.. L} unifies \texttt{L} with a list where
 its head is the head of \texttt{T} and the tail contains the remaining arguments of \texttt{T}
 (i.e. puts all the components of a predicate into a list).
-Only one between \texttt{T} and \texttt{L} may be a variable.
+Only one between \texttt{T} and \texttt{L} can be a variable.

 \begin{example} \phantom{} \\
 \begin{minipage}{0.5\textwidth}
@@ -458,7 +457,7 @@ father(mario, paola).

 Note that \texttt{:- assert((p(X)))} quantifies \texttt{X} existentially as it is a query.
 If it is not ground and added to the database as is,
-is becomes a clause and therefore quantified universally: $\forall \texttt{X}: \texttt{p(X)}$.
+it becomes a clause and therefore quantified universally: $\forall \texttt{X}: \texttt{p(X)}$.

 \begin{example}[Lemma generation] \phantom{}
 \begin{lstlisting}[language={}]
@@ -473,7 +472,7 @@ father(mario, paola).
 generate_lemma(T) :- assert(T).
 \end{lstlisting}

-\texttt{generate\_lemma/1} allows to add to the clauses database all the intermediate steps to compute the Fibonacci sequence
+The custom defined \texttt{generate\_lemma/1} allows to add to the clauses database all the intermediate steps to compute the Fibonacci sequence
 (similar concept to dynamic programming).
 \end{example}

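
As an illustrative aside, the lemma-generation idea in the hunk above is the logic-programming analogue of memoization; a Python rendering of the same Fibonacci trick:

lemmas = {0: 0, 1: 1}                      # "asserted" intermediate results

def fib(n):
    if n not in lemmas:
        lemmas[n] = fib(n - 1) + fib(n - 2)    # prove once, store the lemma
    return lemmas[n]

print(fib(30))   # 832040, computed in linear time thanks to the stored lemmas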
@@ -19,7 +19,7 @@

 \item[Uniform resource identifier] \marginnote{URI}
 Naming system to uniquely identify concepts.
-Each URI correspond to one and only one concept, but multiple URIs can refer to the same concept.
+Each URI corresponds to one and only one concept, but multiple URIs can refer to the same concept.

 \item[XML] \marginnote{XML}
 Markup language to represent hierarchically structured data.
@@ -74,7 +74,7 @@ xmlns:contact=http://www.w3.org/2000/10/swap/pim/contact#>
 \item[Database similarities]
 RDF aims to integrate different databases:
 \begin{itemize}
-\item A DB record is a RDF node.
+\item A DB record is an RDF node.
 \item The name of a column can be seen as a property type.
 \item The value of a field corresponds to the value of a property.
 \end{itemize}
@@ -87,8 +87,8 @@ xmlns:contact=http://www.w3.org/2000/10/swap/pim/contact#>
 Language to query different data sources that support RDF (natively or through a middleware).

 \item[Ontology web language (OWL)] \marginnote{Ontology web language (OWL)}
-Ontology based on RDF and description logic fragments.
-Three level of expressivity are available:
+Ontology-based on RDF and description logic fragments.
+Three levels of expressivity are available:
 \begin{itemize}
 \item OWL lite.
 \item OWL DL.

@@ -5,12 +5,12 @@

 \begin{description}
 \item[State] \marginnote{State}
-The current state of the world can be represented as a set of propositions that are true according the observation of an agent.
+The current state of the world can be represented as a set of propositions that are true according to the observation of an agent.

 The union of a countable sequence of states represents the evolution of the world. Each proposition is distinguished by its time step.

 \begin{example}
-A child has a bow and an arrow, then shoots the arrow.
+A child has a bow and an arrow and then shoots the arrow.
 \[
 \begin{split}
 \text{KB}^0 &= \{ \texttt{hasBow}^0, \texttt{hasArrow}^0 \} \\
@@ -51,7 +51,7 @@


 \section{Situation calculus (Green's formulation)}
-Situation calculus uses first order logic instead of propositional logic.
+Situation calculus uses first-order logic instead of propositional logic.

 \begin{description}
 \item[Situation] \marginnote{Situation}
@@ -142,8 +142,8 @@ Event calculus reifies fluents and events (actions) as terms (instead of predica
 \begin{description}
 \item[Deductive reasoning]
 Event calculus only allows deductive reasoning:
-it takes as input the domain-dependant axioms and a set of events, and computes a set of true fluents.
-If a new event is observed, the query need to be recomputed again.
+it takes as input the domain-dependant axioms and a set of events and computes a set of true fluents.
+If a new event is observed, the query needs to be recomputed again.
 \end{description}


@@ -183,7 +183,7 @@ Allows to add events dynamically without the need to recompute the result.

 \section{Allen's logic of intervals}

-Event calculus only captures instantaneous events that happen in given points in time.
+Event calculus only captures instantaneous events that happen at given points in time.

 \begin{description}
 \item[Allen's logic of intervals] \marginnote{Allen's logic of intervals}
@@ -217,7 +217,7 @@ Event calculus only captures instantaneous events that happen in given points in

 \section{Modal logics}

-Logic based on interacting agents with their own knowledge base.
+Logic-based on interacting agents with their own knowledge base.

 \begin{description}
 \item[Propositional attitudes] \marginnote{Propositional attitudes}
@@ -226,7 +226,7 @@ Logic based on interacting agents with their own knowledge base.
 First-order logic is not suited to represent these operators.

 \item[Modal logics] \marginnote{Modal logics}
-Modal logics have the same syntax of first-order logic with the addition of modal operators.
+Modal logics have the same syntax as first-order logic with the addition of modal operators.

 \item[Modal operator]
 A modal operator takes as input the name of an agent and a sentence (instead of a term as in FOL).
@@ -260,7 +260,7 @@ Logic based on interacting agents with their own knowledge base.
 \end{itemize}

 \begin{example}
-Alice is in a room an tosses a coin. Bob is in another room an will enter Alice's room when the coin lands to observe the result.
+Alice is in a room and tosses a coin. Bob is in another room and will enter Alice's room when the coin lands to observe the result.

 We define a model $M = (S, \pi, K_\texttt{a}, K_\texttt{b})$ on $\phi$ where:
 \begin{itemize}
@@ -359,7 +359,7 @@ The accessibility relation maps into the temporal dimension with two possible ev
 \end{description}

 \item[Semantics]
-Given a Kripke structure $M = (S, \pi, K_\texttt{1}, \dots, K_\texttt{n})$ where states are represented using integers,
+Given a Kripke structure, $M = (S, \pi, K_\texttt{1}, \dots, K_\texttt{n})$ where states are represented using integers,
 the semantic of the operators is the following:
 \begin{itemize}
 \item $(M, i) \models P \iff i \in \pi(P)$.
@@ -370,5 +370,5 @@ The accessibility relation maps into the temporal dimension with two possible ev
 \end{itemize}

 \item[Model checking] \marginnote{Model checking}
-Methods to prove properties of linear-time temporal logic based finite state machines or distributed systems.
+Methods to prove properties of linear-time temporal logic-based finite state machines or distributed systems.
 \end{description}
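
As an illustrative aside, the Kripke semantics above can be evaluated mechanically. The tuple encoding of formulas and the knowledge operator ("K_a phi holds at s iff phi holds at every state accessible to agent a from s") are assumptions of this sketch.

def holds(state, formula, pi, K):
    kind = formula[0]
    if kind == "atom":                     # (M, i) |= P  iff  i in pi(P)
        return state in pi[formula[1]]
    if kind == "not":
        return not holds(state, formula[1], pi, K)
    if kind == "and":
        return holds(state, formula[1], pi, K) and holds(state, formula[2], pi, K)
    if kind == "knows":                    # K_a phi
        agent, phi = formula[1], formula[2]
        return all(holds(t, phi, pi, K) for (s, t) in K[agent] if s == state)
    raise ValueError(kind)

# Coin example: Bob cannot distinguish state 1 (heads) from state 2 (tails).
pi = {"heads": {1}}
K = {"bob": {(1, 1), (1, 2), (2, 1), (2, 2)}}
print(holds(1, ("knows", "bob", ("atom", "heads")), pi, K))   # False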
@@ -44,7 +44,7 @@
 \item[Goal] \marginnote{Goal}
 $G := \top \mid \bot \mid A \mid C \mid G_1 \land G_2$
 \item[Constraint logic clause] \marginnote{Constraint logic clause}
-$K := A \leftarrow G$
+$K := A \Leftarrow G$
 \item[Constraint logic program] \marginnote{Constraint logic program}
 $P := K_1 \dots K_m$, for $m \geq 0$
 \end{description}
@@ -67,17 +67,17 @@
 Starting from the state $\langle A \land G, C \rangle$ of a program $P$, a transition on the atom $A$ can result in:
 \begin{description}
 \item[Unfold] \marginnote{Unfold}
-If there exists a clause $(B \leftarrow H)$ in $P$ and
+If there exists a clause $(B \Leftarrow H)$ in $P$ and
 an assignment $(B \doteq A)$ such that $((B \doteq A) \land C)$ is still valid,
 then we have a transition $\langle A \land G, C \rangle \mapsto \langle H \land G, (B \doteq A) \land C \rangle$.

 In other words, we want to develop an atom $A$ and the current constraints are denoted as $C$.
 We look for a clause whose head equals $A$, applying an assignment if needed.
-If this is possible, we transit from solving $A$ to solving the body of the clause and
+If this is possible, we transition from solving $A$ to solving the body of the clause and
 add the assignment to the set of active constraints.

 \item[Failure] \marginnote{Failure}
-If there are no clauses $(B \leftarrow H)$ with a valid assignment $((B \doteq A) \land C)$,
+If there are no clauses $(B \Leftarrow H)$ with a valid assignment $((B \doteq A) \land C)$,
 then we have a transition $\langle A \land G, C \rangle \mapsto \langle \bot, \bot \rangle$.
 \end{description}

@@ -100,7 +100,7 @@
 \begin{description}
 \item[Generate-and-test] \marginnote{Generate-and-test}
 Strategy adopted by logic programs.
-Every possible assignment to the variables are generated and tested.
+Every possible assignment to the variables is generated and tested.

 The workflow is the following:
 \begin{enumerate}

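
As an illustrative aside, generate-and-test over the usual X1 < X2 < X3 toy problem: every assignment is generated first, and the constraints are only tested on complete assignments (which is exactly what constraint propagation improves on).

from itertools import product

domains = {"X1": [1, 2, 3], "X2": [1, 2, 3], "X3": [1, 2, 3]}

solutions = []
for x1, x2, x3 in product(*domains.values()):   # generate all 27 assignments
    if x1 < x2 < x3:                            # test the constraint
        solutions.append((x1, x2, x3))
print(solutions)   # [(1, 2, 3)]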
@@ -1,4 +1,4 @@
-\chapter{First order logic}
+\chapter{First-order logic}


 \section{Syntax}
@@ -12,12 +12,12 @@ The symbols of propositional logic are:
 Unknown elements of the domain. Do not represent truth values.

 \item[Function symbols]
-Function $f^{(n)}$ applied on $n$ constants to obtain another constant.
+Function $f^{(n)}$ applied on $n$ elements of the domain to obtain another element of the domain.

 \item[Predicate symbols]
-Function $P^{(n)}$ applied on $n$ constants to obtain a truth value.
+Function $P^{(n)}$ applied on $n$ elements of the domain to obtain a truth value.

-\item[Connectives] $\forall$ $\exists$ $\land$ $\vee$ $\rightarrow$ $\lnot$ $\leftrightarrow$ $\top$ $\bot$ $($ $)$
+\item[Connectives] $\forall$ $\exists$ $\land$ $\vee$ $\Rightarrow$ $\lnot$ $\Leftrightarrow$ $\top$ $\bot$ $($ $)$
 \end{descriptionlist}

 Using the basic syntax, the following constructs can be defined:
@@ -27,7 +27,7 @@ Using the basic syntax, the following constructs can be defined:

 \item[Proposition] Denotes truth values.
 \[
-P := \top \,|\, \bot \,|\, P \land P \,|\, P \vee P \,|\, P \rightarrow P \,|\, P \leftrightarrow P \,|\,
+P := \top \,|\, \bot \,|\, P \land P \,|\, P \vee P \,|\, P \Rightarrow P \,|\, P \Leftrightarrow P \,|\,
 \lnot P \,|\, \forall x. P \,|\, \exists x. P \,|\, (P) \,|\, P^{(n)}(t_1, \dots, t_n)
 \]
 \end{descriptionlist}
@@ -35,7 +35,7 @@ Using the basic syntax, the following constructs can be defined:

 \begin{description}
 \item[Well-formed formula] \marginnote{Well-formed formula}
-The definition of well-formed formula in first order logic extends the one of
+The definition of well-formed formula in first-order logic extends the one of
 propositional logic by adding the following conditions:
 \begin{itemize}
 \item If S is well-formed, $\exists X. S$ is well-formed. Where $X$ is a variable.
@@ -44,13 +44,13 @@ Using the basic syntax, the following constructs can be defined:

 \item[Free variables] \marginnote{Free variables}
 The universal and existential quantifiers bind their variable within the scope of the formula.
-Let $F_v(F)$ be the set of free variables in a formula $F$, $F_v$ is defined as follows:
+Let $\mathcal{F}_v(F)$ be the set of free variables in a formula $F$, $\mathcal{F}_v$ is defined as follows:
 \begin{itemize}
-\item $F_v(p(t)) = \bigcup \texttt{vars}(t)$
-\item $F_v(\top) = F_v(\bot) = \varnothing$
-\item $F_v(\lnot F) = F_v(F)$
-\item $F_v(F_1 \land F_2) = F_v(F_1 \vee F_2) = F_v(F_1 \rightarrow F_2) = F_v(F_1) \cup F_v(F_2)$
-\item $F_v(\forall X.F) = F_v(\exists X.F) = F_v(F) \smallsetminus \{ X \}$
+\item $\mathcal{F}_v(p(t)) = \bigcup \{ \text{variables of $t$} \}$
+\item $\mathcal{F}_v(\top) = \mathcal{F}_v(\bot) = \varnothing$
+\item $\mathcal{F}_v(\lnot F) = \mathcal{F}_v(F)$
+\item $\mathcal{F}_v(F_1 \land F_2) = \mathcal{F}_v(F_1 \vee F_2) = \mathcal{F}_v(F_1 \Rightarrow F_2) = \mathcal{F}_v(F_1) \cup \mathcal{F}_v(F_2)$
+\item $\mathcal{F}_v(\forall X.F) = \mathcal{F}_v(\exists X.F) = \mathcal{F}_v(F) \smallsetminus \{ X \}$
 \end{itemize}

 \begin{description}
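
As an illustrative aside, the free-variable definition above translates directly into structural recursion. The tuple encoding of formulas and the uppercase-variable convention are assumptions of this sketch.

def free_vars(formula):
    kind = formula[0]
    if kind == "pred":                               # F_v(p(t)) = variables of the terms
        return {t for t in formula[2] if t.isupper()}
    if kind in ("top", "bot"):
        return set()
    if kind == "not":
        return free_vars(formula[1])
    if kind in ("and", "or", "implies"):
        return free_vars(formula[1]) | free_vars(formula[2])
    if kind in ("forall", "exists"):                 # quantifiers bind their variable
        return free_vars(formula[2]) - {formula[1]}
    raise ValueError(kind)

print(free_vars(("forall", "X", ("pred", "p", ("X", "Y")))))   # {'Y'}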
@@ -60,7 +60,7 @@ Using the basic syntax, the following constructs can be defined:
 \item[Theory] \marginnote{Theory}
 Set of sentences.

-\item[Ground term/Formula] \marginnote{Formula}
+\item[Ground term/Ground formula] \marginnote{Ground term/Ground formula}
 Proposition without variables.
 \end{description}
 \end{description}
@@ -71,13 +71,13 @@ Using the basic syntax, the following constructs can be defined:

 \begin{description}
 \item[Interpretation] \marginnote{Interpretation}
-An interpretation in first order logic $\mathcal{I}$ is a pair $(D, I)$:
+An interpretation in first-order logic $\mathcal{I}$ is a pair $(D, I)$:
 \begin{itemize}
 \item $D$ is the domain of the terms.
 \item $I$ is the interpretation function such that:
 \begin{itemize}
-\item $I(f): D^n \rightarrow D$ for every n-ary function symbol.
-\item $I(p) \subseteq D^n$ for every n-ary predicate symbol.
+\item The interpretation of an n-ary function symbol is a function $I(f): D^n \rightarrow D$.
+\item The interpretation of an n-ary predicate symbol is a relation $I(p) \subseteq D^n$.
 \end{itemize}
 \end{itemize}

@@ -101,22 +101,22 @@ Using the basic syntax, the following constructs can be defined:
 \item[Logical consequence] \marginnote{Logical consequence}
 A sentence $T_1$ is a logical consequence of $T_2$ ($T_2 \models T_1$) if
 every model of $T_2$ is also model of $T_1$:
-\[ \mathcal{I} \models T_2 \rightarrow \mathcal{I} \models T_1 \]
+\[ \mathcal{I} \models T_2 \Rightarrow \mathcal{I} \models T_1 \]

 \begin{theorem}
-It is undecidable to determine if a first order logic formula is a tautology.
+Determining if a first-order logic formula is a tautology is undecidable.
 \end{theorem}

 \item[Equivalence] \marginnote{Equivalence}
-A sentence $T_1$ is equivalent to $T_2$ if $T_1 \models T_2$ and $T_2 \models T_1$.
+A sentence $T_1$ is equivalent to $T_2$ iff $T_1 \models T_2$ and $T_2 \models T_1$.
 \end{description}

 \begin{theorem}
 The following statements are equivalent:
 \begin{enumerate}
 \item $F_1, \dots, F_n \models G$.
-\item $(\bigwedge_{i=1}^{n} F_i) \rightarrow G$ is valid.
-\item $(\bigwedge_{i=1}^{n} F_i) \land \lnot G$ is unsatisfiable.
+\item $F_1 \land \dots \land F_n \Rightarrow G$ is valid (i.e. deduction).
+\item $F_1 \land \dots \land F_n \land \lnot G$ is unsatisfiable (i.e. refutation).
 \end{enumerate}
 \end{theorem}

@@ -125,7 +125,7 @@ Using the basic syntax, the following constructs can be defined:

 \begin{description}
 \item[Substitution] \marginnote{Substitution}
-A substitution $\sigma: \mathcal{V} \rightarrow \mathcal{T}$ is a mapping from variables to terms.
+A substitution $\sigma: \mathcal{V} \Rightarrow \mathcal{T}$ is a mapping from variables to terms.
 It is written as $\{ X_1 \mapsto t_1, \dots, X_n \mapsto t_n \}$.

 The application of a substitution is the following:
@@ -134,8 +134,8 @@ Using the basic syntax, the following constructs can be defined:
 \item $f(t_1, \dots, t_n)\sigma = fp(t_1\sigma, \dots, t_n\sigma)$
 \item $\bot\sigma = \bot$ and $\top\sigma = \top$
 \item $(\lnot F)\sigma = (\lnot F\sigma)$
-\item $(F_1 \star F_2)\sigma = (F_1\sigma \star F_2\sigma)$ for $\star \in \{ \land, \vee, \rightarrow \}$
-\item $(\forall X.F)\sigma = \forall X' (F \sigma[X \mapsto X'])$ where $X'$ is a fresh variable (i.e. does not appear in $F$).
+\item $(F_1 \star F_2)\sigma = (F_1\sigma \star F_2\sigma)$ for $\star \in \{ \land, \vee, \Rightarrow \}$
+\item $(\forall X.F)\sigma = \forall X' (F \sigma[X \mapsto X'])$ where $X'$ is a fresh variable (i.e. it does not appear in $F$).
 \item $(\exists X.F)\sigma = \exists X' (F \sigma[X \mapsto X'])$ where $X'$ is a fresh variable.
 \end{itemize}

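
As an illustrative aside, applying a substitution to a term follows the first clauses of the definition above (the quantifier cases with fresh variables are omitted here). The term encoding is an assumption of this sketch.

def apply_subst(term, sigma):
    if isinstance(term, str):
        if term.isupper():
            return sigma.get(term, term)   # X sigma = sigma(X) when defined
        return term                        # constants are left unchanged
    # f(t1, ..., tn) sigma = f(t1 sigma, ..., tn sigma)
    return (term[0],) + tuple(apply_subst(t, sigma) for t in term[1:])

print(apply_subst(("f", "X", "a"), {"X": "b"}))   # ('f', 'b', 'a')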
@@ -13,8 +13,8 @@ A logic program has the following components (defined using BNF):

 \item[Horn clause] \marginnote{Horn clause}
 A clause with at most one positive literal.
-\[ K := A \leftarrow G \]
-In other words, $A$ and all the literals in $G$ are positive as $A \leftarrow G = A \vee \lnot G$.
+\[ K := A \Leftarrow G \]
+In other words, $A$ and all the literals in $G$ are positive as $A \Leftarrow G = A \vee \lnot G$.

 \item[Program] \marginnote{Program}
 $P := K_1 \dots K_m$ for $m \geq 0$
@@ -52,14 +52,14 @@ A logic program has the following components (defined using BNF):

 \item[Computed answer substitution] \marginnote{Computed answer substitution}
 Given a goal $G$ and a program $P$, if there exists a successful derivation
-$\langle G, \varepsilon \rangle \mapsto* \langle \top, \theta \rangle$,
+$\langle G, \varepsilon \rangle \mapsto^* \langle \top, \theta \rangle$,
 then the substitution $\theta$ is the computed answer substitution of $G$.

 \item[Transition] \marginnote{Transition}
 Starting from the state $\langle A \land G, \theta \rangle$ of a program $P$, a transition on the atom $A$ can result in:
 \begin{descriptionlist}
 \item[Unfold]
-If there exists a clause $(B \leftarrow H)$ in $P$ and
+If there exists a clause $(B \Leftarrow H)$ in $P$ and
 a (most general) unifier $\beta$ for $A\theta$ and $B$,
 then we have a transition: $\langle A \land G, \theta \rangle \mapsto \langle H \land G, \theta\beta \rangle$.

@@ -67,7 +67,7 @@ A logic program has the following components (defined using BNF):
 To do this, we search for a clause that has as conclusion $A\theta$ and add its premise to the things to prove.
 If a unification is needed to match $A\theta$, we add it to the substitutions of the state.
 \item[Failure]
-If there are no clauses $(B \leftarrow H)$ in $P$ with a unifier for $A\theta$ and $B$,
+If there are no clauses $(B \Leftarrow H)$ in $P$ with a unifier for $A\theta$ and $B$,
 then we have a transition: $\langle A \land G, \theta \rangle \mapsto \langle \bot, \varepsilon \rangle$.
 \end{descriptionlist}

@@ -79,7 +79,7 @@ A logic program has the following components (defined using BNF):
 This affects the length of the derivation (infinite in the worst case).

 \item[Don't-know] \marginnote{Don't-know}
-Any clause $(B \rightarrow H)$ in $P$ with an unifier for $A\theta$ and $B$ can be chosen.
+Any clause $(B \Leftarrow H)$ in $P$ with a unifier for $A\theta$ and $B$ can be chosen.
 This determines the output of the derivation.
 \end{descriptionlist}
 \end{description}
@@ -101,7 +101,7 @@ A logic program has the following components (defined using BNF):

 \begin{theorem}[Completeness]
 Given a program $P$, a goal $G$ and a substitution $\theta$,
-if $P \models G\theta$, then it exists a computed answer substitution $\sigma$ such that $G\theta = G\sigma\beta$.
+if $P \models G\theta$, then there exists a computed answer substitution $\sigma$ such that $G\theta = G\sigma\beta$.
 \end{theorem}

 \begin{theorem}

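
As an illustrative aside, the unfold transition relies on a most general unifier; a compact Python version follows (occurs check omitted for brevity). Variables are uppercase strings and compound terms are tuples; both conventions are assumptions of this sketch.

def unify(t1, t2, theta=None):
    theta = dict(theta or {})
    def walk(t):                           # follow bindings already in theta
        while isinstance(t, str) and t.isupper() and t in theta:
            t = theta[t]
        return t
    stack = [(t1, t2)]
    while stack:
        a, b = (walk(x) for x in stack.pop())
        if a == b:
            continue
        if isinstance(a, str) and a.isupper():
            theta[a] = b
        elif isinstance(b, str) and b.isupper():
            theta[b] = a
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and len(a) == len(b) and a[0] == b[0]):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None                    # clash: no unifier exists
    return theta

print(unify(("p", "X", "a"), ("p", "b", "Y")))   # {'Y': 'a', 'X': 'b'}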
@ -9,7 +9,7 @@
The symbols of propositional logic are:
\begin{descriptionlist}
\item[Proposition symbols] $p_0$, $p_1$, \dots
\item[Connectives] $\land$ $\vee$ $\rightarrow$ $\leftrightarrow$ $\lnot$ $\bot$ $($ $)$
\item[Connectives] $\land$ $\vee$ $\Rightarrow$ $\Leftrightarrow$ $\lnot$ $\bot$ $($ $)$
\end{descriptionlist}

\begin{description}
@ -22,12 +22,12 @@ The symbols of propositional logic are:
\item If $S_1$ and $S_2$ are well-formed, $S_1 \vee S_2$ is well-formed.
\end{itemize}

Note that the implication $S_1 \rightarrow S_2$ can be written as $\lnot S_1 \vee S_2$.
Note that the implication $S_1 \Rightarrow S_2$ can be written as $\lnot S_1 \vee S_2$.

The BNF definition of a formula is:
\[
F := \texttt{atomic\_proposition} \,|\, F \land F \,|\, F \vee F \,|\,
F \rightarrow F \,|\, F \leftrightarrow F \,|\, \lnot F \,|\, (F)
F \Rightarrow F \,|\, F \Leftrightarrow F \,|\, \lnot F \,|\, (F)
\]
% \[
% \begin{split}
@ -35,8 +35,8 @@ The symbols of propositional logic are:
% &\lnot \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \land \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \vee \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \rightarrow \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \leftrightarrow \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \Rightarrow \texttt{<formula>} \,|\, \\
% &\texttt{<formula>} \Leftrightarrow \texttt{<formula>} \,|\, \\
% &(\texttt{<formula>}) \\
% \end{split}
% \]
@ -64,7 +64,7 @@ The symbols of propositional logic are:
to the atoms $\{ A_1, \dots, A_n \}$ an element of $D$.
\end{itemize}

Note: given a formula $F$ of $n$ distinct atoms, there are $2^n$ district interpretations.
Note: given a formula $F$ of $n$ distinct atoms, there are $2^n$ distinct interpretations.

\begin{description}
\item[Model] \marginnote{Model}
@ -100,14 +100,14 @@ The symbols of propositional logic are:
\item $\lnot S$ is true iff $S$ is false.
\item $S_1 \land S_2$ is true iff $S_1$ is true and $S_2$ is true.
\item $S_1 \vee S_2$ is true iff $S_1$ is true or $S_2$ is true.
\item $S_1 \rightarrow S_2$ is true iff $S_1$ is false or $S_2$ is true.
\item $S_1 \leftrightarrow S_2$ is true iff $S_1 \rightarrow S_2$ is true and $S_1 \leftarrow S_2$ is true.
\item $S_1 \Rightarrow S_2$ is true iff $S_1$ is false or $S_2$ is true.
\item $S_1 \Leftrightarrow S_2$ is true iff $S_1 \Rightarrow S_2$ is true and $S_1 \Leftarrow S_2$ is true.
\end{itemize}
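As a minimal sketch of these truth conditions (Python booleans stand in for truth values; the helper names are invented), the snippet below encodes $\Rightarrow$ and $\Leftrightarrow$ and brute-forces all $2^n$ interpretations to confirm one of the equivalences listed further below, implication elimination:

from itertools import product

def implies(a, b):
    return (not a) or b                       # S1 => S2: true iff S1 false or S2 true

def iff(a, b):
    return implies(a, b) and implies(b, a)    # S1 <=> S2: implication both ways

# Enumerate all 2^n interpretations (n = 2 atoms here).
for p, q in product([False, True], repeat=2):
    assert implies(p, q) == ((not p) or q)    # implication elimination
print("(P => Q) == (not P or Q) in every interpretation")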

\item[Evaluation] \marginnote{Evaluation order}
The connectives of a propositional formula are evaluated in the order:
\[ \leftrightarrow, \rightarrow, \vee, \land, \lnot \]
The connectives of a propositional formula are evaluated in the following order:
\[ \Leftrightarrow, \Rightarrow, \vee, \land, \lnot \]
Formulas in parentheses have higher priority.

\item[Logical consequence] \marginnote{Logical consequence}
@ -128,9 +128,9 @@ The symbols of propositional logic are:
\item[Associativity]: $((P \land Q) \land R) \equiv (P \land (Q \land R))$
and $((P \vee Q) \vee R) \equiv (P \vee (Q \vee R))$
\item[Double negation elimination]: $\lnot(\lnot P) \equiv P$
\item[Contraposition]: $(P \rightarrow Q) \equiv (\lnot Q \rightarrow \lnot P)$
\item[Implication elimination]: $(P \rightarrow Q) \equiv (\lnot P \vee Q)$
\item[Biconditional elimination]: $(P \leftrightarrow Q) \equiv ((P \rightarrow Q) \land (Q \rightarrow P))$
\item[Contraposition]: $(P \Rightarrow Q) \equiv (\lnot Q \Rightarrow \lnot P)$
\item[Implication elimination]: $(P \Rightarrow Q) \equiv (\lnot P \vee Q)$
\item[Biconditional elimination]: $(P \Leftrightarrow Q) \equiv ((P \Rightarrow Q) \land (Q \Rightarrow P))$
\item[De Morgan]: $\lnot(P \land Q) \equiv (\lnot P \vee \lnot Q)$ and $\lnot(P \vee Q) \equiv (\lnot P \land \lnot Q)$
\item[Distributivity of $\land$ over $\vee$]: $(P \land (Q \vee R)) \equiv ((P \land Q) \vee (P \land R))$
\item[Distributivity of $\vee$ over $\land$]: $(P \vee (Q \land R)) \equiv ((P \vee Q) \land (P \vee R))$
@ -179,34 +179,34 @@ The symbols of propositional logic are:
\begin{description}
\item[Sound] \marginnote{Soundness}
A reasoning method $E$ is sound iff:
\[ (\Gamma \vdash^E F) \rightarrow (\Gamma \models F) \]
\[ (\Gamma \vdash^E F) \Rightarrow (\Gamma \models F) \]

\item[Complete] \marginnote{Completeness}
A reasoning method $E$ is complete iff:
\[ (\Gamma \models F) \rightarrow (\Gamma \vdash^E F) \]
\[ (\Gamma \models F) \Rightarrow (\Gamma \vdash^E F) \]
\end{description}

\item[Deduction theorem] \marginnote{Deduction theorem}
Given a set of formulas $\{ F_1, \dots, F_n \}$ and a formula $G$:
\[ (F_1 \land \dots \land F_n) \models G \,\iff\, \models (F_1 \land \dots \land F_n) \rightarrow G \]
\[ (F_1 \land \dots \land F_n) \models G \,\iff\, \models (F_1 \land \dots \land F_n) \Rightarrow G \]

\begin{proof} \phantom{}
\begin{description}
\item[$\rightarrow$])
\item[$\Rightarrow$])
By hypothesis $(F_1 \land \dots \land F_n) \models G$.

So, for each interpretation $\mathcal{I}$ in which $(F_1 \land \dots \land F_n)$ is true,
$G$ is also true.
Therefore, $\mathcal{I} \models (F_1 \land \dots \land F_n) \rightarrow G$.
Therefore, $\mathcal{I} \models (F_1 \land \dots \land F_n) \Rightarrow G$.

Moreover, for each interpretation $\mathcal{I}'$ in which $(F_1 \land \dots \land F_n)$ is false,
$(F_1 \land \dots \land F_n) \rightarrow G$ is true.
Therefore, $\mathcal{I}' \models (F_1 \land \dots \land F_n) \rightarrow G$.
$(F_1 \land \dots \land F_n) \Rightarrow G$ is true.
Therefore, $\mathcal{I}' \models (F_1 \land \dots \land F_n) \Rightarrow G$.

In conclusion, $\models (F_1 \land \dots \land F_n) \rightarrow G$.
In conclusion, $\models (F_1 \land \dots \land F_n) \Rightarrow G$.

\item[$\leftarrow$])
By hypothesis $\models (F_1 \land \dots \land F_n) \rightarrow G$.
\item[$\Leftarrow$])
By hypothesis $\models (F_1 \land \dots \land F_n) \Rightarrow G$.
Therefore, for each interpretation where $(F_1 \land \dots \land F_n)$ is true,
$G$ is also true.
@ -240,9 +240,9 @@ The symbols of propositional logic are:
\begin{description}
\item[Natural deduction] \marginnote{Natural deduction for propositional logic}
Set of rules to introduce or eliminate connectives.
We consider a subset $\{ \land, \rightarrow, \bot \}$ of functionally complete connectives.
We consider a subset $\{ \land, \Rightarrow, \bot \}$ of functionally complete connectives.

Natural deduction can be represented using a tree like structure:
Natural deduction can be represented using a tree-like structure:
\begin{prooftree}
\AxiomC{[hypothesis]}
\noLine
@ -252,12 +252,14 @@ The symbols of propositional logic are:
\RightLabel{rule name}\UnaryInfC{conclusion}
\end{prooftree}

The conclusion is true when the hypothesis are able to prove the premise.
Another tree can be built on top of premises to prove them.
The conclusion is true when the hypotheses can prove the premise.
Another tree can be built on top of the premises to prove them.

\begin{descriptionlist}
\item[Introduction] \marginnote{Introduction rules}
Usually used to prove the conclusion by splitting it.\\
Usually used to prove the conclusion by splitting it.

Note that $\lnot \psi \equiv (\psi \Rightarrow \bot)$. \\
\begin{minipage}{.4\linewidth}
\begin{prooftree}
\AxiomC{$\psi$}
@ -272,7 +274,7 @@ The symbols of propositional logic are:
\UnaryInfC{\vdots}
\noLine
\UnaryInfC{$\psi$}
\RightLabel{$\rightarrow$I}\UnaryInfC{$\varphi \rightarrow \psi$}
\RightLabel{$\Rightarrow$I}\UnaryInfC{$\varphi \Rightarrow \psi$}
\end{prooftree}
\end{minipage}

@ -293,16 +295,18 @@ The symbols of propositional logic are:
\begin{minipage}{.3\linewidth}
\begin{prooftree}
\AxiomC{$\varphi$}
\AxiomC{$\varphi \rightarrow \psi$}
\RightLabel{$\rightarrow$E}\BinaryInfC{$\psi$}
\AxiomC{$\varphi \Rightarrow \psi$}
\RightLabel{$\Rightarrow$E}\BinaryInfC{$\psi$}
\end{prooftree}
\end{minipage}

\item[Ex falso sequitur quodlibet] \marginnote{Ex falso sequitur quodlibet}
From contradiction, anything follows.
This can be used when we have two contradicting hypothesis.
This can be used when we have two contradicting hypotheses.
\begin{prooftree}
\AxiomC{$\bot$}
\AxiomC{$\psi$}
\AxiomC{$\lnot \psi$}
\BinaryInfC{$\bot$}
\RightLabel{$\bot$}\UnaryInfC{$\varphi$}
\end{prooftree}
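As a small worked example (a sketch assuming the usual $\land$-elimination rule alongside the rules above), the following tree discharges its hypothesis with $\Rightarrow$I to derive $(\varphi \land \psi) \Rightarrow \varphi$:

\begin{prooftree}
\AxiomC{$[\varphi \land \psi]$}
\RightLabel{$\land$E}\UnaryInfC{$\varphi$}
\RightLabel{$\Rightarrow$I}\UnaryInfC{$(\varphi \land \psi) \Rightarrow \varphi$}
\end{prooftree}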

@ -124,7 +124,7 @@ Given a floating-point system $\mathcal{F}(\beta, t, L, U)$, the representation

\subsection{Machine precision}
Machine precision $\varepsilon_{\text{mach}}$ determines the accuracy of a floating-point system. \marginnote{Machine precision}
Depending on the approximation approach, machine precision can be computes as:
Depending on the approximation approach, machine precision can be computed as:
\begin{descriptionlist}
\item[Truncation] $\varepsilon_{\text{mach}} = \beta^{1-t}$
\item[Rounding] $\varepsilon_{\text{mach}} = \frac{1}{2}\beta^{1-t}$
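As a quick sanity check of these formulas (a sketch for IEEE 754 double precision, where $\beta = 2$ and $t = 53$), the classic halving loop in Python recovers the truncation value $\beta^{1-t}$:

# Estimate machine precision of Python floats (IEEE 754 double: beta = 2, t = 53).
eps = 1.0
while 1.0 + eps / 2 > 1.0:     # stop once 1 + eps/2 rounds back to exactly 1
    eps /= 2
print(eps)                      # 2.220446049250313e-16 == 2**(1-53) = beta**(1-t)
# With rounding, eps_mach is half of this: 2**-53.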

@ -34,11 +34,11 @@ Note that $\max \{ f(x) \} = \min \{ -f(x)$ \}.
\subsection{Optimality conditions}

\begin{description}
\item[First order condition] \marginnote{First order condition}
\item[First-order condition] \marginnote{First-order condition}
Let $f: \mathbb{R}^N \rightarrow \mathbb{R}$ be continuous and differentiable in $\mathbb{R}^N$.
\[ \text{If } \vec{x}^* \text{ local minimum of } f \Rightarrow \nabla f(\vec{x}^*) = \nullvec \]

\item[Second order condition] \marginnote{Second order condition}
\item[Second-order condition] \marginnote{Second-order condition}
Let $f: \mathbb{R}^N \rightarrow \mathbb{R}$ be continuous and twice differentiable.
\[
\text{If } \nabla f(\vec{x}^*) = \nullvec \text{ and } \nabla^2 f(\vec{x}^*) \text{ positive definite} \Rightarrow
@ -46,7 +46,7 @@ Note that $\max \{ f(x) \} = \min \{ -f(x)$ \}.
\]
\end{description}

As the second order condition requires to compute the Hessian matrix, which is expensive, in practice only the first order condition is checked.
As the second-order condition requires computing the Hessian matrix, which is expensive, in practice only the first-order condition is checked.
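A minimal numerical check of both conditions, on the invented toy objective $f(x, y) = x^2 + y^2$ (not taken from the notes):

import numpy as np

def grad(x):                     # gradient of f(x) = ||x||^2
    return 2 * x

def hessian(x):                  # Hessian of f: 2I, positive definite everywhere
    return 2 * np.eye(len(x))

x_star = np.zeros(2)             # candidate point: the origin
first_order = np.allclose(grad(x_star), 0.0)                     # gradient vanishes
second_order = np.all(np.linalg.eigvalsh(hessian(x_star)) > 0)   # Hessian PD
print(first_order, second_order)  # True True: x_star is a local minimum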

@ -147,7 +147,7 @@ A generic gradient-like method can then be defined as:

\begin{description}
\item[Choice of the initialization point] \marginnote{Initialization point}
The starting point of an iterative method is a user defined parameter.
The starting point of an iterative method is a user-defined parameter.
For simple problems, it is usually chosen randomly in $[-1, +1]$.

For complex problems, the choice of the initialization point is critical as
@ -184,9 +184,9 @@ A generic gradient-like method can then be defined as:
\item[Difficult topologies]
\marginnote{Cliff}
A cliff in the objective function causes problems when evaluating the gradient at the edge.
With a small step size, there is a slow down in convergence.
With a small step size, there is a slowdown in convergence.
With a large step size, there is an overshoot that may cause the algorithm to diverge.
% a slow down when evaluating
% a slowdown when evaluating
% the gradient at the edge using a small step size and
% an overshoot when the step is too large.
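A minimal sketch of the step-size trade-off on the invented toy objective $f(x) = x^2$: the update is $x_{k+1} = (1 - 2\alpha)x_k$, so a small $\alpha$ contracts towards the minimum while a large one overshoots and diverges.

import numpy as np

def gd(x0, alpha, iters=50):
    x = x0
    for _ in range(iters):
        x = x - alpha * 2 * x    # x_{k+1} = x_k - alpha * grad f(x_k)
    return x

x0 = np.random.uniform(-1, 1)    # random initialization in [-1, +1]
print(gd(x0, alpha=0.1))         # |1 - 2*alpha| < 1: converges towards 0
print(gd(x0, alpha=1.1))         # |1 - 2*alpha| > 1: overshoots and diverges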

@ -145,7 +145,7 @@ This method has time complexity $O(\frac{n^3}{6})$.
\section{Iterative methods}
\marginnote{Iterative methods}
Iterative methods solve a linear system by computing a sequence that converges to the exact solution.
Compared to direct methods, they are less precise but computationally faster and more adapt for large systems.
Compared to direct methods, they are less precise but computationally faster and more suited for large systems.

The overall idea is to build a sequence of vectors $\vec{x}_k$
that converges to the exact solution $\vec{x}^*$:
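One classic instance of this idea is the Jacobi iteration sketched below (a Python toy with an invented matrix; whether these notes develop Jacobi or another splitting is not visible in this hunk). Convergence is guaranteed, e.g., for strictly diagonally dominant $\matr{A}$.

import numpy as np

def jacobi(A, b, iters=50):
    D = np.diag(A)                  # diagonal of A
    R = A - np.diagflat(D)          # off-diagonal remainder
    x = np.zeros_like(b)            # x_0
    for _ in range(iters):
        x = (b - R @ x) / D         # x_{k+1} = D^{-1} (b - R x_k)
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])      # strictly diagonally dominant
b = np.array([1.0, 2.0])
print(jacobi(A, b))                          # approaches the exact solution
print(np.linalg.solve(A, b))                 # direct solution for comparison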

@ -192,7 +192,7 @@ Obviously, as the sequence is truncated, a truncation error is introduced when u

\section{Condition number}
Inherent error causes inaccuracies when solving a system.
This problem is independent from the algorithm and is estimated using exact arithmetic.
This problem is independent of the algorithm and is estimated using exact arithmetic.

Given a system $\matr{A}\vec{x} = \vec{b}$, we perturb $\matr{A}$ and/or $\vec{b}$ and study the inherited error.
For instance, if we perturb $\vec{b}$, we obtain the following system:
@ -210,8 +210,8 @@ Finally, we can define the \textbf{condition number} of a matrix $\matr{A}$ as:
\[ K(\matr{A}) = \Vert \matr{A} \Vert \cdot \Vert \matr{A}^{-1} \Vert \]

A system is \textbf{ill-conditioned} if $K(\matr{A})$ is large \marginnote{Ill-conditioned}
(i.e. a small perturbation of the input causes a large change of the output).
Otherwise it is \textbf{well-conditioned}. \marginnote{Well-conditioned}
(i.e. a small perturbation of the input causes a large change in the output).
Otherwise, it is \textbf{well-conditioned}. \marginnote{Well-conditioned}
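A quick NumPy illustration (the matrices are invented for the example; np.linalg.cond computes exactly $\Vert \matr{A} \Vert \cdot \Vert \matr{A}^{-1} \Vert$):

import numpy as np

A_good = np.array([[2.0, 0.0], [0.0, 1.0]])
A_bad = np.array([[1.0, 1.0], [1.0, 1.0001]])   # nearly singular

print(np.linalg.cond(A_good))   # 2.0: well-conditioned
print(np.linalg.cond(A_bad))    # ~4e4: ill-conditioned, small input changes amplify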

\section{Linear least squares problem}

@ -118,7 +118,7 @@ The parameters are determined as the most likely to predict the correct label gi
Moreover, as the dataset is identically distributed,
each $p_\vec{\uptheta}(y_n \vert \bm{x}_n)$ of the product has the same distribution.

By applying the logarithm, we have that the negative log-likelihood of a i.i.d. dataset is defined as:
By applying the logarithm, we have that the negative log-likelihood of an i.i.d. dataset is defined as:
\[ \mathcal{L}(\vec{\uptheta}) = -\sum_{n=1}^{N} \log p_\vec{\uptheta}(y_n \vert \bm{x}_n) \]
and to find good parameters $\vec{\uptheta}$, we solve the problem:
\[
@ -170,7 +170,7 @@ The parameters are determined as the most likely to predict the correct label gi
\begin{subfigure}{.45\textwidth}
\centering
\includegraphics[width=.75\linewidth]{img/gaussian_mle_bad.png}
\caption{When the parameters are bad, the label will be far the mean}
\caption{When the parameters are bad, the label will be far from the mean}
\end{subfigure}

\caption{Geometric interpretation of the Gaussian likelihood}
@ -223,7 +223,7 @@ we want to estimate the function $f$.

\begin{description}
\item[Model]
We use as predictor:
We use as the predictor:
\[ f(\vec{x}) = \vec{x}^T \vec{\uptheta} \]
Because of the noise, we use a probabilistic model with likelihood:
\[ p_\vec{\uptheta}(y \,\vert\, \vec{x}) = \mathcal{N}(y \,\vert\, f(\vec{x}), \sigma^2) \]
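A minimal sketch of this Gaussian negative log-likelihood for the linear model (the data and parameter values are invented; a better-fitting $\vec{\uptheta}$ gives a lower $\mathcal{L}(\vec{\uptheta})$):

import numpy as np

def nll(theta, X, y, sigma=1.0):
    mean = X @ theta                             # f(x) = x^T theta for each row
    logp = (-0.5 * np.log(2 * np.pi * sigma**2)
            - 0.5 * ((y - mean) / sigma) ** 2)   # log N(y_n | f(x_n), sigma^2)
    return -np.sum(logp)                         # L(theta) = -sum_n log p(y_n | x_n)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # rows are inputs x_n
y = np.array([0.1, 1.1, 1.9])
print(nll(np.array([0.0, 1.0]), X, y))   # near the data-generating theta: low NLL
print(nll(np.array([5.0, -3.0]), X, y))  # far from it: much higher NLL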

@ -322,7 +322,7 @@ Note: sometimes, instead of the full posterior, the maximum is considered (with
\item[Expected value (multivariate)] \marginnote{Expected value (multivariate)}
A multivariate random variable $X$ can be seen as
a vector of univariate random variables $\begin{pmatrix} X_1, \dots, X_D \end{pmatrix}^T$.
Its expected value can be computed element wise as:
Its expected value can be computed element-wise as:
\[
\mathbb{E}_X[g(\bm{x})] =
\begin{pmatrix} \mathbb{E}_{X_1}[g(x_1)] \\ \vdots \\ \mathbb{E}_{X_D}[g(x_D)] \end{pmatrix} \in \mathbb{R}^D
@ -466,7 +466,7 @@ Moreover, we have that:
\begin{descriptionlist}
\item[Uniform distribution] \marginnote{Uniform distribution}
Given a discrete random variable $X$ with $\vert \mathcal{T}_X \vert = N$,
$X$ has an uniform distribution if:
$X$ has a uniform distribution if:
\[ p_X(x) = \frac{1}{N}, \forall x \in \mathcal{T}_X \]

\item[Poisson distribution] \marginnote{Poisson distribution}

@ -71,7 +71,7 @@
the second matrix contains in the $i$-th row the gradient of $g_i$.

Therefore, if $g_i$ are in turn multivariate functions $g_1(s, t), g_2(s, t): \mathbb{R}^2 \rightarrow \mathbb{R}$,
the chain rule can be applies as follows:
the chain rule can be applied as follows:
\[
\frac{\text{d}f}{\text{d}(s, t)} =
\begin{pmatrix}
@ -257,7 +257,7 @@ The computation graph can be expressed as:
\]
where $g_i$ are elementary functions and $x_{\text{Pa}(x_i)}$ are the parent nodes of $x_i$ in the graph.
In other words, each intermediate variable is expressed as an elementary function of its preceding nodes.
The derivatives of $f$ can then be computed step-by-step going backwards as:
The derivatives of $f$ can then be computed step-by-step going backward as:
\[ \frac{\partial f}{\partial x_D} = 1 \text{, as by definition } f = x_D \]
\[
\frac{\partial f}{\partial x_i} = \sum_{\forall x_c: x_i \in \text{Pa}(x_c)} \frac{\partial f}{\partial x_c} \frac{\partial x_c}{\partial x_i}
@ -266,7 +266,7 @@ The derivatives of $f$ can then be computed step-by-step going backwards as:
where $\text{Pa}(x_c)$ is the set of parent nodes of $x_c$ in the graph.
In other words, to compute the partial derivative of $f$ w.r.t. $x_i$,
we apply the chain rule by computing
the partial derivative of $f$ w.r.t. the variables following $x_i$ in the graph (as the computation goes backwards).
the partial derivative of $f$ w.r.t. the variables following $x_i$ in the graph (as the computation goes backward).

Automatic differentiation is applicable to all functions that can be expressed as a computational graph and
when the elementary functions are differentiable.
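A hand-rolled sketch of this backward pass on the invented graph $x_3 = x_1 x_2$, $x_4 = x_3 + x_2$, $f = x_4$; each adjoint $\partial f / \partial x_i$ accumulates one term per child node, exactly as in the sum above.

x1, x2 = 3.0, 4.0      # inputs
x3 = x1 * x2           # forward pass stores the intermediate values
x4 = x3 + x2           # f = x4

adj = {"x4": 1.0}                  # df/dx4 = 1 by definition
adj["x3"] = adj["x4"] * 1.0        # x4 = x3 + x2  =>  dx4/dx3 = 1
adj["x2"] = adj["x4"] * 1.0        # child x4 contributes dx4/dx2 = 1
adj["x2"] += adj["x3"] * x1        # child x3 contributes dx3/dx2 = x1
adj["x1"] = adj["x3"] * x2         # only child x3: dx3/dx1 = x2
print(adj["x1"], adj["x2"])        # 4.0 4.0 = (x2, x1 + 1) at the inputs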