summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--report.pdfbin0 -> 142274 bytes
-rw-r--r--report.tex26
2 files changed, 13 insertions, 13 deletions
diff --git a/report.pdf b/report.pdf
new file mode 100644
index 0000000..0e62ce1
--- /dev/null
+++ b/report.pdf
Binary files differ
diff --git a/report.tex b/report.tex
index 84771f0..6d7932d 100644
--- a/report.tex
+++ b/report.tex
@@ -37,7 +37,7 @@ stringstyle=\color{mauve},
\section{Introduction}
In this report we will describe our solution to the second assignment, which
consists of designing a spell checker algorithm and its implementation in Java.
-The algorithm is provided with a sentence which contains zero to two misspelled
+The algorithm is provided with a sentence containing zero to two misspelled
words with an Damerau-Levenshtein edit distance of 1. In addition, misspelled
words are not consecutive.
@@ -45,14 +45,14 @@ The expected output is a sentence where these errors have been corrected.
\section{Approach}
To remove spelling errors from a text, one has to generate possible corrections
-and evaluate them. Spelling suggestions are done on a word-by-word basis.
+and evaluate these. Spelling suggestions are done on a word-by-word basis.
-To obtain a list of possible words, \textit{candidate words} will generated with
+To obtain a list of possible words, \textit{candidate words} will be generated with
a Damerau-Levenshtein distance of 1 to the original word. While doing this, a
-noisy channel probability can also be obtained which is needed later for word
+noisy channel probability can also be obtained which is needed later on for word
evaluation.
-For each of word from the sentence, the word will be evaluated against the
+For each word of the sentence, the word will be evaluated against the
context of the word. This includes the previous and next words for n-grams.
Since we only have the guarantee that the words next to the misspelled one are
correct, tri-grams cannot reach out for the following two words before or after
@@ -66,9 +66,9 @@ misspelled word.
The program is given a single sentence which is split in words. The indices of
words that are surely correct are remembered (\texttt{readonly\_words} in the
listing below). Once a rating is found which is higher than the initial one,
-then the word is replaced.
+the word is replaced.
-When a word was modified, the hunt for more faulty words continue until all of
+When a word was modified, the hunt for more faulty words continues until all of
them (less than three) are found. Otherwise, it is assumed that the sentence
has no more spelling errors.
@@ -121,19 +121,19 @@ for attempt in range(0, 3):
\subsection{Generation of candidate words}
A naive approach to generate candidate words in the $getCandidateWords()$
-function is to generate all possible combinations of the alphabet, and then take
-the intersection with the dictionary. This does not scale and uses $O(27^n)$
+function is to generate all possible combinations of the alphabet, and then taking
+the intersection with the dictionary. This does not scale and costs $O(27^n)$
memory and time.
-Instead we chose for a smarter approach. As we know that the misspelled word
+Instead we opted for a smarter approach. As we know that the misspelled words
have a Damerau-Levenshtein distance of 1, candidates are generated by cycling
through each letter and applying an insertion, deletion, substitution or
transposition.
While doing these operations, the Noisy Channel Probability can also be obtained
-since we have a confusion matrix and know kind of operation was executed.
+since we have a confusion matrix and know what kind of alteration was executed.
-Pseudo-code (bounds checks are omitted, assume that such operations do not
+Pseudo-code (bounds checks are omitted, assuming that such operations do not
generate words. The actual Java implementation has decomposed methods for each
of the operations):
@@ -184,7 +184,7 @@ parameter should not get too high.
\section{Additional Resources}
\section{Statement of the Contributions}
-Peter wrote most of the Spell Checker implementationg and report.
+Peter wrote most of the Spell Checker implementation and report.
\section{Manual}