CS 4705: Probabilistic Approaches to Pronunciation and Spelling

Can humans understand what is meant, as opposed to what is said/written?
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteers are at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
How do we do this?
Detecting and Correcting Spelling Errors
Applications:
Spell checking in MS Word
OCR scanning errors
Handwriting recognition of zip codes, signatures, Graffiti
Issues:
Correct nonwords (dg for dog, but frplc?)
Correct wrong words in context (their for there, words of rule formation)
Patterns of Error
Human typists make different types of errors from OCR systems. Why?
Error classification I: performance-based (examples are typos of cat):
Insertion: catt
Deletion: ct
Substitution: car
Transposition: cta
Error classification II: cognitive
People don't know how to spell (nucular/nuclear)
Homonymous errors (their/there)

How do we decide if a (legal) word is an error?
How likely is a word to occur?
They met there friends in Mozambique.
The Noisy Channel Model
Input to channel: true (typed or spoken) word w
Output from channel: an observation O
Decoding task: find ŵ = argmax_w P(w|O)
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Probability that a randomly chosen student (rcs) is a vegetarian? p(v) = .4
That a rcs who is a vegetarian is also a CS major? p(c|v) = .5
That a rcs is a vegetarian (and) CS major? p(c,v) = .2
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Probability that a rcs is a vegetarian? p(v) = .4
That a rcs who is a vegetarian is also a CS major? p(c|v) = .5
That a rcs is a vegetarian and a CS major? p(c,v) = .2 = p(v) p(c|v) = (.4 * .5)
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Probability that a rcs is a CS major? p(c) = .3
That a rcs who is a CS major is also a vegetarian? p(v|c) = .66
That a rcs is a vegetarian CS major? p(c,v) = .2 = p(c) p(v|c) = (.3 * .66)
Bayes Rule
So, we know the joint probabilities:
p(c,v) = p(c) p(v|c)
p(v,c) = p(v) p(c|v)
p(c,v) = p(v,c)
So, using these equations, we can define the conditional probability p(c|v) in terms of the prior probabilities p(c) and p(v) and the likelihood p(v|c):
p(v) p(c|v) = p(c) p(v|c), so p(c|v) = p(c) p(v|c) / p(v)
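The arithmetic above can be checked directly; a minimal sketch using the 10-student example from the preceding slides:

```python
# Toy population from the slides: 10 students, 4 vegetarians, 3 CS majors,
# and (per the slides) 2 students who are both, so p(c,v) = .2.
n, n_v, n_c, n_cv = 10, 4, 3, 2

p_v, p_c, p_cv_joint = n_v / n, n_c / n, n_cv / n
p_c_given_v = n_cv / n_v   # p(c|v) = .5
p_v_given_c = n_cv / n_c   # p(v|c) = 2/3, i.e. the .66 on the slide

# The chain rule in either direction gives the same joint probability:
assert abs(p_v * p_c_given_v - p_cv_joint) < 1e-9
assert abs(p_c * p_v_given_c - p_cv_joint) < 1e-9

# Bayes rule: p(c|v) = p(c) p(v|c) / p(v)
assert abs(p_c * p_v_given_c / p_v - p_c_given_v) < 1e-9
```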
Returning to Spelling...
Channel input: w; output: O
Decoding: hypothesis ŵ = argmax_w P(w|O)
or, by Bayes Rule...
ŵ = argmax_w P(O|w) P(w) / P(O)
and, since P(O) doesn't change for any of the lexicon entries we are going to consider, we can ignore it as a constant, so
ŵ = argmax_w P(O|w) P(w)  (given that w was intended, how likely are we to see O?)
How could we use this model to correct spelling errors?
Simplifying assumptions:
We only have to correct nonword errors
Each nonword (O) differs from its correct word (w) by one step (insertion, deletion, substitution, transposition)
From O, generate a list of candidates differing by one step and appearing in the lexicon, e.g.
Error   Corr    Corr letter   Error letter   Pos   Type
caat    cat     -             a              2     ins
caat    carat   r             -              3     del

How do we decide which correction is most likely?
We want to find the lexicon entry w that maximizes P(typo|w) P(w)
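Generating the one-step candidates can be sketched as follows; the `edits1` name and the toy lexicon are illustrative assumptions (in the style of the well-known Norvig spelling corrector), not part of the slides:

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one insertion, deletion, substitution, or transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutes = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)

lexicon = {"cat", "carat", "coat", "cart"}   # toy lexicon (assumption)
candidates = edits1("caat") & lexicon
print(sorted(candidates))  # → ['carat', 'cart', 'cat', 'coat']
```

Both corrections from the table above appear: cat by deleting an a, carat by inserting an r.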
How do we estimate the likelihood P(typo|w) and the prior P(w)?
First, find some corpora
Different corpora needed for different purposes
Some need to be labeled; others do not
For spelling correction, what do we need?
Word occurrence information (unlabeled)
A corpus of labeled spelling errors

Cat vs. Carat
Suppose we look at the occurrence of cat and carat in a large (50M word) AP news corpus
cat occurs 6500 times, so p(cat) = .00013
carat occurs 3000 times, so p(carat) = .00006
Now we need to find out if inserting an a after an a is more likely than deleting an r after an a, in a corrections corpus of 50K corrections (p(typo|word))
Suppose a-insertion after a occurs 5000 times (p(+a) = .1) and r-deletion after a occurs 7500 times (p(-r) = .15)
Then p(word|typo) ∝ p(typo|word) * p(word)
p(cat|caat) = p(+a) * p(cat) = .1 * .00013 = .000013
p(carat|caat) = p(-r) * p(carat) = .15 * .00006 = .000009
Issues:
What if there are no instances of carat in corpus?
Smoothing algorithms
Estimate of P(typo|word) may not be accurate
Training probabilities on typo/word pairs
What if there is more than one error per word?
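The cat-vs-carat comparison can be reproduced numerically; the counts are exactly the ones given on the slide:

```python
# Corpus counts from the slide: 50M-word AP news corpus for the priors,
# 50K-correction corpus for the likelihoods.
p_word = {"cat": 6500 / 50_000_000, "carat": 3000 / 50_000_000}
p_typo_given_word = {"cat": 5000 / 50_000,    # caat from cat: insert a after a, p(+a) = .1
                     "carat": 7500 / 50_000}  # caat from carat: delete r after a, p(-r) = .15

# Noisy channel score: p(word|caat) is proportional to p(caat|word) * p(word)
scores = {w: p_typo_given_word[w] * p_word[w] for w in p_word}
best = max(scores, key=scores.get)
print(best)  # → cat  (.1 * .00013 = .000013 beats .15 * .00006 = .000009)
```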
A More General Approach: Minimum Edit Distance
How can we measure how different one word is from another word?
How many operations will it take to transform one word into another?
caat -> cat, fplc -> fireplace (should we treat abbreviations as typos?)
Levenshtein distance: smallest number of insertion, deletion, or substitution operations that transform one string into another (ins=del=subst=1)
Alternative: weight each operation by training on a corpus of spelling errors to see which operations are most frequent
Alternative: count substitutions as 2 (1 insertion and 1 deletion)
Alternative: Damerau-Levenshtein distance includes transpositions as a single operation (e.g., cta -> cat)
Code and demo for simple Levenshtein MED: http://www.merriampark.com/ld.htm

MED Calculation is an Example of Dynamic Programming
Decompose a problem into its subproblems
e.g., fp -> firep is a subproblem of fplc -> fireplace
Intuition: An optimal solution for the subproblem will be part of an optimal solution for the problem
Solve any subproblem only once: store all solutions
Recursive algorithm
Often: work backwards from the desired goal state to the initial state
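The DP recipe above, applied to MED, can be sketched in a few lines. This uses the cost scheme discussed on the surrounding slides (ins = del = 1, subst = 2, and cost 0 for substituting a character for itself); the intention/execution pair is the standard textbook example, not from these slides:

```python
def min_edit_distance(source, target, ins=1, dele=1, sub=2):
    """Minimum edit distance via dynamic programming; matching chars cost 0."""
    n, m = len(source), len(target)
    # d[i][j] = cost of turning the first i chars of source into the first j of target
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele          # delete everything
    for j in range(1, m + 1):
        d[0][j] = j * ins           # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,          # delete from source
                          d[i][j - 1] + ins,           # insert into source
                          d[i - 1][j - 1] + sub_cost)  # substitute (or match)
    return d[n][m]

print(min_edit_distance("caat", "cat"))             # → 1 (one deletion)
print(min_edit_distance("intention", "execution"))  # → 8 with subst = 2
```

Each cell is computed once from its three already-stored neighbors, which is exactly the "solve each subproblem only once" idea.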
For MED, create an edit-distance matrix:
Each cell c[x,y] represents the distance between the first x chars of the target t and the first y chars of the source s (i.e., the x-length prefix of t compared to the y-length prefix of s)
This distance is the minimum cost of the insertion, deletion, or substitution operations on the previously considered substrings of the source and target

Edit Distance Matrix, Subst=2: Wrong

NB: Subst of x for x costs 0, not 2

Summary
We can apply probabilistic modeling to NL problems like spell-checking
Noisy channel model, Bayesian method
Training priors and likelihoods on a corpus
Dynamic programming approaches allow us to solve large problems that can be decomposed into subproblems
e.g. MED algorithm
Apply similar methods to modeling pronunciation variation
Allophonic variation + register/style (lexical) variation
butter/tub, going to/gonna
Pronunciation phenomena can be seen as insertions/deletions/substitutions too, with somewhat different ways of computing the likelihoods
Measuring ASR accuracy over words (WER)
Next time: Ch 6
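The WER mentioned above is itself an edit-distance computation, run over word tokens rather than characters; a minimal sketch under the standard convention (all operations cost 1, normalized by reference length):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein DP over word tokens, all operations cost 1.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / len(ref)

# The going to / gonna example from the slide: one substitution, one deletion.
print(wer("going to the store", "gonna the store"))  # → 0.5 (2 edits / 4 words)
```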
XYZ
[\]^_`abcdP ` ̙33` ` ff3333f` 333MMM` f` f` 3>?" dd@ z?" dd@ " @ ` n?" dd@ @@``PV @ ` `(p>>f(
6 `
T Click to edit Master title style!
!
00
RClick to edit Master text styles
Second level
Third level
Fourth level
Fifth level!
S
0ܘ ``
>*
0 `
@*
0 `
@*H
0h ? f $Blank Presentation ph@( }@h@
6
<Lecture
00 P
W#Click to edit Master subtitle style$
$
0T5 ``
>*
0d9 `
9CS 4705
0< `
@*H
0h ? f0*(
0x P
X*
0
Z*d
c$ ?
0
@
RClick to edit Master text styles
Second level
Third level
Fourth level
Fifth level!
S
6D `P
X*
6 `
Z*H
0h ? ̙3380___PPT10.Va0(
l
CJPp
l
C`K0`
H
0h ? ̙33y___PPT10Y+D=' =
@B +a
P(
l
C(
l
C`@
H
0h ? fy___PPT10Y+D=' =
@B +
`(
l
CՕPp
H
0h ? fy___PPT10Y+D=' =
@B +m
p$( `x@@F@
$l
$ C`
l
$ C
H
$0h ? f___PPT10e.+D=' =
@B +a
,( hx@
,l
, C`
l
, C
H
,0h ? fy___PPT10Y+D=' =
@B +
2* 0(
0l
0 CR `
l
0 CH
:8 pP
0
0pIYP
0
0
p
p0
6Source
0
00P
0
=
Noisy Channel
0
0
7Decoder`
0
c$A]??Z
]H
00h ? fy___PPT10Y+D=' =
@B +'
4 (
4
4 C"`<$D0
4 C#*<$0
l `
,
4cp,$D0f
4
6f
H f
4
6=
H f
4
6H f
4
6nH f
4
6t$H `
4
0R
,`
4
0) 
,`
4
0 ,`
4
0 Z,`
4
0` ,"
4
0'
p,$D0"
4
0,$D0j
4
08:)C,$D0
What is the probability that a randomly chosen student (rcs) is a vegetarian? p(v) = .4
That a rcs is a CS major? p(c) = .3
That a rcs is a vegetarian CS major? p(c,v) = .2 B
>8%!(
4
6Ax p
,$D0
P4 vegetarians &
4
6Fx
,$D0
Q3 CS majors* H
40h ? f___PPT10.+LD' =
@B D' =
@BA?%,( <+O%,(
<+DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4!%(D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*4%(D{' =%(D#' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4%(D' =+48?dCB0#ppt_w/2BCB#ppt_xB*Y3>B
ppt_x<*4D' =+48?\CB#ppt_yBCB#ppt_yB*Y3>B
ppt_y<*4D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*4%(D{' =%(D#' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4%(D' =+48?dCB0#ppt_w/2BCB#ppt_xB*Y3>B
ppt_x<*4D' =+48?\CB#ppt_yBCB#ppt_yB*Y3>B
ppt_y<*4D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*4%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4Y%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4Y}%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*4}%(++0+4 ++0+4 ++0+4 ++0+4 +
z(
r
S_`
S`{<$0
z `
,
uW
,$D0`
0f
H `
0=
H `
0H `
0nH `
0t$H `
0R
,`
0) 
,`
0 ,`
0 Z,`
0` ,"
08
,$D0"
0p
,$D0H
0h ? f___PPT10.+wFDs' =
@B D.' =
@BA?%,( <+O%,(
<+DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!;%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*Bu%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*u%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*%(+8+0+ +m
(
r
St~`
Sp{<$0
^F `
,
uW
`
0f
H `
0=
H `
0H `
0nH `
0t$H `
0R
,`
0) 
,`
0 ,`
0 Z,`
0` ,X"
08
"
0p
,$D0H
0h ? f___PPT10.+wFDs' =
@B D.' =
@BA?%,( <+O%,(
<+DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!;%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*?p%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*p%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*%(+8+0+ +9
v(
r
S`
S{<$0
^F `
,
uW
`
0f
H `
0=
H `
0H `
0nH `
0t$H `
0R
,`
0) 
,`
0 ,`
0 Z,`
0` ,X"
08
X"
0p
H
0h ? f___PPT10.+wFDs' =
@B D.' =
@BA?%,( <+O%,(
<+DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*8%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*<m%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*m%(DA' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*%(+8+0+ +
8(
8l
8 C``
l
8 C4
`
8
c$A???
H
80h ? fy___PPT10Y+D=' =
@B +O
vn(
r
S `
r
S
:8 pP
0
I)
0lp
p0
6Source
00P
0
=
Noisy Channel
0
7Decoder`
c$A*??m*`
c$A??,c`
c$A?? `
c$A??UF+H
0h ? fy___PPT10Y+D=' =
@B +a
(
l
C``
l
C4
H
0h ? fy___PPT10Y+D=' =
@B +a
D( @C@
Dl
D C`
l
D C
H
D0h ? fy___PPT10Y+D=' =
@B +a
H( @C@
Hl
H C`
l
H CX
H
H0h ? fy___PPT10Y+D=' =
@B +a
0( ̶800
l
CL`
l
C
H
0h ? fy___PPT10Y+D=' =
@B +a
@(
l
C`
l
C&
H
0h ? fy___PPT10Y+D=' =
@B +a
(
l
C4`
l
C5
H
0h ? fy___PPT10Y+D=' =
@B +a
P(
l
C@`
l
CA
H
0h ? fy___PPT10Y+D=' =
@B +
`( ,
l
CLO
H
0h ? fy___PPT10Y+D=' =
@B +E
ldp( \(
l
CZ`
r
S\A
0 m
v
NB: errors 2
30
H
0h ? fy___PPT10Y+D=' =
@B +f
eee(
r
Sn`
d u
#">2 u
~
6Ds?Vu
Gn @`
}
6?VVu
Go @`

6?VVu
Gi @`
{
64?
Vu
Gt @`
z
6̍?V
u
Gu @`
y
6?a
Vu
Gc @`
x
6?Va
u
Ge @`
w
68?Vu
Gx @`
v
6?*Vu
Ge @`
u
6Ĳ?mV*u
G# @`
t
6dͲ?Vmu
> @`
s
6ǲ?7V
G9 @`
r
6ݲ?V7V
G8 @`
q
6?7VV
G7 @`
p
6@?
7V
G6 @`
o
6?7
V
G5 @`
n
6?a
7V
G4 @`
m
6?7a
V
G3 @`
l
6@?7V
G2 @`
k
6?*7V
G1 @`
j
6d ?m7*V
G0 @`
i
6(?7mV
G# @`
h
6d"?
7
G8 @`
g
69?V
7
G7 @`
f
6A?
V7
G6 @`
e
6 J?
7
G7 @`
d
6K?
7
G6 @`
c
6Z?a
7
G5 @`
b
6Pc?
a
7
G4 @`
a
6]?
7
G3 @`
`
6t?*
7
G2 @`
_
6{?m
*7
G1 @`
^
6T?
m7
Gi @`
]
6,?
G7 @`
\
6?V
G8 @`
[
6?V
G7 @`
Z
6?
G8 @`
Y
6@?
G7 @`
X
6D?a
G6 @`
W
6ܰ?a
G5 @`
V
6(Ǵ?
G4 @`
U
6δ?*
G3 @`
T
6p״?m*
G2 @`
S
6Hٴ?m
Gn @`
R
<$?
G8 @`
Q
<?V
G9 @`
P
<8?
V
G8 @`
O
6?
G7 @`
N
<`?
G8 @`
M
<?a
G7 @`
L
<H?
a
G6 @`
K
<?
G5 @`
J
<,,?*
G4 @`
I
<3?m
*
G3 @`
H
<t<?
m
Gt @`
G
<L>?
G9 @`
F
<(M?V
H10 @`
E
<U? V
G9 @`
D
<
<?m *
G4 @`
=
<? m
Ge @`
<
6?
H10 @`
;
<?V
H11 @`
:
<ذ?V
H10 @`
9
<T?
G9 @`
8
<쳵?
G8 @`
7
<ʵ?a
G7 @`
6
<ѵ?a
G6 @`
5
<Xڵ?
G5 @`
4
<0ܵ?*
G4 @`
3
<?m*
G5 @`
2
<?m
Gn @`
1
< ?}
H11 @`
0
<?V}
H10 @`
/
<`?}V
G9 @`
.
6(?
}
G8 @`

<?}
G9 @`
,
<T?a
}
G8 @`
+
<.?}a
G7 @`
*
<6?}
G6 @`
)
<8?*}
G5 @`
(
<G?m}*
G6 @`
'
<$P?}m
Gt @`
&
<J?^}
H10 @`
%
<`?V^}
G9 @`
$
6h?^V}
G8 @`
#
<(q?
^}
G9 @`
"
<s?^
}
H10 @`
!
<܁?a
^}
G9 @`
<X?^a
}
G8 @`
<?^}
G7 @`
<?*^}
G6 @`
<䢶?m^*}
G7 @`
<\?^m}
Gi @`
<4??^
G9 @`
6?V?^
G8 @`
<Ķ??V^
G9 @`
<$?
?^
H10 @`
<Hն??
^
H11 @`
<ݶ?a
?^
H10 @`
<??a
^
G9 @`
<h??^
G8 @`
<D?*?^
G7 @`
<?m?*^
G8 @`
<??m^
Go @`
6(? ?
G8 @`
<?V ?
G9 @`
<t ? V?
H10 @`
<(?
?
H11 @`
<\1?
?
H12 @`
<*?a
?
H11 @`
<@B? a
?
H10 @`
<J? ?
G9 @`
<R?* ?
G8 @`
<`T?m *?
G9 @`
<
@
H
T0h ? ̙3380___PPT10.ٔ<0 X(
XX
X C
X S
@
H
X0h ? ̙3380___PPT10.ٔ;00\(
\X
\ C
\ S
@
H
\0h ? ̙3380___PPT10.ٔxVKTQި49xI(EpU=m̳qh!AAna2eKghTA
D!x.g~ιss(0D4K\A?}E!
Mog84)dK)P1E=\ҵ37Oi~b(Ԕ[f)YԟK{!;pbE
'̵t.85i
3QDͯ&!C1PK\?1v1p1RWGg 1J?;iKRpQ'ë0;q.#PD.Wr?lB啴Cɒ])빘H6rVDz$BbR'kҺ:a>
$UFιdk0dg3cSY,N禝q^gVhN1;n
[д˼&rVw!qh0nbI8Yo,9,'{O_+oP"u xu8]\+*1fRѻjw+!#ԮyrK*sSƵ>{!,VO%dz RttA_},ӳ}spgCKU:JUOp쳯R&U{}4igD~/CP6Xν?B?Sٛ)PN
+^9%Ɠtҫ5I'`.
]WV%SBFl!ʙV9BI!9#־ܶ.Q'`T^c'dfa(2`Hh?ORb:SI}/'1祘9Z[msNz*~U8?
]cN͞"Wt՟/mD{SETSp`r0:c
)GR9k]>ZGPe3Ob6hKUpuJGnV s}Zni,D
?K`L@rNme_dxU;OAvy(D$F
cI0V(&(Q;#y X/X٨?,5svo/ؠ9vfg۹;z^)\=]M>HĭД1vQ
跆`+Yx5/~kLoY}.>.K();8C~pf:N_@]~WcOӗϡ#w;g}U㞺Cw?o+g`).{?GTV
QmfrI[ͽ
M%MQWjn\9%~F2']ZQ8m 1eΔw\,yUӨMҡ( RIUVeXPw*c^oJxo%Z TD:`vn+kn';g YhN݁L^6xMOSApv_[kAM
Ĉ^bh9)B[FRbBbbbăd
zē֙mhH:P@)`pBQ\MD:C~=u$B)Ȗ_>p5Or!.眶/;U)0 :AΜT:`L:JCz TW4WGq㐆N*~Wi&:bS/>D?b=bb@ 큤kkk̽oƿ~R*Ng"3LD;:E*e"LG7=t2R:XD~I]eMBK4$BBr3eq~? @fRѻjw+!#ԮyrK*sSƵ>{!,VO%dz RttA_},ӳ}spgCKU:JUOp쳯R&U{}4igD~/CP6Xν?B?Sٛ)PN
+^9%Ɠtҫ5I'`.
]WV%SBFl!ʙV9BI!9#־ܶ.Q'`T^c'dfa(2`Hh?ORb:SI}/'1祘9Z[msNz*~U8?
]cN͞"Wt՟/mD{SETSp`r0:c
)GR9k]>ZGPe3Ob6hKUpuJGnV s}Zni,D
?K`L@rNme_dxMOSApvRlX DF xP1l(ħRm`$^LJyă'/~yT
3綨`H:tgvfvvvgfߛw+kNR9~ci/t ;
6}.}5a{`\!H8VGϫyϕ;yO_0ArS,$iB83*`)>r?@^?i㿀',_ХO{="]VZw!1؆1m@g&<`k`x"r߈3=fݙ=49n뗩tvܛUa_JeT]}h٣.Øȟ90EQRT%[ba3}KLk]O(Э+/B#hүtB0](deU7,zǣLO:b68D&T2k&cn*XU0Yǃw}n"L^4 ~W"o۪W[MyQ\+ڈqy>rgy"wvTF2oMyQEEq~d]qlrg7z^K]z͈d2{\2P{FbS` /`F5U}r{
a'axpX҂Z ڗ2oB48q5~^[Mj&l?rLr0atiw p{@}Zm 3@yL9x<D^ ^YE@+J4snulx@z }$8L`tĔؖ(<Pd
;
.e(`(
K
]9Equ
IOnscreen ShowAT&T LabsResearchXJ$
Times New RomanSymbol
WingdingsBlank PresentationMicrosoft Equation 3.0CS 4705)Spoken and Written Word (Lexical) ErrorsSlide 3)Detecting and Correcting Spelling ErrorsPatterns of Error0How do we decide if a (legal) word is an error?Bayesian InferenceBayesian InferenceBayesian InferenceBayesian InferenceBayes RuleReturning to Spelling...8How could we use this model to correct spelling errors?2How do we decide which correction is most likely?
Cat vs Carat Slide 160A More General Approach: Minimum Edit Distance Slide 185MED Calculation is an Example of Dynamic Programming Slide 20'Edit Distance Matrix, Subst=2  Wrong#NB: Subst x for x Cost is 0, not 2Summary Slide 24Fonts UsedDesign TemplateEmbedded OLE Servers
Slide Titles 8@_PID_HLINKSA7http://www.cs.colorado.edu/~martin/SLP/slperrata.html0http://home.earthlink.net/~dcrehr/whyqwert.html"http://www.merriampark.com/ld.htm"http(_4Jjulia hirschbergjulia hirschberg(_julia hirschbergjulia hirschberg*<,Microsoft Equation 3.0 Equation.30,Microsoft Equation 3.0=Equation Equation.30,Microsoft Equation 3.0?Equation Equation.30,Microsoft Equation 3.0@Equation Equation.30,Microsoft Equation 3.03lhttp://www.cs.colorado.edu/~martin/SLP/slperrata.htmllhttp://www.cs.colorado.edu/~martin/SLP/slperrata.htmlfwhy?^http://home.earthlink.net/~dcrehr/whyqwert.htmleEquation Equation.30,Microsoft Equation 3.0,How do we compute MED?Bhttp://www.merriampark.com/ld.htmPCode and demo for simple Levenshtein MEDBhttp://www.merriampark.com/ld.htm/0DTimes New Roman0
0DSymbolew Roman0
0 DWingdingsRoman0
0
B .
@n?" dd@ @@``x`x3
!"#$%&'()*+,./012342$xJD8Q}
iI2$8XfuE,+/e2$r;9,egqD2$<
3.
tCan humans understand what is meant as opposed to what is said/written ?
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteers are at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
How do we do this?>L\L\lL
(Detecting and Correcting Spelling ErrorsApplications:
Spell checking in M$ word
OCR scanning errors
Handwriting recognition of zip codes, signatures, Graffiti
Issues:
Correct nonwords (dg for dog, but frplc?)
Correct wrong words in context (their for there, words of rule formation)
jwj%
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~
K
!"#$%&'()*+,./0123456789:;<=>?@ABCDEFIJLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{}~Root EntrydO)9H@
PicturesUCurrent User3PSummaryInformation(TPowerPoint Document(XJDocumentSummaryInformation8ation Equation.30,Microsoft Equation 3.0*<,Microsoft Equation 3.0 Equation.30,Microsoft Equation 3.0=Equation Equation.30,Microsoft Equation 3.0?Equation Equation.30,Microsoft Equation 3.0@Equation Equation.30,Microsoft Equation 3.03lhttp://www.cs.colorado.edu/~martin/SLP/slperrata.htmllhttp://www.cs.colorado.edu/~martin/SLP/slperrata.htmlfwhy?^http://home.earthlink.net/~dcrehr/whyqwert.htmleEquation Equation.30,Microsoft Equation 3.0,How do we compute MED?Bhttp://www.merriampark.com/ld.htmPCode and demo for simple Levenshtein MEDBhttp://www.merriampark.com/ld.htm/0DTimes New Roman0
0DSymbolew Roman0
0 DWingdingsRoman0
0
B .
@n?" dd@ @@``x`x3
!"#$%&'()*+,./012342$xJD8Q}
iI2$8XfuE,+/e2$r;9,egqD2$<
3.
tCan humans understand what is meant as opposed to what is said/written ?
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteers are at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
How do we do this?>L\L\lL
(Detecting and Correcting Spelling ErrorsApplications:
Spell checking in M$ word
OCR scanning errors
Handwriting recognition of zip codes, signatures, Graffiti
Issues:
Correct nonwords (dg for dog, but frplc?)
Correct wrong words in context (their for there, words of rule formation)
jwj%PPatterns of ErrorLHuman typists make different types of errors from OCR systems  why?
Error classification I: performancebased:
Insertion: catt
Deletion: ct
Substitution: car
Transposition: cta
Error classification II: cognitive
People don t know how to spell (nucular/nuclear)
Homonymous errors (their/there)$qB#Qq
#
NA7/D*f0AE/How do we decide if a (legal) word is an error?How likely is a word to occur?
They met there friends in Mozambique.
The Noisy Channel Model
Input to channel: true (typed or spoken) word w
Output from channel: an observation O
Decoding task: find w = P(wO)!&!$j
Bayesian Inference$Population: 10 Columbia students
J!#4/Bayesian InferencePopulation: 10 Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is a vegetarian? p(v) = .4
That a rcs who is a vegetarian is also a CS major? p(cv) = .5
That a rcs is a vegetarian (and) CS major? p(c,v) = .2!ZZZZ!
>U$<.50Bayesian InferencePopulation: 10 Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is vegetarian? p(v) = .4
That a rc who is a vegetarian is also a CS major p(cv) = .5
That a rcs is a vegetarian and a CS major? p(c,v) = .2 = p(v) p(cv) (.4 * .5)!!
PR";561Bayesian InferencePopulation: Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is a CS major? p(c) = .3
That rc who is a CS major is also a vegetarian? p(vc) = .66
That rcs is a vegetarian CS major? p(c,v) = .2 = p(c) p(vc) = (.3 * .66)
PO ;/
Bayes RuleSo, we know the joint probabilities
p(c,v) = p(c) p(vc)
p(v,c) = p(v) p(cv)
p(c,v) = p(v,c)
So, using these equations, we can define the conditional probability p(cv) in terms of the prior probabilities p(c) and p(v) and the likelihood p(vc)
p(v) p(cv) = p(c) p(vc)
$::
9>2Returning to Spelling...
Channel Input: w; Output: O
Decoding: hypothesis w = P(wO)
or, by Bayes Rule...
w =
and, since P(O) doesn t change for any entries in our lexicon we are going to consider, we can ignore it as constant, so&
w = P(Ow) P(w) (Given that w was intended, how likely are we to see O)
I'05VA37How could we use this model to correct spelling errors?kSimplifying assumptions
We only have to correct nonword errors
Each nonword (O) differs from its correct word (w) by one step (insertion, deletion, substitution, transposition)
From O, generate a list of candidates differing by one step and appearing in the lexicon, e.g.
Error Corr Corr letter Error letter Pos Type
caat cat  a 2 ins
caat carat r  3 delV`Y,P1How do we decide which correction is most likely?jWe want to find the lexicon entry w that maximizes P(typow) P(w)
How do we estimate the likelihood P(typow) and the prior P(w)?
First, find some corpora
Different corpora needed for different purposes
Some need to be labeled  others do not
For spelling correction, what do we need?
Word occurrence information (unlabeled)
A corpus of labeled spelling errorsL4
@;KCat vs Carat2
Suppose we look at the occurrence of cat and carat in a large (50M word) AP news corpus
cat occurs 6500 times, so p(cat) = .00013
carat occurs 3000 times, so p(carat) = .00006
Now we need to find out if inserting an a after an a is more likely than deleting an r after an a in a corrections corpus of 50K corrections ( p(typoword))
suppose a insertion after a occurs 5000 times (p(+a)=.1) and r deletion occurs 7500 times (p(r)=.15)XZXZZlZ%&lHoB4qThen p(wordtypo) = p(typoword) * p(word)
p(catcaat) = p(+a) * p(cat) = .1 * .00013 = .000013
p(caratcaat) = p(r) * p(carat) = .15 * .000006 = .000009
Issues:
What if there are no instances of carat in corpus?
Smoothing algorithms
Estimate of P(typoword) may not be accurate
Training probabilities on typo/word pairs
What if there is more than one error per word?+ZpZZ3ZZ.Z*Z/Z+p3 .*/,+
+C5/A More General Approach: Minimum Edit Distance00How can we measure how different one word is from another word?
How many operations will it take to transform one word into another?
caat > cat, fplc > fireplace (*treat abbreviations as typos??)
Levenshtein distance: smallest number of insertion, deletion, or substitution operations that transform one string into another (ins=del=subst=1)
Alternative: weight each operation by training on a corpus of spelling errors to see which most frequentT@EC@P
1~mG9Alternative: count substitutions as 2 (1 insertion and 1 deletion)
Alternative: DamerauLevenshtein Distance includes transpositions as a single operation (e.g. cta cat)
Code and demo for simple Levenshtein MED?d**VP>00F84MED Calculation is an Example of Dynamic Programming&5!SDecompose a problem into its subproblems
e.g. fp > firep a subproblem of fplc > fireplace
Intuition: An optimal solution for the subproblem will be part of an optimal solution for the problem
Solve any subproblem only once: store all solutions
Recursive algorithm
Often: Work backwards from the desired goal state to the initial state)G)
G
7
?
{E7
For MED, create an editdistance matrix:
each cell c[x,y] represents the distance between the first x chars of the target t and the first y chars of the source s (e.g the xlength prefix of t compared to the ylength prefix of s)
this distance is the minimum cost of inserting, deleting, or substituting operations on the previously considered substrings of the source and target2)T(TH:&Edit Distance Matrix, Subst=2  Wrong''
P<"NB: Subst x for x Cost is 0, not 2SummaryMWe can apply probabilistic modeling to NL problems like spellchecking
Noisy channel model, Bayesian method
Training priors and likelihoods on a corpus
Dynamic programming approaches allow us to solve large problems that can be decomposed into subproblems
e.g. MED algorithm
Apply similar methods to modeling pronunciation variationlGZQZhZZ:ZGQh:OJ;Allophonic variation + register/style (lexical) variation
butter/tub, going to/gonna
Pronunciation phenomena can be seen as insertions/deletions/substitutions too, with somewhat different ways of computing the likelihoods
Measuring ASR accuracy over words (WER)
Next time: Ch 6:9:9/LMNOQRSTUV W
XYZ
[\]^_`abcdP"
(
r
S_`
S`{<$0
z `
,
uW
,$D0`
0f
H `
0=
H `
0H `
0nH `
0t$H `
0R
,`
0) 
,`
0 ,`
0 Z,`
0` ,X"
08
X"
0p
H
0h ? f
___PPT10
.+PD(
' =
@B D' =
@BA?%,( <+O%,(
<+Df' =%(D' =%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!%(D' =A@BBBB0B%(D' =1:Bvisible*o3>+B#style.visibility<*!;%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*%(D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*Bu%(D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*u%(D4' =%(D' =%(D' =4@BBBB%(D' =1:Bvisible*o3>+B#style.visibility<*%(+8+0+# +xMOSApvRlX DF xP1l(ħRm`$^LJyă'/~yT
3綨`H:tgvfvvvgfߛw+kNR9~ci/t ;
6}.}5a{`\!H8VGϫyϕ;yO_0ArS,$iB83*`)>r?@^?i㿀',_ХO{="]VZw!1؆1m@g&<`k`x"r߈3=fݙ=49n뗩tvܛUa_JeT]}h٣.Øȟ90EQRT%[ba3}KLk]O(Э+/B#hүtB0](deU7,zǣLO:b68D&T2k&cn*XU0Yǃw}n"L^4 ~W"o۪W[MyQ\+ڈqy>rgy"wvTF2oMyQEEq~d]qlrg7z^K]z͈d2{\2P{FbS` /`F5U}r{
a'axpX҂Z ڗ2oB48q5~^[Mj&l?rLrn4ge//
J2e`(
K
]9Equation Equation.30,Microsoft Equation 3.0
!"#$%&'()*+,./0124Oh+'0$hp
0<
HT\Lecturejulia hirschbergeCC:\Program Files\Microsoft Office\Templates\Blank Presentation.potjulia hirschbergMic172Microsoft PowerPointoso@@3^#>@p2CGg 1@ !'@Times New Roman. 2
CS 4705
."System@Times New Roman. 2
CS 4705%.@Times New Roman. 12
HProbabilistic Approaches to
!
.@Times New Roman. .2
Pronunciation and Spelling
.n՜.+,D՜.+,PPatterns of ErrorLHuman typists make different types of errors from OCR systems  why?
Error classification I: performancebased:
Insertion: catt
Deletion: ct
Substitution: car
Transposition: cta
Error classification II: cognitive
People don t know how to spell (nucular/nuclear)
Homonymous errors (their/there)$qB#Qq
#
NA7/D*f0AE/How do we decide if a (legal) word is an error?How likely is a word to occur?
They met there friends in Mozambique.
The Noisy Channel Model
Input to channel: true (typed or spoken) word w
Output from channel: an observation O
Decoding task: find w = P(wO)!&!$j
Bayesian Inference$Population: 10 Columbia students
J!#4/Bayesian InferencePopulation: 10 Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is a vegetarian? p(v) = .4
That a rcs who is a vegetarian is also a CS major? p(cv) = .5
That a rcs is a vegetarian (and) CS major? p(c,v) = .2!ZZZZ!
>U$<.50Bayesian InferencePopulation: 10 Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is vegetarian? p(v) = .4
That a rc who is a vegetarian is also a CS major p(cv) = .5
That a rcs is a vegetarian and a CS major? p(c,v) = .2 = p(v) p(cv) (.4 * .5)!!
PR";561Bayesian InferencePopulation: Columbia students
4 vegetarians, 3 CS major
Probability that a rcs is a CS major? p(c) = .3
That rc who is a CS major is also a vegetarian? p(vc) = .66
That rcs is a vegetarian CS major? p(c,v) = .2 = p(c) p(vc) = (.3 * .66)
PO ;/
Bayes RuleSo, we know the joint probabilities
p(c,v) = p(c) p(vc)
p(v,c) = p(v) p(cv)
p(c,v) = p(v,c)
So, using these equations, we can define the conditional probability p(cv) in terms of the prior probabilities p(c) and p(v) and the likelihood p(vc)
p(v) p(cv) = p(c) p(vc)
$::
9>2Returning to Spelling...
Channel Input: w; Output: O
Decoding: hypothesis w = P(wO)
or, by Bayes Rule...
w =
and, since P(O) doesn t change for any entries in our lexicon we are going to consider, we can ignore it as constant, so&
w = P(Ow) P(w) (Given that w was intended, how likely are we to see O)
I'05VA37How could we use this model to correct spelling errors?kSimplifying assumptions
We only have to correct nonword errors
Each nonword (O) differs from its correct word (w) by one step (insertion, deletion, substitution, transposition)
From O, generate a list of candidates differing by one step and appearing in the lexicon, e.g.
Error Corr Corr letter Error letter Pos Type
caat cat  a 2 ins
caat carat r  3 delV`Y,P1How do we decide which correction is most likely?jWe want to find the lexicon entry w that maximizes P(typow) P(w)
How do we estimate the likelihood P(typow) and the prior P(w)?
First, find some corpora
Different corpora needed for different purposes
Some need to be labeled  others do not
For spelling correction, what do we need?
Word occurrence information (unlabeled)
A corpus of labeled spelling errorsL4
@;KCat vs Carat2
Suppose we look at the occurrence of cat and carat in a large (50M word) AP news corpus
cat occurs 6500 times, so p(cat) = .00013
carat occurs 3000 times, so p(carat) = .00006
Now we need to find out if inserting an a after an a is more likely than deleting an r after an a in a corrections corpus of 50K corrections ( p(typoword))
suppose a insertion after a occurs 5000 times (p(+a)=.1) and r deletion occurs 7500 times (p(r)=.15)XZXZZlZ%&lHoB4qThen p(wordtypo) = p(typoword) * p(word)
p(catcaat) = p(+a) * p(cat) = .1 * .00013 = .000013
p(caratcaat) = p(r) * p(carat) = .15 * .000006 = .000009
Issues:
What if there are no instances of carat in corpus?
Smoothing algorithms
Estimate of P(typoword) may not be accurate
Training probabilities on typo/word pairs
What if there is more than one error per word?+ZpZZ3ZZ.Z*Z/Z+p3 .*/,+
+C5/A More General Approach: Minimum Edit Distance00How can we measure how different one word is from another word?
How many operations will it take to transform one word into another?
caat > cat, fplc > fireplace (*treat abbreviations as typos??)
Levenshtein distance: smallest number of insertion, deletion, or substitution operations that transform one string into another (ins=del=subst=1)
Alternative: weight each operation by training on a corpus of spelling errors to see which most frequentT@EC@P
1~mG9Alternative: count substitutions as 2 (1 insertion and 1 deletion)
Alternative: DamerauLevenshtein Distance includes transpositions as a single operation (e.g. cta cat)
Code and demo for simple Levenshtein MED?d**VP>00F84MED Calculation is an Example of Dynamic Programming&5!SDecompose a problem into its subproblems
e.g. fp > firep a subproblem of fplc > fireplace
Intuition: An optimal solution for the subproblem will be part of an optimal solution for the problem
Solve any subproblem only once: store all solutions
Recursive algorithm
Often: Work backwards from the desired goal state to the initial state)G)
G
7
?
{E7
For MED, create an editdistance matrix:
each cell c[x,y] represents the distance between the first x chars of the target t and the first y chars of the source s (e.g the xlength prefix of t compared to the ylength prefix of s)
this distance is the minimum cost of inserting, deleting, or substituting operations on the previously considered substrings of the source and target2)T(TH:&Edit Distance Matrix, Subst=2  Wrong''
P<"NB: Subst x for x Cost is 0, not 2SummaryMWe can apply probabilistic modeling to NL problems like spellchecking
Noisy channel model, Bayesian method
Training priors and likelihoods on a corpus
Dynamic programming approaches allow us to solve large problems that can be decomposed into subproblems
e.g. MED algorithm
Apply similar methods to modeling pronunciation variationlGZQZhZZ:ZGQh:OJ;Allophonic variation + register/style (lexical) variation
butter/tub, going to/gonna
Pronunciation phenomena can be seen as insertions/deletions/substitutions too, with somewhat different ways of computing the likelihoods
Measuring ASR accuracy over words (WER)
Next time: Ch 6:9:9/LMNOQRSTUV W
XYZ
Links: textbook errata http://www.cs.colorado.edu/~martin/SLP/slperrata.html · why QWERTY? http://home.earthlink.net/~dcrehr/whyqwert.html · code and demo for simple Levenshtein MED http://www.merriampark.com/ld.htm
Can humans understand what is meant as opposed to what is said/written?
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteers are at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
How do we do this?
Detecting and Correcting Spelling Errors
Applications:
Spell checking in MS Word
OCR scanning errors
Handwriting recognition of zip codes, signatures, Graffiti
Issues:
Correcting non-words (dg for dog, but frplc?)
Correcting wrong words in context (their for there; errors of word formation)
Patterns of Error
Human typists make different types of errors from OCR systems. Why? (see http://home.earthlink.net/~dcrehr/whyqwert.html)
Error classification I: performance-based:
Insertion: catt
Deletion: ct
Substitution: car
Transposition: cta
Error classification II: cognitive:
People don't know how to spell (nucular/nuclear)
Homonymous errors (their/there)
How do we decide if a (legal) word is an error?
How likely is a word to occur?
They met there friends in Mozambique.
The Noisy Channel Model
Input to channel: true (typed or spoken) word w
Output from channel: an observation O
Decoding task: find ŵ = argmax_w P(w|O)
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Probability that a randomly chosen student (rcs) is a vegetarian? p(v) = .4
That an rcs who is a vegetarian is also a CS major? p(c|v) = .5
That an rcs is a vegetarian and a CS major? p(c,v) = p(v) p(c|v) = .4 * .5 = .2
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Probability that an rcs is a CS major? p(c) = .3
That an rcs who is a CS major is also a vegetarian? p(v|c) = .66
That an rcs is a vegetarian CS major? p(c,v) = p(c) p(v|c) = .3 * .66 ≈ .2
Bayes' Rule
So, we know the joint probabilities:
p(c,v) = p(c) p(v|c)
p(v,c) = p(v) p(c|v)
p(c,v) = p(v,c)
Using these equations, we can define the conditional probability p(c|v) in terms of the prior probabilities p(c) and p(v) and the likelihood p(v|c):
p(v) p(c|v) = p(c) p(v|c)
p(c|v) = p(c) p(v|c) / p(v)
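The two factorizations of the joint can be checked numerically with the class-survey numbers from the slides. A minimal sketch (the count of 2 vegetarian CS majors is inferred from p(c,v) = .2 over 10 students):

```python
# Population of 10: 4 vegetarians, 3 CS majors, 2 vegetarian CS majors.
p_v = 0.4          # prior: vegetarian
p_c = 0.3          # prior: CS major
p_c_given_v = 0.5  # likelihood: CS major among vegetarians (2/4)
p_v_given_c = 2/3  # likelihood: vegetarian among CS majors (2/3, i.e. .66...)

# Both factorizations of the joint must agree: p(c,v) = p(v,c)
joint_1 = p_v * p_c_given_v   # 0.4 * 0.5 = 0.2
joint_2 = p_c * p_v_given_c   # 0.3 * 2/3 = 0.2
assert abs(joint_1 - joint_2) < 1e-9

# Bayes' rule recovers the conditional from prior and likelihood:
# p(c|v) = p(c) p(v|c) / p(v)
assert abs(p_c * p_v_given_c / p_v - p_c_given_v) < 1e-9
```

(The slide's .66 is the rounded value of 2/3; the exact fraction is used here so the identities hold to machine precision.)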
Returning to Spelling...
Channel input: w; output: O
Decoding: hypothesis ŵ = argmax_w P(w|O)
or, by Bayes' Rule...
ŵ = argmax_w P(O|w) P(w) / P(O)
and, since P(O) doesn't change for any entries in our lexicon we are going to consider, we can ignore it as constant, so
ŵ = argmax_w P(O|w) P(w) (given that w was intended, how likely are we to see O?)
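This decoding rule can be sketched directly as an argmax over the lexicon. The lexicon and probability tables below are illustrative assumptions (toy numbers in the spirit of the cat/carat example), not estimates from a real corpus:

```python
# Noisy-channel decoding sketch: pick the lexicon word w maximizing P(O|w) * P(w).
def decode(observation, lexicon, likelihood, prior):
    """Return argmax_w P(O|w) * P(w) over the lexicon."""
    return max(lexicon, key=lambda w: likelihood(observation, w) * prior[w])

prior = {"cat": 0.00013, "carat": 0.00006}   # P(w), from word counts

def likelihood(obs, w):                      # P(O|w), from typo counts
    table = {("caat", "cat"): 0.1, ("caat", "carat"): 0.15}
    return table.get((obs, w), 0.0)

best = decode("caat", list(prior), likelihood, prior)  # best == "cat"
```

Here "cat" wins because 0.1 * 0.00013 = 0.000013 exceeds 0.15 * 0.00006 = 0.000009.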
How could we use this model to correct spelling errors?
Simplifying assumptions:
We only have to correct non-word errors
Each non-word (O) differs from its correct word (w) by one step (insertion, deletion, substitution, transposition)
From O, generate a list of candidates differing by one step and appearing in the lexicon, e.g.:
Error  Corr   Corr letter  Error letter  Pos  Type
caat   cat    -            a             2    ins
caat   carat  r            -             3    del
How do we decide which correction is most likely?
We want to find the lexicon entry w that maximizes P(typo|w) P(w)
How do we estimate the likelihood P(typo|w) and the prior P(w)?
First, find some corpora:
Different corpora are needed for different purposes
Some need to be labeled; others do not
For spelling correction, what do we need?
Word occurrence information (unlabeled)
A corpus of labeled spelling errors
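The one-step candidate generation described above can be sketched as follows. The lexicon here is a toy set chosen for illustration; real systems would filter against a full dictionary:

```python
# All strings one insertion, deletion, substitution, or transposition away
# from the observed non-word, intersected with a (toy) lexicon.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def one_step_edits(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions      = {L + R[1:] for L, R in splits if R}
    transpositions = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    substitutions  = {L + c + R[1:] for L, R in splits if R for c in ALPHABET}
    insertions     = {L + c + R for L, R in splits for c in ALPHABET}
    return deletions | transpositions | substitutions | insertions

lexicon = {"cat", "carat", "cart", "coat"}
candidates = one_step_edits("caat") & lexicon   # all four lexicon words qualify
```

Both corrections from the slide's table appear: deleting an "a" from caat yields cat, and inserting an "r" yields carat.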
Cat vs. Carat
Suppose we look at the occurrence of cat and carat in a large (50M word) AP news corpus:
cat occurs 6500 times, so p(cat) = .00013
carat occurs 3000 times, so p(carat) = .00006
Now we need to find out if inserting an a after an a is more likely than deleting an r after an a, in a corrections corpus of 50K corrections (p(typo|word)):
suppose a-insertion after a occurs 5000 times (p(+a) = .1) and r-deletion after a occurs 7500 times (p(-r) = .15)
Then p(word|typo) ∝ p(typo|word) * p(word)
p(cat|caat) = p(+a) * p(cat) = .1 * .00013 = .000013
p(carat|caat) = p(-r) * p(carat) = .15 * .00006 = .000009
Issues:
What if there are no instances of carat in the corpus?
Smoothing algorithms
Estimate of P(typo|word) may not be accurate
Training probabilities on typo/word pairs
What if there is more than one error per word?
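Working through the arithmetic above, with the counts as given on the slides:

```python
# Priors from word counts in a 50M-word corpus:
p_cat   = 6500 / 50_000_000   # .00013
p_carat = 3000 / 50_000_000   # .00006

# Likelihoods from a 50K-correction corpus:
p_ins_a = 5000 / 50_000       # .1   P(typo|word): "a" inserted after "a"
p_del_r = 7500 / 50_000       # .15  P(typo|word): "r" deleted after "a"

score_cat   = p_ins_a * p_cat     # .1  * .00013 = .000013
score_carat = p_del_r * p_carat   # .15 * .00006 = .000009
assert score_cat > score_carat    # "cat" is the more likely correction
```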
A More General Approach: Minimum Edit Distance
How can we measure how different one word is from another word?
How many operations will it take to transform one word into another?
caat → cat, fplc → fireplace (treat abbreviations as typos?)
Levenshtein distance: smallest number of insertion, deletion, or substitution operations that transform one string into another (ins = del = subst = 1)
Alternative: weight each operation by training on a corpus of spelling errors to see which are most frequent
Alternative: count substitutions as 2 (1 insertion and 1 deletion)
Alternative: Damerau-Levenshtein distance includes transpositions as a single operation (e.g. cta → cat)
Code and demo for simple Levenshtein MED: http://www.merriampark.com/ld.htm
MED Calculation is an Example of Dynamic Programming
Decompose a problem into its subproblems
e.g. fp → firep is a subproblem of fplc → fireplace
Intuition: an optimal solution for the subproblem will be part of an optimal solution for the problem
Solve any subproblem only once: store all solutions
Recursive algorithm
Often: work backwards from the desired goal state to the initial state
For MED, create an edit-distance matrix:
each cell c[x,y] represents the distance between the first x chars of the target t and the first y chars of the source s (i.e. the x-length prefix of t compared with the y-length prefix of s)
this distance is the minimum cost of the insertion, deletion, or substitution operations on the previously considered substrings of the source and target
Edit Distance Matrix, Subst=2 (Wrong)
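The matrix itself appeared as a figure in the original slides; a minimal sketch of the computation it tabulates, with ins = del = 1, subst = 2, and (per the NB) substitution cost 0 when the characters match:

```python
# Minimum edit distance via the dynamic-programming matrix described above.
def min_edit_distance(target, source, subst_cost=2):
    n, m = len(target), len(source)
    # dist[x][y] = distance between the first x chars of target
    # and the first y chars of source
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for x in range(1, n + 1):
        dist[x][0] = x                 # build target from an empty source
    for y in range(1, m + 1):
        dist[0][y] = y                 # delete every source char
    for x in range(1, n + 1):
        for y in range(1, m + 1):
            sub = 0 if target[x - 1] == source[y - 1] else subst_cost
            dist[x][y] = min(dist[x - 1][y] + 1,        # insertion
                             dist[x][y - 1] + 1,        # deletion
                             dist[x - 1][y - 1] + sub)  # substitution
    return dist[n][m]
```

With subst_cost=1 this is plain Levenshtein distance; e.g. min_edit_distance("cat", "caat") is 1 (one deletion).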
NB: the cost of substituting x for x is 0, not 2
Summary
We can apply probabilistic modeling to NL problems like spell-checking
Noisy channel model, Bayesian method
Training priors and likelihoods on a corpus
Dynamic programming approaches allow us to solve large problems that can be decomposed into subproblems
e.g. the MED algorithm
Apply similar methods to modeling pronunciation variation
Allophonic variation + register/style (lexical) variation
butter/tub, going to/gonna
Pronunciation phenomena can be seen as insertions/deletions/substitutions too, with somewhat different ways of computing the likelihoods
Measuring ASR accuracy over words (WER)
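WER can be computed with the same edit-distance machinery, applied over words with all operation costs set to 1 and normalized by the reference length; a minimal sketch (the sentence pair is illustrative):

```python
# Word error rate: (substitutions + deletions + insertions) / reference length,
# computed as a word-level Levenshtein distance.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# e.g. wer("they met their friends", "they met there friends") == 0.25
```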
Next time: Ch 6
Julia Hirschberg