STREAM WEIGHT OPTIMIZATION OF SPEECH AND LIP IMAGE SEQUENCE FOR AUDIO-VISUAL SPEECH RECOGNITION

Satoshi Nakamura, Hidetoshi Ito, Kiyohiro Shikano

1 ATR Spoken Language Translation Research Laboratories
2-2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
2 Graduate School of Information Science, Nara Institute of Science and Technology
8916-5, Takayama-cho, Ikoma-shi, Nara, 630-0101, Japan
E-mail: nakamura@slt.atr.co.jp

ABSTRACT

Bimodal speech recognition systems, which use visual information to supplement acoustic information, have been shown to yield better recognition performance than purely acoustic systems, especially when background noise is present. The early integration strategy for HMM-based audio-visual speech recognition is one promising approach, where the output probability is obtained as the product of the output probabilities of the audio and visual streams. This paper addresses a novel method which optimizes the stream weights so as to maximize recognition performance. The proposed method estimates the stream weights based on a normalized log-likelihood, which is derived by the ratio of the likelihood of a correct word and the highest likelihood of incorrect words. The isolated word recognition experiment results show that audio-visual speech recognition by the proposed method attains 56.2 % (10 dB), 55.2 % (0 dB) and 15.2 % (20 dB) better performance compared to that using only audio information. The results also show that the proposed method can reduce the number of adaptation words.

1. INTRODUCTION

Speech recognition performance has been drastically improved recently. However, it is also well known that the performance will be seriously degraded if the system is exposed to noisy environments. Humans pay attention not only to the speaker's speech but also to the speaker's mouth in such adverse environments. Lip reading is the extreme case, when it is impossible to get any audio signal. This suggests that speech recognition can be improved by incorporating mouth images. This kind of bimodal integration is available in almost every situation. Many studies have been presented related to improvements of speech recognition by lip images [1][2][3]. For instance, the recognition accuracy for the voiced consonants /b/, /d/, /g/ can be improved by incorporating lip image information, since lip closure for bilabial phonemes is relatively easy to discriminate using image information.

As an audio-visual integration method for speech recognition, an early integration method is well known, where the output probability is obtained as the product of the output probabilities of the audio and visual streams. In the early integration scheme, it is further possible to improve the recognition performance by optimizing the stream weights for the output probabilities of the audio and visual streams. The early integration scheme can be described as follows,

$$ b(o_t) = b_A(o_A)^{\lambda_A} \times b_V(o_V)^{\lambda_V}, \qquad \lambda_A + \lambda_V = 1 \qquad (1) $$

where $b(o_t)$, $b_A(o_A)$, and $b_V(o_V)$ are the output probabilities at time $t$ for the audio-visual, audio, and visual streams, respectively. $\lambda_A$ and $\lambda_V$ are the stream weights for the audio and visual streams, respectively.

As methods for estimating stream weights, the maximum likelihood [4] based method and the GPD [5] based method have been proposed. However, the former method has a serious estimation problem, and the latter method has the crucial problem that much adaptation data is necessary for the weight estimation. In order to cope with these problems, we propose a new weight estimation method using only a few adaptation data. The isolated word experiments show that the proposed weight estimation improves the recognition performance in low SNR environments and reduces the number of adaptation words.

2. INTEGRATION EFFECTS OF USING LIP IMAGE SEQUENCE

Bimodal speech recognition has the possibility to improve conventional speech recognition performance by incorporating visual lip information.

Figure 1. Effects of stream weights in bimodal speech recognition
[Plot: recognition results versus the audio stream weight $\lambda_A$, from 0 (visual only) to 1.0 (audio only), with curves for clean, 30 dB, 20 dB, 10 dB, and 0 dB.]
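The stream weighting whose effect Figure 1 illustrates can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the two weights sum to one (so that only the audio weight is free), works in the log domain, and uses hypothetical function names and made-up probability values.

```python
import math


def combined_log_prob(log_b_audio, log_b_visual, lambda_audio):
    """Weighted early-integration score in the log domain.

    Taking the logarithm of b = b_A**lambda_A * b_V**lambda_V gives
    log b = lambda_A * log b_A + lambda_V * log b_V.
    Assumption of this sketch: lambda_V = 1 - lambda_A.
    """
    if not 0.0 <= lambda_audio <= 1.0:
        raise ValueError("lambda_audio must lie in [0, 1]")
    lambda_visual = 1.0 - lambda_audio
    return lambda_audio * log_b_audio + lambda_visual * log_b_visual


# Hypothetical per-frame stream probabilities (illustrative values only).
b_audio, b_visual = 0.02, 0.30
for lam in (0.0, 0.5, 1.0):
    score = combined_log_prob(math.log(b_audio), math.log(b_visual), lam)
    print(f"lambda_A={lam:.1f}  log b = {score:.4f}")
```

With the audio weight at 0 the score reduces to the visual stream alone, and at 1 to the audio stream alone, which corresponds to the two ends of the weight axis in Figure 1.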
The speech recognition performance is degraded in acoustically noisy environments, whereas visual information is not. The visual information itself, however, is insufficient to build speech recognition, since its phonetic discriminative performance is so poor.

Figure 1 shows the experiment results of speaker-dependent 100-word bimodal isolated word recognition based on the early integration. The curves are obtained by changing the stream weight for audio information. Peaks of recognition rates are observed between the audio-only and visual-only conditions in almost all acoustic SNR environments. These results indicate the effects of integrating different types of information.

This paper addresses the problem of how to optimize the stream weights for the audio and visual streams using only a few adaptation data.

3. PROPOSED METHOD

This paper proposes a new method which tries to solve the problems noted in the previous section. The method estimates the stream weights so as to maximize recognition accuracy. The estimation is carried out by searching for better weights iteratively in two regions. However, since the word accuracy does not have good enough resolution for finding optimal stream weights, the increase of the word accuracy usually stops in only a few iterations. The proposed algorithm uses a normalized likelihood instead of the word accuracy. The proposed algorithm may be thought of as one kind of approximation of the minimum classification error estimation.

3.1. Normalized log-likelihood

The word accuracy is not a good measure to estimate stream weights, since the estimation stops in a few iterations because of lack of resolution. Therefore in this paper we use the normalized log-likelihood for estimating stream weights. The normalized log-likelihood, $L_{NR}(O \mid M_{correct})$, is defined as follows,

$$ L_{NR}(O \mid M_{correct}) = \frac{L(O \mid M_{correct}) - \max_{incorrect} L(O \mid M_{incorrect})}{L(O \mid M_{correct})} \qquad (2) $$

where $L(O \mid M_{correct})$ and $L(O \mid M_{incorrect})$ are the log-likelihoods for an adaptation word given the correct word hypothesis and the incorrect word hypotheses, respectively. $L_{NR}$ is defined so as to indicate discrimination performance, whereas the actual recognition accuracy does not [...]. The experiments show the relationship between the word recognition accuracy and the normalized log-likelihood in a later section.

Figure 2. Normalized log-likelihood
[Diagram: log-likelihoods of the correct word and of the second and third word candidates.]

3.2. Estimation Procedure

The algorithm is summarized as follows.

1. Prepare adaptation words in testing environments and the recognition dictionary used in adaptation. This adaptation dictionary includes the adaptation words.
2. Recognize the adaptation data from the test environments using 3 kinds of stream weights shown in Fig. 3. HMM transition probabilities and output probabilities are estimated using clean speech.
3. Select either the lower or the higher half of the target weight region by comparing the normalized log-likelihoods.
4. Set the weight region selected in the previous step. Then iterate from step 2. Stop the iteration when the weight region becomes smaller than 0.05.

Fig. 3 shows the weight search process in the algorithm. The weights are tied for all phonemes and states. A larger adaptation dictionary gives better resolution for estimation. However, the proposed method can reduce the number of adaptation data.

4. EXPERIMENT OF STREAM WEIGHT ESTIMATION

4.1. Audio-visual speech database

Table 1 shows the specification of the database used in this study. Video recording is performed in a sound-proof room and the lighting is set from the front. The head is not fixed, but the speaker is requested to keep her back against the seat. Moreover, the speaker is also requested to close her mouth before and after each utterance. We observed differences in lighting conditions, size of the lips, and inclination of the face between utterance words, since the video recording was conducted over two or more days.

4.2. Preprocessing

As stated previously, we observed differences in lighting conditions and inclination of the face in the recorded video images.
Figure 3. Algorithm for weight estimation
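The weight search of Figure 3 can be read as an interval-halving search over the audio stream weight. The sketch below is only an illustration under stated assumptions: the exact placement of the three weights evaluated per iteration is not recoverable here, so the centres of the lower and upper halves are compared instead; `score` is a hypothetical callable that would run recognition on the adaptation words and return their summed normalized log-likelihood; and the toy score function is made up.

```python
def estimate_stream_weight(score, low=0.0, high=1.0, min_width=0.05):
    """Halve the weight region until it is narrower than min_width.

    score(lambda_audio) is a hypothetical callable: larger means better
    discrimination on the adaptation words at that audio stream weight.
    """
    while high - low >= min_width:
        mid = 0.5 * (low + high)
        lower_centre = 0.5 * (low + mid)    # representative of the lower half
        upper_centre = 0.5 * (mid + high)   # representative of the upper half
        if score(lower_centre) >= score(upper_centre):
            high = mid   # keep the lower half of the region
        else:
            low = mid    # keep the upper half of the region
    return 0.5 * (low + high)


def toy_score(weight):
    """Made-up unimodal score with its peak at an audio weight of 0.7."""
    return -(weight - 0.7) ** 2


estimated = estimate_stream_weight(toy_score)
print(f"estimated audio stream weight: {estimated:.3f}")
```

Starting from the full region [0, 1], the region width halves on every pass, so the stopping width of 0.05 is reached after five iterations, each of which costs one recognition run per evaluated weight.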
Table 1. Audio-visual database

File format   SGI movie
Speaker       One female speaker
Utterances    ATR 5240 Japanese words
Audio         sampling: 16 bit, 48 kHz
Visual        sampling rate: 30 frames/sec
              size: 160x120
              color quality: 8 bit RGB

Figure 4. Preprocessing: (a) Original, (b) Histogram change, (c) Normalized lip position.
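The two preprocessing stages named in Figure 4, histogram normalization and lip position normalization by pattern matching against a key frame, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes histogram equalization as the normalization and an exhaustive sum-of-squared-differences search as the pattern matching. The function names and the random test image are hypothetical.

```python
import numpy as np


def equalize_histogram(gray):
    """Histogram equalization of an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest level present
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to equalize
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255.0), 0, 255).astype(np.uint8)
    return lut[gray]


def locate_template(image, template):
    """Return (row, col) of the best match by exhaustive sum of squared differences."""
    ih, iw = image.shape
    th, tw = template.shape
    img = image.astype(np.float64)
    tpl = template.astype(np.float64)
    best, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = np.sum((img[r:r + th, c:c + tw] - tpl) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos


# Hypothetical data: a random frame and a key-frame patch cut from it.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(20, 24), dtype=np.uint8)
key_patch = frame[5:9, 7:11].copy()

normalized = equalize_histogram(frame)
row, col = locate_template(frame, key_patch)
print("patch located at", (row, col))
```

Once the patch position is known, each frame can be cropped or shifted so that the lips sit at the same coordinates, which is what the position normalization is for.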
Then the images are preprocessed by histogram normalization and lip position normalization based on pattern matching with a common key frame. Fig. 4 (a), (b), and (c) are a recorded original image, a histogram-normalized image, and a lip-position-normalized image, respectively.

Since the frame rate of the video images is 30 Hz, the ratio of the frame rates is 1:4. We inserted the same images so as to synchronize the face image frame rate to the audio speech frame rate. Then, the images are analyzed by 2-dimensional FFT to extract a 6x6 log power 2-D spectrum for bimodal speech recognition.

4.3. Experimental condition

Experiment conditions are shown in Table 2. For each modality, the basic coefficients and the delta coefficients are collectively merged into one stream.
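The visual front end described in the preprocessing section, repeating video frames to match the audio frame rate and then taking a small block of the 2-D log power spectrum, can be sketched as below. This is an illustration under assumptions, not the authors' code: which 6x6 block of the spectrum is kept is not stated, so the low-frequency corner is assumed; the small constant added before the logarithm is an assumption to avoid log(0); and the function names and input sizes are hypothetical.

```python
import numpy as np


def repeat_frames(video_frames, n_audio_frames):
    """Repeat video frames so that there is one image per audio frame."""
    n_video = len(video_frames)
    # Map each audio frame index to the video frame covering the same instant.
    idx = np.minimum(np.arange(n_audio_frames) * n_video // n_audio_frames,
                     n_video - 1)
    return [video_frames[i] for i in idx]


def log_power_spectrum_2d(gray, size=6):
    """Flattened size x size low-frequency block of the 2-D log power spectrum."""
    spectrum = np.fft.fft2(gray.astype(np.float64))
    power = np.abs(spectrum) ** 2
    return np.log(power[:size, :size] + 1e-10).ravel()


# Hypothetical input: 3 grayscale video frames stretched to 12 audio frames.
rng = np.random.default_rng(1)
frames = [rng.integers(0, 256, size=(32, 32), dtype=np.uint8) for _ in range(3)]
synced = repeat_frames(frames, 12)
features = np.stack([log_power_spectrum_2d(f) for f in synced])
print(features.shape)
```

Each row of `features` is then one visual observation vector aligned with one audio frame, which is the form a frame-synchronous two-stream HMM needs.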
Table 2. Experimental condition

Audio    sampling rate: 12 kHz (downsampled)
         frame length: 32 msec
         frame shift: 8 msec
         pre-emphasis: 1 - [...] z^-1
         extract: MFCC [...]
Visual   1: 24 bits -> 256 grayscale
         2: Histogram normalization
         3: Lip position normalization
         4: 2D-FFT (256x256)
         parameter: log power spectrum [...]
HMM      Gaussian distribution: [...] mixture
         55 phoneme model
         training data: [...]
         testing data: 100 words
         adaptation data: 16 words (SNR: 30, 20, 10, 0 dB)

Figure 5. Word recognition rates and normalized log-likelihood for stream weights
[Plots for SNR 20 dB and SNR 0 dB: number of correct words and normalized log-likelihood versus stream weight.]

4.4. Estimation results

Figure 5 shows the number of correct words and the sum of the normalized log-likelihood, $L_{NR}$, as the stream weights are changed, for the adaptation data set (16 words) in acoustically noisy environments with audio SNRs of 30 dB, 20 dB, 10 dB, and 0 dB.

Generally, the peak of the recognition rate differs between adaptation words and testing words. The more adaptation words we use, the more general the stream weights are, and the more computation is needed. In the experiments, words chosen at random from the testing environments are used for the estimation.

Fig. 5 also shows that the normalized log-likelihood and the word recognition rates have peaks at the same stream weights, and that the normalized log-likelihood gives finer resolution for determining the stream weights.
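Why a summed normalized log-likelihood resolves stream weights more finely than a count of correct words can be seen in a small example. The sketch below is illustrative only: the log-likelihood values are made up, the function names are hypothetical, and the difference between the correct word and the best incorrect word is divided by the magnitude of the correct word's log-likelihood, which is an assumption of this sketch so that a larger value always means better discrimination, also when log-likelihoods are negative.

```python
def normalized_log_likelihood(word_scores, correct_word):
    """Margin between the correct word and the best incorrect word, normalized."""
    l_correct = word_scores[correct_word]
    best_incorrect = max(v for w, v in word_scores.items() if w != correct_word)
    # Assumption: normalize by |L_correct| so the sign of the margin is kept.
    return (l_correct - best_incorrect) / abs(l_correct)


def summarize(adaptation_set):
    """Return (number of correct words, summed normalized log-likelihood)."""
    n_correct, total = 0, 0.0
    for word_scores, correct_word in adaptation_set:
        value = normalized_log_likelihood(word_scores, correct_word)
        total += value
        n_correct += int(value > 0)   # positive margin: correct word ranked first
    return n_correct, total


# Made-up log-likelihoods of two adaptation words under two stream weights.
weight_a = [({"aka": -100.0, "aki": -110.0}, "aka"),
            ({"aka": -118.0, "aki": -120.0}, "aki")]
weight_b = [({"aka": -100.0, "aki": -130.0}, "aka"),
            ({"aka": -119.0, "aki": -120.0}, "aki")]

print(summarize(weight_a))
print(summarize(weight_b))
```

Both weights recognize the same number of words, so the word count cannot rank them, while the summed normalized log-likelihood is clearly larger for the second weight.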
Fig. 6 shows the relationship between the size of the recognition dictionary used for the estimation and the word recognition accuracy for the testing data. The recognition accuracies are the average result of five different sets of adaptation words. The resolution for determining the optimal stream weights depends on the size of the recognition dictionary used for the estimation. We used 1000 words for the estimation.

Figure 6. Recognition results for adaptation dictionary [...]

Fig. 7 shows bimodal speech recognition rates for various conditions. The proposed method outperforms each audio-only and visual-only speech recognition system, [...] in SNR 0 dB and clean conditions. The great improvement is observed in very low SNR conditions. These results indicate that the proposed method improves speech recognition rates using only 15 adaptation words in acoustically noisy environments.

Figure 7. Results of bimodal speech recognition
[Plot: recognition results versus SNR (dB) for audio only, visual only, and the proposal.]

5. CONCLUSION

In this paper we proposed a technique which effectively estimates the audio-visual stream weights of the early integration scheme from few adaptation data in acoustically noisy environments. The estimation experiments of stream weights were performed, and the validity of this technique is shown by the recognition experiments. The evaluation on continuous speech is also to be considered. Furthermore, we are going to investigate noise-robust techniques such as spectral subtraction as preprocessing for audio information, influences from visual noise, and comparison with the GPD methods.

6. REFERENCES

1. A. Adjoudani and C. Benoit, "On the Integration of Auditory and Visual Parameters in an HMM-based ASR", [...], pp. 461-471, 1996.
2. Uwe Meier, Wolfgang Hurst and Paul Duchnowski, "Adaptive Bimodal Sensor Fusion for Automatic Speechreading", ICASSP'96, pp. 833-836, 1996.
3. Stephen Cox, Iain Matthews and Andrew Bangham, "Combining Noise Compensation with Visual Information in Speech Recognition", AVSP'97, pp. 53-56, 1997.
4. Javier Hernando, "Maximum Likelihood Weighting of Dynamic Speech Features for CDHMM Speech Recognition", ICASSP'97, pp. 1267-1270, 1997.
5. Gerasimos Potamianos and Hans Peter Graf, "Discriminative Training of HMM Stream Exponents for Audio-visual Speech Recognition", ICASSP'98, pp. 3733-3736, 1998.
6. M. G. Rahim and B.-H. Juang, "Signal bias removal by maximum likelihood estimation for robust telephone speech recognition", IEEE Trans. on Speech and Audio Processing, Vol. 4, No. 1, pp. 19-30, 1996.