An Introduction to Hidden Markov Models
L. R. Rabiner and B. H. Juang
The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
appropriate excitation. The easiest way then to address the time-varying nature of the process is to view it as a direct concatenation of these smaller "short time" segments, each such segment being individually represented by a linear system model. In other words, the overall model is a synchronous sequence of symbols where each of the symbols is a linear system model representing a short segment of the process. In a sense this type of approach models the observed signal using representative tokens of the signal itself (or some suitably averaged set of such signals if we have multiple observations).

Time-varying processes

Modeling time-varying processes with the above approach assumes that every such short-time segment of observation is a unit with a prechosen duration. In general, however, there does not exist a precise procedure to decide what the unit duration should be so that both the time-invariant assumption holds and the short-time linear system models (as well as the concatenation of the models) are meaningful. In most physical systems, the duration of a short-time segment is determined empirically. In many processes, of course, one would neither expect the properties of the process to change synchronously with every unit analysis duration, nor observe drastic changes from each unit to the next except at certain instances. Making no further assumptions about the relationship between adjacent short-time models, and treating temporal variations, small or large, as "typical" phenomena in the observed signal, are key features in the above direct concatenation technique. This template approach to signal modeling has proven to be quite useful and has been the basis of a wide variety of speech recognition systems.

There are good reasons to suspect, at this point, that the above approach, while useful, may not be the most efficient (in terms of computation, storage, parameters, etc.) technique as far as representation is concerned. Many real-world processes seem to manifest a rather sequentially changing behavior; the properties of the process are usually held pretty steadily, except for minor fluctuations, for a certain period of time (or a number of the above-mentioned duration units), and then, at certain instances, change (gradually or rapidly) to another set of properties. The opportunity for more efficient modeling can be exploited if we can first identify these periods of rather steady behavior, and then are willing to assume that the temporal variations within each of these steady periods are, in a sense, statistical. A more efficient representation may then be obtained by using a common short-time model for each of the steady, or well-behaved, parts of the signal, along with some characterization of how one such period evolves to the next. This is how hidden Markov models (HMM) come about. Clearly, three problems have to be addressed: 1) how these steadily or distinctively behaving periods can be identified, 2) how the "sequentially" evolving nature of these periods can be characterized, and 3) what typical or common short-time model should be chosen for each of these periods. Hidden Markov models successfully treat these problems under a probabilistic or statistical framework.

It is thus the purpose of this paper to explain what a hidden Markov model is, why it is appropriate for certain types of problems, and how it can be used in practice. In the next section, we illustrate hidden Markov models via some simple coin toss examples and outline the three fundamental problems associated with the modeling technique. We then discuss how these problems can be solved in Section III. We will not direct our general discussion to any one particular problem, but at the end of this paper we illustrate how HMM's are used via a couple of examples in speech recognition.

DEFINITION OF A HIDDEN MARKOV MODEL

An HMM is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols. We illustrate HMM's with the following coin toss example.

Coin toss example

To understand the concept of the HMM, consider the following simplified example. You are in a room with a barrier (e.g., a curtain) through which you cannot see what is happening. On the other side of the barrier is another person who is performing a coin (or multiple coin) tossing experiment. The other person will not tell you anything about what he is doing exactly; he will only tell you the result of each coin flip. Thus a sequence of hidden coin tossing experiments is performed, and you only observe the results of the coin tosses, i.e.,

O_1 O_2 O_3 ... O_T

where H stands for heads and T stands for tails.

Given the above experiment, the problem is how do we build an HMM to explain the observed sequence of heads and tails. One possible model is shown in Fig. 1a. We call this the "1-fair coin" model. There are two states in the model, but each state is uniquely associated with either heads (state 1) or tails (state 2). Hence this model is not hidden because the observation sequence uniquely defines the state. The model represents a "fair coin" because the probability of generating a head (or a tail) following a head (or a tail) is 0.5; hence there is no bias on the current observation. This is a degenerate example and shows how independent trials, like the tossing of a fair coin, can be interpreted as a set of sequential events. Of course, if the person behind the barrier is, in fact, tossing a single fair coin, this model should explain the outcomes very well.

A second possible HMM for explaining the observed sequence of coin toss outcomes is given in Fig. 1b. We call this model the "2-fair coin" model. There are again 2 states in the model, but neither state is uniquely associated with either heads or tails.
... observation probability distributions which, of course, represent random variables or stochastic processes.

Using the model, an observation sequence, O = O_1, O_2, ..., O_T, is generated as follows:

1) Choose an initial state according to the initial state distribution.
2) Set t = 1.
3) Choose O_t according to the symbol probability distribution in the current state.
4) Choose the next state according to the state transition probability distribution of the current state.
5) Set t = t + 1; return to step 3) if t <= T; otherwise, terminate the procedure.
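The generation procedure above can be written out directly as a short simulation. The following Python sketch assumes the usual parameterization of a discrete-symbol HMM by an initial state distribution, a state transition matrix with entries a_ij, and per-state symbol probabilities b_j(k); the particular numbers are invented for illustration and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state, 2-symbol model (numbers are placeholders).
pi = np.array([0.6, 0.4])                 # initial state distribution
A  = np.array([[0.7, 0.3],                # A[i, j] = a_ij = Pr(state j at t+1 | state i at t)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],                # B[j, k] = b_j(k) = Pr(symbol k | state j)
               [0.9, 0.1]])

def generate(T):
    """Generate an observation sequence O = O_1 ... O_T following the steps above."""
    state = rng.choice(len(pi), p=pi)     # step 1: initial state from pi
    observations = []
    for _ in range(T):                    # steps 2-5: emit a symbol, then move to the next state
        observations.append(rng.choice(B.shape[1], p=B[state]))
        state = rng.choice(len(pi), p=A[state])
    return observations

print(generate(10))   # a list of symbol indices such as [0, 1, 0, 0, ...]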
... sequence given the model. This is the most difficult of the three problems we have discussed. There is no known way to solve for a maximum likelihood model analytically. Therefore an iterative procedure, such as the Baum-Welch method, or gradient techniques for optimization, must be used. Here we will only discuss the iterative procedure. It appears that with this procedure, the physical meaning of the various parameter estimates can be easily visualized.
To describe how we (re)estimate HMM parameters, we first define xi_t(i, j) as

xi_t(i, j) = Pr(i_t = q_i, i_{t+1} = q_j | O, lambda),

where i_t denotes the state at time t, i.e., the probability of a path being in state q_i at time t and making a transition to state q_j at time t + 1, given the observation sequence and the model. From Fig. 5 it should be clear that we can write xi_t(i, j) as

xi_t(i, j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / Pr(O | lambda).

In the above, alpha_t(i) accounts for the first t observations, ending in state q_i at time t; the term a_ij b_j(O_{t+1}) accounts for the transition to state q_j at time t + 1 with the occurrence of symbol O_{t+1}; and the term beta_{t+1}(j) accounts for the remainder of the observation sequence O_{t+2}, ..., O_T.
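As a concrete sketch of how xi_t(i, j) is evaluated, the Python fragment below computes the forward variables alpha, the backward variables beta, and then xi for a discrete-symbol HMM. It assumes the conventional definitions alpha_t(i) = Pr(O_1 ... O_t, state q_i at time t | lambda) and beta_t(i) = Pr(O_{t+1} ... O_T | state q_i at time t, lambda); the scaling needed for long observation sequences is omitted to keep the example minimal.

import numpy as np

def xi_matrix(pi, A, B, obs):
    """Return xi[t, i, j] = Pr(state i at t, state j at t+1 | obs, model)."""
    T, N = len(obs), len(pi)

    # Forward pass: alpha[t, i] = Pr(O_1..O_t, state i at t | model).
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = Pr(O_{t+1}..O_T | state i at t, model).
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    prob_O = alpha[T - 1].sum()          # Pr(O | model)

    # xi_t(i, j) = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / Pr(O | model)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]][None, :]
                 * beta[t + 1][None, :]) / prob_O
    return xi

Summing xi_t(i, j) over t gives the expected number of transitions from q_i to q_j, which is the quantity that enters the reestimation of a_ij.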
Durational information is often represented in a normalized form for word models (since the word boundary is essentially known), in the form

p_j(l/T) = probability of being in state j for exactly (l/T) of the word,

where T is the number of frames in the word and l is the number of frames spent in state j.
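As an illustration of how such normalized durational information might be gathered, the sketch below accumulates, for each state j, a histogram of the fraction l/T of a word's frames spent in that state, given per-frame state alignments (for example, from a Viterbi segmentation of the training tokens). The function name, binning, and data layout are assumptions made for this example, not part of the paper.

from collections import defaultdict

def duration_histograms(alignments, num_bins=20):
    """Estimate p_j(l/T): for each state j, a histogram over the fraction of a
    word's frames spent in state j.  `alignments` is a list of per-token state
    sequences (one state label per frame), e.g. from a Viterbi segmentation."""
    counts = defaultdict(lambda: [0] * num_bins)
    totals = defaultdict(int)
    for states in alignments:
        T = len(states)                      # number of frames in this token
        for j in set(states):
            frac = states.count(j) / T       # fraction l/T spent in state j
            bin_index = min(int(frac * num_bins), num_bins - 1)
            counts[j][bin_index] += 1
            totals[j] += 1
    return {j: [c / totals[j] for c in hist] for j, hist in counts.items()}

# Example with two toy alignments of a 3-state word model:
hists = duration_histograms([[1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                             [1, 2, 2, 2, 3, 3, 3, 3]])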
... of Pr(O, I | lambda) is usually very large, and max_I Pr(O, I | lambda) is usually the only significant term in the summation for Pr(O | lambda). Therefore, in such cases, either the forward-backward procedure or the Viterbi algorithm works equally well in the word recognition task.
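The claim that the best state sequence usually dominates the sum can be checked numerically. The Python sketch below computes both log Pr(O | lambda) (forward recursion, summing over all state sequences) and log max_I Pr(O, I | lambda) (Viterbi recursion) in the log domain; the toy model values are invented for illustration.

import numpy as np

def logsumexp_cols(M):
    """Numerically stable log(sum(exp(M), axis=0)) for a 2-D array."""
    m = np.max(M, axis=0)
    return m + np.log(np.sum(np.exp(M - m), axis=0))

def forward_and_viterbi_log_scores(pi, A, B, obs):
    """Return (log Pr(O | model), log max_I Pr(O, I | model)) for a discrete HMM."""
    with np.errstate(divide="ignore"):           # tolerate log(0) = -inf for zero probabilities
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    log_alpha = log_pi + log_B[:, obs[0]]        # forward variables (sum over all state sequences)
    log_delta = log_pi + log_B[:, obs[0]]        # Viterbi variables (best state sequence only)
    for o in obs[1:]:
        log_alpha = logsumexp_cols(log_alpha[:, None] + log_A) + log_B[:, o]
        log_delta = np.max(log_delta[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp_cols(log_alpha[:, None]).item(), np.max(log_delta)

# On a toy model the two scores are often close, since one state sequence
# tends to dominate the sum.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.1, 0.9]])
print(forward_and_viterbi_log_scores(pi, A, B, [0, 0, 1, 1, 1, 0]))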