{"id":2903,"date":"2023-11-19T02:40:07","date_gmt":"2023-11-18T17:40:07","guid":{"rendered":"https:\/\/saraheee.com\/?p=2903"},"modified":"2023-11-20T23:39:09","modified_gmt":"2023-11-20T14:39:09","slug":"hidden-markov-models-three-key-problems-and-algorithms","status":"publish","type":"post","link":"https:\/\/saraheee.com\/ko\/2023\/11\/hidden-markov-models-three-key-problems-and-algorithms\/","title":{"rendered":"Hidden Markov Models &#038; three key Problems and Algorithms"},"content":{"rendered":"<h3 class=\"wp-block-heading\">Contents<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\">Introduction\nHidden Markov Models\nThree Key HMM Problems\n\n1) Forward-Backward algorithm\n2) Viterbi algorithm\n3) Baum-Welch algorithm<\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>This is a compilation of a series of videos in which Prof. Patterson describes the Hidden Markov Model, starting with the Markov Model and proceeding to the three core questions for HMMs.<br>A Hidden Markov Model is a machine learning model for predicting sequences of states from indirect observations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hidden Markov Model<\/h2>\n\n\n\n<p>Markov property: a mathematical modeling constraint<\/p>\n\n\n\n<p>Discrete markov model<br>: discrete if it has distinct states that the model can be in at a given moment<\/p>\n\n\n\n<p>At discrete times the system changes state (possibly to the same state) according to the transition probabilities<\/p>\n\n\n\n<p>\\(P(q_t = S_z | q_{t-1} = S_y, &#8230; , q_2 = S_j, q_1 = S_i)\\)<\/p>\n\n\n\n<p>A full probabilistic description of a given system would require knowledge of the current state and all previous states<\/p>\n\n\n\n<p>But not so with the Markov property<\/p>\n\n\n\n<p>\\(P(q_t = S_j | q_{t-1} = S_i)\\)<\/p>\n\n\n\n<p>If we assume that our transition probabilities don&#8217;t vary over time then<\/p>\n\n\n\n<p>\\(a_{ij} = P(q_t = S_j | q_{t-1} = S_i) 1 \\leq i, j \\leq N\\)<\/p>\n\n\n\n<p>\\(a_{ij} \\geq 0\\)<\/p>\n\n\n\n<p>\\(\\sum_{j=1}^{N} a_{ij} = 1\\)<\/p>\n\n\n\n<p>zero means we&#8217;re not going to make that transition ever<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">An Observable Markov Model<\/h5>\n\n\n\n<p>Given the model is in a known state, what is the probability that it will stay in that state for d days?<\/p>\n\n\n\n<p>The formal observation sequence is<\/p>\n\n\n\n<p>\\(O = \\{S_i, S_i, S_i, &#8230;, S_i, S_j \\neq S_i\\}\\)<\/p>\n\n\n\n<p>\\(P(O|Model, q_1 = S_i) = (a_{ii})^{d-1}(1-a_{ii}) = p_i(d)\\)<\/p>\n\n\n\n<p>This is the discrete probability density function of duration d in state i \\(p_i(d)\\)<\/p>\n\n\n\n<p>From this we can calculate the average stay in any state conditioned on the model and starting in the state<\/p>\n\n\n\n<p>\\(\\bar{d_i} = \\sum_{d=1}^{\\infty}dp_i(d)\\)<\/p>\n\n\n\n<p>\\(\\bar{d_i} = \\sum_{d=1}^{\\infty}d(a_{ii})^{d-1}(1-a_{ii}) = \\frac{1}{1-a_{ii}}\\)<\/p>\n\n\n\n<p>So, the expected number of consecutive <strong>sunny<\/strong> days (given the model) is \\(\\frac{1}{1-a_{ii}} = \\frac{1}{0.2} = 5\\)<\/p>\n\n\n\n<p>Imagine a person flipping a coin behind a curtain<br>Each result is called out and becomes an observation<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>How should we model this situation?<\/li>\n\n\n\n<li>Should we assume a single fair coin is being flipped?<\/li>\n\n\n\n<li>Should we assume the caller is accurate?<\/li>\n<\/ul>\n\n\n\n<p>A Hidden Markov Model has N states<br>They are hidden, but they typically correspond to something we know about the world<br>States are generally connected aka <strong>ergodic<\/strong> S = \\(\\{S_1, S_2, S_3, S_4, S_5, &#8230;, S_N\\}\\)<br>state at time t is \\(q_t\\), \\(\\forall i : q_i \\in S (1 \\leq i \\leq t)\\)<\/p>\n\n\n\n<p>A Hidden Markov Model has M observable symbols<br>They are what can be observed<br>They are a discrete alphabet<br>V = \\(\\{V_1, V_2, V_3, V_4, V_5, &#8230;, V_M\\}\\)<\/p>\n\n\n\n<p>A Hidden Markov Model has State transition matrix A<\/p>\n\n\n\n<p>A = \\({a_{ij}}\\)<\/p>\n\n\n\n<p>\\(a_{ij} = P(q_t = S_j | q_{t-1} = S_i) 1 \\leq i, j \\leq N\\)<\/p>\n\n\n\n<p>if a_{ij} = 0 then a transition between \\(S_i\\) and \\(S_j\\) is not possible<\/p>\n\n\n\n<p>\\(\\sum_{j=1}^{N}a_{ij} = 1\\)<\/p>\n\n\n\n<p>A Hidden Markov Model has Observation probabilities<\/p>\n\n\n\n<p>B = \\({b_j(k)}\\)<\/p>\n\n\n\n<p>\\(b_j(k) = P(v_k at t | q_t = S_j)\\), 1 \u2264 j \u2264 N, 1 \u2264 k \u2264 M<\/p>\n\n\n\n<p>A Hidden Markov Model has Initial state probabilities<\/p>\n\n\n\n<p>\\(\\pi = {\\pi_i}\\)<\/p>\n\n\n\n<p>\\(\\pi_i = P(q_1 = S_i) 1 \\leq i \\leq N\\)<\/p>\n\n\n\n<p>HMM = {N, M, A, B, <em>\u03c0<\/em>}<\/p>\n\n\n\n<p>\\(O = {O_1, O_2, O_3, &#8230;, O_T}\\)<\/p>\n\n\n\n<p>To generate observations<br>Choose \\(q_1 = S_i\\) according to \\(\\pi_i\\)<br>Set t = 1<br>Choose \\(O_t = v_k\\) according to B ie \\(b_i(k)\\)<br>Transit to a new state \\(q_{t+1} = S_j\\) according to A ie \\(a_{ij}\\)<br>set t = t + 1 repeat until t &gt; T<\/p>\n\n\n\n<p>Markov Model is a special case of a Hidden Markov Model<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One in which the observation distribution has one non-zero entry<\/li>\n\n\n\n<li>There is one observation symbol for each state<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Three Key HMM Problems<\/h4>\n\n\n\n<p>What is the probability that a model generated a sequence of observations?<\/p>\n\n\n\n<p>\\(O = {O_1, O_2, O_3, &#8230;, O_T}, \\lambda = (A, B, \\pi)\\)<br>\\(P(O|\\lambda) ?\\)<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Given a sequence of observations, which apartment is the robot in?<\/li>\n\n\n\n<li>Which model best explains the observations?<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Forward-Backward algorithm<\/h3>\n\n\n\n<p>Let&#8217;s start by imagining all possible state sequences<\/p>\n\n\n\n<p>\\(Q = q_1, q_2, q_3, &#8230;, q_T\\)<\/p>\n\n\n\n<p>Probability of seeing observations given those states is<\/p>\n\n\n\n<p>\\(P(O|Q, \\lambda) = \\prod_{t=1}^{T}P(O_t|q_t, \\lambda)\\)<\/p>\n\n\n\n<p>\\(P(O|Q, \\lambda) = b_{q1}(O_1) \\cdot b_{q2}(O_2) \\cdots b_{qT}(O_T)\\)<\/p>\n\n\n\n<p>Probability of seeing those state transitions is \\(P(Q|\\lambda) = \\pi_{q1q2}a_{q2q3} \\cdots a_{qT-1qT}\\)<\/p>\n\n\n\n<p>Probability of those seeing observations <strong>and<\/strong> those state transitions is P(O, Q|\u03bb) = P(O|Q, \u03bb)P(Q|\u03bb)<\/p>\n\n\n\n<p>But we want the probability of the observations regardless of the particular state sequence P(O|\u03bb) = \\(\\sum_{all Q}\\)P(O|Q, \u03bb)P(Q|\u03bb)<\/p>\n\n\n\n<p>Calculating this directly is infeasible P(O|\u03bb) = \\(\\sum_{q_1,q_2,\u2026,q_T}\\pi_{q_1}b_{q_1}(O_1)a_{q_1q_2}b_{q_2}(O_2)a_{q_2q_3} \\cdots a_{q_{T-1}q_T}b_{q_T}(O_T)\\)<\/p>\n\n\n\n<p>How many state sequences are there? \\(N^T\\)<\/p>\n\n\n\n<p>How many multiplications per state sequence? 2T-1<\/p>\n\n\n\n<p>Total number of operations? \\((2T-1)N^T + (N^T-1)\\)<\/p>\n\n\n\n<p>T = 100 and N = 5, How many operations?<\/p>\n\n\n\n<p>\\((2T-1)N^T + (N^T-1) = (2(100) &#8211; 1)5^100 + (5^100 &#8211; 1) = 199 \\cdot 5^100 + 5^100 &#8211; 1 = 200 \\cdot 5^100 &#8211; 1 \\approx 5^103 \\approx 10^{72}\\)<\/p>\n\n\n\n<p>We need a better way The forward-backward algorithm<\/p>\n\n\n\n<p>Let&#8217;s introduce a helper function<\/p>\n\n\n\n<p>\\(\\alpha_t(i) = P(O_1, O_2, O_3, &#8230;, O_t, q_t = S_i | \\lambda)\\)<\/p>\n\n\n\n<p>Let&#8217;s solve for it inductively<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>base case: \\(\\alpha_1(i) = \\pi_i b_i(O_1)\\), 1 \u2264 i \u2264 N<\/li>\n\n\n\n<li>inductive step:<\/li>\n<\/ul>\n\n\n\n<p>\\(\\alpha_{t+1}(j) = [\\sum_{i=1}^{N}\\alpha_t(i)a_{ij}]b_j(O_{t+1})\\), 1 \u2264 t \u2264 T-1,  1 \u2264 j \u2264 N<\/p>\n\n\n\n<p>final step: P(O|\u03bb) = \\(\\sum_{i=1}^{N}\\alpha_T(i)\\)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-1-1024x811.png\" alt=\"\" class=\"wp-image-2988\" width=\"512\" height=\"406\" srcset=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-1-1024x811.png 1024w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-1-300x237.png 300w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-1-768x608.png 768w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-1.png 1526w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n<p>\\(O(TN^T) \\approx (2T-1)N^T + (N^T-1) \\approx 10^{72} calculations\\)<\/p>\n\n\n\n<p>\\(O(N^2 T) \\approx 3000 calculations\\), N = 5, T = 100<\/p>\n\n\n\n<p>The forward part is all we need to solve the first problem \\(\\alpha_t(i)\\)<\/p>\n\n\n\n<p>But let&#8217;s do the same thing backward to help solve the other problems<\/p>\n\n\n\n<p>\\(\\beta_t(i) = P(O_{t+1}, O_{t+2}, &#8230; O_T | q_t = S_i, \\lambda)\\)<\/p>\n\n\n\n<p>Let&#8217;s solve for it inductively<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>base case: \\(\\beta_T(i) = 1\\), 1 \u2264 i \u2264 N<\/li>\n\n\n\n<li>inductive step: <\/li>\n<\/ul>\n\n\n\n<p>\\(\\beta_t(i) = \\sum_{j=1}^{N}a_{ij}b_j(O_{t+1})\\beta_{t+1}(j)\\), t = T-1, T-2, &#8230;, 1, 1 \u2264 i \u2264 N<\/p>\n\n\n\n<p>\\(\\beta_t(i) = P(O_{t+1}, O_{t+2}, &#8230;O_T | q_t = S_i, \\lambda\\)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-2-741x1024.png\" alt=\"\" class=\"wp-image-2994\" width=\"371\" height=\"512\" srcset=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-2-741x1024.png 741w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-2-217x300.png 217w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-2-768x1061.png 768w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-2.png 918w\" sizes=\"(max-width: 371px) 100vw, 371px\" \/><\/figure><\/div>\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Motivating the Viterbi algorithm<\/h3>\n\n\n\n<h5 class=\"wp-block-heading\"><strong><mark style=\"background-color:var(--book-reviews-highlight)\" class=\"has-inline-color has-book-reviews-quotation-color\">Q1. What is the probability that a model generated a sequence of observations?<\/mark><\/strong><\/h5>\n\n\n\n<p>\\(O = \\{O_1, O_2, O_3, &#8230;, O_T\\}, \\lambda = (A, B, \\pi)\\)<br>\\(P(O|\\lambda) ?\\)<\/p>\n\n\n\n<p>P(O|\u03bb) = \\(\\sum_{i=1}^{N}\\alpha_T(i)\\)<\/p>\n\n\n\n<p>\\(\\alpha_t(i) = P(O_1, O_2, O_3, &#8230;, O_t, q_t = S_i | \\lambda\\)<br>\\(\\beta_t(i) = P(O_{t+1}, O_{t+2}, &#8230;O_T | q_t = S_i, \\lambda\\)<\/p>\n\n\n\n<p><strong>The Forward-Backward Algorithm<\/strong><\/p>\n\n\n\n<p><\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong><mark style=\"background-color:var(--book-reviews-highlight)\" class=\"has-inline-color has-book-reviews-quotation-color\">Q2. What sequence of states best explains a sequence of observations?<\/mark><\/strong><\/h5>\n\n\n\n<p>\\(O = \\{O_1, O_2, O_3, &#8230;, O_T\\}\\)<br>\\(Q = \\{q_1, q_2, q_3, &#8230;, q_T\\} ?\\)<\/p>\n\n\n\n<p>What does &#8220;best&#8221; mean?<\/p>\n\n\n\n<p>1) choose states that are individually most likely \\(\\gamma_t(i) = P(q_t = S_i | O, \\lambda)\\)<\/p>\n\n\n\n<p>\\( \\gamma_t(i) = \\frac{\\alpha_t(i)\\beta_t(i)}{P(O|\\lambda)} = \\frac{\\alpha_t(i)\\beta_t(i)}{\\sum_{j=1}^{N}\\alpha_t(j)\\beta_t(j)}\\)<\/p>\n\n\n\n<p>\\(\\sum_{i=1}^{N}\\gamma_t(i) = 1\\)<\/p>\n\n\n\n<p>\\(q_t = argmax_{1 \\leq i \\leq N}[\\gamma_t(i)]\\), 1 \u2264 t \u2264 T<\/p>\n\n\n\n<p>(1) This will maximize the expected number of correct states, but &#8230; it might not make sense<\/p>\n\n\n\n<p>(2) choose a <strong>path sequence<\/strong> that maximizes P(Q|O, \u03bb)<\/p>\n\n\n\n<p>P(Q, O|\u03bb)<br>\\(P(\\{q_1, q_2, q_3 &#8230; q_T\\}, \\{O_1, O_2, O_3,  &#8230;, O_T\\}|\\lambda)\\)<\/p>\n\n\n\n<p><strong>The Viterbi Algorithm<\/strong><\/p>\n\n\n\n<p>What is the path with the highest probability that accounts for the first tt observations and ends at state \\(S_i\\) ?<\/p>\n\n\n\n<p>\\(\\delta_t(i) = max_{q_1,q_2,\u2026q_{t-1}}P({q_1,q_2,q_3 \\cdots q_t = i}, {O_1,O_2,O_3, \\cdots O_t}|\\lambda)\\)<\/p>\n\n\n\n<p>induction step<\/p>\n\n\n\n<p>\\(\\delta_{t+1}(j) = [max_i \\delta_t(i)a_{ij}]\\cdot b_j(O_{t+1})\\)<\/p>\n\n\n\n<p>We need to keep track of which i maximized the result at each time step<\/p>\n\n\n\n<p>Initialization \\(\\delta_1(i) = \\pi_i b_i(O_1)\\), \\(\\psi_i(i) = 0\\)<\/p>\n\n\n\n<p>Inductive step<\/p>\n\n\n\n<p>\\(\\delta_t(j) = max_{1 \\leq i \\leq N}[\\delta_{t-1}(i)a_{ij}]\\cdot b_j(O_t)\\)<br>\\(\\psi_t(j) = argmax_{1 \\leq i \\leq N}[\\delta_{t-1}(i)a_{ij}]\\), 2 \u2264 t \u2264 T, 1 \u2264 j \u2264 N<\/p>\n\n\n\n<p>Termination<\/p>\n\n\n\n<p>\\(P^{*} = max_{1 \\leq i \\leq N}[\\delta_T(i)]\\), \\(q_T^{*} = argmax_{1 \\leq i \\leq N}[\\delta_T(i)]\\)<\/p>\n\n\n\n<p>The <strong>path sequence<\/strong> that maximizes P(Q, O|\u03bb)<\/p>\n\n\n\n<p>\\(\\delta_t(j) = max_{1 \\leq i \\leq N}[\\delta_{t-1}(i)a_{ij}]\\cdot b_j(O_t)\\)<\/p>\n\n\n\n<p>\\(\\alpha_t(j) = [\\sum_{i=1}^{N}\\alpha_{t-1}(i)a_{ij}] b_j(O_t)\\)<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h5 class=\"wp-block-heading\"><strong><mark style=\"background-color:var(--book-reviews-highlight)\" class=\"has-inline-color has-book-reviews-quotation-color\">Q3. Given a set of observations sequences how to we learn the model probabilities that would generate them?<\/mark><\/strong><\/h5>\n\n\n\n<p>Review our Resources<\/p>\n\n\n\n<p>\\(O = \\{O_1, O_2, O_3, &#8230;, O_T\\}\\)<br>\\(\\lambda = (A, B, \\pi) ?\\)<\/p>\n\n\n\n<p>\\(\\alpha_t(i) = P(O_1, O_2, O_3, &#8230;, O_t, q_t = S_i | \\lambda)\\)<\/p>\n\n\n\n<p>\\(\\beta_t(i) = P(O_{t+1}, O_{t+2}, O_3, &#8230;, O_T, q_t = S_i | \\lambda)\\)<\/p>\n\n\n\n<p>\\(\\gamma_t(i) = P(q_t = S_i | O, \\lambda)\\)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Baum-Welch Algorithm<\/h4>\n\n\n\n<p>There is no known way to solve for the globally optimal parameters of lambda<\/p>\n\n\n\n<p>We will search for a locally optimal result<br>A result that converges to a stable good answer but isn&#8217;t guaranteed to be the best answer.<\/p>\n\n\n\n<p>This is an EM method<br>&#8211; Expectation-Maximization (EM)<br>&#8211; Iterative, converges to a local optimum<\/p>\n\n\n\n<p>aka gradient &#8220;descent&#8221;<\/p>\n\n\n\n<p>We need a new mathematical tool<\/p>\n\n\n\n<p>\\(\\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, \\lambda)\\)<\/p>\n\n\n\n<p>\\(\\xi_t(i, j) = \\frac{\\alpha_t(i)a_{ij}b_j(O_{t+1})\\beta_{t+1}(j)}{P(O|\\lambda)}\\)<\/p>\n\n\n\n<p>\\(\\xi_t(i, j) = \\frac{\\alpha_t(i)a_{ij}b_j(O_{t+1})\\beta_{t+1}(j)}{\\sum_{i=1}^{N}\\sum_{j=1}^{N}\\alpha_t(i)a_{ij}b_j(O_{t+1})\\beta_{t+1}(j)}\\)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5-1024x545.png\" alt=\"\" class=\"wp-image-3024\" width=\"512\" height=\"273\" srcset=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5-1024x545.png 1024w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5-300x160.png 300w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5-768x409.png 768w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5-1536x818.png 1536w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-5.png 1874w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n<p>\\(\\xi_t(i, j)\\) is related to \\(\\gamma_t(i)\\)<\/p>\n\n\n\n<p>\\(\\gamma_t(i) = \\sum_{j=1}^{N}\\xi_t(i, j)\\)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-7.png\" alt=\"\" class=\"wp-image-3029\" width=\"458\" height=\"361\" srcset=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-7.png 916w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-7-300x236.png 300w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-7-768x605.png 768w\" sizes=\"(max-width: 458px) 100vw, 458px\" \/><\/figure><\/div>\n\n\n<p>\\(\\gamma_t(i)\\) is the probability of being in \\(S_i\\) at time t<\/p>\n\n\n\n<p>if we sum over all t then we have a number that can be treated as the expected number of times \\(S_i\\) is ever visited<\/p>\n\n\n\n<p>\\(\\xi_t(i, j)\\) is the probability of transitioning from \\(S_i\\) to \\(S_j\\) at time t<\/p>\n\n\n\n<p>if we sum over all t then we have a number that can be treated as the expected number of times \\(S_i\\) ever transitions to \\(S_j\\)<\/p>\n\n\n\n<p>\\(\\sum_{t=1}^{T-1}\\gamma_t(i)\\) = expected number of transitions from \\(S_i\\)<\/p>\n\n\n\n<p>\\(\\sum_{t=1}^{T-1}\\xi_t(i, j)\\) = expected number of transitions from \\(S_i\\) to \\(S_j\\)<\/p>\n\n\n\n<p>So now<br>&#8211; We have an existing model, \u03bb = (A, B, \u03c0)<br>&#8211; We have a set of observations, O<br>&#8211; We have a set of tools \\(\\alpha_t(i), \\beta_t(i), \\gamma_t(i), \\xi_t(i, j)\\)<\/p>\n\n\n\n<p>How do we use these to improve our model? \\(\\bar{\\lambda}\\) = ?<\/p>\n\n\n\n<p>\\(\\bar{\\pi} = \\gamma_1(i)\\) = expected frequency in \\(S_i\\) at time (t = 1)<\/p>\n\n\n\n<p>\\(\\bar{a_{ij}} = \\frac{\\text{expected number of transitions from } S_i \\text{ to } S_j}{\\text{expected number of transitions from } S_i}\\)<\/p>\n\n\n\n<p>\\(\\bar{a_{ij}} = \\frac{\\sum_{t=1}^{T-1}\\xi_t(i,j)}{\\sum_{t=1}^{T-1}\\gamma_t(i)}\\)<\/p>\n\n\n\n<p>\\(\\bar{b_j(k)} = \\frac{\\text{expected number of times in state j and observing } v_k}{\\text{expected number of times in state j}}\\)<\/p>\n\n\n\n<p>\\(\\bar{b}_j(k) = \\frac{\\sum_{t=1 s.t.O_t=v_k}^{T}\\gamma_t(i)}{\\sum_{t=1}^{T}\\gamma_t(i)}\\)<\/p>\n\n\n\n<p>Given \u03bb = (A, B, \u03c0) and O we can produce \\(\\alpha_t(i), \\beta_t(i), \\gamma_t(i), \\xi_t(i, j)\\)<\/p>\n\n\n\n<p>Given \\(\\alpha_t(i), \\beta_t(i), \\gamma_t(i), \\xi_t(i, j)\\) we can produce \\(\\bar{\\lambda} = (\\bar{A}, \\bar{B}, \\bar{\\pi})\\)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8-1024x735.png\" alt=\"\" class=\"wp-image-3032\" width=\"512\" height=\"368\" srcset=\"https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8-1024x735.png 1024w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8-300x215.png 300w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8-768x552.png 768w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8-1536x1103.png 1536w, https:\/\/saraheee.com\/wp-content\/uploads\/2023\/11\/image-8.png 1540w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prof. Donald J. Patterson(djp3), (1\/12) Hidden Markov Models 01: The Markov Property, Mar 20, 2020 &#8211; Apr 10, 2020, <a href=\"https:\/\/youtu.be\/J_y5hx_ySCg?si=EXAPxw9ZDfNdgSWN\" rel=\"noopener\">https:\/\/youtu.be\/J_y5hx_ySCg?si=EXAPxw9ZDfNdgSWN<\/a><\/li>\n\n\n\n<li>~<\/li>\n\n\n\n<li>Prof. Donald J. Patterson(djp3), (12\/12) Hidden Markov Models 12: the Baum-Welch algorithm, Apr 10, 2020, <a href=\"https:\/\/youtu.be\/JRsdt05pMoI?si=VYCOu8N6Ps6Akbll\" rel=\"noopener\">https:\/\/youtu.be\/JRsdt05pMoI?si=VYCOu8N6Ps6Akbll<\/a><\/li>\n<\/ul>\n\n\n\n<p><\/p>","protected":false},"excerpt":{"rendered":"<p>This is a compilation of a series of videos in which Prof. Patterson describes the Hidden Markov Model, starting with the Markov Model and proceeding to the three key questions for HMMs. A Hidden Markov Model is a machine learning model for predicting sequences of states from indirect observations.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,145],"tags":[150,148,147,151,149],"class_list":["post-2903","post","type-post","status-publish","format-standard","hentry","category-uncategorized","category-mtd","tag-baum-welch-algorithm","tag-forward-backward-algorithm","tag-hidden-markov-models","tag-nov-18-2023","tag-viterbi-algorithm"],"_links":{"self":[{"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/posts\/2903"}],"collection":[{"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/comments?post=2903"}],"version-history":[{"count":123,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/posts\/2903\/revisions"}],"predecessor-version":[{"id":3079,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/posts\/2903\/revisions\/3079"}],"wp:attachment":[{"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/media?parent=2903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/categories?post=2903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/saraheee.com\/ko\/wp-json\/wp\/v2\/tags?post=2903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}