畢業(yè)設(shè)計(jì)論文 外文文獻(xiàn)翻譯 數(shù)學(xué)專業(yè)
重 慶 理 工 大 學(xué)文 獻(xiàn) 翻 譯二級學(xué)院 數(shù)學(xué)與統(tǒng)計(jì)學(xué)院 譯文:摘自:The Newton Raphson Algorithm for FunctionOptimizationKevin QuinnAssistant ProfessorDepartment of Political Science andThe Center for Statistics and the Social SciencesBox 354322, Padelford HallUniversity of WashingtonSeattle, WA 98195-4322October 25, 2001一、引言通過這個(gè)課程的學(xué)習(xí)我們將有興趣計(jì)算最大似然估計(jì)(極大似然估計(jì))。例如我們常常觀察到的復(fù)雜的非線性函數(shù)的數(shù)據(jù)。因此,通過我們的計(jì)算封閉形式的去表達(dá)這極大似然估計(jì)的形式一般是不會(huì)存在模型的牛頓拉夫森算法是一個(gè)迭代的過程,可用于計(jì)算出極大似然估計(jì)。其背后的算法的基本思想的內(nèi)容。首先,圍繞一些初步的參數(shù)值構(gòu)造一個(gè)二次近似逼近的有利函數(shù)(希望能接近最大似然估計(jì))。其次是,調(diào)整參數(shù)值讓其最大限度地提高二次近似。此過程再不斷的重復(fù)進(jìn)行,直到參數(shù)值穩(wěn)定。這就說明開始容易想象出一個(gè)函數(shù)遇到最大化的一個(gè)變量。在這種情況下開發(fā),我們轉(zhuǎn)而更為一般的情況下最大化的一個(gè)變量k的函數(shù) 。二、牛頓拉夫森算法求變量1的函數(shù)的最大值2、1泰勒系列的逼近問題 牛頓拉夫遜算法的第一部分的發(fā)展是設(shè)計(jì)一個(gè)近似函數(shù)表示似然函數(shù)就可以很容易的最大化的分析。要做到這一點(diǎn),我們需要利用泰勒定理。定理1(泰勒定理(1維)假設(shè)函數(shù)f次可微的開區(qū)間I上的,對于任意的一點(diǎn)到在I區(qū)間上存在的一點(diǎn)在到上例如:. (1) 他可以表示成為從到的方程的高階項(xiàng)從到更快于從到。這就意味著(最小值)這被稱作一階泰勒的近似函數(shù)f在x上的,小的值可以構(gòu)建一個(gè)更準(zhǔn)確的逼近函數(shù):請注意第一階泰勒的近似可以重寫為被稱為一個(gè)二階泰勒的近似函數(shù)f在上的值如: 從到.這凸顯一個(gè)事實(shí),即一階泰勒的近似的線性函數(shù)在上的。同樣的,二階泰勒的近似值可以被改寫成為:當(dāng),且。這凸顯出的一個(gè)事實(shí),即是二階泰勒近似值是在上的第二階多項(xiàng)式。2、2查找到的其最大值的二階多項(xiàng)式假設(shè)出我們想要找出的值能最大化的首先,我們計(jì)算出的一階導(dǎo)數(shù)的函數(shù)為: 我們了解到這,當(dāng)?shù)闹凳菚r(shí),其中函數(shù)的值達(dá)到最大,換句話說,我們都知道 求解我們發(fā)現(xiàn)。第二階的條件就是。這意味著的值將是最大無論什么時(shí)候當(dāng).2、3牛頓拉夫森的算法假設(shè)我們想要找到的值當(dāng)最大化的二次連續(xù)可微的函數(shù)的值。記得當(dāng),且。這就意味著:一階條件的(記為)值能最大化就是是:這就意味著。換而言之就是, 在的函數(shù)值能最大化的二階泰勒近似值為函數(shù) 考慮到這一點(diǎn),我們可以指定用于一維的函數(shù)優(yōu)化問題的牛頓拉夫森算法。算法2、1:牛頓拉夫森一維的(,公差)發(fā)表評論:找出求的值能最大化的函數(shù):當(dāng)Do回到注意事項(xiàng):注意牛頓拉夫森算法,不檢查的二階的必要條件為是最大化。這就意味著,如果你給一個(gè)錯(cuò)的開始x的值的算法,你可能最終是最小的,而不是一個(gè)最大的。2、4例如:計(jì)算二項(xiàng)式抽樣模型的極大似然估計(jì)看到牛頓拉夫森算法的工程實(shí)踐中如何讓看一個(gè)簡單的示例,二項(xiàng)式抽樣與分析解的簡單的模型。我們的對數(shù)似然函數(shù)是: 當(dāng)為樣本容量時(shí),就是成功的次數(shù),是一個(gè)取得成功的概率。一階導(dǎo)數(shù)對數(shù)似然函數(shù)是:二階導(dǎo)數(shù)對數(shù)似然函數(shù)就是:解析,我們知道的最大似然估計(jì)是:。舉一個(gè)例子,假設(shè)且。解析,我們知道的最大似然估計(jì)是。讓我們來看看如何在這種情況下解出牛頓拉夫森算法。 我們首先設(shè)置公差級別。在這種情況下的,讓將它設(shè)置為0.01(在實(shí)踐中你可能想要的東西更接近0.00001)。下一步,我們初始猜測的最大似然估計(jì)(記為)。假設(shè)。的這是在絕對值大于0.01的公差。因此我們設(shè)置為:。 現(xiàn)在我們計(jì)算出,它仍然是在絕對值大于的公差。因此我們設(shè)置為:是約等于是絕對值小于的公差,這樣我們就可以停止了。牛頓拉夫森算法返回pi的的值等于到接近0.3994,這是合理的分析值0.40。請注意,我們可以設(shè)置的容忍水平接近的牛頓拉夫森過程更準(zhǔn)確(機(jī)器精密度范圍內(nèi))。三、牛頓拉夫森算法求最大的變量的函數(shù)3、1泰勒級數(shù)逼近問題維度考慮函數(shù)至少有兩次的連續(xù)可微。假設(shè)且。然后給出一階泰勒近似值在函數(shù)上的一個(gè)被寫為:給出二階泰勒近似值在函數(shù)上的一個(gè)被寫為:當(dāng)是梯度(一階導(dǎo)數(shù)的向量)的函數(shù)在上時(shí),且是 Hessian矩陣(第二衍生矩陣)在函數(shù)屬于上時(shí)。3、2找到最大值的變量的二階多項(xiàng)式 考慮 當(dāng)是一個(gè)標(biāo)量,和是關(guān)于K-向量,且是一個(gè)的對稱矩陣,負(fù)正定矩陣。這的梯度在上表示為: 我們知道,由于最大化的值滿足能最大化的梯度將是一個(gè)零矢量, 求解 我們找出結(jié)果如: 由于被認(rèn)為是負(fù)定的,而且我們知道這就是最大的。3、3在維度的牛頓拉夫森算法假設(shè)我們要找出的最大限度地提高二次連續(xù)可微函數(shù)。記得 當(dāng)且。請注意矩陣將是對稱的,這就意味著是:再一次,最大值的一階條件就是:這就意味著:換句話說就是,向量能最大化的在的二階泰勒近似值為函數(shù): 考慮到這一點(diǎn),我們就可以指定的k維函數(shù)優(yōu)化問題的牛頓拉夫森算法。算法研究3、1:牛頓拉夫森算法的KD(,公差)發(fā)表評論:求關(guān)于的值的的最大函數(shù)。當(dāng)Do回到()。譯文:摘自: The Newton Raphson Algorithm for FunctionOptimizationKevin QuinnAssistant ProfessorDepartment of Political Science andThe Center for Statistics and the Social SciencesBox 354322, Padelford HallUniversity of WashingtonSeattle, WA 98195-4322October 25, 20011 Introduction Throughout this course we will be interested in calculating maximum likelihood estimate(MLEs). Such estimates are often extremely complicated nonlinear functions of the observed data. As a result, closed form expressions for the MLEs will generally not exist for the models we are working with. The Newton Raphson algorithm is an iterative procedure that can be used to calculate MLEs. The basic idea behind the algorithm is the following. First, construct a quadratic approximation to the function of interest around some initial parameter value (hopefully close to the MLE). Next, adjust the parameter value to that which maximizes the quadratic approximation. This procedure is iterated until the parameter values stabilize. These notes begin with the easy to visualize case of maximizing a function of one variable. After this case is developed, we turn to the more general case of maximizing a function of variables. 2 The Newton Raphson Algorithm for Finding the Maximum of a Function of 1 Variable 2.1 Taylor Series Approximations The first part of developing the Newton Raphson algorithm is to devise a way to approximate the likelihood function with a function that can be easily maximized analytically. To do this we need to make use of Taylors Theorem. Theorem 1 (Taylors Theorem (1 Dimension). Suppose the function f is times differentiable on an open interval I. For any points and in I there exists a point between and such that. (1)It can be shown that as goes to the higher order terms in equation go to 0 much faster than goes to . This means that (for small values of )This is referred to as a first order Taylor approximation of f at . A more accurate approximation to can be constructed for small values of as:This is known as a second order Taylor approximation of f at Note that the first order Taylor approximation can be rewritten as: where and . This highlights the fact that the first order Taylor approximation is a linear function in . Similarly, the second order Taylor approximation can be rewritten as:Where , and . This highlights the fact that the second order Taylor approximation is a second order polynomial in 2.2 Finding the Maximum of a Second Order Polynomial Suppose we want to find the value of that maximizesFirst, we calculate the first derivative of :We know that , where is the value of at which f attains its maximum. In other words, we know thatSolving for we find that . The second order condition is . This implies that will be a maximum whenever .2.3 The Newton Raphson AlgorithmSuppose we want to find the value of that maximizes some twice continuously differentiable function . Recallwhere , , and . This implies.The first order condition for the value of (denoted ) that maximizes isWhich implies . In other words, the value that maximizes the second order Taylor approximation to at is With this in mind we can specify the Newton Raphson algorithm for dimensional function optimization. Algorithm 2.1: NewtonRaphson1D(,tolerance) comment: Find the value of that maximizes While do return Caution: Note that the Newton Raphson Algorithm doesnt check the second order conditions necessary for to be a maximizer. This means that if you give the algorithm a bad starting value for you may end up with a min rather than a max.2.4 Example: Calculating the MLE of a Binomial Sampling ModelTo see how the Newton Raphson algorithm works in practice lets look at a simple example with an analytical solution a simple model of binomial sampling.Our log-likelihood function is:where is the sample size, is the number of successes, and is the probability of a success.The first derivative of the log-likelihood function isand the second derivative of the log-likelihood function isAnalytically, we know that the MLE is .For the sake of example, suppose and . Analytically, we know that the MLE is Lets see how the Newton Raphson algorithm works in this situation.We begin by setting a tolerance level. In this case, lets set it to (In practice you probably want something closer to ). Next we make an initial guess (denoted ) as to the MLE. Suppose . which is larger in absolute value than our tolerance of . Thus we set.Now we calculate which is still larger in absolute value than our tolerance of . Thus we set is approximately equal to which is smaller in absolute value than our tolerance of so we can stop. The Newton Raphson algorithm here returns a value of pi equal to which is reasonably close to the analytical value of . Note we can make the Newton Raphson procedure more accurate (within machine precision) by setting the tolerance level closer to .3 The Newton Raphson Algorithm for Finding theMaximum of a Function of Variables3.1 Taylor Series Approximations in DimensionsConsider a function that is at least twice continuously differentiable. Suppose and . Then the first order Taylor approximation to at is given byand the second order Taylor approximation to f at is given bywhere is the gradient (vector of first derivatives) at , and is the Hessian (matrix of second derivatives) of at .3.2 Finding the Maximum of a Second Order Polynomial in VariablesConsiderwhere is a scalar, and are k-vectors, and is a symmetric, negative definite matrix. The gradient of at isSince the gradient at the value that maximizes will be a vector of zeros we know that the maximizer satisfiesSolving for we find thatSince is assumed to be negative definite we know that this is a maximum.3.3 The Newton Raphson Algorithm in k DimensionsSuppose we want to find the that maximizes the twice continuously differentiable function Recallwhere and . Note that will be symmetric. This impliesOnce again, the first order condition for a maximum iswhich implies thatIn other words, the vector that maximizes the second order Taylor approximation to at isWith this in mind we can specify the Newton Raphson algorithm for k-dimensional function optimization.Algorithm 3.1: NewtonRaphsonKD comment: Find the value of that maximizes While do Return()