LBFGS源码分析

    技术2022-05-19  24

    C min f(x),dx为方向,a为步长C g(a)=f(x+a*dx)C     ----------------------------------------------------------------------C     This file contains the LBFGS algorithm and supporting routinesCC     ****************C     LBFGS SUBROUTINEC     ****************C      SUBROUTINE LBFGS(N,M,X,F,G,DIAGCO,DIAG,IPRINT,EPS,XTOL,W,IFLAG)C      INTEGER N,M,IPRINT(2),IFLAG      DOUBLE PRECISION X(N),G(N),DIAG(N),W(N*(2*M+1)+2*M)      DOUBLE PRECISION F,EPS,XTOL      LOGICAL DIAGCOCC        LIMITED MEMORY BFGS METHOD FOR LARGE SCALE OPTIMIZATIONC                          JORGE NOCEDALC                        *** July 1990 ***CC C     This subroutine solves the unconstrained minimization problemC C                      min F(x),    x= (x1,x2,...,xN),CC      using the limited memory BFGS method. The routine is especiallyC      effective on problems involving a large number of variables. InC      a typical iteration of this method an approximation Hk to theC      inverse of the Hessian is obtained by applying M BFGS updates toC      a diagonal matrix Hk0, using information from the previous M steps.C      The user specifies the number M, which determines the amount ofC      storage required by the routine. The user may also provide theC      diagonal matrices Hk0 if not satisfied with the default choice.C      The algorithm is described in "On the limited memory BFGS methodC      for large scale optimization", by D. Liu and J. Nocedal,C      Mathematical Programming B 45 (1989) 503-528.C C      The user is required to calculate the function value F and itsC      gradient G. In order to allow the user complete control overC      these computations, reverse  communication is used. The routineC      must be called repeatedly under the control of the parameterC      IFLAG. CC      The steplength is determined at each iteration by means of theC      line search routine MCVSRCH, which is a slight modification ofC      the routine CSRCH written by More' and Thuente.C C      The calling statement is C C          CALL LBFGS(N,M,X,F,G,DIAGCO,DIAG,IPRINT,EPS,XTOL,W,IFLAG)C C      whereC C     N       is an INTEGER variable that must be set by the user to theC             number of variables. It is not altered by the routine.C             Restriction: N>0.C C     M       is an INTEGER variable that must be set by the user toC             the number of corrections used in the BFGS update. ItC             is not altered by the routine. Values of M less than 3 areC             not recommended; large values of M will result in excessiveC             computing time. 3<= M <=7 is recommended. Restriction: M>0.C C     X       is a DOUBLE PRECISION array of length N. On initial entryC             it must be set by the user to the values of the initialC             estimate of the solution vector. On exit with IFLAG=0, itC             contains the values of the variables at the best pointC             found (usually a solution).C C     F       is a DOUBLE PRECISION variable. Before initial entry and onC             a re-entry with IFLAG=1, it must be set by the user toC             contain the value of the function F at the point X.C C     G       is a DOUBLE PRECISION array of length N. Before initialC             entry and on a re-entry with IFLAG=1, it must be set byC             the user to contain the components of the gradient G atC             the point X.C C     DIAGCO  is a LOGICAL variable that must be set to .TRUE. if theC             user  wishes to provide the diagonal matrix Hk0 at eachC             iteration. Otherwise it should be set to .FALSE., in whichC             case  LBFGS will use a default value described below. IfC             DIAGCO is set to .TRUE. the routine will return at eachC             iteration of the algorithm with IFLAG=2, and the diagonalC              matrix Hk0  must be provided in the array DIAG.C C C     DIAG    is a DOUBLE PRECISION array of length N. If DIAGCO=.TRUE.,C             then on initial entry or on re-entry with IFLAG=2, DIAGC             it must be set by the user to contain the values of the C             diagonal matrix Hk0.  Restriction: all elements of DIAGC             must be positive.C C     IPRINT  is an INTEGER array of length two which must be set by theC             user.C C             IPRINT(1) specifies the frequency of the output:C                IPRINT(1) < 0 : no output is generated,C                IPRINT(1) = 0 : output only at first and last iteration,C                IPRINT(1) > 0 : output every IPRINT(1) iterations.C C             IPRINT(2) specifies the type of output generated:C                IPRINT(2) = 0 : iteration count, number of function C                                evaluations, function value, norm of theC                                gradient, and steplength,C                IPRINT(2) = 1 : same as IPRINT(2)=0, plus vector ofC                                variables and  gradient vector at theC                                initial point,C                IPRINT(2) = 2 : same as IPRINT(2)=1, plus vector ofC                                variables,C                IPRINT(2) = 3 : same as IPRINT(2)=2, plus gradient vector.C C C     EPS     is a positive DOUBLE PRECISION variable that must be set byC             the user, and determines the accuracy with which the solutionC             is to be found. The subroutine terminates whenCC                         ||G|| < EPS max(1,||X||),CC             where ||.|| denotes the Euclidean norm.C C     XTOL    is a  positive DOUBLE PRECISION variable that must be set byC             the user to an estimate of the machine precision (e.g.C             10**(-16) on a SUN station 3/60). The line search routine willC             terminate if the relative width of the interval of uncertaintyC             is less than XTOL.C C     W       is a DOUBLE PRECISION array of length N(2M+1)+2M used asC             workspace for LBFGS. This array must not be altered by theC             user.C C     IFLAG   is an INTEGER variable that must be set to 0 on initial entryC             to the subroutine. A return with IFLAG<0 indicates an error,C             and IFLAG=0 indicates that the routine has terminated withoutC             detecting errors. On a return with IFLAG=1, the user mustC             evaluate the function F and gradient G. On a return withC             IFLAG=2, the user must provide the diagonal matrix Hk0.C C             The following negative values of IFLAG, detecting an error,C             are possible:C C              IFLAG=-1  The line search routine MCSRCH failed. TheC                        parameter INFO provides more detailed informationC                        (see also the documentation of MCSRCH):CC                       INFO = 0  IMPROPER INPUT PARAMETERS.CC                       INFO = 2  RELATIVE WIDTH OF THE INTERVAL OFC                                 UNCERTAINTY IS AT MOST XTOL.CC                       INFO = 3  MORE THAN 20 FUNCTION EVALUATIONS WEREC                                 REQUIRED AT THE PRESENT ITERATION.CC                       INFO = 4  THE STEP IS TOO SMALL.CC                       INFO = 5  THE STEP IS TOO LARGE.CC                       INFO = 6  ROUNDING ERRORS PREVENT FURTHER PROGRESS. C                                 THERE MAY NOT BE A STEP WHICH SATISFIESC                                 THE SUFFICIENT DECREASE AND CURVATUREC                                 CONDITIONS. TOLERANCES MAY BE TOO SMALL.CC C              IFLAG=-2  The i-th diagonal element of the diagonal inverseC                        Hessian approximation, given in DIAG, is notC                        positive.C           C              IFLAG=-3  Improper input parameters for LBFGS (N or M areC                        not positive).C CCC    ON THE DRIVER:CC    The program that calls LBFGS must contain the declaration:CC                       EXTERNAL LB2CC    LB2 is a BLOCK DATA that defines the default values of severalC    parameters described in the COMMON section. CC C C    COMMON:C C     The subroutine contains one common area, which the user may wish toC    reference:C          COMMON /LB3/MP,LP,GTOL,STPMIN,STPMAXC C    MP  is an INTEGER variable with default value 6. It is used as theC        unit number for the printing of the monitoring informationC        controlled by IPRINT.C C    LP  is an INTEGER variable with default value 6. It is used as theC        unit number for the printing of error messages. This printingC        may be suppressed by setting LP to a non-positive value.C C    GTOL is a DOUBLE PRECISION variable with default value 0.9, whichC        controls the accuracy of the line search routine MCSRCH. If theC        function and gradient evaluations are inexpensive with respectC        to the cost of the iteration (which is sometimes the case whenC        solving very large problems) it may be advantageous to set GTOLC        to a small value. A typical small value is 0.1.  Restriction:C        GTOL should be greater than 1.D-04.C C    STPMIN and STPMAX are non-negative DOUBLE PRECISION variables whichC        specify lower and uper bounds for the step in the line search.C        Their default values are 1.D-20 and 1.D+20, respectively. TheseC        values need not be modified unless the exponents are too largeC        for the machine being used, or unless the problem is extremelyC        badly scaled (in which case the exponents should be increased).C CC  MACHINE DEPENDENCIESCC        The only variables that are machine-dependent are XTOL,C        STPMIN and STPMAX.C CC  GENERAL INFORMATIONC C    Other routines called directly:  DAXPY, DDOT, LB1, MCSRCHC C    Input/Output  :  No input; diagnostic messages on unit MP andC                     error messages on unit LP.C C C     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -C      DOUBLE PRECISION GTOL,ONE,ZERO,GNORM,DDOT,STP1,FTOL,STPMIN,     .                 STPMAX,STP,YS,YY,SQ,YR,BETA,XNORM      INTEGER MP,LP,ITER,NFUN,POINT,ISPT,IYPT,MAXFEV,INFO,     .        BOUND,NPT,CP,I,NFEV,INMC,IYCN,ISCN      LOGICAL FINISHC      SAVE      DATA ONE,ZERO/1.0D+0,0.0D+0/CC     INITIALIZEC     ----------C      IF(IFLAG.EQ.0) GO TO 10      GO TO (172,100) IFLAGC IFLAG一开始为0,如果IFLAG=1表示上次MCSEARCH(172)需要计算g(a),g'(a)C 判断是否满足wolfe conditions,IFLAG=2表示需要用户提供H,在CRF中,不需要用C 户提供H  10  ITER= 0      IF(N.LE.0.OR.M.LE.0) GO TO 196      IF(GTOL.LE.1.D-04) THENC GTOL不能太小,默认0.9        IF(LP.GT.0) WRITE(LP,245)        GTOL=9.D-01      ENDIF      NFUN= 1C 迭代次数      POINT= 0      FINISH= .FALSE.      IF(DIAGCO) THEN         DO 30 I=1,N 30      IF (DIAG(I).LE.ZERO) GO TO 195      ELSEC 自动设置H         DO 40 I=1,N 40      DIAG(I)= 1.0D0      ENDIFCC     THE WORK VECTOR W IS DIVIDED AS FOLLOWS:C     ---------------------------------------C     THE FIRST N LOCATIONS ARE USED TO STORE THE GRADIENT ANDC         OTHER TEMPORARY INFORMATION.C     LOCATIONS (N+1)...(N+M) STORE THE SCALARS RHO.C     LOCATIONS (N+M+1)...(N+2M) STORE THE NUMBERS ALPHA USEDC         IN THE FORMULA THAT COMPUTES H*G.C     LOCATIONS (N+2M+1)...(N+2M+NM) STORE THE LAST M SEARCHC         STEPS.C     LOCATIONS (N+2M+NM+1)...(N+2M+2NM) STORE THE LAST MC         GRADIENT DIFFERENCES.CC     THE SEARCH STEPS AND GRADIENT DIFFERENCES ARE STORED IN AC     CIRCULAR ORDER CONTROLLED BY THE PARAMETER POINT.CC W=[f'_k,RHO,ALPHA,S,Y]C 其中保证第一次计算得到的值放在第一个,后面的依次循环存放C 比如在RHO_1在第二次迭代中计算得到,那存放RHO_i的地址为(k-2)%m+1C 而y_1在第二次迭代中计算得到,那么存放Y_i的地址为IYPT+(k-2)%m+1      ISPT= N+2*M      IYPT= ISPT+N*M           DO 50 I=1,N 50   W(ISPT+I)= -G(I)*DIAG(I)C q=-H*g(0)      GNORM= DSQRT(DDOT(N,G,1,G,1))C |g(0)|      STP1= ONE/GNORMC STP1=1/|g(0)|初始化第一次试探步长C     PARAMETERS FOR LINE SEARCH ROUTINEC           FTOL= 1.0D-4      MAXFEV= 20C      IF(IPRINT(1).GE.0) CALL LB1(IPRINT,ITER,NFUN,     *                     GNORM,N,M,X,F,G,STP,FINISH)CC    --------------------C     MAIN ITERATION LOOPC    --------------------C 80   ITER= ITER+1C k=ITER      INFO=0      BOUND=ITER-1      IF(ITER.EQ.1) GO TO 165C 第一次试探,用取a_l=0,a_t=STP1进入MCSEARCH搜索      IF (ITER .GT. M)BOUND=MC BOUND=min(M,k-1)表示回退的次数,比如k=2时,只需要回退1次         YS= DDOT(N,W(IYPT+NPT+1),1,W(ISPT+NPT+1),1)C NPT=(k-1),y_{k-1}*s_{k-1}      IF(.NOT.DIAGCO) THENC 无需用户给出H,lbfgs自动估计H,CRF中DIAGCO=FALSE         YY= DDOT(N,W(IYPT+NPT+1),1,W(IYPT+NPT+1),1)         DO 90 I=1,N   90    DIAG(I)= YS/YY      ELSE         IFLAG=2         RETURN      ENDIF 100  CONTINUE      IF(DIAGCO) THEN        DO 110 I=1,N 110    IF (DIAG(I).LE.ZERO) GO TO 195      ENDIFCC     COMPUTE -H*G USING THE FORMULA GIVEN IN: Nocedal, J. 1980,C     "Updating quasi-Newton matrices with limited storage",C     Mathematics of Computation, Vol.24, No.151, pp. 773-782.C     ---------------------------------------------------------C      CP= POINTC CP=(k-2)%m+1,这里k>=2,CP从1开始,代表了ALPHA_{k-1},RHO_{k-1}C 循环存放的序号,比如k=2时,第一次计算RHO,RHO_1的序号为CP=1C 而(k-1)%m代表了s_{k},y_{k}的存放序号,比如s_1的偏移为0      IF (POINT.EQ.0) CP=MC 如果k=m+1,那么CP=m      W(N+CP)= ONE/YSC RHO_{k-1}=1/[y_{k-1}*s_{k-1}],计算并保存RHO_{k-1}      DO 112 I=1,N 112  W(I)= -G(I)C q=-f'_k      CP= POINT      DO 125 I= 1,BOUND         CP=CP-1         IF (CP.EQ. -1)CP=M-1C CP=(k-1-I)%mC CP代表了s_{k-I},y_{k-I}的存放序号C s_{k-I}从W(ISPT+CP*N+1)存放到W(ISPT+CP*N+N)         SQ= DDOT(N,W(ISPT+CP*N+1),1,W,1)C s_{k-I}*y_{k-I}         INMC=N+M+CP+1         IYCN=IYPT+CP*N         W(INMC)= W(N+CP+1)*SQC ALPHA_{k-I}=RHO_{k-I}*s_{k-I}*y_{k-I}         CALL DAXPY(N,-W(INMC),W(IYCN+1),1,W,1)C q=q-ALPHA_{k-I}*y_{k-I} 125  CONTINUEC CP=POINT-BOUND=max{k-m,0}      DO 130 I=1,N 130  W(I)=DIAG(I)*W(I)C r=H0*q      DO 145 I=1,BOUNDC FOR i=k-m,...,k-1         YR= DDOT(N,W(IYPT+CP*N+1),1,W,1)         BETA= W(N+CP+1)*YRC BETA=RHO_{k-M-1+I}*y_{k-M-1+I}*r         INMC=N+M+CP+1         BETA= W(INMC)-BETA         ISCN=ISPT+CP*N         CALL DAXPY(N,BETA,W(ISCN+1),1,W,1)C r=r+s_{k-M-1+I}*BETA         CP=CP+1C CP+1是ALPHA,RHO的偏移,CP是y,s的偏移         IF (CP.EQ.M)CP=0 145  CONTINUECC     STORE THE NEW SEARCH DIRECTIONC     ------------------------------C r=-Hk*f'_k       DO 160 I=1,N 160   W(ISPT+POINT*N+I)= W(I)C y_k=rC     OBTAIN THE ONE-DIMENSIONAL MINIMIZER OF THE FUNCTION C     BY USING THE LINE SEARCH ROUTINE MCSRCHC     ---------------------------------------------------- 165  NFEV=0      STP=ONE      IF (ITER.EQ.1) STP=STP1      DO 170 I=1,N 170  W(I)=G(I) 172  CONTINUE      CALL MCSRCH(N,X,F,G,W(ISPT+POINT*N+1),STP,FTOL,     *            XTOL,MAXFEV,INFO,NFEV,DIAG)      IF (INFO .EQ. -1) THENC 需要g'(a),g(a)        IFLAG=1        RETURN      ENDIF      IF (INFO .NE. 1) GO TO 190      NFUN= NFUN + NFEVCC     COMPUTE THE NEW STEP AND GRADIENT CHANGE C     -----------------------------------------C      NPT=POINT*N      DO 175 I=1,N      W(ISPT+NPT+I)= STP*W(ISPT+NPT+I)C s_k 175  W(IYPT+NPT+I)= G(I)-W(I)C y_k      POINT=POINT+1      IF (POINT.EQ.M)POINT=0CC     TERMINATION TESTC     ----------------C      GNORM= DSQRT(DDOT(N,G,1,G,1))      XNORM= DSQRT(DDOT(N,X,1,X,1))      XNORM= DMAX1(1.0D0,XNORM)      IF (GNORM/XNORM .LE. EPS) FINISH=.TRUE.C      IF(IPRINT(1).GE.0) CALL LB1(IPRINT,ITER,NFUN,     *               GNORM,N,M,X,F,G,STP,FINISH)      IF (FINISH) THEN         IFLAG=0         RETURN      ENDIF      GO TO 80CC     ------------------------------------------------------------C     END OF MAIN ITERATION LOOP. ERROR EXITS.C     ------------------------------------------------------------C 190  IFLAG=-1      IF(LP.GT.0) WRITE(LP,200) INFO      RETURN 195  IFLAG=-2      IF(LP.GT.0) WRITE(LP,235) I      RETURN 196  IFLAG= -3      IF(LP.GT.0) WRITE(LP,240)CC     FORMATSC     -------C 200  FORMAT(/' IFLAG= -1 ',/' LINE SEARCH FAILED. SEE'     .          ' DOCUMENTATION OF ROUTINE MCSRCH',/' ERROR RETURN'     .          ' OF LINE SEARCH: INFO= ',I2,/     .          ' POSSIBLE CAUSES: FUNCTION OR GRADIENT ARE INCORRECT',/,     .          ' OR INCORRECT TOLERANCES') 235  FORMAT(/' IFLAG= -2',/' THE',I5,'-TH DIAGONAL ELEMENT OF THE',/,     .       ' INVERSE HESSIAN APPROXIMATION IS NOT POSITIVE') 240  FORMAT(/' IFLAG= -3',/' IMPROPER INPUT PARAMETERS (N OR M',     .       ' ARE NOT POSITIVE)') 245  FORMAT(/'  GTOL IS LESS THAN OR EQUAL TO 1.D-04',     .       / ' IT HAS BEEN RESET TO 9.D-01')      RETURN      ENDCC     LAST LINE OF SUBROUTINE LBFGSCC


    最新回复(0)