LBFGS源码分析

技术2022-05-19 24

C min f(x),dx为方向,a为步长C g(a)=f(x+a*dx)C ----------------------------------------------------------------------C This file contains the LBFGS algorithm and supporting routinesCC ****************C LBFGS SUBROUTINEC ****************C SUBROUTINE LBFGS(N,M,X,F,G,DIAGCO,DIAG,IPRINT,EPS,XTOL,W,IFLAG)C INTEGER N,M,IPRINT(2),IFLAG DOUBLE PRECISION X(N),G(N),DIAG(N),W(N*(2*M+1)+2*M) DOUBLE PRECISION F,EPS,XTOL LOGICAL DIAGCOCC LIMITED MEMORY BFGS METHOD FOR LARGE SCALE OPTIMIZATIONC JORGE NOCEDALC *** July 1990 ***CC C This subroutine solves the unconstrained minimization problemC C min F(x), x= (x1,x2,...,xN),CC using the limited memory BFGS method. The routine is especiallyC effective on problems involving a large number of variables. InC a typical iteration of this method an approximation Hk to theC inverse of the Hessian is obtained by applying M BFGS updates toC a diagonal matrix Hk0, using information from the previous M steps.C The user specifies the number M, which determines the amount ofC storage required by the routine. The user may also provide theC diagonal matrices Hk0 if not satisfied with the default choice.C The algorithm is described in "On the limited memory BFGS methodC for large scale optimization", by D. Liu and J. Nocedal,C Mathematical Programming B 45 (1989) 503-528.C C The user is required to calculate the function value F and itsC gradient G. In order to allow the user complete control overC these computations, reverse communication is used. The routineC must be called repeatedly under the control of the parameterC IFLAG. CC The steplength is determined at each iteration by means of theC line search routine MCVSRCH, which is a slight modification ofC the routine CSRCH written by More' and Thuente.C C The calling statement is C C CALL LBFGS(N,M,X,F,G,DIAGCO,DIAG,IPRINT,EPS,XTOL,W,IFLAG)C C whereC C N is an INTEGER variable that must be set by the user to theC number of variables. It is not altered by the routine.C Restriction: N>0.C C M is an INTEGER variable that must be set by the user toC the number of corrections used in the BFGS update. ItC is not altered by the routine. Values of M less than 3 areC not recommended; large values of M will result in excessiveC computing time. 3<= M <=7 is recommended. Restriction: M>0.C C X is a DOUBLE PRECISION array of length N. On initial entryC it must be set by the user to the values of the initialC estimate of the solution vector. On exit with IFLAG=0, itC contains the values of the variables at the best pointC found (usually a solution).C C F is a DOUBLE PRECISION variable. Before initial entry and onC a re-entry with IFLAG=1, it must be set by the user toC contain the value of the function F at the point X.C C G is a DOUBLE PRECISION array of length N. Before initialC entry and on a re-entry with IFLAG=1, it must be set byC the user to contain the components of the gradient G atC the point X.C C DIAGCO is a LOGICAL variable that must be set to .TRUE. if theC user wishes to provide the diagonal matrix Hk0 at eachC iteration. Otherwise it should be set to .FALSE., in whichC case LBFGS will use a default value described below. IfC DIAGCO is set to .TRUE. the routine will return at eachC iteration of the algorithm with IFLAG=2, and the diagonalC matrix Hk0 must be provided in the array DIAG.C C C DIAG is a DOUBLE PRECISION array of length N. If DIAGCO=.TRUE.,C then on initial entry or on re-entry with IFLAG=2, DIAGC it must be set by the user to contain the values of the C diagonal matrix Hk0. Restriction: all elements of DIAGC must be positive.C C IPRINT is an INTEGER array of length two which must be set by theC user.C C IPRINT(1) specifies the frequency of the output:C IPRINT(1) < 0 : no output is generated,C IPRINT(1) = 0 : output only at first and last iteration,C IPRINT(1) > 0 : output every IPRINT(1) iterations.C C IPRINT(2) specifies the type of output generated:C IPRINT(2) = 0 : iteration count, number of function C evaluations, function value, norm of theC gradient, and steplength,C IPRINT(2) = 1 : same as IPRINT(2)=0, plus vector ofC variables and gradient vector at theC initial point,C IPRINT(2) = 2 : same as IPRINT(2)=1, plus vector ofC variables,C IPRINT(2) = 3 : same as IPRINT(2)=2, plus gradient vector.C C C EPS is a positive DOUBLE PRECISION variable that must be set byC the user, and determines the accuracy with which the solutionC is to be found. The subroutine terminates whenCC ||G|| < EPS max(1,||X||),CC where ||.|| denotes the Euclidean norm.C C XTOL is a positive DOUBLE PRECISION variable that must be set byC the user to an estimate of the machine precision (e.g.C 10**(-16) on a SUN station 3/60). The line search routine willC terminate if the relative width of the interval of uncertaintyC is less than XTOL.C C W is a DOUBLE PRECISION array of length N(2M+1)+2M used asC workspace for LBFGS. This array must not be altered by theC user.C C IFLAG is an INTEGER variable that must be set to 0 on initial entryC to the subroutine. A return with IFLAG<0 indicates an error,C and IFLAG=0 indicates that the routine has terminated withoutC detecting errors. On a return with IFLAG=1, the user mustC evaluate the function F and gradient G. On a return withC IFLAG=2, the user must provide the diagonal matrix Hk0.C C The following negative values of IFLAG, detecting an error,C are possible:C C IFLAG=-1 The line search routine MCSRCH failed. TheC parameter INFO provides more detailed informationC (see also the documentation of MCSRCH):CC INFO = 0 IMPROPER INPUT PARAMETERS.CC INFO = 2 RELATIVE WIDTH OF THE INTERVAL OFC UNCERTAINTY IS AT MOST XTOL.CC INFO = 3 MORE THAN 20 FUNCTION EVALUATIONS WEREC REQUIRED AT THE PRESENT ITERATION.CC INFO = 4 THE STEP IS TOO SMALL.CC INFO = 5 THE STEP IS TOO LARGE.CC INFO = 6 ROUNDING ERRORS PREVENT FURTHER PROGRESS. C THERE MAY NOT BE A STEP WHICH SATISFIESC THE SUFFICIENT DECREASE AND CURVATUREC CONDITIONS. TOLERANCES MAY BE TOO SMALL.CC C IFLAG=-2 The i-th diagonal element of the diagonal inverseC Hessian approximation, given in DIAG, is notC positive.C C IFLAG=-3 Improper input parameters for LBFGS (N or M areC not positive).C CCC ON THE DRIVER:CC The program that calls LBFGS must contain the declaration:CC EXTERNAL LB2CC LB2 is a BLOCK DATA that defines the default values of severalC parameters described in the COMMON section. CC C C COMMON:C C The subroutine contains one common area, which the user may wish toC reference:C COMMON /LB3/MP,LP,GTOL,STPMIN,STPMAXC C MP is an INTEGER variable with default value 6. It is used as theC unit number for the printing of the monitoring informationC controlled by IPRINT.C C LP is an INTEGER variable with default value 6. It is used as theC unit number for the printing of error messages. This printingC may be suppressed by setting LP to a non-positive value.C C GTOL is a DOUBLE PRECISION variable with default value 0.9, whichC controls the accuracy of the line search routine MCSRCH. If theC function and gradient evaluations are inexpensive with respectC to the cost of the iteration (which is sometimes the case whenC solving very large problems) it may be advantageous to set GTOLC to a small value. A typical small value is 0.1. Restriction:C GTOL should be greater than 1.D-04.C C STPMIN and STPMAX are non-negative DOUBLE PRECISION variables whichC specify lower and uper bounds for the step in the line search.C Their default values are 1.D-20 and 1.D+20, respectively. TheseC values need not be modified unless the exponents are too largeC for the machine being used, or unless the problem is extremelyC badly scaled (in which case the exponents should be increased).C CC MACHINE DEPENDENCIESCC The only variables that are machine-dependent are XTOL,C STPMIN and STPMAX.C CC GENERAL INFORMATIONC C Other routines called directly: DAXPY, DDOT, LB1, MCSRCHC C Input/Output : No input; diagnostic messages on unit MP andC error messages on unit LP.C C C - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -C DOUBLE PRECISION GTOL,ONE,ZERO,GNORM,DDOT,STP1,FTOL,STPMIN, . STPMAX,STP,YS,YY,SQ,YR,BETA,XNORM INTEGER MP,LP,ITER,NFUN,POINT,ISPT,IYPT,MAXFEV,INFO, . BOUND,NPT,CP,I,NFEV,INMC,IYCN,ISCN LOGICAL FINISHC SAVE DATA ONE,ZERO/1.0D+0,0.0D+0/CC INITIALIZEC ----------C IF(IFLAG.EQ.0) GO TO 10 GO TO (172,100) IFLAGC IFLAG一开始为0,如果IFLAG=1表示上次MCSEARCH(172)需要计算g(a),g'(a)C 判断是否满足wolfe conditions,IFLAG=2表示需要用户提供H,在CRF中,不需要用C 户提供H 10 ITER= 0 IF(N.LE.0.OR.M.LE.0) GO TO 196 IF(GTOL.LE.1.D-04) THENC GTOL不能太小,默认0.9 IF(LP.GT.0) WRITE(LP,245) GTOL=9.D-01 ENDIF NFUN= 1C 迭代次数 POINT= 0 FINISH= .FALSE. IF(DIAGCO) THEN DO 30 I=1,N 30 IF (DIAG(I).LE.ZERO) GO TO 195 ELSEC 自动设置H DO 40 I=1,N 40 DIAG(I)= 1.0D0 ENDIFCC THE WORK VECTOR W IS DIVIDED AS FOLLOWS:C ---------------------------------------C THE FIRST N LOCATIONS ARE USED TO STORE THE GRADIENT ANDC OTHER TEMPORARY INFORMATION.C LOCATIONS (N+1)...(N+M) STORE THE SCALARS RHO.C LOCATIONS (N+M+1)...(N+2M) STORE THE NUMBERS ALPHA USEDC IN THE FORMULA THAT COMPUTES H*G.C LOCATIONS (N+2M+1)...(N+2M+NM) STORE THE LAST M SEARCHC STEPS.C LOCATIONS (N+2M+NM+1)...(N+2M+2NM) STORE THE LAST MC GRADIENT DIFFERENCES.CC THE SEARCH STEPS AND GRADIENT DIFFERENCES ARE STORED IN AC CIRCULAR ORDER CONTROLLED BY THE PARAMETER POINT.CC W=[f'_k,RHO,ALPHA,S,Y]C 其中保证第一次计算得到的值放在第一个，后面的依次循环存放C 比如在RHO_1在第二次迭代中计算得到，那存放RHO_i的地址为(k-2)%m+1C 而y_1在第二次迭代中计算得到，那么存放Y_i的地址为IYPT+(k-2)%m+1 ISPT= N+2*M IYPT= ISPT+N*M DO 50 I=1,N 50 W(ISPT+I)= -G(I)*DIAG(I)C q=-H*g(0) GNORM= DSQRT(DDOT(N,G,1,G,1))C |g(0)| STP1= ONE/GNORMC STP1=1/|g(0)|初始化第一次试探步长C PARAMETERS FOR LINE SEARCH ROUTINEC FTOL= 1.0D-4 MAXFEV= 20C IF(IPRINT(1).GE.0) CALL LB1(IPRINT,ITER,NFUN, * GNORM,N,M,X,F,G,STP,FINISH)CC --------------------C MAIN ITERATION LOOPC --------------------C 80 ITER= ITER+1C k=ITER INFO=0 BOUND=ITER-1 IF(ITER.EQ.1) GO TO 165C 第一次试探，用取a_l=0,a_t=STP1进入MCSEARCH搜索 IF (ITER .GT. M)BOUND=MC BOUND=min(M,k-1)表示回退的次数，比如k=2时，只需要回退1次 YS= DDOT(N,W(IYPT+NPT+1),1,W(ISPT+NPT+1),1)C NPT=(k-1),y_{k-1}*s_{k-1} IF(.NOT.DIAGCO) THENC 无需用户给出H,lbfgs自动估计H,CRF中DIAGCO=FALSE YY= DDOT(N,W(IYPT+NPT+1),1,W(IYPT+NPT+1),1) DO 90 I=1,N 90 DIAG(I)= YS/YY ELSE IFLAG=2 RETURN ENDIF 100 CONTINUE IF(DIAGCO) THEN DO 110 I=1,N 110 IF (DIAG(I).LE.ZERO) GO TO 195 ENDIFCC COMPUTE -H*G USING THE FORMULA GIVEN IN: Nocedal, J. 1980,C "Updating quasi-Newton matrices with limited storage",C Mathematics of Computation, Vol.24, No.151, pp. 773-782.C ---------------------------------------------------------C CP= POINTC CP=(k-2)%m+1,这里k>=2,CP从1开始，代表了ALPHA_{k-1},RHO_{k-1}C 循环存放的序号,比如k=2时，第一次计算RHO,RHO_1的序号为CP=1C 而(k-1)%m代表了s_{k},y_{k}的存放序号，比如s_1的偏移为0 IF (POINT.EQ.0) CP=MC 如果k=m+1,那么CP=m W(N+CP)= ONE/YSC RHO_{k-1}=1/[y_{k-1}*s_{k-1}]，计算并保存RHO_{k-1} DO 112 I=1,N 112 W(I)= -G(I)C q=-f'_k CP= POINT DO 125 I= 1,BOUND CP=CP-1 IF (CP.EQ. -1)CP=M-1C CP=(k-1-I)%mC CP代表了s_{k-I},y_{k-I}的存放序号C s_{k-I}从W(ISPT+CP*N+1)存放到W(ISPT+CP*N+N) SQ= DDOT(N,W(ISPT+CP*N+1),1,W,1)C s_{k-I}*y_{k-I} INMC=N+M+CP+1 IYCN=IYPT+CP*N W(INMC)= W(N+CP+1)*SQC ALPHA_{k-I}=RHO_{k-I}*s_{k-I}*y_{k-I} CALL DAXPY(N,-W(INMC),W(IYCN+1),1,W,1)C q=q-ALPHA_{k-I}*y_{k-I} 125 CONTINUEC CP=POINT-BOUND=max{k-m,0} DO 130 I=1,N 130 W(I)=DIAG(I)*W(I)C r=H0*q DO 145 I=1,BOUNDC FOR i=k-m,...,k-1 YR= DDOT(N,W(IYPT+CP*N+1),1,W,1) BETA= W(N+CP+1)*YRC BETA=RHO_{k-M-1+I}*y_{k-M-1+I}*r INMC=N+M+CP+1 BETA= W(INMC)-BETA ISCN=ISPT+CP*N CALL DAXPY(N,BETA,W(ISCN+1),1,W,1)C r=r+s_{k-M-1+I}*BETA CP=CP+1C CP+1是ALPHA,RHO的偏移，CP是y,s的偏移 IF (CP.EQ.M)CP=0 145 CONTINUECC STORE THE NEW SEARCH DIRECTIONC ------------------------------C r=-Hk*f'_k DO 160 I=1,N 160 W(ISPT+POINT*N+I)= W(I)C y_k=rC OBTAIN THE ONE-DIMENSIONAL MINIMIZER OF THE FUNCTION C BY USING THE LINE SEARCH ROUTINE MCSRCHC ---------------------------------------------------- 165 NFEV=0 STP=ONE IF (ITER.EQ.1) STP=STP1 DO 170 I=1,N 170 W(I)=G(I) 172 CONTINUE CALL MCSRCH(N,X,F,G,W(ISPT+POINT*N+1),STP,FTOL, * XTOL,MAXFEV,INFO,NFEV,DIAG) IF (INFO .EQ. -1) THENC 需要g'(a),g(a) IFLAG=1 RETURN ENDIF IF (INFO .NE. 1) GO TO 190 NFUN= NFUN + NFEVCC COMPUTE THE NEW STEP AND GRADIENT CHANGE C -----------------------------------------C NPT=POINT*N DO 175 I=1,N W(ISPT+NPT+I)= STP*W(ISPT+NPT+I)C s_k 175 W(IYPT+NPT+I)= G(I)-W(I)C y_k POINT=POINT+1 IF (POINT.EQ.M)POINT=0CC TERMINATION TESTC ----------------C GNORM= DSQRT(DDOT(N,G,1,G,1)) XNORM= DSQRT(DDOT(N,X,1,X,1)) XNORM= DMAX1(1.0D0,XNORM) IF (GNORM/XNORM .LE. EPS) FINISH=.TRUE.C IF(IPRINT(1).GE.0) CALL LB1(IPRINT,ITER,NFUN, * GNORM,N,M,X,F,G,STP,FINISH) IF (FINISH) THEN IFLAG=0 RETURN ENDIF GO TO 80CC ------------------------------------------------------------C END OF MAIN ITERATION LOOP. ERROR EXITS.C ------------------------------------------------------------C 190 IFLAG=-1 IF(LP.GT.0) WRITE(LP,200) INFO RETURN 195 IFLAG=-2 IF(LP.GT.0) WRITE(LP,235) I RETURN 196 IFLAG= -3 IF(LP.GT.0) WRITE(LP,240)CC FORMATSC -------C 200 FORMAT(/' IFLAG= -1 ',/' LINE SEARCH FAILED. SEE' . ' DOCUMENTATION OF ROUTINE MCSRCH',/' ERROR RETURN' . ' OF LINE SEARCH: INFO= ',I2,/ . ' POSSIBLE CAUSES: FUNCTION OR GRADIENT ARE INCORRECT',/, . ' OR INCORRECT TOLERANCES') 235 FORMAT(/' IFLAG= -2',/' THE',I5,'-TH DIAGONAL ELEMENT OF THE',/, . ' INVERSE HESSIAN APPROXIMATION IS NOT POSITIVE') 240 FORMAT(/' IFLAG= -3',/' IMPROPER INPUT PARAMETERS (N OR M', . ' ARE NOT POSITIVE)') 245 FORMAT(/' GTOL IS LESS THAN OR EQUAL TO 1.D-04', . / ' IT HAS BEEN RESET TO 9.D-01') RETURN ENDCC LAST LINE OF SUBROUTINE LBFGSCC

专利

最新回复(0)