Master Note: How to diagnose Database Performance - FAQ [ID 402983.1]

Technology 2022-05-14

Applies to:

Oracle Server - Enterprise Edition - Version: 6.0.0.0 and later [Release: 6.0 and later]
Oracle Server - Personal Edition - Version: 7.1.4.0 and later [Release: 7.1.4 and later]
Oracle Server - Standard Edition - Version: 7.0.16.0 and later [Release: 7.0 and later]
Enterprise Manager for RDBMS - Version: 8.1.7.4 and later
Information in this document applies to any platform.

Purpose

This document answers a number of frequently asked questions about database performance.

Questions and Answers

Investigating a Database Performance Issue

To investigate a slow performance problem, begin by deciding which diagnostics to gather. To do this, consider the following questions and take the appropriate action:

Is the performance problem constant or does it occur at certain times of the day?

CONSTANT - Gather an AWR or Statspack report for a period of time when the problem occurs (a 1 hour report is usually sufficient). If you have a historic report that covers the same time of day and duration from when performance was OK, take that too.

CERTAIN TIMES - Gather an AWR or Statspack report for a period of time that covers the problem (for instance, if you have a problem when something is run between 12 and 3, make sure the report covers either that time or part of that time). ADDITIONALLY gather an AWR or Statspack report for a similar period of time when the problem does not occur, for comparison. Always ensure that you are making a fair comparison: use the same time of day or the same workload, and make sure the duration of the reports is the same.

NOTE: As far as possible, Statspack reports should cover a minimum of 10 minutes and a maximum of 30 minutes. Longer periods can distort the information, and such reports should be re-gathered using a shorter time period. With AWR a 1 hour report is OK.

Does the problem affect one session, several sessions or all sessions?

ONE SESSION - Gather 10046 trace for the session.
SEVERAL SESSIONS - Gather 10046 trace for one or two of the problem sessions.
ALL SESSIONS - Gather AWR or Statspack reports.

Does the database "actually" hang or just "appear" to hang? (i.e. do sessions never complete their tasks (HANG or SPIN?) or do they eventually finish (SLOW)?)

HANG - Take some systemstates as well as a Statspack report.
SPIN? - See: Document 68738.1 No Response from the Server, Does it Hang or Spin?
SLOW - Gather 10046 trace for a selection of slow sessions.

Is the CPU usage high for one or more sessions when things run slowly?

YES - Take some errorstacks from the suspect CPU process. (If unable to gather errorstacks then gather pstack reports.)

Diagnostics

- AWR reports/Statspack reports

AWR/Statspack reports provide a method for evaluating the relative performance of a database.

In 10g, to check for general performance issues use the Automatic Workload Repository (AWR) and specifically the Automatic Database Diagnostic Monitor (ADDM) tool for assistance. This is covered in the following: Document 276103.1 PERFORMANCE TUNING USING 10g ADVISORS AND MANAGEABILITY FEATURES

Note: If uploading reports to support, please ensure that they are in text format.

For 9i and 8i, Statspack reports, rather than AWR reports, should be gathered. To gather a Statspack report, please refer to Document 94224.1 FAQ- Statspack Complete Reference.

To interpret Statspack output refer to: http://www.oracle.com/technology/deploy/performance/pdf/statspack_tuning_otn_new.pdf
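The snapshot-and-report cycle described above can be sketched from SQL*Plus as follows. This is a minimal sketch, assuming the standard scripts shipped under $ORACLE_HOME/rdbms/admin, a suitably privileged user, and (for AWR) a 10g or later database licensed for the Diagnostics Pack. For Statspack it assumes the package is already installed and that you connect as its owner, normally PERFSTAT.

```sql
-- AWR (10g and later): take one snapshot at the start of the problem period
exec dbms_workload_repository.create_snapshot;
-- ... let the workload run for the period of interest (for example 1 hour) ...
exec dbms_workload_repository.create_snapshot;
-- Build the report; the script prompts for the report type (choose text),
-- the begin/end snapshot ids and the output file name
@?/rdbms/admin/awrrpt.sql

-- Statspack (8i/9i, or where AWR cannot be used): connect as the Statspack owner
exec statspack.snap;
-- ... wait 10 to 30 minutes ...
exec statspack.snap;
-- The script prompts for the begin/end snapshot ids and the output file name
@?/rdbms/admin/spreport.sql
```

Taking explicit snapshots at the start and end of the problem window keeps the report interval tight, which matches the advice above about comparing like with like.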

- 10046 Trace

10046 trace gathers tracing information about a session.

alter session set timed_statistics = true;
alter session set statistics_level=all;
alter session set max_dump_file_size = unlimited;
alter session set events '10046 trace name context forever,level 12';
-- run the statement(s) to be traced --
select * from dual;
exit;

Also see Document 376442.1 Recommended Method for Obtaining 10046 trace for Tuning.
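When the session to be traced belongs to someone else, the ALTER SESSION approach is not available. One possible alternative on 10g and later is the DBMS_MONITOR package, which with waits and binds both enabled gathers the same information as 10046 level 12. The sketch below assumes a DBA connection; the substitution variables (username, sid, serial) are placeholders introduced here for illustration, and the V$PROCESS.TRACEFILE column used in the last query is only available from 11g.

```sql
-- Identify the target session (username is a placeholder)
select sid, serial#, program
from   v$session
where  username = upper('&username');

-- Enable wait and bind tracing for that session (10g and later)
exec dbms_monitor.session_trace_enable(session_id => &&sid, serial_num => &&serial, waits => TRUE, binds => TRUE);

-- ... let the session run the slow statement(s) ...

-- Switch tracing off again
exec dbms_monitor.session_trace_disable(session_id => &&sid, serial_num => &&serial);

-- 11g and later only: locate the trace file written by the target session
select p.tracefile
from   v$process p, v$session s
where  p.addr = s.paddr
and    s.sid  = &&sid;
```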

- Querying V$Session_wait

The view V$SESSION_WAIT can show useful information about what a session is waiting for. Selecting from this view several times in succession can indicate whether a session is moving or not. When wait_time = 0 the session is waiting; any other value indicates CPU activity:

set lines 132 pages 999
column event format a30
select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait where SID = &&SID;
select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait where SID = &&SID;
select sid,event,seq#,p1,p2,p3,wait_time from V$session_wait where SID = &&SID;

See: Document 43718.1 VIEW "V$SESSION_WAIT" Reference

** Important ** v$session_wait is often misinterpreted. People often assume a session is waiting because they see an event and seconds_in_wait is rising. Remember that seconds_in_wait applies to the current wait only if wait_time = 0; otherwise it is actually "seconds since the last wait completed". The other column that helps clear up the misinterpretation is state, which will be WAITING if the session is waiting and WAITED% if it is no longer waiting.
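To make that interpretation explicit, the state and seconds_in_wait columns can be selected alongside wait_time. The query below is a minimal sketch; the derived STATUS column and its wording are illustrative additions, not part of the view.

```sql
set lines 160 pages 999
column event  format a30
column status format a45
select sid,
       event,
       seq#,
       state,
       wait_time,
       seconds_in_wait,
       case
         when state = 'WAITING'
           then 'waiting now for ' || seconds_in_wait || 's'
         else 'not waiting; last wait ended ' || seconds_in_wait || 's ago'
       end as status
from   v$session_wait
where  sid = &&SID;
```

Run it several times: a changing seq# shows that the session is moving through waits, while a constant seq# with state WAITING points to a session stuck on a single wait.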

- Finding session id

This select is useful for finding the current session information for tracing later:

select p.pid,p.SPID,s.SID
from v$process p,v$session s
where s.paddr = p.addr
and s.audsid = userenv('SESSIONID')
/

- System State Dumps

If the database is hung then we need to gather systemstate dumps to try to determine what is happening. At least 3 dumps should be taken as follows:

Log in to sqlplus as the internal user:

sqlplus "/ as sysdba"

rem -- set trace file size to unlimited:
alter session set max_dump_file_size = unlimited;
alter session set events '10998 trace name context forever, level 1';
alter session set events 'immediate trace name systemstate level 10';
alter session set events 'immediate trace name systemstate level 10';
alter session set events 'immediate trace name systemstate level 10';

or (if using 10g or higher):

sqlplus "/ as sysdba"
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
<< wait 90 seconds >>
oradebug dump systemstate 266
<< wait 90 seconds >>
oradebug dump systemstate 266
quit

For further information refer to: Document 452358.1 Database Hangs: What to collect for support. If no connection is possible at all then please refer to the following article which describes how to collect systemstates in that situation: Document 121779.1 Taking a SYSTEMSTATE dump when you cannot CONNECT to Oracle.

- Errorstack

Errorstack traces are Oracle call stack dumps that can be used to gather stack information for a process. Attach to the process and gather at least 3 errorstacks. Log in to SQL*Plus (9834 below is an example; use the o/s process id of the process to be examined):

connect / as sysdba
oradebug setospid 9834
oradebug unlimit
oradebug event 10046 trace name context forever,level 12
oradebug dump errorstack 3
<< wait 1min >>
oradebug dump errorstack 3
<< wait 1min >>
oradebug dump errorstack 3
exit

- PSTACK

Pstack is an operating system tool that can be used to gather stack information on some Unix platforms. Attach to the process and gather about 10 pstacks while the job is running.

% script pstacks.txt
% /usr/proc/bin/pstack pid
% exit

The PID is the o/s process id of the process to be traced. Repeat the pstack command about 10 times to capture possible stack changes. Further details of pstack are in Document 70609.1 How To Display Information About Processes on SUN Solaris.

- PL/SQL Profiler

The PL/SQL profiler provides information about PL/SQL code with regard to CPU usage and other resource usage. See Document 243755.1 Implementing and Using the PL/SQL Profiler.

- Hanganalyze

Hanganalyze is often gathered for hang situations, although systemstates are typically more useful. The following document describes how to gather hanganalyze dumps: Document 175006.1 Steps to generate HANGANALYZE trace files.

Interpreting the Results/Traces

Statspack reports - Look at the Top 5 waiters section and work to reduce the time spent in the top waiter first, then regather a Statspack report and see what effect that has had. The following assumptions hold true:

Top waiter is IO/CPU -> Main issue is likely to be SQL tuning
Top waiter is any other event -> Database performance issue

10046 traces - Run the 10046 trace through tkprof and look at the total time spent in SQL, then search back through the tkprof report for the SQL statement that accounts for the largest proportion of that time. Then look at the breakdown of time and wait events for that SQL. Always remember that the 'number of executions' is important: although the total time for a statement may be high, this may be accompanied by an equally high execution count. Assume the following:

If most of the time is spent in parsing there may be a parsing issue.
If the number of physical IOs is high then look at changing the access path of the query to do less work, or at increasing the buffer cache to get buffers from memory rather than blocks from disk.
If the wait events are enqueue related then generally this is an application design issue.

Determine the enqueue which is being waited for and address appropriately. For further assistance see:

Document 21154.1 EVENT 10046 "enable SQL statement tracing (including binds/waits)"
Document 39817.1 Interpreting Raw SQL_TRACE and DBMS_SUPPORT.START_TRACE output
Document 94160.1 Summary of Oracle DATATYPES

Systemstates - These should be sent to Oracle Support Services to interpret.
Hanganalyze - These should be sent to Oracle Support Services to interpret.
Errorstacks - These should be sent to Oracle Support Services to interpret. (Some of the calls on the stack are generic and are a result of how an errorstack works, so searching for them on Metalink can lead to incorrect analysis.)

Top Database Performance Issues/Problems and How To Resolve Them

- Library Cache/Shared Pool Latch waits

Library Cache/Shared Pool Latch waits are typically a contention problem caused by unshared SQL (in the case of the library cache latch) or by exhaustion of space in the shared pool (for the shared pool latch). For the shared pool latch, while new space allocations require the latch, it is typically the repeated freeing AND allocation of space in a shared pool that is too small that causes problems.

Document 62143.1 Understanding and Tuning the Shared Pool

- High Version Counts

High version counts occur when there are multiple copies of the 'same' statement in the shared pool, but some factor prevents them from being shared, wasting space and causing latch contention.

Document 296377.1 Handling and resolving unshared cursors/large version_counts
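As a starting point for spotting such statements, V$SQLAREA reports a version_count per parent cursor. The following is a minimal sketch for 10g and later (where sql_id is available); the threshold of 20 is an arbitrary illustrative value, not an Oracle recommendation.

```sql
-- Statements with many child cursors (threshold of 20 is an assumption)
select sql_id,
       version_count,
       executions,
       substr(sql_text, 1, 60) sql_text
from   v$sqlarea
where  version_count > 20
order  by version_count desc;

-- For one statement, show why its child cursors were not shared:
-- each column holding 'Y' names a reason for the mismatch
select *
from   v$sql_shared_cursor
where  sql_id = '&sql_id';
```

The set of reason columns in V$SQL_SHARED_CURSOR varies between versions, which is why the second query selects all columns rather than naming them.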

- Log File Sync waits

Log file sync waits occur when sessions wait for redo data to be written to disk. Typically this is caused by slow writes or by committing too frequently in the application. See Document 34592.1 WAITEVENT: "log file sync" Reference Note. It is recommended that customers experiencing log file sync issues on 10.2.0.3 proactively apply the patch for Bug 5896963.
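To help tell slow writes apart from over-frequent commits, the average 'log file sync' time can be compared with the average 'log file parallel write' time of the log writer. The query below is a rough sketch using instance-wide figures since startup (9i and later, where time_waited_micro exists); for a specific problem period the same events should be read from an AWR or Statspack report instead.

```sql
-- Average wait per event in milliseconds, since instance startup
select event,
       total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) avg_wait_ms
from   v$system_event
where  event in ('log file sync', 'log file parallel write');

-- Commit volume, to judge whether the application commits very frequently
select name, value
from   v$sysstat
where  name in ('user commits', 'user calls');
```

As a general rule of thumb rather than a firm threshold: if the two averages are close, write speed is the likely limit; if 'log file sync' is much higher than 'log file parallel write', commit frequency or CPU scheduling of the log writer is worth examining.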

- Buffer Busy waits/Cache Buffers Chains Latch waits

Buffer Busy waits occur when a session wants to access a database block in the buffer cache but cannot, because the buffer is "busy". Cache Buffers Chains Latch waits are caused by contention where multiple sessions are waiting to read the same block. Typical solutions are:

Look for SQL that accesses the blocks in question and determine if the repeated reads are necessary.
Check for suboptimal SQL (this is the most common cause of the events) - look at the execution plan for the SQL being run and try to reduce the gets per execution, which will minimise the number of blocks being accessed and therefore reduce the chances of multiple sessions contending for the same block.
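To find "the blocks in question" for buffer busy waits, the wait parameters give the file and block number, which can then be mapped to a segment. This is a minimal sketch: the aliases and the substitution variables file_no and block_no are names introduced here for illustration, and the meaning of p3 differs between versions (block class in 10g and later, a reason code before that).

```sql
-- Sessions currently waiting on a busy buffer: p1 = file#, p2 = block#
select sid,
       event,
       p1 file_no,
       p2 block_no,
       p3
from   v$session_wait
where  event = 'buffer busy waits'
and    state = 'WAITING';

-- Map a file/block pair from the query above to the owning segment
-- (can be slow on databases with very many extents)
select owner, segment_name, segment_type
from   dba_extents
where  file_id = &file_no
and    &block_no between block_id and block_id + blocks - 1;
```

Because the wait is transient, the first query usually needs to be run repeatedly while the problem is occurring before a consistent file/block pair shows up.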

Further information can be found in:

Document 34405.1 WAITEVENT: "buffer busy waits" Reference Note
Document 42152.1 LATCH: CACHE BUFFERS CHAINS
Document 155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance Problems
Document 163424.1 How To Identify a Hot Block Within The Database Buffer Cache
TX - Document 62354.1 TX Transaction locks - Example wait scenarios
TM - Document 33453.1 REFERENTIAL INTEGRITY AND LOCKING

- WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!

This issue occurs when the database detects that a waiter has waited for a resource for longer than a particular threshold. The message "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" appears in the alert log, and trace and systemstates are dumped. Typically this is caused by two (or more) incompatible operations being run simultaneously.

Document 278316.1 Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! "

- ORA-60 DEADLOCK DETECTED/enqueue hash chains latch

Refer to Document 62365.1 What to do with "ORA-60 Deadlock Detected" Errors. 'enqueue hash chains latch' waits are listed here because, typically, during deadlock detection (i.e. the routine Oracle uses to determine whether a deadlock actually exists) there is heavy demand for the latch, which can cause issues for other sessions. If there is a problem with this latch, check whether a trace file is generated for the ORA-60 and resolve that issue.

For RAC, Procwatcher can be used. Refer to Document 459694.1 Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes
