长时间latch free等待——记一次系统异常的诊断过程
上一篇 / 下一篇 2007-03-24 00:00:00 / 个人分类:ORACLE
今天发现一个报表数据库中SQL运行异常,简单记录一下问题的诊断和解决过程。
uQ#jM*cB&J@0
问题是在检查ALERT文件时发现的,一个过程运行时间太长而出现了ORA-1555错误。
错误信息:
ORA-01555 caused by SQL statement below (Query Duration=38751 sec, SCN: 0x0000.fe5b584a):
:o`Qu6F&f!{
`bY/w0INSERT INTO MAN_ORDER_ITEM (ITPUB个人空间pr,x6@2jg*cu*_
ID,ITPUB个人空间x {
F[N:~4e
REQUEST_QTY,ITPUB个人空间}$h.^GG3E/?
SALER_ID,ITPUB个人空间 s/AVZ`
Y"Mze
PRODUCT_ID,
,o(M,q!K:a,k0UNIT_PRICE,ITPUB个人空间0]HJJ
[V^Z"L
CREATE_DATE,
.B]0d,@1d7U4?0ANSWER_DATE,ITPUB个人空间7P{RWZ j0? sk
BUYER_IDITPUB个人空间0JUvY1~&J*O)Z``u4C
)
2J9f2I9U9}W0SELECTITPUB个人空间G:K Sc&VF[a
A.RECORD_ID,ITPUB个人空间f/^3`h:|kT(~1O*G4Ox
A.REQUEST_QTY,ITPUB个人空间5uh:w$hc??
A.SALER_ORGID,ITPUB个人空间TcV{7RnNS1R
A.PRODUCT_ID,ITPUB个人空间6t8Bk/i.M&h:b
A.UNIT_PRICE,ITPUB个人空间5Rc0n"e,Q-WxX0Dw
A.CREATE_DATE,
QK3K-~7{~y5D"]0NULL,ITPUB个人空间
s3bP;xq{m;E
A.BUYER_ORGIDITPUB个人空间 Sn@`4z5\\3t(t
FROM ORD_ORDER_ITEM A
J9_k
[H;YhgzY8u0WHERE A.CREATE_DATE >= TO_DATE('2004-01-01 0:0:0', 'YYYY-MM-DD HH24:MI:SS')
x*^?uKh0AND A.CREATE_DATE < TRUNC(SYSDATE)
@
zP1i?3@Y0AND EXISTS (SELECT 1 FROM MAN_PRODUCT WHERE ID = A.PRODUCT_ID)ITPUB个人空间X Dv#A4f4ni*UK
AND EXISTS (SELECT 1 FROM MAN_DEALER WHERE ID = A.SALER_ORGID)ITPUB个人空间0J5k4F*i}7yC*vO[
AND EXISTS (SELECT 1 FROM MAN_BUYER WHERE ID = A.BUYER_ORGID)
由于这是个JOB调用,在失败后JOB会自动重试,于是从DBA_JOBS_RUNNING中查看相关的JOB和SESSION信息。
SQL> SELECT SID, JOB FROM DBA_JOBS_RUNNING;
SID JOB
W:]Cp W*x0---------- ----------ITPUB个人空间
?"ca0`L.TQG0U5a z,Q
70 208
检查这个SESSION目录在执行什么SQL:
SQL> SELECT SQL_TEXT FROM V$SQL SQL, V$SESSION S
XO%d)Bp5l02 WHERE SQL.HASH_VALUE = S.SQL_HASH_VALUE
U9Jh"X
byCZ*u zJ03 AND SQL.ADDRESS = S.SQL_ADDRESSITPUB个人空间}0r
LtE)]!z i(T0Hv
4 AND S.SID = 70;
SQL_TEXT
Fi0D,x)de*H|7z0------------------------------------------------------------------------------ITPUB个人空间
D+R0Q(p ~.c)X
INSERT INTO MAN_ORDER_ITEM ( ID, REQUEST_QTY, SALER_ID, PRODUCT_ID, UNIT_PRICE, CREATE_DATE, ANSWER_DATE,
(_*G8x+nt
y-}3X0BUYER_ID ) SELECT A.RECORD_ID, A.REQUEST_QTY, A.SALER_ORGID, A.PRODUCT_ID, A.UNIT_PRICE, A.CREATE_DATE,
"P1K[
r`!?A
^fDM0NULL, A.BUYER_ORGID FROM ORD_ORDER_ITEM A WHERE A.CREATE_DATE >= TO_DATE('2004-01-01 0:0:0', 'YYYY-MM-DD HH24:MI:S
Ar-{+YF,R#OC0S') AND A.CREATE_DATE < TRUNC(SYSDATE) AND EXISTS (SELECT 1 FROM MAN_PRODUCT WHERE ID = A.PRODUCT_ID) AND EXISTS (SEL
-VBc~5[*BdWw"H$l0ECT 1 FROM MAN_DEALER WHERE ID = A.SALER_ORGID) AND EXISTS (SELECT 1 FROM MAN_BUYER WHERE ID = A.BUYER_ORGID)
从SQL上看,就是刚才失败的那个SQL语句,那么看看SESSION在等待什么:
SQL> SELECT SID, EVENT, P1TEXT, P1RAW, P2TEXT, P2, SECONDS_IN_WAIT FROM V$SESSION_WAIT
4_+cy fCa"B8].n02 WHERE SID = 70;
SID EVENT P1TEXT P1RAW P2TEXT P2 SECONDS_IN_WAIT
+p WW'}n/wQ0------- -------------- -------- ---------------- ------- ----- ---------------ITPUB个人空间uzhu2H-\J
70 latch free address 00000004125AB718 number 98 330
通过观察发现,Session的等待事件一直是LATCH FREE。第一个感觉是可能和其他的进程产生了争用。
查询一下Oracle等待的具体latch的类型。
SQL> SELECT LATCH#, NAME FROM V$LATCH WHERE LATCH# = 98;
LATCH# NAME
q/mbq"D6H%q&?-A0---------- ----------------------------------------------------------------
u kddv4f7UE6Zw098 cache buffers chains
而查询V$LOCK和V$LATCHHOLDER视图,发现没有其他的进程对JOB运行构成影响:
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCKITPUB个人空间*Ai6w)FnQl,C~
2 FROM V$LOCKITPUB个人空间9x1Q:\^2l7J
3 WHERE SID > 8;
SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
rSVEz;`k0S0---------- -- ---------- ---------- ---------- ---------- ---------- ----------
a/q&EE0y(d-{B070 TM 35258 0 3 0 12072 0ITPUB个人空间4Uhj/} Iq0_^
70 JQ 0 208 6 0 12155 0
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
SQL> SELECT * FROM V$LATCHHOLDER;
PID SID LADDR NAME
BN0D%gw0---------- ---------- ---------------- ------------------------------
4Q,i`2J"i6_2ta015 70 0000000412564D98 cache buffers chains
SQL> SELECT * FROM V$LATCHHOLDER;
no rows selected
可以看到,并没有其他对象影响JOB进程。
由于等待事件是LATCH FREE,怀疑和系统本身的问题有关。
通过下面的脚本可以看到,目前正在等待的这个子LATCH的信息:
SQL> SELECT ADDR, LATCH#, CHILD#, NAME FROM V$LATCH_CHILDRENITPUB个人空间D u
pQ9_eBB5M;Y
2 WHERE ADDR IN (SELECT P1RAW FROM V$SESSION_WAIT WHERE SID = 70);
ADDR LATCH# CHILD# NAMEITPUB个人空间i5wC3I'?
---------------- ---------- ---------- ---------------------------------------
"Jf$t^s7lit |00000000412550518 98 327 cache buffers chains
观察LATCH_MISSES的信息:
SQL> COL PARENT_NAME FORMAT A20ITPUB个人空间D*m
i6H0q
SQL> COL WHERE FORMAT A35ITPUB个人空间{T!{O;L&PQ
SQL> SELECT *
_| Y T"T2j8X[02 FROM ITPUB个人空间b(HG[*no
3 (ITPUB个人空间!Q;C8h7yO8^[
4 SELECT PARENT_NAME, "WHERE", SLEEP_COUNT, WTR_SLP_COUNT, LONGHOLD_COUNT ITPUB个人空间-^at#|&z6`0PYI?
5 FROM V$LATCH_MISSES
@)DK:o*]"n$iJ5v p06 WHERE PARENT_NAME = 'cache buffers chains'
D` t
p#FkJDH4a07 ORDER BY SLEEP_COUNT + WTR_SLP_COUNT + LONGHOLD_COUNT DESCITPUB个人空间t-uq;Y&?#wb$~
8 )
#P9l@Br3l]4Xc\6rAF09 WHERE ROWNUM < 20;
PARENT_NAME WHERE SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT
[.g%Z.^gEJ0-------------------- ------------------------------- ----------- ------------- --------------ITPUB个人空间%~8T`"|!_V
t
cache buffers chains kcbgtcr: kslbegin excl 1202658 884364 906374ITPUB个人空间X F*~/i,a1NY-z*x)Vb
cache buffers chains kcbrls: kslbegin 480030 852471 335799ITPUB个人空间'}aX)T"c9~Hd9D$l
cache buffers chains kcbzwb 95994 90482 84373ITPUB个人空间O7@4c8v8q
cache buffers chains kcbgtcr: kslbegin shared 89385 84911 62640
\AglT2f'E#o%^0cache buffers chains kcbgtcr: fast path 69352 113120 51014