-
使用dbms_backup_restore.zerodbid(0)后的问题
2008-09-06 02:34:46
这是我自己的TESTDB
之前使用dbms_backup_restore.zerodbid(0)之后没有重启DB
因此造成在写system时实例down.alert日志中出现如下信息:
Fri Sep 5 14:20:07 2008
Errors in file /oracle/admin/orcl/bdump/orcl_ckpt_4414.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/oradata/system01.dbf'
ORA-01206: file is not part of this database - wrong database id
Fri Sep 5 14:20:07 2008
CKPT: terminating instance due to error 1242
Instance terminated by CKPT, pid = 4414处理方法如下:
[oracle@asm2 ~]$ sqlplus /nologSQL*Plus: Release 10.2.0.1.0 - Production on Fri Sep 5 14:27:58 2008
Copyright (c) 1982, 2005, Oracle. All rights reserved.
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.Total System Global Area 264241152 bytes
Fixed Size 1218892 bytes
Variable Size 67110580 bytes
Database Buffers 192937984 bytes
Redo Buffers 2973696 bytes
Database mounted.
ORA-01221: data file 1 is not the same file to a background process
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.Total System Global Area 264241152 bytes
Fixed Size 1218892 bytes
Variable Size 67110580 bytes
Database Buffers 192937984 bytes
Redo Buffers 2973696 bytes
Database mounted.
SQL> recover database;
Media recovery complete.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01221: data file 1 is not the same file to a background process
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery
SQL> recover database using backup controlfile ;
ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
ORA-00280: change 1445010 for thread 1 is in sequence #816
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log '/oradata/arch/1_816_663837713.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
ORA-00308: cannot open archived log '/oradata/arch/1_816_663837713.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
SQL> alter database until cancel;
alter database until cancel
*
ERROR at line 1:
ORA-02231: missing or invalid option to ALTER DATABASE
SQL> recover database until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile ;
ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
ORA-00280: change 1445010 for thread 1 is in sequence #816
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel;
ORA-00308: cannot open archived log 'cancel;'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open
SQL> alter database resetlogs;
alter database resetlogs
*
ERROR at line 1:
ORA-02231: missing or invalid option to ALTER DATABASE
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/oradata/system01.dbf'
SQL> recover datafile 1;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
ORA-00280: change 1445010 for thread 1 is in sequence #816
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
SQL> alter database open resetlogs;Database altered.
-
rman备份失败的处理
2008-08-17 01:39:19
rman备份失败的处理
晚上刚跑步回来,就接到用户电话说备份失败,数据库已经有两天没有备份成功。错误信息如下:
Starting backup at 16-AUG-08
released channel: t1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/16/2008 22:12:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog
配好VPN马上拨上去,检查incarnation,发现两边一致。crosscheck等命令仍然是报这个错误。
reset database和resync database等所有操作全部都报错。
作出最后决定,重新注册catalog信息。
rman target / catalog rman/XXX@rman;
偿试
unregister database ;
居然发现DB不认这个命令。晕倒,
赶快登陆至rman数据库。
查到dbid和db_key
使用这以下这个包没有问题。
dbms_rcvcat.unregisterdatabase(db_key,dbid);
rman target / catalog rman/XXX@rman;register database;
重新注册后数据库备份成功。
处理好之后,才想起来原来之前有作过全库的恢复,并且在直接在rman中使用了resetlogs
打开了DB。这样就会造成target和catalog的不一致。如果在sqlplus中resetlogs是不会
出现此类问题。
-
WARNING:Oracle instance running on a system with low open file descriptor
2008-08-03 18:56:32
WARNING:Oracle instance running on a system with low open file descriptor
limit. Tune your system to increase this limit to avoid
客户的一个生产库在把SGA调至3.4GB并升级至9208时,发现alert中出现以下错误
WARNING:Oracle instance running on a system with low open file descriptor
limit. Tune your system to increase this limit to avoid
此错误主要由于limits方面的限制,检查os,发现limits.conf未设置以下参数;
vi /etc/security/limits.conforacle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536加上后问题解决。
因此在安装oracle的时候,一定要仔细的弄清楚每个参数的意义。
这样在troubleshooting的时候才能比较快速进行定位。
-
设置大sort_area_size时发生的ora-00600[925]错误
2008-08-03 18:38:45
设置大sort_area_size时发生的ora-00600[925]错误
今天rebuild一个超大的index,在设置手工sort_area_size到达2GB时
触发ora-00600[925],修改不成功。SQL> alter session set sort_area_size=2097152000;
ERROR:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00600: internal error code, arguments: [925], [hash_area_size], [-100663296], [], [], [], [], []
查询hash_area_sizeSQL> show parameter hash_area_size
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
hash_area_size integer 1048576发现为1M,
而sort_area_size的限制是<2*31方,并且不能比hash_area_size大这么多,
考虑到这个因素,我们把hash_area_size改大
SQL> alter session set hash_area_size=10485760;Session altered.
SQL> alter session set sort_area_size=2097152000;
Session altered.
SQL> show parameter sort
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
nls_sort string
sort_area_retained_size integer 0
sort_area_size integer 2097152000
可以看到设置成功。其实这个也算是oracle的一个bug了。
见Bug 2205276 and 4011860
-
升级过程执行catpatch时报04021错误的处理办法
2008-08-03 11:55:12
升级过程执行catpatch时报04021错误的处理办法
有时候在升级9.2X的时候会发生类似于以下的错误:
create type XMLTypePI OID '0000000000000000000000000002014F' as
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object SYS.XMLTYPEPIType created.
grant execute on XMLTypePI to public with grant option
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object SYS.XMLTYPEPI发生这个错误的同时,脚本是在这儿等待很长的时间,v$session_wait中会显示
library cache pin这一等待。发生这个错误的主要原因在于在更新xmldb的时候,
oracle需要一个版本一个版本往上提,如9204->9205->9206,如中间缺少,那么就
出现这种报出错误的情况,解决这个错误的方法就是手工执行xdbpatch脚本。
如:
startup migrate
spool xmlpatch.log
@?/rdbms/admin/xdbpatch
spool off;然后可以重新跑一遍catpatch执行一次字典的更新,
可以发现该错误被解决。当然,这个实际上也是一个bug,oracle说在9207中已经fix掉,但实际上在9208上面还会
有这个错误发生。 -
ora-00600[2103]错误分析
2008-04-19 01:18:53
ora-00600[2103]错误分析
环境是两台HP 8400, hpux11.11i ,oracle9208 做data guard.MAXIMUM AVAILABILITY模式下. 今天中午主库出现出现hang住的情况.
查看主库alert日志,
Fri Apr 18 11:00:44 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
ORA-16099: internal error ORA-00600 occurred at standby database
Fri Apr 18 11:18:49 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
ORA-16099: internal error ORA-00600 occurred at standby database
Fri Apr 18 11:39:53 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
ORA-16099: internal error ORA-00600 occurred at standby database
Fri Apr 18 12:00:58 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
ORA-16099: internal error ORA-00600 occurred at standby database
Fri Apr 18 12:22:03 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
ORA-16099: internal error ORA-00600 occurred at standby database
Fri Apr 18 12:40:09 2008备库alert日志:
Fri Apr 18 11:40:05 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25083.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Fri Apr 18 11:40:05 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25083.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Fri Apr 18 11:46:05 2008
RFS: Possible network disconnect with primary database
Fri Apr 18 12:01:08 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Fri Apr 18 12:01:09 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Fri Apr 18 12:01:09 2008
Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
Fri Apr 18 12:07:10 2008
RFS: Possible network disconnect with primary database
一眼就看出现主库的rfs这里出了问题,那么我们查看备库IO情况,iostat发现, 每秒为324K,这
个速度明显是极为不正常的.怎么说也是va呀. 使用ioscan,发现已经无法找到盘.这里已经确信为
IO子系统出现了故障.联系硬件工程师,更换存储系统.=这里我们顺便来看一下,关于ora-00600[2103]的一些处理办法:
其中有一种是会造成instance crash的,如下:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
LGWR: Detected ARCH process failure
LGWR: Detected ARCH process failure
LGWR: STARTING ARCH PROCESSES这里是由于bug6520821(unpublished)所引起的错误,主要是在做alter database backup controlfile ...时
所引起的bug, 针对这个bug,oracle目前没有提供opatch,但是10202的patchset修复了这个bug.
因此在9208上遇到这个bug,可以不做alter database backup ....这样的操作,因此要注意你
的备份策略的设置.
第二种情况是:oracle进程会由于获取CF(controlfile enqueue)时间太长而导致超时造成的.
主要原因有:
由于IO子系统速率过于低下,当然通常这种情况都是由于存储故障所导致,如控制器链路故障
以及接交换机链路出现问题都会导致IO出现问题,这时候
os故障.
AIO多个写进程导致 ,这里可以考虑设置max_io_server和min_io_server至一个合适的值.
日志切换过于频繁,或者日志文件过小以及日志组太少.
使用隐含参数来控制超时时间,如:
_CONTROLFILE_ENQUEUE_TIMEOUT = 10800当然,可能会有一些情况这里没有一一列出.这里只是作为2103错误的一个参考和解决的思路.
-
ora-00600[4400]错误的解决
2008-02-06 03:16:33
ora-00600[4400]错误的解决
4400错误通常在分布式事务中出现,如到这个错误不会对数据库产生重大的影响,
以下是一个生产库的实际解决的例子。
故障描述:
Errors in file .../udump/ora_15899_$ORACLESID.trc:
ORA-00600: 内部错误代码,自变量: [4400], [48], [], [], [], [], [], []
Tue Feb 5 17:07:18 2008
Errors in file .../bdump/reco_1297_$ORACLESIDl.trc:
ORA-02062: distributed recovery received DBID e23ac21, expected 232acd6
ARC1: media recovery disabled原因及解决方法:
由于db_link断开,两阶段事务不能正常处理,查询dba_2pc_pending
LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX A TRAN_COMMENT FAIL_TIME FORCE_TIM RETRY_TIM OS_USER
---------------------- ------------------------------ ---------------- --- - -------------------- --------- --------- --------- ----------------------------------------------------------------
OS_TERMINAL
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HOST DB_USERCOMMIT#
-------------------------------------------------------------------------------------------------------------------------------- ------------------------------ ----------------
192.168.1.1 jpb.6c5f8f5f.192.168.1.1 collecting no 05-FEB-08 05-FEB-08 Administrator
DGDX-WR
WORKGROUP\DGDX-WR IBSS7740378455873192.168.1.2 jpb.6c5f8f5f.192.168.1.2 collecting no 05-FEB-08 05-FEB-08 Administrator
DGDX-WR
WORKGROUP\DGDX-WR IBSS7740389820541
192.168.1.112 jpb.6c5f8f5f.192.168.1.112 collecting no 05-FEB-08 05-FEB-08 Administrator
DGDX-WR
WORKGROUP\DGDX-WR IBSS7740970651708
通过以下方法来清除这些事务:SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.2');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.1');
PL/SQL procedure successfully completed.
SQL> commit;
Commit complete.
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.112');
PL/SQL procedure successfully completed.
SQL> commit;
-
AIX 5.3上升级oracle9206的注意事项
2007-11-06 00:00:00
AIX 5.3上升级oracle9206的注意事项
给AIX5.3版本上的oracle升级至9206时,有以下几点需要注意,特此纪录:
1、在root用户下,执行以下命令清除内存中没有使用的库文件。
# /usr/sbin/slibclean2、在执行./runInstaller过程中会报以下错误:
Starting Oracle Universal Installer…
Checking installer requirements…
Checking operating system version: must be 5200 or 5100
Starting Oracle Universal Installer Failed <<<<方法1:
使用ignoreSYSPrereqs参数可以跳过这个检查过程
./runInstaller -ignoreSYSPrereqs
方法2:
也可修改升级程序包中的oraparam.ini文件,
vi Disk1/install/oraparam.ini
[Certified Versions]
。。。
AIX=5200,5100,5300 在这里加上53003、在升级完以后,有一个地方是特点需要注意的:
$ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
这个文件里面的
aru_id这个值需要从610改为212
这样修改以后才能打opatch,否则在opatch时会出错而导致无法打上opatch -
记录一下10G中EM乱码的解决
2007-09-26 00:00:00
d记录一下10G中EM乱码的解决
这个方法是google到的。
[oracle@rac2 ~]$ cd $ORACLE_HOME/jdk/jre/lib
[oracle@rac2 lib]$ ls
audio font.properties.ja.Redhat7.2 font.properties.zh_CN.Sun jce.jar
charsets.jar font.properties.ja.Redhat7.3 font.properties.zh_CN_UTF8.Sun jsse.jar
cmm font.properties.ja.Redhat8.0 font.properties.zh_HK.Sun jvm.hprof.txt
content-types.properties font.properties.ja.Turbo font.properties.zh_HK_UTF8.Sun jvm.jcov.txt
ext font.properties.ja.Turbo6.0 font.properties.zh_HK_UTF8.Sun2003 locale
flavormap.properties font.properties.ko_KR.Sun font.properties.zh.Turbo logging.properties
font.properties font.properties.ko_KR_UTF8.Sun font.properties.zh_TW.Redhat plugin.jar
font.properties.ja font.properties.ko_KR_UTF8.Sun2003 font.properties.zh_TW.Redhat2.1 psfontj2d.properties
font.properties.ja_JP.Sun font.properties.ko.Redhat font.properties.zh_TW.Sun psfont.properties.ja
font.properties.ja_JP.Sun2003 font.properties.ko.Redhat2.1 font.properties.zh_TW_UTF8.Sun rt.jar
font.properties.ja_JP_UTF8.Sun font.properties.Redhat6.1 font.properties.zh_TW_UTF8.Sun2003 security
font.properties.ja_JP_UTF8.Sun2003 font.properties.Redhat8.0 fonts sunrsasign.jar
font.properties.ja.Redhat3 font.properties.SuSE8.0 i386 zi
font.properties.ja.Redhat6.1 font.properties.zh_CN.Redhat im
font.properties.ja.Redhat6.2 font.properties.zh_CN.Redhat2.1 images
[oracle@rac2 lib]$ ls font.properties.zh_CN.Redhat
font.properties.zh_CN.Redhat
[oracle@rac2 lib]$ cp font.properties.zh_CN.Redhat font.properties
[oracle@rac2 lib]$ cd $ORACLE_HOME/oc4j/j2ee/oc4j
[oracle@rac2 oc4j_applications]$ pwd
/opt/u01/app/oracle/product/10201/db1/oc4j/j2ee/oc4j_applications/applications/em/em/cabo/images/cache/zhs
[oracle@rac2 zhs]$ pwd
/opt/u01/app/oracle/product/10201/db1/oc4j/j2ee/oc4j_applications/applications/em/em/cabo/images/cache/zhs
[oracle@rac2 zhs]$ ls -lrt
total 80
-rw-r----- 1 oracle oinstall 915 Sep 26 21:03 tb-0.imx
-rw-r----- 1 oracle oinstall 452 Sep 26 21:03 tb-0.gif
-rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-f8.imx
-rw-r----- 1 oracle oinstall 715 Sep 26 21:03 b-f8.gif
-rw-r----- 1 oracle oinstall 502 Sep 26 21:03 b-f81cbf.imx
-rw-r----- 1 oracle oinstall 829 Sep 26 21:03 b-f81cbf.gif
-rw-r----- 1 oracle oinstall 496 Sep 26 21:03 b-7c2d.imx
-rw-r----- 1 oracle oinstall 777 Sep 26 21:03 b-7c2d.gif
-rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-70.imx
-rw-r----- 1 oracle oinstall 715 Sep 26 21:03 b-70.gif
[oracle@rac2 zhs]$ rm *.gif
[oracle@rac2 zhs]$ ll
total 40
-rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-70.imx
-rw-r----- 1 oracle oinstall 496 Sep 26 21:03 b-7c2d.imx
-rw-r----- 1 oracle oinstall 502 Sep 26 21:03 b-f81cbf.imx
-rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-f8.imx
-rw-r----- 1 oracle oinstall 915 Sep 26 21:03 tb-0.imx
[oracle@rac2 zhs]$ emctl stop dbconsole
TZ set to Asia/Chungking
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://localhost.localdomain:1158/em/console/aboutApplication
Stopping Oracle Enterprise Manager 10g Database Control ...
... Stopped.
[oracle@rac2 zhs]$ emctl start dbconsole
TZ set to Asia/Chungking
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
http://localhost.localdomain:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 10g Database Control .......... started.
------------------------------------------------------------------
Logs are generated in directory /opt/u01/app/oracle/product/10201/db1/localhost.localdomain_rac1/sysman/log -
使用gdb来诊断系统问题的一个例子
2007-06-10 00:00:00
d使用gdb来诊断系统问题的一个例子
这是以前处理的一个问题,现将处理过程描述出来,
主要是因为在imp时报错.
使用gdb调试、跟踪启动失败过程中的core dump文件,信息如下:gdb $ORACLE_HOME/bin/imp core.21277
GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".Core was generated by `imp '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /u01/app/oracle/product/8.1.7/lib/libclntsh.so.8.0...done.
Loaded symbols for /u01/app/oracle/product/8.1.7/lib/libclntsh.so.8.0
Reading symbols from /u01/app/oracle/product/8.1.7/lib/libwtc8.so...done.
Loaded symbols for /u01/app/oracle/product/8.1.7/lib/libwtc8.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x001d612a in memcpy () from /lib/libc.so.6
(gdb) where
#0 0x001d612a in memcpy () from /lib/libc.so.6
#1 0x0805a45f in imprbl ()
#2 0x0805add3 in imprdi ()
#3 0x08065f2d in impsel ()
#4 0x08064bfa in impskr ()
#5 0x08064a3d in imprpd ()
#6 0x0805fde7 in imptabskp ()
#7 0x08057a0e in impdta ()
#8 0x0804eaa8 in main ()
(gdb)在memcpy函数处,因为收到信号亮signal 11,导致imp终止。
这里可以判断出来oracle软件造成的错误,经重装oracle软件后正常使用 -
启动实例hang的一例的诊断
2007-06-09 00:00:00
d启动实例hang的一例的诊断
数据库被人无意断开电源。
1、启动数据库时HANG住:ALERT日志停在以下的行。RECO started with pid=7
CJQ0 started with pid=8
QMN0 started with pid=9
Sun May 27 11:22:28 2007
starting up 1 shared server(s) ...
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Sun May 27 11:22:28 2007
ALTER DATABASE MOUNT
Sun May 27 11:22:35 2007
Successful mount of redo thread 1, with mount id 2423313207.
Sun May 27 11:22:35 2007
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Sun May 27 11:22:35 2007
ALTER DATABASE OPEN
Sun May 27 11:22:35 2007
Beginning crash recovery of 1 threads
Sun May 27 11:22:35 2007
Started first pass scan
Sun May 27 11:22:36 2007
Completed first pass scan
0 redo blocks read, 0 data blocks need recovery
Sun May 27 11:22:36 2007
Started recovery at
Thread 1: logseq 78153, block 3, scn 2110.630842439
Recovery of Online Redo Log: Thread 1 Group 3 Seq 78153 Reading mem 0
Mem# 0 errs 0: /dev/ora/redolog3
Sun May 27 11:22:36 2007
Ended recovery at
Thread 1: logseq 78153, block 3, scn 2110.630862440
0 data blocks read, 0 data blocks written, 0 redo blocks read
Crash recovery completed successfully
Sun May 27 11:22:36 2007
Thread 1 advanced to log sequence 78154
Thread 1 opened at log sequence 78154
Current log# 2 seq# 78154 mem# 0: /dev/ora/redolog2
Successful open of redo thread 1.
Sun May 27 11:22:36 2007
LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
Sun May 27 11:22:36 2007
SMON: enabling cache recovery启动到mount模式下,
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /disk1/ora9i/920/admin/ora9i/udump/ora9i_ora_1202.trc
SQL>trace文件内容如下:
*** SESSION ID:(12.4) 2007-05-27 10:34:21.904
*** 2007-05-27 10:34:21.903
==============
HANG ANALYSIS:
==============
Open chains found:
Other chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/3/1/0xb7a8fc98/957/LGWR wait for redo copy>
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/8/1/0xb7a915c0/967/wakeup time manager>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/9/3/0xb7a924d8/989/No Wait>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
<0/12/4/0xb7a92ee8/1202/No Wait>
Extra information that will be dumped at higher levels:
[level 5] : 4 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 10] : 7 node dumps -- [IGN]
可以看到这里有一个等待:LGWR wait for redo copy,诊断应与undo有关。
于是设置events 10015,对于该events的解释:Undo Segment Recovery
event="10015 trace name context forever,level 10"
5. 在生成的TRACE 文件内容如下:
KCRA: buffers claimed = 0/0, eliminated = 0
Acquiring rollback segment SYSTEM
Recovering rollback segment _SYSSMU5$
Recovering rollback segment _SYSSMU6$
Recovering rollback segment _SYSSMU7$
Recovering rollback segment _SYSSMU8$
Recovering rollback segment _SYSSMU9$
Recovering rollback segment _SYSSMU10$这里可以发现从_SYSSMU5$在做recover时hang住了。
那么可以将此undo segment offline,于是我们使用隐含参数来打开数据库,当然这个过程log需要reset
_ALLOW_RESETLOGS_CORRUPTION = TRUE
_OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU5$)再打开数据库,OK。
-
9206RAC上的一个bug
2007-05-06 00:00:00
d前些日子准备改RAC连接方式时在指定cluster_interconnects发现无效。
觉得是非常的奇怪,因为之前在9207上也是经常修改这个参数以强制通过私网进行互连。
由于当前版本为9206。上metalink一查。原来是一个bug.
Patch Set Exception : 4069045 Base Bug : 4066495打上4066495这个patch后,修改生效。
-
truncate不能回收空间的问题
2007-04-11 00:00:00
9206 RAC
客户说truncate之后,user_segments中的大小居然不变.而实际上空间是已经被回收了.也就是说数据字典在truncate之后是没有被更新的.如果作drop table操作.数据字典是更新了.查metalink,发现在9206上确实有这样的一个bug.但是很遗憾,客户的系统是windows.而这个bug的patch并没有windows平台上的.workground是:
在作完truncate之后,再作一次alter table XXX allocate extent;即可更新数据字典.
-
RMAN-06726
2007-02-13 00:00:00
以在RAC环境中一个RMAN备份出错的案例.
环境是四节点的RAC.在备份归档日志时出现以下错误:
allocated channel: c1
channel ch00: sid=132 devtype=DISK
allocated channel: c2
channel ch01: sid=32 devtype=DISK
allocated channel: c3
channel ch02: sid=156 devtype=DISK
allocated channel: c4
channel ch02: sid=123 devtype=DISK
Starting backup at 03-FEB-07
released channel: c1
released channel: c2
released channel: c3
released channel: c4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 02/03/2007 01:17:34
RMAN-06726: could not locate archivelog XXXXXXXXXXXXXXXXX查找原因.检查脚本.无任何问题.检查归档路径,无任何问题.
归档日志全部都存在.检查参数.发现其中一个实例中cluster_database_instances 参数值为3. 因为是新添加了一个节点.因此值并未作修改.将cluster_database_instances 修改为4后备份归档日志恢复正常.
-
rac ora-12545
2007-02-09 00:00:00
hpunix
oracle9205 rac
由于node1已很长时间没能起来,直到同步vg信息后才完成RAC的重组。
startup 两node.
启动应用,发现无法连接。ORA-12545: Connect failed because target host or object does not exist ,但重试又可连上.检查原因后,发现两实例都设了local_listener和remote_listener.将其清除后问题解决!
-
处理了一个ora-12500
2007-01-18 00:00:00
接到电话说客户端无法正常连接到数据库,急忙赶到现场.
在停掉部分应用后,发现可以正常连接,再启动应用.发现又无法连接.当连接数达到283后就无法登录了.报ora-12500.
由于客户方的系统为windows2000 2GB内存.
查看sga发现居然有1.4G.加上pga的大小1.5 G
这种情况下,如果连接数高了,资源吃紧,oracle无可用的内存.从而导致连接受限.将sga调整到900M后,问题解决.
我觉得的奇怪的是,为什么XX的数据库(虽然是这个库是小库).也不应该用这么低的配置吧. ......
-
给用VCS的一个提醒
2007-01-08 00:00:00
给oracle rac升级,一定别忘了给oracle lib作同步了:
$ cp /opt/VRTSvcs/ops/lib/libskgxp92_64.so $ORACLE_HOME/lib/libskgxp9.so
$ cp /opt/ORCLcluster/lib/9iR2/libskgxn2_64.so $ORACLE_HOME/lib/libskgxn9.so
否则有可能运行了大半年甚至一年之后问题爆发出来你找不到北了。 -
RAC下trace暴涨诊断
2007-01-07 00:00:00
DB版本oracle 9207 rac
OS版本solaris 9
集群件Veritas cluster server 4.1
故障:平均三秒钟产生一个trace文件。Trace文件不断增加,导致磁盘空间迅速减小
而alter中无任何错误信息,只有一行:
:
Thu Jan 4 11:34:53 2007
Errors in file /oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trctrace文件:
/oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trc
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
ORACLE_HOME = /oracle_bin/rac9i/product
System name: SunOS
Node name: XXXXX
Release: 5.9
Version: Generic_118558-13
Machine: sun4u
Instance name: yjrac2
Redo thread mounted by this instance: 2
Oracle process number: 13
Unix process pid: 1942, image: oracle@XXXXX (TNS V1-V3)*** SESSION ID:(85.7779) 2006-04-23 01:13:12.231
=================================
Begin 4031 Diagnostic Information
=================================
The following information assists Oracle in diagnosing
causes of ORA-4031 errors. This trace may be disabled
by setting the init.ora parameter _4031_dump_bitvec = 0
======================================
Allocation Request Summary Information
======================================
Current information setting: 00654fff
Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds
Last Dump Time=04/14/2030 14:07:45
Allocation request for: kglsim object batch
Heap: 380032950, size: 4032
******************************************************
HEAP DUMP heap name="sga heap(2,0)" desc=380032950
extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1
====================
Process State Object
====================
----------------------------------------
SO: 41f4ee448, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=13, calls cur/top: 41f6d3320/41f6d3320, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
holding 428c53f60 Child library cache level=5 child#=5
Location from where latch is held: kglobpn: child:: latch
Context saved from call: 6
state=busy
Process Group: DEFAULT, pseudo proc: 41f5cbab0
O/S info: user: orarac, term: UNKNOWN, ospid: 1942
OSD pid info: Unix process pid: 1942, image: oracle@XXXXX(TNS V1-V3)
=========================
User Session State Object
=========================
----------------------------------------
SO: 4204f6d48, type: 4, owner: 41f4ee448, flag: INIT/-/-/0x00
(session) trans: 0, creator: 41f4ee448, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 438fa8348, psql: 0, user: 0/SYS
O/S info: user: , term: , ospid: , machine:
program:
temporary object counter: 0
...No current library cache object being loaded
...No instantiation object
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksm_4031_dump()+186 CALL ksedst() 00000000B ? 000000000 ?
8 000000000 ? 103327258 ?
00000003E ?
FFFFFFFF7FFF7938 ?
ksmasg()+352 CALL ksm_4031_dump() 000103756 ? 380000030 ?
380032950 ? 000654FFF ?
103756000 ? 103751768 ?
kghnospc()+364 PTR_CALL 0000000000000000 1037519C8 ? 380000030 ?
000000FC0 ? 000000FC0 ?
380000078 ?
FFFFFFFF7FFFB138 ?
kghalo()+4156 CALL kghnospc() 1037519C8 ? 380032950 ?
000000000 ? 004000000 ?
102D430F8 ? 103519280 ?
kglsim_chk_objlist( CALL kghalo() 000000000 ?
)+340 FFFFFFFF7FFFB2A0 ?
1037519C8 ? 000001000 ?
428D58638 ? 000000000 ?
很明显,由于4031造成的错误,但是为什么这么频繁的4031错误产生呢?
通过v$resource_limit,我们发现以下情况:
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
processes 479 490 1000 1000
sessions 486 506 1105 1105
enqueue_locks 165 468 13282 13282
enqueue_resources 179 343 5080 UNLIMITED
ges_procs 478 488 1001 1001
ges_ress 32169 59712 20754 UNLIMITED
ges_locks 29613 55281 32150 UNLIMITED
ges_cache_ress 1929 29346 0 UNLIMITED
ges_reg_msgs 1087 2301 2230 UNLIMITEDRESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
ges_big_msgs 106 1183 2230 UNLIMITED
ges_rsv_msgs 0 0 1000 1000
gcs_resources 161607 189852 211052 211052
gcs_shadows 88914 99594 211052 211052
dml_locks 26 370 4860 UNLIMITED
temporary_table 0 2 UNLIMITED UNLIMITED
_lockstransactions 8 21 1215 UNLIMITED
branches 0 1 1215 UNLIMITED
cmtcallbk 0 1 1215 UNLIMITEDRESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
sort_segment_lo 27 41 UNLIMITED UNLIMITED
cksmax_rollback_se 14 15 244 244
gmentsmax_shared_serv 0 0 20 20
ersparallel_max_se 0 5 6 6
rversRESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
--------------- ------------------- --------------- -------------------- --------------------
22 rows selected.我们可以发现ges_ress和ges_locks的当前分配数量已经超出了初始分配数据,最大分配数甚至超出了几倍,
我们知道,当ges_ress和ges_locks超出初始分配的数量时,就会从shared_pool_size里面强行申请内存片。
超出的越多,当然就占用更多的内存区域。而这时候,数据库的压力又非常的大,从而不断的产生4031错误。
这时候我们能做的就是将这两个指标的数量控制在一定范围之内。或者将他的初始分配值扩大,以及限制它的最高值。
可以使用两个隐含参数来控制:_lm_locks和_lm_ress。如_lm_locks=(200000,200000) 。这么设定的意思是:初始值、最大值。
也就是说初始分配200000,最大也只能使用200000, 但是设置这个需要注意一点的是,会增加oracle使用内存的数量。
假如您使用了10G的sga.那么设定1000000的话大概就是多出1G.加上SGA的话。就是11G的内存了。因此在遇到这种情
况时一定要注意主机的内存情况,因为修改完该参数后,重启实例时就会预分配内存。这个可以通过ipcs看出来。 -
RAC平台无法创建任何对象解决一例
2006-12-30 00:00:00
Solaris5.9
集群件:veritas cluster server
文件系统:VXFS
oracle9207 on rac
接到电话,说在任何一个实例上无法创建任何数据库对象。之前曾经发生过此类故障,客户为重启实例后解决。结果这几天该故障重现,
查看到v$lock存在锁定情况,通过查询dba_waiters后发现sid为12的会话持有obj$表的锁。发现在v$sql中的sql语句为:
select o.owner#,o.obj#,decode(o.linkname,null, decode(u.name,null,'SYS',u.name),o.remoteowner),
o.name,o.linkname,o.namespace,o.subname
from user$ u, obj$ o where u.user#(+)=o.owner# and o.type#=1
and not exists (select p_obj# from dependency$ where p_obj# = o.obj#)
order by o.obj# for update;
再通过与v$process关联取得spid后,发现此会话为smon进程。而该进程直接持有了obj$的锁。也就是说,两个实例SMON互锁,导致无法创建任何数据库对象。但是为什么会造成互锁呢?我们知道smon的作用是在实例启时负责进行恢复工作。此外,还负责清除系统中不再使用的临时段, 以及为数据字典管理的表空间合并相邻的可用数据扩展,而在RAC中,一个实例的 SMON 进程能够为出错的 CPU 或 实例进行实例恢复。这我们的情况中,RAC并没有出现实例crash. 当然不存在恢复。那么还有可能的情况就是清除临时段了,通过查询,我们发现运营商的DBA发生了一个严重的错误,那就是将temp表空间使用datafile来创建,而且并且是dict管理模式。这样导致了两个实例的smon相互对此临时段进行清除以及合并,从而导致了互锁,才导致后面出现的无法创建任何数据库对象。解决方法,重建temp表空间,使用tempfile,以及创建为local方式的temp tablespace.然后重启节点。另外一种方法就是使用
Event来禁止某个实例smon
Event=’ 10052 trace name context forever’;
目前我所采用的是第一种策略了.从半个月时间看来,数据库运行正常.
-
10G中的ORA-3136
2006-12-09 00:00:00
经常看到alert日志中报出错误为:
WARNING: inbound connection timed out (ORA-3136)
后来找到metalink的方法,帖出来如下:
1.set INBOUND_CONNECT_TIMEOUT_ =0 in listener.ora
2. set SQLNET.INBOUND_CONNECT_TIMEOUT = 0 in sqlnet.ora of server.
3. stop and start both listener and database.
4. Now try to connect to DB and observe the behaviour
标题搜索
数据统计
- 访问量: 4637
- 日志数: 649
- 建立时间: 2007-12-21
- 更新时间: 2008-09-06

