发布新日志

  • 使用dbms_backup_restore.zerodbid(0)后的问题

    2008-09-06 02:34:46

    这是我自己的TESTDB
    之前使用dbms_backup_restore.zerodbid(0)之后没有重启DB
    因此造成在写system时实例down.

    alert日志中出现如下信息:
      Fri Sep  5 14:20:07 2008
    Errors in file /oracle/admin/orcl/bdump/orcl_ckpt_4414.trc:
    ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
    ORA-01122: database file 1 failed verification check
    ORA-01110: data file 1: '/oradata/system01.dbf'
    ORA-01206: file is not part of this database - wrong database id
    Fri Sep  5 14:20:07 2008
    CKPT: terminating instance due to error 1242
    Instance terminated by CKPT, pid = 4414

    处理方法如下:
    [oracle@asm2 ~]$ sqlplus /nolog

    SQL*Plus: Release 10.2.0.1.0 - Production on Fri Sep 5 14:27:58 2008

    Copyright (c) 1982, 2005, Oracle.  All rights reserved.

    SQL> conn / as sysdba
    Connected to an idle instance.
    SQL> startup
    ORACLE instance started.

    Total System Global Area  264241152 bytes
    Fixed Size                  1218892 bytes
    Variable Size              67110580 bytes
    Database Buffers          192937984 bytes
    Redo Buffers                2973696 bytes
    Database mounted.
    ORA-01221: data file 1 is not the same file to a background process


    SQL> shutdown abort;
    ORACLE instance shut down.
    SQL> startup mount;
    ORACLE instance started.

    Total System Global Area  264241152 bytes
    Fixed Size                  1218892 bytes
    Variable Size              67110580 bytes
    Database Buffers          192937984 bytes
    Redo Buffers                2973696 bytes
    Database mounted.
    SQL> recover database;
    Media recovery complete.
    SQL> alter database open;
    alter database open
    *
    ERROR at line 1:
    ORA-01221: data file 1 is not the same file to a background process


    SQL> alter database open resetlogs;
    alter database open resetlogs
    *
    ERROR at line 1:
    ORA-01139: RESETLOGS option only valid after an incomplete database recovery


    SQL> recover database using backup controlfile ;
    ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
    ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
    ORA-00280: change 1445010 for thread 1 is in sequence #816


    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    auto
    ORA-00308: cannot open archived log '/oradata/arch/1_816_663837713.dbf'
    ORA-27037: unable to obtain file status
    Linux Error: 2: No such file or directory
    Additional information: 3


    ORA-00308: cannot open archived log '/oradata/arch/1_816_663837713.dbf'
    ORA-27037: unable to obtain file status
    Linux Error: 2: No such file or directory
    Additional information: 3


    SQL> alter database until cancel;
    alter database until cancel
                         *
    ERROR at line 1:
    ORA-02231: missing or invalid option to ALTER DATABASE


    SQL> recover database until cancel;
    ORA-00283: recovery session canceled due to errors
    ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


    SQL> recover database using backup controlfile ;
    ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
    ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
    ORA-00280: change 1445010 for thread 1 is in sequence #816


    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    cancel;
    ORA-00308: cannot open archived log 'cancel;'
    ORA-27037: unable to obtain file status
    Linux Error: 2: No such file or directory
    Additional information: 3


    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    cancel
    Media recovery cancelled.
    SQL> alter database open;
    alter database open
    *
    ERROR at line 1:
    ORA-01589: must use RESETLOGS or NORESETLOGS option for database open


    SQL> alter database resetlogs;
    alter database resetlogs
                           *
    ERROR at line 1:
    ORA-02231: missing or invalid option to ALTER DATABASE


    SQL> alter database open resetlogs;
    alter database open resetlogs
    *
    ERROR at line 1:
    ORA-01113: file 1 needs media recovery
    ORA-01110: data file 1: '/oradata/system01.dbf'


    SQL> recover datafile 1;
    ORA-00283: recovery session canceled due to errors
    ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


    SQL> recover database using backup controlfile until cancel;
    ORA-00279: change 1445010 generated at 09/05/2008 14:18:57 needed for thread 1
    ORA-00289: suggestion : /oradata/arch/1_816_663837713.dbf
    ORA-00280: change 1445010 for thread 1 is in sequence #816


    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
    cancel
    Media recovery cancelled.
    SQL> alter database open resetlogs;

    Database altered.

     

  • rman备份失败的处理

    2008-08-17 01:39:19

    rman备份失败的处理

     

    晚上刚跑步回来,就接到用户电话说备份失败,数据库已经有两天没有备份成功。错误信息如下:

    Starting backup at 16-AUG-08

    released channel: t1

    RMAN-00571: ===========================================================

    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============

    RMAN-00571: ===========================================================

    RMAN-03002: failure of backup command at 08/16/2008 22:12:04

    RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog


    配好VPN马上拨上去,检查incarnation,发现两边一致。

    crosscheck等命令仍然是报这个错误。

    reset database和resync database等所有操作全部都报错。

    作出最后决定,重新注册catalog信息。

    rman target / catalog rman/XXX@rman;

    偿试
    unregister database ;


    居然发现DB不认这个命令。

    晕倒,

    赶快登陆至rman数据库。

    查到dbid和db_key

    使用这以下这个包没有问题。
    dbms_rcvcat.unregisterdatabase(db_key,dbid);


    rman target / catalog rman/XXX@rman;

    register database;

    重新注册后数据库备份成功。

    处理好之后,才想起来原来之前有作过全库的恢复,并且在直接在rman中使用了resetlogs

    打开了DB。这样就会造成target和catalog的不一致。如果在sqlplus中resetlogs是不会

    出现此类问题。


     

  • WARNING:Oracle instance running on a system with low open file descriptor

    2008-08-03 18:56:32


    WARNING:Oracle instance running on a system with low open file descriptor
            limit. Tune your system to increase this limit to avoid


    客户的一个生产库在把SGA调至3.4GB并升级至9208时,发现alert中出现以下错误
    WARNING:Oracle instance running on a system with low open file descriptor
            limit. Tune your system to increase this limit to avoid


    此错误主要由于limits方面的限制,检查os,发现limits.conf未设置以下参数;
    vi /etc/security/limits.conf

    oracle soft nproc 2047
    oracle hard nproc 16384
    oracle soft nofile 1024
    oracle hard nofile 65536

    加上后问题解决。

    因此在安装oracle的时候,一定要仔细的弄清楚每个参数的意义。

    这样在troubleshooting的时候才能比较快速进行定位。

     

  • 设置大sort_area_size时发生的ora-00600[925]错误

    2008-08-03 18:38:45

    设置大sort_area_size时发生的ora-00600[925]错误

    今天rebuild一个超大的index,在设置手工sort_area_size到达2GB时
    触发ora-00600[925],修改不成功。

    SQL> alter session set sort_area_size=2097152000;
    ERROR:
    ORA-02097: parameter cannot be modified because specified value is invalid
    ORA-00600: internal error code, arguments: [925], [hash_area_size], [-100663296], [], [], [], [], []


    查询hash_area_size

    SQL> show parameter hash_area_size

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    hash_area_size                       integer     1048576

    发现为1M,

    而sort_area_size的限制是<2*31方,并且不能比hash_area_size大这么多,

    考虑到这个因素,我们把hash_area_size改大


    SQL> alter session set hash_area_size=10485760;

    Session altered.

    SQL>  alter session set sort_area_size=2097152000;

    Session altered.

    SQL> show parameter sort

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    nls_sort                             string
    sort_area_retained_size              integer     0
    sort_area_size                       integer     2097152000


    可以看到设置成功。

    其实这个也算是oracle的一个bug了。

    见Bug 2205276 and 4011860

  • 升级过程执行catpatch时报04021错误的处理办法

    2008-08-03 11:55:12

    升级过程执行catpatch时报04021错误的处理办法


    有时候在升级9.2X的时候会发生类似于以下的错误:
    create type XMLTypePI OID '0000000000000000000000000002014F' as
    *
    ERROR at line 1:
    ORA-04021: timeout occurred while waiting to lock object SYS.XMLTYPEPI

     

    Type created.

    grant execute on XMLTypePI to public with grant option
                     *
    ERROR at line 1:
    ORA-04021: timeout occurred while waiting to lock object SYS.XMLTYPEPI

     

    发生这个错误的同时,脚本是在这儿等待很长的时间,v$session_wait中会显示

    library cache pin这一等待。发生这个错误的主要原因在于在更新xmldb的时候,

    oracle需要一个版本一个版本往上提,如9204->9205->9206,如中间缺少,那么就

    出现这种报出错误的情况,解决这个错误的方法就是手工执行xdbpatch脚本。

    如:

    startup migrate

    spool xmlpatch.log
    @?/rdbms/admin/xdbpatch
    spool off;

    然后可以重新跑一遍catpatch执行一次字典的更新,
    可以发现该错误被解决。

    当然,这个实际上也是一个bug,oracle说在9207中已经fix掉,但实际上在9208上面还会
    有这个错误发生。

     

  • ora-00600[2103]错误分析

    2008-04-19 01:18:53

    ora-00600[2103]错误分析


    环境是两台HP 8400, hpux11.11i ,oracle9208 做data guard.

    MAXIMUM AVAILABILITY模式下. 今天中午主库出现出现hang住的情况.

    查看主库alert日志,
    Fri Apr 18 11:00:44 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
    ORA-16099: internal error ORA-00600 occurred at standby database
    Fri Apr 18 11:18:49 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
    ORA-16099: internal error ORA-00600 occurred at standby database
    Fri Apr 18 11:39:53 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
    ORA-16099: internal error ORA-00600 occurred at standby database
    Fri Apr 18 12:00:58 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
    ORA-16099: internal error ORA-00600 occurred at standby database
    Fri Apr 18 12:22:03 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/bdump/c3cdb615_arc1_29593.trc:
    ORA-16099: internal error ORA-00600 occurred at standby database
    Fri Apr 18 12:40:09 2008

    备库alert日志:
    Fri Apr 18 11:40:05 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25083.trc:
    ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
    Fri Apr 18 11:40:05 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25083.trc:
    ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
    Fri Apr 18 11:46:05 2008
    RFS: Possible network disconnect with primary database
    Fri Apr 18 12:01:08 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
    ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
    Fri Apr 18 12:01:09 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
    ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
    Fri Apr 18 12:01:09 2008
    Errors in file /oracle/app/oracle/admin/c3cdb615/udump/c3cdb615_rfs_25339.trc:
    ORA-00600: internal error code, arguments: [2103], [0], [0], [1], [900], [], [], []
    Fri Apr 18 12:07:10 2008
    RFS: Possible network disconnect with primary database


    一眼就看出现主库的rfs这里出了问题,那么我们查看备库IO情况,iostat发现, 每秒为324K,这
    个速度明显是极为不正常的.怎么说也是va呀. 使用ioscan,发现已经无法找到盘.这里已经确信为
    IO子系统出现了故障.联系硬件工程师,更换存储系统.=

    这里我们顺便来看一下,关于ora-00600[2103]的一些处理办法:

    其中有一种是会造成instance crash的,如下:
    ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
       LGWR: Detected ARCH process failure
       LGWR: Detected ARCH process failure
       LGWR: STARTING ARCH PROCESSES

    这里是由于bug6520821(unpublished)所引起的错误,主要是在做alter database backup controlfile ...时
    所引起的bug, 针对这个bug,oracle目前没有提供opatch,但是10202的patchset修复了这个bug.
    因此在9208上遇到这个bug,可以不做alter database backup ....这样的操作,因此要注意你
    的备份策略的设置.


    第二种情况是:

    oracle进程会由于获取CF(controlfile enqueue)时间太长而导致超时造成的.
    主要原因有:
    由于IO子系统速率过于低下,当然通常这种情况都是由于存储故障所导致,如控制器链路故障
    以及接交换机链路出现问题都会导致IO出现问题,这时候
    os故障.
    AIO多个写进程导致 ,这里可以考虑设置max_io_server和min_io_server至一个合适的值.
    日志切换过于频繁,或者日志文件过小以及日志组太少.
    使用隐含参数来控制超时时间,如:
    _CONTROLFILE_ENQUEUE_TIMEOUT = 10800

    当然,可能会有一些情况这里没有一一列出.这里只是作为2103错误的一个参考和解决的思路.

  • ora-00600[4400]错误的解决

    2008-02-06 03:16:33

    ora-00600[4400]错误的解决

    4400错误通常在分布式事务中出现,如到这个错误不会对数据库产生重大的影响,

    以下是一个生产库的实际解决的例子。

    故障描述:
    Errors in file .../udump/ora_15899_$ORACLESID.trc:
    ORA-00600: 内部错误代码,自变量: [4400], [48], [], [], [], [], [], []
    Tue Feb 5 17:07:18 2008
    Errors in file .../bdump/reco_1297_$ORACLESIDl.trc:
    ORA-02062: distributed recovery received DBID e23ac21, expected 232acd6
    ARC1: media recovery disabled

     

    原因及解决方法:
    由于db_link断开,两阶段事务不能正常处理,查询dba_2pc_pending
    LOCAL_TRAN_ID          GLOBAL_TRAN_ID                 STATE            MIX A TRAN_COMMENT         FAIL_TIME FORCE_TIM RETRY_TIM OS_USER
    ---------------------- ------------------------------ ---------------- --- - -------------------- --------- --------- --------- ----------------------------------------------------------------
    OS_TERMINAL
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    HOST                                                                                                                             DB_USERCOMMIT#
    -------------------------------------------------------------------------------------------------------------------------------- ------------------------------ ----------------
    192.168.1.1              jpb.6c5f8f5f.192.168.1.1     collecting       no                         05-FEB-08           05-FEB-08 Administrator
    DGDX-WR
    WORKGROUP\DGDX-WR                                                                                                                IBSS7740378455873

    192.168.1.2             jpb.6c5f8f5f.192.168.1.2    collecting       no                         05-FEB-08           05-FEB-08 Administrator
    DGDX-WR
    WORKGROUP\DGDX-WR                                                                                                                IBSS7740389820541
    192.168.1.112             jpb.6c5f8f5f.192.168.1.112    collecting       no                         05-FEB-08           05-FEB-08 Administrator
    DGDX-WR
    WORKGROUP\DGDX-WR                                                                                                                IBSS7740970651708


    通过以下方法来清除这些事务:

    SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.2');
    PL/SQL procedure successfully completed.
    SQL> commit;
    Commit complete.
    SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.1');
    PL/SQL procedure successfully completed.
    SQL> commit;
    Commit complete.
    SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('192.168.1.112');
    PL/SQL procedure successfully completed.
    SQL> commit;


     

  • AIX 5.3上升级oracle9206的注意事项

    2007-11-06 00:00:00


    AIX 5.3上升级oracle9206的注意事项

    给AIX5.3版本上的oracle升级至9206时,有以下几点需要注意,特此纪录:

    1、在root用户下,执行以下命令清除内存中没有使用的库文件。
    # /usr/sbin/slibclean

    2、在执行./runInstaller过程中会报以下错误:

    Starting Oracle Universal Installer…
    Checking installer requirements…
    Checking operating system version: must be 5200 or 5100

    Starting Oracle Universal Installer Failed <<<<

    方法1:
    使用ignoreSYSPrereqs参数可以跳过这个检查过程
    ./runInstaller -ignoreSYSPrereqs

    方法2:
    也可修改升级程序包中的oraparam.ini文件,
    vi Disk1/install/oraparam.ini
    [Certified Versions]
    。。。
    AIX=5200,5100,5300 在这里加上5300

    3、在升级完以后,有一个地方是特点需要注意的:
    $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
    这个文件里面的
    aru_id这个值需要从610改为212

    这样修改以后才能打opatch,否则在opatch时会出错而导致无法打上opatch

  • 记录一下10G中EM乱码的解决

    2007-09-26 00:00:00

    d

    记录一下10G中EM乱码的解决

    这个方法是google到的。
    [oracle@rac2 ~]$ cd $ORACLE_HOME/jdk/jre/lib
    [oracle@rac2 lib]$ ls
    audio font.properties.ja.Redhat7.2 font.properties.zh_CN.Sun jce.jar
    charsets.jar font.properties.ja.Redhat7.3 font.properties.zh_CN_UTF8.Sun jsse.jar
    cmm font.properties.ja.Redhat8.0 font.properties.zh_HK.Sun jvm.hprof.txt
    content-types.properties font.properties.ja.Turbo font.properties.zh_HK_UTF8.Sun jvm.jcov.txt
    ext font.properties.ja.Turbo6.0 font.properties.zh_HK_UTF8.Sun2003 locale
    flavormap.properties font.properties.ko_KR.Sun font.properties.zh.Turbo logging.properties
    font.properties font.properties.ko_KR_UTF8.Sun font.properties.zh_TW.Redhat plugin.jar
    font.properties.ja font.properties.ko_KR_UTF8.Sun2003 font.properties.zh_TW.Redhat2.1 psfontj2d.properties
    font.properties.ja_JP.Sun font.properties.ko.Redhat font.properties.zh_TW.Sun psfont.properties.ja
    font.properties.ja_JP.Sun2003 font.properties.ko.Redhat2.1 font.properties.zh_TW_UTF8.Sun rt.jar
    font.properties.ja_JP_UTF8.Sun font.properties.Redhat6.1 font.properties.zh_TW_UTF8.Sun2003 security
    font.properties.ja_JP_UTF8.Sun2003 font.properties.Redhat8.0 fonts sunrsasign.jar
    font.properties.ja.Redhat3 font.properties.SuSE8.0 i386 zi
    font.properties.ja.Redhat6.1 font.properties.zh_CN.Redhat im
    font.properties.ja.Redhat6.2 font.properties.zh_CN.Redhat2.1 images
    [oracle@rac2 lib]$ ls font.properties.zh_CN.Redhat
    font.properties.zh_CN.Redhat
    [oracle@rac2 lib]$ cp font.properties.zh_CN.Redhat font.properties
    [oracle@rac2 lib]$ cd $ORACLE_HOME/oc4j/j2ee/oc4j
    [oracle@rac2 oc4j_applications]$ pwd
    /opt/u01/app/oracle/product/10201/db1/oc4j/j2ee/

    oc4j_applications/applications/em/em/cabo/images/cache/zhs
    [oracle@rac2 zhs]$ pwd
    /opt/u01/app/oracle/product/10201/db1/oc4j/j2ee/oc4j_applications/

    applications/em/em/cabo/images/cache/zhs
    [oracle@rac2 zhs]$ ls -lrt
    total 80
    -rw-r----- 1 oracle oinstall 915 Sep 26 21:03 tb-0.imx
    -rw-r----- 1 oracle oinstall 452 Sep 26 21:03 tb-0.gif
    -rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-f8.imx
    -rw-r----- 1 oracle oinstall 715 Sep 26 21:03 b-f8.gif
    -rw-r----- 1 oracle oinstall 502 Sep 26 21:03 b-f81cbf.imx
    -rw-r----- 1 oracle oinstall 829 Sep 26 21:03 b-f81cbf.gif
    -rw-r----- 1 oracle oinstall 496 Sep 26 21:03 b-7c2d.imx
    -rw-r----- 1 oracle oinstall 777 Sep 26 21:03 b-7c2d.gif
    -rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-70.imx
    -rw-r----- 1 oracle oinstall 715 Sep 26 21:03 b-70.gif
    [oracle@rac2 zhs]$ rm *.gif
    [oracle@rac2 zhs]$ ll
    total 40
    -rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-70.imx
    -rw-r----- 1 oracle oinstall 496 Sep 26 21:03 b-7c2d.imx
    -rw-r----- 1 oracle oinstall 502 Sep 26 21:03 b-f81cbf.imx
    -rw-r----- 1 oracle oinstall 490 Sep 26 21:03 b-f8.imx
    -rw-r----- 1 oracle oinstall 915 Sep 26 21:03 tb-0.imx
    [oracle@rac2 zhs]$ emctl stop dbconsole
    TZ set to Asia/Chungking
    Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
    Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
    http://localhost.localdomain:1158/em/console/aboutApplication
    Stopping Oracle Enterprise Manager 10g Database Control ...
    ... Stopped.
    [oracle@rac2 zhs]$ emctl start dbconsole
    TZ set to Asia/Chungking
    Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
    Copyright (c) 1996, 2005 Oracle Corporation. All rights reserved.
    http://localhost.localdomain:1158/em/console/aboutApplication
    Starting Oracle Enterprise Manager 10g Database Control .......... started.
    ------------------------------------------------------------------
    Logs are generated in directory /opt/u01/app/oracle/product/10201/db1/localhost.localdomain_rac1/sysman/log

  • 使用gdb来诊断系统问题的一个例子

    2007-06-10 00:00:00

    d

    使用gdb来诊断系统问题的一个例子

    这是以前处理的一个问题,现将处理过程描述出来,
    主要是因为在imp时报错.
    使用gdb调试、跟踪启动失败过程中的core dump文件,信息如下:

    gdb $ORACLE_HOME/bin/imp core.21277
    GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

    Core was generated by `imp '.
    Program terminated with signal 11, Segmentation fault.
    Reading symbols from /u01/app/oracle/product/8.1.7/lib/libclntsh.so.8.0...done.
    Loaded symbols for /u01/app/oracle/product/8.1.7/lib/libclntsh.so.8.0
    Reading symbols from /u01/app/oracle/product/8.1.7/lib/libwtc8.so...done.
    Loaded symbols for /u01/app/oracle/product/8.1.7/lib/libwtc8.so
    Reading symbols from /lib/libdl.so.2...done.
    Loaded symbols for /lib/libdl.so.2
    Reading symbols from /lib/libm.so.6...done.
    Loaded symbols for /lib/libm.so.6
    Reading symbols from /lib/libpthread.so.0...done.
    Loaded symbols for /lib/libpthread.so.0
    Reading symbols from /lib/libc.so.6...done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /lib/ld-linux.so.2...done.
    Loaded symbols for /lib/ld-linux.so.2
    Reading symbols from /lib/libnss_files.so.2...done.
    Loaded symbols for /lib/libnss_files.so.2
    #0 0x001d612a in memcpy () from /lib/libc.so.6
    (gdb) where
    #0 0x001d612a in memcpy () from /lib/libc.so.6
    #1 0x0805a45f in imprbl ()
    #2 0x0805add3 in imprdi ()
    #3 0x08065f2d in impsel ()
    #4 0x08064bfa in impskr ()
    #5 0x08064a3d in imprpd ()
    #6 0x0805fde7 in imptabskp ()
    #7 0x08057a0e in impdta ()
    #8 0x0804eaa8 in main ()
    (gdb)

    在memcpy函数处,因为收到信号亮signal 11,导致imp终止。


    这里可以判断出来oracle软件造成的错误,经重装oracle软件后正常使用

  • 启动实例hang的一例的诊断

    2007-06-09 00:00:00

    d

    启动实例hang的一例的诊断

    数据库被人无意断开电源。
    1、启动数据库时HANG住:ALERT日志停在以下的行。

    RECO started with pid=7
    CJQ0 started with pid=8
    QMN0 started with pid=9
    Sun May 27 11:22:28 2007
    starting up 1 shared server(s) ...
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    Sun May 27 11:22:28 2007
    ALTER DATABASE MOUNT
    Sun May 27 11:22:35 2007
    Successful mount of redo thread 1, with mount id 2423313207.
    Sun May 27 11:22:35 2007
    Database mounted in Exclusive Mode.
    Completed: ALTER DATABASE MOUNT
    Sun May 27 11:22:35 2007
    ALTER DATABASE OPEN
    Sun May 27 11:22:35 2007
    Beginning crash recovery of 1 threads
    Sun May 27 11:22:35 2007
    Started first pass scan
    Sun May 27 11:22:36 2007
    Completed first pass scan
    0 redo blocks read, 0 data blocks need recovery
    Sun May 27 11:22:36 2007
    Started recovery at
    Thread 1: logseq 78153, block 3, scn 2110.630842439
    Recovery of Online Redo Log: Thread 1 Group 3 Seq 78153 Reading mem 0
    Mem# 0 errs 0: /dev/ora/redolog3
    Sun May 27 11:22:36 2007
    Ended recovery at
    Thread 1: logseq 78153, block 3, scn 2110.630862440
    0 data blocks read, 0 data blocks written, 0 redo blocks read
    Crash recovery completed successfully
    Sun May 27 11:22:36 2007
    Thread 1 advanced to log sequence 78154
    Thread 1 opened at log sequence 78154
    Current log# 2 seq# 78154 mem# 0: /dev/ora/redolog2
    Successful open of redo thread 1.
    Sun May 27 11:22:36 2007
    LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
    Sun May 27 11:22:36 2007
    SMON: enabling cache recovery

    启动到mount模式下,
    SQL> oradebug setmypid
    Statement processed.
    SQL> oradebug unlimit
    Statement processed.
    SQL> oradebug hanganalyze 3
    Hang Analysis in /disk1/ora9i/920/admin/ora9i/udump/ora9i_ora_1202.trc
    SQL>

    trace文件内容如下:

    *** SESSION ID:(12.4) 2007-05-27 10:34:21.904
    *** 2007-05-27 10:34:21.903
    ==============
    HANG ANALYSIS:
    ==============
    Open chains found:
    Other chains found:
    Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/3/1/0xb7a8fc98/957/LGWR wait for redo copy>
    Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/8/1/0xb7a915c0/967/wakeup time manager>
    Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/9/3/0xb7a924d8/989/No Wait>
    Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/12/4/0xb7a92ee8/1202/No Wait>
    Extra information that will be dumped at higher levels:
    [level 5] : 4 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
    [level 10] : 7 node dumps -- [IGN]


    可以看到这里有一个等待:LGWR wait for redo copy,诊断应与undo有关。


    于是设置events 10015,对于该events的解释:Undo Segment Recovery


    event="10015 trace name context forever,level 10"
    5. 在生成的TRACE 文件内容如下:
    KCRA: buffers claimed = 0/0, eliminated = 0
    Acquiring rollback segment SYSTEM
    Recovering rollback segment _SYSSMU5$
    Recovering rollback segment _SYSSMU6$
    Recovering rollback segment _SYSSMU7$
    Recovering rollback segment _SYSSMU8$
    Recovering rollback segment _SYSSMU9$
    Recovering rollback segment _SYSSMU10$

    这里可以发现从_SYSSMU5$在做recover时hang住了。
    那么可以将此undo segment offline,于是我们使用隐含参数来打开数据库,当然这个过程log需要reset
    _ALLOW_RESETLOGS_CORRUPTION = TRUE
    _OFFLINE_ROLLBACK_SEGMENTS=(_SYSSMU5$)

    再打开数据库,OK。

  • 9206RAC上的一个bug

    2007-05-06 00:00:00

    d

    前些日子准备改RAC连接方式时在指定cluster_interconnects发现无效。
    觉得是非常的奇怪,因为之前在9207上也是经常修改这个参数以强制通过私网进行互连。
    由于当前版本为9206。上metalink一查。原来是一个bug.
    Patch Set Exception : 4069045 Base Bug : 4066495

    打上4066495这个patch后,修改生效。

  • truncate不能回收空间的问题

    2007-04-11 00:00:00


    9206 RAC

    客户说truncate之后,user_segments中的大小居然不变.而实际上空间是已经被回收了.也就是说数据字典在truncate之后是没有被更新的.如果作drop table操作.数据字典是更新了.查metalink,发现在9206上确实有这样的一个bug.但是很遗憾,客户的系统是windows.而这个bug的patch并没有windows平台上的.workground是:

    在作完truncate之后,再作一次alter table XXX allocate extent;即可更新数据字典.

  • RMAN-06726

    2007-02-13 00:00:00


    在RAC环境中一个RMAN备份出错的案例.

    环境是四节点的RAC.在备份归档日志时出现以下错误:

    allocated channel: c1
    channel ch00: sid=132 devtype=DISK
    allocated channel: c2
    channel ch01: sid=32 devtype=DISK
    allocated channel: c3
    channel ch02: sid=156 devtype=DISK
    allocated channel: c4
    channel ch02: sid=123 devtype=DISK
    Starting backup at 03-FEB-07
    released channel: c1
    released channel: c2
    released channel: c3
    released channel: c4
    RMAN-00571: ===========================================================
    RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
    RMAN-00571: ===========================================================
    RMAN-03002: failure of backup command at 02/03/2007 01:17:34
    RMAN-06726: could not locate archivelog XXXXXXXXXXXXXXXXX

    查找原因.检查脚本.无任何问题.检查归档路径,无任何问题.

    归档日志全部都存在.检查参数.发现其中一个实例中cluster_database_instances 参数值为3. 因为是新添加了一个节点.因此值并未作修改.将cluster_database_instances 修改为4后备份归档日志恢复正常.

  • rac ora-12545

    2007-02-09 00:00:00


    hpunix

    oracle9205 rac

    由于node1已很长时间没能起来,直到同步vg信息后才完成RAC的重组。

    startup 两node.

    启动应用,发现无法连接。ORA-12545: Connect failed because target host or object does not exist ,但重试又可连上.检查原因后,发现两实例都设了local_listener和remote_listener.将其清除后问题解决!

  • 处理了一个ora-12500

    2007-01-18 00:00:00


    接到电话说客户端无法正常连接到数据库,急忙赶到现场.

    在停掉部分应用后,发现可以正常连接,再启动应用.发现又无法连接.当连接数达到283后就无法登录了.报ora-12500.

    由于客户方的系统为windows2000 2GB内存.

    查看sga发现居然有1.4G.加上pga的大小1.5 G

    这种情况下,如果连接数高了,资源吃紧,oracle无可用的内存.从而导致连接受限.将sga调整到900M后,问题解决.

    我觉得的奇怪的是,为什么XX的数据库(虽然是这个库是小库).也不应该用这么低的配置吧. ......

  • 给用VCS的一个提醒

    2007-01-08 00:00:00


    给oracle rac升级,一定别忘了给oracle lib作同步了:
    $ cp /opt/VRTSvcs/ops/lib/libskgxp92_64.so $ORACLE_HOME/lib/libskgxp9.so
    $ cp /opt/ORCLcluster/lib/9iR2/libskgxn2_64.so $ORACLE_HOME/lib/libskgxn9.so


    否则有可能运行了大半年甚至一年之后问题爆发出来你找不到北了。

  • RAC下trace暴涨诊断

    2007-01-07 00:00:00


    DB版本oracle 9207 rac
    OS版本solaris 9
    集群件Veritas cluster server 4.1


    故障:

    平均三秒钟产生一个trace文件。Trace文件不断增加,导致磁盘空间迅速减小

    而alter中无任何错误信息,只有一行:

    Thu Jan 4 11:34:53 2007
    Errors in file /oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trc

    trace文件:
    /oracle_bin/rac9i/admin/XXrac/udump/XXrac2_ora_1942.trc
    Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
    JServer Release 9.2.0.7.0 - Production
    ORACLE_HOME = /oracle_bin/rac9i/product
    System name: SunOS
    Node name: XXXXX
    Release: 5.9
    Version: Generic_118558-13
    Machine: sun4u
    Instance name: yjrac2
    Redo thread mounted by this instance: 2
    Oracle process number: 13
    Unix process pid: 1942, image: oracle@XXXXX (TNS V1-V3)

    *** SESSION ID:(85.7779) 2006-04-23 01:13:12.231
    =================================
    Begin 4031 Diagnostic Information
    =================================
    The following information assists Oracle in diagnosing
    causes of ORA-4031 errors. This trace may be disabled
    by setting the init.ora parameter _4031_dump_bitvec = 0
    ======================================
    Allocation Request Summary Information
    ======================================
    Current information setting: 00654fff
    Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds
    Last Dump Time=04/14/2030 14:07:45
    Allocation request for: kglsim object batch
    Heap: 380032950, size: 4032
    ******************************************************
    HEAP DUMP heap name="sga heap(2,0)" desc=380032950
    extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0
    parent=0 owner=0 nex=0 xsz=0x1
    ====================
    Process State Object
    ====================
    ----------------------------------------
    SO: 41f4ee448, type: 2, owner: 0, flag: INIT/-/-/0x00
    (process) Oracle pid=13, calls cur/top: 41f6d3320/41f6d3320, flag: (0) -
    int error: 0, call error: 0, sess error: 0, txn error 0
    (post info) last post received: 0 0 0
    last post received-location: No post
    last process to post me: none
    last post sent: 0 0 0
    last post sent-location: No post
    last process posted by me: none
    (latch info) wait_event=0 bits=20
    holding 428c53f60 Child library cache level=5 child#=5
    Location from where latch is held: kglobpn: child:: latch
    Context saved from call: 6
    state=busy
    Process Group: DEFAULT, pseudo proc: 41f5cbab0
    O/S info: user: orarac, term: UNKNOWN, ospid: 1942
    OSD pid info: Unix process pid: 1942, image: oracle@XXXXX(TNS V1-V3)
    =========================
    User Session State Object
    =========================
    ----------------------------------------
    SO: 4204f6d48, type: 4, owner: 41f4ee448, flag: INIT/-/-/0x00
    (session) trans: 0, creator: 41f4ee448, flag: (41) USR/- BSY/-/-/-/-/-
    DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
    txn branch: 0
    oct: 0, prv: 0, sql: 438fa8348, psql: 0, user: 0/SYS
    O/S info: user: , term: , ospid: , machine:
    program:
    temporary object counter: 0
    ...No current library cache object being loaded
    ...No instantiation object
    ----- Call Stack Trace -----
    calling call entry argument values in hex
    location type point (? means dubious value)
    -------------------- -------- -------------------- ----------------------------
    ksm_4031_dump()+186 CALL ksedst() 00000000B ? 000000000 ?
    8 000000000 ? 103327258 ?
    00000003E ?
    FFFFFFFF7FFF7938 ?
    ksmasg()+352 CALL ksm_4031_dump() 000103756 ? 380000030 ?
    380032950 ? 000654FFF ?
    103756000 ? 103751768 ?
    kghnospc()+364 PTR_CALL 0000000000000000 1037519C8 ? 380000030 ?
    000000FC0 ? 000000FC0 ?
    380000078 ?
    FFFFFFFF7FFFB138 ?
    kghalo()+4156 CALL kghnospc() 1037519C8 ? 380032950 ?
    000000000 ? 004000000 ?
    102D430F8 ? 103519280 ?
    kglsim_chk_objlist( CALL kghalo() 000000000 ?
    )+340 FFFFFFFF7FFFB2A0 ?
    1037519C8 ? 000001000 ?
    428D58638 ? 000000000 ?


    很明显,由于4031造成的错误,但是为什么这么频繁的4031错误产生呢?
    通过v$resource_limit,我们发现以下情况:
    RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
    --------------- ------------------- --------------- -------------------- --------------------
    processes 479 490 1000 1000
    sessions 486 506 1105 1105
    enqueue_locks 165 468 13282 13282
    enqueue_resources 179 343 5080 UNLIMITED
    ges_procs 478 488 1001 1001
    ges_ress 32169 59712 20754 UNLIMITED
    ges_locks 29613 55281 32150 UNLIMITED
    ges_cache_ress 1929 29346 0 UNLIMITED
    ges_reg_msgs 1087 2301 2230 UNLIMITED

    RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
    --------------- ------------------- --------------- -------------------- --------------------
    ges_big_msgs 106 1183 2230 UNLIMITED
    ges_rsv_msgs 0 0 1000 1000
    gcs_resources 161607 189852 211052 211052
    gcs_shadows 88914 99594 211052 211052
    dml_locks 26 370 4860 UNLIMITED
    temporary_table 0 2 UNLIMITED UNLIMITED
    _locks

    transactions 8 21 1215 UNLIMITED
    branches 0 1 1215 UNLIMITED
    cmtcallbk 0 1 1215 UNLIMITED

    RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
    --------------- ------------------- --------------- -------------------- --------------------
    sort_segment_lo 27 41 UNLIMITED UNLIMITED
    cks

    max_rollback_se 14 15 244 244
    gments

    max_shared_serv 0 0 20 20
    ers

    parallel_max_se 0 5 6 6
    rvers

    RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
    --------------- ------------------- --------------- -------------------- --------------------


    22 rows selected.

    我们可以发现ges_ress和ges_locks的当前分配数量已经超出了初始分配数据,最大分配数甚至超出了几倍,
    我们知道,当ges_ress和ges_locks超出初始分配的数量时,就会从shared_pool_size里面强行申请内存片。
    超出的越多,当然就占用更多的内存区域。而这时候,数据库的压力又非常的大,从而不断的产生4031错误。


    这时候我们能做的就是将这两个指标的数量控制在一定范围之内。或者将他的初始分配值扩大,以及限制它的最高值。
    可以使用两个隐含参数来控制:_lm_locks和_lm_ress。如_lm_locks=(200000,200000) 。这么设定的意思是:初始值、最大值。
    也就是说初始分配200000,最大也只能使用200000, 但是设置这个需要注意一点的是,会增加oracle使用内存的数量。
    假如您使用了10G的sga.那么设定1000000的话大概就是多出1G.加上SGA的话。就是11G的内存了。因此在遇到这种情
    况时一定要注意主机的内存情况,因为修改完该参数后,重启实例时就会预分配内存。这个可以通过ipcs看出来。

  • RAC平台无法创建任何对象解决一例

    2006-12-30 00:00:00


    Solaris5.9

    集群件:veritas cluster server

    文件系统:VXFS

    oracle9207 on rac

    接到电话,说在任何一个实例上无法创建任何数据库对象。之前曾经发生过此类故障,客户为重启实例后解决。结果这几天该故障重现,

    查看到v$lock存在锁定情况,通过查询dba_waiters后发现sid12的会话持有obj$表的锁。发现在v$sql中的sql语句为:

    select o.owner#,o.obj#,decode(o.linkname,null, decode(u.name,null,'SYS',u.name),o.remoteowner),

    o.name,o.linkname,o.namespace,o.subname

    from user$ u, obj$ o where u.user#(+)=o.owner# and o.type#=1

    and not exists (select p_obj# from dependency$ where p_obj# = o.obj#)

    order by o.obj# for update;

    再通过与v$process关联取得spid后,发现此会话为smon进程。而该进程直接持有了obj$的锁。也就是说,两个实例SMON互锁,导致无法创建任何数据库对象。但是为什么会造成互锁呢?我们知道smon的作用是在实例启时负责进行恢复工作。此外,还负责清除系统中不再使用的临时段, 以及为数据字典管理的表空间合并相邻的可用数据扩展,而在RAC中,一个实例的 SMON 进程能够为出错的 CPU 实例进行实例恢复。这我们的情况中,RAC并没有出现实例crash. 当然不存在恢复。那么还有可能的情况就是清除临时段了,通过查询,我们发现运营商的DBA发生了一个严重的错误,那就是将temp表空间使用datafile来创建,而且并且是dict管理模式。这样导致了两个实例的smon相互对此临时段进行清除以及合并,从而导致了互锁,才导致后面出现的无法创建任何数据库对象。解决方法,重建temp表空间,使用tempfile,以及创建为local方式的temp tablespace.然后重启节点。另外一种方法就是使用

    Event来禁止某个实例smon

    Event=’ 10052 trace name context forever;

    目前我所采用的是第一种策略了.从半个月时间看来,数据库运行正常.

  • 10G中的ORA-3136

    2006-12-09 00:00:00


    经常看到alert日志中报出错误为:

    WARNING: inbound connection timed out (ORA-3136)

    后来找到metalink的方法,帖出来如下:

    1.set INBOUND_CONNECT_TIMEOUT_ =0 in listener.ora
    2. set SQLNET.INBOUND_CONNECT_TIMEOUT = 0 in sqlnet.ora of server.
    3. stop and start both listener and database.
    4. Now try to connect to DB and observe the behaviour

321/212>
Open Toolbar