死锁的概念
什么是死锁呢? 其实我们生活中也有很多类似死锁的例子。 我先举一个生活中的例子:过年回家,父亲买了一把水弹枪,儿子和侄子争抢着要先玩,谁也不让谁,拆开包装后,一个抢了枪, 一个逮住了子弹和弹夹。两个都争着要先玩,但是都互不相让。结果两个人都玩不了。如果儿子要先玩,就必须让侄子把子弹和弹夹给他,如果侄子要先玩,就必须让儿子把枪给侄子。他们就这样对峙了十几分钟,互不相让。 我出来调停,让儿子把枪先给侄子玩,每个人玩十分钟。然后两个人开开心心一起玩起来。其实这就是一个活生生的死锁(Dead Lock)的例子。
我们再来看看数据库死锁的概念, 所谓死锁,是指两个会话,每个会话都持有另外一个会话想要的资源,因争夺资源而造成的一种互相等待的现象,此时就会出现死锁,若无外力作用,它们都将无法推进下去。此时称系统处于死锁状态或系统产生了死锁,这些永远在互相等待的进程称为死锁进程。Oracle对于“死锁”采取的策略是回滚其中一个事务,让另外一个事务顺利进行。
英文关于deadlock的概念如下:
A deadlock occurs when a session (A) wants a resource held by another session (B) , but that session also wants a resource held by the first session (A). There can be more than 2 sessions involved but the idea is the same.
死锁的模拟
上面了解了死锁的概念,接下来,我们先人工构造一个简单的死锁(Dead Lock)案例来加深理解一下死锁(Dead Lock),如下所示,我们先准备测试案例使用的表和数据,测试环境为Oracle Database 10g Release 10.2.0.5.0
SQL> create table dead_lock_test( id number(10), name varchar2(32));
Table created.
SQL> insert into dead_lock_test values(101, 'kerry');
1 row created.
SQL> insert into dead_lock_test values(102, 'ken');
1 row created.
SQL> commit;
Commit complete.
SQL>
在会话1(SID为788)中执行下面SQL语句:
SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
---------- ---------- ----------
788 0 1
SQL> update dead_lock_test set name='kerry1_101' where id=101;
1 row updated.
SQL>
然后在会话2(SID为770)中执行下面SQL语句:
SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
---------- ---------- ----------
770 0 1
SQL> update dead_lock_test set name='kerry2_102' where id=102;
已更新 1 行。
SQL> update dead_lock_test set name='kerry2_101' where id=101;
如上所示,会话2(SID为770)更新id=101这条记录时,会话被阻塞了。然后我们在会话1(SID为788)中执行下面SQL语句:
SQL> update dead_lock_test set name='kerry1_102' where id=102;
此时你会立马看到会话2(SID为770)出现ORA-00060错误,如下所示:
如果对上面的操作过程的流程有点不直观,那么可以参下面表格:
当然,如果你以下面这样的顺序更新,那么会话1就会出现ORA-0060的错误,会话1会被当做牺牲的会话进行回滚。
此时在告警日志中就会出现trc文件。注意RAC环境和单机环境稍有不同。在RAC环境中,是由LMD(Lock Manager Daemon)进程统一管理各个节点之间的锁资源的,所以,RAC环境中trace文件是由LMD进程来生成的。
Tue Mar 28 15:36:30 CST 2017
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/admin/SCM2/bdump/scm2_s000_15815.trc
trace文件的部分内容如下所示:
*** 2017-03-28 15:36:30.917
*** ACTION NAME:() 2017-03-28 15:36:30.917
*** MODULE NAME:(SQL*Plus) 2017-03-28 15:36:30.917
*** SERVICE NAME:(SCM2) 2017-03-28 15:36:30.917
*** SESSION ID:(770.8) 2017-03-28 15:36:30.917
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0006002e-001e409f 15 770 X 16 788 X
TX-0007002c-001f6346 16 788 X 15 770 X
session 770: DID 0001-0010-00000002 session 788: DID 0001-000F-00000001
session 788: DID 0001-000F-00000001 session 770: DID 0001-0010-00000002
Rows waited on:
Session 788: obj - rowid = 00094900 - AACUkAABEAACLUeAAB
(dictionary objn - 608512, file - 68, block - 570654, slot - 1)
Session 770: obj - rowid = 00094900 - AACUkAABEAACLUeAAA
(dictionary objn - 608512, file - 68, block - 570654, slot - 0)
Information on the OTHER waiting sessions:
Session 788:
sid: 788 ser: 9 audsid: 201878652 user: 132/TEST
flags: (0xe1) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x8)
pid: 16 O/S info: user: oracle, term: UNKNOWN, ospid: 15817
image: oracle@getlnx14uat.xxxx.com (S001)
O/S info: user: oracle, term: pts/2, ospid: 23047, machine: DB-Server.localdomain
program: sqlplus@DB-Server.localdomain (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update dead_lock_test set name='kerry1_102' where id=102
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update dead_lock_test set name='kerry2_101' where id=101
===================================================
死锁的检测
关于死锁的检测,对于单实例来说,基本上秒级完成,对于RAC环境,Oracle 10g基本上是1分钟, Oracle 11g是10秒,这个是通过隐含参数_lm_dd_interval控制的。这个参数可以修改,但是不建议修改。
COL NAME FOR A32;
COL KSPPDESC FOR A32;
COL KSPPSTVL FOR A32;
SELECT A.INDX,
A.KSPPINM NAME,
A.KSPPDESC,
B.KSPPSTVL
FROM X$KSPPI A,
X$KSPPCV B
WHERE A.INDX = B.INDX
AND LOWER(A.KSPPINM) LIKE LOWER('%&PARAMETER%');
死锁的分析(DeadLock Troubleshooting)
以上面的例子来说,数据库一旦出现死锁,立马会在告警日志里面生成这样一条记录“ORA-00060: Deadlock detected. More info in file xxxxx”,那么从trc文件能分析出什么信息呢? 下面我们以上面的例子来简单分析一下
其实trc文件里面最重要、最有用的信息是Deadlock graph。从这部分,我们可以分析得到下面一些有用信息:
1: 产生死锁的两个会话信息
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-0006002e-001e409f 15 770 X 16 788 X
TX-0007002c-001f6346 16 788 X 15 770 X
session 770: DID 0001-0010-00000002 session 788: DID 0001-000F-00000001
session 788: DID 0001-000F-00000001 session 770: DID 0001-0010-00000002
从上面可以看到Blocker(s)与Waiter(s)的相关信息
Resource Name : 被持有或等待的锁资源名字
锁资源名字由三部分组成 Type-ID1-ID2,ID1和ID2代表的意思由锁类型决定。
具体可以参考v$lock_type
process : V$PROCESS.PID
session : V$SESSION.SID
holds : 锁持有的模式(Mode the lock is held in)
waits : 锁等待的模式(Mode the lock is requested in (waiting for))
解读以上死锁的案例:
SID 770 (Process 15) 以排它模式持有锁:TX-0006002e-001e409f ,以排它模式请求锁:TX-0007002c-001f6346。
SID 788 (Process 16) 以排它模式持有锁:TX-0007002c-001f6346 ,以排它模式请求锁:TX-0006002e-001e409f。
这一段可以看到,778 阻塞了770, 然后770又阻塞了778 刚好构成了死锁的条件。这里要看生成的记录是两行还是一行,是TX还是TM,如果只有一行那么说明是同一个SESSION,可能是自治事务引起的死锁。
2:死锁发生在那个对象?
Rows waited on:
Session 788: obj - rowid = 00094900 - AACUkAABEAACLUeAAB
(dictionary objn - 608512, file - 68, block - 570654, slot - 1)
Session 770: obj - rowid = 00094900 - AACUkAABEAACLUeAAA
(dictionary objn - 608512, file - 68, block - 570654, slot - 0)
3:会话的的机器、应用程序等信息
Session 788:
sid: 788 ser: 9 audsid: 201878652 user: 132/TEST
flags: (0xe1) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x8)
pid: 16 O/S info: user: oracle, term: UNKNOWN, ospid: 15817
image: oracle@xxxxxx.xxxx.com (S001)
O/S info: user: oracle, term: pts/2, ospid: 23047, machine: DB-Server.localdomain
program: sqlplus@DB-Server.localdomain (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
Current SQL Statement:
update dead_lock_test set name='kerry1_102' where id=102
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update dead_lock_test set name='kerry2_101' where id=101
从上面我们可以看到会话788是从机器DB-Server.localdomain上的SQL*Plus应用程序发出的SQL,如果是正式环境,你会看到相关的机器和应用程序名称。这个会话最后执行的SQL语句为update dead_lock_test set name='kerry1_102' where id=102 。
另外一个会话执行的最后语句为update dead_lock_test set name='kerry2_101' where id=101,但是如何找到对应的机器、应用程序信息呢?
如下截图所示,我们在PROCESS STATE部分,找到对应的SID 770的事务,可以看到user,term、machine、program信息。剩下的事情,就是你和开发人员分析脚本,缕清细节,然后如何避免死锁的问题。
死锁的分类
死锁如何分类呢?在Metalink上这篇文章中"How to Identify ORA-00060 Deadlock Types Using Deadlock Graphs in Trace (文档 ID 1507093.1)"有关于死锁的分类:如下所示:
"Key Signature" | Lock Type | Requested | Deadlock Graph | Likely | Comments |
Type TX Lock Requesting Mode X (6) | TX | X(6) | TX X X | Application | TX Lock Held in Mode X (6) Requesting Mode X (6) |
Type TM Lock Requesting Mode SSX (5) | TM | SSX (5) | TM SX SSX SX SSX | Missing Index on Foreign Key (FK) Constraint | TM Lock Held in Mode SX (3) Held SSX (5) Requested |
Type TX Lock Requesting Mode S(4) | TX | S(4) | TX X S | Insufficient Interested Transaction List (ITL) Provision | TX Lock Held in Mode X (6) Requesting Mode S (4) ITL, Bitmap Index and PK/UK Index Signatures are the Same. Further Investigation will be required to identify absolute cause |
Type TX Lock Requesting Mode X (6) | TX | X(6) | TX X X | Self Deadlock | This looks the same as a standard application deadlock except that there is only a single row in the deadlock graph. |
Type UL Lock in Deadlock Graph | UL | ANY | UL ? ? | Application Deadlock Featuring User Defined Locks | This is very similar to the standard application deadlock except that it features User Defined Locks |
李华荣这篇博客Oracle死锁(DeadLock)的分类及其模拟里面对死锁进行了一个分类,个人觉得是一个通俗、很赞的一个死锁分类。本文很多地方也是参考、借鉴他博客的内容。
那么我们接下来看看这些死锁产生的场景,并进行一些分析,很多知识点都是参考Metalink上的一些知识点。
1:应用程序死锁(Application Deadlock)
其实最上面那个死锁的例子,就属于Application Deadlock,这个Application Deadlock是发生在同一个表,下面我们介绍一下Application Deadlock发生在两个表之间不同顺序相互更新操作引起的死锁。下面开始我们的实验。创建两个测试表,并初始化数据。
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> create table lock_test_one(name varchar(32));
Table created.
SQL> create table lock_test_two(name varchar(32));
Table created.
SQL> insert into lock_test_one values('aaaaaaaa');
1 row created.
SQL> commit;
Commit complete.
SQL> insert into l
SQL> select * from v$mystat where rownum=1;
SID STATISTIC# VALUE
有位图索引存在的表上面,非常容易就引发阻塞与死锁。这个阻塞不是发生在表上面,而是发生在索引上。因为位图索引锁定的隐者出本文出处:http://www.cnblogs.com/kerrycode/
如果你真心觉得文章写得不错,而且对你有所帮助,那就不妨小小打赏一下吧,如果囊中羞涩,不妨帮忙“推荐"一下,您的“推荐”和”打赏“将是我最大的写作动力!
本文版权归作者所有,欢迎转载,但未经作者同意必须保留此段声明,且在文章页面明显位置给出原文连接.
http://www.cnblogs.com/kerrycode/p/6928263.html